Low-Overhead Heap Profiling

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Low-Overhead Heap Profiling

JC Beyler
Hello all,

This is a follow-up from Jeremy's initial email from last year:

I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:

I think original conversation that happened last year in that thread still holds true:

 - We have a patch at Google that we think others might be interested in
    - It provides a means to understand where the allocation hotspots are at a very low overhead
    - Since it is at a low overhead, we can leave it on by default

So I come to the mailing list with Jeremy's initial question: 
"I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

A year ago, it seemed some thought it was a good idea, is this still true?

Thanks,
Jc


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Hi all,

To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374; what it does show is parts of the implementation that is proposed and hopefully can start the conversation going as I work through the details.

For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch

Hopefully this all makes sense and thank you for all your future comments!
Jc


On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email]> wrote:
Hello all,

This is a follow-up from Jeremy's initial email from last year:

I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:

I think original conversation that happened last year in that thread still holds true:

 - We have a patch at Google that we think others might be interested in
    - It provides a means to understand where the allocation hotspots are at a very low overhead
    - Since it is at a low overhead, we can leave it on by default

So I come to the mailing list with Jeremy's initial question: 
"I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

A year ago, it seemed some thought it was a good idea, is this still true?

Thanks,
Jc



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Hi all,

I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just summarizing here for now.

- Overhead of the feature

Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.

- Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
- I want to rerun next week to ensure number stability

- Accuracy of the feature

I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace S2. The microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90% S2, with a sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

Let me know if there are any questions on the numbers and if you'd like to see some more data.

Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html does not yet contain the whole implementation and therefore would have been misleading.

Thanks,
Jc


On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email]> wrote:
Hi all,

To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374; what it does show is parts of the implementation that is proposed and hopefully can start the conversation going as I work through the details.

For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch

Hopefully this all makes sense and thank you for all your future comments!
Jc


On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email]> wrote:
Hello all,

This is a follow-up from Jeremy's initial email from last year:

I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:

I think original conversation that happened last year in that thread still holds true:

 - We have a patch at Google that we think others might be interested in
    - It provides a means to understand where the allocation hotspots are at a very low overhead
    - Since it is at a low overhead, we can leave it on by default

So I come to the mailing list with Jeremy's initial question: 
"I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

A year ago, it seemed some thought it was a good idea, is this still true?

Thanks,
Jc




Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Hi all,

I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.

The new webrev.01 also adds the actual heap monitoring sampling system in files:
and

My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage. 

Thanks for your attention and let me know if there are any questions!

Have a wonderful Friday!
Jc

On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email]> wrote:
Hi all,

I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just summarizing here for now.

- Overhead of the feature

Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.

- Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
- I want to rerun next week to ensure number stability

- Accuracy of the feature

I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace S2. The microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90% S2, with a sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

Let me know if there are any questions on the numbers and if you'd like to see some more data.

Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html does not yet contain the whole implementation and therefore would have been misleading.

Thanks,
Jc


On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email]> wrote:
Hi all,

To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374; what it does show is parts of the implementation that is proposed and hopefully can start the conversation going as I work through the details.

For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch

Hopefully this all makes sense and thank you for all your future comments!
Jc


On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email]> wrote:
Hello all,

This is a follow-up from Jeremy's initial email from last year:

I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:

I think original conversation that happened last year in that thread still holds true:

 - We have a patch at Google that we think others might be interested in
    - It provides a means to understand where the allocation hotspots are at a very low overhead
    - Since it is at a low overhead, we can leave it on by default

So I come to the mailing list with Jeremy's initial question: 
"I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

A year ago, it seemed some thought it was a good idea, is this still true?

Thanks,
Jc





Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
So first off: thank you so much for your diligent look at the patch. A lot of the comments are due to me either using the wrong baseline (I was using JDK9) it seems or either that I first worked on keeping things simple, so maybe there are things that are still in flight and why things might seem unclear. I will try to rectify most of those in the next webrev.

I will say again that I really do appreciate the time you took to give all those comments. I was expecting a high level review at first and not an in-depth one :)

I'll work on the assumption that my code base is wrong and figure out how to get the right one tomorrow. I've inlined my comments which mostly are questions to help me fix the code.

On Thu, May 4, 2017 at 2:13 AM, Robbin Ehn <[hidden email]> wrote:
Hi,

To me the compiler changes looks what is expected.
It would be good if someone from compiler could take a look at that.
Added compiler to mail thread.

Also adding Serguei, It would be good with his view also.

My initial take on it, read through most of the code and took it for a ride.

##############################
- Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
instead of a separate check for sampling in fast path for the final version.


Agreed, I am still working on assessing it and the final version most likely will be TLAB-based if I can prove to myself and to you the overhead is acceptable. My initial data and the patch is simple enough that it seems like it should work. I might send out a webrev with it just to show it and people can play/comment on it.
 
- For simplicity, I've removed your actual patches but most are either because I believe my JDK that I was basing off of was not the right one compared to the one you are using. If you have a second and can provide me a link or process on how you get the right mercurial repositories set up, I'll set myself the same way you are doing it to align myself.

- I'll also double check slowdebug to ensure no problems.

- Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
- The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
- In jvmtiHeapTransition.hpp you use C cast instead.


Noted, I thought I had cleaned up most of them but seemingly not. I'll go and clean it up and align myself with what I see the surrounding code is doing between code from runtime and code for the jvmti.

 
##############################
- This patch I had apply to get traces without setting an ‘unrelated’ capability
- Should this not be a new capability?

diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
--- a/src/share/vm/prims/forte.cpp      Fri Apr 28 15:15:16 2017 +0200
+++ b/src/share/vm/prims/forte.cpp      Thu May 04 10:24:25 2017 +0200
@@ -530,6 +530,6 @@

-  if (!JvmtiExport::should_post_class_load()) {
+/*  if (!JvmtiExport::should_post_class_load()) {
     trace->num_frames = ticks_no_class_load; // -1
     return;
-  }
+  }*/


So the way I've been testing all of this for now was by creating a Java agent that sets everything up. I was waiting to have this conversation for when we were a bit more into the conversations. Since you have brought it up:
  caps.can_get_line_numbers = 1;
  caps.can_get_source_file_name = 1;
And I've added some callbacks too to help set things up: 
  ernum = global_jvmti->SetEventCallbacks(&callbacks, sizeof(jvmtiEventCallbacks));
  ernum = global_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_VM_INIT, NULL);
  ernum = global_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_CLASS_LOAD, NULL);
The class loading callback me to force the creation of the jmethod ids for the class, which most likely solves your problem here:
Quick question: Does it make sense to put my agent code somewhere in the webrev for all to see how I've been calling into this?
I was waiting a bit for things to settle a bit first but I could do that easily. Is there a spot where that would make sense?

##############################
- forte.cpp: (I know this is not part of your changes but)
find_jmethod_id_or_null give me NULL for my test.
It looks like we actually want the regular jmethod_id() ?

Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
But since we might need jmethod_id() if possible to avoid getting method id NULL,
we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

Something like:

  if (jthread->has_last_Java_frame()) { // just to be safe
    vframeStream vfst(jthread);
    while (!vfst.at_end()) {
      Method* m = vfst.method();
      m->jmethod_id();
      m->line_number_from_bci(vfst.bci());
      vfst.next();
    }

I have no idea about this and will have to try :). I am open to any solution that will work and this seems to go around forte.cpp. We have a few changes in forte.cpp that I have not put here that help get more frames than the current implementation allows. Let me test this and see if it works :)
 

- This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
Line number should be m->line_number_from_bci(bci);
Do the heapMonitoring suppose to trace with bci or line number?
I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

Agreed, I just did not want to create another structure straight away because I thought it was going to be confusing. + it seems that it is not necessary automatically. For the bci instead of line number, it is possible that, because this is code that has been ported over and over, we never cleaned it up or knew that a method was now available. Let me test it out and see how it works.
 

##############################
- // TODO(jcbeyler): remove this extra code handling the extra trace for
Please fix all these TODO's :)


Will do, they were my trackers of TODOs, I prefetched a bit when I sent out the webrev. I will try to clean out most of them for the next webrev :)
 
##############################
- heapMonitoring.hpp:
// TODO(jcbeyler): is this algorithm acceptable in open source?

Why is this comment here? What is the implication?
Have you tested any simpler algorithm?

Yes sorry, this is me being paranoid and trying to figure out if the licenses are compatible. This implementation exists in the open source world but I wanted to double check via this TODO that the original code and the JDK are compatible in licenses before bringing this in.

From what I've seen here, this is a known fast log implementation/algorithm and the license seems fine so I believe it is fine but still need to do the due diligence. I will double check and ensure it is fine :)
 

##############################
- Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

Sounds good, I will look at how to do that :)
 

##############################
- monitoring_period vs HeapMonitorRate, pick rate or period.

Agreed.
 

##############################
- globals.hpp
Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

Just to be clear and complete: MaxHeapTraces will actually disappear in the future webrev:
  1) This was here only to show how the whole system will look with especially the general jvmti interface, compiler changes, and hooks
  2) In practice, we keep all live sampled objects and we free up the list via the GC events of sampled objects
  3) In our implementation and the final one that would be shown here:
      - MaxHeapTraces disappears but we would have the analoguous MaxGarbageHeapTraces
      - That would could use a flag to be settable/overridable
 

##############################
- jvmtiStackTraceData + ASGCT_CallFrame memory
Are the agent suppose to loop through and free all ASGCT_CallFrame?
Wouldn't it be better with some kinda protocol, like:
(*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
(*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

I will change it like you suggest. Our implementation did do the free in the agent but in the general case, your solution is more logical.
 

Also using another data structure that have num_traces inside it simplifies things.
So I'm not convinced using the async structure is the best way forward.

I'm not yet sure about leaving the async entirely, I will double check on that before commiting to it. I will let you know.
 


I have more questions, but I think it's better if you respond and update the code first.

I will work on the code changes and adapt to all your suggestions here. Hopefully, I've helped answer partially most of them and if you have more, please feel free to ask them now :) Or, as you said, wait until I send my next version out if you think that we should first iterate on this before opening other questions and conversations!

Thank you again for taking the time to do such a thorough review and I hope I have answered the initial questions well!
Jc
 

Thanks!

/Robbin


On 04/21/2017 11:34 PM, JC Beyler wrote:
Hi all,

I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/

The new webrev.01 also adds the actual heap monitoring sampling system in files:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
and
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch

My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

Thanks for your attention and let me know if there are any questions!

Have a wonderful Friday!
Jc

On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

    Hi all,

    I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just summarizing
    here for now.

    - Overhead of the feature

    Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.

    - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
    - I want to rerun next week to ensure number stability

    - Accuracy of the feature

    I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace S2. The
    microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90% S2, with a
    sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

    Let me know if there are any questions on the numbers and if you'd like to see some more data.

    Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
    <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html> does not yet contain the whole implementation and therefore would have been misleading.

    Thanks,
    Jc


    On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

        Hi all,

        To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

        This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>; what it does show is parts of the implementation that is proposed and hopefully can start the conversation going
        as I work through the details.

        For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>

        Hopefully this all makes sense and thank you for all your future comments!
        Jc


        On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

            Hello all,

            This is a follow-up from Jeremy's initial email from last year:
            http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>

            I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
            https://bugs.openjdk.java.net/browse/JDK-8171119

            I think original conversation that happened last year in that thread still holds true:

              - We have a patch at Google that we think others might be interested in
                 - It provides a means to understand where the allocation hotspots are at a very low overhead
                 - Since it is at a low overhead, we can leave it on by default

            So I come to the mailing list with Jeremy's initial question:
            "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

            A year ago, it seemed some thought it was a good idea, is this still true?

            Thanks,
            Jc






Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

serguei.spitsyn@oracle.com
In reply to this post by JC Beyler
Robbin,

Thank you for forwarding!
I will review it.

Thanks,
Serguei


On 5/4/17 02:13, Robbin Ehn wrote:

> Hi,
>
> To me the compiler changes looks what is expected.
> It would be good if someone from compiler could take a look at that.
> Added compiler to mail thread.
>
> Also adding Serguei, It would be good with his view also.
>
> My initial take on it, read through most of the code and took it for a
> ride.
>
> ##############################
> - Regarding the compiler changes: I think we need the 'TLAB end'
> trickery (mentioned by Tony P)
> instead of a separate check for sampling in fast path for the final
> version.
>
> ##############################
> - This patch I had to apply to get it compile on JDK 10:
>
> diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
> --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28
> 14:31:38 2017 +0200
> +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04
> 10:22:56 2017 +0200
> @@ -87,3 +87,3 @@
>      // support for object alloc event (no-op most of the time)
> -    if (klass() != NULL && klass()->name() != NULL) {
> +    if (klass != NULL && klass->name() != NULL) {
>        Thread *base_thread = Thread::current();
> diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
> --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56
> 2017 +0200
> @@ -316,3 +316,3 @@
>    JavaThread *thread = reinterpret_cast<JavaThread
> *>(Thread::current());
> -  assert(o->size() << LogHeapWordSize == byte_size,
> +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>           "Object size is incorrect.");
>
> ##############################
> - This patch I had to apply to get it not asserting during slowdebug:
>
> --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16
> 2017 +0200
> +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25
> 2017 +0200
> @@ -32,3 +32,3 @@
>  // TODO(jcbeyler): should we make this into a JVMTI structure?
> -struct StackTraceData {
> +struct StackTraceData : CHeapObj<mtInternal> {
>    ASGCT_CallTrace *trace;
> @@ -143,3 +143,2 @@
>  StackTraceStorage::StackTraceStorage() :
> -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>      _allocated_traces_size(MaxHeapTraces),
> @@ -147,2 +146,3 @@
>      _allocated_count(0) {
> +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*,
> MaxHeapTraces, mtInternal);
>    memset(_allocated_traces, 0, sizeof(*_allocated_traces) *
> MaxHeapTraces);
> @@ -152,3 +152,3 @@
>  StackTraceStorage::~StackTraceStorage() {
> -  delete[] _allocated_traces;
> +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>  }
>
> - Classes should extend correct base class for which type of memory is
> used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
> - The style in heapMonitoring.cpp is a bit different from normal
> vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
> os::malloc and new.
> - In jvmtiHeapTransition.hpp you use C cast instead.
>
> ##############################
> - This patch I had apply to get traces without setting an ‘unrelated’
> capability
> - Should this not be a new capability?
>
> diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
> --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
> +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
> @@ -530,6 +530,6 @@
>
> -  if (!JvmtiExport::should_post_class_load()) {
> +/*  if (!JvmtiExport::should_post_class_load()) {
>      trace->num_frames = ticks_no_class_load; // -1
>      return;
> -  }
> +  }*/
>
> ##############################
> - forte.cpp: (I know this is not part of your changes but)
> find_jmethod_id_or_null give me NULL for my test.
> It looks like we actually want the regular jmethod_id() ?
>
> Since we are the thread we are talking about (and in same ucontext)
> and thread is in vm and have a last java frame,
> I think most of the checks done in AsyncGetCallTrace is irrelevant, so
> you should be-able to call forte_fill_call_trace_given_top directly.
> But since we might need jmethod_id() if possible to avoid getting
> method id NULL,
> we need some fixes in forte code, or just do the vframStream loop
> inside heapMonitoring.cpp and not use forte.cpp.
>
> Something like:
>
>   if (jthread->has_last_Java_frame()) { // just to be safe
>     vframeStream vfst(jthread);
>     while (!vfst.at_end()) {
>       Method* m = vfst.method();
>       m->jmethod_id();
>       m->line_number_from_bci(vfst.bci());
>       vfst.next();
>     }
>
> - This is a bit confusing in forte.cpp, trace->frames[count].lineno =
> bci.
> Line number should be m->line_number_from_bci(bci);
> Do the heapMonitoring suppose to trace with bci or line number?
> I would say bci, meaning we should either rename
> ASGCT_CallFrame→lineno or use another data structure which says bci.
>
> ##############################
> - // TODO(jcbeyler): remove this extra code handling the extra trace for
> Please fix all these TODO's :)
>
> ##############################
> - heapMonitoring.hpp:
> // TODO(jcbeyler): is this algorithm acceptable in open source?
>
> Why is this comment here? What is the implication?
> Have you tested any simpler algorithm?
>
> ##############################
> - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for
> building the agent)
>
> ##############################
> - monitoring_period vs HeapMonitorRate, pick rate or period.
>
> ##############################
> - globals.hpp
> Why is MaxHeapTraces not settable/overridable from jvmti interface?
> That would be handy.
>
> ##############################
> - jvmtiStackTraceData + ASGCT_CallFrame memory
> Are the agent suppose to loop through and free all ASGCT_CallFrame?
> Wouldn't it be better with some kinda protocol, like:
> (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
> (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
> Also using another data structure that have num_traces inside it
> simplifies things.
> So I'm not convinced using the async structure is the best way forward.
>
>
> I have more questions, but I think it's better if you respond and
> update the code first.
>
> Thanks!
>
> /Robbin
>
>
> On 04/21/2017 11:34 PM, JC Beyler wrote:
>> Hi all,
>>
>> I've added size information to the allocation sampling system. This
>> allows the callback to remember the size of each sampled allocation.
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/
>>
>> The new webrev.01 also adds the actual heap monitoring sampling
>> system in files:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch 
>>
>> and
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch 
>>
>>
>> My next step is to add the GC part to the webrev, which will allow
>> users to determine what objects are live and what are garbage.
>>
>> Thanks for your attention and let me know if there are any questions!
>>
>> Have a wonderful Friday!
>> Jc
>>
>> On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     Hi all,
>>
>>     I worked on getting a few numbers for overhead and accuracy for
>> my feature. I'm unsure if here is the right place to provide the full
>> data, so I am just summarizing
>>     here for now.
>>
>>     - Overhead of the feature
>>
>>     Using the Dacapo benchmark (http://dacapobench.org/). My initial
>> results are that sampling provides 2.4% with a 512k sampling, 512k
>> being our default setting.
>>
>>     - Note: this was without the tradesoap, tradebeans and tomcat
>> benchmarks since they did not work with my JDK9 (issue between Dacapo
>> and JDK9 it seems)
>>     - I want to rerun next week to ensure number stability
>>
>>     - Accuracy of the feature
>>
>>     I wrote a small microbenchmark that allocates from two different
>> stacktraces at a given ratio. For example, 10% of stacktrace S1 and
>> 90% from stacktrace S2. The
>>     microbenchmark was run 20 times, I averaged the results and
>> looked for accuracy. It seems that statistically it is sound since if
>> I allocated10% S1 and 90% S2, with a
>>     sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>>     Let me know if there are any questions on the numbers and if
>> you'd like to see some more data.
>>
>>     Note: this was done using our internal JDK8 implementation since
>> the webrev provided by
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html> does
>> not yet contain the whole implementation and therefore would have
>> been misleading.
>>
>>     Thanks,
>>     Jc
>>
>>
>>     On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>         Hi all,
>>
>>         To move the discussion forward, with Chuck Rasbold's help to
>> make a webrev, we pushed this:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>         415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>>
>>         This is not a final change that does the whole proposition
>> from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>> <https://bugs.openjdk.java.net/browse/JDK-8177374>; what it does show
>> is parts of the implementation that is proposed and hopefully can
>> start the conversation going
>>         as I work through the details.
>>
>>         For example, the changes to C2 are done here for the
>> allocations:
>> http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>> <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>
>>         Hopefully this all makes sense and thank you for all your
>> future comments!
>>         Jc
>>
>>
>>         On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler
>> <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>             Hello all,
>>
>>             This is a follow-up from Jeremy's initial email from last
>> year:
>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html 
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>
>>
>>             I've gone ahead and started working on preparing this and
>> Jeremy and I went down the route of actually writing it up in JEP form:
>>             https://bugs.openjdk.java.net/browse/JDK-8171119
>>
>>             I think original conversation that happened last year in
>> that thread still holds true:
>>
>>               - We have a patch at Google that we think others might
>> be interested in
>>                  - It provides a means to understand where the
>> allocation hotspots are at a very low overhead
>>                  - Since it is at a low overhead, we can leave it on
>> by default
>>
>>             So I come to the mailing list with Jeremy's initial
>> question:
>>             "I thought I would ask if there is any interest / if I
>> should write a JEP / if I should just forget it."
>>
>>             A year ago, it seemed some thought it was a good idea, is
>> this still true?
>>
>>             Thanks,
>>             Jc
>>
>>
>>
>>
>>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Dear all,

I've updated the webrev to:

Robbin,
I believe I have addressed most of your items with webrev 02:
  - I added a JTreg test to show how it works:
 - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move forward, that is still on my TODOs
 - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
 - I cleaned up other issues as well.

However, I have three questions, which are probably because I'm new in this community:
 1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 jdk10
     - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for example in collectedHeap.inline.hpp)
     - Would you know what is the right hg clone command so we are working on the same code base?

 2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be that I don't understand when one uses one or the other but I see both used around the code base?
   - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?

 3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts or C++ casts as I see both in the codebase. What is the convention I should use?

Final notes on this webrev:
  - I am still missing:
    - Putting a TLAB implementation so that we can compare both webrevs
    - Have not tried to circumvent AsyncGetCallTrace
    - Putting in the handling of GC'd objects
    - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev

I will work on integrating those items for the next webrev!

Thanks for your help,
Jc

Ps:  I tested this on a new repo:

... building it
cd test
jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/



On Thu, May 4, 2017 at 11:21 PM, [hidden email] <[hidden email]> wrote:
Robbin,

Thank you for forwarding!
I will review it.

Thanks,
Serguei



On 5/4/17 02:13, Robbin Ehn wrote:
Hi,

To me the compiler changes looks what is expected.
It would be good if someone from compiler could take a look at that.
Added compiler to mail thread.

Also adding Serguei, It would be good with his view also.

My initial take on it, read through most of the code and took it for a ride.

##############################
- Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
instead of a separate check for sampling in fast path for the final version.

##############################
- This patch I had to apply to get it compile on JDK 10:

diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
--- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
+++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
@@ -87,3 +87,3 @@
     // support for object alloc event (no-op most of the time)
-    if (klass() != NULL && klass()->name() != NULL) {
+    if (klass != NULL && klass->name() != NULL) {
       Thread *base_thread = Thread::current();
diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
--- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
+++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
@@ -316,3 +316,3 @@
   JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
-  assert(o->size() << LogHeapWordSize == byte_size,
+  assert(o->size() << LogHeapWordSize == (long)byte_size,
          "Object size is incorrect.");

##############################
- This patch I had to apply to get it not asserting during slowdebug:

--- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
+++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
@@ -32,3 +32,3 @@
 // TODO(jcbeyler): should we make this into a JVMTI structure?
-struct StackTraceData {
+struct StackTraceData : CHeapObj<mtInternal> {
   ASGCT_CallTrace *trace;
@@ -143,3 +143,2 @@
 StackTraceStorage::StackTraceStorage() :
-    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
     _allocated_traces_size(MaxHeapTraces),
@@ -147,2 +146,3 @@
     _allocated_count(0) {
+  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
   memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
@@ -152,3 +152,3 @@
 StackTraceStorage::~StackTraceStorage() {
-  delete[] _allocated_traces;
+  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
 }

- Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
- The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
- In jvmtiHeapTransition.hpp you use C cast instead.

##############################
- This patch I had apply to get traces without setting an ‘unrelated’ capability
- Should this not be a new capability?

diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
--- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
+++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
@@ -530,6 +530,6 @@

-  if (!JvmtiExport::should_post_class_load()) {
+/*  if (!JvmtiExport::should_post_class_load()) {
     trace->num_frames = ticks_no_class_load; // -1
     return;
-  }
+  }*/

##############################
- forte.cpp: (I know this is not part of your changes but)
find_jmethod_id_or_null give me NULL for my test.
It looks like we actually want the regular jmethod_id() ?

Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
But since we might need jmethod_id() if possible to avoid getting method id NULL,
we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

Something like:

  if (jthread->has_last_Java_frame()) { // just to be safe
    vframeStream vfst(jthread);
    while (!vfst.at_end()) {
      Method* m = vfst.method();
      m->jmethod_id();
      m->line_number_from_bci(vfst.bci());
      vfst.next();
    }

- This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
Line number should be m->line_number_from_bci(bci);
Do the heapMonitoring suppose to trace with bci or line number?
I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

##############################
- // TODO(jcbeyler): remove this extra code handling the extra trace for
Please fix all these TODO's :)

##############################
- heapMonitoring.hpp:
// TODO(jcbeyler): is this algorithm acceptable in open source?

Why is this comment here? What is the implication?
Have you tested any simpler algorithm?

##############################
- Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

##############################
- monitoring_period vs HeapMonitorRate, pick rate or period.

##############################
- globals.hpp
Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

##############################
- jvmtiStackTraceData + ASGCT_CallFrame memory
Are the agent suppose to loop through and free all ASGCT_CallFrame?
Wouldn't it be better with some kinda protocol, like:
(*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
(*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

Also using another data structure that have num_traces inside it simplifies things.
So I'm not convinced using the async structure is the best way forward.


I have more questions, but I think it's better if you respond and update the code first.

Thanks!

/Robbin


On 04/21/2017 11:34 PM, JC Beyler wrote:
Hi all,

I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/

The new webrev.01 also adds the actual heap monitoring sampling system in files:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
and
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch

My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

Thanks for your attention and let me know if there are any questions!

Have a wonderful Friday!
Jc

On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

    Hi all,

    I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just summarizing
    here for now.

    - Overhead of the feature

    Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.

    - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
    - I want to rerun next week to ensure number stability

    - Accuracy of the feature

    I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace S2. The
    microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90% S2, with a
    sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

    Let me know if there are any questions on the numbers and if you'd like to see some more data.

    Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html> does not yet contain the whole implementation and therefore would have been misleading.

    Thanks,
    Jc


    On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

        Hi all,

        To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

        This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
<https://bugs.openjdk.java.net/browse/JDK-8177374>; what it does show is parts of the implementation that is proposed and hopefully can start the conversation going
        as I work through the details.

        For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>

        Hopefully this all makes sense and thank you for all your future comments!
        Jc


        On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

            Hello all,

            This is a follow-up from Jeremy's initial email from last year:
http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>

            I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
            https://bugs.openjdk.java.net/browse/JDK-8171119

            I think original conversation that happened last year in that thread still holds true:

              - We have a patch at Google that we think others might be interested in
                 - It provides a means to understand where the allocation hotspots are at a very low overhead
                 - Since it is at a low overhead, we can leave it on by default

            So I come to the mailing list with Jeremy's initial question:
            "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

            A year ago, it seemed some thought it was a good idea, is this still true?

            Thanks,
            Jc







Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Robbin Ehn
Just a few answers,

On 05/15/2017 06:48 PM, JC Beyler wrote:
> Dear all,
>
> I've updated the webrev to:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>

I'll look at this later, thanks!

>
> Robbin,
> I believe I have addressed most of your items with webrev 02:
>    - I added a JTreg test to show how it works:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c 
> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>   - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move forward, that
> is still on my TODOs
>   - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
>   - I cleaned up other issues as well.
>
> However, I have three questions, which are probably because I'm new in this community:
>   1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>
> jdk10
>       - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for example in
> collectedHeap.inline.hpp)
>       - Would you know what is the right hg clone command so we are working on the same code base?

We use jdk10-hs, e.g.
hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs

There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...

>
>   2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be that I
> don't understand when one uses one or the other but I see both used around the code base?
>     - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?

We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
But for arrays you will need the macro NEW_C_HEAP_ARRAY.

>
>   3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts or C++
> casts as I see both in the codebase. What is the convention I should use?

Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.

>
> Final notes on this webrev:
>    - I am still missing:
>      - Putting a TLAB implementation so that we can compare both webrevs
>      - Have not tried to circumvent AsyncGetCallTrace
>      - Putting in the handling of GC'd objects
>      - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
>
> I will work on integrating those items for the next webrev!

Thanks!

>
> Thanks for your help,
> Jc
>
> Ps:  I tested this on a new repo:
>
> hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> jdk10
> ... building it
> cd test
> jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
> <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
>

I'll test it out!

/Robbin

>
>
> On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Robbin,
>
>     Thank you for forwarding!
>     I will review it.
>
>     Thanks,
>     Serguei
>
>
>
>     On 5/4/17 02:13, Robbin Ehn wrote:
>
>         Hi,
>
>         To me the compiler changes looks what is expected.
>         It would be good if someone from compiler could take a look at that.
>         Added compiler to mail thread.
>
>         Also adding Serguei, It would be good with his view also.
>
>         My initial take on it, read through most of the code and took it for a ride.
>
>         ##############################
>         - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
>         instead of a separate check for sampling in fast path for the final version.
>
>         ##############################
>         - This patch I had to apply to get it compile on JDK 10:
>
>         diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
>         --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
>         +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
>         @@ -87,3 +87,3 @@
>               // support for object alloc event (no-op most of the time)
>         -    if (klass() != NULL && klass()->name() != NULL) {
>         +    if (klass != NULL && klass->name() != NULL) {
>                 Thread *base_thread = Thread::current();
>         diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
>         @@ -316,3 +316,3 @@
>             JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
>         -  assert(o->size() << LogHeapWordSize == byte_size,
>         +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>                    "Object size is incorrect.");
>
>         ##############################
>         - This patch I had to apply to get it not asserting during slowdebug:
>
>         --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
>         +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
>         @@ -32,3 +32,3 @@
>           // TODO(jcbeyler): should we make this into a JVMTI structure?
>         -struct StackTraceData {
>         +struct StackTraceData : CHeapObj<mtInternal> {
>             ASGCT_CallTrace *trace;
>         @@ -143,3 +143,2 @@
>           StackTraceStorage::StackTraceStorage() :
>         -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>               _allocated_traces_size(MaxHeapTraces),
>         @@ -147,2 +146,3 @@
>               _allocated_count(0) {
>         +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
>             memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
>         @@ -152,3 +152,3 @@
>           StackTraceStorage::~StackTraceStorage() {
>         -  delete[] _allocated_traces;
>         +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>           }
>
>         - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>         - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
>         - In jvmtiHeapTransition.hpp you use C cast instead.
>
>         ##############################
>         - This patch I had apply to get traces without setting an ‘unrelated’ capability
>         - Should this not be a new capability?
>
>         diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>         --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
>         +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
>         @@ -530,6 +530,6 @@
>
>         -  if (!JvmtiExport::should_post_class_load()) {
>         +/*  if (!JvmtiExport::should_post_class_load()) {
>               trace->num_frames = ticks_no_class_load; // -1
>               return;
>         -  }
>         +  }*/
>
>         ##############################
>         - forte.cpp: (I know this is not part of your changes but)
>         find_jmethod_id_or_null give me NULL for my test.
>         It looks like we actually want the regular jmethod_id() ?
>
>         Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
>         I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
>         But since we might need jmethod_id() if possible to avoid getting method id NULL,
>         we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.
>
>         Something like:
>
>            if (jthread->has_last_Java_frame()) { // just to be safe
>              vframeStream vfst(jthread);
>              while (!vfst.at_end()) {
>                Method* m = vfst.method();
>                m->jmethod_id();
>                m->line_number_from_bci(vfst.bci());
>                vfst.next();
>              }
>
>         - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
>         Line number should be m->line_number_from_bci(bci);
>         Do the heapMonitoring suppose to trace with bci or line number?
>         I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
>
>         ##############################
>         - // TODO(jcbeyler): remove this extra code handling the extra trace for
>         Please fix all these TODO's :)
>
>         ##############################
>         - heapMonitoring.hpp:
>         // TODO(jcbeyler): is this algorithm acceptable in open source?
>
>         Why is this comment here? What is the implication?
>         Have you tested any simpler algorithm?
>
>         ##############################
>         - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
>
>         ##############################
>         - monitoring_period vs HeapMonitorRate, pick rate or period.
>
>         ##############################
>         - globals.hpp
>         Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
>
>         ##############################
>         - jvmtiStackTraceData + ASGCT_CallFrame memory
>         Are the agent suppose to loop through and free all ASGCT_CallFrame?
>         Wouldn't it be better with some kinda protocol, like:
>         (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>         (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
>         Also using another data structure that have num_traces inside it simplifies things.
>         So I'm not convinced using the async structure is the best way forward.
>
>
>         I have more questions, but I think it's better if you respond and update the code first.
>
>         Thanks!
>
>         /Robbin
>
>
>         On 04/21/2017 11:34 PM, JC Beyler wrote:
>
>             Hi all,
>
>             I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>
>             The new webrev.01 also adds the actual heap monitoring sampling system in files:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>             and
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>
>             My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
>
>             Thanks for your attention and let me know if there are any questions!
>
>             Have a wonderful Friday!
>             Jc
>
>             On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>
>                  Hi all,
>
>                  I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just
>             summarizing
>                  here for now.
>
>                  - Overhead of the feature
>
>                  Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.
>
>                  - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
>                  - I want to rerun next week to ensure number stability
>
>                  - Accuracy of the feature
>
>                  I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace
>             S2. The
>                  microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90%
>             S2, with a
>                  sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>
>                  Let me know if there are any questions on the numbers and if you'd like to see some more data.
>
>                  Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>> does not yet contain the whole
>             implementation and therefore would have been misleading.
>
>                  Thanks,
>                  Jc
>
>
>                  On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>
>                      Hi all,
>
>                      To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>
>                      This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>; what it does show is parts of the implementation that is
>             proposed and hopefully can start the conversation going
>                      as I work through the details.
>
>                      For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>
>                      Hopefully this all makes sense and thank you for all your future comments!
>                      Jc
>
>
>                      On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>             wrote:
>
>                          Hello all,
>
>                          This is a follow-up from Jeremy's initial email from last year:
>             http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>
>                          I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
>             https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>
>
>                          I think original conversation that happened last year in that thread still holds true:
>
>                            - We have a patch at Google that we think others might be interested in
>                               - It provides a means to understand where the allocation hotspots are at a very low overhead
>                               - Since it is at a low overhead, we can leave it on by default
>
>                          So I come to the mailing list with Jeremy's initial question:
>                          "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
>
>                          A year ago, it seemed some thought it was a good idea, is this still true?
>
>                          Thanks,
>                          Jc
>
>
>
>
>
>
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Dear Robbin,

Thank you for the answers, much appreciated :)
Jc

On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email]> wrote:
Just a few answers,

On 05/15/2017 06:48 PM, JC Beyler wrote:
Dear all,

I've updated the webrev to:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>

I'll look at this later, thanks!


Robbin,
I believe I have addressed most of your items with webrev 02:
   - I added a JTreg test to show how it works:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
  - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move forward, that is still on my TODOs
  - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
  - I cleaned up other issues as well.

However, I have three questions, which are probably because I'm new in this community:
  1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> jdk10
      - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for example in collectedHeap.inline.hpp)
      - Would you know what is the right hg clone command so we are working on the same code base?

We use jdk10-hs, e.g.
hg tclone http://hg.openjdk.java.net/jdk10/hs 10-hs

There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...


  2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be that I don't understand when one uses one or the other but I see both used around the code base?
    - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?

We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
But for arrays you will need the macro NEW_C_HEAP_ARRAY.


  3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts or C++ casts as I see both in the codebase. What is the convention I should use?

Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.


Final notes on this webrev:
   - I am still missing:
     - Putting a TLAB implementation so that we can compare both webrevs
     - Have not tried to circumvent AsyncGetCallTrace
     - Putting in the handling of GC'd objects
     - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev

I will work on integrating those items for the next webrev!

Thanks!


Thanks for your help,
Jc

Ps:  I tested this on a new repo:

hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> jdk10
... building it
cd test
jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/


I'll test it out!

/Robbin



On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <[hidden email] <mailto:[hidden email]>> wrote:

    Robbin,

    Thank you for forwarding!
    I will review it.

    Thanks,
    Serguei



    On 5/4/17 02:13, Robbin Ehn wrote:

        Hi,

        To me the compiler changes looks what is expected.
        It would be good if someone from compiler could take a look at that.
        Added compiler to mail thread.

        Also adding Serguei, It would be good with his view also.

        My initial take on it, read through most of the code and took it for a ride.

        ##############################
        - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
        instead of a separate check for sampling in fast path for the final version.

        ##############################
        - This patch I had to apply to get it compile on JDK 10:

        diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
        --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
        +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
        @@ -87,3 +87,3 @@
              // support for object alloc event (no-op most of the time)
        -    if (klass() != NULL && klass()->name() != NULL) {
        +    if (klass != NULL && klass->name() != NULL) {
                Thread *base_thread = Thread::current();
        diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
        --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
        +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
        @@ -316,3 +316,3 @@
            JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
        -  assert(o->size() << LogHeapWordSize == byte_size,
        +  assert(o->size() << LogHeapWordSize == (long)byte_size,
                   "Object size is incorrect.");

        ##############################
        - This patch I had to apply to get it not asserting during slowdebug:

        --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
        +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
        @@ -32,3 +32,3 @@
          // TODO(jcbeyler): should we make this into a JVMTI structure?
        -struct StackTraceData {
        +struct StackTraceData : CHeapObj<mtInternal> {
            ASGCT_CallTrace *trace;
        @@ -143,3 +143,2 @@
          StackTraceStorage::StackTraceStorage() :
        -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
              _allocated_traces_size(MaxHeapTraces),
        @@ -147,2 +146,3 @@
              _allocated_count(0) {
        +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
            memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
        @@ -152,3 +152,3 @@
          StackTraceStorage::~StackTraceStorage() {
        -  delete[] _allocated_traces;
        +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
          }

        - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
        - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
        - In jvmtiHeapTransition.hpp you use C cast instead.

        ##############################
        - This patch I had apply to get traces without setting an ‘unrelated’ capability
        - Should this not be a new capability?

        diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
        --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
        +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
        @@ -530,6 +530,6 @@

        -  if (!JvmtiExport::should_post_class_load()) {
        +/*  if (!JvmtiExport::should_post_class_load()) {
              trace->num_frames = ticks_no_class_load; // -1
              return;
        -  }
        +  }*/

        ##############################
        - forte.cpp: (I know this is not part of your changes but)
        find_jmethod_id_or_null give me NULL for my test.
        It looks like we actually want the regular jmethod_id() ?

        Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
        I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
        But since we might need jmethod_id() if possible to avoid getting method id NULL,
        we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

        Something like:

           if (jthread->has_last_Java_frame()) { // just to be safe
             vframeStream vfst(jthread);
             while (!vfst.at_end()) {
               Method* m = vfst.method();
               m->jmethod_id();
               m->line_number_from_bci(vfst.bci());
               vfst.next();
             }

        - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
        Line number should be m->line_number_from_bci(bci);
        Do the heapMonitoring suppose to trace with bci or line number?
        I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

        ##############################
        - // TODO(jcbeyler): remove this extra code handling the extra trace for
        Please fix all these TODO's :)

        ##############################
        - heapMonitoring.hpp:
        // TODO(jcbeyler): is this algorithm acceptable in open source?

        Why is this comment here? What is the implication?
        Have you tested any simpler algorithm?

        ##############################
        - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

        ##############################
        - monitoring_period vs HeapMonitorRate, pick rate or period.

        ##############################
        - globals.hpp
        Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

        ##############################
        - jvmtiStackTraceData + ASGCT_CallFrame memory
        Are the agent suppose to loop through and free all ASGCT_CallFrame?
        Wouldn't it be better with some kinda protocol, like:
        (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
        (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

        Also using another data structure that have num_traces inside it simplifies things.
        So I'm not convinced using the async structure is the best way forward.


        I have more questions, but I think it's better if you respond and update the code first.

        Thanks!

        /Robbin


        On 04/21/2017 11:34 PM, JC Beyler wrote:

            Hi all,

            I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>

            The new webrev.01 also adds the actual heap monitoring sampling system in files:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
            and
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>

            My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

            Thanks for your attention and let me know if there are any questions!

            Have a wonderful Friday!
            Jc

            On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:

                 Hi all,

                 I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I am just
            summarizing
                 here for now.

                 - Overhead of the feature

                 Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our default setting.

                 - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
                 - I want to rerun next week to ensure number stability

                 - Accuracy of the feature

                 I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from stacktrace
            S2. The
                 microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10% S1 and 90%
            S2, with a
                 sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

                 Let me know if there are any questions on the numbers and if you'd like to see some more data.

                 Note: this was done using our internal JDK8 implementation since the webrev provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>> does not yet contain the whole
            implementation and therefore would have been misleading.

                 Thanks,
                 Jc


                 On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:

                     Hi all,

                     To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
            http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

                     This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
            <https://bugs.openjdk.java.net/browse/JDK-8177374>
            <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>; what it does show is parts of the implementation that is
            proposed and hopefully can start the conversation going
                     as I work through the details.

                     For example, the changes to C2 are done here for the allocations: http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>

                     Hopefully this all makes sense and thank you for all your future comments!
                     Jc


                     On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
            wrote:

                         Hello all,

                         This is a follow-up from Jeremy's initial email from last year:
            http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
            <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
            <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>

                         I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
            https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>

                         I think original conversation that happened last year in that thread still holds true:

                           - We have a patch at Google that we think others might be interested in
                              - It provides a means to understand where the allocation hotspots are at a very low overhead
                              - Since it is at a low overhead, we can leave it on by default

                         So I come to the mailing list with Jeremy's initial question:
                         "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

                         A year ago, it seemed some thought it was a good idea, is this still true?

                         Thanks,
                         Jc








Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Robbin Ehn
In reply to this post by Robbin Ehn
Hi Jc,

On 05/22/2017 08:47 PM, JC Beyler wrote:
> Dear all,
>
> I have a new webrev up:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/

I liked this!

Two small things:

heapMonitoring.hpp
class HeapMonitoring should extend AllStatic

heapMonitoring.cpp
class MuxLocker should extend StackObj
But I think you should skip MuxLocker or push it separate generic enhancement.

Great with the jtreg test, thanks alot!

>
> This webrev has, I hope, fixed a lot of the comments from Robbin:
>    - The casts normally are all C++ style
>    - Moved this to jdk10-hs
>       - I have not tested slowdebug yet, hopefully it does not break there
>    - Added the garbage collection system:
>       - Now live sampled allocations are tracked throughout their lifetime
>          - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
>              - Those lists use the array system that the live objects were using before but have different re-use strategies
>       - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
> - Both use the same JVMTI structures
>       - Added the calls to them for the test, though I've kept that test simple for now:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>          - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent structure. This
> is updated in my local repo and will get in the next webrev.
>
> Next steps for this work are:
>     - Putting the TLAB implementation (yes not forgotten ;-))
>     - Adding more testing and separate the current test system to check things a bit more thoroughly
>     - Have not tried to circumvent AsyncGetCallTrace yet
>     - Still have to double check the stack walker a bit more

Looking forward to this.

Could someone from compiler take a look please?

Thanks!

/Robbin

>
> Happy webrev perusal!
> Jc
>
>
> On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Just a few answers,
>
>     On 05/15/2017 06:48 PM, JC Beyler wrote:
>
>         Dear all,
>
>         I've updated the webrev to:
>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>
>
>
>     I'll look at this later, thanks!
>
>
>         Robbin,
>         I believe I have addressed most of your items with webrev 02:
>             - I added a JTreg test to show how it works:
>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
>            - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move
>         forward, that is still on my TODOs
>            - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
>            - I cleaned up other issues as well.
>
>         However, I have three questions, which are probably because I'm new in this community:
>            1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
>         <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
>                - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for
>         example in collectedHeap.inline.hpp)
>                - Would you know what is the right hg clone command so we are working on the same code base?
>
>
>     We use jdk10-hs, e.g.
>     hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> 10-hs
>
>     There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...
>
>
>            2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be
>         that I don't understand when one uses one or the other but I see both used around the code base?
>              - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
>
>
>     We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
>     But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>
>
>            3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts
>         or C++ casts as I see both in the codebase. What is the convention I should use?
>
>
>     Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
>     And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.
>
>
>         Final notes on this webrev:
>             - I am still missing:
>               - Putting a TLAB implementation so that we can compare both webrevs
>               - Have not tried to circumvent AsyncGetCallTrace
>               - Putting in the handling of GC'd objects
>               - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
>
>         I will work on integrating those items for the next webrev!
>
>
>     Thanks!
>
>
>         Thanks for your help,
>         Jc
>
>         Ps:  I tested this on a new repo:
>
>         hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
>         <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
>         ... building it
>         cd test
>         jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
>         <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
>
>
>     I'll test it out!
>
>     /Robbin
>
>
>
>         On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>         <mailto:[hidden email]>> <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>         <mailto:[hidden email]>>> wrote:
>
>              Robbin,
>
>              Thank you for forwarding!
>              I will review it.
>
>              Thanks,
>              Serguei
>
>
>
>              On 5/4/17 02:13, Robbin Ehn wrote:
>
>                  Hi,
>
>                  To me the compiler changes looks what is expected.
>                  It would be good if someone from compiler could take a look at that.
>                  Added compiler to mail thread.
>
>                  Also adding Serguei, It would be good with his view also.
>
>                  My initial take on it, read through most of the code and took it for a ride.
>
>                  ##############################
>                  - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
>                  instead of a separate check for sampling in fast path for the final version.
>
>                  ##############################
>                  - This patch I had to apply to get it compile on JDK 10:
>
>                  diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
>                  --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
>                  +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
>                  @@ -87,3 +87,3 @@
>                        // support for object alloc event (no-op most of the time)
>                  -    if (klass() != NULL && klass()->name() != NULL) {
>                  +    if (klass != NULL && klass->name() != NULL) {
>                          Thread *base_thread = Thread::current();
>                  diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>                  --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
>                  +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
>                  @@ -316,3 +316,3 @@
>                      JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
>                  -  assert(o->size() << LogHeapWordSize == byte_size,
>                  +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>                             "Object size is incorrect.");
>
>                  ##############################
>                  - This patch I had to apply to get it not asserting during slowdebug:
>
>                  --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
>                  +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
>                  @@ -32,3 +32,3 @@
>                    // TODO(jcbeyler): should we make this into a JVMTI structure?
>                  -struct StackTraceData {
>                  +struct StackTraceData : CHeapObj<mtInternal> {
>                      ASGCT_CallTrace *trace;
>                  @@ -143,3 +143,2 @@
>                    StackTraceStorage::StackTraceStorage() :
>                  -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>                        _allocated_traces_size(MaxHeapTraces),
>                  @@ -147,2 +146,3 @@
>                        _allocated_count(0) {
>                  +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
>                      memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
>                  @@ -152,3 +152,3 @@
>                    StackTraceStorage::~StackTraceStorage() {
>                  -  delete[] _allocated_traces;
>                  +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>                    }
>
>                  - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>                  - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
>                  - In jvmtiHeapTransition.hpp you use C cast instead.
>
>                  ##############################
>                  - This patch I had apply to get traces without setting an ‘unrelated’ capability
>                  - Should this not be a new capability?
>
>                  diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>                  --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
>                  +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
>                  @@ -530,6 +530,6 @@
>
>                  -  if (!JvmtiExport::should_post_class_load()) {
>                  +/*  if (!JvmtiExport::should_post_class_load()) {
>                        trace->num_frames = ticks_no_class_load; // -1
>                        return;
>                  -  }
>                  +  }*/
>
>                  ##############################
>                  - forte.cpp: (I know this is not part of your changes but)
>                  find_jmethod_id_or_null give me NULL for my test.
>                  It looks like we actually want the regular jmethod_id() ?
>
>                  Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
>                  I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
>                  But since we might need jmethod_id() if possible to avoid getting method id NULL,
>                  we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.
>
>                  Something like:
>
>                     if (jthread->has_last_Java_frame()) { // just to be safe
>                       vframeStream vfst(jthread);
>                       while (!vfst.at_end()) {
>                         Method* m = vfst.method();
>                         m->jmethod_id();
>                         m->line_number_from_bci(vfst.bci());
>                         vfst.next();
>                       }
>
>                  - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
>                  Line number should be m->line_number_from_bci(bci);
>                  Do the heapMonitoring suppose to trace with bci or line number?
>                  I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
>
>                  ##############################
>                  - // TODO(jcbeyler): remove this extra code handling the extra trace for
>                  Please fix all these TODO's :)
>
>                  ##############################
>                  - heapMonitoring.hpp:
>                  // TODO(jcbeyler): is this algorithm acceptable in open source?
>
>                  Why is this comment here? What is the implication?
>                  Have you tested any simpler algorithm?
>
>                  ##############################
>                  - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
>
>                  ##############################
>                  - monitoring_period vs HeapMonitorRate, pick rate or period.
>
>                  ##############################
>                  - globals.hpp
>                  Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
>
>                  ##############################
>                  - jvmtiStackTraceData + ASGCT_CallFrame memory
>                  Are the agent suppose to loop through and free all ASGCT_CallFrame?
>                  Wouldn't it be better with some kinda protocol, like:
>                  (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>                  (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
>                  Also using another data structure that have num_traces inside it simplifies things.
>                  So I'm not convinced using the async structure is the best way forward.
>
>
>                  I have more questions, but I think it's better if you respond and update the code first.
>
>                  Thanks!
>
>                  /Robbin
>
>
>                  On 04/21/2017 11:34 PM, JC Beyler wrote:
>
>                      Hi all,
>
>                      I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>
>
>                      The new webrev.01 also adds the actual heap monitoring sampling system in files:
>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
>                      and
>         http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>         <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>
>
>                      My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
>
>                      Thanks for your attention and let me know if there are any questions!
>
>                      Have a wonderful Friday!
>                      Jc
>
>                      On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>
>         <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>
>                           Hi all,
>
>                           I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I
>         am just
>                      summarizing
>                           here for now.
>
>                           - Overhead of the feature
>
>                           Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our
>         default setting.
>
>                           - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
>                           - I want to rerun next week to ensure number stability
>
>                           - Accuracy of the feature
>
>                           I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from
>         stacktrace
>                      S2. The
>                           microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10%
>         S1 and 90%
>                      S2, with a
>                           sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>
>                           Let me know if there are any questions on the numbers and if you'd like to see some more data.
>
>                           Note: this was done using our internal JDK8 implementation since the webrev provided by
>         http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>> does not yet contain the whole
>                      implementation and therefore would have been misleading.
>
>                           Thanks,
>                           Jc
>
>
>                           On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>         <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>
>                               Hi all,
>
>                               To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
>         http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
>                               415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>
>                               This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>         <https://bugs.openjdk.java.net/browse/JDK-8177374>
>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374> <https://bugs.openjdk.java.net/browse/JDK-8177374
>         <https://bugs.openjdk.java.net/browse/JDK-8177374>>>; what it does show is parts of the implementation that is
>                      proposed and hopefully can start the conversation going
>                               as I work through the details.
>
>                               For example, the changes to C2 are done here for the allocations:
>         http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>         <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>
>
>                               Hopefully this all makes sense and thank you for all your future comments!
>                               Jc
>
>
>                               On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>         <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>
>                      wrote:
>
>                                   Hello all,
>
>                                   This is a follow-up from Jeremy's initial email from last year:
>         http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>         <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>         <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>         <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>
>
>                                   I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
>         https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119> <https://bugs.openjdk.java.net/browse/JDK-8171119
>         <https://bugs.openjdk.java.net/browse/JDK-8171119>>
>
>                                   I think original conversation that happened last year in that thread still holds true:
>
>                                     - We have a patch at Google that we think others might be interested in
>                                        - It provides a means to understand where the allocation hotspots are at a very low overhead
>                                        - Since it is at a low overhead, we can leave it on by default
>
>                                   So I come to the mailing list with Jeremy's initial question:
>                                   "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
>
>                                   A year ago, it seemed some thought it was a good idea, is this still true?
>
>                                   Thanks,
>                                   Jc
>
>
>
>
>
>
>
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Dear all,

Thanks Robbin for the comments. I have left the MuxLocker for now and will look at removing it or as a future enhancement as you say. I did make the class extends and added a TODO for myself to test this in slowdebug.

I have also put a new webrev up that is still a work in progress but should allow us to talk about TLAB vs C1/C2 modifications.


What this webrev has is:
   - I have put in a TLAB implementation, it is a WIP, and I have not yet done the qualitative/quantitative study on it vs the implementation using compilation changes but the big parts are in place
       - Caveats:
               - I have not fixed the calculation in the case of a FastTLABRefill case
           - This is always on right now, there is no way to turn it off, that's also a TODO to be directed by the JVMTI API

    - I also have circumvented the AsyncGetCallTrace using the snippet of code you showed Robbin, it works for here/now 
          - But we might have to revisit this one day because it then does not try to get some of the native stacks and jumps directly to the Java stacks (I see cases where this could be an issue)
          - However, this has cleaned up quite a bit of the code and I have removed all mention of ASGCT and its structures now and use directly the JVMTI structures

   - GC is handled now, I have not yet done the qualitative/quantitative study on it but the big parts are in place

   - Due to the way the TLAB is called, the stack walker is now correct and produces the right stacks it seems (this is a bold sentence from my ONE JTREG test :))

Final notes on this webrev:
   - Have to fix that TLAB case for the FastTLABRefill
   - Implement the turn on/off system for the TLAB implementation
   - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
   - Have to check the GC elements and what that produces
   - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

As always, your comments and feedback are greatly appreciated! Happy Friday!
Jc


On Tue, May 30, 2017 at 5:33 AM, Robbin Ehn <[hidden email]> wrote:
Hi Jc,

On 05/22/2017 08:47 PM, JC Beyler wrote:
Dear all,

I have a new webrev up:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/

I liked this!

Two small things:

heapMonitoring.hpp
class HeapMonitoring should extend AllStatic

heapMonitoring.cpp
class MuxLocker should extend StackObj
But I think you should skip MuxLocker or push it separate generic enhancement.

Great with the jtreg test, thanks alot!


This webrev has, I hope, fixed a lot of the comments from Robbin:
   - The casts normally are all C++ style
   - Moved this to jdk10-hs
      - I have not tested slowdebug yet, hopefully it does not break there
   - Added the garbage collection system:
      - Now live sampled allocations are tracked throughout their lifetime
         - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
             - Those lists use the array system that the live objects were using before but have different re-use strategies
      - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
- Both use the same JVMTI structures
      - Added the calls to them for the test, though I've kept that test simple for now:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
         - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent structure. This is updated in my local repo and will get in the next webrev.

Next steps for this work are:
    - Putting the TLAB implementation (yes not forgotten ;-))
    - Adding more testing and separate the current test system to check things a bit more thoroughly
    - Have not tried to circumvent AsyncGetCallTrace yet
    - Still have to double check the stack walker a bit more

Looking forward to this.

Could someone from compiler take a look please?

Thanks!

/Robbin


Happy webrev perusal!
Jc


On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:

    Just a few answers,

    On 05/15/2017 06:48 PM, JC Beyler wrote:

        Dear all,

        I've updated the webrev to:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>


    I'll look at this later, thanks!


        Robbin,
        I believe I have addressed most of your items with webrev 02:
            - I added a JTreg test to show how it works:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
           - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move
        forward, that is still on my TODOs
           - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
           - I cleaned up other issues as well.

        However, I have three questions, which are probably because I'm new in this community:
           1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
               - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for
        example in collectedHeap.inline.hpp)
               - Would you know what is the right hg clone command so we are working on the same code base?


    We use jdk10-hs, e.g.
    hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> 10-hs

    There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...


           2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be
        that I don't understand when one uses one or the other but I see both used around the code base?
             - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?


    We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
    But for arrays you will need the macro NEW_C_HEAP_ARRAY.


           3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts
        or C++ casts as I see both in the codebase. What is the convention I should use?


    Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
    And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.


        Final notes on this webrev:
            - I am still missing:
              - Putting a TLAB implementation so that we can compare both webrevs
              - Have not tried to circumvent AsyncGetCallTrace
              - Putting in the handling of GC'd objects
              - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev

        I will work on integrating those items for the next webrev!


    Thanks!


        Thanks for your help,
        Jc

        Ps:  I tested this on a new repo:

        hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
        ... building it
        cd test
        jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
        <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/


    I'll test it out!

    /Robbin



        On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]

        <mailto:[hidden email]>>> wrote:

             Robbin,

             Thank you for forwarding!
             I will review it.

             Thanks,
             Serguei



             On 5/4/17 02:13, Robbin Ehn wrote:

                 Hi,

                 To me the compiler changes looks what is expected.
                 It would be good if someone from compiler could take a look at that.
                 Added compiler to mail thread.

                 Also adding Serguei, It would be good with his view also.

                 My initial take on it, read through most of the code and took it for a ride.

                 ##############################
                 - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
                 instead of a separate check for sampling in fast path for the final version.

                 ##############################
                 - This patch I had to apply to get it compile on JDK 10:

                 diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
                 --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
                 +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
                 @@ -87,3 +87,3 @@
                       // support for object alloc event (no-op most of the time)
                 -    if (klass() != NULL && klass()->name() != NULL) {
                 +    if (klass != NULL && klass->name() != NULL) {
                         Thread *base_thread = Thread::current();
                 diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
                 @@ -316,3 +316,3 @@
                     JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
                 -  assert(o->size() << LogHeapWordSize == byte_size,
                 +  assert(o->size() << LogHeapWordSize == (long)byte_size,
                            "Object size is incorrect.");

                 ##############################
                 - This patch I had to apply to get it not asserting during slowdebug:

                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
                 @@ -32,3 +32,3 @@
                   // TODO(jcbeyler): should we make this into a JVMTI structure?
                 -struct StackTraceData {
                 +struct StackTraceData : CHeapObj<mtInternal> {
                     ASGCT_CallTrace *trace;
                 @@ -143,3 +143,2 @@
                   StackTraceStorage::StackTraceStorage() :
                 -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
                       _allocated_traces_size(MaxHeapTraces),
                 @@ -147,2 +146,3 @@
                       _allocated_count(0) {
                 +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
                     memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
                 @@ -152,3 +152,3 @@
                   StackTraceStorage::~StackTraceStorage() {
                 -  delete[] _allocated_traces;
                 +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
                   }

                 - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
                 - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
                 - In jvmtiHeapTransition.hpp you use C cast instead.

                 ##############################
                 - This patch I had apply to get traces without setting an ‘unrelated’ capability
                 - Should this not be a new capability?

                 diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
                 --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
                 +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
                 @@ -530,6 +530,6 @@

                 -  if (!JvmtiExport::should_post_class_load()) {
                 +/*  if (!JvmtiExport::should_post_class_load()) {
                       trace->num_frames = ticks_no_class_load; // -1
                       return;
                 -  }
                 +  }*/

                 ##############################
                 - forte.cpp: (I know this is not part of your changes but)
                 find_jmethod_id_or_null give me NULL for my test.
                 It looks like we actually want the regular jmethod_id() ?

                 Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
                 I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
                 But since we might need jmethod_id() if possible to avoid getting method id NULL,
                 we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

                 Something like:

                    if (jthread->has_last_Java_frame()) { // just to be safe
                      vframeStream vfst(jthread);
                      while (!vfst.at_end()) {
                        Method* m = vfst.method();
                        m->jmethod_id();
                        m->line_number_from_bci(vfst.bci());
                        vfst.next();
                      }

                 - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
                 Line number should be m->line_number_from_bci(bci);
                 Do the heapMonitoring suppose to trace with bci or line number?
                 I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

                 ##############################
                 - // TODO(jcbeyler): remove this extra code handling the extra trace for
                 Please fix all these TODO's :)

                 ##############################
                 - heapMonitoring.hpp:
                 // TODO(jcbeyler): is this algorithm acceptable in open source?

                 Why is this comment here? What is the implication?
                 Have you tested any simpler algorithm?

                 ##############################
                 - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

                 ##############################
                 - monitoring_period vs HeapMonitorRate, pick rate or period.

                 ##############################
                 - globals.hpp
                 Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

                 ##############################
                 - jvmtiStackTraceData + ASGCT_CallFrame memory
                 Are the agent suppose to loop through and free all ASGCT_CallFrame?
                 Wouldn't it be better with some kinda protocol, like:
                 (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
                 (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

                 Also using another data structure that have num_traces inside it simplifies things.
                 So I'm not convinced using the async structure is the best way forward.


                 I have more questions, but I think it's better if you respond and update the code first.

                 Thanks!

                 /Robbin


                 On 04/21/2017 11:34 PM, JC Beyler wrote:

                     Hi all,

                     I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>

                     The new webrev.01 also adds the actual heap monitoring sampling system in files:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
                     and
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>

                     My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

                     Thanks for your attention and let me know if there are any questions!

                     Have a wonderful Friday!
                     Jc

                     On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>
        <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:

                          Hi all,

                          I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I
        am just
                     summarizing
                          here for now.

                          - Overhead of the feature

                          Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our
        default setting.

                          - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
                          - I want to rerun next week to ensure number stability

                          - Accuracy of the feature

                          I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from
        stacktrace
                     S2. The
                          microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10%
        S1 and 90%
                     S2, with a
                          sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

                          Let me know if there are any questions on the numbers and if you'd like to see some more data.

                          Note: this was done using our internal JDK8 implementation since the webrev provided by
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>> does not yet contain the whole
                     implementation and therefore would have been misleading.

                          Thanks,
                          Jc


                          On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:

                              Hi all,

                              To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
                              415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

                              This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374> <https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>>>; what it does show is parts of the implementation that is
                     proposed and hopefully can start the conversation going
                              as I work through the details.

                              For example, the changes to C2 are done here for the allocations:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>

                              Hopefully this all makes sense and thank you for all your future comments!
                              Jc


                              On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>
                     wrote:

                                  Hello all,

                                  This is a follow-up from Jeremy's initial email from last year:
        http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
                     <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
                     <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>

                                  I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
        https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119> <https://bugs.openjdk.java.net/browse/JDK-8171119
        <https://bugs.openjdk.java.net/browse/JDK-8171119>>

                                  I think original conversation that happened last year in that thread still holds true:

                                    - We have a patch at Google that we think others might be interested in
                                       - It provides a means to understand where the allocation hotspots are at a very low overhead
                                       - Since it is at a low overhead, we can leave it on by default

                                  So I come to the mailing list with Jeremy's initial question:
                                  "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

                                  A year ago, it seemed some thought it was a good idea, is this still true?

                                  Thanks,
                                  Jc









Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
Dear all,

I've continued working on this and have done the following webrev:

What it does:
   - Implement the turn on/off system for the TLAB implementation
      - Basically you can turn the system on/off using the JVMTI API (StartHeapProfiling and StopHeapProfiling here http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/src/share/vm/prims/jvmti.xml.patch)
         - I've currently implemented that the system resets the profiling data at the StartHeapProfiling, this allows you to start profiling, stop profiling, and then the user can query what happened during profiling as a post-processing step :)

   - I've currently, for sake of simplicity and Robbin hinted it would be better for now, removed the mutex code for the data but think that will have to come back in a next patch or in this one at some point

   - I've also really worked on the testing code to make it more expandable in this case
      http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c.patch is now a bit more generic and allows to turn on/off the sampling; it allows to query the existence or not of frames passed from Java world to JNI, which allows the test to define what should be seen and have the code in one place.
             - Basically each test can now provide an array of Frames that the native library can check the internal data for a match. This allows to have various tests have their own signatures, etc.

          - This has allowed me to add a nice test here to test the wipe out of the data:

  - Finally, I've done initial overhead testing and, though my data was a bit noisy, I saw no real overhead using the Tlab approach while off. I will try to get better data and more importantly, more stable data when turning it on.

Things I still need to do:
   - Have to fix that TLAB case for the FastTLABRefill
   - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
   - Have to check the GC elements and what that produces
   - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

Thanks for looking at the webrev and have a great week!
Jc

On Fri, Jun 2, 2017 at 8:57 AM, JC Beyler <[hidden email]> wrote:
Dear all,

Thanks Robbin for the comments. I have left the MuxLocker for now and will look at removing it or as a future enhancement as you say. I did make the class extends and added a TODO for myself to test this in slowdebug.

I have also put a new webrev up that is still a work in progress but should allow us to talk about TLAB vs C1/C2 modifications.


What this webrev has is:
   - I have put in a TLAB implementation, it is a WIP, and I have not yet done the qualitative/quantitative study on it vs the implementation using compilation changes but the big parts are in place
       - Caveats:
               - I have not fixed the calculation in the case of a FastTLABRefill case
           - This is always on right now, there is no way to turn it off, that's also a TODO to be directed by the JVMTI API

    - I also have circumvented the AsyncGetCallTrace using the snippet of code you showed Robbin, it works for here/now 
          - But we might have to revisit this one day because it then does not try to get some of the native stacks and jumps directly to the Java stacks (I see cases where this could be an issue)
          - However, this has cleaned up quite a bit of the code and I have removed all mention of ASGCT and its structures now and use directly the JVMTI structures

   - GC is handled now, I have not yet done the qualitative/quantitative study on it but the big parts are in place

   - Due to the way the TLAB is called, the stack walker is now correct and produces the right stacks it seems (this is a bold sentence from my ONE JTREG test :))

Final notes on this webrev:
   - Have to fix that TLAB case for the FastTLABRefill
   - Implement the turn on/off system for the TLAB implementation
   - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
   - Have to check the GC elements and what that produces
   - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

As always, your comments and feedback are greatly appreciated! Happy Friday!
Jc


On Tue, May 30, 2017 at 5:33 AM, Robbin Ehn <[hidden email]> wrote:
Hi Jc,

On 05/22/2017 08:47 PM, JC Beyler wrote:
Dear all,

I have a new webrev up:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/

I liked this!

Two small things:

heapMonitoring.hpp
class HeapMonitoring should extend AllStatic

heapMonitoring.cpp
class MuxLocker should extend StackObj
But I think you should skip MuxLocker or push it separate generic enhancement.

Great with the jtreg test, thanks alot!


This webrev has, I hope, fixed a lot of the comments from Robbin:
   - The casts normally are all C++ style
   - Moved this to jdk10-hs
      - I have not tested slowdebug yet, hopefully it does not break there
   - Added the garbage collection system:
      - Now live sampled allocations are tracked throughout their lifetime
         - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
             - Those lists use the array system that the live objects were using before but have different re-use strategies
      - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
- Both use the same JVMTI structures
      - Added the calls to them for the test, though I've kept that test simple for now:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
         - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent structure. This is updated in my local repo and will get in the next webrev.

Next steps for this work are:
    - Putting the TLAB implementation (yes not forgotten ;-))
    - Adding more testing and separate the current test system to check things a bit more thoroughly
    - Have not tried to circumvent AsyncGetCallTrace yet
    - Still have to double check the stack walker a bit more

Looking forward to this.

Could someone from compiler take a look please?

Thanks!

/Robbin


Happy webrev perusal!
Jc


On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:

    Just a few answers,

    On 05/15/2017 06:48 PM, JC Beyler wrote:

        Dear all,

        I've updated the webrev to:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>


    I'll look at this later, thanks!


        Robbin,
        I believe I have addressed most of your items with webrev 02:
            - I added a JTreg test to show how it works:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
           - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as we move
        forward, that is still on my TODOs
           - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
           - I cleaned up other issues as well.

        However, I have three questions, which are probably because I'm new in this community:
           1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
               - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via klass() for
        example in collectedHeap.inline.hpp)
               - Would you know what is the right hg clone command so we are working on the same code base?


    We use jdk10-hs, e.g.
    hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> 10-hs

    There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...


           2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use. It might be
        that I don't understand when one uses one or the other but I see both used around the code base?
             - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?


    We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
    But for arrays you will need the macro NEW_C_HEAP_ARRAY.


           3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should go to C casts
        or C++ casts as I see both in the codebase. What is the convention I should use?


    Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
    And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.


        Final notes on this webrev:
            - I am still missing:
              - Putting a TLAB implementation so that we can compare both webrevs
              - Have not tried to circumvent AsyncGetCallTrace
              - Putting in the handling of GC'd objects
              - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev

        I will work on integrating those items for the next webrev!


    Thanks!


        Thanks for your help,
        Jc

        Ps:  I tested this on a new repo:

        hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
        ... building it
        cd test
        jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
        <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/


    I'll test it out!

    /Robbin



        On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]

        <mailto:[hidden email]>>> wrote:

             Robbin,

             Thank you for forwarding!
             I will review it.

             Thanks,
             Serguei



             On 5/4/17 02:13, Robbin Ehn wrote:

                 Hi,

                 To me the compiler changes looks what is expected.
                 It would be good if someone from compiler could take a look at that.
                 Added compiler to mail thread.

                 Also adding Serguei, It would be good with his view also.

                 My initial take on it, read through most of the code and took it for a ride.

                 ##############################
                 - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
                 instead of a separate check for sampling in fast path for the final version.

                 ##############################
                 - This patch I had to apply to get it compile on JDK 10:

                 diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
                 --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
                 +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
                 @@ -87,3 +87,3 @@
                       // support for object alloc event (no-op most of the time)
                 -    if (klass() != NULL && klass()->name() != NULL) {
                 +    if (klass != NULL && klass->name() != NULL) {
                         Thread *base_thread = Thread::current();
                 diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
                 @@ -316,3 +316,3 @@
                     JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
                 -  assert(o->size() << LogHeapWordSize == byte_size,
                 +  assert(o->size() << LogHeapWordSize == (long)byte_size,
                            "Object size is incorrect.");

                 ##############################
                 - This patch I had to apply to get it not asserting during slowdebug:

                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
                 @@ -32,3 +32,3 @@
                   // TODO(jcbeyler): should we make this into a JVMTI structure?
                 -struct StackTraceData {
                 +struct StackTraceData : CHeapObj<mtInternal> {
                     ASGCT_CallTrace *trace;
                 @@ -143,3 +143,2 @@
                   StackTraceStorage::StackTraceStorage() :
                 -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
                       _allocated_traces_size(MaxHeapTraces),
                 @@ -147,2 +146,3 @@
                       _allocated_count(0) {
                 +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
                     memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
                 @@ -152,3 +152,3 @@
                   StackTraceStorage::~StackTraceStorage() {
                 -  delete[] _allocated_traces;
                 +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
                   }

                 - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
                 - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, os::malloc and new.
                 - In jvmtiHeapTransition.hpp you use C cast instead.

                 ##############################
                 - This patch I had apply to get traces without setting an ‘unrelated’ capability
                 - Should this not be a new capability?

                 diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
                 --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
                 +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
                 @@ -530,6 +530,6 @@

                 -  if (!JvmtiExport::should_post_class_load()) {
                 +/*  if (!JvmtiExport::should_post_class_load()) {
                       trace->num_frames = ticks_no_class_load; // -1
                       return;
                 -  }
                 +  }*/

                 ##############################
                 - forte.cpp: (I know this is not part of your changes but)
                 find_jmethod_id_or_null give me NULL for my test.
                 It looks like we actually want the regular jmethod_id() ?

                 Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
                 I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
                 But since we might need jmethod_id() if possible to avoid getting method id NULL,
                 we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

                 Something like:

                    if (jthread->has_last_Java_frame()) { // just to be safe
                      vframeStream vfst(jthread);
                      while (!vfst.at_end()) {
                        Method* m = vfst.method();
                        m->jmethod_id();
                        m->line_number_from_bci(vfst.bci());
                        vfst.next();
                      }

                 - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
                 Line number should be m->line_number_from_bci(bci);
                 Do the heapMonitoring suppose to trace with bci or line number?
                 I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

                 ##############################
                 - // TODO(jcbeyler): remove this extra code handling the extra trace for
                 Please fix all these TODO's :)

                 ##############################
                 - heapMonitoring.hpp:
                 // TODO(jcbeyler): is this algorithm acceptable in open source?

                 Why is this comment here? What is the implication?
                 Have you tested any simpler algorithm?

                 ##############################
                 - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

                 ##############################
                 - monitoring_period vs HeapMonitorRate, pick rate or period.

                 ##############################
                 - globals.hpp
                 Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

                 ##############################
                 - jvmtiStackTraceData + ASGCT_CallFrame memory
                 Are the agent suppose to loop through and free all ASGCT_CallFrame?
                 Wouldn't it be better with some kinda protocol, like:
                 (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
                 (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

                 Also using another data structure that have num_traces inside it simplifies things.
                 So I'm not convinced using the async structure is the best way forward.


                 I have more questions, but I think it's better if you respond and update the code first.

                 Thanks!

                 /Robbin


                 On 04/21/2017 11:34 PM, JC Beyler wrote:

                     Hi all,

                     I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>

                     The new webrev.01 also adds the actual heap monitoring sampling system in files:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
                     and
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>

                     My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

                     Thanks for your attention and let me know if there are any questions!

                     Have a wonderful Friday!
                     Jc

                     On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>
        <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:

                          Hi all,

                          I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full data, so I
        am just
                     summarizing
                          here for now.

                          - Overhead of the feature

                          Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k being our
        default setting.

                          - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
                          - I want to rerun next week to ensure number stability

                          - Accuracy of the feature

                          I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from
        stacktrace
                     S2. The
                          microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I allocated10%
        S1 and 90%
                     S2, with a
                          sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

                          Let me know if there are any questions on the numbers and if you'd like to see some more data.

                          Note: this was done using our internal JDK8 implementation since the webrev provided by
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>> does not yet contain the whole
                     implementation and therefore would have been misleading.

                          Thanks,
                          Jc


                          On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:

                              Hi all,

                              To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
                              415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

                              This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374> <https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>>>; what it does show is parts of the implementation that is
                     proposed and hopefully can start the conversation going
                              as I work through the details.

                              For example, the changes to C2 are done here for the allocations:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>

                              Hopefully this all makes sense and thank you for all your future comments!
                              Jc


                              On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
        <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>
                     wrote:

                                  Hello all,

                                  This is a follow-up from Jeremy's initial email from last year:
        http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
                     <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
                     <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>

                                  I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
        https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119> <https://bugs.openjdk.java.net/browse/JDK-8171119
        <https://bugs.openjdk.java.net/browse/JDK-8171119>>

                                  I think original conversation that happened last year in that thread still holds true:

                                    - We have a patch at Google that we think others might be interested in
                                       - It provides a means to understand where the allocation hotspots are at a very low overhead
                                       - Since it is at a low overhead, we can leave it on by default

                                  So I come to the mailing list with Jeremy's initial question:
                                  "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."

                                  A year ago, it seemed some thought it was a good idea, is this still true?

                                  Thanks,
                                  Jc










Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Robbin Ehn
Hi JC,

This version really makes my happy, awesome work!

Since we got no attention from compiler and you manage to get ride of most of the compiler changes, I dropped compiler but added runtime instead.
Serguei, when you have time, please chime in.

On 06/12/2017 08:11 PM, JC Beyler wrote:

> Dear all,
>
> I've continued working on this and have done the following webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
>
> What it does:
>     - Implement the turn on/off system for the TLAB implementation
>        - Basically you can turn the system on/off using the JVMTI API (StartHeapProfiling and StopHeapProfiling here
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/src/share/vm/prims/jvmti.xml.patch)
>           - I've currently implemented that the system resets the profiling data at the StartHeapProfiling, this allows you to start profiling, stop profiling, and then the
> user can query what happened during profiling as a post-processing step :)

Nice addition.

>
>     - I've currently, for sake of simplicity and Robbin hinted it would be better for now, removed the mutex code for the data but think that will have to come back in a
> next patch or in this one at some point

29 // Keep muxlock for now
....
237   // Protects the traces currently sampled (below).
238   volatile intptr_t _allocated_traces_lock[1];

This seems unused now, and I do not see any locks at all?

>
>     - I've also really worked on the testing code to make it more expandable in this case
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c.patch is now a bit more generic and allows to turn on/off the
> sampling; it allows to query the existence or not of frames passed from Java world to JNI, which allows the test to define what should be seen and have the code in one place.
>            - That is done using the helper class http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/MyPackage/Frame.java.patch
>               - Basically each test can now provide an array of Frames that the native library can check the internal data for a match. This allows to have various tests
> have their own signatures, etc.

To get a more potential users it's nice with a Java API and as you say simplifies tests, good.

>
>            - This has allowed me to add a nice test here to test the wipe out of the data:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorOnOffTest.java.patch
>
>    - Finally, I've done initial overhead testing and, though my data was a bit noisy, I saw no real overhead using the Tlab approach while off. I will try to get better
> data and more importantly, more stable data when turning it on.

I think it's key to have numbers showing ~0% performance impact here.

>
> Things I still need to do:
>     - Have to fix that TLAB case for the FastTLABRefill
>     - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
>     - Have to check the GC elements and what that produces
>     - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

Also we will need support for some more platform, as the patch stands now, it's only the src/cpu/x86/vm/macroAssembler_x86.cpp part that needs to be done for other platforms?

Thanks!

/Robbin

>
> Thanks for looking at the webrev and have a great week!
> Jc
>
> On Fri, Jun 2, 2017 at 8:57 AM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Dear all,
>
>     Thanks Robbin for the comments. I have left the MuxLocker for now and will look at removing it or as a future enhancement as you say. I did make the class extends and
>     added a TODO for myself to test this in slowdebug.
>
>     I have also put a new webrev up that is still a work in progress but should allow us to talk about TLAB vs C1/C2 modifications.
>
>     TLAB implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/>
>     C1/C2 implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>
>
>     What this webrev has is:
>         - I have put in a TLAB implementation, it is a WIP, and I have not yet done the qualitative/quantitative study on it vs the implementation using compilation changes
>     but the big parts are in place
>             - Caveats:
>                 - There is a TODO: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch
>     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch>
>                     - I have not fixed the calculation in the case of a FastTLABRefill case
>                 - This is always on right now, there is no way to turn it off, that's also a TODO to be directed by the JVMTI API
>
>          - I also have circumvented the AsyncGetCallTrace using the snippet of code you showed Robbin, it works for here/now
>                - But we might have to revisit this one day because it then does not try to get some of the native stacks and jumps directly to the Java stacks (I see cases
>     where this could be an issue)
>                - However, this has cleaned up quite a bit of the code and I have removed all mention of ASGCT and its structures now and use directly the JVMTI structures
>
>         - GC is handled now, I have not yet done the qualitative/quantitative study on it but the big parts are in place
>
>         - Due to the way the TLAB is called, the stack walker is now correct and produces the right stacks it seems (this is a bold sentence from my ONE JTREG test :))
>
>     Final notes on this webrev:
>         - Have to fix that TLAB case for the FastTLABRefill
>         - Implement the turn on/off system for the TLAB implementation
>         - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
>         - Have to check the GC elements and what that produces
>         - Run a slowdebug run and ensure I fixed all those issues you saw Robbin
>
>     As always, your comments and feedback are greatly appreciated! Happy Friday!
>     Jc
>
>
>     On Tue, May 30, 2017 at 5:33 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:
>
>         Hi Jc,
>
>         On 05/22/2017 08:47 PM, JC Beyler wrote:
>
>             Dear all,
>
>             I have a new webrev up:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>
>
>
>         I liked this!
>
>         Two small things:
>
>         heapMonitoring.hpp
>         class HeapMonitoring should extend AllStatic
>
>         heapMonitoring.cpp
>         class MuxLocker should extend StackObj
>         But I think you should skip MuxLocker or push it separate generic enhancement.
>
>         Great with the jtreg test, thanks alot!
>
>
>             This webrev has, I hope, fixed a lot of the comments from Robbin:
>                 - The casts normally are all C++ style
>                 - Moved this to jdk10-hs
>                    - I have not tested slowdebug yet, hopefully it does not break there
>                 - Added the garbage collection system:
>                    - Now live sampled allocations are tracked throughout their lifetime
>                       - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
>                           - Those lists use the array system that the live objects were using before but have different re-use strategies
>                    - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
>             - Both use the same JVMTI structures
>                    - Added the calls to them for the test, though I've kept that test simple for now:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>                       - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent
>             structure. This is updated in my local repo and will get in the next webrev.
>
>             Next steps for this work are:
>                  - Putting the TLAB implementation (yes not forgotten ;-))
>                  - Adding more testing and separate the current test system to check things a bit more thoroughly
>                  - Have not tried to circumvent AsyncGetCallTrace yet
>                  - Still have to double check the stack walker a bit more
>
>
>         Looking forward to this.
>
>         Could someone from compiler take a look please?
>
>         Thanks!
>
>         /Robbin
>
>
>             Happy webrev perusal!
>             Jc
>
>
>             On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>             wrote:
>
>                  Just a few answers,
>
>                  On 05/15/2017 06:48 PM, JC Beyler wrote:
>
>                      Dear all,
>
>                      I've updated the webrev to:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>>
>
>
>                  I'll look at this later, thanks!
>
>
>                      Robbin,
>                      I believe I have addressed most of your items with webrev 02:
>                          - I added a JTreg test to show how it works:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>>
>                         - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as
>             we move
>                      forward, that is still on my TODOs
>                         - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
>                         - I cleaned up other issues as well.
>
>                      However, I have three questions, which are probably because I'm new in this community:
>                         1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
>             <http://hg.openjdk.java.net/jdk10/jdk10>
>                      <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10
>             <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
>                             - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via
>             klass() for
>                      example in collectedHeap.inline.hpp)
>                             - Would you know what is the right hg clone command so we are working on the same code base?
>
>
>                  We use jdk10-hs, e.g.
>                  hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> <http://hg.openjdk.java.net/jdk10/hs
>             <http://hg.openjdk.java.net/jdk10/hs>> 10-hs
>
>                  There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...
>
>
>                         2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use.
>             It might be
>                      that I don't understand when one uses one or the other but I see both used around the code base?
>                           - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
>
>
>                  We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
>                  But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>
>
>                         3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should
>             go to C casts
>                      or C++ casts as I see both in the codebase. What is the convention I should use?
>
>
>                  Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
>                  And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.
>
>
>                      Final notes on this webrev:
>                          - I am still missing:
>                            - Putting a TLAB implementation so that we can compare both webrevs
>                            - Have not tried to circumvent AsyncGetCallTrace
>                            - Putting in the handling of GC'd objects
>                            - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
>
>                      I will work on integrating those items for the next webrev!
>
>
>                  Thanks!
>
>
>                      Thanks for your help,
>                      Jc
>
>                      Ps:  I tested this on a new repo:
>
>                      hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
>             <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>
>                      <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
>                      ... building it
>                      cd test
>                      jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
>                      <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
>
>
>                  I'll test it out!
>
>                  /Robbin
>
>
>
>                      On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>                      <mailto:[hidden email] <mailto:[hidden email]>>> <[hidden email] <mailto:[hidden email]>
>             <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>
>                      <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>
>                           Robbin,
>
>                           Thank you for forwarding!
>                           I will review it.
>
>                           Thanks,
>                           Serguei
>
>
>
>                           On 5/4/17 02:13, Robbin Ehn wrote:
>
>                               Hi,
>
>                               To me the compiler changes looks what is expected.
>                               It would be good if someone from compiler could take a look at that.
>                               Added compiler to mail thread.
>
>                               Also adding Serguei, It would be good with his view also.
>
>                               My initial take on it, read through most of the code and took it for a ride.
>
>                               ##############################
>                               - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
>                               instead of a separate check for sampling in fast path for the final version.
>
>                               ##############################
>                               - This patch I had to apply to get it compile on JDK 10:
>
>                               diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
>                               --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
>                               +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
>                               @@ -87,3 +87,3 @@
>                                     // support for object alloc event (no-op most of the time)
>                               -    if (klass() != NULL && klass()->name() != NULL) {
>                               +    if (klass != NULL && klass->name() != NULL) {
>                                       Thread *base_thread = Thread::current();
>                               diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>                               --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
>                               +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
>                               @@ -316,3 +316,3 @@
>                                   JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
>                               -  assert(o->size() << LogHeapWordSize == byte_size,
>                               +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>                                          "Object size is incorrect.");
>
>                               ##############################
>                               - This patch I had to apply to get it not asserting during slowdebug:
>
>                               --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
>                               +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
>                               @@ -32,3 +32,3 @@
>                                 // TODO(jcbeyler): should we make this into a JVMTI structure?
>                               -struct StackTraceData {
>                               +struct StackTraceData : CHeapObj<mtInternal> {
>                                   ASGCT_CallTrace *trace;
>                               @@ -143,3 +143,2 @@
>                                 StackTraceStorage::StackTraceStorage() :
>                               -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>                                     _allocated_traces_size(MaxHeapTraces),
>                               @@ -147,2 +146,3 @@
>                                     _allocated_count(0) {
>                               +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
>                                   memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
>                               @@ -152,3 +152,3 @@
>                                 StackTraceStorage::~StackTraceStorage() {
>                               -  delete[] _allocated_traces;
>                               +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>                                 }
>
>                               - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>                               - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
>             os::malloc and new.
>                               - In jvmtiHeapTransition.hpp you use C cast instead.
>
>                               ##############################
>                               - This patch I had apply to get traces without setting an ‘unrelated’ capability
>                               - Should this not be a new capability?
>
>                               diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>                               --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
>                               +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
>                               @@ -530,6 +530,6 @@
>
>                               -  if (!JvmtiExport::should_post_class_load()) {
>                               +/*  if (!JvmtiExport::should_post_class_load()) {
>                                     trace->num_frames = ticks_no_class_load; // -1
>                                     return;
>                               -  }
>                               +  }*/
>
>                               ##############################
>                               - forte.cpp: (I know this is not part of your changes but)
>                               find_jmethod_id_or_null give me NULL for my test.
>                               It looks like we actually want the regular jmethod_id() ?
>
>                               Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
>                               I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
>                               But since we might need jmethod_id() if possible to avoid getting method id NULL,
>                               we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.
>
>                               Something like:
>
>                                  if (jthread->has_last_Java_frame()) { // just to be safe
>                                    vframeStream vfst(jthread);
>                                    while (!vfst.at_end()) {
>                                      Method* m = vfst.method();
>                                      m->jmethod_id();
>                                      m->line_number_from_bci(vfst.bci());
>                                      vfst.next();
>                                    }
>
>                               - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
>                               Line number should be m->line_number_from_bci(bci);
>                               Do the heapMonitoring suppose to trace with bci or line number?
>                               I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
>
>                               ##############################
>                               - // TODO(jcbeyler): remove this extra code handling the extra trace for
>                               Please fix all these TODO's :)
>
>                               ##############################
>                               - heapMonitoring.hpp:
>                               // TODO(jcbeyler): is this algorithm acceptable in open source?
>
>                               Why is this comment here? What is the implication?
>                               Have you tested any simpler algorithm?
>
>                               ##############################
>                               - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
>
>                               ##############################
>                               - monitoring_period vs HeapMonitorRate, pick rate or period.
>
>                               ##############################
>                               - globals.hpp
>                               Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
>
>                               ##############################
>                               - jvmtiStackTraceData + ASGCT_CallFrame memory
>                               Are the agent suppose to loop through and free all ASGCT_CallFrame?
>                               Wouldn't it be better with some kinda protocol, like:
>                               (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>                               (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>
>                               Also using another data structure that have num_traces inside it simplifies things.
>                               So I'm not convinced using the async structure is the best way forward.
>
>
>                               I have more questions, but I think it's better if you respond and update the code first.
>
>                               Thanks!
>
>                               /Robbin
>
>
>                               On 04/21/2017 11:34 PM, JC Beyler wrote:
>
>                                   Hi all,
>
>                                   I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>>
>
>                                   The new webrev.01 also adds the actual heap monitoring sampling system in files:
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
>                                   <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>>
>                                   and
>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>
>                                   <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>>
>
>                                   My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
>
>                                   Thanks for your attention and let me know if there are any questions!
>
>                                   Have a wonderful Friday!
>                                   Jc
>
>                                   On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>                      <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email]
>             <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:
>
>                                        Hi all,
>
>                                        I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full
>             data, so I
>                      am just
>                                   summarizing
>                                        here for now.
>
>                                        - Overhead of the feature
>
>                                        Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k
>             being our
>                      default setting.
>
>                                        - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo
>             and JDK9 it seems)
>                                        - I want to rerun next week to ensure number stability
>
>                                        - Accuracy of the feature
>
>                                        I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and
>             90% from
>                      stacktrace
>                                   S2. The
>                                        microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I
>             allocated10%
>                      S1 and 90%
>                                   S2, with a
>                                        sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>
>                                        Let me know if there are any questions on the numbers and if you'd like to see some more data.
>
>                                        Note: this was done using our internal JDK8 implementation since the webrev provided by
>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>> does not yet contain the whole
>                                   implementation and therefore would have been misleading.
>
>                                        Thanks,
>                                        Jc
>
>
>                                        On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>                      <mailto:[hidden email] <mailto:[hidden email]>>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:
>
>                                            Hi all,
>
>                                            To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>>
>                                            415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>
>                                            This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
>                                   <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>
>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>>
>                                   <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>
>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>> <https://bugs.openjdk.java.net/browse/JDK-8177374
>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>>>; what it does show is parts of the
>             implementation that is
>                                   proposed and hopefully can start the conversation going
>                                            as I work through the details.
>
>                                            For example, the changes to C2 are done here for the allocations:
>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>>
>
>                                            Hopefully this all makes sense and thank you for all your future comments!
>                                            Jc
>
>
>                                            On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>                      <mailto:[hidden email] <mailto:[hidden email]>>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>>
>                                   wrote:
>
>                                                Hello all,
>
>                                                This is a follow-up from Jeremy's initial email from last year:
>             http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>                                   <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>
>                                   <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>>
>
>                                                I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP form:
>             https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119> <https://bugs.openjdk.java.net/browse/JDK-8171119
>             <https://bugs.openjdk.java.net/browse/JDK-8171119>> <https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>
>                      <https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>>>
>
>                                                I think original conversation that happened last year in that thread still holds true:
>
>                                                  - We have a patch at Google that we think others might be interested in
>                                                     - It provides a means to understand where the allocation hotspots are at a very low overhead
>                                                     - Since it is at a low overhead, we can leave it on by default
>
>                                                So I come to the mailing list with Jeremy's initial question:
>                                                "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
>
>                                                A year ago, it seemed some thought it was a good idea, is this still true?
>
>                                                Thanks,
>                                                Jc
>
>
>
>
>
>
>
>
>
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Robbin Ehn
Hi again,

Forgot to say added Dan, he might be interested in JVMTI changes at least.

On 06/13/2017 11:19 AM, Robbin Ehn wrote:
>
> To get a more potential users it's nice with a Java API and as you say simplifies tests, good.

Obviously this is not a public API, but the code for doing it from java will at least but out there.

Thanks!

/Robbin

>
>>
>>            - This has allowed me to add a nice test here to test the wipe out of the data:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorOnOffTest.java.patch
>>
>>    - Finally, I've done initial overhead testing and, though my data was a bit noisy, I saw no real overhead using the Tlab approach while off. I will try to get better
>> data and more importantly, more stable data when turning it on.
>
> I think it's key to have numbers showing ~0% performance impact here.
>
>>
>> Things I still need to do:
>>     - Have to fix that TLAB case for the FastTLABRefill
>>     - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
>>     - Have to check the GC elements and what that produces
>>     - Run a slowdebug run and ensure I fixed all those issues you saw Robbin
>
> Also we will need support for some more platform, as the patch stands now, it's only the src/cpu/x86/vm/macroAssembler_x86.cpp part that needs to be done for other platforms?
>
> Thanks!
>
> /Robbin
>
>>
>> Thanks for looking at the webrev and have a great week!
>> Jc
>>
>> On Fri, Jun 2, 2017 at 8:57 AM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>     Dear all,
>>
>>     Thanks Robbin for the comments. I have left the MuxLocker for now and will look at removing it or as a future enhancement as you say. I did make the class extends and
>>     added a TODO for myself to test this in slowdebug.
>>
>>     I have also put a new webrev up that is still a work in progress but should allow us to talk about TLAB vs C1/C2 modifications.
>>
>>     TLAB implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/>
>>     C1/C2 implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>
>>
>>     What this webrev has is:
>>         - I have put in a TLAB implementation, it is a WIP, and I have not yet done the qualitative/quantitative study on it vs the implementation using compilation changes
>>     but the big parts are in place
>>             - Caveats:
>>                 - There is a TODO: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch
>>     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch>
>>                     - I have not fixed the calculation in the case of a FastTLABRefill case
>>                 - This is always on right now, there is no way to turn it off, that's also a TODO to be directed by the JVMTI API
>>
>>          - I also have circumvented the AsyncGetCallTrace using the snippet of code you showed Robbin, it works for here/now
>>                - But we might have to revisit this one day because it then does not try to get some of the native stacks and jumps directly to the Java stacks (I see cases
>>     where this could be an issue)
>>                - However, this has cleaned up quite a bit of the code and I have removed all mention of ASGCT and its structures now and use directly the JVMTI structures
>>
>>         - GC is handled now, I have not yet done the qualitative/quantitative study on it but the big parts are in place
>>
>>         - Due to the way the TLAB is called, the stack walker is now correct and produces the right stacks it seems (this is a bold sentence from my ONE JTREG test :))
>>
>>     Final notes on this webrev:
>>         - Have to fix that TLAB case for the FastTLABRefill
>>         - Implement the turn on/off system for the TLAB implementation
>>         - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
>>         - Have to check the GC elements and what that produces
>>         - Run a slowdebug run and ensure I fixed all those issues you saw Robbin
>>
>>     As always, your comments and feedback are greatly appreciated! Happy Friday!
>>     Jc
>>
>>
>>     On Tue, May 30, 2017 at 5:33 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>         Hi Jc,
>>
>>         On 05/22/2017 08:47 PM, JC Beyler wrote:
>>
>>             Dear all,
>>
>>             I have a new webrev up:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>
>>
>>
>>         I liked this!
>>
>>         Two small things:
>>
>>         heapMonitoring.hpp
>>         class HeapMonitoring should extend AllStatic
>>
>>         heapMonitoring.cpp
>>         class MuxLocker should extend StackObj
>>         But I think you should skip MuxLocker or push it separate generic enhancement.
>>
>>         Great with the jtreg test, thanks alot!
>>
>>
>>             This webrev has, I hope, fixed a lot of the comments from Robbin:
>>                 - The casts normally are all C++ style
>>                 - Moved this to jdk10-hs
>>                    - I have not tested slowdebug yet, hopefully it does not break there
>>                 - Added the garbage collection system:
>>                    - Now live sampled allocations are tracked throughout their lifetime
>>                       - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
>>                           - Those lists use the array system that the live objects were using before but have different re-use strategies
>>                    - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
>>             - Both use the same JVMTI structures
>>                    - Added the calls to them for the test, though I've kept that test simple for now:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>>                       - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent
>>             structure. This is updated in my local repo and will get in the next webrev.
>>
>>             Next steps for this work are:
>>                  - Putting the TLAB implementation (yes not forgotten ;-))
>>                  - Adding more testing and separate the current test system to check things a bit more thoroughly
>>                  - Have not tried to circumvent AsyncGetCallTrace yet
>>                  - Still have to double check the stack walker a bit more
>>
>>
>>         Looking forward to this.
>>
>>         Could someone from compiler take a look please?
>>
>>         Thanks!
>>
>>         /Robbin
>>
>>
>>             Happy webrev perusal!
>>             Jc
>>
>>
>>             On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>>             wrote:
>>
>>                  Just a few answers,
>>
>>                  On 05/15/2017 06:48 PM, JC Beyler wrote:
>>
>>                      Dear all,
>>
>>                      I've updated the webrev to:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>>
>>
>>
>>                  I'll look at this later, thanks!
>>
>>
>>                      Robbin,
>>                      I believe I have addressed most of your items with webrev 02:
>>                          - I added a JTreg test to show how it works:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>>
>>                         - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as
>>             we move
>>                      forward, that is still on my TODOs
>>                         - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
>>                         - I cleaned up other issues as well.
>>
>>                      However, I have three questions, which are probably because I'm new in this community:
>>                         1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
>>             <http://hg.openjdk.java.net/jdk10/jdk10>
>>                      <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10
>>             <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
>>                             - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via
>>             klass() for
>>                      example in collectedHeap.inline.hpp)
>>                             - Would you know what is the right hg clone command so we are working on the same code base?
>>
>>
>>                  We use jdk10-hs, e.g.
>>                  hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> <http://hg.openjdk.java.net/jdk10/hs
>>             <http://hg.openjdk.java.net/jdk10/hs>> 10-hs
>>
>>                  There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...
>>
>>
>>                         2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use.
>>             It might be
>>                      that I don't understand when one uses one or the other but I see both used around the code base?
>>                           - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?
>>
>>
>>                  We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
>>                  But for arrays you will need the macro NEW_C_HEAP_ARRAY.
>>
>>
>>                         3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should
>>             go to C casts
>>                      or C++ casts as I see both in the codebase. What is the convention I should use?
>>
>>
>>                  Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
>>                  And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.
>>
>>
>>                      Final notes on this webrev:
>>                          - I am still missing:
>>                            - Putting a TLAB implementation so that we can compare both webrevs
>>                            - Have not tried to circumvent AsyncGetCallTrace
>>                            - Putting in the handling of GC'd objects
>>                            - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev
>>
>>                      I will work on integrating those items for the next webrev!
>>
>>
>>                  Thanks!
>>
>>
>>                      Thanks for your help,
>>                      Jc
>>
>>                      Ps:  I tested this on a new repo:
>>
>>                      hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
>>             <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>
>>                      <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
>>                      ... building it
>>                      cd test
>>                      jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
>>                      <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/
>>
>>
>>                  I'll test it out!
>>
>>                  /Robbin
>>
>>
>>
>>                      On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>>                      <mailto:[hidden email] <mailto:[hidden email]>>> <[hidden email] <mailto:[hidden email]>
>>             <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>>
>>                      <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>>
>>                           Robbin,
>>
>>                           Thank you for forwarding!
>>                           I will review it.
>>
>>                           Thanks,
>>                           Serguei
>>
>>
>>
>>                           On 5/4/17 02:13, Robbin Ehn wrote:
>>
>>                               Hi,
>>
>>                               To me the compiler changes looks what is expected.
>>                               It would be good if someone from compiler could take a look at that.
>>                               Added compiler to mail thread.
>>
>>                               Also adding Serguei, It would be good with his view also.
>>
>>                               My initial take on it, read through most of the code and took it for a ride.
>>
>>                               ##############################
>>                               - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
>>                               instead of a separate check for sampling in fast path for the final version.
>>
>>                               ##############################
>>                               - This patch I had to apply to get it compile on JDK 10:
>>
>>                               diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
>>                               --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
>>                               +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
>>                               @@ -87,3 +87,3 @@
>>                                     // support for object alloc event (no-op most of the time)
>>                               -    if (klass() != NULL && klass()->name() != NULL) {
>>                               +    if (klass != NULL && klass->name() != NULL) {
>>                                       Thread *base_thread = Thread::current();
>>                               diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
>>                               --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
>>                               +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
>>                               @@ -316,3 +316,3 @@
>>                                   JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
>>                               -  assert(o->size() << LogHeapWordSize == byte_size,
>>                               +  assert(o->size() << LogHeapWordSize == (long)byte_size,
>>                                          "Object size is incorrect.");
>>
>>                               ##############################
>>                               - This patch I had to apply to get it not asserting during slowdebug:
>>
>>                               --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
>>                               +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
>>                               @@ -32,3 +32,3 @@
>>                                 // TODO(jcbeyler): should we make this into a JVMTI structure?
>>                               -struct StackTraceData {
>>                               +struct StackTraceData : CHeapObj<mtInternal> {
>>                                   ASGCT_CallTrace *trace;
>>                               @@ -143,3 +143,2 @@
>>                                 StackTraceStorage::StackTraceStorage() :
>>                               -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
>>                                     _allocated_traces_size(MaxHeapTraces),
>>                               @@ -147,2 +146,3 @@
>>                                     _allocated_count(0) {
>>                               +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
>>                                   memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
>>                               @@ -152,3 +152,3 @@
>>                                 StackTraceStorage::~StackTraceStorage() {
>>                               -  delete[] _allocated_traces;
>>                               +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
>>                                 }
>>
>>                               - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
>>                               - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
>>             os::malloc and new.
>>                               - In jvmtiHeapTransition.hpp you use C cast instead.
>>
>>                               ##############################
>>                               - This patch I had apply to get traces without setting an ‘unrelated’ capability
>>                               - Should this not be a new capability?
>>
>>                               diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
>>                               --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
>>                               +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
>>                               @@ -530,6 +530,6 @@
>>
>>                               -  if (!JvmtiExport::should_post_class_load()) {
>>                               +/*  if (!JvmtiExport::should_post_class_load()) {
>>                                     trace->num_frames = ticks_no_class_load; // -1
>>                                     return;
>>                               -  }
>>                               +  }*/
>>
>>                               ##############################
>>                               - forte.cpp: (I know this is not part of your changes but)
>>                               find_jmethod_id_or_null give me NULL for my test.
>>                               It looks like we actually want the regular jmethod_id() ?
>>
>>                               Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
>>                               I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
>>                               But since we might need jmethod_id() if possible to avoid getting method id NULL,
>>                               we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.
>>
>>                               Something like:
>>
>>                                  if (jthread->has_last_Java_frame()) { // just to be safe
>>                                    vframeStream vfst(jthread);
>>                                    while (!vfst.at_end()) {
>>                                      Method* m = vfst.method();
>>                                      m->jmethod_id();
>>                                      m->line_number_from_bci(vfst.bci());
>>                                      vfst.next();
>>                                    }
>>
>>                               - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
>>                               Line number should be m->line_number_from_bci(bci);
>>                               Do the heapMonitoring suppose to trace with bci or line number?
>>                               I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.
>>
>>                               ##############################
>>                               - // TODO(jcbeyler): remove this extra code handling the extra trace for
>>                               Please fix all these TODO's :)
>>
>>                               ##############################
>>                               - heapMonitoring.hpp:
>>                               // TODO(jcbeyler): is this algorithm acceptable in open source?
>>
>>                               Why is this comment here? What is the implication?
>>                               Have you tested any simpler algorithm?
>>
>>                               ##############################
>>                               - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)
>>
>>                               ##############################
>>                               - monitoring_period vs HeapMonitorRate, pick rate or period.
>>
>>                               ##############################
>>                               - globals.hpp
>>                               Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.
>>
>>                               ##############################
>>                               - jvmtiStackTraceData + ASGCT_CallFrame memory
>>                               Are the agent suppose to loop through and free all ASGCT_CallFrame?
>>                               Wouldn't it be better with some kinda protocol, like:
>>                               (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
>>                               (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);
>>
>>                               Also using another data structure that have num_traces inside it simplifies things.
>>                               So I'm not convinced using the async structure is the best way forward.
>>
>>
>>                               I have more questions, but I think it's better if you respond and update the code first.
>>
>>                               Thanks!
>>
>>                               /Robbin
>>
>>
>>                               On 04/21/2017 11:34 PM, JC Beyler wrote:
>>
>>                                   Hi all,
>>
>>                                   I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>>
>>
>>                                   The new webrev.01 also adds the actual heap monitoring sampling system in files:
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
>>                                   <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>>
>>                                   and
>>             http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>
>>                                   <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>>
>>
>>                                   My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.
>>
>>                                   Thanks for your attention and let me know if there are any questions!
>>
>>                                   Have a wonderful Friday!
>>                                   Jc
>>
>>                                   On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>>                      <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email]
>>             <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:
>>
>>                                        Hi all,
>>
>>                                        I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full
>>             data, so I
>>                      am just
>>                                   summarizing
>>                                        here for now.
>>
>>                                        - Overhead of the feature
>>
>>                                        Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k
>>             being our
>>                      default setting.
>>
>>                                        - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo
>>             and JDK9 it seems)
>>                                        - I want to rerun next week to ensure number stability
>>
>>                                        - Accuracy of the feature
>>
>>                                        I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and
>>             90% from
>>                      stacktrace
>>                                   S2. The
>>                                        microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I
>>             allocated10%
>>                      S1 and 90%
>>                                   S2, with a
>>                                        sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.
>>
>>                                        Let me know if there are any questions on the numbers and if you'd like to see some more data.
>>
>>                                        Note: this was done using our internal JDK8 implementation since the webrev provided by
>>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>> does not yet contain the
>> whole
>>                                   implementation and therefore would have been misleading.
>>
>>                                        Thanks,
>>                                        Jc
>>
>>
>>                                        On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>>                      <mailto:[hidden email] <mailto:[hidden email]>>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:
>>
>>                                            Hi all,
>>
>>                                            To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
>>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>>
>>                                            415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg
>>
>>                                            This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
>>                                   <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>>
>>                                   <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>> <https://bugs.openjdk.java.net/browse/JDK-8177374
>>             <https://bugs.openjdk.java.net/browse/JDK-8177374>
>>                      <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>>>; what it does show is parts of the
>>             implementation that is
>>                                   proposed and hopefully can start the conversation going
>>                                            as I work through the details.
>>
>>                                            For example, the changes to C2 are done here for the allocations:
>>             http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
>>                                   <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
>>                      <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
>>             <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>>
>>
>>                                            Hopefully this all makes sense and thank you for all your future comments!
>>                                            Jc
>>
>>
>>                                            On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
>>                      <mailto:[hidden email] <mailto:[hidden email]>>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
>>             <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>>
>>                                   wrote:
>>
>>                                                Hello all,
>>
>>                                                This is a follow-up from Jeremy's initial email from last year:
>>             http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html 
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>>                                   <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>
>>                                   <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html 
>> <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
>>                      <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
>>             <http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>>
>>
>>                                                I've gone ahead and started working on preparing this and Jeremy and I went down the route of actually writing it up in JEP
>> form:
>>             https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119> <https://bugs.openjdk.java.net/browse/JDK-8171119
>>             <https://bugs.openjdk.java.net/browse/JDK-8171119>> <https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>
>>                      <https://bugs.openjdk.java.net/browse/JDK-8171119 <https://bugs.openjdk.java.net/browse/JDK-8171119>>>
>>
>>                                                I think original conversation that happened last year in that thread still holds true:
>>
>>                                                  - We have a patch at Google that we think others might be interested in
>>                                                     - It provides a means to understand where the allocation hotspots are at a very low overhead
>>                                                     - Since it is at a low overhead, we can leave it on by default
>>
>>                                                So I come to the mailing list with Jeremy's initial question:
>>                                                "I thought I would ask if there is any interest / if I should write a JEP / if I should just forget it."
>>
>>                                                A year ago, it seemed some thought it was a good idea, is this still true?
>>
>>                                                Thanks,
>>                                                Jc
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Thomas Schatzl
In reply to this post by JC Beyler
Hi all,

On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
> Dear all,
>
> I've continued working on this and have done the following webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

[...]
> Things I still need to do:
>    - Have to fix that TLAB case for the FastTLABRefill
>    - Have to start looking at the data to see that it is consistent
> and does gather the right samples, right frequency, etc.
>    - Have to check the GC elements and what that produces
>    - Run a slowdebug run and ensure I fixed all those issues you saw
> Robbin
>
> Thanks for looking at the webrev and have a great week!

  scratching a bit on the surface of this change, so apologies for
rather shallow comments:

- macroAssembler_x86.cpp:5604: while this is compiler code, and I am
not sure this is final, please avoid littering the code with TODO
remarks :) They tend to be candidates for later wtf moments only.

Just file a CR for that.

- calling HeapMonitoring::do_weak_oops() (which should probably be
called weak_oops_do() like other similar methods) only if string
deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.

The call should be at least around 6 lines up outside the if.

Preferentially in a method like process_weak_jni_handles(), including
additional logging. (No new (G1) gc phase without minimal logging :)).

Seeing the changes in referenceProcess.cpp, you need to add the call to
HeapMonitoring::do_weak_oops() exactly similar to
process_weak_jni_handles() in case there is no reference processing
done.

- psParallelCompact.cpp:2172 similar to other places where the change
adds the call to HeapMonitoring::do_weak_oops(), remove the empty line.

- the change doubles the size of
CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
threshold. Maybe it could be refactored a bit.

- referenceProcessor.cpp:40: please make sure that the includes are
sorted alphabetically (I did not check the other files yet).

- referenceProcessor.cpp:261: the change should add logging about the
number of references encountered, maybe after the corresponding "JNI
weak reference count" log message.

- threadLocalAllocBuffer.cpp:331: one more "TODO"

- threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
documentation should be updated about the sampling additions. I would
have no clue what the difference between "actual_end" and "end" would
be from the given information.

- threadLocalAllocBuffer.hpp:130: extra whitespace ;)

- some copyrights :)

- in heapMonitoring.hpp: there are some random comments about some code
that has been grabbed from "util/math/fastmath.[h|cc]". I can't tell
whether this is code that can be used but I assume that Noam Shazeer is
okay with that (i.e. that's all Google code).

- heapMonitoring.hpp/cpp static constant naming does not correspond to
Hotspot's. Additionally, in Hotspot static methods are cased like other
methods.

- in heapMonitoring.cpp there are a few cryptic comments at the top
that seem to refer to internal stuff that should probably be removed.

I did not think through the impact of the TLAB changes on collector
behavior yet (if there are). Also I did not check for problems with
concurrent mark and SATB/G1 (if there are).

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
In reply to this post by Robbin Ehn
Hi all,

As I work through the remarks it seems easier to handle this in maybe two shorter emails than one handling it all. So let me start with Robbin's questions/remarks:

- I removed the lock relicas, so that is handled.

- For performance numbers, is there a set of benchmarks used by serviceability-dev to show overhead? Currently, I've been using DaCapo for example since it should show any issues quickly,

- For the helper API that I'm doing for the tests, would you want it more polished straight away? My idea was to have something simple for testing and then work on something a bit more open to users afterwards, either directly in the test folders or somewhere else. Let me know what you think.

Thanks again and I'll be answering Thomas' comments in a bit,
Jc


On Tue, Jun 13, 2017 at 2:19 AM, Robbin Ehn <[hidden email]> wrote:
Hi JC,

This version really makes my happy, awesome work!

Since we got no attention from compiler and you manage to get ride of most of the compiler changes, I dropped compiler but added runtime instead.
Serguei, when you have time, please chime in.

On 06/12/2017 08:11 PM, JC Beyler wrote:
Dear all,

I've continued working on this and have done the following webrev:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

What it does:
    - Implement the turn on/off system for the TLAB implementation
       - Basically you can turn the system on/off using the JVMTI API (StartHeapProfiling and StopHeapProfiling here http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/src/share/vm/prims/jvmti.xml.patch)
          - I've currently implemented that the system resets the profiling data at the StartHeapProfiling, this allows you to start profiling, stop profiling, and then the user can query what happened during profiling as a post-processing step :)

Nice addition.


    - I've currently, for sake of simplicity and Robbin hinted it would be better for now, removed the mutex code for the data but think that will have to come back in a next patch or in this one at some point

29 // Keep muxlock for now
....
237   // Protects the traces currently sampled (below).
238   volatile intptr_t _allocated_traces_lock[1];

This seems unused now, and I do not see any locks at all?


    - I've also really worked on the testing code to make it more expandable in this case
http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c.patch is now a bit more generic and allows to turn on/off the sampling; it allows to query the existence or not of frames passed from Java world to JNI, which allows the test to define what should be seen and have the code in one place.
           - That is done using the helper class http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/MyPackage/Frame.java.patch
              - Basically each test can now provide an array of Frames that the native library can check the internal data for a match. This allows to have various tests have their own signatures, etc.

To get a more potential users it's nice with a Java API and as you say simplifies tests, good.


           - This has allowed me to add a nice test here to test the wipe out of the data:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorOnOffTest.java.patch

   - Finally, I've done initial overhead testing and, though my data was a bit noisy, I saw no real overhead using the Tlab approach while off. I will try to get better data and more importantly, more stable data when turning it on.

I think it's key to have numbers showing ~0% performance impact here.


Things I still need to do:
    - Have to fix that TLAB case for the FastTLABRefill
    - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
    - Have to check the GC elements and what that produces
    - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

Also we will need support for some more platform, as the patch stands now, it's only the src/cpu/x86/vm/macroAssembler_x86.cpp part that needs to be done for other platforms?

Thanks!

/Robbin


Thanks for looking at the webrev and have a great week!
Jc

On Fri, Jun 2, 2017 at 8:57 AM, JC Beyler <[hidden email] <mailto:[hidden email]>> wrote:

    Dear all,

    Thanks Robbin for the comments. I have left the MuxLocker for now and will look at removing it or as a future enhancement as you say. I did make the class extends and
    added a TODO for myself to test this in slowdebug.

    I have also put a new webrev up that is still a work in progress but should allow us to talk about TLAB vs C1/C2 modifications.

    TLAB implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/>
    C1/C2 implementation: http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>

    What this webrev has is:
        - I have put in a TLAB implementation, it is a WIP, and I have not yet done the qualitative/quantitative study on it vs the implementation using compilation changes
    but the big parts are in place
            - Caveats:
                - There is a TODO: http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch
    <http://cr.openjdk.java.net/~rasbold/8171119/webrev.04/src/cpu/x86/vm/macroAssembler_x86.cpp.patch>
                    - I have not fixed the calculation in the case of a FastTLABRefill case
                - This is always on right now, there is no way to turn it off, that's also a TODO to be directed by the JVMTI API

         - I also have circumvented the AsyncGetCallTrace using the snippet of code you showed Robbin, it works for here/now
               - But we might have to revisit this one day because it then does not try to get some of the native stacks and jumps directly to the Java stacks (I see cases
    where this could be an issue)
               - However, this has cleaned up quite a bit of the code and I have removed all mention of ASGCT and its structures now and use directly the JVMTI structures

        - GC is handled now, I have not yet done the qualitative/quantitative study on it but the big parts are in place

        - Due to the way the TLAB is called, the stack walker is now correct and produces the right stacks it seems (this is a bold sentence from my ONE JTREG test :))

    Final notes on this webrev:
        - Have to fix that TLAB case for the FastTLABRefill
        - Implement the turn on/off system for the TLAB implementation
        - Have to start looking at the data to see that it is consistent and does gather the right samples, right frequency, etc.
        - Have to check the GC elements and what that produces
        - Run a slowdebug run and ensure I fixed all those issues you saw Robbin

    As always, your comments and feedback are greatly appreciated! Happy Friday!
    Jc


    On Tue, May 30, 2017 at 5:33 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]>> wrote:

        Hi Jc,

        On 05/22/2017 08:47 PM, JC Beyler wrote:

            Dear all,

            I have a new webrev up:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/>


        I liked this!

        Two small things:

        heapMonitoring.hpp
        class HeapMonitoring should extend AllStatic

        heapMonitoring.cpp
        class MuxLocker should extend StackObj
        But I think you should skip MuxLocker or push it separate generic enhancement.

        Great with the jtreg test, thanks alot!


            This webrev has, I hope, fixed a lot of the comments from Robbin:
                - The casts normally are all C++ style
                - Moved this to jdk10-hs
                   - I have not tested slowdebug yet, hopefully it does not break there
                - Added the garbage collection system:
                   - Now live sampled allocations are tracked throughout their lifetime
                      - When GC happens, it moves the sampled allocation information to two lists: recent and frequent GC lists
                          - Those lists use the array system that the live objects were using before but have different re-use strategies
                   - Added the JVMTI API for them via a GetFrequentGarbageTraces and GetGarbageTraces
            - Both use the same JVMTI structures
                   - Added the calls to them for the test, though I've kept that test simple for now:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
                      - As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent
            structure. This is updated in my local repo and will get in the next webrev.

            Next steps for this work are:
                 - Putting the TLAB implementation (yes not forgotten ;-))
                 - Adding more testing and separate the current test system to check things a bit more thoroughly
                 - Have not tried to circumvent AsyncGetCallTrace yet
                 - Still have to double check the stack walker a bit more


        Looking forward to this.

        Could someone from compiler take a look please?

        Thanks!

        /Robbin


            Happy webrev perusal!
            Jc


            On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
            wrote:

                 Just a few answers,

                 On 05/15/2017 06:48 PM, JC Beyler wrote:

                     Dear all,

                     I've updated the webrev to:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>>


                 I'll look at this later, thanks!


                     Robbin,
                     I believe I have addressed most of your items with webrev 02:
                         - I added a JTreg test to show how it works:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>>
                        - I've modified the code to use its own data structures both internally and externally, this will make it easier to move out of AsyncGetCallTrace as
            we move
                     forward, that is still on my TODOs
                        - I cleaned up the JVMTI API by passing a structure that handles the num_traces and put in a ReleaseTraces as well
                        - I cleaned up other issues as well.

                     However, I have three questions, which are probably because I'm new in this community:
                        1) My previous webrevs were based off of JDK9 by mistake. When I took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
            <http://hg.openjdk.java.net/jdk10/jdk10>
                     <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10
            <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
                            - I don't see code compatible with what you were showing (ie your patches don't make sense for that code base; ex: klass is still accessed via
            klass() for
                     example in collectedHeap.inline.hpp)
                            - Would you know what is the right hg clone command so we are working on the same code base?


                 We use jdk10-hs, e.g.
                 hg tclone http://hg.openjdk.java.net/jdk10/hs <http://hg.openjdk.java.net/jdk10/hs> <http://hg.openjdk.java.net/jdk10/hs
            <http://hg.openjdk.java.net/jdk10/hs>> 10-hs

                 There is sporadic big merges going from jdk9->jdk10->jdk10-hs and jdk10-hs->jdk10, so 10 is moving...


                        2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I use.
            It might be
                     that I don't understand when one uses one or the other but I see both used around the code base?
                          - Is it that new is to be used for anything internal and NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?


                 We overload new operator when you extend correct base class, e.g. CHeapObj<mtInternal> so use 'new'
                 But for arrays you will need the macro NEW_C_HEAP_ARRAY.


                        3) Casts: same kind question: which should I use. The code was using a bit of everything, I'll refactor it entirely but I was not clear if I should
            go to C casts
                     or C++ casts as I see both in the codebase. What is the convention I should use?


                 Just be consist, use what suites you, C++ casts might be preferable, if we are moving towards C++11.
                 And use 'right' cast, e.g. going from Thread* to JavaThread* you should use C cast or static_cast, not reinterpret_cast I would say.


                     Final notes on this webrev:
                         - I am still missing:
                           - Putting a TLAB implementation so that we can compare both webrevs
                           - Have not tried to circumvent AsyncGetCallTrace
                           - Putting in the handling of GC'd objects
                           - Fix a stack walker issue I have seen, I think I know the problem and will test that theory out for the next webrev

                     I will work on integrating those items for the next webrev!


                 Thanks!


                     Thanks for your help,
                     Jc

                     Ps:  I tested this on a new repo:

                     hg clone http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
            <http://hg.openjdk.java.net/jdk10/jdk10>> <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>
                     <http://hg.openjdk.java.net/jdk10/jdk10 <http://hg.openjdk.java.net/jdk10/jdk10>>> jdk10
                     ... building it
                     cd test
                     jtreg -nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/ -jdk
                     <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk ../hotspot/test/serviceability/jvmti/HeapMonitor/


                 I'll test it out!

                 /Robbin



                     On Thu, May 4, 2017 at 11:21 PM, [hidden email] <mailto:[hidden email]> <mailto:[hidden email]
            <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
                     <mailto:[hidden email] <mailto:[hidden email]>>> <[hidden email] <mailto:[hidden email]>
            <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>

                     <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:

                          Robbin,

                          Thank you for forwarding!
                          I will review it.

                          Thanks,
                          Serguei



                          On 5/4/17 02:13, Robbin Ehn wrote:

                              Hi,

                              To me the compiler changes looks what is expected.
                              It would be good if someone from compiler could take a look at that.
                              Added compiler to mail thread.

                              Also adding Serguei, It would be good with his view also.

                              My initial take on it, read through most of the code and took it for a ride.

                              ##############################
                              - Regarding the compiler changes: I think we need the 'TLAB end' trickery (mentioned by Tony P)
                              instead of a separate check for sampling in fast path for the final version.

                              ##############################
                              - This patch I had to apply to get it compile on JDK 10:

                              diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
                              --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri Apr 28 14:31:38 2017 +0200
                              +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu May 04 10:22:56 2017 +0200
                              @@ -87,3 +87,3 @@
                                    // support for object alloc event (no-op most of the time)
                              -    if (klass() != NULL && klass()->name() != NULL) {
                              +    if (klass != NULL && klass->name() != NULL) {
                                      Thread *base_thread = Thread::current();
                              diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
                              --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 14:31:38 2017 +0200
                              +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:22:56 2017 +0200
                              @@ -316,3 +316,3 @@
                                  JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
                              -  assert(o->size() << LogHeapWordSize == byte_size,
                              +  assert(o->size() << LogHeapWordSize == (long)byte_size,
                                         "Object size is incorrect.");

                              ##############################
                              - This patch I had to apply to get it not asserting during slowdebug:

                              --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 15:15:16 2017 +0200
                              +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 10:24:25 2017 +0200
                              @@ -32,3 +32,3 @@
                                // TODO(jcbeyler): should we make this into a JVMTI structure?
                              -struct StackTraceData {
                              +struct StackTraceData : CHeapObj<mtInternal> {
                                  ASGCT_CallTrace *trace;
                              @@ -143,3 +143,2 @@
                                StackTraceStorage::StackTraceStorage() :
                              -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
                                    _allocated_traces_size(MaxHeapTraces),
                              @@ -147,2 +146,3 @@
                                    _allocated_count(0) {
                              +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
                                  memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
                              @@ -152,3 +152,3 @@
                                StackTraceStorage::~StackTraceStorage() {
                              -  delete[] _allocated_traces;
                              +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
                                }

                              - Classes should extend correct base class for which type of memory is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
                              - The style in heapMonitoring.cpp is a bit different from normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY,
            os::malloc and new.
                              - In jvmtiHeapTransition.hpp you use C cast instead.

                              ##############################
                              - This patch I had apply to get traces without setting an ‘unrelated’ capability
                              - Should this not be a new capability?

                              diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
                              --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 +0200
                              +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 +0200
                              @@ -530,6 +530,6 @@

                              -  if (!JvmtiExport::should_post_class_load()) {
                              +/*  if (!JvmtiExport::should_post_class_load()) {
                                    trace->num_frames = ticks_no_class_load; // -1
                                    return;
                              -  }
                              +  }*/

                              ##############################
                              - forte.cpp: (I know this is not part of your changes but)
                              find_jmethod_id_or_null give me NULL for my test.
                              It looks like we actually want the regular jmethod_id() ?

                              Since we are the thread we are talking about (and in same ucontext) and thread is in vm and have a last java frame,
                              I think most of the checks done in AsyncGetCallTrace is irrelevant, so you should be-able to call forte_fill_call_trace_given_top directly.
                              But since we might need jmethod_id() if possible to avoid getting method id NULL,
                              we need some fixes in forte code, or just do the vframStream loop inside heapMonitoring.cpp and not use forte.cpp.

                              Something like:

                                 if (jthread->has_last_Java_frame()) { // just to be safe
                                   vframeStream vfst(jthread);
                                   while (!vfst.at_end()) {
                                     Method* m = vfst.method();
                                     m->jmethod_id();
                                     m->line_number_from_bci(vfst.bci());
                                     vfst.next();
                                   }

                              - This is a bit confusing in forte.cpp, trace->frames[count].lineno = bci.
                              Line number should be m->line_number_from_bci(bci);
                              Do the heapMonitoring suppose to trace with bci or line number?
                              I would say bci, meaning we should either rename ASGCT_CallFrame→lineno or use another data structure which says bci.

                              ##############################
                              - // TODO(jcbeyler): remove this extra code handling the extra trace for
                              Please fix all these TODO's :)

                              ##############################
                              - heapMonitoring.hpp:
                              // TODO(jcbeyler): is this algorithm acceptable in open source?

                              Why is this comment here? What is the implication?
                              Have you tested any simpler algorithm?

                              ##############################
                              - Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building the agent)

                              ##############################
                              - monitoring_period vs HeapMonitorRate, pick rate or period.

                              ##############################
                              - globals.hpp
                              Why is MaxHeapTraces not settable/overridable from jvmti interface? That would be handy.

                              ##############################
                              - jvmtiStackTraceData + ASGCT_CallFrame memory
                              Are the agent suppose to loop through and free all ASGCT_CallFrame?
                              Wouldn't it be better with some kinda protocol, like:
                              (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
                              (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

                              Also using another data structure that have num_traces inside it simplifies things.
                              So I'm not convinced using the async structure is the best way forward.


                              I have more questions, but I think it's better if you respond and update the code first.

                              Thanks!

                              /Robbin


                              On 04/21/2017 11:34 PM, JC Beyler wrote:

                                  Hi all,

                                  I've added size information to the allocation sampling system. This allows the callback to remember the size of each sampled allocation.
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>>

                                  The new webrev.01 also adds the actual heap monitoring sampling system in files:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
                                  <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>>
                                  and
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>
                                  <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
                     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>>

                                  My next step is to add the GC part to the webrev, which will allow users to determine what objects are live and what are garbage.

                                  Thanks for your attention and let me know if there are any questions!

                                  Have a wonderful Friday!
                                  Jc

                                  On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
            <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
                     <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email]
            <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:

                                       Hi all,

                                       I worked on getting a few numbers for overhead and accuracy for my feature. I'm unsure if here is the right place to provide the full
            data, so I
                     am just
                                  summarizing
                                       here for now.

                                       - Overhead of the feature

                                       Using the Dacapo benchmark (http://dacapobench.org/). My initial results are that sampling provides 2.4% with a 512k sampling, 512k
            being our
                     default setting.

                                       - Note: this was without the tradesoap, tradebeans and tomcat benchmarks since they did not work with my JDK9 (issue between Dacapo
            and JDK9 it seems)
                                       - I want to rerun next week to ensure number stability

                                       - Accuracy of the feature

                                       I wrote a small microbenchmark that allocates from two different stacktraces at a given ratio. For example, 10% of stacktrace S1 and
            90% from
                     stacktrace
                                  S2. The
                                       microbenchmark was run 20 times, I averaged the results and looked for accuracy. It seems that statistically it is sound since if I
            allocated10%
                     S1 and 90%
                                  S2, with a
                                       sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

                                       Let me know if there are any questions on the numbers and if you'd like to see some more data.

                                       Note: this was done using our internal JDK8 implementation since the webrev provided by
            http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                                  <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
                                  <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>> does not yet contain the whole
                                  implementation and therefore would have been misleading.

                                       Thanks,
                                       Jc


                                       On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
            <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]>
                     <mailto:[hidden email] <mailto:[hidden email]>>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email]
            <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>> wrote:

                                           Hi all,

                                           To move the discussion forward, with Chuck Rasbold's help to make a webrev, we pushed this:
            http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
                                  <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
            <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>>
                                           415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

                                           This is not a final change that does the whole proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
            <https://bugs.openjdk.java.net/browse/JDK-8177374>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>
                                  <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>
            <https://bugs.openjdk.java.net/browse/JDK-8177374 <https://bugs.openjdk.java.net/browse/JDK-8177374>>>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
In reply to this post by Thomas Schatzl
Hi all,

First off: Thanks again to Robbin and Thomas for their reviews :)

Next, I've uploaded a new webrev:

Here is an update:

- @Robbin, I forgot to say that yes I need to look at implementing this for the other architectures and testing it before it is all ready to go. Is it common to have it working on all possible combinations or is there a subset that I should be doing first and we can do the others later?
- I've tested slowdebug, built and ran the JTreg tests I wrote with slowdebug and fixed a few more issues
- I've refactored a bit of the code following Thomas' comments
   - I think I've handled all the comments from Thomas (I put comments inline below for the specifics)

The biggest addition to this webrev is that there is more testing, I've added two tests for looking specifically at the two garbage sampling data Recent vs Frequent:
   and

   - Side question: I was looking at trying to make this a multi-file library so you would not have all the code in one spot. It seems this is not really done?
      - Is there a solution to this? What I really wanted was:
        - The core library that will help testing be easier
        - The JNI code for each Java class separate and calling into the core library

- Following Thomas' comments on statistics, I want to add some quality assurance tests and find that the easiest way would be to have a few counters of what is happening in the sampler and expose that to the user.
   - I'll be adding that in the next version if no one sees any objections to that.
   - This will allow me to add a sanity test in JTreg about number of samples and average of sampling rate

@Thomas: I had a few questions that I inlined below but I will summarize the "bigger ones" here:
   - You mentioned constants are not using the right conventions, I looked around and didn't see any convention except normal naming then for static constants. Is that right?
   - You mentioned putting the weak_oops_do in a separate method and logging, I inlined my answer below and would appreciate your feedback there too.

Thanks again for your reviews and patience!
Jc

PS: I've also inlined my answers to Thomas below:

On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl <[hidden email]> wrote:
Hi all,

On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
> Dear all,
>
> I've continued working on this and have done the following webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

[...]
> Things I still need to do:
>    - Have to fix that TLAB case for the FastTLABRefill
>    - Have to start looking at the data to see that it is consistent
> and does gather the right samples, right frequency, etc.
>    - Have to check the GC elements and what that produces
>    - Run a slowdebug run and ensure I fixed all those issues you saw
> Robbin
>
> Thanks for looking at the webrev and have a great week!

  scratching a bit on the surface of this change, so apologies for
rather shallow comments:

- macroAssembler_x86.cpp:5604: while this is compiler code, and I am
not sure this is final, please avoid littering the code with TODO
remarks :) They tend to be candidates for later wtf moments only.

Just file a CR for that.


Newcomer question: what is a CR and not sure I have the rights to do that yet ? :)
 
- calling HeapMonitoring::do_weak_oops() (which should probably be
called weak_oops_do() like other similar methods) only if string
deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.

The call should be at least around 6 lines up outside the if.

Preferentially in a method like process_weak_jni_handles(), including
additional logging. (No new (G1) gc phase without minimal logging :)).

Done but really not sure because:

I put for logging:
  log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap monitoring");

Since weak_jni_handles didn't have logging for me to be inspired from, I did that but unconvinced this is what should be done.
 

Seeing the changes in referenceProcess.cpp, you need to add the call to
HeapMonitoring::do_weak_oops() exactly similar to
process_weak_jni_handles() in case there is no reference processing
done.

- psParallelCompact.cpp:2172 similar to other places where the change
adds the call to HeapMonitoring::do_weak_oops(), remove the empty line.

Done from what I can tell in the whole webrev. (The only empty lines I still see are when I maintain an empty line, exception being http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/src/share/vm/gc/shared/collectedHeap.cpp.patch where I think it was "nicer")
 

- the change doubles the size of
CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
threshold. Maybe it could be refactored a bit.

Done I think, it looks better to me :).
 

- referenceProcessor.cpp:40: please make sure that the includes are
sorted alphabetically (I did not check the other files yet).

Done and checked all other files normally.
 

- referenceProcessor.cpp:261: the change should add logging about the
number of references encountered, maybe after the corresponding "JNI
weak reference count" log message.

Just to double check, are you saying that you'd like to have the heap sampler to keep in store how many sampled objects were encountered in the HeapMonitoring::weak_oops_do?
   - Would a return of the method with the number of handled references and logging that work?
   - Additionally, would you prefer it in a separate block with its GCTraceTime?


- threadLocalAllocBuffer.cpp:331: one more "TODO"

Removed it and added it to my personal todos to look at.
 

- threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
documentation should be updated about the sampling additions. I would
have no clue what the difference between "actual_end" and "end" would
be from the given information.

If you are talking about the comments in this file, I made them more clear I hope in the new webrev. If it was somewhere else, let me know where to change.
 

- threadLocalAllocBuffer.hpp:130: extra whitespace ;)

Fixed :)
 

- some copyrights :)


I think I fixed all the issues, if you see specific issues, let me know.
 
- in heapMonitoring.hpp: there are some random comments about some code
that has been grabbed from "util/math/fastmath.[h|cc]". I can't tell
whether this is code that can be used but I assume that Noam Shazeer is
okay with that (i.e. that's all Google code).

Jeremy and I double checked and we can release that as I thought. I removed the comment from that piece of code entirely.
 

- heapMonitoring.hpp/cpp static constant naming does not correspond to
Hotspot's. Additionally, in Hotspot static methods are cased like other
methods.

I think I fixed the methods to be cased the same way as all other methods. For static constants, I was not sure. I fixed a few other variables but I could not seem to really see a consistent trend for constants. I made them as variables but I'm not sure now.

 

- in heapMonitoring.cpp there are a few cryptic comments at the top
that seem to refer to internal stuff that should probably be removed.

Sorry about that! My personal todos not cleared out.
 

I did not think through the impact of the TLAB changes on collector
behavior yet (if there are). Also I did not check for problems with
concurrent mark and SATB/G1 (if there are).

I would love to know your thoughts on this, I think this is fine. I see issues with multiple threads right now hitting the stack storage instance. Previous webrevs had a mutex lock here but we took it out for simplificity (and only for now).
 

Thanks,
  Thomas


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Thomas Schatzl
Hi,

On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote:

> Hi all,
>
> First off: Thanks again to Robbin and Thomas for their reviews :)
>
> Next, I've uploaded a new webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>
> Here is an update:
>
> - @Robbin, I forgot to say that yes I need to look at implementing
> this for the other architectures and testing it before it is all
> ready to go. Is it common to have it working on all possible
> combinations or is there a subset that I should be doing first and we
> can do the others later?
> - I've tested slowdebug, built and ran the JTreg tests I wrote with
> slowdebug and fixed a few more issues
> - I've refactored a bit of the code following Thomas' comments
>    - I think I've handled all the comments from Thomas (I put
> comments inline below for the specifics)

Thanks for handling all those.

> - Following Thomas' comments on statistics, I want to add some
> quality assurance tests and find that the easiest way would be to
> have a few counters of what is happening in the sampler and expose
> that to the user.
>    - I'll be adding that in the next version if no one sees any
> objections to that.
>    - This will allow me to add a sanity test in JTreg about number of
> samples and average of sampling rate
>
> @Thomas: I had a few questions that I inlined below but I will
> summarize the "bigger ones" here:
>    - You mentioned constants are not using the right conventions, I
> looked around and didn't see any convention except normal naming then
> for static constants. Is that right?

I looked through https://wiki.openjdk.java.net/display/HotSpot/StyleGui
de and the rule is to "follow an existing pattern and must have a
distinct appearance from other names". Which does not help a lot I
guess :/ The GC team started using upper camel case, e.g.
SomeOtherConstant, but very likely this is probably not applied
consistently throughout. So I am fine with not adding another style
(like kMaxStackDepth with the "k" in front with some unknown meaning)
is fine.

(Chances are you will find that style somewhere used anyway too,
apologies if so :/)

> PS: I've also inlined my answers to Thomas below:
>
> On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl <thomas.schatzl@oracl
> e.com> wrote:
> > Hi all,
> >
> > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
> > > Dear all,
> > >
> > > I've continued working on this and have done the following
> > webrev:
> > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
> >
> > [...]
> > > Things I still need to do:
> > >    - Have to fix that TLAB case for the FastTLABRefill
> > >    - Have to start looking at the data to see that it is
> > consistent and does gather the right samples, right frequency, etc.
> > >    - Have to check the GC elements and what that produces
> > >    - Run a slowdebug run and ensure I fixed all those issues you
> > saw > Robbin
> > >
> > > Thanks for looking at the webrev and have a great week!
> >
> >   scratching a bit on the surface of this change, so apologies for
> > rather shallow comments:
> >
> > - macroAssembler_x86.cpp:5604: while this is compiler code, and I
> > am not sure this is final, please avoid littering the code with
> > TODO remarks :) They tend to be candidates for later wtf moments
> > only.
> >
> > Just file a CR for that.
> >
> Newcomer question: what is a CR and not sure I have the rights to do
> that yet ? :)

Apologies. CR is a change request, this suggests to file a bug in the
bug tracker. And you are right, you can't just create a new account in
the OpenJDK JIRA yourselves. :(

I was mostly referring to the "... but it is a TODO" part of that
comment in macroassembler_x86.cpp. Comments about the why of the code
are appreciated.

[Note that I now understand that this is to some degree still work in
progress. As long as the final changeset does no contain TODO's I am
fine (and it's not a hard objection, rather their use in "final" code
is typically limited in my experience)]

5603   // Currently, if this happens, just set back the actual end to 
where it was.
5604   // We miss a chance to sample here.

Would be okay, if explaining "this" and the "why" of missing a chance
to sample here would be best.

Like maybe:

// If we needed to refill TLABs, just set the actual end point to 
// the end of the TLAB again. We do not sample here although we could.

I am not sure whether "miss a chance to sample" meant "we could, but
consciously don't because it's not that useful" or "it would be
necessary but don't because it's too complicated to do.".

Looking at the original comment once more, I am also not sure if that
comment shouldn't referring to the "end" variable (not actual_end)
because that's the variable that is responsible for taking the sampling
path? (Going from the member description of ThreadLocalAllocBuffer).

But maybe I am only confused and it's best to just leave the comment
away. :)

Thinking about it some more, doesn't this not-sampling in this case
mean that sampling does not work in any collector that does inline TLAB
allocation at the moment? (Or is inline TLAB alloc automatically
disabled with sampling somehow?)

That would indeed be a bigger TODO then :)

> > - calling HeapMonitoring::do_weak_oops() (which should probably be
> > called weak_oops_do() like other similar methods) only if string
> > deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.
>
> The call should be at least around 6 lines up outside the if.
>
> Preferentially in a method like process_weak_jni_handles(), including
> additional logging. (No new (G1) gc phase without minimal logging
> :)).
> Done but really not sure because:
>
> I put for logging:
>   log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap
> monitoring");

I would think that "gc, ref" would be more appropriate log tags for
this similar to jni handles.
(I am als not sure what weak reference handling has to do with
G1ConcRegionFreeing, so I am a bit puzzled)

> Since weak_jni_handles didn't have logging for me to be inspired
> from, I did that but unconvinced this is what should be done.

The JNI handle processing does have logging, but only in
ReferenceProcessor::process_discovered_references(). In
process_weak_jni_handles() only overall time is measured (in a G1
specific way, since only G1 supports disabling reference procesing) :/

The code in ReferenceProcessor prints both time taken
referenceProcessor.cpp:254, as well as the count, but strangely only in
debug VMs.

I have no idea why this logging is that unimportant to only print that
in a debug VM. However there are reviews out for changing this area a
bit, so it might be useful to wait for that (JDK-8173335).

> > - the change doubles the size of
> > CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
> > threshold. Maybe it could be refactored a bit.
> Done I think, it looks better to me :).

In ThreadLocalAllocBuffer::handle_sample() I think the
set_back_actual_end()/pick_next_sample() calls could be hoisted out of
the "if" :)

> > - referenceProcessor.cpp:261: the change should add logging about
> > the number of references encountered, maybe after the corresponding
> > "JNI weak reference count" log message.
> Just to double check, are you saying that you'd like to have the heap
> sampler to keep in store how many sampled objects were encountered in
> the HeapMonitoring::weak_oops_do?
>    - Would a return of the method with the number of handled
> references and logging that work?

Yes, it's fine if HeapMonitoring::weak_oops_do() only returned the
number of processed weak oops.

>    - Additionally, would you prefer it in a separate block with its
> GCTraceTime?

Yes. Both kinds of information is interesting: while the time taken is
typically more important, the next question would be why, and the
number of references typically goes a long way there.

See above though, it is probably best to wait a bit.

> > - threadLocalAllocBuffer.cpp:331: one more "TODO"
> Removed it and added it to my personal todos to look at.
>  
> >
> > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
> > documentation should be updated about the sampling additions. I
> > would have no clue what the difference between "actual_end" and
> > "end" would be from the given information.
> If you are talking about the comments in this file, I made them more
> clear I hope in the new webrev. If it was somewhere else, let me know
> where to change.

Thanks, that's much better. Maybe a note in the comment of the class
that ThreadLocalBuffer provides some sampling facility by modifying the
end() of the TLAB to cause "frequent" calls into the runtime call where
actual sampling takes place.

> > - in heapMonitoring.hpp: there are some random comments about some
> > code that has been grabbed from "util/math/fastmath.[h|cc]". I
> > can't tell whether this is code that can be used but I assume that
> > Noam Shazeer is okay with that (i.e. that's all Google code).
> Jeremy and I double checked and we can release that as I thought. I
> removed the comment from that piece of code entirely.

Thanks.

> > - heapMonitoring.hpp/cpp static constant naming does not correspond
> > to Hotspot's. Additionally, in Hotspot static methods are cased
> > like other methods.
> I think I fixed the methods to be cased the same way as all other
> methods. For static constants, I was not sure. I fixed a few other
> variables but I could not seem to really see a consistent trend for
> constants. I made them as variables but I'm not sure now.

Sorry again, style is a kind of mess. The goal of my suggestions here
is only to prevent yet another style creeping in.
 
> > - in heapMonitoring.cpp there are a few cryptic comments at the top
> > that seem to refer to internal stuff that should probably be
> > removed.
> Sorry about that! My personal todos not cleared out.

I am happy about comments, but I simply did not understand any of that
and I do not know about other readers as well.

If you think you will remember removing/updating them until the review
proper (I misunderstood the review situation a little it seems).

> > I did not think through the impact of the TLAB changes on collector
> > behavior yet (if there are). Also I did not check for problems with
> > concurrent mark and SATB/G1 (if there are).
> I would love to know your thoughts on this, I think this is fine. I

I think so too now. No objects are made live out of thin air :)

> see issues with multiple threads right now hitting the stack storage
> instance. Previous webrevs had a mutex lock here but we took it out
> for simplificity (and only for now).

:) When looking at this after some thinking I now assume for this
review that this code is not MT safe at all. There seems to be more
synchronization missing than just the one for the StackTraceStorage. So
no comments about this here.

Also, this would require some kind of specification of what is allowed
to be called when and where.

One potentially relevant observation about locking here: depending on
sampling frequency, StackTraceStore::add_trace() may be rather
frequently called. I assume that you are going to do measurements :)

I am not sure what the expected usage of the API is, but
StackTraceStore::add_trace() seems to be able to grow without bounds.
Only a GC truncates them to the live ones. That in itself seems to be
problematic (GCs can be *wide* apart), and of course some of the API
methods add to that because they duplicate that unbounded array. Do you
have any concerns/measurements about this?

Also, these stack traces might hold on to huge arrays. Any
consideration of that? Particularly it might be the cause for OOMEs in
tight memory situations.

- please consider adding a safepoint check in
HeapMonitoring::weak_oops_do to prevent accidental misuse.

- in struct StackTraceStorage, the public fields may also need
underscores. At least some files in the runtime directory have structs
with underscored public members (and some don't). The runtime team
should probably comment on that.

- In StackTraceStorage::weak_oops_do(), when examining the
StackTraceData, maybe it is useful to consider having a non-NULL
reference outside of the heap's reserved space an error. There should
be no oop outside of the heap's reserved space ever.

Unless you allow storing random values in StackTraceData::obj, which I
would not encourage.

- HeapMonitoring::weak_oops_do() does not seem to use the
passed AbstractRefProcTaskExecutor.

- I do not understand allowing to call this method with a NULL
complete_gc closure. This would mean that objects referenced from the
object that is referenced by the StackTraceData are not pulled, meaning
they would get stale.

- same with is_alive parameter value of NULL

- heapMonitoring.cpp:590: I do not completely understand the purpose of
this code: in the end this results in a fixed value directly dependent
on the Thread address anyway? In the end this results in a fixed value
directly dependent on the Thread address anyway?
IOW, what is special about exactly 20 rounds?

- also I would consider stripping a few bits of the threads' address as
initialization value for your rng. The last three bits (and probably
more, check whether the Thread object is allocated on special
boundaries) are always zero for them.
Not sure if the given "random" value is random enough before/after,
this method, so just skip that comment if you think this is not
required.

Some more random nits I did not find a place to put anywhere:

- ThreadLocalAllocBuffer::_extra_space does not seem to be used
anywhere?

- Maybe indent the declaration of ThreadLocalAllocBuffer::_bytes_until_sample to align below the other members of that group.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

JC Beyler
In reply to this post by JC Beyler
Hi Serguei and all,

Thanks Serguei for your review. I believe I have not answered many questions but raised more but I hope this helps understand what I am trying to do.

I have inlined my answers to make it easier to answer everything (and to be honest, ask questions).

@Thomas: Thank you for your thorough answer as well, I'll be working on it and will answer when I get the fixes and comments in a handled state, most likely beginning of next week :)

On Fri, Jun 23, 2017 at 3:01 AM, [hidden email] <[hidden email]> wrote:
Hi JC,

Some initial JVMTI-specific questions/comments.

I was not able to apply your patch as the are many merge conflicts.
Could you, please, re-base it on the latest JDK 10 sources?

This is weird, I did a pull right now and it worked. I hesitated I was using hg correctly so I just did:
bash ./get_source.sh
patch -p1 < hotspot.patch

(Maybe the patch is not the right way of doing things, there might be a hg equivalent?).

And it seemed to work:
bash ./configure --with-boot-jdk=/path/to/jdk1.8.0-latest --with-debug-level=slowdebug
make all images JOBS=48

Any idea why it is working for me and not you? I am assuming I am doing something wrong but I don't know what!
 

I think, it would make sense to introduce new optional capability, something like:
   can_sample_heap_allocations

I will work on adding that for the next webrev, Robbin had mentioned I should and I dropped the ball on that one.
 

Do you consider this API to be used by a single agent or it is a multi-agent feature?
What if one agent calls the StartHeapSampling() but another calls one of the others.
The API specs needs to clarify this.

I'm not sure you had a chance to carefully think on what JVMTI phases are allowed for new functions.

- I am pretty sure I have NOT really thought about this as carefully as you have because this is the first time I am doing this and these questions already make me go "hmmm" :)

I have a tendency to work through issues and try to keep the design simple first. Your question seems to imply I can make this work for a single agent world and have a means to explicitly disable it for multi-agent for now.

I looked quickly in the prims folder for anything mentioning single agent vs multi agent but did not find anything. Does this get handled s this more a documentation thing and tough luck for the user.


seems to say the latter. That the documentation should say that this is not a multiple agent support (yet?) and there is nothing we can do. Is that right?

 


http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/src/share/vm/runtime/heapMonitoring.cpp.html

I do not see the variable HeapMonitoring::_enabled is checked in the functions:
    HeapMonitoring::get_live_traces(),
    HeapMonitoring::get_garbage_traces(),
    HeapMonitoring::get_frequent_garbage_traces(),
    HeapMonitoring::release_traces()


So it is not checked there because the API allows you to:
  - Start sampling (which flips the enabled variable)
  - Stop sampling (which flips back the enabled variable)
  - Get the sampled traces at that point

I think the API documentation is not clear for this right now, so I'll change it for the next webrev. Let me also know if you think this is a bad idea. I thought it would be useful because it would allow you to start/stop the sampling and then later on go back and see what was sampled at a later time.
 
A question about a multi-agent feature from above applies here as well.

I'm open to either supporting multi-agent if you think it is important or doing whatever can be done to, as a first step, make it a single agent support. Let me know how to do that.
 


Thanks,
Serguei

Thank you!
Jc
 




On 6/21/17 13:45, JC Beyler wrote:
Hi all,

First off: Thanks again to Robbin and Thomas for their reviews :)

Next, I've uploaded a new webrev:

Here is an update:

- @Robbin, I forgot to say that yes I need to look at implementing this for the other architectures and testing it before it is all ready to go. Is it common to have it working on all possible combinations or is there a subset that I should be doing first and we can do the others later?
- I've tested slowdebug, built and ran the JTreg tests I wrote with slowdebug and fixed a few more issues
- I've refactored a bit of the code following Thomas' comments
   - I think I've handled all the comments from Thomas (I put comments inline below for the specifics)

The biggest addition to this webrev is that there is more testing, I've added two tests for looking specifically at the two garbage sampling data Recent vs Frequent:
   and

   - Side question: I was looking at trying to make this a multi-file library so you would not have all the code in one spot. It seems this is not really done?
      - Is there a solution to this? What I really wanted was:
        - The core library that will help testing be easier
        - The JNI code for each Java class separate and calling into the core library

- Following Thomas' comments on statistics, I want to add some quality assurance tests and find that the easiest way would be to have a few counters of what is happening in the sampler and expose that to the user.
   - I'll be adding that in the next version if no one sees any objections to that.
   - This will allow me to add a sanity test in JTreg about number of samples and average of sampling rate

@Thomas: I had a few questions that I inlined below but I will summarize the "bigger ones" here:
   - You mentioned constants are not using the right conventions, I looked around and didn't see any convention except normal naming then for static constants. Is that right?
   - You mentioned putting the weak_oops_do in a separate method and logging, I inlined my answer below and would appreciate your feedback there too.

Thanks again for your reviews and patience!
Jc

PS: I've also inlined my answers to Thomas below:

On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl <[hidden email]> wrote:
Hi all,

On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
> Dear all,
>
> I've continued working on this and have done the following webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

[...]
> Things I still need to do:
>    - Have to fix that TLAB case for the FastTLABRefill
>    - Have to start looking at the data to see that it is consistent
> and does gather the right samples, right frequency, etc.
>    - Have to check the GC elements and what that produces
>    - Run a slowdebug run and ensure I fixed all those issues you saw
> Robbin
>
> Thanks for looking at the webrev and have a great week!

  scratching a bit on the surface of this change, so apologies for
rather shallow comments:

- macroAssembler_x86.cpp:5604: while this is compiler code, and I am
not sure this is final, please avoid littering the code with TODO
remarks :) They tend to be candidates for later wtf moments only.

Just file a CR for that.


Newcomer question: what is a CR and not sure I have the rights to do that yet ? :)
 
- calling HeapMonitoring::do_weak_oops() (which should probably be
called weak_oops_do() like other similar methods) only if string
deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong.

The call should be at least around 6 lines up outside the if.

Preferentially in a method like process_weak_jni_handles(), including
additional logging. (No new (G1) gc phase without minimal logging :)).

Done but really not sure because:

I put for logging:
  log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap monitoring");

Since weak_jni_handles didn't have logging for me to be inspired from, I did that but unconvinced this is what should be done.
 

Seeing the changes in referenceProcess.cpp, you need to add the call to
HeapMonitoring::do_weak_oops() exactly similar to
process_weak_jni_handles() in case there is no reference processing
done.

- psParallelCompact.cpp:2172 similar to other places where the change
adds the call to HeapMonitoring::do_weak_oops(), remove the empty line.

Done from what I can tell in the whole webrev. (The only empty lines I still see are when I maintain an empty line, exception being http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/src/share/vm/gc/shared/collectedHeap.cpp.patch where I think it was "nicer")
 

- the change doubles the size of
CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
threshold. Maybe it could be refactored a bit.

Done I think, it looks better to me :).
 

- referenceProcessor.cpp:40: please make sure that the includes are
sorted alphabetically (I did not check the other files yet).

Done and checked all other files normally.
 

- referenceProcessor.cpp:261: the change should add logging about the
number of references encountered, maybe after the corresponding "JNI
weak reference count" log message.

Just to double check, are you saying that you'd like to have the heap sampler to keep in store how many sampled objects were encountered in the HeapMonitoring::weak_oops_do?
   - Would a return of the method with the number of handled references and logging that work?
   - Additionally, would you prefer it in a separate block with its GCTraceTime?


- threadLocalAllocBuffer.cpp:331: one more "TODO"

Removed it and added it to my personal todos to look at.
 

- threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
documentation should be updated about the sampling additions. I would
have no clue what the difference between "actual_end" and "end" would
be from the given information.

If you are talking about the comments in this file, I made them more clear I hope in the new webrev. If it was somewhere else, let me know where to change.
 

- threadLocalAllocBuffer.hpp:130: extra whitespace ;)

Fixed :)
 

- some copyrights :)


I think I fixed all the issues, if you see specific issues, let me know.
 
- in heapMonitoring.hpp: there are some random comments about some code
that has been grabbed from "util/math/fastmath.[h|cc]". I can't tell
whether this is code that can be used but I assume that Noam Shazeer is
okay with that (i.e. that's all Google code).

Jeremy and I double checked and we can release that as I thought. I removed the comment from that piece of code entirely.
 

- heapMonitoring.hpp/cpp static constant naming does not correspond to
Hotspot's. Additionally, in Hotspot static methods are cased like other
methods.

I think I fixed the methods to be cased the same way as all other methods. For static constants, I was not sure. I fixed a few other variables but I could not seem to really see a consistent trend for constants. I made them as variables but I'm not sure now.

 

- in heapMonitoring.cpp there are a few cryptic comments at the top
that seem to refer to internal stuff that should probably be removed.

Sorry about that! My personal todos not cleared out.
 

I did not think through the impact of the TLAB changes on collector
behavior yet (if there are). Also I did not check for problems with
concurrent mark and SATB/G1 (if there are).

I would love to know your thoughts on this, I think this is fine. I see issues with multiple threads right now hitting the stack storage instance. Previous webrevs had a mutex lock here but we took it out for simplificity (and only for now).
 

Thanks,
  Thomas




Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Low-Overhead Heap Profiling

Jeremy Manson-4
In reply to this post by Thomas Schatzl


On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl <[hidden email]> wrote:

I looked through https://wiki.openjdk.java.net/display/HotSpot/StyleGui
de and the rule is to "follow an existing pattern and must have a
distinct appearance from other names". Which does not help a lot I
guess :/ The GC team started using upper camel case, e.g.
SomeOtherConstant, but very likely this is probably not applied
consistently throughout. So I am fine with not adding another style
(like kMaxStackDepth with the "k" in front with some unknown meaning)
is fine.


In case you're curious: the k- prefix for constants is found in Hungarian notation.  It's used as part of Google C++ style to indicate global (for some definition) variables declared constexpr or const.  See https://google.github.io/styleguide/cppguide.html#Constant_Names.

Generally, in our patches, we've ping-ponged on use of Hotspot style versus Google style.  We tend to use Hotspot if it is obvious to us what that is, but Google otherwise.

Obviously, it should match Hotspot style if it is being submitted, assuming we can find out what Hotspot style is. :)

Jeremy
12
Loading...