RFR: 8229517: Support for optional asynchronous/buffered logging

classic Classic list List threaded Threaded
36 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
This patch provides a buffer to store asynchrounous messages and flush them to
underlying files periodically.

-------------

Commit messages:
 - 8229517: Support for optional asynchronous/buffered logging
 - 8229517: Support for optional asynchronous/buffered logging
 - 8229517: Support for optional asynchronous/buffered logging
 - 8229517: Support for optional asynchronous/buffered logging

Changes: https://git.openjdk.java.net/jdk/pull/3135/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3135&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8229517
  Stats: 523 lines in 12 files changed: 508 ins; 5 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3135.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3135/head:pull/3135

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <[hidden email]> wrote:

> This patch provides a buffer to store asynchrounous messages and flush them to
> underlying files periodically.

Hi, Reviewers,

I would like to restart the RFR process for the feature async logging.  We (AWS) have deployed this feature over a year in a few critical services. It helps us to reduce long-tail GC pauses. On Linux, we used to experience intermittent second-level delays due to gclog writings. If those undesirable events happen to appear at safepoints, hotspot has to prolong the pause intervals, which then increase the response time of Java application/service.

Originally, we observed and solved this issue on a Linux system with software RAID. In absence of hardware assistance, multiple writes have to be synchronized and it is that operation that yields long pauses. This issue may become more prevalent if Linux servers adopt ZFS in the future. We don’t think redirecting log files to tmpfs is a final solution. Hotspot should provide a self-contained and cross-platform solution. **Our solution is to provide a buffer and flush it in a standalone thread periodically.**

Since then, we found more unexpected but interesting scenarios. e.g. some cloud-based applications run entirely on a AWS EBS partition. syscall `write` could be a blocking operation if the underlying infrastructure is experiencing an intermittent issue. Even stdout/stderr are not absolutely non-blocking. Someone may send `XOFF` of software flow control and pause the tty to read. As a result, the JVM process which is emitting logs to the tty is blocked then. Yes, that action may freeze the Java service accidentally!

Those pain points are not AWS-exclusive. We found relevant questions on stackoverflow[1] and it seems that J9 provides an option `-Xgc:bufferedLogging` to mitigate it[2]. We hope hotspot would consider our feature.

Back to implementation, this is the 2nd revision based on Unified logging. Previous RFR[3] was a top-down design. We provide a parallel header file `asynclog.hpp` and hope log-sites opt in. That design is very stiff because asynclog.hpp is full of template parameters and was turned down[4]. The new patch has deprecated the old design and achieved asynchronous logging in bottom-up way. We provide an output-option which conforms to JEP-158[5]. Developers can choose asynchronous mode for a file-based output by providing an extra option **async=true**. e.g. `-Xlog:gc=debug:file=gc.log::async=true`

May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further,  we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.

[1] https://stackoverflow.com/questions/27072340/is-gc-log-writing-asynchronous-safe-to-put-gc-log-on-nfs-mount
[2] https://stackoverflow.com/questions/54994943/is-openj9-gc-log-asynchronous
[3] https://cr.openjdk.java.net/~xliu/8229517/01/webrev/
[4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-March/041034.html
[5] https://openjdk.java.net/jeps/158

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Yasumasa Suenaga-7
In reply to this post by Xin Liu
On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <[hidden email]> wrote:

> This patch provides a buffer to store asynchrounous messages and flush them to
> underlying files periodically.

I think this PR is very useful for us!

> May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further, we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.

`LogMessageBuffer` is used in `LogMessage`. For example, we can see it as following. Frame # 1 is `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)`. IMHO we do not need to change LogStdout/errOutput, but it is better to change LogMessageBuffer.

#0  LogFileStreamOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
    at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logDecorators.hpp:108
#1  0x00007ffff6e80e8e in LogFileOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
    at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logFileOutput.cpp:314
#2  0x00007ffff6e876eb in LogTagSet::log (
    this=this@entry=0x7ffff7d4a640 <LogTagSetMapping<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::_tagset>, msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.cpp:85
#3  0x00007ffff6a194df in LogImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::write (
    msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.hpp:150
#4  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::flush (
    this=0x7ffff58675d0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:79
#5  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::~LogMessageImpl (
    this=0x7ffff58675d0, __in_chrg=<optimized out>) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:74
#6  InstanceKlass::print_class_load_logging (this=this@entry=0x800007430, loader_data=loader_data@entry=0x7ffff00f5200,
    module_entry=module_entry@entry=0x0, cfs=cfs@entry=0x0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/oops/instanceKlass.cpp:3647

src/hotspot/share/runtime/globals.hpp line 2033:

> 2031:           "Milliseconds between asynchronous log flushing")                 \
> 2032:                                                                             \
> 2033:   product(bool, AsyncLogging, false,                                        \

I think this option is not needed - `async` should be set to `false` by default, and we should control it through `-Xlog` option like other log output options (e.g. `filecount`).

src/hotspot/share/runtime/globals.hpp line 2036:

> 2034:           "Enble asynchronous GC logging")                                  \
> 2035:                                                                             \
> 2036:   product(size_t, GCLogBufferSize, 2*K,                                     \

This PR is for UL, not only GC log. So it should be renamed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
On Thu, 25 Mar 2021 12:42:34 GMT, Yasumasa Suenaga <[hidden email]> wrote:

> I think this PR is very useful for us!
>
> > May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further, we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.
>
> `LogMessageBuffer` is used in `LogMessage`. For example, we can see it as following. Frame # 1 is `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)`. IMHO we do not need to change LogStdout/errOutput, but it is better to change LogMessageBuffer.
>
> ```
> #0  LogFileStreamOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logDecorators.hpp:108
> #1  0x00007ffff6e80e8e in LogFileOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logFileOutput.cpp:314
> #2  0x00007ffff6e876eb in LogTagSet::log (
>     this=this@entry=0x7ffff7d4a640 <LogTagSetMapping<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::_tagset>, msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.cpp:85
> #3  0x00007ffff6a194df in LogImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::write (
>     msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.hpp:150
> #4  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::flush (
>     this=0x7ffff58675d0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:79
> #5  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::~LogMessageImpl (
>     this=0x7ffff58675d0, __in_chrg=<optimized out>) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:74
> #6  InstanceKlass::print_class_load_logging (this=this@entry=0x800007430, loader_data=loader_data@entry=0x7ffff00f5200,
>     module_entry=module_entry@entry=0x0, cfs=cfs@entry=0x0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/oops/instanceKlass.cpp:3647
> ```

hi, @YaSuenag,

Thank you for providing the stacktrace! I didn't notice <logMessage.hpp> until you point out. Now I understand the rationale and usecases of logMessageBuffer. Let me figure out how to support it.

IIUC, the most important attribute of `LogMessage` is to guarantee messages are consecutive, or free from interleaving. I will focus on it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
In reply to this post by Yasumasa Suenaga-7
On Thu, 25 Mar 2021 12:21:41 GMT, Yasumasa Suenaga <[hidden email]> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> src/hotspot/share/runtime/globals.hpp line 2033:
>
>> 2031:           "Milliseconds between asynchronous log flushing")                 \
>> 2032:                                                                             \
>> 2033:   product(bool, AsyncLogging, false,                                        \
>
> I think this option is not needed - `async` should be set to `false` by default, and we should control it through `-Xlog` option like other log output options (e.g. `filecount`).

It's possible that a Java process have multiple file-based outputs. A global option `AsyncLogging` can set them all Otherwise, developers have to set async=true individually. It's part of CSR, right?

> src/hotspot/share/runtime/globals.hpp line 2036:
>
>> 2034:           "Enble asynchronous GC logging")                                  \
>> 2035:                                                                             \
>> 2036:   product(size_t, GCLogBufferSize, 2*K,                                     \
>
> This PR is for UL, not only GC log. So it should be renamed.

ack. I will rename it AsyncLogBufferSize.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Thomas Stuefe
In reply to this post by Xin Liu
On Thu, 25 Mar 2021 18:57:32 GMT, Xin Liu <[hidden email]> wrote:

>> I think this PR is very useful for us!
>>
>>> May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further, we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.
>>
>> `LogMessageBuffer` is used in `LogMessage`. For example, we can see it as following. Frame # 1 is `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)`. IMHO we do not need to change LogStdout/errOutput, but it is better to change LogMessageBuffer.
>>
>> #0  LogFileStreamOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logDecorators.hpp:108
>> #1  0x00007ffff6e80e8e in LogFileOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logFileOutput.cpp:314
>> #2  0x00007ffff6e876eb in LogTagSet::log (
>>     this=this@entry=0x7ffff7d4a640 <LogTagSetMapping<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::_tagset>, msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.cpp:85
>> #3  0x00007ffff6a194df in LogImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::write (
>>     msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.hpp:150
>> #4  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::flush (
>>     this=0x7ffff58675d0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:79
>> #5  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::~LogMessageImpl (
>>     this=0x7ffff58675d0, __in_chrg=<optimized out>) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:74
>> #6  InstanceKlass::print_class_load_logging (this=this@entry=0x800007430, loader_data=loader_data@entry=0x7ffff00f5200,
>>     module_entry=module_entry@entry=0x0, cfs=cfs@entry=0x0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/oops/instanceKlass.cpp:3647
>
>> I think this PR is very useful for us!
>>
>> > May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further, we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.
>>
>> `LogMessageBuffer` is used in `LogMessage`. For example, we can see it as following. Frame # 1 is `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)`. IMHO we do not need to change LogStdout/errOutput, but it is better to change LogMessageBuffer.
>>
>> ```
>> #0  LogFileStreamOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logDecorators.hpp:108
>> #1  0x00007ffff6e80e8e in LogFileOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logFileOutput.cpp:314
>> #2  0x00007ffff6e876eb in LogTagSet::log (
>>     this=this@entry=0x7ffff7d4a640 <LogTagSetMapping<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::_tagset>, msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.cpp:85
>> #3  0x00007ffff6a194df in LogImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::write (
>>     msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.hpp:150
>> #4  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::flush (
>>     this=0x7ffff58675d0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:79
>> #5  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::~LogMessageImpl (
>>     this=0x7ffff58675d0, __in_chrg=<optimized out>) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:74
>> #6  InstanceKlass::print_class_load_logging (this=this@entry=0x800007430, loader_data=loader_data@entry=0x7ffff00f5200,
>>     module_entry=module_entry@entry=0x0, cfs=cfs@entry=0x0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/oops/instanceKlass.cpp:3647
>> ```
>
> hi, @YaSuenag,
>
> Thank you for providing the stacktrace! I didn't notice <logMessage.hpp> until you point out. Now I understand the rationale and usecases of logMessageBuffer. Let me figure out how to support it.
>
> IIUC, the most important attribute of `LogMessage` is to guarantee messages are consecutive, or free from interleaving. I will focus on it.

Hi Xin,

I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.

- Who does the writing, and who is affected when the writing stalls?
- Do you then block or throw output away?
        - If the former, how do you mitigate the ripple effect?
        - If the latter, how does the reader of the log file know that something is missing?
- How often do you flush? How do you prevent missing output in the log file in case of crashes?
- Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
- Does this work with multiple target and multiple IO files?
- Does it cost anything if logging is off or not async?

Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.

I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.

- How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.

---

I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Thomas Stuefe
On Thu, 25 Mar 2021 20:11:46 GMT, Thomas Stuefe <[hidden email]> wrote:

>>> I think this PR is very useful for us!
>>>
>>> > May we know more about LogMessageBuffer.hpp/cpp? We haven’t found a real use of it. That’s why we are hesitating to support LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator). Further, we haven’t supported async_mode for LogStdoutOutput and LogStderrOutput either. It’s not difficult but needs to big code change.
>>>
>>> `LogMessageBuffer` is used in `LogMessage`. For example, we can see it as following. Frame # 1 is `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)`. IMHO we do not need to change LogStdout/errOutput, but it is better to change LogMessageBuffer.
>>>
>>> ```
>>> #0  LogFileStreamOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logDecorators.hpp:108
>>> #1  0x00007ffff6e80e8e in LogFileOutput::write (this=this@entry=0x7ffff0002af0, msg_iterator=...)
>>>     at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logFileOutput.cpp:314
>>> #2  0x00007ffff6e876eb in LogTagSet::log (
>>>     this=this@entry=0x7ffff7d4a640 <LogTagSetMapping<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::_tagset>, msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.cpp:85
>>> #3  0x00007ffff6a194df in LogImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::write (
>>>     msg=...) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logTagSet.hpp:150
>>> #4  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::flush (
>>>     this=0x7ffff58675d0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:79
>>> #5  LogMessageImpl<(LogTag::type)16, (LogTag::type)68, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0, (LogTag::type)0>::~LogMessageImpl (
>>>     this=0x7ffff58675d0, __in_chrg=<optimized out>) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/logging/logMessage.hpp:74
>>> #6  InstanceKlass::print_class_load_logging (this=this@entry=0x800007430, loader_data=loader_data@entry=0x7ffff00f5200,
>>>     module_entry=module_entry@entry=0x0, cfs=cfs@entry=0x0) at /home/ysuenaga/github-forked/jdk/src/hotspot/share/oops/instanceKlass.cpp:3647
>>> ```
>>
>> hi, @YaSuenag,
>>
>> Thank you for providing the stacktrace! I didn't notice <logMessage.hpp> until you point out. Now I understand the rationale and usecases of logMessageBuffer. Let me figure out how to support it.
>>
>> IIUC, the most important attribute of `LogMessage` is to guarantee messages are consecutive, or free from interleaving. I will focus on it.
>
> Hi Xin,
>
> I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
>
> - Who does the writing, and who is affected when the writing stalls?
> - Do you then block or throw output away?
> - If the former, how do you mitigate the ripple effect?
> - If the latter, how does the reader of the log file know that something is missing?
> - How often do you flush? How do you prevent missing output in the log file in case of crashes?
> - Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
> - Does this work with multiple target and multiple IO files?
> - Does it cost anything if logging is off or not async?
>
> Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
>
> I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
>
> - How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.
>
> ---
>
> I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
>
> Cheers, Thomas

p.s. I like the integration into UL via a target modification btw. That feels very organic.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Yasumasa Suenaga-7
In reply to this post by Xin Liu
On Thu, 25 Mar 2021 19:00:49 GMT, Xin Liu <[hidden email]> wrote:

>> src/hotspot/share/runtime/globals.hpp line 2033:
>>
>>> 2031:           "Milliseconds between asynchronous log flushing")                 \
>>> 2032:                                                                             \
>>> 2033:   product(bool, AsyncLogging, false,                                        \
>>
>> I think this option is not needed - `async` should be set to `false` by default, and we should control it through `-Xlog` option like other log output options (e.g. `filecount`).
>
> It's possible that a Java process have multiple file-based outputs. A global option `AsyncLogging` can set them all Otherwise, developers have to set async=true individually. It's part of CSR, right?

Yes, it's part of CSR.

I think it is prefer to set `async` to false by default because it should be treated same with `file` / `filecount` / `filesize` on the logger. The user should add `async=true` to the logger what the user want to set to.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

tstuefe
Can you please link the CSR to the issue?

On Fri, Mar 26, 2021 at 8:21 AM Yasumasa Suenaga <[hidden email]>
wrote:

> On Thu, 25 Mar 2021 19:00:49 GMT, Xin Liu <[hidden email]> wrote:
>
> >> src/hotspot/share/runtime/globals.hpp line 2033:
> >>
> >>> 2031:           "Milliseconds between asynchronous log flushing")
>            \
> >>> 2032:
>            \
> >>> 2033:   product(bool, AsyncLogging, false,
>             \
> >>
> >> I think this option is not needed - `async` should be set to `false` by
> default, and we should control it through `-Xlog` option like other log
> output options (e.g. `filecount`).
> >
> > It's possible that a Java process have multiple file-based outputs. A
> global option `AsyncLogging` can set them all Otherwise, developers have to
> set async=true individually. It's part of CSR, right?
>
> Yes, it's part of CSR.
>
> I think it is prefer to set `async` to false by default because it should
> be treated same with `file` / `filecount` / `filesize` on the logger. The
> user should add `async=true` to the logger what the user want to set to.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3135
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Robbin Ehn-2
In reply to this post by Thomas Stuefe
On Thu, 25 Mar 2021 20:19:03 GMT, Thomas Stuefe <[hidden email]> wrote:

>> Hi Xin,
>>
>> I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
>>
>> - Who does the writing, and who is affected when the writing stalls?
>> - Do you then block or throw output away?
>> - If the former, how do you mitigate the ripple effect?
>> - If the latter, how does the reader of the log file know that something is missing?
>> - How often do you flush? How do you prevent missing output in the log file in case of crashes?
>> - Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
>> - Does this work with multiple target and multiple IO files?
>> - Does it cost anything if logging is off or not async?
>>
>> Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
>>
>> I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
>>
>> - How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.
>>
>> ---
>>
>> I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
>>
>> Cheers, Thomas
>
> p.s. I like the integration into UL via a target modification btw. That feels very organic.

Hi,

This is flushed by the watcher thread (non-JavaThread).
Flushing can thus happen during a safepoint and one or more safepoints may have passed between the actual logging and the flushing.
If the VM thread logs it can be delayed while watcher thread does "pop_all()" it seems like.
I suppose pop_all can take a while if you have a couple of thousands of logs messages?

We can also change log-configuration during run-time, e.g. turn on/off logs via jcmd.
Wouldn't it be more natural to flush the async logs-lines before we update the log configuration? (e.g. if you turn off a log via jcmd, we flush the async buffer before)

Thanks

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Volker Simonis-3
In reply to this post by Xin Liu
On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <[hidden email]> wrote:

> This patch provides a buffer to store asynchrounous messages and flush them to
> underlying files periodically.

Hi Xin,

thanks for finally addressing this issue. In general your change looks good. Please find my detailed comments inline.

Best regards,
Volker

src/hotspot/share/logging/logAsyncFlusher.cpp line 38:

> 36:
> 37: #ifndef PRODUCT
> 38: template<> void LinkedListDeque<AsyncLogMessage>::log_drop(AsyncLogMessage* e) {

See comment on `pop_front()`. I'd remove this function entirely and handle this functionality right to `LogAsyncFlusher::enqueue()`.

src/hotspot/share/logging/logAsyncFlusher.cpp line 72:

> 70:
> 71:     if (_buffer.size() >= GCLogBufferSize)  {
> 72:       _buffer.pop_front();

See comment on pop_front(). Instead of relying on `pop-front()` calling `log_drop()` I propose to remove `log_drop()` and implement it's functionality right here where I think it is more appropriate. You can call `buffer.front()` to get a pointer to the `AsyncLogMessage` and easily implement the functionality of `log_drop()` here.

src/hotspot/share/logging/logAsyncFlusher.cpp line 81:

> 79: LogAsyncFlusher* LogAsyncFlusher::_instance = NULL;
> 80:
> 81: void LogAsyncFlusher::initialize() {

I don't think you need this. Please merge the initialization into `LogAsyncFlusher::instance()` (see my comments on your changes in `init.cpp`).

src/hotspot/share/logging/logAsyncFlusher.hpp line 34:

> 32:
> 33: template <typename E>
> 34: class LinkedListDeque : private LinkedListImpl<E, ResourceObj::C_HEAP, mtLogging> {

The name `LinkedListDeque` implies that this is a general purpose Deque implementation which is not true. It's fine to have an implementation for your very specific needs (otherwise it should probably be in its own file under `share/utilities/dequeue.hpp`). But to make this explicitly clear to the reader, can you please rename it to something like `AsyncFlusherDeque` and specify it's semantics in a comment on top of the class. E.g this class doesn't support the usage of the inherited `add()` method because that would break the `size()` functionality.

src/hotspot/share/logging/logAsyncFlusher.hpp line 42:

> 40:   LinkedListDeque() : _tail(NULL), _size(0) {}
> 41:   void push_back(const E& e) {
> 42:     if (!_tail)

I think the convention is to use curly braces even for one-line blocks (as you've done in your other code).

src/hotspot/share/logging/logAsyncFlusher.hpp line 64:

> 62:     if (h != NULL) {
> 63:       --_size;
> 64:       log_drop(h->data());

I'm a little unhappy that some semantics of the Dequeues basic datatype become visible in the implementation of a basic Dequeue method. E.g. "*log dropping*" is not something common for a Dequeue but for the LogAsyncFlusher. I think it would be better to drop the call to `log_drop()` here and implement this functionality right in `LogAsyncFlusher::enqueue()`.

src/hotspot/share/logging/logAsyncFlusher.hpp line 79:

> 77:   }
> 78:
> 79:   void log_drop(E* e) {}

See comment on `pop_front()`. I'd remove this function entirely and handle it right in `LogAsyncFlusher::enqueue()`.

src/hotspot/share/logging/logAsyncFlusher.hpp line 95:

> 93:     : _output(output), _decorators(decorations.get_decorators()),
> 94:     _level(decorations.get_level()), _tagset(decorations.get_logTagSet()) {
> 95:       // allow to fail here, then _message is NULL

Why do you extract and store the `LogDecorators`, `_level` and `_tagset` set separately and re-create  the `LogDecorations` in `AsyncLogMessage::writeback()`? Is it to save memory (because  `LogDecorators` are much smaller than the `LogDecorations`) at the expense of time for recreating?

src/hotspot/share/logging/logAsyncFlusher.hpp line 96:

> 94:     _level(decorations.get_level()), _tagset(decorations.get_logTagSet()) {
> 95:       // allow to fail here, then _message is NULL
> 96:       _message = os::strdup(msg, mtLogging);

If you think `msg` can't be NULL here please add an assertion, otherwise please handle it.

src/hotspot/share/logging/logAsyncFlusher.hpp line 111:

> 109:     o._message = NULL; // transfer the ownership of _message to this
> 110:   }
> 111:

Maybe add an explicit copy assignment operator with a `ShouldNotReachHere` to make sure `AsyncLogMessages` are not assigned unintentionally.

src/hotspot/share/logging/logAsyncFlusher.hpp line 116:

> 114:   bool equals(const AsyncLogMessage& o) const {
> 115:     return (&_output == &o._output) && (_message == o._message || !strcmp(_message, o._message));
> 116:   }

[`strcmp()` is not defined for `NULL`](https://en.cppreference.com/w/cpp/string/byte/strcmp) but you can have `_message == NULL` if you've transferred ownership in the copy constructor.

src/hotspot/share/logging/logAsyncFlusher.hpp line 124:

> 122:
> 123: class LogAsyncFlusher : public PeriodicTask {
> 124:  private:

As far as I know, `PeriodicTask` is designed for short running task. But `LogAsyncFlusher::task()` will now call `AsyncLogMessage::writeback()` which does blocking I/O and can block for quite some time (that's why we have this change in the first place :). How does this affect the other periodic tasks and the `WatcherThread`. What's the worst case scenario if the `WatcherThread` is blocked? Is this any better than before?

src/hotspot/share/logging/logConfiguration.cpp line 544:

> 542:                                     " If set to 0, log rotation is disabled."
> 543:                                     " This will cause existing log files to be overwritten.");
> 544:   out->print_cr("   async=true|false - write asynchronously or not.");

Mention the default here which should be "false".

src/hotspot/share/logging/logDecorations.cpp line 68:

> 66: #undef DECORATOR
> 67:
> 68:   assert(get_decorators() == decorators, "insanity check");

I think this should read "sanity check".

src/hotspot/share/logging/logDecorators.hpp line 89:

> 87:   }
> 88:
> 89:   LogDecorators(const LogDecorators& o) : _decorators(o._decorators) {

Why do you need this new copy constructor?

src/hotspot/share/logging/logDecorators.hpp line 92:

> 90:   }
> 91:
> 92:   LogDecorators& operator=(const LogDecorators& rhs) {

Why do you need this new assignment operator?

src/hotspot/share/logging/logFileOutput.cpp line 50:

> 48:     : LogFileStreamOutput(NULL), _name(os::strdup_check_oom(name, mtLogging)),
> 49:       _file_name(NULL), _archive_name(NULL), _current_file(0),
> 50:       _file_count(DefaultFileCount), _is_default_file_count(true), _async_mode(AsyncLogging), _archive_name_len(0),

See comments on `globals.hpp`. No need for an extra option. Make this `false` by default.

And can you please also add the `_async_mode` to the following log trace in `LogFileOutput::initialize()`:
  log_trace(logging)("Initializing logging to file '%s' (filecount: %u"
                     ", filesize: " SIZE_FORMAT " KiB).",
_file_name, _file_count, _rotate_size / K);

src/hotspot/share/logging/logFileOutput.cpp line 322:

> 320:
> 321:   LogAsyncFlusher* flusher = LogAsyncFlusher::instance();
> 322:   if (_async_mode && flusher != NULL) {

Why you don't check for `flusher == NULL` in `LogAsyncFlusher::instance()` and call `LogAsyncFlusher::initialize()` in case it is NULL. I think it's no difference where the NULL check is and doing it in `LogAsyncFlusher::instance()` will save you from calling `LogAsyncFlusher::initialize()` in `init_globals()`.

Put `LogAsyncFlusher::instance()` into the `if (_async_mode)` block and add an assertion that `flusher != NULL`.

src/hotspot/share/logging/logFileOutput.cpp line 324:

> 322:   if (_async_mode && flusher != NULL) {
> 323:     flusher->enqueue(*this, decorations, msg);
> 324:     return 0;

I think the contract of `LogFileOutput::write()` is not clear. Should this return the number of characters that have been actually written out or the number of characters that have been consumed. For the time beeing this doesn't seem to be a problem though, because the current callers of `LogFileOutput::write()` don't seem to check the return value anyway.

src/hotspot/share/logging/logFileOutput.cpp line 336:

> 334:   }
> 335:
> 336:   assert(!_async_mode, "AsyncLogging is not supported yet");

Can you please explain in which circumstances `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)` will be called and why it is not necessary to support `_async_mode` here?

src/hotspot/share/runtime/init.cpp line 126:

> 124:     return status;
> 125:
> 126:   LogAsyncFlusher::initialize();

I don't think this is required here. See my comment on `LogFileOutput::write()`. Just do the initialization  in `LogAsyncFlusher::instance()` when it is called for the first time (i.e. `LogAsyncFlusher::_instance` is still NULL).

test/hotspot/gtest/logging/test_asynclog.cpp line 168:

> 166:   EXPECT_FALSE(file_contains_substring(TestLogFileName, "log_trace-test")); // trace message is masked out
> 167:   EXPECT_TRUE(file_contains_substring(TestLogFileName, "log_debug-test"));
> 168: }

Should have a newline at the end.

-------------

Changes requested by simonis (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Volker Simonis-3
In reply to this post by Yasumasa Suenaga-7
On Fri, 26 Mar 2021 07:18:34 GMT, Yasumasa Suenaga <[hidden email]> wrote:

>> It's possible that a Java process have multiple file-based outputs. A global option `AsyncLogging` can set them all Otherwise, developers have to set async=true individually. It's part of CSR, right?
>
> Yes, it's part of CSR.
>
> I think it is prefer to set `async` to false by default because it should be treated same with `file` / `filecount` / `filesize` on the logger. The user should add `async=true` to the logger what the user want to set to.

Agreed. No need for an extra option. Handle it like other logging framework defaults (e.g. `DefaultFileCount` or `DefaultFileSize`)

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Volker Simonis-3
In reply to this post by Yasumasa Suenaga-7
On Thu, 25 Mar 2021 12:21:44 GMT, Yasumasa Suenaga <[hidden email]> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> src/hotspot/share/runtime/globals.hpp line 2036:
>
>> 2034:           "Enble asynchronous GC logging")                                  \
>> 2035:                                                                             \
>> 2036:   product(size_t, GCLogBufferSize, 2*K,                                     \
>
> This PR is for UL, not only GC log. So it should be renamed.

Agree, this should be something like "AsyncLoggingBufferSize".

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
In reply to this post by Volker Simonis-3
On Fri, 26 Mar 2021 08:37:41 GMT, Volker Simonis <[hidden email]> wrote:

>> Yes, it's part of CSR.
>>
>> I think it is prefer to set `async` to false by default because it should be treated same with `file` / `filecount` / `filesize` on the logger. The user should add `async=true` to the logger what the user want to set to.
>
> Agreed. No need for an extra option. Handle it like other logging framework defaults (e.g. `DefaultFileCount` or `DefaultFileSize`)
>
> You also have to check that `LogAsyncInterval` is zero modulo 10 otherwise you'll get:
> $ ./images/jdk/bin/java -XX:LogAsyncInterval=13 -Xlog:gc:file=/tmp/class.log::async=true -version
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/task.cpp:73
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/runtime/task.cpp:73), pid=21168, tid=21169
> #  assert(_interval >= PeriodicTask::min_interval && _interval % PeriodicTask::interval_gran == 0) failed: improper PeriodicTask interval time
> #
> # JRE version:  (17.0) (slowdebug build )
> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 17-internal+0-adhoc.simonisv.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0x1244301]  PeriodicTask::PeriodicTask(unsigned long)+0x83
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /priv/simonisv/output/jdk-dbg/hs_err_pid21168.log
> #
> #
> Aborted

oh, I don't know this constraint.  I will add a constraint check for the option.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
In reply to this post by Volker Simonis-3
On Thu, 25 Mar 2021 19:37:25 GMT, Volker Simonis <[hidden email]> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> src/hotspot/share/logging/logAsyncFlusher.hpp line 64:
>
>> 62:     if (h != NULL) {
>> 63:       --_size;
>> 64:       log_drop(h->data());
>
> I'm a little unhappy that some semantics of the Dequeues basic datatype become visible in the implementation of a basic Dequeue method. E.g. "*log dropping*" is not something common for a Dequeue but for the LogAsyncFlusher. I think it would be better to drop the call to `log_drop()` here and implement this functionality right in `LogAsyncFlusher::enqueue()`.

ACK. I would like to have a general purpose linked-listed deque here.  
I will move the log_drop() logic to LogAsyncFlusher.

> src/hotspot/share/logging/logAsyncFlusher.hpp line 95:
>
>> 93:     : _output(output), _decorators(decorations.get_decorators()),
>> 94:     _level(decorations.get_level()), _tagset(decorations.get_logTagSet()) {
>> 95:       // allow to fail here, then _message is NULL
>
> Why do you extract and store the `LogDecorators`, `_level` and `_tagset` set separately and re-create  the `LogDecorations` in `AsyncLogMessage::writeback()`? Is it to save memory (because  `LogDecorators` are much smaller than the `LogDecorations`) at the expense of time for recreating?

Saving memory is my intension. To keep thing simple, I used to copy objects directly. Then I found `LogDecorations` consists of a char array(256bytes) . This information can be compressed into an uint mask, which is `LogDecorators`.  

Because AsyncLogMessage is the payload, which hotspot stores thousands of them in the buffer, size matters.

> src/hotspot/share/logging/logFileOutput.cpp line 336:
>
>> 334:   }
>> 335:
>> 336:   assert(!_async_mode, "AsyncLogging is not supported yet");
>
> Can you please explain in which circumstances `LogFileOutput::write(LogMessageBuffer::Iterator msg_iterator)` will be called and why it is not necessary to support `_async_mode` here?
>
> Looks like this method is used e.g. when doing a class loading log (`-Xlog:class+load`) and will now result in a hard crash in debug builds which is certainly not appropriate:
> $ ./images/jdk/bin/java -Xlog:class+load:file=/tmp/class.log::async=true -version
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/logFileOutput.cpp:336
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/logging/logFileOutput.cpp:336), pid=20491, tid=20492
> #  assert(!_async_mode) failed: AsyncLogging is not supported yet
> #
> # JRE version:  (17.0) (slowdebug build )
> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 17-internal+0-adhoc.simonisv.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0xde5ae4]  LogFileOutput::write(LogMessageBuffer::Iterator)+0x3c
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /priv/simonisv/output/jdk-dbg/hs_err_pid20491.log
> #
> #
> Aborted

I overlook it.  Sorry. I didn't notice the logging construct `LogMessage`.
I've  fixed this issue.  I will update it soon.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
In reply to this post by Volker Simonis-3
On Fri, 26 Mar 2021 10:02:37 GMT, Volker Simonis <[hidden email]> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> Hi Xin,
>
> thanks for finally addressing this issue. In general your change looks good. Please find my detailed comments inline.
>
> Best regards,
> Volker

hi, Thomas,

Thank you for reviewing this PR.

> Hi Xin,
>
> I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
>
> * Who does the writing, and who is affected when the writing stalls?

The WatchThread eventually flushes those buffered messages.  if the writing stalls, it blocks periodic tasks.
It blocks long enough,  other periodic tasks are skipped.

> * Do you then block or throw output away?
>  
>   * If the former, how do you mitigate the ripple effect?
>   * If the latter, how does the reader of the log file know that something is missing?

The capacity of buffer is limited, which is `AsyncLogBufferSize` (2k by default).
Actually, logTagSet.cpp limits the maximal length of a vwrite is 512. That means that maximal memory used by this buffer is 1M (=2k * 0.5k).

If the buffer overflows, it starts dropping the heads. this behavior simulates a ringbuffer.
If you enable `-XX:+Verbose`, the dropping message will be printed to the tty console.

I prefer to drop messages than keeping them growing because later may trigger out-of-memory error.

> * How often do you flush? How do you prevent missing output in the log file in case of crashes?

The interval is defined by `LogAsyncInterval` (300ms by default). I insert a statement `async->flusher()` in `ostream_abort()`.

> * Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
to be honest, it can't. I see a lot of dropping messages on console with -XX:+Verbose.

I have tuned parameters that it won't drop messages easily for normal GC activity with info verbosity.
`-Xlog:*=trace` will drop messages indeed, but this is tunable. I have a [stress test](https://github.com/navyxliu/JavaGCworkload/blob/master/runJavaUL.sh) to tweak parameters.

> * Does this work with multiple target and multiple IO files?

Yes, it works if you have multiple outputs. `LogAsyncFlusher` is singleton. one single buffer and one thread serve them all.

> * Does it cost anything if logging is off or not async?
>
so far, LogAsyncFlusher as a periodic task remains active even no output is in async_mode.
it wakes up every `LogAsyncInterval` ms. it's a dummy task because the deque is always empty.  the cost is almost nothing.


> Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
>
> I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
>

You concern is reasonable. I don't understand why there is only one watchThread and up to 10 periodic tasks are crowded in it.
If it's a bottleneck, I plan to improve this infrastructure.  I can make hotspot supports multiple watcher threads and spread periodic tasks among them.  All watcher threads are connected using linked list to manage.  

Can we treat it as a separated task?  for normal usage, I think the delay is quite managed.  Writing thousands of lines to a file usually can be done in sub-ms.

> * How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.

IMHO, logging shouldn't hurt performance a lot. At least, those do impact on performance are not supposed to enable by default. On the other side,  I hope logging messages from other threads avoid from interweaving when I enable them to read.
That leads me to use mutex. That actually improves readability.  

My design target is non-blocking.  pop_all() is an ad-hoc operation which pop up all elements and release the mutex immediately. writeback() does IO without it.

In our real applications, we haven't seen this feature downgrade GC performance yet.
>
> I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
>
> Cheers, Thomas

I believe UL has its own reasons.  In my defense, I don't make UL more complex. I only changed a couple of lines in one of its implementation file(logFileOutput.cpp) and didn't change its interfaces.

I try my best to reuse existing codebase. We can always refactor existing code([JDK-8239066](https://bugs.openjdk.java.net/browse/JDK-8239066), [JDK-8263840](https://bugs.openjdk.java.net/browse/JDK-8263840)), but it's not this PR's purpose.

thanks,
--lx

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
On Fri, 26 Mar 2021 23:22:25 GMT, Xin Liu <[hidden email]> wrote:

>> Hi Xin,
>>
>> thanks for finally addressing this issue. In general your change looks good. Please find my detailed comments inline.
>>
>> Best regards,
>> Volker
>
> hi, Thomas,
>
> Thank you for reviewing this PR.
>
>> Hi Xin,
>>
>> I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
>>
>> * Who does the writing, and who is affected when the writing stalls?
>
> The WatchThread eventually flushes those buffered messages.  if the writing stalls, it blocks periodic tasks.
> It blocks long enough,  other periodic tasks are skipped.
>
>> * Do you then block or throw output away?
>>  
>>   * If the former, how do you mitigate the ripple effect?
>>   * If the latter, how does the reader of the log file know that something is missing?
>
> The capacity of buffer is limited, which is `AsyncLogBufferSize` (2k by default).
> Actually, logTagSet.cpp limits the maximal length of a vwrite is 512. That means that maximal memory used by this buffer is 1M (=2k * 0.5k).
>
> If the buffer overflows, it starts dropping the heads. this behavior simulates a ringbuffer.
> If you enable `-XX:+Verbose`, the dropping message will be printed to the tty console.
>
> I prefer to drop messages than keeping them growing because later may trigger out-of-memory error.
>
>> * How often do you flush? How do you prevent missing output in the log file in case of crashes?
>
> The interval is defined by `LogAsyncInterval` (300ms by default). I insert a statement `async->flusher()` in `ostream_abort()`.
>
>> * Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
> to be honest, it can't. I see a lot of dropping messages on console with -XX:+Verbose.
>
> I have tuned parameters that it won't drop messages easily for normal GC activity with info verbosity.
> `-Xlog:*=trace` will drop messages indeed, but this is tunable. I have a [stress test](https://github.com/navyxliu/JavaGCworkload/blob/master/runJavaUL.sh) to tweak parameters.
>
>> * Does this work with multiple target and multiple IO files?
>
> Yes, it works if you have multiple outputs. `LogAsyncFlusher` is singleton. one single buffer and one thread serve them all.
>
>> * Does it cost anything if logging is off or not async?
>>
> so far, LogAsyncFlusher as a periodic task remains active even no output is in async_mode.
> it wakes up every `LogAsyncInterval` ms. it's a dummy task because the deque is always empty.  the cost is almost nothing.
>
>
>> Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
>>
>> I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
>>
>
> You concern is reasonable. I don't understand why there is only one watchThread and up to 10 periodic tasks are crowded in it.
> If it's a bottleneck, I plan to improve this infrastructure.  I can make hotspot supports multiple watcher threads and spread periodic tasks among them.  All watcher threads are connected using linked list to manage.  
>
> Can we treat it as a separated task?  for normal usage, I think the delay is quite managed.  Writing thousands of lines to a file usually can be done in sub-ms.
>
>> * How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.
>
> IMHO, logging shouldn't hurt performance a lot. At least, those do impact on performance are not supposed to enable by default. On the other side,  I hope logging messages from other threads avoid from interweaving when I enable them to read.
> That leads me to use mutex. That actually improves readability.  
>
> My design target is non-blocking.  pop_all() is an ad-hoc operation which pop up all elements and release the mutex immediately. writeback() does IO without it.
>
> In our real applications, we haven't seen this feature downgrade GC performance yet.
>>
>> I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
>>
>> Cheers, Thomas
>
> I believe UL has its own reasons.  In my defense, I don't make UL more complex. I only changed a couple of lines in one of its implementation file(logFileOutput.cpp) and didn't change its interfaces.
>
> I try my best to reuse existing codebase. We can always refactor existing code([JDK-8239066](https://bugs.openjdk.java.net/browse/JDK-8239066), [JDK-8263840](https://bugs.openjdk.java.net/browse/JDK-8263840)), but it's not this PR's purpose.
>
> thanks,
> --lx

> _Mailing list message from [Thomas St��fe](mailto:[hidden email]) on [hotspot-dev](mailto:[hidden email]):_
>
> Can you please link the CSR to the issue?
>
> On Fri, Mar 26, 2021 at 8:21 AM Yasumasa Suenaga <ysuenaga at openjdk.java.net>
> wrote:

https://bugs.openjdk.java.net/browse/JDK-8264323

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Thomas Stuefe
In reply to this post by Xin Liu
On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <[hidden email]> wrote:

> This patch provides a buffer to store asynchrounous messages and flush them to
> underlying files periodically.

Hi Xin,

thank you for your detailed answers.

As I wrote, I think this is a useful change. A prior design discussion with a rough sketch would have made things easier. Also, it would have been good to have the CSR discussion beforehand, since it affects how complex the implementation needs to be. I don't know whether there had been design discussions beforehand; if I missed them, I apologize.

I am keenly aware that design discussions often lead nowhere because no-one answers. So I understand why you started with a patch.

About your proposal:

I do not think it can be made airtight, and I think that is okay - if we work with a limited flush buffer and we log too much, things will get dropped, that is unavoidable. But it has to be reliable and comprehensible after the fact.

As you write, the patch you propose works well with AWS, but I suspect that is an environment with limited variables, and outside use of the VM could be much more diverse. We must make sure to roll out only well designed solutions which work for us all.

E.g. a log system which randomly omits log entries because some internal buffer is full without giving any indication *in the log itself* is a terrible idea :). Since log files are a cornerstone for our support, I am interested in a good solution.

First off, the CSR:

---
1) With what you propose, we could have a arbitrary combination of targets with different log files and different async options:
java -Xlog:os*:file=os.log::async=false -Xlog:os+thread:file=thread.log::async=true
 

Do we really need that much freedom? How probable is that someone wants different async options for different trace sinks? The more freedom we have here the more complex the implementation gets. All that stuff has to be tested. Why not just make "async" a global setting.

2) AsyncLogBufferSize should be a user-definable memory size, not "number of slots". The fact that internally we keep a vector of disjunct memory snippets is an implementation detail; the user should just give a memory size and the VM should interprete this. This leaves us the freedom to later change the implementation as we see fit.

3) LogAsyncInterval should not exist at all. If it has to exist, it should be a diagnostic switch, not a production one; but ideally, we would just log as soon as there is something to log, see below.

---

Implementation:

The use of the WatcherThread and PeriodicTask. Polling is plain inefficient, beside the concerns Robbin voiced about blocking things. This is a typical producer-consumer problem, and I would implement it using an own dedicated flusher thread and a monitor. The flusher thread should wake only if there is something to write. This is something I would not do in a separate RFE but now. It would also disarm any arguments against blocking the WatcherThread.

----

The fact that every log message gets strduped could be done better. This can be left for a future RFE - but it explains why I dislike "AsyncLogBufferSize" being "number of entries" instead of a memory size.

I think processing a memory-size AsyncLogBufferSize can be kept simple: it would be okay to just guess an average log line length and go with that. Lets say 256 chars. An AsyncLogBufferSize=1M could thus be translated to 4096 entries in your solution. If the sum of all 4096 allocated lines overshoots 1M from time to time, well so be it.

A future better solution could use a preallocated fixed sized buffer. There are two ways to do this, the naive but memory inefficient way - array of fixed sized text slots like the event system does. And a smart way: a ring buffer of variable sized strings, '\0' separated, laid out one after the other in memory. The latter is a bit more involved, but can be done, and it would be fast and very memory efficient. But as I wrote, this is an optimization which can be postponed.

----

I may misunderstand the patch, but do you resolve decorators when the flusher is printing? Would this not distort time-dependent decorators (timemillis, timenanos, uptime etc)? Since we record the time of printing, not the time of logging?.

If yes, it may be better to resolve the message early and just store the plain string and print that. Basically this would mean to move the whole buffering down a layer or two right at where the raw strings get printed. This would be vastly simplified if we abandon the "async for every trace sink" notion in favor of just a global flag.

This would also save a bit of space, since we would not have to carry all the meta information in `AsyncLogMessage` around. I count at least three 64bit slots, possibly 4-5, which alone makes for ~40 bytes per message. Resolved decorators are often smaller than that.

Please find further remarks inline.

> hi, Thomas,
>
> Thank you for reviewing this PR.
>
> > Hi Xin,
> > I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
> >
> > * Who does the writing, and who is affected when the writing stalls?
>
> The WatchThread eventually flushes those buffered messages. if the writing stalls, it blocks periodic tasks.
> It blocks long enough, other periodic tasks are skipped.
>
> > * Do you then block or throw output away?
> >  
> >   * If the former, how do you mitigate the ripple effect?
> >   * If the latter, how does the reader of the log file know that something is missing?
>
> The capacity of buffer is limited, which is `AsyncLogBufferSize` (2k by default).
> Actually, logTagSet.cpp limits the maximal length of a vwrite is 512. That means that maximal memory used by this buffer is 1M (=2k * 0.5k).
>
> If the buffer overflows, it starts dropping the heads. this behavior simulates a ringbuffer.
> If you enable `-XX:+Verbose`, the dropping message will be printed to the tty console.
>
> I prefer to drop messages than keeping them growing because later may trigger out-of-memory error.
>
> > * How often do you flush? How do you prevent missing output in the log file in case of crashes?
>
> The interval is defined by `LogAsyncInterval` (300ms by default). I insert a statement `async->flusher()` in `ostream_abort()`.
>

If the flusher blocks, this could block VM shutdown? Would this be different from what we do now, e.g. since all log output is serialized and done by one thread? Its probably fine, but we should think about this.

> > * Can this really the full brunt of logging (-Xlog:*=trace) over many threads?
> >   to be honest, it can't. I see a lot of dropping messages on console with -XX:+Verbose.
>
> I have tuned parameters that it won't drop messages easily for normal GC activity with info verbosity.
> `-Xlog:*=trace` will drop messages indeed, but this is tunable. I have a [stress test](https://github.com/navyxliu/JavaGCworkload/blob/master/runJavaUL.sh) to tweak parameters.
>
> > * Does this work with multiple target and multiple IO files?
>
> Yes, it works if you have multiple outputs. `LogAsyncFlusher` is singleton. one single buffer and one thread serve them all.

The question was how we handle multiple trace sinks, see my "CSR" remarks.

>
> > * Does it cost anything if logging is off or not async?
>
> so far, LogAsyncFlusher as a periodic task remains active even no output is in async_mode.
> it wakes up every `LogAsyncInterval` ms. it's a dummy task because the deque is always empty. the cost is almost nothing.
>
> > Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
> > I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
>
> You concern is reasonable. I don't understand why there is only one watchThread and up to 10 periodic tasks are crowded in it.
> If it's a bottleneck, I plan to improve this infrastructure. I can make hotspot supports multiple watcher threads and spread periodic tasks among them. All watcher threads are connected using linked list to manage.
>
> Can we treat it as a separated task? for normal usage, I think the delay is quite managed. Writing thousands of lines to a file usually can be done in sub-ms.
>
> > * How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.
>
> IMHO, logging shouldn't hurt performance a lot. At least, those do impact on performance are not supposed to enable by default. On the other side, I hope logging messages from other threads avoid from interweaving when I enable them to read.
> That leads me to use mutex. That actually improves readability.
>
> My design target is non-blocking. pop_all() is an ad-hoc operation which pop up all elements and release the mutex immediately. writeback() does IO without it.

Since you use a mutex it introduces synchronization, however short, across all logging threads. So it influences runtime behavior. For the record, I think this is okay; maybe a future RFE could improve this with a lockless algorithm. I just wanted to know if you measured anything, and I was curious whether there is a difference now between synchronous and asynchronous logging.

(Funnily, asynchronous logging is really more synchronous in a sense, since it synchronizes all logging threads across a common resource).

>
> In our real applications, we haven't seen this feature downgrade GC performance yet.
>
> > I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
> > Cheers, Thomas
>
> I believe UL has its own reasons. In my defense, I don't make UL more complex. I only changed a couple of lines in one of its implementation file(logFileOutput.cpp) and didn't change its interfaces.
>
> I try my best to reuse existing codebase. We can always refactor existing code([JDK-8239066](https://bugs.openjdk.java.net/browse/JDK-8239066), [JDK-8263840](https://bugs.openjdk.java.net/browse/JDK-8263840)), but it's not this PR's purpose.
>

I understand. Its fine to do this in a later RFE.

> thanks,
> --lx

Cheers, Thomas

src/hotspot/share/logging/logAsyncFlusher.cpp line 33:

> 31:     // should cache this object somehow
> 32:     LogDecorations decorations(_level, _tagset, _decorators);
> 33:     _output.write_blocking(decorations, _message);

Would this mean that time dependent decorators get resolved at print time, not when the log happen?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu
In reply to this post by Volker Simonis-3
On Fri, 26 Mar 2021 09:41:01 GMT, Volker Simonis <[hidden email]> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> src/hotspot/share/logging/logFileOutput.cpp line 324:
>
>> 322:   if (_async_mode && flusher != NULL) {
>> 323:     flusher->enqueue(*this, decorations, msg);
>> 324:     return 0;
>
> I think the contract of `LogFileOutput::write()` is not clear. Should this return the number of characters that have been actually written out or the number of characters that have been consumed. For the time beeing this doesn't seem to be a problem though, because the current callers of `LogFileOutput::write()` don't seem to check the return value anyway.

yes, it seems that it doesn't matter.  logTagSet.cpp don't read the return value of LogOutput::write().
'return 0' may not be good here. IMHO, we can return -1, which means unknown or NA.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8229517: Support for optional asynchronous/buffered logging [v2]

Xin Liu
In reply to this post by Xin Liu
On Fri, 26 Mar 2021 19:05:28 GMT, Xin Liu <[hidden email]> wrote:

>> Agreed. No need for an extra option. Handle it like other logging framework defaults (e.g. `DefaultFileCount` or `DefaultFileSize`)
>>
>> You also have to check that `LogAsyncInterval` is zero modulo 10 otherwise you'll get:
>> $ ./images/jdk/bin/java -XX:LogAsyncInterval=13 -Xlog:gc:file=/tmp/class.log::async=true -version
>> # To suppress the following error report, specify this argument
>> # after -XX: or in .hotspotrc:  SuppressErrorAt=/task.cpp:73
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/runtime/task.cpp:73), pid=21168, tid=21169
>> #  assert(_interval >= PeriodicTask::min_interval && _interval % PeriodicTask::interval_gran == 0) failed: improper PeriodicTask interval time
>> #
>> # JRE version:  (17.0) (slowdebug build )
>> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 17-internal+0-adhoc.simonisv.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # V  [libjvm.so+0x1244301]  PeriodicTask::PeriodicTask(unsigned long)+0x83
>> #
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /priv/simonisv/output/jdk-dbg/hs_err_pid21168.log
>> #
>> #
>> Aborted
>
> oh, I don't know this constraint.  I will add a constraint check for the option.

add  a constraint for it. now it will get the error message for 13.
java -XX:LogAsyncInterval=13 -Xlog:gc:file=/tmp/class.log::async=true -version
LogAsyncInterval (13) must be evenly divisible by PeriodicTask::interval_gran (10)
Improperly specified VM option 'LogAsyncInterval=13'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135
12