Request For Comment: Asynchronous Logging

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Request For Comment: Asynchronous Logging

Volker Simonis
Hi,

I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
We are successfully using this feature productively at Amazon both in
jdk8 and jdk11 to reduce the tail latency of services which use
logging. We think that async logging is a useful addition to the
current logging framework which might be beneficial to a wider range
of users. The following write-up tries to capture the comments and
suggestions from the previous discussions we are aware of.

Current state:

- HotSpot uses the so called "Unified Logging" (UL) framework which
was introduced by JEP 158 [5] in JDK 9. Most logs have been
retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
[6]).
- The current UL implementation is based on the standard C buffered
stream I/O interface [7]. The LogFileStreamOutput class which writes
logs to abstract FILE streams is the only child of the abstract base
class LogOutput. LogFileStreamOutput has three child classes
LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
stdout, stderr or an arbitrary file respectively. The initial UL JEP
158 [5] envisioned logging to a socket but has not implemented it. At
least one such extension has been proposed since then [8].
- UL synchronizes logs from different threads with the help of the
standard C flockfile()/funlockfile() [9] functions. But even without
this explicit locking, all the "stdio functions are thread-safe. This
is achieved by assigning to each FILE object a lockcount and (if the
lockcount is nonzero) an owning thread.  For each library call, these
functions wait until the FILE object is no longer locked by a
different thread, then lock it, do the requested I/O, and unlock the
object again" [9]. A quick look at the glibc sources reveals that FILE
locking is implemented with the help of futex() [10] which breaks down
to s simple atomic compare and swap (CAS) on the fast path.
- Notice that UL "synchronizes" logs from different threads to avoid
log interleaving. But it does not "serialize" logs according to the
time at which  they occurred. This is because the underlying stdio
functions do not guarantee a specific order for different threads
waiting on a locked FILE stream. E.g. if three log events A, B, C
occur in that order, the first will lock the output stream. If the log
events B and C both arrive while the stream is locked, it is
unspecified which of B and C will be logged first after A releases the
lock.

Problem statement:

- The amount of time a log event will block its FILE stream depends on
the underlying file system. This can range from a few nanoseconds for
in-memory file systems or milliseconds for physical discs under heavy
load up to several seconds in the worst case scenario for e.g. network
file systems. A blocked log output stream will block all concurrent
threads which try to write log messages at the same time. If logging
is done during a safepoint, this can significantly increase the
safepoint time (e.g. several parallel GC threads trying to log at the
same time). We can treat stdout/stderr as special files here without
loss of generality.

Proposed solution:

Extend UL with an asynchronous logging facility. Asynchronous logging
will be optional and disabled by default. It should have the following
properties:
- If disabled (the default) asynchronous logging should have no
observable impact on logging.
- If enabled, log messages will be stored in an intermediate data
structure (e.g. a double ended queue).
- A service thread will concurrently read and remove log messages from
that data structure in a FIFO style and write them to the output
stream
- Storing log messages in the intermediate data structure should take
constant time and not longer than logging a message takes in the
traditional UL system (in general the time should be much shorter
because the actual I/O is deferred).
- Asynchronous logging trades memory overhead (i.e. the size of the
intermediate data structure) for log accuracy. This means that in the
unlikely case where the service thread which does the asynchronous
logging falls behind the log producing threads, some logs might be
lost. However, the probability for this to happen can be minimized by
increasing the configurable size of the intermediate data structure.
- The final output produced by asynchronous logging should be equal to
the output from normal logging if no messages had to be dropped.
Notice that in contrast to the traditional unified logging,
asynchronous logging will give us the possibility to not only
synchronize log events, but to optionally also serialize them based on
their generation time if that's desired. This is because we are in
full control of the synchronization primitives for the intermediate
data structure which stores the logs.
- If log messages had to be dropped, this should be logged in the log
output (e.g. "[..] 42 messages dropped due to async logging")
- Asynchronous logging should ideally be implemented in such a way
that it can be easily adapted by alternative log targets like for
example sockets in the future.

Alternative solutions:
- It has repeatedly been suggested to place the log files into a
memory file system but we don't think this is an optimal solution.
Main memory is often a constrained resource and we don't want log
files to compete with the JVM for it in such cases.
- It has also been argued to place the log files on a fast file system
which is only used for logging but in containerized environments file
system are often virtualized and the properties of the underlying
physical devices are not obvious.
- The load on the file system might be unpredictable due to other
applications on the same host.
- All these workarounds won't work if we eventually implement direct
logging to a network socket as suggested in [8].

Implementation details / POC:

- A recent pull request [2] for JDK-8229517 [3] proposed to use a
simple deque implementation derived from HotSpot's LinkedListImpl
class for the intermediate data structure. It synchronizes access to
the queue with a MutexLocker which is internally implemented with
pthread_lock() and results in an atomic CAS on the fast path. So
performance-wise the locking itself is not different from the
flockfile()/funlockfile() functionality currently used by UL but
adding a log message to the deque should be constant as it basically
only requires a strdup(). And we could even eliminate the strdup() if
we'd pre-allocate a big enough array for holding the log messages as
proposed in the pull request [2].
- The pull pull request [2] for JDK-8229517 [3] proposed to set the
async flag as an attribute of the Xlog option which feels more natural
because UL configuration is traditionally done within the Xlog option.
But we could just as well use a global -XX flag to enable async
logging? What are your preferences here?
- The pull pull request [2] for JDK-8229517 [3] (mis)uses the
WatcherThread as service thread to concurrently process the
intermediate data structure and write the log messages out to the log
stream. That should definitely be changed to an independent service
thread.
- The pull pull request [2] for JDK-8229517 [3] initially proposed
that the "service thread" runs at a fixed interval to dump log
messages to the log streams. But reviewers commented that this should
better happen either continuously or based on the filling level of the
intermediate data structure. What are your preferences here?
- What are your preferences on the configuration of the intermediate
data structure? Should it be configured based on the maximum number of
log messages it can store or rather on the total size of the stored
log messages? I think that for normal runs this distinction probably
won't make a big difference because the size of log messages will
probably be the same on average so "number of log messages" should
always be proportional to "total size of log mesages".

1. Before diving into more implementation details, I'd first like to
reach a general consensus that asynchronous logging is a useful
feature that's worth while adding to HotSpot.

2. Once we agree on that, we should agree on the desired properties of
asynchronous logging. I've tried to collect a basic set in the
"Proposed solution" section.

3. If that's done as well, we can discuss various implementation
details and finally prepare new pull requests.

Thank you and best regards,
Volker

[1] https://bugs.openjdk.java.net/browse/JDK-8229517
[2] https://github.com/openjdk/jdk/pull/3135
[3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
[4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
[5] https://openjdk.java.net/jeps/158
[6] https://openjdk.java.net/jeps/271
[7] https://man7.org/linux/man-pages/man3/stdio.3.html
[8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
[9] https://man7.org/linux/man-pages/man3/flockfile.3.html
[10] https://man7.org/linux/man-pages/man2/futex.2.html
Reply | Threaded
Open this post in threaded view
|

Re: Request For Comment: Asynchronous Logging

Liu, Xin
Thanks Volker for this. I would like to append some additional materials.  I forgot to mention them when I wrote the Rationale[5] yesterday.

We identified and root-caused the tail-latency on a Linux system with software RAID in 2018.  
We have different implementations on jdk8u and jdk11u. We are seeking to merge this feature to tip.

Nonetheless, it doesn't mean "async-logging facility" only solves Amazon's peculiar problem.  When we studied this, we found many interesting references.
Eg. LinkedIn reported and analyzed it well[1]. In particular, they mentioned that one reason was Linux cache writeback [2]. IMHO, that could impact
almost all mass-storge Linux filesystems. Twitter also expressed that "I would love to hear if this can happen with OpenJDK!"[3].  This is also reported
by other companies[4].

Thanks,
--lx


[1] https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
[2] https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
[3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
[4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
[5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487

On 3/30/21, 11:20 AM, "Volker Simonis" <[hidden email]> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



    Hi,

    I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
    We are successfully using this feature productively at Amazon both in
    jdk8 and jdk11 to reduce the tail latency of services which use
    logging. We think that async logging is a useful addition to the
    current logging framework which might be beneficial to a wider range
    of users. The following write-up tries to capture the comments and
    suggestions from the previous discussions we are aware of.

    Current state:

    - HotSpot uses the so called "Unified Logging" (UL) framework which
    was introduced by JEP 158 [5] in JDK 9. Most logs have been
    retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
    [6]).
    - The current UL implementation is based on the standard C buffered
    stream I/O interface [7]. The LogFileStreamOutput class which writes
    logs to abstract FILE streams is the only child of the abstract base
    class LogOutput. LogFileStreamOutput has three child classes
    LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
    stdout, stderr or an arbitrary file respectively. The initial UL JEP
    158 [5] envisioned logging to a socket but has not implemented it. At
    least one such extension has been proposed since then [8].
    - UL synchronizes logs from different threads with the help of the
    standard C flockfile()/funlockfile() [9] functions. But even without
    this explicit locking, all the "stdio functions are thread-safe. This
    is achieved by assigning to each FILE object a lockcount and (if the
    lockcount is nonzero) an owning thread.  For each library call, these
    functions wait until the FILE object is no longer locked by a
    different thread, then lock it, do the requested I/O, and unlock the
    object again" [9]. A quick look at the glibc sources reveals that FILE
    locking is implemented with the help of futex() [10] which breaks down
    to s simple atomic compare and swap (CAS) on the fast path.
    - Notice that UL "synchronizes" logs from different threads to avoid
    log interleaving. But it does not "serialize" logs according to the
    time at which  they occurred. This is because the underlying stdio
    functions do not guarantee a specific order for different threads
    waiting on a locked FILE stream. E.g. if three log events A, B, C
    occur in that order, the first will lock the output stream. If the log
    events B and C both arrive while the stream is locked, it is
    unspecified which of B and C will be logged first after A releases the
    lock.

    Problem statement:

    - The amount of time a log event will block its FILE stream depends on
    the underlying file system. This can range from a few nanoseconds for
    in-memory file systems or milliseconds for physical discs under heavy
    load up to several seconds in the worst case scenario for e.g. network
    file systems. A blocked log output stream will block all concurrent
    threads which try to write log messages at the same time. If logging
    is done during a safepoint, this can significantly increase the
    safepoint time (e.g. several parallel GC threads trying to log at the
    same time). We can treat stdout/stderr as special files here without
    loss of generality.

    Proposed solution:

    Extend UL with an asynchronous logging facility. Asynchronous logging
    will be optional and disabled by default. It should have the following
    properties:
    - If disabled (the default) asynchronous logging should have no
    observable impact on logging.
    - If enabled, log messages will be stored in an intermediate data
    structure (e.g. a double ended queue).
    - A service thread will concurrently read and remove log messages from
    that data structure in a FIFO style and write them to the output
    stream
    - Storing log messages in the intermediate data structure should take
    constant time and not longer than logging a message takes in the
    traditional UL system (in general the time should be much shorter
    because the actual I/O is deferred).
    - Asynchronous logging trades memory overhead (i.e. the size of the
    intermediate data structure) for log accuracy. This means that in the
    unlikely case where the service thread which does the asynchronous
    logging falls behind the log producing threads, some logs might be
    lost. However, the probability for this to happen can be minimized by
    increasing the configurable size of the intermediate data structure.
    - The final output produced by asynchronous logging should be equal to
    the output from normal logging if no messages had to be dropped.
    Notice that in contrast to the traditional unified logging,
    asynchronous logging will give us the possibility to not only
    synchronize log events, but to optionally also serialize them based on
    their generation time if that's desired. This is because we are in
    full control of the synchronization primitives for the intermediate
    data structure which stores the logs.
    - If log messages had to be dropped, this should be logged in the log
    output (e.g. "[..] 42 messages dropped due to async logging")
    - Asynchronous logging should ideally be implemented in such a way
    that it can be easily adapted by alternative log targets like for
    example sockets in the future.

    Alternative solutions:
    - It has repeatedly been suggested to place the log files into a
    memory file system but we don't think this is an optimal solution.
    Main memory is often a constrained resource and we don't want log
    files to compete with the JVM for it in such cases.
    - It has also been argued to place the log files on a fast file system
    which is only used for logging but in containerized environments file
    system are often virtualized and the properties of the underlying
    physical devices are not obvious.
    - The load on the file system might be unpredictable due to other
    applications on the same host.
    - All these workarounds won't work if we eventually implement direct
    logging to a network socket as suggested in [8].

    Implementation details / POC:

    - A recent pull request [2] for JDK-8229517 [3] proposed to use a
    simple deque implementation derived from HotSpot's LinkedListImpl
    class for the intermediate data structure. It synchronizes access to
    the queue with a MutexLocker which is internally implemented with
    pthread_lock() and results in an atomic CAS on the fast path. So
    performance-wise the locking itself is not different from the
    flockfile()/funlockfile() functionality currently used by UL but
    adding a log message to the deque should be constant as it basically
    only requires a strdup(). And we could even eliminate the strdup() if
    we'd pre-allocate a big enough array for holding the log messages as
    proposed in the pull request [2].
    - The pull pull request [2] for JDK-8229517 [3] proposed to set the
    async flag as an attribute of the Xlog option which feels more natural
    because UL configuration is traditionally done within the Xlog option.
    But we could just as well use a global -XX flag to enable async
    logging? What are your preferences here?
    - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
    WatcherThread as service thread to concurrently process the
    intermediate data structure and write the log messages out to the log
    stream. That should definitely be changed to an independent service
    thread.
    - The pull pull request [2] for JDK-8229517 [3] initially proposed
    that the "service thread" runs at a fixed interval to dump log
    messages to the log streams. But reviewers commented that this should
    better happen either continuously or based on the filling level of the
    intermediate data structure. What are your preferences here?
    - What are your preferences on the configuration of the intermediate
    data structure? Should it be configured based on the maximum number of
    log messages it can store or rather on the total size of the stored
    log messages? I think that for normal runs this distinction probably
    won't make a big difference because the size of log messages will
    probably be the same on average so "number of log messages" should
    always be proportional to "total size of log mesages".

    1. Before diving into more implementation details, I'd first like to
    reach a general consensus that asynchronous logging is a useful
    feature that's worth while adding to HotSpot.

    2. Once we agree on that, we should agree on the desired properties of
    asynchronous logging. I've tried to collect a basic set in the
    "Proposed solution" section.

    3. If that's done as well, we can discuss various implementation
    details and finally prepare new pull requests.

    Thank you and best regards,
    Volker

    [1] https://bugs.openjdk.java.net/browse/JDK-8229517
    [2] https://github.com/openjdk/jdk/pull/3135
    [3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
    [4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
    [5] https://openjdk.java.net/jeps/158
    [6] https://openjdk.java.net/jeps/271
    [7] https://man7.org/linux/man-pages/man3/stdio.3.html
    [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
    [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
    [10] https://man7.org/linux/man-pages/man2/futex.2.html

Reply | Threaded
Open this post in threaded view
|

Re: Request For Comment: Asynchronous Logging

tstuefe
In reply to this post by Volker Simonis
Hi Volker,

Excellent summary. Thank you for starting the design discussion away from
the PR thread at GH. I think this is a better place for this discussion.

I think UL had been missing a feature like this. I think we should provide
it. Different people have come forward with the same idea in the past, so I
believe there is a real need.

You captured all points succinctly and prepared the discussion well. My
remarks are inline (where I don't write one I agree).

On Tue, Mar 30, 2021 at 8:19 PM Volker Simonis <[hidden email]>
wrote:

> Hi,
>
> I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
> We are successfully using this feature productively at Amazon both in
> jdk8 and jdk11 to reduce the tail latency of services which use
> logging. We think that async logging is a useful addition to the
> current logging framework which might be beneficial to a wider range
> of users. The following write-up tries to capture the comments and
> suggestions from the previous discussions we are aware of.
>
> Current state:
>
> - HotSpot uses the so called "Unified Logging" (UL) framework which
> was introduced by JEP 158 [5] in JDK 9. Most logs have been
> retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
> [6]).
> - The current UL implementation is based on the standard C buffered
> stream I/O interface [7]. The LogFileStreamOutput class which writes
> logs to abstract FILE streams is the only child of the abstract base
> class LogOutput. LogFileStreamOutput has three child classes
> LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
> stdout, stderr or an arbitrary file respectively. The initial UL JEP
> 158 [5] envisioned logging to a socket but has not implemented it. At
> least one such extension has been proposed since then [8].
> - UL synchronizes logs from different threads with the help of the
> standard C flockfile()/funlockfile() [9] functions. But even without
> this explicit locking, all the "stdio functions are thread-safe. This
> is achieved by assigning to each FILE object a lockcount and (if the
> lockcount is nonzero) an owning thread.  For each library call, these
> functions wait until the FILE object is no longer locked by a
> different thread, then lock it, do the requested I/O, and unlock the
> object again" [9]. A quick look at the glibc sources reveals that FILE
> locking is implemented with the help of futex() [10] which breaks down
> to s simple atomic compare and swap (CAS) on the fast path.
> - Notice that UL "synchronizes" logs from different threads to avoid
> log interleaving. But it does not "serialize" logs according to the
> time at which  they occurred. This is because the underlying stdio
> functions do not guarantee a specific order for different threads
> waiting on a locked FILE stream. E.g. if three log events A, B, C
> occur in that order, the first will lock the output stream. If the log
> events B and C both arrive while the stream is locked, it is
> unspecified which of B and C will be logged first after A releases the
> lock.
>
> Problem statement:
>
> - The amount of time a log event will block its FILE stream depends on
> the underlying file system. This can range from a few nanoseconds for
> in-memory file systems or milliseconds for physical discs under heavy
> load up to several seconds in the worst case scenario for e.g. network
> file systems. A blocked log output stream will block all concurrent
> threads which try to write log messages at the same time. If logging
> is done during a safepoint, this can significantly increase the
> safepoint time (e.g. several parallel GC threads trying to log at the
> same time). We can treat stdout/stderr as special files here without
> loss of generality.
>
> Proposed solution:
>
> Extend UL with an asynchronous logging facility. Asynchronous logging
> will be optional and disabled by default. It should have the following
> properties:
> - If disabled (the default) asynchronous logging should have no
> observable impact on logging.
>

Additionally, if disabled, it should not cost anything. If enabled, it
should cost as little as possible (eg. if logging is enabled but nobody
logs).


> - If enabled, log messages will be stored in an intermediate data
> structure (e.g. a double ended queue).
> - A service thread will concurrently read and remove log messages from
> that data structure in a FIFO style and write them to the output
> stream
> - Storing log messages in the intermediate data structure should take
> constant time and not longer than logging a message takes in the
> traditional UL system (in general the time should be much shorter
> because the actual I/O is deferred).
> - Asynchronous logging trades memory overhead (i.e. the size of the
> intermediate data structure) for log accuracy. This means that in the
> unlikely case where the service thread which does the asynchronous
> logging falls behind the log producing threads, some logs might be
> lost. However, the probability for this to happen can be minimized by
> increasing the configurable size of the intermediate data structure.
> - The final output produced by asynchronous logging should be equal to
> the output from normal logging if no messages had to be dropped.
>

+1. This means decorators have to be resolved at the log site, not in the
flusher.


> Notice that in contrast to the traditional unified logging,
> asynchronous logging will give us the possibility to not only
> synchronize log events, but to optionally also serialize them based on
> their generation time if that's desired. This is because we are in
> full control of the synchronization primitives for the intermediate
> data structure which stores the logs.

- If log messages had to be dropped, this should be logged in the log
> output (e.g. "[..] 42 messages dropped due to async logging")
> - Asynchronous logging should ideally be implemented in such a way
> that it can be easily adapted by alternative log targets like for
> example sockets in the future.
>

Additional requests:
- no log output should be withheld in case of vm exit or crash
- no log output should be unreasonably delayed
- The logger side should use as little VM infrastructure as possible to
prevent circularity.

>
> Alternative solutions:
> - It has repeatedly been suggested to place the log files into a
> memory file system but we don't think this is an optimal solution.
> Main memory is often a constrained resource and we don't want log
> files to compete with the JVM for it in such cases.
> - It has also been argued to place the log files on a fast file system
> which is only used for logging but in containerized environments file
> system are often virtualized and the properties of the underlying
> physical devices are not obvious.
> - The load on the file system might be unpredictable due to other
> applications on the same host.
> - All these workarounds won't work if we eventually implement direct
> logging to a network socket as suggested in [8].
>
> Implementation details / POC:
>
> - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> simple deque implementation derived from HotSpot's LinkedListImpl
> class for the intermediate data structure. It synchronizes access to
> the queue with a MutexLocker which is internally implemented with
> pthread_lock() and results in an atomic CAS on the fast path. So
> performance-wise the locking itself is not different from the
> flockfile()/funlockfile() functionality currently used by UL but
> adding a log message to the deque should be constant as it basically
> only requires a strdup(). And we could even eliminate the strdup() if
> we'd pre-allocate a big enough array for holding the log messages as
> proposed in the pull request [2].
>

IIUC to be equivalent to flockfile the implementation would have to use one
queue and one mutex *per file sink*, not a global queue/mutex as the patch
proposed. Because otherwise you now introduce synchronization between log
sites logging into different files, which before did not affect each other.


> - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> async flag as an attribute of the Xlog option which feels more natural
> because UL configuration is traditionally done within the Xlog option.
> But we could just as well use a global -XX flag to enable async
> logging? What are your preferences here?
>

I prefer to keep "async" a global option. I think we should expose as
little freedom to the user as possible. I do not think there is a sensible
scenario where one would wish to write to one file with async, to another
file without async. Nevertheless, if we make this option target-specific
this has to work, and perform, and be regression-tested in all its
variations. Every option we roll out is a contract we have to fulfill. They
find their way into all kinds of environments and user scripts, and once
out there it is difficult to roll out incompatible changes. For instance,
there is no mechanism to deprecate a part of an option. We have a mechanism
for deprecating normal VM options, but Xlog is not a  normal option.

I am concerned with keeping UL maintainable, and that means keeping the
implementation malleable. The more implementation details we expose in the
form of options and functionality, the more our hands are tied if we want
to change the implementation later. E.g. the implementation of a
target-specific async option has to be aware of the existence of targets,
and would prevent implementation of this feature in a layer which does not
know about targets (e.g. deep down in File IO code).


> - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> WatcherThread as service thread to concurrently process the
> intermediate data structure and write the log messages out to the log
> stream. That should definitely be changed to an independent service
> thread.
>

Yes.


> - The pull pull request [2] for JDK-8229517 [3] initially proposed
> that the "service thread" runs at a fixed interval to dump log
> messages to the log streams. But reviewers commented that this should
> better happen either continuously or based on the filling level of the
> intermediate data structure. What are your preferences here?
>

I'd say dump continuously. I like the "flush on filling level" idea even
less than the idea of periodic flushes. I do not like trace systems which
arbitrarily keep output from me and need a flush to spit everything out. Or
omit the last n lines if the VM crashes.

- What are your preferences on the configuration of the intermediate
> data structure? Should it be configured based on the maximum number of
> log messages it can store or rather on the total size of the stored
> log messages? I think that for normal runs this distinction probably
> won't make a big difference because the size of log messages will
> probably be the same on average so "number of log messages" should
> always be proportional to "total size of log mesages".
>
>
I prefer the configuration of the intermediate buffer to be as a memory
size, not "number of entries". The latter does not carry any information
(what entries? how large are they?). It also, again, exposes implementation
details - in this case that there is a vector of entries.

A memory size could initially, in a first implementation, translated
roughly to deque size by just assuming an average log line length. Future
implementations then have the freedom to change this, e.g. use a
pre-allocated fixed sized buffer of the given length, or a more involved
scheme.


> 1. Before diving into more implementation details, I'd first like to
> reach a general consensus that asynchronous logging is a useful
> feature that's worth while adding to HotSpot.
>
> 2. Once we agree on that, we should agree on the desired properties of
> asynchronous logging. I've tried to collect a basic set in the
> "Proposed solution" section.
>
> 3. If that's done as well, we can discuss various implementation
> details and finally prepare new pull requests.
>
> Thank you and best regards,
> Volker
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> [2] https://github.com/openjdk/jdk/pull/3135
> [3]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> [4]
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> [5] https://openjdk.java.net/jeps/158
> [6] https://openjdk.java.net/jeps/271
> [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> [10] https://man7.org/linux/man-pages/man2/futex.2.html


Thanks, Thomas
Reply | Threaded
Open this post in threaded view
|

Re: Request For Comment: Asynchronous Logging

Yasumasa Suenaga-6
Hi,

On 2021/04/01 4:10, Thomas Stüfe wrote:

> Hi Volker,
>
> Excellent summary. Thank you for starting the design discussion away from the PR thread at GH. I think this is a better place for this discussion.
>
> I think UL had been missing a feature like this. I think we should provide it. Different people have come forward with the same idea in the past, so I believe there is a real need.
>
> You captured all points succinctly and prepared the discussion well. My remarks are inline (where I don't write one I agree).
>
> On Tue, Mar 30, 2021 at 8:19 PM Volker Simonis <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi,
>
>     I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
>     We are successfully using this feature productively at Amazon both in
>     jdk8 and jdk11 to reduce the tail latency of services which use
>     logging. We think that async logging is a useful addition to the
>     current logging framework which might be beneficial to a wider range
>     of users. The following write-up tries to capture the comments and
>     suggestions from the previous discussions we are aware of.
>
>     Current state:
>
>     - HotSpot uses the so called "Unified Logging" (UL) framework which
>     was introduced by JEP 158 [5] in JDK 9. Most logs have been
>     retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
>     [6]).
>     - The current UL implementation is based on the standard C buffered
>     stream I/O interface [7]. The LogFileStreamOutput class which writes
>     logs to abstract FILE streams is the only child of the abstract base
>     class LogOutput. LogFileStreamOutput has three child classes
>     LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
>     stdout, stderr or an arbitrary file respectively. The initial UL JEP
>     158 [5] envisioned logging to a socket but has not implemented it. At
>     least one such extension has been proposed since then [8].
>     - UL synchronizes logs from different threads with the help of the
>     standard C flockfile()/funlockfile() [9] functions. But even without
>     this explicit locking, all the "stdio functions are thread-safe. This
>     is achieved by assigning to each FILE object a lockcount and (if the
>     lockcount is nonzero) an owning thread.  For each library call, these
>     functions wait until the FILE object is no longer locked by a
>     different thread, then lock it, do the requested I/O, and unlock the
>     object again" [9]. A quick look at the glibc sources reveals that FILE
>     locking is implemented with the help of futex() [10] which breaks down
>     to s simple atomic compare and swap (CAS) on the fast path.
>     - Notice that UL "synchronizes" logs from different threads to avoid
>     log interleaving. But it does not "serialize" logs according to the
>     time at which  they occurred. This is because the underlying stdio
>     functions do not guarantee a specific order for different threads
>     waiting on a locked FILE stream. E.g. if three log events A, B, C
>     occur in that order, the first will lock the output stream. If the log
>     events B and C both arrive while the stream is locked, it is
>     unspecified which of B and C will be logged first after A releases the
>     lock.
>
>     Problem statement:
>
>     - The amount of time a log event will block its FILE stream depends on
>     the underlying file system. This can range from a few nanoseconds for
>     in-memory file systems or milliseconds for physical discs under heavy
>     load up to several seconds in the worst case scenario for e.g. network
>     file systems. A blocked log output stream will block all concurrent
>     threads which try to write log messages at the same time. If logging
>     is done during a safepoint, this can significantly increase the
>     safepoint time (e.g. several parallel GC threads trying to log at the
>     same time). We can treat stdout/stderr as special files here without
>     loss of generality.
>
>     Proposed solution:
>
>     Extend UL with an asynchronous logging facility. Asynchronous logging
>     will be optional and disabled by default. It should have the following
>     properties:
>     - If disabled (the default) asynchronous logging should have no
>     observable impact on logging.
>
>
> Additionally, if disabled, it should not cost anything. If enabled, it should cost as little as possible (eg. if logging is enabled but nobody logs).
>
>     - If enabled, log messages will be stored in an intermediate data
>     structure (e.g. a double ended queue).
>     - A service thread will concurrently read and remove log messages from
>     that data structure in a FIFO style and write them to the output
>     stream
>     - Storing log messages in the intermediate data structure should take
>     constant time and not longer than logging a message takes in the
>     traditional UL system (in general the time should be much shorter
>     because the actual I/O is deferred).
>     - Asynchronous logging trades memory overhead (i.e. the size of the
>     intermediate data structure) for log accuracy. This means that in the
>     unlikely case where the service thread which does the asynchronous
>     logging falls behind the log producing threads, some logs might be
>     lost. However, the probability for this to happen can be minimized by
>     increasing the configurable size of the intermediate data structure.
>     - The final output produced by asynchronous logging should be equal to
>     the output from normal logging if no messages had to be dropped.
>
>
> +1. This means decorators have to be resolved at the log site, not in the flusher.
>
>     Notice that in contrast to the traditional unified logging,
>     asynchronous logging will give us the possibility to not only
>     synchronize log events, but to optionally also serialize them based on
>     their generation time if that's desired. This is because we are in
>     full control of the synchronization primitives for the intermediate
>     data structure which stores the logs.
>
>     - If log messages had to be dropped, this should be logged in the log
>     output (e.g. "[..] 42 messages dropped due to async logging")
>     - Asynchronous logging should ideally be implemented in such a way
>     that it can be easily adapted by alternative log targets like for
>     example sockets in the future.
>
>
> Additional requests:
> - no log output should be withheld in case of vm exit or crash
> - no log output should be unreasonably delayed
> - The logger side should use as little VM infrastructure as possible to prevent circularity.
>
>
>     Alternative solutions:
>     - It has repeatedly been suggested to place the log files into a
>     memory file system but we don't think this is an optimal solution.
>     Main memory is often a constrained resource and we don't want log
>     files to compete with the JVM for it in such cases.
>     - It has also been argued to place the log files on a fast file system
>     which is only used for logging but in containerized environments file
>     system are often virtualized and the properties of the underlying
>     physical devices are not obvious.
>     - The load on the file system might be unpredictable due to other
>     applications on the same host.
>     - All these workarounds won't work if we eventually implement direct
>     logging to a network socket as suggested in [8].
>
>     Implementation details / POC:
>
>     - A recent pull request [2] for JDK-8229517 [3] proposed to use a
>     simple deque implementation derived from HotSpot's LinkedListImpl
>     class for the intermediate data structure. It synchronizes access to
>     the queue with a MutexLocker which is internally implemented with
>     pthread_lock() and results in an atomic CAS on the fast path. So
>     performance-wise the locking itself is not different from the
>     flockfile()/funlockfile() functionality currently used by UL but
>     adding a log message to the deque should be constant as it basically
>     only requires a strdup(). And we could even eliminate the strdup() if
>     we'd pre-allocate a big enough array for holding the log messages as
>     proposed in the pull request [2].
>
>
> IIUC to be equivalent to flockfile the implementation would have to use one queue and one mutex *per file sink*, not a global queue/mutex as the patch proposed. Because otherwise you now introduce synchronization between log sites logging into different files, which before did not affect each other.
>
>     - The pull pull request [2] for JDK-8229517 [3] proposed to set the
>     async flag as an attribute of the Xlog option which feels more natural
>     because UL configuration is traditionally done within the Xlog option.
>     But we could just as well use a global -XX flag to enable async
>     logging? What are your preferences here?
>
>
> I prefer to keep "async" a global option. I think we should expose as little freedom to the user as possible. I do not think there is a sensible scenario where one would wish to write to one file with async, to another file without async. Nevertheless, if we make this option target-specific this has to work, and perform, and be regression-tested in all its variations. Every option we roll out is a contract we have to fulfill. They find their way into all kinds of environments and user scripts, and once out there it is difficult to roll out incompatible changes. For instance,  there is no mechanism to deprecate a part of an option. We have a mechanism for deprecating normal VM options, but Xlog is not a  normal option.

I don't like to keep "async" a global option because other -Xlog options (e.g. filecount) does not global.
It might break consistency for -Xlog options.


> I am concerned with keeping UL maintainable, and that means keeping the implementation malleable. The more implementation details we expose in the form of options and functionality, the more our hands are tied if we want to change the implementation later. E.g. the implementation of a target-specific async option has to be aware of the existence of targets, and would prevent implementation of this feature in a layer which does not know about targets (e.g. deep down in File IO code).
>
>     - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
>     WatcherThread as service thread to concurrently process the
>     intermediate data structure and write the log messages out to the log
>     stream. That should definitely be changed to an independent service
>     thread.
>
>
> Yes.
>
>     - The pull pull request [2] for JDK-8229517 [3] initially proposed
>     that the "service thread" runs at a fixed interval to dump log
>     messages to the log streams. But reviewers commented that this should
>     better happen either continuously or based on the filling level of the
>     intermediate data structure. What are your preferences here?
>
>
> I'd say dump continuously. I like the "flush on filling level" idea even less than the idea of periodic flushes. I do not like trace systems which arbitrarily keep output from me and need a flush to spit everything out. Or omit the last n lines if the VM crashes.
>
>     - What are your preferences on the configuration of the intermediate
>     data structure? Should it be configured based on the maximum number of
>     log messages it can store or rather on the total size of the stored
>     log messages? I think that for normal runs this distinction probably
>     won't make a big difference because the size of log messages will
>     probably be the same on average so "number of log messages" should
>     always be proportional to "total size of log mesages".
>
>
> I prefer the configuration of the intermediate buffer to be as a memory size, not "number of entries". The latter does not carry any information (what entries? how large are they?). It also, again, exposes implementation details - in this case that there is a vector of entries.
>
> A memory size could initially, in a first implementation, translated roughly to deque size by just assuming an average log line length. Future implementations then have the freedom to change this, e.g. use a pre-allocated fixed sized buffer of the given length, or a more involved scheme.
>
>     1. Before diving into more implementation details, I'd first like to
>     reach a general consensus that asynchronous logging is a useful
>     feature that's worth while adding to HotSpot.
>
>     2. Once we agree on that, we should agree on the desired properties of
>     asynchronous logging. I've tried to collect a basic set in the
>     "Proposed solution" section.
>
>     3. If that's done as well, we can discuss various implementation
>     details and finally prepare new pull requests.
>
>     Thank you and best regards,
>     Volker
>
>     [1] https://bugs.openjdk.java.net/browse/JDK-8229517 <https://bugs.openjdk.java.net/browse/JDK-8229517>
>     [2] https://github.com/openjdk/jdk/pull/3135 <https://github.com/openjdk/jdk/pull/3135>
>     [3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
>     [4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
>     [5] https://openjdk.java.net/jeps/158 <https://openjdk.java.net/jeps/158>
>     [6] https://openjdk.java.net/jeps/271 <https://openjdk.java.net/jeps/271>
>     [7] https://man7.org/linux/man-pages/man3/stdio.3.html <https://man7.org/linux/man-pages/man3/stdio.3.html>
>     [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
>     [9] https://man7.org/linux/man-pages/man3/flockfile.3.html <https://man7.org/linux/man-pages/man3/flockfile.3.html>
>     [10] https://man7.org/linux/man-pages/man2/futex.2.html <https://man7.org/linux/man-pages/man2/futex.2.html>
>
>
> Thanks, Thomas
Reply | Threaded
Open this post in threaded view
|

Re: Request For Comment: Asynchronous Logging

tstuefe
>
>
>
> >     - What are your preferences on the configuration of the intermediate
> >     data structure? Should it be configured based on the maximum number
> of
> >     log messages it can store or rather on the total size of the stored
> >     log messages? I think that for normal runs this distinction probably
> >     won't make a big difference because the size of log messages will
> >     probably be the same on average so "number of log messages" should
> >     always be proportional to "total size of log mesages".
> >
> >
> > I prefer the configuration of the intermediate buffer to be as a memory
> size, not "number of entries". The latter does not carry any information
> (what entries? how large are they?). It also, again, exposes implementation
> details - in this case that there is a vector of entries.
> >


Maybe our perspective differs from yours.

We provide long term support for customers who use the JDK in a large
variety of situations, and on a wide number of platforms. Any combination
of flags, regardless how odd and exotic, will be used by a small subset of
our users in the years to come. So it has to work reliably and has to be
tested. I would like to keep long term maintainability of the JDK, and that
includes being careful with rolling out new features.

I personally think that UL is already too complex. IMHO exposing the
ability to log to different log files was a bad choice. I have never seen a
situation where this feature was useful, let alone vital. But it is there
and can never be undone and its complexity causes technical debt. Just as
one example of how this already impacts us, I am currently changing NMT to
be late-initializable (see JDK-8256844), and that includes revamping early
dynamic C++ initialization. UL is by far the most complex early
initialization in that stage, with a surprising number of os::malloc calls.
All to initialize what basically is just a constant static structure, which
should have been static-initialized as a number of constants. But revamping
that is very difficult because we chose to expose features like this.

Every feature increases complexity. If the feature is useful, fine. But
aesthetics alone are not a good reason.

Please let's have this design discussion before pushing patches. The right
way to do things is to build consent first, then to write code. Every added
feature increases maintenance effort for *all* maintainers.

Thanks, Thomas
Reply | Threaded
Open this post in threaded view
|

Re: Request For Comment: Asynchronous Logging

Ioi Lam
In reply to this post by Yasumasa Suenaga-6


On 3/31/21 5:16 PM, Yasumasa Suenaga wrote:

> On 2021/04/01 4:10, Thomas Stüfe wrote:
>>
>> I prefer to keep "async" a global option. I think we should expose as
>> little freedom to the user as possible. I do not think there is a
>> sensible scenario where one would wish to write to one file with
>> async, to another file without async. Nevertheless, if we make this
>> option target-specific this has to work, and perform, and be
>> regression-tested in all its variations. Every option we roll out is
>> a contract we have to fulfill. They find their way into all kinds of
>> environments and user scripts, and once out there it is difficult to
>> roll out incompatible changes. For instance,  there is no mechanism
>> to deprecate a part of an option. We have a mechanism for deprecating
>> normal VM options, but Xlog is not a  normal option.
>
> I don't like to keep "async" a global option because other -Xlog
> options (e.g. filecount) does not global.
> It might break consistency for -Xlog options.
>

There is actually one global Xlog option today:

     java -Xlog:disable ....

For async logging, I would prefer a global option like

     java -Xlog:async -Xlog:gc=gc.txt -Xlog:metaspace=metaspace.txt

I think we should expand the set of global Xlog options when appropriate.

To take this further (maybe in a separate RFE), having a global file
option will be useful. It's a pain to specify the same file many times,
especially if you don't want the log file rotation:

     java  -Xlog:gc=log.txt:none:filesize=0
-Xlog:metaspace=log.txt:none:filesize=0 ...

Yes, you can put all of the logs into a single command-line argument, like

     java  -Xlog:gc,metaspace:file=log.txt:none:filesize=0

but that will be difficult if you have a script that builds the
command-line conditionally.

So, if we now add a new "async" option for each file, it will make
things even more unmanageable.

Instead, something like this will be much easier to use:

     java  -Xlog:file=log.txt -Xlog:gc -Xlog:metaspace
     java  -Xlog:file=log.txt -Xlog:gc -Xlog:metaspace -Xlog:async

Thanks
- Ioi