Parameter HandshakeTimeout is in milliseconds.
Internally we use nanoseconds. HandshakeTimeout must be scaled to nanoseconds. Passes T1 ------------- Commit messages: - Use millis_to_nanos Changes: https://git.openjdk.java.net/jdk/pull/2666/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2666&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262094 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2666.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2666/head:pull/2666 PR: https://git.openjdk.java.net/jdk/pull/2666 |
On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote:
> Parameter HandshakeTimeout is in milliseconds. > Internally we use nanoseconds. > HandshakeTimeout must be scaled to nanoseconds. > > Passes T1 Hi Robbin, This change seems much clearer, but to be clear the actual result is the same as the "counter" tracks nanoseconds. This counts as a trivial change IMO. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2666 |
On Mon, 22 Feb 2021 07:46:47 GMT, David Holmes <[hidden email]> wrote:
> Hi Robbin, > > This change seems much clearer, but to be clear the actual result is the same as the "counter" tracks nanoseconds. > > This counts as a trivial change IMO. > > Thanks, > David Hi David, unfortunately I don't I agree that this is trivial: `open/src/hotspot/os/windows/os_windows.cpp:jlong os::elapsed_frequency() { open/src/hotspot/os/windows/os_windows.cpp- return performance_frequency; open/src/hotspot/os/windows/os_windows.cpp-}` This value comes from QueryPerformanceFrequency, which depends on what resolution the Windows thinks is reasonable given what hardware support there are. In a virtualized env. this causes a 20.000 ms timeout to timeout after 2 s for me due to low resolution (windows only). (all other platforms hardcoded to NANASECONDS) We have two more suspective uses here, which I have not investigated (but we have seem wired timeouts with both ZGC and G1, but I have seen non with parGC): `open/src/hotspot/share/gc/z/zMark.cpp: _timeout(_start.value() + TimeHelper::micros_to_counter(timeout_in_micros))` `open/src/hotspot/share/gc/g1/g1ServiceThread.cpp: jlong delay = TimeHelper::millis_to_counter(delay_ms);` Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by Robbin Ehn-2
On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote:
> Parameter HandshakeTimeout is in milliseconds. > Internally we use nanoseconds. > HandshakeTimeout must be scaled to nanoseconds. > > Passes T1 Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by Robbin Ehn-2
On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote:
> Parameter HandshakeTimeout is in milliseconds. > Internally we use nanoseconds. > HandshakeTimeout must be scaled to nanoseconds. > > Passes T1 `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see? ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by Stefan Karlsson-3
On Mon, 22 Feb 2021 09:01:16 GMT, Stefan Karlsson <[hidden email]> wrote:
> Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). Both uses a 'java type', jlong. For logging and logic for yielding we want a humanly understand unit that is absolute. Since things goes pretty fast nowadays nanseconds seems reasonable. Instead of to converting the elapsed_counter to nanos, it's much simpler to just get the nanos. But yes it would preferably with a os::timeNanos(). ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
On Mon, 22 Feb 2021 11:02:04 GMT, Robbin Ehn <[hidden email]> wrote:
>> Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). > >> Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). > > Both uses a 'java type', jlong. > For logging and logic for yielding we want a humanly understand unit that is absolute. > Since things goes pretty fast nowadays nanseconds seems reasonable. > Instead of to converting the elapsed_counter to nanos, it's much simpler to just get the nanos. > But yes it would preferably with a os::time_nanos(). > `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see? The logging during handshake timeout doesn't work on windows for some reason, when I locally fixed the logging I notice that timeout was actually 2 seconds instead of 20 seconds. But that only seems to apply to some windows versions/(virtual hardware). During some tests I have seem results such as: "Error: time between events 0 ns" When looking at the code the events are serialized with a mutex, so some time must have passed but the granularity of the system is to coarse. I don't know what else this can cause. ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by Robbin Ehn-2
Hi Robbin,
On 22/02/2021 6:19 pm, Robbin Ehn wrote: > On Mon, 22 Feb 2021 07:46:47 GMT, David Holmes <[hidden email]> wrote: > >> Hi Robbin, >> >> This change seems much clearer, but to be clear the actual result is the same as the "counter" tracks nanoseconds. >> >> This counts as a trivial change IMO. >> >> Thanks, >> David > > Hi David, unfortunately I don't I agree that this is trivial: > > `open/src/hotspot/os/windows/os_windows.cpp:jlong os::elapsed_frequency() { > open/src/hotspot/os/windows/os_windows.cpp- return performance_frequency; > open/src/hotspot/os/windows/os_windows.cpp-}` > > This value comes from QueryPerformanceFrequency, which depends on what resolution the Windows thinks is reasonable given what hardware support there are. Yep mea culpa I completely overlooked Windows. So on Windows this change actually alters the timeout. So a simple fix but not trivial. > In a virtualized env. this causes a 20.000 ms timeout to timeout after 2 s for me due to low resolution (windows only). > (all other platforms hardcoded to NANASECONDS) > > We have two more suspective uses here, which I have not investigated (but we have seem wired timeouts with both ZGC and G1, but I have seen non with parGC): > `open/src/hotspot/share/gc/z/zMark.cpp: _timeout(_start.value() + TimeHelper::micros_to_counter(timeout_in_micros))` > > `open/src/hotspot/share/gc/g1/g1ServiceThread.cpp: jlong delay = TimeHelper::millis_to_counter(delay_ms);` These should only be heuristics in any case, but I don't understand why we would want to express a delay/timeout this way. Cheers, David ----- > Thanks > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2666 > |
In reply to this post by Stefan Karlsson-3
On 22/02/2021 7:03 pm, Stefan Karlsson wrote:
> On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote: > >> Parameter HandshakeTimeout is in milliseconds. >> Internally we use nanoseconds. >> HandshakeTimeout must be scaled to nanoseconds. >> >> Passes T1 > > Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). Runtime code (threading and sync) uses actual time units (ie javaTimeNanos) rather than os::elapsed_counter(). And elapsed_counter() uses javaTimeNanos() on all platforms but Windows. It's not really about interaction with Java, just about semantics. A timeout/delay in "ticks" would scale with CPU speed if "ticks" presented that, but os::elapsed_counter() doesn't - so a timeout/delay of N "counts" is somewhat arbitrary in duration. Conversely, of course, any timeout/delay expressed in ms or ns is somewhat arbitrary in terms of how much code may have executed in that time. But these timeouts are heuristics anyway. Cheers, David > ------------- > > Changes requested by stefank (Reviewer). > > PR: https://git.openjdk.java.net/jdk/pull/2666 > |
In reply to this post by Robbin Ehn-2
On Mon, 22 Feb 2021 11:07:42 GMT, Robbin Ehn <[hidden email]> wrote:
>>> Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). >> >> Both uses a 'java type', jlong. >> For logging and logic for yielding we want a humanly understand unit that is absolute. >> Since things goes pretty fast nowadays nanseconds seems reasonable. >> Instead of to converting the elapsed_counter to nanos, it's much simpler to just get the nanos. >> But yes it would preferably with a os::time_nanos(). > >> `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see? > > The logging during handshake timeout doesn't work on windows for some reason, when I locally fixed the logging I notice that timeout was actually 2 seconds instead of 20 seconds. But that only seems to apply to some windows versions/(virtual hardware). > > During some tests I have seem results such as: > "Error: time between events 0 ns" > When looking at the code the events are serialized with a mutex, so some time must have passed but the granularity of the system is to coarse. I don't know what else this can cause. > _Mailing list message from [David Holmes](mailto:[hidden email]) on [hotspot-runtime-dev](mailto:[hidden email]):_ > > On 22/02/2021 7:03 pm, Stefan Karlsson wrote: > > > On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote: > > > Parameter HandshakeTimeout is in milliseconds. > > > Internally we use nanoseconds. > > > HandshakeTimeout must be scaled to nanoseconds. > > > Passes T1 > > > > > > Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). > > Runtime code (threading and sync) uses actual time units (ie > javaTimeNanos) rather than os::elapsed_counter(). And elapsed_counter() > uses javaTimeNanos() on all platforms but Windows. I see. In the GC code we've tried to unify towards using one time source (elapsed_counter/elapsedTime/Ticks all use counters), and then convert to nanos/micros/millis when needed. It's a bit unfortunate that we don't use the same time source throughout the JVM. It's not really about > interaction with Java, just about semantics. From the GC's point-of-view it is. We use os::javaTimeNanos() just were we are forced to use it because interactions with Java. Other use-cases uses the counter based source. A timeout/delay in "ticks" > would scale with CPU speed if "ticks" presented that, but > os::elapsed_counter() doesn't - so a timeout/delay of N "counts" is > somewhat arbitrary in duration. For those cases you wouldn't pass down counter, but use TimeHelper::counter_to_millis (or equivalent). But OK, given that Runtime already uses os::javaTimeNanos() I'll back off. ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
On Mon, 22 Feb 2021 12:42:40 GMT, Stefan Karlsson <[hidden email]> wrote:
>>> `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see? >> >> The logging during handshake timeout doesn't work on windows for some reason, when I locally fixed the logging I notice that timeout was actually 2 seconds instead of 20 seconds. But that only seems to apply to some windows versions/(virtual hardware). >> >> During some tests I have seem results such as: >> "Error: time between events 0 ns" >> When looking at the code the events are serialized with a mutex, so some time must have passed but the granularity of the system is to coarse. I don't know what else this can cause. > >> _Mailing list message from [David Holmes](mailto:[hidden email]) on [hotspot-runtime-dev](mailto:[hidden email]):_ >> >> On 22/02/2021 7:03 pm, Stefan Karlsson wrote: >> >> > On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> > > Parameter HandshakeTimeout is in milliseconds. >> > > Internally we use nanoseconds. >> > > HandshakeTimeout must be scaled to nanoseconds. >> > > Passes T1 >> > >> > >> > Why are we using os::javaTimeNanos instead os::elapsed_counter()? I think we've tried to limit the usage of to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM uses os::elapsed_counter(). >> >> Runtime code (threading and sync) uses actual time units (ie >> javaTimeNanos) rather than os::elapsed_counter(). And elapsed_counter() >> uses javaTimeNanos() on all platforms but Windows. > > I see. In the GC code we've tried to unify towards using one time source (elapsed_counter/elapsedTime/Ticks all use counters), and then convert to nanos/micros/millis when needed. It's a bit unfortunate that we don't use the same time source throughout the JVM. > > It's not really about >> interaction with Java, just about semantics. > > From the GC's point-of-view it is. We use os::javaTimeNanos() just were we are forced to use it because interactions with Java. Other use-cases uses the counter based source. > > A timeout/delay in "ticks" >> would scale with CPU speed if "ticks" presented that, but >> os::elapsed_counter() doesn't - so a timeout/delay of N "counts" is >> somewhat arbitrary in duration. > > For those cases you wouldn't pass down counter, but use TimeHelper::counter_to_millis (or equivalent). > > But OK, given that Runtime already uses os::javaTimeNanos() I'll back off. I agree with that we should use the same time source. But here I just want to fixed this bug so the time-out is working as intended on windows. ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by David Holmes
Hi David,
> > > In a virtualized env. this causes a 20.000 ms timeout to timeout > > after 2 s for me due to low resolution (windows only). > > (all other platforms hardcoded to NANASECONDS) > > > > We have two more suspective uses here, which I have not > > investigated (but we have seem wired timeouts with both ZGC and G1, > > but I have seen non with parGC): > > `open/src/hotspot/share/gc/z/zMark.cpp: > > _timeout(_start.value() + > > TimeHelper::micros_to_counter(timeout_in_micros))` > > > > `open/src/hotspot/share/gc/g1/g1ServiceThread.cpp: jlong delay = > > TimeHelper::millis_to_counter(delay_ms);` > FYI: I looked at these and they are fine. > These should only be heuristics in any case, but I don't understand > why > we would want to express a delay/timeout this way. I don't either. Thanks, Robbin > > Cheers, > David > ----- > > > Thanks > > > > ------------- > > > > PR: https://git.openjdk.java.net/jdk/pull/2666 > > |
In reply to this post by Robbin Ehn-2
On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote:
> Parameter HandshakeTimeout is in milliseconds. > Internally we use nanoseconds. > HandshakeTimeout must be scaled to nanoseconds. > > Passes T1 Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
On Tue, 23 Feb 2021 07:37:54 GMT, Stefan Karlsson <[hidden email]> wrote:
>> Parameter HandshakeTimeout is in milliseconds. >> Internally we use nanoseconds. >> HandshakeTimeout must be scaled to nanoseconds. >> >> Passes T1 > > Marked as reviewed by stefank (Reviewer). Thanks David and Stefan! ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
In reply to this post by Robbin Ehn-2
On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <[hidden email]> wrote:
> Parameter HandshakeTimeout is in milliseconds. > Internally we use nanoseconds. > HandshakeTimeout must be scaled to nanoseconds. > > Passes T1 This pull request has now been integrated. Changeset: 9d9bedd0 Author: Robbin Ehn <[hidden email]> URL: https://git.openjdk.java.net/jdk/commit/9d9bedd0 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8262094: Handshake timeout scaled wrong Reviewed-by: dholmes, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2666 |
Free forum by Nabble | Edit this page |