Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Chris Hegarty
Xuelei,

Without diving deeper into this issue, Rob’s suggested approach seems reasonable to me, and better than existing out-of-the-box behaviour. I’m not sure what issues you are thinking of, with using the read timeout in combination with a retry mechanism, in this manner? If the network is so slow, surely there will be other issues with connecting and reading, why is closing any different.

-Chris.

> On 13 Sep 2017, at 16:52, Rob McKenna <[hidden email]> wrote:
>
> Hi Xuelei,
>
> This behaviour is already exposed via the autoclose boolean in:
>
> https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLSocketFactory.html#createSocket-java.net.Socket-java.io.InputStream-boolean-
>
> My position would be that allowing 5 retries allows us to say with some
> confidence that we're not going to get a close_notify from the server.
> If this is the case I think its reasonable to close the connection.
>
> W.r.t. a separate timeout, the underlying mechanics of a close already
> depend on the readTimeout in this situation. (waiting on a close_notify
> requires performing a read so the read timeout makes sense in this
> context) I'm happy to alter that but I think that the combination of
> a timeout and a retry count is straightforward and lower impact.
>
> In my opinion the default behaviour of potentially hanging indefinitely
> is worse than the alternative here. (bearing in mind that we are closing
> the underlying socket)
>
> I'll file a CSR as soon as we settle on the direction this fix will
> take.
>
>    -Rob
>
> On 13/09/17 05:52, Xuelei Fan wrote:
>> In theory, there are intermittent compatibility problems as this update may
>> not close the SSL connection over the existing socket layer gracefully, even
>> for high speed networking environments, while the underlying socket is
>> alive.  The impact could be serious in some environment.
>>
>> For safe, I may suggest turn this countermeasure off by default.  And
>> providing options to turn on this countermeasure:
>> 1. Close the SSL connection gracefully by default; or
>> 2. Close the SSL connection after a timeout.
>>
>> It's hardly to say 5 times receiving timeout is better/safer than timeout
>> once in this context.  As you have already had a system property to control,
>> you may be able to use options other than the customized socket receiving
>> timeout, so that the closing timeout is not mixed/confused/dependent on/with
>> the receiving timeout.
>>
>> Put all together:
>> 1. define a closing timeout, for example "jdk.tls.waitForClose".
>> 2. the property default value is zero, no behavior changes.
>> 3. applications can set positive milliseconds value for the property. The
>> SSL connection will be closed in the set milliseconds (or about the maximum
>> value between SO_TIMEOUT and closing timeout), the connection is not grant
>> to be gracefully.
>>
>> What do you think?
>>
>> BTW, please file a CSR as this update is introducing an external system
>> property.
>>
>> Thanks,
>> Xuelei
>>
>> On 9/11/2017 3:29 PM, Rob McKenna wrote:
>>> Hi folks,
>>>
>>> In high latency environments a client SSLSocket with autoClose set to false
>>> can hang indefinitely if it does not correctly recieve a close_notify
>>> from the server.
>>>
>>> In order to rectify this situation I would like to suggest that we
>>> implement an integer JDK property (jdk.tls.closeRetries) which instructs
>>> waitForClose to attempt the close no more times than the value of the
>>> property. I would also suggest that 5 is a reasonable default.
>>>
>>> Note: each attempt times out based on the value of
>>> Socket.setSoTimeout(int timeout).
>>>
>>> Also, the behaviour here is similar to that of waitForClose() when
>>> autoClose is set to true, less the retries.
>>>
>>> http://cr.openjdk.java.net/~robm/8184328/webrev.01/
>>>
>>>    -Rob
>>>

Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Rob McKenna
But they are inextricably linked regardless.

When we close an SSLSocket it performs a readReply which is subject to
the read timeout. So if no read timeout is specified, the call to
readReply will hang indefinitely. If a read timeout is specified we
would need to maintain two separate timeouts and take each into account
when polling.

What you are suggesting would effectively necessitate a reimplementation
of the close mechanics discarding the read timeout completely. (which
would be a significant enough change in terms of compatibility)

    -Rob

On 13/09/17 03:56, Xuelei Fan wrote:
> On 9/13/2017 8:52 AM, Rob McKenna wrote:
> >W.r.t. a separate timeout, the underlying mechanics of a close already
> >depend on the readTimeout in this situation.
> That's a concerns of mine.  In order to work for your countermeasure,
> applications have to set receiving timeout, and take care of the closing
> timeout when evaluate what's a right timeout value.  The mixing could be
> misleading and not easy to use.
>
> Xuelei
Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Rob McKenna
Perhaps I'm misunderstanding you here. Can you illustrate this a bit
further?

Applications already have to set a read timeout, my proposal doesn't
alter this fact. (i.e. if the read timeout isn't set applications which
call close could potentially get stuck in readReply indefinitely)

    -Rob

On 15/09/17 07:23, Xuelei Fan wrote:

> On 9/15/2017 7:07 AM, Rob McKenna wrote:
> >But they are inextricably linked regardless.
> >
> >When we close an SSLSocket it performs a readReply which is subject to
> >the read timeout. So if no read timeout is specified, the call to
> >readReply will hang indefinitely.
> That's one of what I worried about.  Applications have to set receiving
> timeout in your proposal.  I don't want closing timeout binding to receiving
> timeout.  It's doable and the impact is minimal.
>
> Xuelei
>
> >If a read timeout is specified we
> >would need to maintain two separate timeouts and take each into account
> >when polling.
> >
> >What you are suggesting would effectively necessitate a reimplementation
> >of the close mechanics discarding the read timeout completely. (which
> >would be a significant enough change in terms of compatibility)
> >
> >     -Rob
> >
> >On 13/09/17 03:56, Xuelei Fan wrote:
> >>On 9/13/2017 8:52 AM, Rob McKenna wrote:
> >>>W.r.t. a separate timeout, the underlying mechanics of a close already
> >>>depend on the readTimeout in this situation.
> >>That's a concerns of mine.  In order to work for your countermeasure,
> >>applications have to set receiving timeout, and take care of the closing
> >>timeout when evaluate what's a right timeout value.  The mixing could be
> >>misleading and not easy to use.
> >>
> >>Xuelei
Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Rob McKenna
In reply to this post by Chris Hegarty
On 15/09/17 07:32, Xuelei Fan wrote:

> On 9/15/2017 7:16 AM, Rob McKenna wrote:
> >On 13/09/17 03:52, Xuelei Fan wrote:
> >>
> >>
> >>On 9/13/2017 8:52 AM, Rob McKenna wrote:
> >>>Hi Xuelei,
> >>>
> >>>This behaviour is already exposed via the autoclose boolean in:
> >>>
> >>>https://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLSocketFactory.html#createSocket-java.net.Socket-java.io.InputStream-boolean-
> >>>
> >>I did not get the point.  What do you mean by this behavior is already
> >>exposed?
> >
> >In SSLSocketImpl.closeSocket() waitForClose is only called if autoclose
> >is true. If not the SSLSocket simply calls super.close().
> >
> Did you get something different?  I think waitForClose is only called if
> autoclose is false.
>
> No matter the autoclose is true or false, I'm not sure what do you mean by
> this behavior is already exposed.  Can you describe more about the point.
>

Ack, yes, sorry, I got that backwards. If you set autoclose to true
SSLSocket.close() will skip waitForClose() and simply call
super.close(). When I say this behaviour is already exposed I am
referring to the call to super.close(). (which is effectively what this
fix does after the specified number of attempts via the call to fatal)

    -Rob

> >>
> >>>My position would be that allowing 5 retries allows us to say with some
> >>>confidence that we're not going to get a close_notify from the server.
> >>You have more chance to get the close_notify, but it does not mean you can
> >>always get the close_notify in 5 retries.  When you cannot get it, something
> >>bad happens.
> >
> >No, the property would need to be tuned to suit the networking
> >environment in which the application is deployed. Much the same as a
> >timeout would be.
> >
> >>
> >>>If this is the case I think its reasonable to close the connection.
> >>>
> >>>W.r.t. a separate timeout, the underlying mechanics of a close already
> >>>depend on the readTimeout in this situation. (waiting on a close_notify
> >>>requires performing a read so the read timeout makes sense in this
> >>>context) I'm happy to alter that but I think that the combination of
> >>>a timeout and a retry count is straightforward and lower impact.
> >>>
> >>>In my opinion the default behaviour of potentially hanging indefinitely
> >>>is worse than the alternative here. (bearing in mind that we are closing
> >>>the underlying socket)
> >>>
> >>I did not get the point, are we really closing the underlying socket (or the
> >>layered ssl connection?) for the context of you update?
> >
> >We're calling fatal which calls closeSocket which in turn calls
> >super.close(). (this calls Socket.close() via BaseSSLSocketImpl /
> >SSLSocket) As noted in an earlier reply, this will close the
> >underlying native socket. (I'll perform more testing to verify this)
> >
> When the fatal get called?  I may miss something.  Could you describe the
> scenarios in more details?
>
> Xuelei
>
> >     -Rob
> >
> >>
> >>Xuelei
> >>
> >>>I'll file a CSR as soon as we settle on the direction this fix will
> >>>take.
> >>>
> >>>     -Rob
> >>>
> >>>On 13/09/17 05:52, Xuelei Fan wrote:
> >>>>In theory, there are intermittent compatibility problems as this update may
> >>>>not close the SSL connection over the existing socket layer gracefully, even
> >>>>for high speed networking environments, while the underlying socket is
> >>>>alive.  The impact could be serious in some environment.
> >>>>
> >>>>For safe, I may suggest turn this countermeasure off by default.  And
> >>>>providing options to turn on this countermeasure:
> >>>>1. Close the SSL connection gracefully by default; or
> >>>>2. Close the SSL connection after a timeout.
> >>>>
> >>>>It's hardly to say 5 times receiving timeout is better/safer than timeout
> >>>>once in this context.  As you have already had a system property to control,
> >>>>you may be able to use options other than the customized socket receiving
> >>>>timeout, so that the closing timeout is not mixed/confused/dependent on/with
> >>>>the receiving timeout.
> >>>>
> >>>>Put all together:
> >>>>1. define a closing timeout, for example "jdk.tls.waitForClose".
> >>>>2. the property default value is zero, no behavior changes.
> >>>>3. applications can set positive milliseconds value for the property. The
> >>>>SSL connection will be closed in the set milliseconds (or about the maximum
> >>>>value between SO_TIMEOUT and closing timeout), the connection is not grant
> >>>>to be gracefully.
> >>>>
> >>>>What do you think?
> >>>>
> >>>>BTW, please file a CSR as this update is introducing an external system
> >>>>property.
> >>>>
> >>>>Thanks,
> >>>>Xuelei
> >>>>
> >>>>On 9/11/2017 3:29 PM, Rob McKenna wrote:
> >>>>>Hi folks,
> >>>>>
> >>>>>In high latency environments a client SSLSocket with autoClose set to false
> >>>>>can hang indefinitely if it does not correctly recieve a close_notify
> >>>>>from the server.
> >>>>>
> >>>>>In order to rectify this situation I would like to suggest that we
> >>>>>implement an integer JDK property (jdk.tls.closeRetries) which instructs
> >>>>>waitForClose to attempt the close no more times than the value of the
> >>>>>property. I would also suggest that 5 is a reasonable default.
> >>>>>
> >>>>>Note: each attempt times out based on the value of
> >>>>>Socket.setSoTimeout(int timeout).
> >>>>>
> >>>>>Also, the behaviour here is similar to that of waitForClose() when
> >>>>>autoClose is set to true, less the retries.
> >>>>>
> >>>>>http://cr.openjdk.java.net/~robm/8184328/webrev.01/
> >>>>>
> >>>>>     -Rob
> >>>>>
Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Xuelei Fan-2
In reply to this post by Chris Hegarty
I still prefer to not-depends on socket receiving timeout.  But I'm fine
if you want to move on with it.

As we can close the super socket in the current implementation, it
implies that application can handle it already.  So you may not need the
system property and 5 times retries.  I think it's fine just call
fatal() for the first timeout.

Xuelei

On 9/15/2017 4:32 PM, Xuelei Fan wrote:

> On 9/15/2017 8:22 AM, Rob McKenna wrote:
>> This test calls close directly. (3rd last line in the stack)
>>
>> I believe this is the only possible stack (with the new parameter) once
>> autoclose is set to false. If autoclose is true we'd skip the call to
>> waitForClose and just go directly to Socket.close() unless I'm mistaken.
>>
> I did not find the call to fatal() in the current implementation.  I
> think you mean you added the call to fatal() in your update so that when
> timeout, a fatal() will always get called?
>
> Thinking about two things:
> 1. application have to set receiving timeout in order to  get receiving
> timeout.
> I have a concern about it, as described in other comments.
>
> 2. can we close the super socket?
> It is a surprise to me to close super socket even we don't allocate it.
> It does not feel right to me, but this is the current behavior.  All
> right, I get your point.
>
> Xuelei
>
>>      -Rob
>>
>> On 15/09/17 07:55, Xuelei Fan wrote:
>>> On 9/15/2017 7:41 AM, Rob McKenna wrote:
>>>> On 15/09/17 07:07, Xuelei Fan wrote:
>>>>> On 9/15/2017 7:00 AM, Rob McKenna wrote:
>>>>>> When we call close() on the SSLSocket that calls close() on the
>>>>>> underlying java Socket which closes the native socket.
>>>>>>
>>>>> Sorry, I did not get the point.  Please see the close()
>>>>> implementation of
>>>>> SSLSocket (sun.security.ssl.SSLSocketImpl.close()) about the details.
>>>>
>>>> Running my original test against an instrumented 8u-dev produces the
>>>> following:
>>>>
>>>> java.lang.Exception: Stack trace
>>>>     at java.lang.Thread.dumpStack(Thread.java:1336)
>>>>     at java.net.Socket.close(Socket.java:1491)
>>>>     at
>>>> sun.security.ssl.BaseSSLSocketImpl.close(BaseSSLSocketImpl.java:624)
>>>>     at
>>>> sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1579)
>>>>     at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1980)
>>>>     at
>>>> sun.security.ssl.SSLSocketImpl.waitForClose(SSLSocketImpl.java:1793)
>>>>     at
>>>> sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1592)
>>>>     at
>>>> sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1726)
>>>>     at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:1615)
>>>>     at ssl.SSLClient.close(SSLClient.java:143)
>>>>     at
>>>> ssl.SocketTimeoutCloseHang.ReadHang.testSSLServer(ReadHang.java:77)
>>>>
>>> It is just one possible stacks of many.  There are cases where no
>>> fatal()
>>> get called.  For example, application call close() method directly.
>>>
>>> Xuelei
Reply | Threaded
Open this post in threaded view
|

Re: [RFR] 8184328: JDK8u131 socketRead0 hang at SSL read

Xuelei Fan-2


> On Sep 22, 2017, at 6:06 AM, Rob McKenna <[hidden email]> wrote:
>
> Thanks Xuelei, webrev below:
>
> http://cr.openjdk.java.net/~robm/8184328/webrev.02/
>
Looks fine to me.

> Presumably this won't require a CSR?
>

Agreed.

Xuelei

>    -Rob
>
>> On 15/09/17 04:38, Xuelei Fan wrote:
>> I still prefer to not-depends on socket receiving timeout.  But I'm fine if
>> you want to move on with it.
>>
>> As we can close the super socket in the current implementation, it implies
>> that application can handle it already.  So you may not need the system
>> property and 5 times retries.  I think it's fine just call fatal() for the
>> first timeout.
>>
>> Xuelei
>>
>>> On 9/15/2017 4:32 PM, Xuelei Fan wrote:
>>>> On 9/15/2017 8:22 AM, Rob McKenna wrote:
>>>> This test calls close directly. (3rd last line in the stack)
>>>>
>>>> I believe this is the only possible stack (with the new parameter) once
>>>> autoclose is set to false. If autoclose is true we'd skip the call to
>>>> waitForClose and just go directly to Socket.close() unless I'm mistaken.
>>>>
>>> I did not find the call to fatal() in the current implementation.  I think
>>> you mean you added the call to fatal() in your update so that when
>>> timeout, a fatal() will always get called?
>>>
>>> Thinking about two things:
>>> 1. application have to set receiving timeout in order to  get receiving
>>> timeout.
>>> I have a concern about it, as described in other comments.
>>>
>>> 2. can we close the super socket?
>>> It is a surprise to me to close super socket even we don't allocate it. It
>>> does not feel right to me, but this is the current behavior.  All right, I
>>> get your point.
>>>
>>> Xuelei
>>>
>>>>      -Rob
>>>>
>>>>> On 15/09/17 07:55, Xuelei Fan wrote:
>>>>>> On 9/15/2017 7:41 AM, Rob McKenna wrote:
>>>>>>> On 15/09/17 07:07, Xuelei Fan wrote:
>>>>>>>> On 9/15/2017 7:00 AM, Rob McKenna wrote:
>>>>>>>> When we call close() on the SSLSocket that calls close() on the
>>>>>>>> underlying java Socket which closes the native socket.
>>>>>>>>
>>>>>>> Sorry, I did not get the point.  Please see the close()
>>>>>>> implementation of
>>>>>>> SSLSocket (sun.security.ssl.SSLSocketImpl.close()) about the details.
>>>>>>
>>>>>> Running my original test against an instrumented 8u-dev produces the
>>>>>> following:
>>>>>>
>>>>>> java.lang.Exception: Stack trace
>>>>>>     at java.lang.Thread.dumpStack(Thread.java:1336)
>>>>>>     at java.net.Socket.close(Socket.java:1491)
>>>>>>     at
>>>>>> sun.security.ssl.BaseSSLSocketImpl.close(BaseSSLSocketImpl.java:624)
>>>>>>     at
>>>>>> sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1579)
>>>>>>     at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1980)
>>>>>>     at
>>>>>> sun.security.ssl.SSLSocketImpl.waitForClose(SSLSocketImpl.java:1793)
>>>>>>     at
>>>>>> sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1592)
>>>>>>     at
>>>>>> sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1726)
>>>>>>     at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:1615)
>>>>>>     at ssl.SSLClient.close(SSLClient.java:143)
>>>>>>     at
>>>>>> ssl.SocketTimeoutCloseHang.ReadHang.testSSLServer(ReadHang.java:77)
>>>>>>
>>>>> It is just one possible stacks of many.  There are cases where no
>>>>> fatal()
>>>>> get called.  For example, application call close() method directly.
>>>>>
>>>>> Xuelei