Still plagued by "Agent communication error"

Martin Buchholz-3
We continue to see rare "Agent communication error" problems when running jtreg tests.
We believe something has gone wrong in the JDK under test, but we never get any details.
The failure is correlated with running specific tests, and specific JDKs.
Anecdotally, it appears to be more common with fastdebug JDKs.

A sample snippet:

TEST RESULT: Error. Agent communication error: java.net.SocketException: Broken pipe (Write failed); check console log for any additional details

Re: Still plagued by "Agent communication error"

Alan Bateman
On 13/08/2019 23:39, Martin Buchholz wrote:

> We continue to see rare "Agent communication error" problems when
> running jtreg tests.
> We believe something has gone wrong in the JDK under test, but we
> never get any details.
> The failure is correlated with running specific tests, and specific JDKs.
> Anecdotally, it appears to be more common with fastdebug JDKs.
>
> A sample snippet:
>
> TEST RESULT: Error. Agent communication error:
> java.net.SocketException: Broken pipe (Write failed); check console
> log for any additional details

Are the agent VMs crashing? Maybe the fastdebug builds are hitting
asserts earlier than the crash with product bits. Have you looked at
hs_err logs or core files on the systems?

-Alan

Re: Still plagued by "Agent communication error"

Martin Buchholz-3


On Tue, Aug 13, 2019 at 11:49 PM Alan Bateman <[hidden email]> wrote:
> On 13/08/2019 23:39, Martin Buchholz wrote:
>> We continue to see rare "Agent communication error" problems when
>> running jtreg tests.
>> We believe something has gone wrong in the JDK under test, but we
>> never get any details.
>> The failure is correlated with running specific tests, and specific JDKs.
>> Anecdotally, it appears to be more common with fastdebug JDKs.
>>
>> A sample snippet:
>>
>> TEST RESULT: Error. Agent communication error:
>> java.net.SocketException: Broken pipe (Write failed); check console
>> log for any additional details
> Are the agent VMs crashing? Maybe the fastdebug builds are hitting
> asserts earlier than the crash with product bits. Have you looked at
> hs_err logs or core files on the systems?

It's not so easy for us to get hs_err log files; we only get jtreg stdout/stderr.
While we could/should improve our infrastructure ...
jtreg provides helpful diagnostics in other cases, e.g. a thread dump on test timeout, so it would be good for it to be helpful even when communication with the agent breaks down. Maybe jtreg should use one of those VM flags to get agent VMs to send failure data to stderr?
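
One flag that may fit here is HotSpot's `-XX:+ErrorFileToStderr`, which, as far as I know, is available from JDK 13 onward and sends the hs_err report to stderr instead of a file. The sketch below only assembles a jtreg command line that passes it to the test VMs via `-vmoption:`. The class name, the JDK path, and the test path are hypothetical placeholders, and whether jtreg then surfaces the agent's stderr is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles a jtreg invocation; not part of jtreg. */
public class JtregCrashFlags {

    /** Builds a jtreg command line that asks test VMs to print crash reports to stderr. */
    static List<String> jtregCommand(String jdkHome, String testPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("jtreg");
        cmd.add("-agentvm");                            // reuse agent VMs across tests
        cmd.add("-jdk:" + jdkHome);                     // JDK under test
        // Assumption: the JDK under test recognizes this flag (JDK 13+ to my knowledge).
        cmd.add("-vmoption:-XX:+ErrorFileToStderr");
        cmd.add(testPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder paths for illustration only.
        System.out.println(String.join(" ", jtregCommand("/path/to/jdk", "test/jdk/java/util")));
    }
}
```

On a JDK that does not know the flag, the VM would refuse to start, so this is only appropriate when the JDK under test is recent enough.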
 

Re: Still plagued by "Agent communication error"

Martin Buchholz-3
Here's another way to look at it: We have a failure to communicate with an agent process. That's probably a subprocess. It probably died with some serious error, such as a HotSpot crash or OOM. The agent process probably printed something helpful to stderr before it terminated. What happened to that output? The failure is probably related to whatever test it was supposed to be running at the time, so I'd want that information in e.g. the jtr file.
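
To make the idea concrete, here is a minimal sketch of a parent process that keeps the last few lines of a child's stderr and attaches them to a failure report once the child exits. This is not how jtreg is implemented; the class `AgentStderrTail` and its methods are hypothetical names, and the demo in `main` provokes a startup failure with a deliberately bogus VM option rather than a real crash.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper; illustrates the idea only and is not part of jtreg. */
public class AgentStderrTail {

    /** Reads the stream to EOF and returns at most maxLines trailing lines. */
    static List<String> tail(InputStream in, int maxLines) {
        Deque<String> last = new ArrayDeque<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                last.addLast(line);
                if (last.size() > maxLines) {
                    last.removeFirst();   // drop the oldest line to bound memory
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(last);
    }

    /** Formats the exit code and the captured stderr lines for a test report. */
    static String formatReport(int exitCode, List<String> stderrTail) {
        StringBuilder sb = new StringBuilder();
        sb.append("agent exited with code ").append(exitCode)
          .append("; last ").append(stderrTail.size()).append(" stderr line(s):");
        for (String line : stderrTail) {
            sb.append(System.lineSeparator()).append("    ").append(line);
        }
        return sb.toString();
    }

    /** Runs the command, drains its stderr until it exits, and returns a report. */
    static String runAndReport(List<String> command, int maxLines)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // stdout is not needed here
                .start();
        List<String> lines = tail(p.getErrorStream(), maxLines);
        int exitCode = p.waitFor();
        return formatReport(exitCode, lines);
    }

    public static void main(String[] args) throws Exception {
        String java = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // A made-up VM option, so the child VM fails at startup and complains on stderr.
        System.out.println(runAndReport(Arrays.asList(java, "-XX:NoSuchFlagForThisDemo"), 20));
    }
}
```

A real harness would drain stderr on a separate thread while the agent is still serving tests, and would record which test was in flight when the stream closed.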

On Wed, Aug 14, 2019 at 12:28 AM Martin Buchholz <[hidden email]> wrote:

> It's not so easy for us to get hs_err log files; we only get jtreg stdout/stderr.
> While we could/should improve our infrastructure ...
> jtreg provides helpful diagnostics in other cases, e.g. a thread dump on test timeout, so it would be good for it to be helpful even when communication with the agent breaks down. Maybe jtreg should use one of those VM flags to get agent VMs to send failure data to stderr?