RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
Virus checkers and, on more modern Windows OSes, Windows Experience
Service, and other external processes, have been causing trouble when it
comes to tests deleting files.

On a number of occasions in the past retry logic has been added to tests
that need to delete files. Also, the jtreg harness has similar logic
when cleaning up temporary files.

This has reared its head again with:
   8027902: TEST_BUG: java/net/URLClassLoader/closetest/CloseTest.java
and java/net/URLClassLoader/closetest/GetResourceAsStream.java failed
due to dir operation failed.
   8022213: Intermittent test failures in java/net/URLClassLoader

I'd like to add file utility functions to the common jdk/testlibrary,
starting with a "more reliable" delete. I've made minimal changes to the
URLClassloader tests to use this ( more could be done, but this is just
a start ).

http://cr.openjdk.java.net/~chegar/fileUtils/webrev/

The important part here for review is the "API" and impl in FileUtils.

Thanks,
-Chris.
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Michael McMahon
Chris,

Would it be useful to add some instrumentation/logging (to System.err)
if it's taking
more than one iteration to delete a file? We could end up with degraded
test performance if there is a general problem deleting files, and
otherwise would have
no way of understanding what the problem is (eg. up to 7.5 seconds are
allowed to delete the file)

Also, just curious what is the value in throwing InterruptedException
out of the delete() method?
It seems onerous for calling code to have to catch it.

Michael

On 07/11/13 10:05, Chris Hegarty wrote:

> Virus checkers and, on more modern Windows OSes, Windows Experience
> Service, and other external processes, have been causing trouble when
> it comes to tests deleting files.
>
> On a number of occasions in the past retry logic has been added to
> tests that need to delete files. Also, the jtreg harness has similar
> logic when cleaning up temporary files.
>
> This has reared its head again with:
>   8027902: TEST_BUG: java/net/URLClassLoader/closetest/CloseTest.java
> and java/net/URLClassLoader/closetest/GetResourceAsStream.java failed
> due to dir operation failed.
>   8022213: Intermittent test failures in java/net/URLClassLoader
>
> I'd like to add file utility functions to the common jdk/testlibrary,
> starting with a "more reliable" delete. I've made minimal changes to
> the URLClassloader tests to use this ( more could be done, but this is
> just a start ).
>
> http://cr.openjdk.java.net/~chegar/fileUtils/webrev/
>
> The important part here for review is the "API" and impl in FileUtils.
>
> Thanks,
> -Chris.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
On 11/07/2013 11:19 AM, Michael McMahon wrote:
> Chris,
>
> Would it be useful to add some instrumentation/logging (to System.err)
> if it's taking
> more than one iteration to delete a file? We could end up with degraded
> test performance if there is a general problem deleting files, and
> otherwise would have
> no way of understanding what the problem is (eg. up to 7.5 seconds are
> allowed to delete the file)

I had an optional PrintStream param on delete/deleteTree, but removed
it. If the file fails to delete, then you will see all the suppressed
exceptions, but I agree if it fails a number of times and then succeeds
you do not see this.

I'm not sure that it is worth adding PrintStream ( and this is why I
removed it ) since the external interference occurs infrequently and
does not appear to last long. I just wanted to keep this as simply as
possible, but I could be convinced to reintroduce it.

> Also, just curious what is the value in throwing InterruptedException
> out of the delete() method?
> It seems onerous for calling code to have to catch it.

Right handling this is a pain. Since we may now be possibly sleeping
during a delete I wanted jtreg to be able to interrupt the test and have
the test take correct action, rather than just swallowing it. It is
really annoying to see hung tests in logs that do not respond to jtregs
attempts to stop them.

I've also received another comment offline about the method names. They
should be more descriptive, and highlight the difference between these
methods and regular delete. So maybe move to public deleteWithRetry?

-Chris.

>
> Michael
>
> On 07/11/13 10:05, Chris Hegarty wrote:
>> Virus checkers and, on more modern Windows OSes, Windows Experience
>> Service, and other external processes, have been causing trouble when
>> it comes to tests deleting files.
>>
>> On a number of occasions in the past retry logic has been added to
>> tests that need to delete files. Also, the jtreg harness has similar
>> logic when cleaning up temporary files.
>>
>> This has reared its head again with:
>>   8027902: TEST_BUG: java/net/URLClassLoader/closetest/CloseTest.java
>> and java/net/URLClassLoader/closetest/GetResourceAsStream.java failed
>> due to dir operation failed.
>>   8022213: Intermittent test failures in java/net/URLClassLoader
>>
>> I'd like to add file utility functions to the common jdk/testlibrary,
>> starting with a "more reliable" delete. I've made minimal changes to
>> the URLClassloader tests to use this ( more could be done, but this is
>> just a start ).
>>
>> http://cr.openjdk.java.net/~chegar/fileUtils/webrev/
>>
>> The important part here for review is the "API" and impl in FileUtils.
>>
>> Thanks,
>> -Chris.
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Michael McMahon
On 07/11/13 11:34, Chris Hegarty wrote:

> On 11/07/2013 11:19 AM, Michael McMahon wrote:
>> Chris,
>>
>> Would it be useful to add some instrumentation/logging (to System.err)
>> if it's taking
>> more than one iteration to delete a file? We could end up with degraded
>> test performance if there is a general problem deleting files, and
>> otherwise would have
>> no way of understanding what the problem is (eg. up to 7.5 seconds are
>> allowed to delete the file)
>
> I had an optional PrintStream param on delete/deleteTree, but removed
> it. If the file fails to delete, then you will see all the suppressed
> exceptions, but I agree if it fails a number of times and then
> succeeds you do not see this.
>
> I'm not sure that it is worth adding PrintStream ( and this is why I
> removed it ) since the external interference occurs infrequently and
> does not appear to last long. I just wanted to keep this as simply as
> possible, but I could be convinced to reintroduce it.
>

Personally, I would have thought System.err would be okay, but maybe
there are tests that need to run
without this kind of noise. I agree adding a PrintStream arg is not
worth doing.

>> Also, just curious what is the value in throwing InterruptedException
>> out of the delete() method?
>> It seems onerous for calling code to have to catch it.
>
> Right handling this is a pain. Since we may now be possibly sleeping
> during a delete I wanted jtreg to be able to interrupt the test and
> have the test take correct action, rather than just swallowing it. It
> is really annoying to see hung tests in logs that do not respond to
> jtregs attempts to stop them.
>

I wasn't suggesting to swallow the exception - rather to wrap it in an
unchecked exception. The issue
is really only whether callers need to distinguish this situation from
the various other errors and environmental
problems  that will all probably cause a test to be interrupted/stopped.

Michael

> I've also received another comment offline about the method names.
> They should be more descriptive, and highlight the difference between
> these methods and regular delete. So maybe move to public
> deleteWithRetry?
>
> -Chris.
>
>>
>> Michael
>>
>> On 07/11/13 10:05, Chris Hegarty wrote:
>>> Virus checkers and, on more modern Windows OSes, Windows Experience
>>> Service, and other external processes, have been causing trouble when
>>> it comes to tests deleting files.
>>>
>>> On a number of occasions in the past retry logic has been added to
>>> tests that need to delete files. Also, the jtreg harness has similar
>>> logic when cleaning up temporary files.
>>>
>>> This has reared its head again with:
>>>   8027902: TEST_BUG: java/net/URLClassLoader/closetest/CloseTest.java
>>> and java/net/URLClassLoader/closetest/GetResourceAsStream.java failed
>>> due to dir operation failed.
>>>   8022213: Intermittent test failures in java/net/URLClassLoader
>>>
>>> I'd like to add file utility functions to the common jdk/testlibrary,
>>> starting with a "more reliable" delete. I've made minimal changes to
>>> the URLClassloader tests to use this ( more could be done, but this is
>>> just a start ).
>>>
>>> http://cr.openjdk.java.net/~chegar/fileUtils/webrev/
>>>
>>> The important part here for review is the "API" and impl in FileUtils.
>>>
>>> Thanks,
>>> -Chris.
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Alan Bateman
In reply to this post by Chris Hegarty
On 07/11/2013 11:34, Chris Hegarty wrote:
> :
>
> I've also received another comment offline about the method names.
> They should be more descriptive, and highlight the difference between
> these methods and regular delete. So maybe move to public
> deleteWithRetry?
That might be a bit better as otherwise it's not obvious how it differs
from Files.delete. Also to keep the names consistent then
deleteTreeWithRetry should probably be deleteFileTreeWithRetry.

On interrupting the sleep then it looks like the methods aren't
consistent, one will throw InterruptedException, the other will complete
early without an exception and without the interrupt status set.

If you want then you could extend SimpleFileVisitor instead, that would
allow you to drop the preVisitDirectory method.

A passing comment on closetest/Command is that the copyFile method could
use Files.copy with REPLACE_EXISTING.

Also rm_minus_rf (amusing name) doesn't need to use isFile or
isDirectory as FilesUtil.deleteFIleTree will do the right thing.

-Alan.
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
Given both Michael and Alan's comments. I've update the webrev:
   http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/

1) more descriptive method names
2) deleteXXX methods return if interrupted, leaving the
    interrupt status set
3) Use Files.copy with REPLACE_EXISTING
4) Use SimpleFileVisitor, rather than FileVisitor

Thanks,
-Chris.

On 11/07/2013 01:24 PM, Alan Bateman wrote:

> On 07/11/2013 11:34, Chris Hegarty wrote:
>> :
>>
>> I've also received another comment offline about the method names.
>> They should be more descriptive, and highlight the difference between
>> these methods and regular delete. So maybe move to public
>> deleteWithRetry?
> That might be a bit better as otherwise it's not obvious how it differs
> from Files.delete. Also to keep the names consistent then
> deleteTreeWithRetry should probably be deleteFileTreeWithRetry.
>
> On interrupting the sleep then it looks like the methods aren't
> consistent, one will throw InterruptedException, the other will complete
> early without an exception and without the interrupt status set.
>
> If you want then you could extend SimpleFileVisitor instead, that would
> allow you to drop the preVisitDirectory method.
>
> A passing comment on closetest/Command is that the copyFile method could
> use Files.copy with REPLACE_EXISTING.
>
> Also rm_minus_rf (amusing name) doesn't need to use isFile or
> isDirectory as FilesUtil.deleteFIleTree will do the right thing.
>
> -Alan.
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Alan Bateman
On 07/11/2013 14:59, Chris Hegarty wrote:
> Given both Michael and Alan's comments. I've update the webrev:
>   http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>
> 1) more descriptive method names
> 2) deleteXXX methods return if interrupted, leaving the
>    interrupt status set
> 3) Use Files.copy with REPLACE_EXISTING
> 4) Use SimpleFileVisitor, rather than FileVisitor
>
This looks better although interrupting the sleep means that the
deleteXXX will quietly terminate with the interrupt status set (which
could be awkward to diagnose if used with tests that are also using
Thread.interrupt). An alternative might be to just throw the IOException
with InterruptedException as the cause.

-Alan.


Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Dan Xu

On 11/07/2013 11:04 AM, Alan Bateman wrote:
On 07/11/2013 14:59, Chris Hegarty wrote:
Given both Michael and Alan's comments. I've update the webrev:
  http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/

1) more descriptive method names
2) deleteXXX methods return if interrupted, leaving the
   interrupt status set
3) Use Files.copy with REPLACE_EXISTING
4) Use SimpleFileVisitor, rather than FileVisitor

This looks better although interrupting the sleep means that the deleteXXX will quietly terminate with the interrupt status set (which could be awkward to diagnose if used with tests that are also using Thread.interrupt). An alternative might be to just throw the IOException with InterruptedException as the cause.

-Alan.


Hi Chris,

In the method, deleteFileWithRetry0(), it assumes that if any other process is accessing the same file, the delete operation, Files.delete(), will throw out IOException on Windows. But I don't see this assumption is always true when I investigated this issue on intermittent test failures.

When Files.delete() method is called, it finally calls DeleteFile or RemoveDirectory functions based on whether the target is a file or directory. And these Windows APIs only mark the target for deletion on close and return immediately without waiting the operation to be completed. If another process is accessing the file in the meantime, the delete operation does not occur and the target file stays at delete-pending status until that open handle is closed. It basically implies that DeleteFile and RemoveDirectory is like an async operation. Therefore, we cannot assume that the file/directory is deleted after Files.delete() returns or File.delete() returns true.

When checking those intermittently test failures, I find the test normally succeeds on the Files.delete() call. But due to the interference of Anti-virus or other Windows daemon services, the target file changes to delete-pending status. And the immediately following operation fails due the target file still exists, but our tests assume the target file is already gone. Because the delete-pending status of a file usually last for a very short time which depends on the interference source, such failures normally happens when we recursively delete a folder or delete-and-create a file with the same file name at a high frequency.

It is basically a Windows API design or implementation issue. I have logged an enhancement, JDK-8024496, to solve it from Java library layer. Currently, I have two strategies in mind. One is to make the delete operation blocking, which means to make sure the file/directory is deleted before the return. The other is to make sure the delete-pending file does not lead to a failure of subsequent file operations. But they both has pros and cons.

Thank!

-Dan
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
Alan,

 > An alternative might be to just throw the IOException with
 > InterruptedException as the cause.

Perfect. Updated in the new webrev.

Dan,

You are completely correct. I was only catering for the case where
   "java.nio.file.FileSystemException: <your_file>: The process cannot
    access the file because it is being used by another process."

Where the delete "succeeds" then we need to wait until the underlying
platform delete completes, i.e. the file no longer exists.

Updated webrev ( with only the diff from the previous ) :
   http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/

Thanks,
-Chris.


On 08/11/2013 02:26, Dan Xu wrote:

>
> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>> Given both Michael and Alan's comments. I've update the webrev:
>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>
>>> 1) more descriptive method names
>>> 2) deleteXXX methods return if interrupted, leaving the
>>> interrupt status set
>>> 3) Use Files.copy with REPLACE_EXISTING
>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>
>> This looks better although interrupting the sleep means that the
>> deleteXXX will quietly terminate with the interrupt status set (which
>> could be awkward to diagnose if used with tests that are also using
>> Thread.interrupt). An alternative might be to just throw the
>> IOException with InterruptedException as the cause.
>>
>> -Alan.
>>
>>
> Hi Chris,
>
> In the method, deleteFileWithRetry0(), it assumes that if any other
> process is accessing the same file, the delete operation,
> Files.delete(), will throw out IOException on Windows. But I don't see
> this assumption is always true when I investigated this issue on
> intermittent test failures.
>
> When Files.delete() method is called, it finally calls DeleteFile or
> RemoveDirectory functions based on whether the target is a file or
> directory. And these Windows APIs only mark the target for deletion on
> close and return immediately without waiting the operation to be
> completed. If another process is accessing the file in the meantime, the
> delete operation does not occur and the target file stays at
> delete-pending status until that open handle is closed. It basically
> implies that DeleteFile and RemoveDirectory is like an async operation.
> Therefore, we cannot assume that the file/directory is deleted after
> Files.delete() returns or File.delete() returns true.
>
> When checking those intermittently test failures, I find the test
> normally succeeds on the Files.delete() call. But due to the
> interference of Anti-virus or other Windows daemon services, the target
> file changes to delete-pending status. And the immediately following
> operation fails due the target file still exists, but our tests assume
> the target file is already gone. Because the delete-pending status of a
> file usually last for a very short time which depends on the
> interference source, such failures normally happens when we recursively
> delete a folder or delete-and-create a file with the same file name at a
> high frequency.
>
> It is basically a Windows API design or implementation issue. I have
> logged an enhancement, JDK-8024496, to solve it from Java library layer.
> Currently, I have two strategies in mind. One is to make the delete
> operation blocking, which means to make sure the file/directory is
> deleted before the return. The other is to make sure the delete-pending
> file does not lead to a failure of subsequent file operations. But they
> both has pros and cons.
>
> Thank!
>
> -Dan
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

roger riggs
Hi,

Does renaming the file/directory suffer the same delay?

I could see a cleanup mechanism that renames them to hidden files (or
entirely out of the work directory)
and then deletes them.  That would immediately clear the namespace for
tests to proceed.

That technique should work on all platforms.

Roger

On 11/8/2013 9:47 AM, Chris Hegarty wrote:

> Alan,
>
> > An alternative might be to just throw the IOException with
> > InterruptedException as the cause.
>
> Perfect. Updated in the new webrev.
>
> Dan,
>
> You are completely correct. I was only catering for the case where
>   "java.nio.file.FileSystemException: <your_file>: The process cannot
>    access the file because it is being used by another process."
>
> Where the delete "succeeds" then we need to wait until the underlying
> platform delete completes, i.e. the file no longer exists.
>
> Updated webrev ( with only the diff from the previous ) :
>   http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>
> Thanks,
> -Chris.
>
>
> On 08/11/2013 02:26, Dan Xu wrote:
>>
>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>> Given both Michael and Alan's comments. I've update the webrev:
>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>
>>>> 1) more descriptive method names
>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>> interrupt status set
>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>
>>> This looks better although interrupting the sleep means that the
>>> deleteXXX will quietly terminate with the interrupt status set (which
>>> could be awkward to diagnose if used with tests that are also using
>>> Thread.interrupt). An alternative might be to just throw the
>>> IOException with InterruptedException as the cause.
>>>
>>> -Alan.
>>>
>>>
>> Hi Chris,
>>
>> In the method, deleteFileWithRetry0(), it assumes that if any other
>> process is accessing the same file, the delete operation,
>> Files.delete(), will throw out IOException on Windows. But I don't see
>> this assumption is always true when I investigated this issue on
>> intermittent test failures.
>>
>> When Files.delete() method is called, it finally calls DeleteFile or
>> RemoveDirectory functions based on whether the target is a file or
>> directory. And these Windows APIs only mark the target for deletion on
>> close and return immediately without waiting the operation to be
>> completed. If another process is accessing the file in the meantime, the
>> delete operation does not occur and the target file stays at
>> delete-pending status until that open handle is closed. It basically
>> implies that DeleteFile and RemoveDirectory is like an async operation.
>> Therefore, we cannot assume that the file/directory is deleted after
>> Files.delete() returns or File.delete() returns true.
>>
>> When checking those intermittently test failures, I find the test
>> normally succeeds on the Files.delete() call. But due to the
>> interference of Anti-virus or other Windows daemon services, the target
>> file changes to delete-pending status. And the immediately following
>> operation fails due the target file still exists, but our tests assume
>> the target file is already gone. Because the delete-pending status of a
>> file usually last for a very short time which depends on the
>> interference source, such failures normally happens when we recursively
>> delete a folder or delete-and-create a file with the same file name at a
>> high frequency.
>>
>> It is basically a Windows API design or implementation issue. I have
>> logged an enhancement, JDK-8024496, to solve it from Java library layer.
>> Currently, I have two strategies in mind. One is to make the delete
>> operation blocking, which means to make sure the file/directory is
>> deleted before the return. The other is to make sure the delete-pending
>> file does not lead to a failure of subsequent file operations. But they
>> both has pros and cons.
>>
>> Thank!
>>
>> -Dan

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
On 08/11/2013 15:01, roger riggs wrote:
> Hi,
>
> Does renaming the file/directory suffer the same delay?

I have not tried, but I read that MoveFileEx does not suffer from this.

> I could see a cleanup mechanism that renames them to hidden files (or
> entirely out of the work directory)
> and then deletes them. That would immediately clear the namespace for
> tests to proceed.

Given the above, then I do think that this idea has potential, but I
haven't looked into it further, yet. More investigation needed.

We have used the retry technique in a few places in the jdk tests. All I
am trying to do here is prevent everyone from writing their own version
of this.

Maybe we could go with what I have for now (pending reviews), and
revisit later, if needed. I'm scared to open a discussion on where to
move test files to ;-)

-Chris.

>
> That technique should work on all platforms.
>
> Roger
>
> On 11/8/2013 9:47 AM, Chris Hegarty wrote:
>> Alan,
>>
>> > An alternative might be to just throw the IOException with
>> > InterruptedException as the cause.
>>
>> Perfect. Updated in the new webrev.
>>
>> Dan,
>>
>> You are completely correct. I was only catering for the case where
>> "java.nio.file.FileSystemException: <your_file>: The process cannot
>> access the file because it is being used by another process."
>>
>> Where the delete "succeeds" then we need to wait until the underlying
>> platform delete completes, i.e. the file no longer exists.
>>
>> Updated webrev ( with only the diff from the previous ) :
>> http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>>
>> Thanks,
>> -Chris.
>>
>>
>> On 08/11/2013 02:26, Dan Xu wrote:
>>>
>>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>>> Given both Michael and Alan's comments. I've update the webrev:
>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>>
>>>>> 1) more descriptive method names
>>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>>> interrupt status set
>>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>>
>>>> This looks better although interrupting the sleep means that the
>>>> deleteXXX will quietly terminate with the interrupt status set (which
>>>> could be awkward to diagnose if used with tests that are also using
>>>> Thread.interrupt). An alternative might be to just throw the
>>>> IOException with InterruptedException as the cause.
>>>>
>>>> -Alan.
>>>>
>>>>
>>> Hi Chris,
>>>
>>> In the method, deleteFileWithRetry0(), it assumes that if any other
>>> process is accessing the same file, the delete operation,
>>> Files.delete(), will throw out IOException on Windows. But I don't see
>>> this assumption is always true when I investigated this issue on
>>> intermittent test failures.
>>>
>>> When Files.delete() method is called, it finally calls DeleteFile or
>>> RemoveDirectory functions based on whether the target is a file or
>>> directory. And these Windows APIs only mark the target for deletion on
>>> close and return immediately without waiting the operation to be
>>> completed. If another process is accessing the file in the meantime, the
>>> delete operation does not occur and the target file stays at
>>> delete-pending status until that open handle is closed. It basically
>>> implies that DeleteFile and RemoveDirectory is like an async operation.
>>> Therefore, we cannot assume that the file/directory is deleted after
>>> Files.delete() returns or File.delete() returns true.
>>>
>>> When checking those intermittently test failures, I find the test
>>> normally succeeds on the Files.delete() call. But due to the
>>> interference of Anti-virus or other Windows daemon services, the target
>>> file changes to delete-pending status. And the immediately following
>>> operation fails due the target file still exists, but our tests assume
>>> the target file is already gone. Because the delete-pending status of a
>>> file usually last for a very short time which depends on the
>>> interference source, such failures normally happens when we recursively
>>> delete a folder or delete-and-create a file with the same file name at a
>>> high frequency.
>>>
>>> It is basically a Windows API design or implementation issue. I have
>>> logged an enhancement, JDK-8024496, to solve it from Java library layer.
>>> Currently, I have two strategies in mind. One is to make the delete
>>> operation blocking, which means to make sure the file/directory is
>>> deleted before the return. The other is to make sure the delete-pending
>>> file does not lead to a failure of subsequent file operations. But they
>>> both has pros and cons.
>>>
>>> Thank!
>>>
>>> -Dan
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Dan Xu
Hi Chris,

In deleteFileWithRetry0(), the following lines

79                 while (true) {
80                     if (Files.notExists(path))
81                          break;

can be combined to

while (Files.exists(path)) {
...

And L99
99                Thread.sleep(RETRY_DELETE_MILLIS);
seems not indented correctly.

Thanks,

-Dan

On 11/08/2013 07:35 AM, Chris Hegarty wrote:
On 08/11/2013 15:01, roger riggs wrote:
Hi,

Does renaming the file/directory suffer the same delay?

I have not tried, but I read that MoveFileEx does not suffer from this.

I could see a cleanup mechanism that renames them to hidden files (or
entirely out of the work directory)
and then deletes them. That would immediately clear the namespace for
tests to proceed.

Given the above, then I do think that this idea has potential, but I haven't looked into it further, yet. More investigation needed.

We have used the retry technique in a few places in the jdk tests. All I am trying to do here is prevent everyone from writing their own version of this.

Maybe we could go with what I have for now (pending reviews), and revisit later, if needed. I'm scared to open a discussion on where to move test files to ;-)

-Chris.


That technique should work on all platforms.

Roger

On 11/8/2013 9:47 AM, Chris Hegarty wrote:
Alan,

> An alternative might be to just throw the IOException with
> InterruptedException as the cause.

Perfect. Updated in the new webrev.

Dan,

You are completely correct. I was only catering for the case where
"java.nio.file.FileSystemException: <your_file>: The process cannot
access the file because it is being used by another process."

Where the delete "succeeds" then we need to wait until the underlying
platform delete completes, i.e. the file no longer exists.

Updated webrev ( with only the diff from the previous ) :
http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/

Thanks,
-Chris.


On 08/11/2013 02:26, Dan Xu wrote:

On 11/07/2013 11:04 AM, Alan Bateman wrote:
On 07/11/2013 14:59, Chris Hegarty wrote:
Given both Michael and Alan's comments. I've update the webrev:
http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/

1) more descriptive method names
2) deleteXXX methods return if interrupted, leaving the
interrupt status set
3) Use Files.copy with REPLACE_EXISTING
4) Use SimpleFileVisitor, rather than FileVisitor

This looks better although interrupting the sleep means that the
deleteXXX will quietly terminate with the interrupt status set (which
could be awkward to diagnose if used with tests that are also using
Thread.interrupt). An alternative might be to just throw the
IOException with InterruptedException as the cause.

-Alan.


Hi Chris,

In the method, deleteFileWithRetry0(), it assumes that if any other
process is accessing the same file, the delete operation,
Files.delete(), will throw out IOException on Windows. But I don't see
this assumption is always true when I investigated this issue on
intermittent test failures.

When Files.delete() method is called, it finally calls DeleteFile or
RemoveDirectory functions based on whether the target is a file or
directory. And these Windows APIs only mark the target for deletion on
close and return immediately without waiting the operation to be
completed. If another process is accessing the file in the meantime, the
delete operation does not occur and the target file stays at
delete-pending status until that open handle is closed. It basically
implies that DeleteFile and RemoveDirectory is like an async operation.
Therefore, we cannot assume that the file/directory is deleted after
Files.delete() returns or File.delete() returns true.

When checking those intermittently test failures, I find the test
normally succeeds on the Files.delete() call. But due to the
interference of Anti-virus or other Windows daemon services, the target
file changes to delete-pending status. And the immediately following
operation fails due the target file still exists, but our tests assume
the target file is already gone. Because the delete-pending status of a
file usually last for a very short time which depends on the
interference source, such failures normally happens when we recursively
delete a folder or delete-and-create a file with the same file name at a
high frequency.

It is basically a Windows API design or implementation issue. I have
logged an enhancement, JDK-8024496, to solve it from Java library layer.
Currently, I have two strategies in mind. One is to make the delete
operation blocking, which means to make sure the file/directory is
deleted before the return. The other is to make sure the delete-pending
file does not lead to a failure of subsequent file operations. But they
both has pros and cons.

Thank!

-Dan


Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Chris Hegarty
Thanks Dan,

I'll make the changes before pushing.

-Chris.

On 09/11/2013 05:43, Dan Xu wrote:

> Hi Chris,
>
> In deleteFileWithRetry0(), the following lines
>
> 79 while (true) {
> 80 if (Files.notExists(path))
> 81 break;
>
> can be combined to
>
> while (Files.exists(path)) {
> ...
>
> And L99
>
> 99                Thread.sleep(RETRY_DELETE_MILLIS);
>
> seems not indented correctly.
>
> Thanks,
>
> -Dan
>
> On 11/08/2013 07:35 AM, Chris Hegarty wrote:
>> On 08/11/2013 15:01, roger riggs wrote:
>>> Hi,
>>>
>>> Does renaming the file/directory suffer the same delay?
>>
>> I have not tried, but I read that MoveFileEx does not suffer from this.
>>
>>> I could see a cleanup mechanism that renames them to hidden files (or
>>> entirely out of the work directory)
>>> and then deletes them. That would immediately clear the namespace for
>>> tests to proceed.
>>
>> Given the above, then I do think that this idea has potential, but I
>> haven't looked into it further, yet. More investigation needed.
>>
>> We have used the retry technique in a few places in the jdk tests. All
>> I am trying to do here is prevent everyone from writing their own
>> version of this.
>>
>> Maybe we could go with what I have for now (pending reviews), and
>> revisit later, if needed. I'm scared to open a discussion on where to
>> move test files to ;-)
>>
>> -Chris.
>>
>>>
>>> That technique should work on all platforms.
>>>
>>> Roger
>>>
>>> On 11/8/2013 9:47 AM, Chris Hegarty wrote:
>>>> Alan,
>>>>
>>>> > An alternative might be to just throw the IOException with
>>>> > InterruptedException as the cause.
>>>>
>>>> Perfect. Updated in the new webrev.
>>>>
>>>> Dan,
>>>>
>>>> You are completely correct. I was only catering for the case where
>>>> "java.nio.file.FileSystemException: <your_file>: The process cannot
>>>> access the file because it is being used by another process."
>>>>
>>>> Where the delete "succeeds" then we need to wait until the underlying
>>>> platform delete completes, i.e. the file no longer exists.
>>>>
>>>> Updated webrev ( with only the diff from the previous ) :
>>>> http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>>>>
>>>> Thanks,
>>>> -Chris.
>>>>
>>>>
>>>> On 08/11/2013 02:26, Dan Xu wrote:
>>>>>
>>>>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>>>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>>>>> Given both Michael and Alan's comments. I've update the webrev:
>>>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>>>>
>>>>>>> 1) more descriptive method names
>>>>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>>>>> interrupt status set
>>>>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>>>>
>>>>>> This looks better although interrupting the sleep means that the
>>>>>> deleteXXX will quietly terminate with the interrupt status set (which
>>>>>> could be awkward to diagnose if used with tests that are also using
>>>>>> Thread.interrupt). An alternative might be to just throw the
>>>>>> IOException with InterruptedException as the cause.
>>>>>>
>>>>>> -Alan.
>>>>>>
>>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> In the method, deleteFileWithRetry0(), it assumes that if any other
>>>>> process is accessing the same file, the delete operation,
>>>>> Files.delete(), will throw out IOException on Windows. But I don't see
>>>>> this assumption is always true when I investigated this issue on
>>>>> intermittent test failures.
>>>>>
>>>>> When Files.delete() method is called, it finally calls DeleteFile or
>>>>> RemoveDirectory functions based on whether the target is a file or
>>>>> directory. And these Windows APIs only mark the target for deletion on
>>>>> close and return immediately without waiting the operation to be
>>>>> completed. If another process is accessing the file in the
>>>>> meantime, the
>>>>> delete operation does not occur and the target file stays at
>>>>> delete-pending status until that open handle is closed. It basically
>>>>> implies that DeleteFile and RemoveDirectory is like an async
>>>>> operation.
>>>>> Therefore, we cannot assume that the file/directory is deleted after
>>>>> Files.delete() returns or File.delete() returns true.
>>>>>
>>>>> When checking those intermittently test failures, I find the test
>>>>> normally succeeds on the Files.delete() call. But due to the
>>>>> interference of Anti-virus or other Windows daemon services, the
>>>>> target
>>>>> file changes to delete-pending status. And the immediately following
>>>>> operation fails due the target file still exists, but our tests assume
>>>>> the target file is already gone. Because the delete-pending status
>>>>> of a
>>>>> file usually last for a very short time which depends on the
>>>>> interference source, such failures normally happens when we
>>>>> recursively
>>>>> delete a folder or delete-and-create a file with the same file name
>>>>> at a
>>>>> high frequency.
>>>>>
>>>>> It is basically a Windows API design or implementation issue. I have
>>>>> logged an enhancement, JDK-8024496, to solve it from Java library
>>>>> layer.
>>>>> Currently, I have two strategies in mind. One is to make the delete
>>>>> operation blocking, which means to make sure the file/directory is
>>>>> deleted before the return. The other is to make sure the
>>>>> delete-pending
>>>>> file does not lead to a failure of subsequent file operations. But
>>>>> they
>>>>> both has pros and cons.
>>>>>
>>>>> Thank!
>>>>>
>>>>> -Dan
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Dan Xu
Thank you, Chris. Others look good to me. Thanks for helping solve this
hard problem!

-Dan

On 11/11/2013 02:41 AM, Chris Hegarty wrote:

> Thanks Dan,
>
> I'll make the changes before pushing.
>
> -Chris.
>
> On 09/11/2013 05:43, Dan Xu wrote:
>> Hi Chris,
>>
>> In deleteFileWithRetry0(), the following lines
>>
>> 79 while (true) {
>> 80 if (Files.notExists(path))
>> 81 break;
>>
>> can be combined to
>>
>> while (Files.exists(path)) {
>> ...
>>
>> And L99
>>
>> 99                Thread.sleep(RETRY_DELETE_MILLIS);
>>
>> seems not indented correctly.
>>
>> Thanks,
>>
>> -Dan
>>
>> On 11/08/2013 07:35 AM, Chris Hegarty wrote:
>>> On 08/11/2013 15:01, roger riggs wrote:
>>>> Hi,
>>>>
>>>> Does renaming the file/directory suffer the same delay?
>>>
>>> I have not tried, but I read that MoveFileEx does not suffer from this.
>>>
>>>> I could see a cleanup mechanism that renames them to hidden files (or
>>>> entirely out of the work directory)
>>>> and then deletes them. That would immediately clear the namespace for
>>>> tests to proceed.
>>>
>>> Given the above, then I do think that this idea has potential, but I
>>> haven't looked into it further, yet. More investigation needed.
>>>
>>> We have used the retry technique in a few places in the jdk tests. All
>>> I am trying to do here is prevent everyone from writing their own
>>> version of this.
>>>
>>> Maybe we could go with what I have for now (pending reviews), and
>>> revisit later, if needed. I'm scared to open a discussion on where to
>>> move test files to ;-)
>>>
>>> -Chris.
>>>
>>>>
>>>> That technique should work on all platforms.
>>>>
>>>> Roger
>>>>
>>>> On 11/8/2013 9:47 AM, Chris Hegarty wrote:
>>>>> Alan,
>>>>>
>>>>> > An alternative might be to just throw the IOException with
>>>>> > InterruptedException as the cause.
>>>>>
>>>>> Perfect. Updated in the new webrev.
>>>>>
>>>>> Dan,
>>>>>
>>>>> You are completely correct. I was only catering for the case where
>>>>> "java.nio.file.FileSystemException: <your_file>: The process cannot
>>>>> access the file because it is being used by another process."
>>>>>
>>>>> Where the delete "succeeds" then we need to wait until the underlying
>>>>> platform delete completes, i.e. the file no longer exists.
>>>>>
>>>>> Updated webrev ( with only the diff from the previous ) :
>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>>>>>
>>>>> Thanks,
>>>>> -Chris.
>>>>>
>>>>>
>>>>> On 08/11/2013 02:26, Dan Xu wrote:
>>>>>>
>>>>>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>>>>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>>>>>> Given both Michael and Alan's comments. I've update the webrev:
>>>>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>>>>>
>>>>>>>> 1) more descriptive method names
>>>>>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>>>>>> interrupt status set
>>>>>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>>>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>>>>>
>>>>>>> This looks better although interrupting the sleep means that the
>>>>>>> deleteXXX will quietly terminate with the interrupt status set
>>>>>>> (which
>>>>>>> could be awkward to diagnose if used with tests that are also using
>>>>>>> Thread.interrupt). An alternative might be to just throw the
>>>>>>> IOException with InterruptedException as the cause.
>>>>>>>
>>>>>>> -Alan.
>>>>>>>
>>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> In the method, deleteFileWithRetry0(), it assumes that if any other
>>>>>> process is accessing the same file, the delete operation,
>>>>>> Files.delete(), will throw out IOException on Windows. But I
>>>>>> don't see
>>>>>> this assumption is always true when I investigated this issue on
>>>>>> intermittent test failures.
>>>>>>
>>>>>> When Files.delete() method is called, it finally calls DeleteFile or
>>>>>> RemoveDirectory functions based on whether the target is a file or
>>>>>> directory. And these Windows APIs only mark the target for
>>>>>> deletion on
>>>>>> close and return immediately without waiting the operation to be
>>>>>> completed. If another process is accessing the file in the
>>>>>> meantime, the
>>>>>> delete operation does not occur and the target file stays at
>>>>>> delete-pending status until that open handle is closed. It basically
>>>>>> implies that DeleteFile and RemoveDirectory is like an async
>>>>>> operation.
>>>>>> Therefore, we cannot assume that the file/directory is deleted after
>>>>>> Files.delete() returns or File.delete() returns true.
>>>>>>
>>>>>> When checking those intermittently test failures, I find the test
>>>>>> normally succeeds on the Files.delete() call. But due to the
>>>>>> interference of Anti-virus or other Windows daemon services, the
>>>>>> target
>>>>>> file changes to delete-pending status. And the immediately following
>>>>>> operation fails due the target file still exists, but our tests
>>>>>> assume
>>>>>> the target file is already gone. Because the delete-pending status
>>>>>> of a
>>>>>> file usually last for a very short time which depends on the
>>>>>> interference source, such failures normally happens when we
>>>>>> recursively
>>>>>> delete a folder or delete-and-create a file with the same file name
>>>>>> at a
>>>>>> high frequency.
>>>>>>
>>>>>> It is basically a Windows API design or implementation issue. I have
>>>>>> logged an enhancement, JDK-8024496, to solve it from Java library
>>>>>> layer.
>>>>>> Currently, I have two strategies in mind. One is to make the delete
>>>>>> operation blocking, which means to make sure the file/directory is
>>>>>> deleted before the return. The other is to make sure the
>>>>>> delete-pending
>>>>>> file does not lead to a failure of subsequent file operations. But
>>>>>> they
>>>>>> both has pros and cons.
>>>>>>
>>>>>> Thank!
>>>>>>
>>>>>> -Dan
>>>>
>>