Opened 5 years ago

Last modified 5 years ago

Windows: rmdirRecursive() fails if a contained file is opened

Reported by: Owned by: dl major undecided 0.8.9 windows dl.metrohm@…, reto.sonderegger@…

Description

rmdirRecursive() for runtime.platformType == 'win32' assumes (and requires), that after a successful os.remove(file_path) no file with path file_path exists (as long as it is not recreated by an other process). In fact, this is not guaranteed on Windows: http://www.osronline.com/article.cfm?article=245.

In our environment, rmdirRecursive() in the context of a SVN checkout ("clobber") quite often failed (about every third time), probably due to files temporarily opened by the installed antivirus suite: exceptions.WindowsError?: [Error 145] Das Verzeichnis ist nicht leer.

The only solution I could find, is to try several times after some seconds of waiting, ignoring this errors at every try except the last one.

comment:1 Changed 5 years ago by dl

• Cc dl.metrohm@… added
• Summary changed from Windows: rmdirRecursive() fails if a contained files is opened to Windows: rmdirRecursive() fails if a contained file is opened

comment:2 Changed 5 years ago by dustin

We attempted to address this in https://github.com/buildbot/buildbot/pull/540 but didn't end up with a complete, tested patch.

If you can pick that up and run with it, that would be great.

comment:3 Changed 5 years ago by dl

Wow, that was quick. I will certainly test it (next week, presumably).

comment:4 Changed 5 years ago by strangecorner

• Cc reto.sonderegger@… added

comment:5 Changed 5 years ago by dl

Here are my thoughts about https://github.com/buildbot/buildbot/pull/540:

• rmdir works as expected (Windows 7), when the directory to be removed contains read-only files.
• rmdir and rmdirRecursive() take approximately the same time to delete a large directory.
• rmdir fails (Windows 7) with a meaningful message, when the directory to be removed contains opened files:

path\to\file - Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.

It removes all files and directories it can and reports all the others.

• Looking at rmdir of Windows 7 with API Monitor shows:
• It uses FindFirstFileW(), FindNextFileW() to iterate over the files/directories
• If a file has the attribute FILE_ATTRIBUTE_READONLY, it removes it with SetFileAttributesW()
• It uses DeleteFileW()/RemoveDirectoryW() to delete files and directories
So, I cannot see any magic involved here either. It looks to me, that using rmdir over rmdirRecursive() does at most mitigate the problem but not solve it.
• The change addresses only native Windows, but the problem occurs on Cygwin too.
• Calling rmdir as an external program involves the entire Windows escaping hell, as paths have to be passed as commandline arguments. An escaping error together with rmdir can cause a catastrophic failure. In test scripts I do not see much testing of the escaping for Windows. Maybe it was tested thoroughly, but if not, it should not be used for "dangerous" commands.

Personally, I would prefer a Pythonic solution (without calling external binaries) and more thorough (scripted) testing of Windows specific code. Maybe in future Python version we see a better shutil.rmtree anyway.

comment:6 Changed 5 years ago by dl

I wanted to attach an adapted commands/util.py that worked here under native Windows 7 and Cygwin. (it retries several times and ignores certain errors except for the last try). Unfortunately, attaching files does not seem to work. How should I provide the code?

comment:7 follow-up: ↓ 8 Changed 5 years ago by sa2ajj

Attaching does not work in what way?

Answering your question: if you paste the script at, for example, http://paste.pound-python.org/, I'll attach the file as soon as I see a notification on the IRC channel about this ticket being updated.

comment:8 in reply to: ↑ 7 Changed 5 years ago by dl

Replying to sa2ajj:

Attaching does not work in what way?

Error when pressing "Datei anhängen":

Oops …
Trac hat einen internen Fehler festgestellt:
IndexError: pop from empty list

Es ist ein interner Fehler aufgetreten. Bitte benachrichtigen Sie Ihren Trac-Administrator und fügen Sie eine Beschreibung an, wie der Fehler reproduziert werden kann.

Zu diesem Zweck können Sie auch ein Ticket   .

Die Aktion, die den Fehler ausgelöst hat, war:
GET: /attachment/ticket/2885/

TracGuide — Die Trac Benutzer- und Administrations-Anleitung


Answering your question: if you paste the script at, for example, http://paste.pound-python.org/, I'll attach the file as soon as I see a notification on the IRC channel about this ticket being updated.

Changed 5 years ago by sa2ajj

commands/util.py that worked here under native Windows 7 and Cygwin

comment:9 follow-up: ↓ 10 Changed 5 years ago by sa2ajj

Thanks!

(Somehow attaching files worked for me.)

comment:10 in reply to: ↑ 9 Changed 5 years ago by dl

Replying to sa2ajj: [...]

(Somehow attaching files worked for me.)

Tried attaching with different Tracs language settings. It worked with "English (United Kingdom)" and "español" but not with "Deutsch".

Version 0, edited 5 years ago by dl (next)

comment:11 Changed 5 years ago by sa2ajj

@dl, thanks for checking out different language settings.

Based on your findings I opened #2894.

comment:12 follow-up: ↓ 13 Changed 5 years ago by dustin

Returning to the topic of the bug, I totally agree that additional testing would be great. I'm not sure how best to test the interaction with virus checkers, for example. We've also had a hell of a time getting donated Windows systems to stay up and functional, so we don't currently have any windows coverage.

Regarding the patch, I like the form -- there's really no point in retrying individual files, rather than retrying the whole remove a few times.

There's still the issue of using time.sleep, though -- that will cause the entire buildslave process to stop, which means any other commands that are in progress at the time will freeze. It'd be better to use reactor.callLater, which in turn requires rmdirRecursive to return a Deferred.

comment:13 in reply to: ↑ 12 Changed 5 years ago by dl

Replying to dustin:

Returning to the topic of the bug, I totally agree that additional testing would be great. I'm not sure how best to test the interaction with virus checkers, for example. We've also had a hell of a time getting donated Windows systems to stay up and functional, so we don't currently have any windows coverage.

How could I (or we as a company) best support the improvement of Windows test coverage of the buildslave?

There's still the issue of using time.sleep, though -- that will cause the entire buildslave process to stop, which means any other commands that are in progress at the time will freeze. It'd be better to use reactor.callLater, which in turn requires rmdirRecursive to return a Deferred.

Is the return of a Deferred a viable option or should this be avoided?

comment:14 Changed 5 years ago by sa2ajj

Return of Deferred is the desired option.

comment:15 follow-up: ↓ 16 Changed 5 years ago by dustin

At some point, we should have the capacity to run Windows instances in project-owned infrastructure (in a virtual machine). We can also easily set up the master to talk to external buildslaves, if your company wanted to host one. The issue we've had with this practice is that the buildslaves themselves tend to fail in various weird ways, and because we do not have the knowledge or access to fix them, they tend to stay down when this occurs. Send me an email (dustin@…) if you'd like to discuss hosting a buildslave?

comment:16 in reply to: ↑ 15 Changed 5 years ago by dl

Replying to dustin:

Send me an email (dustin@…) if you'd like to discuss hosting a buildslave?

I did a week ago but got no response so far. If you're still interested please let me know (soon).

Note: See TracTickets for help on using tickets.