Ticket #825 (closed defect: fixed)
Zombie builds stuck in FileDownload
| Reported by: | catlee | Owned by: | |
|---|---|---|---|
| Priority: | critical | Milestone: | 0.8.+ |
| Version: | 0.7.9 | Keywords: | |
| Cc: | mook...moz+net.buildbot@… |
Description
We have slaves running buildbot 0.7.9 attached to a master running 0.7.10. Occasionally, a slave gets disconnected from the master while running a FileDownload step.
When it reconnects to the master, the master notices a duplicate connection, and attempts to disconnect the old slave. It doesn't manage to stop the old build, however, so you can end up with one slave running old builds for weeks at a time until the master is finally restarted.
It is impossible to stop these old builds with Stop Build.
Change History
comment:3 Changed 21 months ago by dustin
- Priority changed from minor to critical
We'll need more evidence to track this down, I think.
comment:5 Changed 18 months ago by dustin
- Milestone changed from 0.8.2 to 0.8.3
Hopefully, if this is a slave-side problem, my slave-side tests will tease it out.
comment:7 Changed 11 months ago by Dustin J. Mitchell
- Status changed from new to closed
- Resolution set to fixed
Allow transfer steps to be interrupted
This also collects .finished and .interrupted into a parent class on the slave side. Fixes #825.
Changeset: c8d1ee63f6789d63a97ef39e62e7dd9d9a912562
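For illustration only, the idea in this commit message (shared finished/interrupted state in a parent class, so any transfer can be interrupted) might be sketched in plain Python as below. The class and method names are hypothetical; they are not taken from the Buildbot source or from the changeset, and the real slave-side code runs on Twisted rather than on plain objects like these.

```python
class TransferBase:
    """Hypothetical parent class holding shared finish/interrupt state."""

    def __init__(self):
        self.finished = False
        self.interrupted = False
        self.result = None

    def finish(self, result):
        # Only the first call takes effect, so a late completion
        # cannot overwrite an earlier interruption (or vice versa).
        if self.finished:
            return
        self.finished = True
        self.result = result

    def interrupt(self):
        # Mark the transfer as interrupted and end it immediately.
        if self.finished:
            return
        self.interrupted = True
        self.finish("interrupted")


class DownloadSketch(TransferBase):
    """Hypothetical download: accepts chunks until done or interrupted."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def write_chunk(self, data):
        # Drop data that arrives after the transfer has ended.
        if self.finished:
            return False
        self.chunks.append(data)
        return True

    def close(self):
        self.finish("ok")


download = DownloadSketch()
download.write_chunk(b"abc")
download.interrupt()                     # e.g. the connection was lost
accepted = download.write_chunk(b"def")  # ignored: transfer already ended
download.close()                         # too late: result stays "interrupted"
```

In this sketch an upload class would inherit the same `finish` and `interrupt` behaviour unchanged, which is the point of moving that state into the parent.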
Serendipitously, I saw a similar problem here today. Here's what I see in the web interface:
Traceback (most recent call last):
Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]
For unrelated reasons, our buildslave logs for this no longer exist, so that's all I've got.