Ticket #500 (closed defect: fixed)
buildslaves disconnect under heavy load
| Reported by: | ipv6guru | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 0.8.+ |
| Version: | 0.7.10 | Keywords: | sourceforge 1500669 |
| Cc: | | | |
Description
Sidnei da Silva reports a problem with buildslaves disconnecting under heavy load. Based on his analysis, I think the problem occurs when two Builders are running on the same buildslave: the first is in the middle of a build (and sending a lot of network traffic to report its results) when the second goes to start a build. The first thing each build does is a short-timeout (5 second) ping of the buildslave, to make sure it is really still there. In this case the wire is saturated with traffic from the first build, so the ping is not answered within the timeout and the second builder appears to be unresponsive. The second builder severs the connection because it thinks it is broken, which kills off the first build.
The quick ping is useful, so the "right" fix involves changing the way slavepings are done, so that they ping the slave as a whole rather than any particular builder. The master-side representative needs to keep track of how recently it has heard from the buildslave (i.e. the last time *any* message crossed that wire) and provide a "ping if necessary" function. The pre-build ping should use this instead.
Submitted: Brian Warner (warner) - 2006-06-05 08:10
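A minimal sketch of the "ping if necessary" idea, in plain Python rather than Buildbot's actual code. The class name `SlaveLiveness`, its methods, and the `send_ping`, `max_idle`, and `clock` parameters are all hypothetical names chosen for illustration, and the 5 second idle window is an assumption borrowed from the existing ping timeout. It only shows the bookkeeping: any message from the slave counts as proof of life, and a real ping is sent only when the wire has been quiet for too long.

```python
import time


class SlaveLiveness:
    """Hypothetical master-side record of when a buildslave was last heard from."""

    def __init__(self, send_ping, max_idle=5.0, clock=time.monotonic):
        # send_ping: callable that returns True if the slave answered in time
        #            (assumed; stands in for the real slaveping).
        # max_idle:  seconds of silence after which a real ping is required.
        # clock:     injectable time source, so the logic can be tested.
        self._send_ping = send_ping
        self._max_idle = max_idle
        self._clock = clock
        self._last_heard = None  # no traffic seen yet

    def message_received(self):
        """Call whenever *any* message from the slave arrives, from any builder."""
        self._last_heard = self._clock()

    def ping_if_necessary(self):
        """Return True if the slave is believed alive, pinging only when idle."""
        now = self._clock()
        if self._last_heard is not None and now - self._last_heard < self._max_idle:
            return True  # recent traffic is proof of life; skip the ping
        alive = self._send_ping()
        if alive:
            self._last_heard = self._clock()
        return alive
```

In the scenario described above, status traffic from the first build would keep calling `message_received()`, so the second builder's pre-build check would return immediately without competing for the saturated wire.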
Moved from sourceforge