Opened 12 years ago

Closed 9 years ago

#192 closed defect (worksforme)

Buildslave fails to timely detect command completion and hangs for 10-20 minutes

Reported by: hans
Owned by:
Priority: major
Milestone: 0.8.+
Version: 0.7.6
Keywords:
Cc: exarkun

Description

I am running buildbot 0.7.6 on FreeBSD 6.3 with Python 2.5.1.

The buildslave often fails to detect promptly that a command it started has completed. In these cases, it takes about 20 minutes until the buildslave continues:

2008/02/24 21:28 +0200 [Broker,client] SlaveBuilder.remote_print(bknr-fbsd-ccl-amd64): message from master: ping
2008/02/24 21:28 +0200 [Broker,client] SlaveBuilder.remote_ping(<SlaveBuilder 'bknr-fbsd-ccl-amd64' at 16127976>)
2008/02/24 21:28 +0200 [Broker,client] <SlaveBuilder 'bknr-fbsd-ccl-amd64' at 16127976>.startBuild
2008/02/24 21:28 +0200 [Broker,client]  startCommand:svn [id 46]
2008/02/24 21:28 +0200 [Broker,client] ShellCommand._startCommand
2008/02/24 21:28 +0200 [Broker,client]  /usr/local/bin/svn update --revision HEAD --non-interactive
2008/02/24 21:28 +0200 [Broker,client]   in dir /home/buildslave/builds/bknr-fbsd-ccl-amd64/build (timeout 1200 secs)
2008/02/24 21:28 +0200 [Broker,client]   watching logfiles {}
2008/02/24 21:28 +0200 [Broker,client]   argv: ['/usr/local/bin/svn', 'update', '--revision', 'HEAD', '--non-interactive']
2008/02/24 21:28 +0200 [Broker,client]  environment: {'USERNAME': 'buildslave', 'SUDO_COMMAND': '/usr/local/bin/buildbot start /home/buildslave/builds/', 'TERM': 'xterm', 'SHELL': '/bin/tcsh', 'MAIL': '/var/mail/hans', 'SUDO_UID': '1000', 'SUDO_GID': '1000', 'LOGNAME': 'buildslave', 'USER': 'buildslave', 'HOME': '/home/hans', 'PATH': '/home/hans/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/home/hans/bin', 'SUDO_USER': 'hans', 'DISPLAY': 'localhost:10.0', 'TMPDIR': '/tmp'}
2008/02/24 21:48 +0200 [-] command finished with signal None, exit code 0

In this log excerpt, the spawned process exits after about 30 seconds, yet the "command finished" entry is logged 20 minutes later, which coincides with the 1200-second timeout shown in the log. The spawned process remains in the zombie state until it is eventually reaped.
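To illustrate the zombie state described above: a child that has exited stays in the process table until its parent collects (reaps) its exit status. The following is a minimal sketch using only the Python standard library (Python 3 syntax, not the 2.5.1 used in this report); `spawn_and_observe` is a hypothetical name, and this is not buildbot or Twisted code.

```python
import subprocess
import sys
import time


def spawn_and_observe(delay=0.5):
    """Start a short-lived child and report what the parent knows about it."""
    # The child runs an empty Python program and exits almost immediately.
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Give the child time to exit. The parent has not asked about it yet,
    # so on a POSIX system an exited child is a zombie during this window.
    time.sleep(delay)
    before = child.returncode  # still None: the exit status was never collected

    # wait() collects the exit status, which removes the zombie entry.
    after = child.wait()
    return before, after


if __name__ == "__main__":
    print(spawn_and_observe())
```

The parent only learns about the exit when it asks. If the notification that normally triggers that question is missed, the child can sit unreaped for a long time, which matches the symptom reported here.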

The problem could be related to http://twistedmatrix.com/trac/ticket/791. If there is a workaround for the buildslave, I would happily use it.

Change History (6)

comment:1 Changed 12 years ago by hans

I am sometimes seeing the problem on Linux, too, so it does not seem to be FreeBSD-specific.

The workaround I found is to set the keepaliveInterval and keepaliveTimeout parameters to rather low values (for example, 10 and 5 seconds) in the buildslave configuration.
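This ticket does not explain why low keepalive values help. One possible reading, stated here purely as an assumption, is that each keepalive timer wakes the buildslave's event loop and gives it another chance to notice children that have already exited. The sketch below models only that idea with the Python standard library (Python 3 syntax); `wait_with_periodic_wakeup` and `wake_interval` are hypothetical names, not buildbot or Twisted APIs.

```python
import subprocess
import sys
import time


def wait_with_periodic_wakeup(child, wake_interval):
    """Check `child` once per wake-up; return seconds until its exit was noticed."""
    started = time.monotonic()
    # poll() returns None while the child is running and reaps it once it exits.
    while child.poll() is None:
        # Stand-in for an event loop that sleeps until its next timer fires.
        time.sleep(wake_interval)
    return time.monotonic() - started


if __name__ == "__main__":
    # Hypothetical command that runs for about 0.2 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
    noticed = wait_with_periodic_wakeup(child, wake_interval=0.1)
    print("exit code %d noticed after %.2f s" % (child.returncode, noticed))
```

In this model the delay between the child's exit and its detection is bounded by roughly one `wake_interval`, so a shorter interval means a shorter worst-case delay.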

comment:2 Changed 11 years ago by dustin

  • Resolution set to invalid
  • Status changed from new to closed

exarkun marked the relevant bug as invalid, so I'm assuming this one is invalid too.

comment:3 Changed 11 years ago by hans

  • Resolution invalid deleted
  • Status changed from closed to reopened

What is the "relevant bug" and why is this one "invalid"?

comment:4 Changed 11 years ago by dustin

  • Cc exarkun added

Sorry, I meant the twisted bug in the description. Perhaps exarkun can elaborate on whether this bug is still valid.

comment:5 Changed 11 years ago by dustin

  • Milestone changed from undecided to 0.7.+

comment:6 Changed 9 years ago by dustin

  • Resolution set to worksforme
  • Status changed from reopened to closed