buildslave hangs trying to kill process after "1200 seconds without output"
Reported by: hjwp | Owned by: Callek
Version: 0.8.5 | Keywords: windows, sprint, kill
The buildbot logs show the usual message:
command timed out: 1200 seconds without output, attempting to kill
Looking at the console window of the machine that runs buildslave.bat, we see this message:
ERROR: The process "None" not found.
Do you know where this message is coming from? Could it be that buildbot is trying to kill a process that has already died?
The "attempting to kill" message seems to be the last one that makes it into the logs. That doesn't make sense to me: looking through the code in runprocess.py, there appears to be no way through that function without hitting at least one other log.msg call.
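One plausible reading of the console message is that a Windows `taskkill` command was run with the literal string "None" as its PID, i.e. the slave had no PID recorded for the child by the time it tried to kill it. The sketch below only illustrates a guard against that case; `build_kill_command` is a hypothetical helper, not buildbot's runprocess.py code, and it assumes a kill via `taskkill /F /T /PID <pid>`. It logs on the "nothing to kill" path so that branch would never be silent.

```python
import logging

log = logging.getLogger("runprocess-sketch")


def build_kill_command(pid):
    """Return the argv for killing a process tree on Windows, or None.

    Hypothetical helper: a pid of None is treated as "the child was never
    started or has already exited", so there is nothing to kill.
    """
    if pid is None:
        # Passing str(None) to taskkill would produce exactly the
        # 'ERROR: The process "None" not found.' message, so bail out instead.
        log.info("no pid recorded; process already exited or never started")
        return None
    # /F forces termination, /T also kills any child processes of pid.
    return ["taskkill", "/F", "/T", "/PID", str(pid)]
```

With this guard, `build_kill_command(None)` returns `None` and logs a message, while `build_kill_command(1234)` returns `["taskkill", "/F", "/T", "/PID", "1234"]`.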
Anyway, this hangs the build, and we're forced to go in and reboot the buildslave machine. That produces one final line in the logs:
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.]
No doubt we should try to write better test code that doesn't trigger the 1200-second timeout, but it would still be good if buildbot didn't hang...
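As an illustration of "not hanging", here is a minimal sketch of a kill with a bounded wait: it asks the child to stop, escalates once, and returns a status string instead of blocking forever. This is an assumed policy, not how buildbot's slave (which uses Twisted) actually kills processes; `kill_with_deadline` and the `grace` parameter are hypothetical. It uses the standard-library `subprocess` module and needs Python 3.3+ for `wait(timeout=...)`, so it would not run as-is on the Python 2.7 mentioned in this ticket. On Windows, `Popen.kill()` is an alias for `Popen.terminate()`, so the escalation step only matters on POSIX.

```python
import subprocess
import sys


def kill_with_deadline(proc, grace=5.0):
    """Try to stop proc, blocking for at most about 2 * grace seconds.

    Hypothetical helper: returns a status string so the caller can report
    the outcome instead of waiting on the child indefinitely.
    """
    if proc.poll() is not None:
        return "already exited"
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # SIGKILL on POSIX; same as terminate() on Windows
    try:
        proc.wait(timeout=grace)
        return "killed"
    except subprocess.TimeoutExpired:
        return "kill failed; giving up"


if __name__ == "__main__":
    # A child that would otherwise run for a minute with no output.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(kill_with_deadline(child))
```

The design choice is that every path returns within a known time, so a failed kill becomes a reportable error rather than a stuck build.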
- buildbot-master is running on Debian
- buildslave is running on Windows Vista
- the problem seems to be intermittent: maybe one in five runs?
- we're using the buildslave to run Selenium WebDriver tests, driven from Python 2.7
Change History (10)
comment:1 Changed 5 years ago by dustin
- Keywords windows added
- Milestone changed from undecided to 0.8.+
- Type changed from undecided to defect