Ticket #751 (reopened enhancement)
Sending SIGTERM before SIGKILL to a remote shell command that has timed out
| Reported by: | Fabrice | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.8.+ |
| Version: | 0.7.12 | Keywords: | kill, sprint |
| Cc: | | | |
Description
I have a test step that produces no output for one hour (the test or one of its subtests hangs for some reason). My buildbot is configured to time out this step/command after 3600 seconds of inactivity on stdout or stderr. As expected, buildbot then sends SIGKILL (9) to the process and writes to the log:
command timed out: 3600 seconds without output, killing pid <PID>
process killed by signal 9
The problem is that SIGKILL (9) cannot be caught or trapped by my test step process running on the slave, so I lose the test logs. Would it be possible for buildbot to send a couple of SIGTERM (15) signals before sending SIGKILL (9)?
Change History
comment:2 Changed 3 years ago by eric@…
I wanted this behavior today.
I'm attempting to debug why our test script hangs randomly on one builder. I guess I'll have to write the wrapper as suggested. It would be nice if buildbot would just send SIGTERM followed by SIGKILL, even in rapid succession. That would allow me to print the stack traces of my multi-threaded Python program at the time of termination instead of having it just die.
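For the stack-trace use case, here is a minimal sketch of what the test program itself could do once something sends it SIGTERM. The helper names (`format_thread_stacks`, `dump_and_exit`, `install_sigterm_dump`) are hypothetical and not part of buildbot. The sketch assumes a POSIX slave and a main thread that still gets a chance to run Python code so the handler can fire.

```python
import os
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def dump_and_exit(signum, frame):
    """SIGTERM handler: write all thread stacks to stderr, then exit."""
    sys.stderr.write(format_thread_stacks())
    sys.stderr.flush()
    # os._exit avoids waiting on hung non-daemon threads at shutdown.
    os._exit(128 + signum)


def install_sigterm_dump():
    """Install the handler; call this from the main thread at startup."""
    signal.signal(signal.SIGTERM, dump_and_exit)
```

Calling `install_sigterm_dump()` early in the test script would be enough under these assumptions. On Python 3, the standard library's `faulthandler.register(signal.SIGTERM, all_threads=True)` gives a similar dump without a Python-level handler, which may work better if the main thread is stuck in C code.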
The buildbot timeout is more of a system-stability thing than an expected-behavior thing, so it goes for the kill immediately. In other words, your testing regime should not depend on this behavior.
You could run your scripts in a wrapper that kills them "gently" after 1h, and bump the buildbot timeout up to 1.5h. You could also probably hack the buildslave to send the SIGTERMs, if you'd like.
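The wrapper idea could look roughly like the following sketch. The function name `run_with_gentle_timeout` and its `timeout` and `grace` parameters are hypothetical. It assumes a POSIX slave and Python 3, signals only the direct child (not a whole process group), and limits total run time rather than buildbot's "seconds without output".

```python
import subprocess


def run_with_gentle_timeout(cmd, timeout, grace):
    """Run cmd; on timeout send SIGTERM, then SIGKILL after grace seconds.

    Returns the child's return code. On POSIX, subprocess reports a
    child killed by signal N as -N.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX; the child may trap this
    try:
        # Give the child time to flush logs or dump stack traces.
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be trapped
        return proc.wait()
```

With the numbers from the suggestion above, the step would call something like `run_with_gentle_timeout(["./run_tests.sh"], timeout=3600, grace=60)` (the script name is only a placeholder), while the buildbot timeout is raised to 1.5h so that buildbot's own SIGKILL remains the last resort.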