Opened 8 years ago

Last modified 5 years ago

#1792 new enhancement

BuildStep timeout detection does not kill child processes

Reported by: cortana
Owned by:
Priority: major
Milestone: 0.9.+
Version: 0.8.2
Keywords: kill


I have noticed my buildslave machine becoming overloaded several times recently. I believe this is caused by the following sequence of events:

  1. 'make check' is run as part of a build
  2. buildbot sends SIGKILL to the build process because it takes too long
  3. only the top-level process is killed: child processes are not killed, so the test suite continues to run!
  4. buildbot kicks off another build...

The result is 8-9 copies of the test suite from improperly killed-off builds hanging around, until I SSH in and kill all buildslave processes by hand.

Possible solutions:

  • when killing a BuildStep, issue it a SIGINT instead of SIGKILL. In my case, this would have allowed make to kill off all child processes properly, as if I had hit Ctrl+C in a terminal.
  • to guard against buggy build systems, however, you probably want to send a SIGINT, then wait 10 seconds, then send a SIGKILL to the buildstep *and all its child processes*, either by walking the process tree by hand or by using POSIX sessions and process groups.
  • I believe that in modern Linux kernels, the same can be achieved with 'cgroups'. Each build would go into its own cgroup, and then the buildslave can kill all processes in a cgroup at once.
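
The second suggestion (SIGINT, a grace period, then SIGKILL to the whole group) could look roughly like the sketch below. This is not buildbot's code: the names run_with_timeout and grace are made up for illustration, and it assumes a POSIX system and Python 3, where start_new_session=True puts the command into its own session and process group so that os.killpg reaches its children too.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    """Send sig to every process in group pgid; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_timeout(argv, timeout, grace=10.0):
    """Run argv; on timeout send SIGINT, wait `grace` seconds, then SIGKILL.

    Both signals go to the whole process group, not just the top-level
    process. Returns (returncode, timed_out). Hypothetical helper.
    """
    # setsid() in the child makes it the leader of a new process group
    # whose id equals its pid, so children it spawns share that group.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        pass

    pgid = proc.pid
    # Polite request first, as if Ctrl+C had been pressed in a terminal.
    _signal_group(pgid, signal.SIGINT)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Unconditional follow-up: catches children that ignored SIGINT or
    # outlived the top-level process.
    _signal_group(pgid, signal.SIGKILL)
    return proc.wait(), True
```

For example, run_with_timeout(["make", "check"], timeout=1200) would give the test suite ten seconds to shut down cleanly before everything in its group is killed. A child that calls setsid() itself leaves the group and escapes this approach, which is one reason the cgroup idea is attractive.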

Workaround: increase the 'timeout' property of the 'make check' BuildStep.

Change History (4)

comment:1 Changed 8 years ago by dustin

  • Keywords kill added
  • Milestone changed from undecided to 0.8.+
  • Type changed from defect to enhancement

Yes, in general, killing is very difficult to get right, particularly across platforms. It's not very configurable right now, and that should be improved.

comment:2 Changed 6 years ago by tom.prince

  • Milestone changed from 0.8.+ to 0.8.8

comment:3 Changed 6 years ago by tom.prince

  • Milestone changed from 0.8.8 to 0.8.+

comment:4 Changed 5 years ago by dustin

  • Milestone changed from 0.8.+ to 0.9.+

Ticket retargeted after milestone closed
