Buildbot gives up on EC2 spot instance requests before EC2 does
Reported by: bgilbert
When Eight receives a spot request status code other than pending-evaluation, pending-fulfillment, or fulfilled, it concludes that the spot request has failed and gives up on it. However, several status codes are non-terminal, and EC2 may still fulfill the request at a later time. Nine knows that price-too-low is non-terminal, and so cancels the request when giving up on it, but does not do this for other non-terminal status codes.
As a result, EC2 may launch instances that are not tracked by Buildbot. These will remain running and costing money until the spot price exceeds the bid price, at which point EC2 will automatically terminate the instance. To avoid this, Buildbot needs to cancel spot requests when it gives up on them.
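A minimal sketch of the proposed behavior, not the actual Buildbot patch: whenever the wait loop stops for any reason other than fulfillment, it cancels the request before raising. The names `wait_for_spot_request`, `SpotRequestFailed`, `max_polls`, and `delay` are hypothetical. The `conn` object is assumed to expose `get_all_spot_instance_requests` and `cancel_spot_instance_requests` in the style of a boto 2 EC2 connection, with each request carrying a `status.code` attribute.

```python
import time

# Status codes the ticket treats as "keep waiting" and "done".
PENDING = frozenset(['pending-evaluation', 'pending-fulfillment'])
FULFILLED = 'fulfilled'


class SpotRequestFailed(Exception):
    """Hypothetical error raised when we stop waiting on a spot request."""


def wait_for_spot_request(conn, request_id, max_polls=10, delay=0):
    """Poll a spot request; cancel it before giving up for any reason.

    `conn` is assumed to behave like a boto 2 EC2 connection.
    """
    code = None
    for _ in range(max_polls):
        request = conn.get_all_spot_instance_requests(
            request_ids=[request_id])[0]
        code = request.status.code
        if code == FULFILLED:
            return request
        if code not in PENDING:
            # Could be terminal or merely held (e.g. capacity-oversubscribed);
            # either way we are about to stop tracking it.
            break
        time.sleep(delay)
    # Giving up: cancel unconditionally so EC2 cannot fulfill the request
    # later and launch an instance that nothing is tracking.
    conn.cancel_spot_instance_requests([request_id])
    raise SpotRequestFailed(request_id, code)
```

Cancelling unconditionally avoids having to maintain a list of which status codes are non-terminal. A small race remains: if EC2 fulfills the request between the last poll and the cancel call, cancelling the request does not terminate the instance, so a caller may still want to check for a launched instance after cancelling.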
2014-10-09 01:17:00-0400 [-] EC2LatentBuildSlave el6-amd64 requesting spot instance
2014-10-09 01:18:06-0400 [-] EC2LatentBuildSlave el6-amd64 has waited 1 minutes for spot request sir-022rlcrg
2014-10-09 01:18:37-0400 [-] EC2LatentBuildSlave el6-amd64 failed to fulfill spot request sir-022rlcrg with status capacity-oversubscribed
2014-10-09 01:18:37-0400 [-] Buildslave el6-amd64 detached from testsuite-el6-amd64
2014-10-09 01:18:37-0400 [-] while preparing slavebuilder:
    Traceback (most recent call last):
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1155, in gotResult
        _inlineCallbacks(r, g, deferred)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
        result = result.throwExceptionIntoGenerator(g)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
    --- <exception caught here> ---
      File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/process/builder.py", line 335, in _startBuildFor
        ready = yield slavebuilder.prepare(self.builder_status, build)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/threadpool.py", line 196, in _worker
        result = context.call(ctx, function, *args, **kwargs)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 364, in _request_spot_instance
        request = self._wait_for_request(reservations)
      File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 443, in _wait_for_request
        request.id, request.status)
    buildbot.interfaces.LatentBuildSlaveFailedToSubstantiate: (u'sir-022rlcrg', <Status: capacity-oversubscribed>)
2014-10-09 01:18:37-0400 [-] slave <Build testsuite-el6-amd64> can't build <LatentSlaveBuilder builder='testsuite-el6-amd64'> after all; re-queueing the request
[...]
2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] slave 'el6-amd64' attaching from IPv4Address(TCP, '127.0.0.1', 59401)
2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] Slave el6-amd64 received connection while not trying to substantiate. Disconnecting.