Opened 2 years ago

Last modified 3 weeks ago

#2935 new defect

Buildbot gives up on EC2 spot instance requests before EC2 does

Reported by: bgilbert
Owned by:
Priority: major
Milestone: 0.9.+
Version: 0.8.9
Keywords: ec2, ec2cost
Cc:

Description

When Buildbot receives a spot request status code other than pending-evaluation, pending-fulfillment, or fulfilled, it concludes that the spot request has failed and gives up on it. However, several status codes are non-terminal, and EC2 may still fulfill the request at a later time. Buildbot knows that price-too-low is non-terminal, and so cancels the request when giving up on it, but does not do this for other non-terminal status codes.

As a result, EC2 may launch instances that are not tracked by Buildbot. These instances keep running and costing money until the spot price exceeds the bid price, at which point EC2 terminates them automatically. To avoid this, Buildbot needs to cancel spot requests when it gives up on them.
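
The proposed behavior could look roughly like the sketch below. This is not Buildbot's actual code: `check_spot_request`, `SpotRequestAbandoned`, and the constant names are hypothetical, and `conn` is assumed to be any object exposing a `cancel_spot_instance_requests(request_ids)` method like the one on boto 2's `EC2Connection`. The idea is simply that every give-up path cancels the request first, not only the price-too-low path.

```python
# Hypothetical sketch of "cancel whenever we give up"; names are illustrative.
PENDING_CODES = frozenset(['pending-evaluation', 'pending-fulfillment'])
FULFILLED_CODE = 'fulfilled'


class SpotRequestAbandoned(Exception):
    """Raised after a spot request has been canceled and given up on."""


def check_spot_request(conn, request_id, status_code):
    """Return True if the request is fulfilled, False if it is still pending.

    For any other status code, cancel the request before giving up, so that
    EC2 cannot fulfill it later without the caller knowing about it.
    """
    if status_code == FULFILLED_CODE:
        return True
    if status_code in PENDING_CODES:
        return False
    # Assumption: conn provides cancel_spot_instance_requests(request_ids).
    conn.cancel_spot_instance_requests([request_id])
    raise SpotRequestAbandoned(request_id, status_code)
```

A caller would poll the request status, keep waiting while this returns False, and treat `SpotRequestAbandoned` as a failure to substantiate. Canceling a spot request does not terminate an instance that EC2 has already launched for it, so a request fulfilled between the last status check and the cancel call would still need separate handling.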

2014-10-09 01:17:00-0400 [-] EC2LatentBuildSlave el6-amd64 requesting spot instance
2014-10-09 01:18:06-0400 [-] EC2LatentBuildSlave el6-amd64 has waited 1 minutes for spot request sir-022rlcrg
2014-10-09 01:18:37-0400 [-] EC2LatentBuildSlave el6-amd64 failed to fulfill spot request sir-022rlcrg with status capacity-oversubscribed
2014-10-09 01:18:37-0400 [-] Buildslave el6-amd64 detached from testsuite-el6-amd64
2014-10-09 01:18:37-0400 [-] while preparing slavebuilder:
        Traceback (most recent call last):
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1155, in gotResult
            _inlineCallbacks(r, g, deferred)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
        --- <exception caught here> ---
          File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/process/builder.py", line 335, in _startBuildFor
            ready = yield slavebuilder.prepare(self.builder_status, build)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/threadpool.py", line 196, in _worker
            result = context.call(ctx, function, *args, **kwargs)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
          File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 364, in _request_spot_instance
            request = self._wait_for_request(reservations[0])
          File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 443, in _wait_for_request
            request.id, request.status)
        buildbot.interfaces.LatentBuildSlaveFailedToSubstantiate: (u'sir-022rlcrg', <Status: capacity-oversubscribed>)
        
2014-10-09 01:18:37-0400 [-] slave <Build testsuite-el6-amd64> can't build <LatentSlaveBuilder builder='testsuite-el6-amd64'> after all; re-queueing the request

[...]

2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] slave 'el6-amd64' attaching from IPv4Address(TCP, '127.0.0.1', 59401)
2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] Slave el6-amd64 received connection while not trying to substantiate.  Disconnecting.

Change History (3)

comment:1 Changed 2 years ago by dustin

  • Milestone changed from undecided to 0.9.+

comment:2 Changed 16 months ago by dustin

  • Keywords ec2cost added