Opened 6 years ago

Last modified 4 years ago

#2432 new enhancement

Error shutting down latent buildslave

Reported by: skelly Owned by:
Priority: major Milestone: 0.9.+
Version: 0.8.9 Keywords: latent, slave
Cc:

Description (last modified by sa2ajj)

I'm working on a new latent slave. When it came time to shut down the latent slave after the default timeout of ten minutes, this error was seen in the master's twistd.log.

2013-02-01 15:39:53-0600 [-] releaseLocks(<OpenStackLatentBuildSlave 'bit-build-sles11sp1-10'>): []
2013-02-01 15:49:53-0600 [-] disconnecting old slave bit-build-sles11sp1-10 now
2013-02-01 15:49:53-0600 [-] waiting for slave to finish disconnecting
2013-02-01 15:49:53-0600 [Broker,2,172.30.79.49] BuildSlave.detached(bit-build-sles11sp1-10)
2013-02-01 15:49:53-0600 [Broker,2,172.30.79.49] releaseLocks(<OpenStackLatentBuildSlave 'bit-build-sles11sp1-10'>): []
2013-02-01 15:49:56-0600 [Broker,3,172.30.79.49] slave 'bit-build-sles11sp1-10' attaching from IPv4Address(TCP, '172.30.79.49', 37712)
2013-02-01 15:49:56-0600 [Broker,3,172.30.79.49] Slave bit-build-sles11sp1-10 received connection while not trying to substantiate.  Disconnecting.
2013-02-01 15:49:56-0600 [Broker,3,172.30.79.49] waiting for slave to finish disconnecting
2013-02-01 15:49:56-0600 [Broker,3,172.30.79.49] Peer will receive following PB traceback:
2013-02-01 15:49:56-0600 [Broker,3,172.30.79.49] Unhandled Error
        Traceback (most recent call last):
        Failure: exceptions.RuntimeError: Slave bit-build-sles11sp1-10 received connection while not trying to substantiate.  Disconnecting.

Attachments (1)

custom_latent_buildslaves.py (7.5 KB) - added by extremoburo 5 years ago.


Change History (12)

comment:1 Changed 6 years ago by dustin

  • Keywords sprint added
  • Milestone changed from undecided to 0.8.+
  • Type changed from undecided to enhancement

I think the bug here is that the slave is forcibly disconnected, *then* shut down by the latent buildslave code. In the interim, it starts back up and tries to connect.

This is harmless - the master deals with it appropriately. However, a better implementation would probably be to shut the slave down gracefully, so that it doesn't try to restart.
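
The ordering problem described in this comment can be illustrated with a toy model. This is only a sketch: `ToySlave`, `forced_shutdown` and `graceful_shutdown` are hypothetical names invented for the illustration, not Buildbot APIs, and the model assumes a running slave process always retries once when its connection drops.

```python
# Toy model of the race described above; none of these names are Buildbot APIs.

class ToySlave:
    """Stand-in for a buildslave process that reconnects when its connection drops."""

    def __init__(self):
        self.running = True
        self.reconnect_attempts = 0

    def connection_lost(self):
        # A slave process that is still running retries the connection on its own.
        if self.running:
            self.reconnect_attempts += 1

    def graceful_stop(self):
        # The slave process exits cleanly, so it will not retry.
        self.running = False


def forced_shutdown(slave):
    """Ordering from the log: drop the connection, then stop the machine."""
    slave.connection_lost()   # slave is still running, so it tries to reconnect
    slave.running = False     # the machine is stopped only afterwards
    return slave.reconnect_attempts


def graceful_shutdown(slave):
    """Suggested ordering: stop the slave process first, then drop the connection."""
    slave.graceful_stop()
    slave.connection_lost()   # nothing is left running to reconnect
    return slave.reconnect_attempts


print(forced_shutdown(ToySlave()))
print(graceful_shutdown(ToySlave()))
```

In the forced ordering the slave gets one reconnect attempt in before the machine goes away, which corresponds to the "received connection while not trying to substantiate" message in the log; in the graceful ordering it gets none.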

comment:2 Changed 5 years ago by extremoburo

Skelly,

Did you manage to solve it? I have the same issue, but this error seems to prevent the slave from shutting down.

Any help would be really appreciated.


comment:3 Changed 5 years ago by skelly

No, this isn't solved. As Dustin said, it's harmless. The slave is killed when the underlying machine is shut down; it just occasionally manages to reconnect first.

comment:4 Changed 5 years ago by extremoburo

In my case it is blocking: after the runtime exception, the "stop_instance" function is not executed, so the slave does not go down.
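
If the exception really does propagate into the shutdown path of a custom latent slave, one defensive pattern is to make sure the instance is stopped whether or not the disconnect step fails. The following is a minimal sketch in plain Python, not the Buildbot implementation: `shut_down` and `failing_disconnect` are hypothetical names, and only `stop_instance` is taken from the comment above. Real Buildbot code is built on Twisted Deferreds, where attaching the stop step with `addBoth` would play the same role as `finally` here.

```python
def shut_down(disconnect, stop_instance):
    """Run stop_instance even if disconnect raises (hypothetical helper)."""
    try:
        disconnect()
    finally:
        # Executed on both the success and the failure path.
        stop_instance()


calls = []


def failing_disconnect():
    # Simulates the disconnect step failing with the error from the log.
    calls.append("disconnect")
    raise RuntimeError("received connection while not trying to substantiate")


def stop_instance():
    calls.append("stop_instance")


try:
    shut_down(failing_disconnect, stop_instance)
except RuntimeError:
    pass  # the error still surfaces to the caller after cleanup has run

print(calls)
```

The error is not swallowed; it is re-raised after `stop_instance` has run, so the failure remains visible to the caller while the machine is still shut down.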

Changed 5 years ago by extremoburo

  • Attachment custom_latent_buildslaves.py added

comment:5 Changed 5 years ago by extremoburo

Hi all, I can't make it work. Even though the master should handle the slave's reconnection, something is not working correctly. I've attached my custom class, which is slightly different from the original EC2 latent slave. I suppose it should work if the original one does. I can't really figure out whether it is my fault or not. The aim of my class is to start / stop an existing instance of an AMI on EC2.

comment:6 Changed 4 years ago by dustin

  • Milestone changed from 0.8.+ to 0.9.+

Ticket retargeted after milestone closed

comment:7 Changed 4 years ago by bshi

> This is harmless

Are we certain this is harmless? I have an instance of a GCE-based latent build slave that disappeared on us (got into a bad state) and the only evidence I have been able to track down is the symptom described here.

> the master deals with it appropriately.

How does the master handle this case?

comment:8 Changed 4 years ago by sa2ajj

  • Description modified (diff)

@bshi, what version of Buildbot do you use?

comment:9 Changed 4 years ago by bshi

0.8.9

comment:10 Changed 4 years ago by sa2ajj

  • Keywords latent slave added; sprint removed
  • Version changed from 0.8.7p1 to 0.8.9

(updated version up)

Sorry, forgot one more question: is it the OpenStack latent build slave or some other one?

comment:11 Changed 4 years ago by bshi

This is a custom latent slave for GCE, unfortunately - so it's very possible that some quirk of the implementation is causing this.
