Opened 8 years ago

Closed 7 years ago

#2045 closed defect (worksforme)

Implicit limit of 1000 builders per slave

Reported by: xrg
Owned by:
Priority: minor
Milestone: 0.8.+
Version: 0.8.4
Keywords: master-slave
Cc: schueller.p@…

Description

When more than about 1000 builders are configured for a single slave, the slave cannot come online. This /must be/ due to a limit in the PB protocol, such as a packet or transmission size limit.

The message in the logs is:

  File ".../site-packages/buildbot-0.8.4_pre_741_g2089c5b-py2.6.egg/buildbot/process/slavebuilder.py", line 107, in <lambda>
    self.remote.callRemote("setMaster", self))
  ...
  File ".../twisted/spread/flavors.py", line 127, in jellyFor
    return "remote", jellier.invoker.registerReference(self)
  File ".../twisted/spread/pb.py", line 666, in registerReference
    raise Error("Maximum PB reference count exceeded.")
twisted.spread.pb.Error: Maximum PB reference count exceeded.
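
The traceback points at a cap on the number of references rather than at a packet size. As a rough illustration only, here is a toy model of such a cap. It is not Twisted's code: `ToyBroker`, `register_reference` and `count_attachable` are hypothetical names, the default of 1024 is an assumption based on the ~1000 builders reported above, and "one reference per builder" is a simplification.

```python
# Toy model of a per-connection reference cap. NOT Twisted's implementation.
MAX_BROKER_REFS = 1024  # assumed default, roughly matching the ~1000 limit seen


class ToyBroker(object):
    """Hypothetical stand-in for a PB broker that counts local references."""

    def __init__(self, max_refs=MAX_BROKER_REFS):
        self.max_refs = max_refs
        self.local_objects = {}

    def register_reference(self, obj):
        """Return an ID for obj, refusing new objects once the cap is passed."""
        key = id(obj)
        if key not in self.local_objects:
            if len(self.local_objects) > self.max_refs:
                raise RuntimeError("Maximum PB reference count exceeded.")
            self.local_objects[key] = obj
        return key


def count_attachable(num_builders, max_refs=MAX_BROKER_REFS):
    """Count how many builders register before the cap is hit.

    Assumes, as a simplification, one reference per builder.
    """
    broker = ToyBroker(max_refs)
    builders = [object() for _ in range(num_builders)]
    attached = 0
    for builder in builders:
        try:
            broker.register_reference(builder)
        except RuntimeError:
            break
        attached += 1
    return attached
```

In this model, any builder beyond the cap fails to register, which would look like the slave being unable to attach part of its builders.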

Change History (5)

comment:1 Changed 8 years ago by dustin

  • Keywords performance added
  • Milestone changed from undecided to 0.8.+
  • Type changed from undecided to defect

comment:2 Changed 8 years ago by armenzg

We might be hitting the same issue in https://bugzilla.mozilla.org/show_bug.cgi?id=712244

Can the PB limit be increased?

2011-12-16 18:22:01-0800 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2011-12-16 18:22:01-0800 [Broker,client] While trying to connect:
        Traceback from remote host -- Traceback (most recent call last):
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 363, in unpause
            self._runCallbacks()
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 397, in _continue
            self.unpause()
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 363, in unpause
            self._runCallbacks()
        --- <exception caught here> ---
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/pb.py", line 763, in serialize
            return jelly(object, self.security, None, self)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/jelly.py", line 1122, in jelly
            return _Jellier(taster, persistentStore, invoker).jelly(object)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/jelly.py", line 475, in jelly
            return obj.jellyFor(self)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/flavors.py", line 127, in jellyFor
            return "remote", jellier.invoker.registerReference(self)
          File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/pb.py", line 664, in registerReference
            raise Error("Maximum PB reference count exceeded.  "
        twisted.spread.pb.Error: Maximum PB reference count exceeded.  Goodbye.
Last edited 8 years ago by dustin

comment:3 Changed 8 years ago by dustin

Well, the value to patch is here:

http://twistedmatrix.com/trac/browser/trunk/twisted/spread/pb.py#L72

so you can monkey-patch that fairly easily at runtime:

from twisted.spread import pb
# Raise the cap on PB references (monkey-patches the module-level constant).
pb.MAX_BROKER_REFS = 2048

I don't necessarily think that's a good idea!

I'll leave this bug open to track finding a better solution that doesn't burden PB with so many references.

comment:4 Changed 8 years ago by peterschueller

  • Cc schueller.p@… added

Ah, finally I know why some of my builders suddenly can no longer attach! :(

Thanks for the quick fix. Note, however, that it has to be applied on BOTH the master and the slave (quite obvious, but I failed at that at first).
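
For anyone applying this on both sides, a minimal sketch of the same monkey-patch wrapped in a small helper, so the identical snippet can go in both places. `raise_broker_ref_limit` is a hypothetical name, not part of Buildbot or Twisted, and putting it in `master.cfg` and the slave's `buildbot.tac` is an assumption about where each process loads its configuration.

```python
def raise_broker_ref_limit(pb_module, new_limit):
    """Raise pb_module.MAX_BROKER_REFS to new_limit, never lowering it.

    pb_module is expected to be twisted.spread.pb. Returns the limit in
    effect after the call.
    """
    current = pb_module.MAX_BROKER_REFS
    if new_limit > current:
        pb_module.MAX_BROKER_REFS = new_limit
    return pb_module.MAX_BROKER_REFS


# Usage, assumed to run before any master/slave connection is made,
# once in the master's config and once in the slave's startup file:
#
#   from twisted.spread import pb
#   raise_broker_ref_limit(pb, 2048)
```

The helper only ever raises the value, so running it twice or alongside another patch does not accidentally shrink the limit.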

Another issue is the 10-second timeout for the "buildbot restart" and "buildbot reconfig" commands.

I have a huge number of builders, so these commands always time out, which is rather inconvenient.

comment:5 Changed 7 years ago by dustin

  • Keywords master-slave added; performance removed
  • Resolution set to worksforme
  • Status changed from new to closed

This workaround is useful for folks running into this limit. The real fix is a new master/slave protocol.
