Opened 9 years ago

Closed 8 years ago

#2045 closed defect (worksforme)

Implicit limit of 1000 builders per slave

Reported by: xrg
Owned by:
Priority: minor
Milestone: 0.8.+
Version: 0.8.4
Keywords: master-slave
Cc: schueller.p@…

Description

When more than ~1000 builders are configured for a single slave, the slave cannot go online. This appears to be due to a limit in the PB protocol; the traceback points at PB's maximum reference count.

The message in the logs is:

  File ".../site-packages/buildbot-0.8.4_pre_741_g2089c5b-py2.6.egg/buildbot/process/slavebuilder.py", line 107, in <lambda>
    self.remote.callRemote("setMaster", self))
  ...
  File ".../twisted/spread/flavors.py", line 127, in jellyFor
    return "remote", jellier.invoker.registerReference(self)
  File ".../twisted/spread/pb.py", line 666, in registerReference
    raise Error("Maximum PB reference count exceeded.")
twisted.spread.pb.Error: Maximum PB reference count exceeded.
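
The traceback suggests that each builder costs the broker one reference: `callRemote("setMaster", self)` ends up in `registerReference(self)`. The toy model below is only a sketch of that bookkeeping, not Twisted's implementation. `ToyBroker` and `count_attachable` are hypothetical names, and both the default of 1024 and the `>` comparison are recalled from the Twisted source of that era, so treat them as assumptions to check against `pb.py`.

```python
# Toy model of a per-connection reference limit. NOT Twisted code.
MAX_BROKER_REFS = 1024  # assumed default; verify against twisted/spread/pb.py


class ToyBroker:
    """Tracks locally registered objects and refuses new ones past a limit."""

    def __init__(self, limit=MAX_BROKER_REFS):
        self.limit = limit
        self.local_objects = {}  # id(obj) -> obj

    def register_reference(self, obj):
        key = id(obj)
        if key not in self.local_objects:
            # Assumed check: fail once the table already holds more than `limit` entries.
            if len(self.local_objects) > self.limit:
                raise RuntimeError("Maximum PB reference count exceeded.")
            self.local_objects[key] = obj
        return key


def count_attachable(n_builders, limit=MAX_BROKER_REFS):
    """Return how many of n_builders can register before the limit is hit."""
    broker = ToyBroker(limit)
    # Keep every stand-in builder alive so that id() values stay unique.
    builders = [object() for _ in range(n_builders)]
    attached = 0
    for builder in builders:
        try:
            broker.register_reference(builder)
        except RuntimeError:
            break
        attached += 1
    return attached
```

In a real connection the broker holds other references besides builders, so the practical ceiling would sit somewhat below what this model reports, which is consistent with the "~1000" observed here.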

Change History (5)

comment:1 Changed 9 years ago by dustin

  • Keywords performance added
  • Milestone changed from undecided to 0.8.+
  • Type changed from undecided to defect

comment:2 Changed 9 years ago by armenzg

We might be hitting the same issue in https://bugzilla.mozilla.org/show_bug.cgi?id=712244

Can the PB limit be increased?

2011-12-16 18:22:01-0800 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2011-12-16 18:22:01-0800 [Broker,client] While trying to connect:
Traceback from remote host -- Traceback (most recent call last):
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 363, in unpause
    self._runCallbacks()
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 397, in _continue
    self.unpause()
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 363, in unpause
    self._runCallbacks()
--- <exception caught here> ---
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/pb.py", line 763, in serialize
    return jelly(object, self.security, None, self)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/jelly.py", line 1122, in jelly
    return _Jellier(taster, persistentStore, invoker).jelly(object)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/jelly.py", line 475, in jelly
    return obj.jellyFor(self)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/flavors.py", line 127, in jellyFor
    return "remote", jellier.invoker.registerReference(self)
  File "/builds/buildbot/tests1-windows/lib/python2.6/site-packages/twisted/spread/pb.py", line 664, in registerReference
    raise Error("Maximum PB reference count exceeded. "
twisted.spread.pb.Error: Maximum PB reference count exceeded. Goodbye.

comment:3 Changed 9 years ago by dustin

Well, the value to patch is here:

http://twistedmatrix.com/trac/browser/trunk/twisted/spread/pb.py#L72

so you can monkey-patch that fairly easily at runtime:

from twisted.spread import pb
pb.MAX_BROKER_REFS = 2048

I don't necessarily think that's a good idea!

I'll leave this bug open to track finding a better solution that doesn't burden PB with so many references.

comment:4 Changed 9 years ago by peterschueller

  • Cc schueller.p@… added

Ah, finally I know why some of my builders suddenly can no longer attach! :(

Thanks for the quick fix. It is important to note, however, that it has to be applied on BOTH the master and the slave (quite obvious, but I got that wrong at first).
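
For anyone applying the workaround on both sides, it could be wrapped so that it never lowers a limit someone else has already raised. This is only a sketch. The helper name `raise_broker_ref_limit` is hypothetical, the `SimpleNamespace` stand-in exists solely so the snippet runs where Twisted is not installed, and its value of 1024 is the assumed Twisted default. Where to put the code is also an assumption: it has to run before the connection is made, for example in `master.cfg` on the master and in `buildbot.tac` on the slave.

```python
import types


def raise_broker_ref_limit(pb_module, new_limit=2048):
    """Raise pb_module.MAX_BROKER_REFS to new_limit, never lowering it.

    Returns the value that was in place before the call.
    """
    old_limit = pb_module.MAX_BROKER_REFS
    if new_limit > old_limit:
        pb_module.MAX_BROKER_REFS = new_limit
    return old_limit


try:
    from twisted.spread import pb
except ImportError:
    # Stand-in for illustration only; 1024 is the assumed Twisted default.
    pb = types.SimpleNamespace(MAX_BROKER_REFS=1024)

# Same effect as the two-line patch in comment:3, but idempotent.
previous = raise_broker_ref_limit(pb, 2048)
```

The caveat from comment:3 still applies: this only moves the ceiling, and PB still carries one reference per builder.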

Another issue is the 10-second timeout for the "buildbot restart" and "buildbot reconfig" commands.

I have a huge number of builders, so these always time out, which is rather inconvenient.

comment:5 Changed 8 years ago by dustin

  • Keywords master-slave added; performance removed
  • Resolution set to worksforme
  • Status changed from new to closed

This workaround is useful for folks running into this limit. The real fix is a new master/slave protocol.
