Opened 5 years ago

Last modified 5 years ago

#2506 new enhancement

Add lock for a group of slaves

Reported by: A1kmm Owned by:
Priority: major Milestone: undecided
Version: master Keywords: locks
Cc: dwlocks, rutsky.vladimir@…

Description

In our buildbot configuration (https://github.com/cellmlapi/cellml-build/blob/1fcf9dce/build-master/master.cfg) we have multiple buildslaves that share hardware: for example, an MSVC10 buildslave whose environment is set up for MSVC10 and an MSVC11 buildslave set up for MSVC11, both running on the same physical machine. I expect it is very common for people to run multiple buildslaves in VMs, lxc containers, or similar on the same physical hardware.

However, buildbot's built-in locking support is limited to either a global lock (MasterLock) or one lock per slave (SlaveLock). This makes it hard to limit the number of concurrent builds with a lock that is shared across slaves running on the same hardware, but not across slaves running on different hardware.
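To illustrate the gap, here are the two built-in lock scopes as they would appear in a master.cfg (a hedged example of standard buildbot lock configuration; the lock names are illustrative):

```python
from buildbot import locks

# One count shared across the entire master -- too coarse: it also
# serializes builds on unrelated hardware.
master_lock = locks.MasterLock("everything", maxCount=1)

# One independent count per slave -- too fine: two slaves running on
# the same physical host each get their own count, so the host can
# still be oversubscribed.
slave_lock = locks.SlaveLock("per-slave", maxCount=1)
```

Neither scope matches "one count per group of slaves sharing hardware", which is what this ticket requests.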

As a workaround, we put the following in our master.cfg, but ideally something like this should be built into buildbot:

# This code lets us enforce a count across a group of buildslaves.
# (Import paths may differ between buildbot versions.)
from buildbot.locks import BaseLock, SlaveLock
from buildbot.process.slavebuilder import SlaveBuilder

class RealSlavegroupLock:
    def __init__(self, lockid):
        self.name = lockid.name
        self.maxCount = lockid.maxCount
        self.maxCountForSlavegroup = lockid.maxCountForSlavegroup
        self.slaveToSlavegroup = lockid.slaveToSlavegroup
        self.description = "<SlavegroupLock(%s, %s, %s, %s)>" % (
            self.name, self.maxCount, self.maxCountForSlavegroup,
            self.slaveToSlavegroup)
        # One BaseLock per slavegroup, created lazily in getLock().
        self.locks = {}

    def __repr__(self):
        return self.description

    def getLock(self, slave):
        if isinstance(slave, SlaveBuilder):
            slavename = slave.slave.slavename
        else:
            slavename = slave.slavename
        # A slave with no explicit group forms its own group,
        # named after the slave itself.
        slavegroup = self.slaveToSlavegroup.get(slavename, slavename)
        if slavegroup not in self.locks:
            maxCount = self.maxCountForSlavegroup.get(slavegroup,
                                                      self.maxCount)
            lock = BaseLock(self.name, maxCount)
            lock.description = "<SlavegroupLock(%s, %s)[%s] %d>" % (
                self.name, maxCount, slavegroup, id(lock))
            self.locks[slavegroup] = lock
        return self.locks[slavegroup]

# Note: this only inherits from SlaveLock to get past an assert. Ideally the
# buildbot code would be changed to allow other lock types.
class SlavegroupLock(SlaveLock):
    """I am a semaphore that limits simultaneous actions on each group of
    buildslaves.

    Builds and BuildSteps can declare that they wish to claim me as they run.
    Only a limited number of such builds or steps will be able to run
    simultaneously on any given group of buildslaves. By default this number
    is one, but my maxCount parameter can be raised to allow two or three or
    more operations to happen across a group of buildslaves at the same time.

    Use this to protect a resource that is shared among all the builds taking
    place on a group of slaves that share resources, for example to limit CPU
    or memory load on an underpowered machine that runs multiple buildslaves.

    Each buildslave can be assigned to a group using the dictionary
    slaveToSlavegroup; buildslaves that do not appear in this dictionary are
    placed in a slavegroup whose name equals the name of the buildslave.

    Each group of buildslaves gets an independent copy of this semaphore. By
    default each copy uses the same owner count (set with maxCount), but you
    can provide maxCountForSlavegroup with a dictionary that maps slavegroup
    to owner count, to allow some slavegroups more parallelism than others.
    """

    compare_attrs = ['name', 'maxCount', '_maxCountForSlavegroupList',
                     '_slaveToSlavegroupList']
    lockClass = RealSlavegroupLock

    def __init__(self, name, maxCount=1, maxCountForSlavegroup=None,
                 slaveToSlavegroup=None):
        self.name = name
        self.maxCount = maxCount
        # Avoid mutable default arguments.
        self.maxCountForSlavegroup = maxCountForSlavegroup or {}
        self.slaveToSlavegroup = slaveToSlavegroup or {}
        # For comparison purposes, turn the dictionaries into stably-sorted
        # tuples of (key, value) pairs.
        self._maxCountForSlavegroupList = tuple(
            sorted(self.maxCountForSlavegroup.items()))
        self._slaveToSlavegroupList = tuple(
            sorted(self.slaveToSlavegroup.items()))
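For context, a hypothetical master.cfg fragment using the workaround class above might look like this (slave and group names here are illustrative, not from our actual configuration):

```python
# Two MSVC slaves share one physical machine, so they share one group;
# at most one build runs on that machine at a time.
hw_lock = SlavegroupLock("shared-hw",
                         maxCount=1,
                         slaveToSlavegroup={"msvc10-slave": "winbox",
                                            "msvc11-slave": "winbox"})

# Attach the lock to builders as with any other lock, e.g.:
#   BuilderConfig(name="msvc10", slavenames=["msvc10-slave"],
#                 factory=f, locks=[hw_lock.access('counting')])
```

Slaves not listed in slaveToSlavegroup fall into their own single-slave group, so the lock degenerates to SlaveLock behaviour for them.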

Change History (2)

comment:1 Changed 5 years ago by dustin

  • Cc dwlocks added

We solved something similar at Zmanda, where we ran builders on a massively oversubscribed ESXi host. There were a dozen or more VMs, but realistically only two could build at once. What we did was customize the buildslave's canStartBuild method. The only tricky bit is that you need to call master.botmaster.maybeStartBuildsForSlave any time a slave might be able to start a build (so, when a build on another slave on the same ESXi host finishes).

Dan Locks may be able to share some of this config with you.
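The per-host throttling logic behind that canStartBuild approach can be sketched as follows (a standalone illustration only: the buildbot pieces, i.e. the BuildSlave subclass whose canStartBuild consults this counter and the maybeStartBuildsForSlave call on build completion, are elided, and all names here are invented for the example):

```python
class HostThrottle:
    """Track active builds per physical host and gate new ones."""

    def __init__(self, limits):
        self.limits = limits   # host name -> max concurrent builds
        self.running = {}      # host name -> current build count

    def can_start_build(self, host):
        # Hosts without an explicit limit default to one build at a time.
        return self.running.get(host, 0) < self.limits.get(host, 1)

    def build_started(self, host):
        self.running[host] = self.running.get(host, 0) + 1

    def build_finished(self, host):
        # In a real master.cfg this is also the point to nudge the
        # botmaster so other slaves on this host get reconsidered.
        self.running[host] -= 1
```

A canStartBuild override would call can_start_build with the slave's host, and build start/finish hooks would update the counts.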

comment:2 Changed 5 years ago by rutsky

  • Cc rutsky.vladimir@… added