Ticket #692 (closed defect: worksforme)

Opened 3 years ago

Last modified 3 years ago

exceptions.AssertionError: assert s.number == self.nextBuildNumber - 1

Reported by: rackamx
Owned by: dustin
Priority: major
Milestone: 0.8.0
Version: 0.7.11
Keywords:
Cc:

Description

I am getting this assertion with 0.7.11p1. After that, the slave is hosed, waiting on a lock that is never released. I have to restart the master to get out of this.

2010-01-22 17:03:42-0800 [-] acquireLocks(step <Build foobuild>, locks [(<SlaveLock(foobuild, 10)[fooslave] 168798508>, <buildbot.locks.LockAccess instance at 0x9fcf9cc>)])
2010-01-22 17:03:42-0800 [-] Unhandled error in Deferred:
2010-01-22 17:03:42-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 186, in addCallbacks
            self._runCallbacks()
          File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 289, in _continue
            self.unpause()
          File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 285, in unpause
            self._runCallbacks()
        --- <exception caught here> ---
          File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/process/base.py", line 387, in _startBuild_2
            self.build_status.buildStarted(self)
          File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/status/builder.py", line 1232, in buildStarted
            self.builder.buildStarted(self)
          File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/status/builder.py", line 1832, in buildStarted
            assert s.number == self.nextBuildNumber - 1
        exceptions.AssertionError:

Change History

comment:1 Changed 3 years ago by dustin

  • Type changed from undecided to defect
  • Milestone changed from undecided to 0.8.0

Can you post the relevant portions of your config?

Have you tried this with 0.7.12?

comment:2 Changed 3 years ago by rackamx

I haven't tried 0.7.12 yet, but I will the next time this problem happens. It has happened only once so far, after more than 100 days of uptime. There isn't anything really special about this config; the only unusual thing might be how the locks are set up to get per-build locks. Example below. Thanks for your help, and thanks for the awesome work on this program!

c['slaves'] = [BuildSlave('fooslave', 'foopwd', max_builds = 5)]

################################################################################
#                              Schedulers                                      #
################################################################################
c['schedulers'] = [
    Scheduler('footest2', svn_mod_foo, 4 * 3600, ['footest2']),
    Nightly('foobuild', ['foobuild'], 0, 12),
    Nightly('foobuild_reg', ['foobuild_reg'], 0, 13),
    Nightly('foobuild_reg_unit', ['foobuild_reg_unit'], 0, 3),
    Nightly('footest1', ['footest1'], 0, 2),
]

footest2 = BuildFactory()
footest1 = BuildFactory()
foobuild = BuildFactory()
foobuild_reg = BuildFactory()
foobuild_reg_unit = BuildFactory()

################################################################################
#                               Locks                                          #
################################################################################
locks  = {}
rlocks = {}
wlocks = {}

def getRWLock(name, ldict, mode):
    ''' Return the cached LockAccess for this name and mode, creating it on first use '''
    if name not in ldict:
        # All access modes for a given name share one underlying SlaveLock.
        if name not in locks:
            locks[name] = SlaveLock(name, maxCount=10)
        ldict[name] = LockAccess(locks[name], mode)
        ldict[name].name = name
    return ldict[name]

def getRLock(name):
    ''' Get a R lock, means the builder is using this guy '''
    return getRWLock(name, rlocks, 'counting')

def getWLock(name):
    ''' Get a W lock, means the builder is generating this guy '''
    return getRWLock(name, wlocks, 'exclusive')

################################################################################
#                                 Builders                                     #
################################################################################
c['builders'] = [
    {
        'name': 'foobuild',
        'slavename': 'fooslave', 
        'builddir': 'foobuild',
        'factory': foobuild,
        'locks': [getWLock('foobuild')],
    },
    {
        'name': 'foobuild_reg',
        'slavename': 'fooslave', 
        'builddir': 'foobuild_reg',
        'factory': foobuild_reg,
        'locks': [getRLock('foobuild'), getWLock('foobuild_reg')],
    },
    {
        'name': 'foobuild_reg_unit',
        'slavename': 'fooslave', 
        'builddir': 'foobuild_reg_unit',
        'factory': foobuild_reg_unit,
        'locks': [getRLock('foobuild'), getWLock('foobuild_reg')],
    },
    {
        'name': 'footest1',
        'slavename': 'fooslave', 
        'builddir': 'footest1',
        'factory': footest1,
        'locks': [getRLock('foobuild')],
    },
    {
        'name': 'footest2',
        'slavename': 'fooslave', 
        'builddir': 'footest2',
        'factory': footest2,
        'locks': [getRLock('foobuild')],
    },
 ]
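
To make the intent of the lock setup easier to check, here is a small standalone sketch (it does not import buildbot) that lists which builder pairs can never run at the same time. It assumes the usual reading of the two access modes: an 'exclusive' holder excludes every other holder of the same lock, while 'counting' holders can share it. The `builder_locks` table and the `conflicts` helper are hypothetical names made up for this illustration, and the sketch ignores maxCount and max_builds.

```python
from itertools import combinations

# Lock requirements copied from the builder list above: (lock name, access mode).
builder_locks = {
    'foobuild': [('foobuild', 'exclusive')],
    'foobuild_reg': [('foobuild', 'counting'), ('foobuild_reg', 'exclusive')],
    'foobuild_reg_unit': [('foobuild', 'counting'), ('foobuild_reg', 'exclusive')],
    'footest1': [('foobuild', 'counting')],
    'footest2': [('foobuild', 'counting')],
}

def conflicts(a, b):
    """True if builders a and b share a lock that at least one wants exclusively."""
    for name_a, mode_a in builder_locks[a]:
        for name_b, mode_b in builder_locks[b]:
            if name_a == name_b and 'exclusive' in (mode_a, mode_b):
                return True
    return False

# Every pair of distinct builders that is mutually blocked by the locks.
blocked_pairs = sorted(
    pair for pair in combinations(sorted(builder_locks), 2) if conflicts(*pair)
)

for pair in blocked_pairs:
    print('%s <-> %s' % pair)
```

Under those assumptions, foobuild blocks all four other builders, foobuild_reg and foobuild_reg_unit block each other, and the two test builders can run alongside either of the reg builders.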

comment:3 Changed 3 years ago by dustin

I suspect the locks, too. I don't understand them well, myself, so I'll have to look carefully.

comment:4 Changed 3 years ago by dustin

Can you see if this is still the case in 0.8.0?

comment:5 Changed 3 years ago by dustin

  • Owner set to dustin
  • Status changed from new to assigned

comment:6 Changed 3 years ago by Dustin J. Mitchell

Update interlock documentation (refs #692)

Changeset: 6064f76167ea5c90ce74381ab8fd1559ed55b26e

comment:7 Changed 3 years ago by dustin

  • Status changed from assigned to closed
  • Resolution set to worksforme

OK, I suspect that this happened because several builds for the same builder started off at nearly the same time, invalidating that assert. Since the error was from 0.7.11 in code that's changed by now, that's about the best we can do.
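
A toy model of that hypothesis, for anyone who lands here later. This is not the 0.7.11 code; `ToyBuilderStatus` and its `newBuild` method are made up for illustration, and only the assert itself is taken from the traceback. The assumption is that a build number is handed out when the build is created, and the assert runs later when the build actually starts.

```python
class ToyBuilderStatus(object):
    """Hypothetical stand-in for the builder status object; not buildbot code."""

    def __init__(self):
        self.nextBuildNumber = 0

    def newBuild(self):
        # Hand out the next number and advance the counter.
        number = self.nextBuildNumber
        self.nextBuildNumber += 1
        return number

    def buildStarted(self, number):
        # Same condition as the assert in the traceback: it only holds if
        # no other build was created since this one got its number.
        assert number == self.nextBuildNumber - 1


def start_out_of_order():
    """Create two builds back to back, then start the first one."""
    status = ToyBuilderStatus()
    first = status.newBuild()
    status.newBuild()  # second build created before the first has started
    try:
        status.buildStarted(first)
    except AssertionError:
        return True
    return False


print(start_out_of_order())
```

In this model the assert fails as soon as a second build for the same builder is created before the first one reports that it has started, which matches the "nearly the same time" scenario.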

My confidence in the locks is renewed, having reviewed the code and the documentation.

