Ticket #692 (closed defect: worksforme)
exceptions.AssertionError: assert s.number == self.nextBuildNumber - 1
| Reported by: | rackamx | Owned by: | dustin |
|---|---|---|---|
| Priority: | major | Milestone: | 0.8.0 |
| Version: | 0.7.11 | Keywords: | |
| Cc: | | | |
Description
I am getting this assertion with 0.7.11p1. After that, the slave is hosed, waiting on a lock that is never released. I have to restart the master to get out of this.
2010-01-22 17:03:42-0800 [-] acquireLocks(step <Build foobuild>, locks [(<SlaveLock(foobuild, 10)[fooslave] 168798508>, <buildbot.locks.LockAccess instance at 0x9fcf9cc>)])
2010-01-22 17:03:42-0800 [-] Unhandled error in Deferred:
2010-01-22 17:03:42-0800 [-] Unhandled Error
Traceback (most recent call last):
  File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 186, in addCallbacks
    self._runCallbacks()
  File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 289, in _continue
    self.unpause()
  File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 285, in unpause
    self._runCallbacks()
--- <exception caught here> ---
  File "/home/buildbot/local-0.7.11p1/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/process/base.py", line 387, in _startBuild_2
    self.build_status.buildStarted(self)
  File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/status/builder.py", line 1232, in buildStarted
    self.builder.buildStarted(self)
  File "/home/buildbot/local-0.7.11p1/lib/python/buildbot/status/builder.py", line 1832, in buildStarted
    assert s.number == self.nextBuildNumber - 1
exceptions.AssertionError:
Change History
comment:1 Changed 3 years ago by dustin
- Type changed from undecided to defect
- Milestone changed from undecided to 0.8.0
comment:2 Changed 3 years ago by rackamx
I haven't tried 0.7.12 yet, but I will the next time this problem happens. It has happened only once so far, after more than 100 days of uptime. There isn't anything really special about this config; the only unusual thing might be how the locks are handled to get per-build locks. Example below. Thanks for your help, and thanks for the awesome work on this program!
c['slaves'] = [BuildSlave('fooslave', 'foopwd', max_builds = 5)]
################################################################################
# Schedulers #
################################################################################
c['schedulers'] = [
    Scheduler('footest2', svn_mod_foo, 4 * 3600, ['footest2']),
    Nightly('foobuild', ['foobuild'], 0, 12),
    Nightly('foobuild_reg', ['foobuild_reg'], 0, 13),
    Nightly('foobuild_reg_unit', ['foobuild_reg_unit'], 0, 3),
    Nightly('footest1', ['footest1'], 0, 2),
]
footest2 = BuildFactory()
footest1 = BuildFactory()
foobuild = BuildFactory()
foobuild = BuildFactory()
foobuild_reg = BuildFactory()
foobuild_reg_unit = BuildFactory()
################################################################################
# Locks #
################################################################################
locks = {}
rlocks = {}
wlocks = {}
def getRWLock(name, ldict, mode) :
    if not ldict.has_key(name) :
        if not locks.has_key(name) :
            locks[name] = SlaveLock(name, maxCount = 10)
        ldict[name] = LockAccess(locks[name], mode)
        ldict[name].name = name
    return ldict[name]
def getRLock(name) :
    ''' Get a R lock, means the builder is using this guy '''
    return getRWLock(name, rlocks, 'counting')
def getWLock(name) :
    ''' Get a W lock, means the builder is generating this guy '''
    return getRWLock(name, wlocks, 'exclusive')
################################################################################
# Builders #
################################################################################
c['builders'] = [
    {
        'name': 'foobuild',
        'slavename': 'fooslave',
        'builddir': 'foobuild',
        'factory': foobuild,
        'locks': [getWLock('foobuild')],
    },
    {
        'name': 'foobuild_reg',
        'slavename': 'fooslave',
        'builddir': 'foobuild_reg',
        'factory': foobuild_reg,
        'locks': [getRLock('foobuild'), getWLock('foobuild_reg')],
    },
    {
        'name': 'foobuild_reg_unit',
        'slavename': 'fooslave',
        'builddir': 'foobuild_reg_unit',
        'factory': foobuild_reg_unit,
        'locks': [getRLock('foobuild'), getWLock('foobuild_reg')],
    },
    {
        'name': 'footest1',
        'slavename': 'fooslave',
        'builddir': 'footest1',
        'factory': footest1,
        'locks': [getRLock('foobuild')],
    },
    {
        'name': 'footest2',
        'slavename': 'fooslave',
        'builddir': 'footest2',
        'factory': footest2,
        'locks': [getRLock('foobuild')],
    },
]
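For readers following the lock helper, here is a minimal standalone sketch of what it does: it caches one underlying lock per name and hands out separate 'counting' (reader) and 'exclusive' (writer) access objects that share that lock. The `SlaveLock` and `LockAccess` classes here are simplified stand-ins written for illustration, not the real `buildbot.locks` classes, and the helper uses `in` instead of `has_key` so that it also runs on current Python.

```python
# Simplified stand-ins for buildbot.locks.SlaveLock / LockAccess (assumption:
# only the attributes needed for this illustration are modelled).
class SlaveLock:
    def __init__(self, name, maxCount=1):
        self.name = name
        self.maxCount = maxCount


class LockAccess:
    def __init__(self, lockid, mode):
        assert mode in ('counting', 'exclusive')
        self.lockid = lockid
        self.mode = mode


locks = {}   # name -> underlying lock, shared by readers and writers
rlocks = {}  # name -> cached 'counting' access
wlocks = {}  # name -> cached 'exclusive' access


def getRWLock(name, ldict, mode):
    # Create the underlying lock and the access object only on first use.
    if name not in ldict:
        if name not in locks:
            locks[name] = SlaveLock(name, maxCount=10)
        ldict[name] = LockAccess(locks[name], mode)
    return ldict[name]


def getRLock(name):
    return getRWLock(name, rlocks, 'counting')


def getWLock(name):
    return getRWLock(name, wlocks, 'exclusive')


r = getRLock('foobuild')
w = getWLock('foobuild')
print(r.lockid is w.lockid)       # True: both accesses refer to one lock
print(getRLock('foobuild') is r)  # True: access objects are cached per name
```

With this arrangement, the builder that generates 'foobuild' holds the lock exclusively, while up to ten builders that consume it can hold it in counting mode at the same time.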
comment:3 Changed 3 years ago by dustin
I suspect the locks, too. I don't understand them well, myself, so I'll have to look carefully.
comment:6 Changed 3 years ago by Dustin J. Mitchell
Update interlock documentation (refs #692)
Changeset: 6064f76167ea5c90ce74381ab8fd1559ed55b26e
comment:7 Changed 3 years ago by dustin
- Status changed from assigned to closed
- Resolution set to worksforme
OK, I suspect that this happened because several builds for the same builder started off at nearly the same time, invalidating that assert. Since the error was from 0.7.11 in code that's changed by now, that's about the best we can do.
My confidence in the locks is renewed, having reviewed the code and the documentation.
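To make the suspected cause concrete, here is a rough sketch of how the assertion can trip when two builds of the same builder are numbered before either one reports that it has started. The classes are hypothetical, heavily simplified models written for illustration; they are not the actual 0.7.11 status code, and only the asserted expression is taken from the traceback.

```python
class BuildStatus:
    def __init__(self, number):
        self.number = number


class BuilderStatus:
    """Hypothetical model: hands out build numbers, then checks them on start."""

    def __init__(self):
        self.nextBuildNumber = 0

    def newBuild(self):
        # Allocate the next number immediately when the build is created.
        s = BuildStatus(self.nextBuildNumber)
        self.nextBuildNumber += 1
        return s

    def buildStarted(self, s):
        # Same expression as the failing assert in the traceback.
        assert s.number == self.nextBuildNumber - 1


builder = BuilderStatus()
first = builder.newBuild()    # gets number 0
second = builder.newBuild()   # gets number 1 before `first` has started

builder.buildStarted(second)  # passes: 1 == 2 - 1
try:
    builder.buildStarted(first)   # 0 != 2 - 1
    race_detected = False
except AssertionError:
    race_detected = True

print(race_detected)  # True
```

The assert only holds if each build starts before the next one is numbered, which is the ordering that near-simultaneous builds (for example, one delayed while waiting on a lock) would break.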
Can you post the relevant portions of your config?
Have you tried this with 0.7.12?