Ticket #395 (closed defect: worksforme)

Opened 5 years ago

Last modified 3 years ago

when i change the vcs executable, buildslave stops being able to invoke it until I restart buildslave

Reported by: zooko Owned by: zooko
Priority: major Milestone: 0.7.12
Version: 0.7.9 Keywords:
Cc:

Description

It seems like if I run:

cd /usr/local/stow
stow -D darcs-2.1.0plus
stow darcs-2.1.2

Then all my buildslaves subsequently become unable to execute darcs:

 http://allmydata.org/buildbot/builders/zooko%20yukyuk%20hardy/builds/378

The next time I restart those buildslaves, they are able to find it again.

On a hunch I went and put back the original stowed darcs:

cd /usr/local/stow
stow -D darcs-2.1.2
stow darcs-2.1.0plus

And forced a build, but the buildslave was still unable to find it, even though now now the exactly same executable is in the same location, being pointed at by a new symlink which is in the same location.

Anyway, I went ahead and restarted that buildslave and now it works again.

Change History

comment:1 Changed 4 years ago by dustin

  • Status changed from new to closed
  • Resolution set to invalid

I don't really know how Stow works, but I'm guessing this is some peculiarity of that application. Buildbot just runs the executable via PATH, after all. I'll be interested if you can give some more details -- show logfiles, add debug output, etc.

comment:2 Changed 4 years ago by zooko

  • Status changed from closed to reopened
  • Resolution invalid deleted

GNU stow is very simple -- it just makes a symlink from /usr/local/bin/darcs to /usr/local/stow/darcs-2.2.0/bin/darcs (and similarly for all other files in a package. So from buildbot's perspective, the PATH includes /usr/local/bin, and that dir had no entry named "darcs" or else had a symlink to /usr/local/stow/darcs-2.1.0/bin/darcs at one point, and then at a later point it had a symlink to /usr/local/stow/darcs-2.2.0/bin/darcs. I can't figure out how this can go wrong, either, and I've scouted around in the buildbot source code and can't see it, unless Twisted or Python is doing some bad caching when buildbot invokes the Twisted subprocess utility.

However, it just happened again: builds 36 and 37 couldn't find the "darcs" executable, but build 38 after the buildslave was restarted could. Unfortunately I don't remember what, if anything I changed in the host system before build 36, but I am sure that I didn't change anything in the host system between build 37 and build 38.

 http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu-hardy-yukyuk/builds/36  http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu-hardy-yukyuk/builds/37  http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu-hardy-yukyuk/builds/38

The twistd.log from builds #36 and #37 includes:

2009-03-02 17:13:01-0700 [Broker,client] <SlaveBuilder 'linux-amd64-ubuntu-hardy-yukyuk' at 16545480>.startBuild
2009-03-02 17:13:02-0700 [Broker,client] Peer will receive following PB traceback:
2009-03-02 17:13:02-0700 [Broker,client] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/banana.py", line 146, in gotItem
            self.callExpressionReceived(item)
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/banana.py", line 111, in callExpressionReceived
            self.expressionReceived(obj)
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/pb.py", line 514, in expressionReceived
            method(*sexp[1:])
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/pb.py", line 826, in proto_message
            self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
        --- <exception caught here> ---
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/pb.py", line 840, in _recvMessage
            netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
          File "/usr/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-linux-x86_64.egg/twisted/spread/flavors.py", line 114, in remoteMessageReceived
            state = method(*args, **kw)
          File "/usr/lib/python2.5/site-packages/buildbot-0.7.9-py2.5.egg/buildbot/slave/bot.py", line 168, in remote_startCommand
            self.command = factory(self, stepId, args)
          File "/usr/lib/python2.5/site-packages/buildbot-0.7.9-py2.5.egg/buildbot/slave/commands.py", line 667, in __init__
            self.setup(args)
          File "/usr/lib/python2.5/site-packages/buildbot-0.7.9-py2.5.egg/buildbot/slave/commands.py", line 1633, in setup
            self.vcexe = getCommand("darcs")
          File "/usr/lib/python2.5/site-packages/buildbot-0.7.9-py2.5.egg/buildbot/slave/commands.py", line 58, in getCommand
            raise RuntimeError("Couldn't find executable for '%s'" % name)
        exceptions.RuntimeError: Couldn't find executable for 'darcs'

and from build #38 I see:

2009-03-05 16:55:42-0700 [Broker,client] <SlaveBuilder 'linux-amd64-ubuntu-hardy-yukyuk' at 16578392>.startBuild
2009-03-05 16:55:42-0700 [Broker,client]  startCommand:darcs [id 46]
2009-03-05 16:55:42-0700 [Broker,client] ShellCommand._startCommand
2009-03-05 16:55:42-0700 [Broker,client]  rm -rf /home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp/linux-amd64-ubuntu-hardy-yukyuk/build
2009-03-05 16:55:42-0700 [Broker,client]   in dir /home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp/linux-amd64-ubuntu-hardy-yukyuk (timeout 1200 secs)
2009-03-05 16:55:42-0700 [Broker,client]   watching logfiles {}
2009-03-05 16:55:42-0700 [Broker,client]   argv: ['rm', '-rf', '/home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp/linux-amd64-ubuntu-hardy-yukyuk/build']
2009-03-05 16:55:42-0700 [Broker,client]  environment: {'LANG': 'en_US.UTF-8', 'TERM': 'screen', 'SHELL': '/bin/bash', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'SHLVL': '1', 'OLDPWD': '/home/pycryptoppbuildslave', 'LESSOPEN': '| /usr/bin/lesspipe %s', 'HISTCONTROL': 'ignoreboth', 'PWD': '/home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp', 'LOGNAME': 'pycryptoppbuildslave', 'USER': 'pycryptoppbuildslave', 'MAIL': '/var/mail/pycryptoppbuildslave', 'PATH': '/usr/local/bin:/usr/bin:/bin:/usr/games', 'LS_COLORS': 'no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:', 'HOME': '/home/pycryptoppbuildslave', '_': '/usr/bin/buildbot'}
2009-03-05 16:55:42-0700 [Broker,client]   closing stdin
2009-03-05 16:55:42-0700 [Broker,client]   using PTY: False
2009-03-05 16:55:43-0700 [-] command finished with signal None, exit code 0, elapsedTime: 0.647294
2009-03-05 16:55:43-0700 [-] ShellCommand._startCommand
2009-03-05 16:55:43-0700 [-]  /usr/local/bin/darcs get --verbose --partial --repo-name build http://allmydata.org/source/pycryptopp/trunk-hashedformat
2009-03-05 16:55:43-0700 [-]   in dir /home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp/linux-amd64-ubuntu-hardy-yukyuk (timeout 1200 secs)
2009-03-05 16:55:43-0700 [-]   watching logfiles {}
2009-03-05 16:55:43-0700 [-]   argv: ['/usr/local/bin/darcs', 'get', '--verbose', '--partial', '--repo-name', 'build', 'http://allmydata.org/source/pycryptopp/trunk-hashedformat']
2009-03-05 16:55:43-0700 [-]  environment: {'LANG': 'en_US.UTF-8', 'TERM': 'screen', 'SHELL': '/bin/bash', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'SHLVL': '1', 'OLDPWD': '/home/pycryptoppbuildslave', 'LESSOPEN': '| /usr/bin/lesspipe %s', 'HISTCONTROL': 'ignoreboth', 'PWD': '/home/pycryptoppbuildslave/yukyuk-linux-hardy-amd64-pycryptopp', 'LOGNAME': 'pycryptoppbuildslave', 'USER': 'pycryptoppbuildslave', 'MAIL': '/var/mail/pycryptoppbuildslave', 'PATH': '/usr/local/bin:/usr/bin:/bin:/usr/games', 'LS_COLORS': 'no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:', 'HOME': '/home/pycryptoppbuildslave', '_': '/usr/bin/buildbot'}
2009-03-05 16:55:43-0700 [-]   closing stdin
2009-03-05 16:55:43-0700 [-]   using PTY: False
2009-03-05 16:55:47-0700 [-] command finished with signal None, exit code 0, elapsedTime: 4.313054

comment:3 Changed 4 years ago by dustin

  • Owner set to zooko
  • Status changed from reopened to new
  • Milestone changed from undecided to 0.7.11

Ah, I wonder if this is a problem with twisted.python.procutils.which (although I don't see any kind of caching in Twisted-8.1.0), or with some kind of caching of stat results in Python (related to the old statcache module)?

Can you do a little experiement with twisted.python.procutils.which to see if, indeed, something is evaluating symlinks and caching the results?

comment:4 Changed 4 years ago by dustin

  • Milestone changed from 0.7.11 to 0.7.12

comment:5 Changed 4 years ago by zooko

By the way, this still happening.

zooko@tahoe-slave-edgy:~$ buildbot --version
Buildbot version: 0.7.4
Twisted version: 2.4.0

I installed /usr/local/bin/darcs which is a symlink to /usr/local/stow/blahblahblah/bin/darcs-blahblahblah , and then I removed /usr/bin/darcs, and then builds get this exception:

 http://allmydata.org/buildbot/builders/edgy/builds/2283/steps/darcs/logs/stdio

  File "/usr/lib/python2.4/site-packages/buildbot/slave/bot.py", line 164, in remote_startCommand
    self.command = factory(self, stepId, args)
  File "/usr/lib/python2.4/site-packages/buildbot/slave/commands.py", line 604, in __init__
    self.setup(args)
  File "/usr/lib/python2.4/site-packages/buildbot/slave/commands.py", line 1235, in setup
    self.vcexe = getCommand("darcs")
  File "/usr/lib/python2.4/site-packages/buildbot/slave/commands.py", line 53, in getCommand
    raise RuntimeError("Couldn't find executable for '%s'" % name)
exceptions.RuntimeError: Couldn't find executable for 'darcs'

After restarting the buildslave, it worked fine. In answer to your question: well, sure I'd like to experiment with twisted.python.procutils.which sometime, but not right now ...

comment:6 Changed 3 years ago by dustin

I suspect that this should be a Twisted bug.

comment:7 Changed 3 years ago by warner

I see no code to perform caching in Twisted (twisted.python.procutils.which) or in buildbot.

Now, if darcs was the only thing in your stow environment, then /usr/local/bin itself would be a symlink to /usr/local/stow/blabblah/bin, and then you could get tripped up by some bit of code which abspath'ed everything in os.environ["PATH"] at process launch time. If that were happening, then unstowing darcs wouldn't be enough to get /usr/local/stow/blahblah/bin off your PATH, and if darcs were still there, it would try to run it (instead of trying to run /usr/bin/darcs or whatever).

But that neither explains the problem you're seeing nor sounds very likely.

comment:8 Changed 3 years ago by dustin

  • Status changed from new to closed
  • Resolution set to worksforme

OK, I just tried this out, moving svn from /usr/bin to /bin between builds. Works just fine. So this is something in the Python, Twisted, or Buildbot version you're using. I'm using buildbot master, twisted 8.2.0, and python 2.6.2.

Note: See TracTickets for help on using tickets.