Ticket #184 (closed enhancement: fixed)
New build step for synchronizing concurrent builds: Blocker
| Reported by: | gward | Owned by: | gward |
|---|---|---|---|
| Priority: | major | Milestone: | 0.8.4 |
| Version: | 0.7.6 | Keywords: | buildcoord |
| Cc: | dustin |
Description
Background: I use Buildbot to run related-but-distinct builds on Linux and Windows. That is, the two builds together comprise "a build" for us; the products of both builds are what we care about. There are several synchronization points between the two builds, for example:
- we have a Java application that is compiled on Linux, but then bundled into a Windows installer -- so we cannot run that part of the Windows build until the Java build on Linux completes
- all of our applications (Linux and Windows) are ultimately packaged into RPMs by the Linux build -- so we cannot run that part of the Linux build until all Windows build steps complete
(Also, I'm not actually using Buildbot for continuous integration, but as a production build server. So I'm completely oblivious to schedulers, changes, and the like: all my builds are forced by external events, like QA tells me they need a build to test new features.)
My first attempt, many months ago, was to use the Dependent scheduler. This didn't work because I don't use schedulers at all. Even if I had arranged things to use schedulers (eg. with synthetic changes rather than forced builds), I would have needed a lot more than just 2 builders. I think every synchronization point would have forced a new scheduler and a new builder. I think I would have had something like 5 or 6 builders for my 2 builds with 3 synchronization points (can't remember exactly -- I never worked it out, I just ran screaming in horror).
So then I tried Dustin Mitchell's "trigger" patch (tickets 56 and 57). For a long time, I thought it would do the trick just fine. But it ended up falling down as well because each trigger can only trigger one scheduler. First, the use of schedulers as one end of a synchronization point means that again I would have to split my build up into 5 or 6 builders with almost as many TriggerableSchedulers?. Then, once I did that, I would have hit a wall: my "build RPMs" steps must be triggered by success of "linux C++ build", "linux java build", and "windows build". Bzzzt. Can't do that. Triggers don't work that way. Fail.
So I wrote Blocker, a new kind of build step that blocks its build until all upstream steps successfully complete. See attached patch. This is a surprisingly complex and subtle patch; if you start to read it, keep your wits about you and don't get distracted. Also, it's not done yet -- there's at least one sneaky bug in there that sometimes lets Blocker succeed even though one of its upstream steps failed. And the tests are inadequate, there's no doc patch, etc. Someone familiar with Buildbot internals should be able to find their way around from reading the comments and docstrings though.
I'm sharing it now to try to get some feedback from the community, see if anyone else needs such a build step, etc. Enjoy!
Attachments
Change History
Changed 5 years ago by gward
-
attachment
buildbot-blocker-20080123.patch
added
comment:1 Changed 5 years ago by dustin
- Cc dustin added
Greg -- this sounds neat. Any progress? Anything I can do to help?
comment:2 Changed 4 years ago by gward
- Owner set to gward
Dustin -- I'm finally reviving this feature. I have ported the patch to the current master branch and pushed the patches to http://github.com/gward/buildbot/commits/bug184 . I verified that the unit tests still run, but have not done ANY other testing at all. (We use this feature in our production buildbot at work, but it's still running 0.7.6.)
You offered help a year ago: yes please! I would greatly appreciate:
- a careful, critical code review: this is tricky code and it took me a while to get it right, and it's entirely possible that I did not get it 100% right after all.
- any ideas on how to improve the tests. Currently the tests only cover trivial logic, because I don't know how to setup enough of the Buildbot infrastructure to actually do meaningful tests
Also, my 'bug184' branch preserves the full play-by-play history, but I doubt that's very interesting for a review. I would just look at the tip of the branch: after all, I just added two files and then hacked on them for a while. Did not modify any existing code.
Thanks!
comment:4 Changed 4 years ago by dustin
One comment I have immediately is that your exception definition is too long:
+class BadStepError(Exception): + """Raised by Blocker when it is passed an upstream step that cannot + be found or is in a bad state.""" + def __init__(self, message): + self.message = message + + def __str__(self): + return self.message
you can leave off the __init and __str; the default constructor for Exception stores its arguments in the exc.args tuple.
I think this will make your assertRaises unnecessary, but I'm not entirely sure.
As for the rest, I need some more time to look more deeply at it.
comment:5 Changed 4 years ago by gward
Oops, quite right about that exception class. Fixed and pushed.
BTW: after fighting some more with git this morning, I ended up reforking your repo on github and recommitting this series of patches. The same URL as before still works ( http://github.com/gward/buildbot/commits/bug184), but it's a new repo/new branch/new commits.
comment:6 Changed 4 years ago by gward
More progress on this feature: there are now tests for it! Unfortunately, they aren't proper automated unit tests: you have to run a couple of BB daemons and inspect the outcome of several test builds. But this is good progress, and I now have some idea of what automated tests for Blocker would look like.
Anyone interested in trying out the tests for Blocker should clone or pull from git://github.com/gward/buildbot.git, branch 'bug184', and take a look in blockertest/README.txt.
comment:7 Changed 4 years ago by dustin
I'm still planning to take a look at this for 0.7.10. I'm just focusing on triaging all of the other tickets first.
comment:8 Changed 4 years ago by dustin
- Keywords review added
- Milestone changed from undecided to 0.7.10
comment:10 Changed 4 years ago by dustin
In looking at this in-depth, I'm concerned about the buildsMatch -- I think that makes the class pretty deeply dependent on build numbering, which isn't guaranteed to mean anything much. Overall, the patch is organized around named build/step pairs, while other use cases may need a more general approach (for example, if one of several builders can supply the condition that you're blocking on)
I have another way of thinking about this. Unfortunately, it's probably not something we can implement for 0.7.10, but relatively shortly thereafter. This is similar to the notion of condition variables.
First, we define a BuildCondition? class that tracks the occurrence of a particular condition across builds. In master.cfg, this would look like:
cpp_done = BuildCondition() java_done = BuildCondition() windows_done = BuildCondition()
Each of these is essentially a dictionary of condition variables, keyed by information about the build. By default, that's the sourcestamp. This class could take, as a constructor, a function to generate any other kind of key.
We then define a SetCondition? buildstep that sets a condition, and a BlockOnConditions? buildstep that blocks on a set of conditions:
win_factory.addStep(SetCondition(java_done)) # .. do the windows build win_factory.addStep(SetCondition(windows_done)) rpm_factory.addStep(BlockOnConditions(windows_done, java_done, cpp_done)) # .. package up the RPM
Note that "real" condition variables don't usually allow you to block on a set of conditions like this. Consider it an "added bonus" :)
What do you think? I'm not entirely sure how to proceed -- I would like to have something more like I described, rather than the Blocker step as defined. That said, I don't have a big problem with adding a new buildstep to the repository -- it won't hurt anyone who doesn't use it, even if it does have bugs. Put another way, I'm happy to merge this so that your builds can proceed if you'll consider working on a condition-based implementation in the post-0.7.10 timeframe. What do you think?
comment:11 Changed 4 years ago by dustin
- Milestone changed from 0.7.10 to 0.7.+
No word back from gward - pushing out to 0.7.+
comment:12 Changed 4 years ago by gward
Dustin -- sorry for the delay. I'm terrible with email.
Anyways: the BuildCondition idea sounds interesting. I'll have to read your proposal over again a few more times and think deeply about it.
And I'm glad that Blocker did *not* get merged in its current state: if the design is too specific to how I use BuildBot, then it doesn't belong in BuildBot proper. Let's get the design right first. Making big changes is not a big deal as long as I'm the only user. But if it gets merged, then I have to worry about backwards compatibility. No thanks.
comment:13 Changed 4 years ago by dustin
- Milestone changed from 0.7.+ to 0.7.11
Well, this can be the first bug slated for 0.7.11, which should give you ~3mo :)
comment:16 Changed 3 years ago by dustin
- Milestone changed from 0.7.12 to 0.8.+
gward - I haven't heard from you latelyi!
comment:17 Changed 3 years ago by dustin
- Keywords buildcoord added; sprint removed
- Milestone changed from 0.8.+ to 0.8.1
This is a part of a larger project to figure out how to coordinate multiple builds. See {34}.
comment:19 Changed 3 years ago by chrb
dustin, your link to buildbot.net/trac/report/34 in the last comment links to a non-existent report.
I would like to see something like this that supports merging the buildpath. I was thinking of using triggers as gward suggested - creating some trigger that requires n previous buildsteps to have succeeded, like:
from buildbot.process.buildstep import SUCCESS
class SyncTrigger(Trigger):
def __init__(self, required):
self.required = required
self.triggers = 0
def start(self):
p = self.build.getProperties()
if 'reset' in p:
self.triggers = 0
if 'done' in p and self.triggers < self.required :
self.triggers += 1
i = self.required - self.triggers
self.step_status.setText(['waiting for '+i+' builds'])
return self.finished(SUCCESS)
else:
return Trigger.start(self)
You would need to send a trigger with 'reset' keyword from the Build that initiates the flow-control split, and triggers with 'done' keyword at the end of each concurrent Build.
comment:20 Changed 3 years ago by dustin
An interesting idea - please open a new bug!
My worry here is that synchronizing builds using time alone runs into problems with overlapping builds, or when particular tasks take longer than expected.
comment:21 Changed 3 years ago by dustin
And here's the better version of what used to be {34}: http://buildbot.net/trac/query?status=!closed&order=priority&keywords=~buildcoord
comment:22 Changed 2 years ago by dustin
- Milestone changed from 0.8.+ to 0.8.4
This is under review in https://github.com/buildbot/buildbot/pull/77 and will be merged soon.
comment:23 Changed 2 years ago by Dustin J. Mitchell
- Status changed from assigned to closed
- Resolution set to fixed
Merge branch 'bug184' of git://github.com/gward/buildbot
- 'bug184' of git://github.com/gward/buildbot: (41 commits) (#184) Blocker: add copyright message, move test, cosmetics. (#184) Blocker: fix unit test by removing unused imports. (#184) Blocker: make timeout handling a little tighter. (#184) Blocker: explain why trawling through the build cache is bad. (#184) Blocker: do not provide a default implementation of buildsMatch(). (#184) Blocker: make test outcomes match expections again. - set flunkOnFailure true in my custom NoOp? step (looks like the upstream default changed, breaking my implicit assumption) - update test README to match current Blocker output (#184) Blocker: adapt to changes on current trunk (pre-0.8.4). (#184) Blocker: adapt test infrastructure to current trunk (pre-0.8.4). - separate buildbot and buildslave scripts - recommend use of virtualenv for running test daemons - remove manhole from test master config (doesn't work: Twisted dependencies seem to be incorrect) (#184) Blocker: adjust to source reorg by moving into master/buildbot. (#184) Blocker: add a comment about callbacks after timeout. (#184) Blocker: define str() and repr(); tweak logging. (#184) Blocker test: tweak README.txt (#184) Blocker test: remove pointless constructors. (#184) Blocker test: add 'restart' Makefile target. (#184): Blocker test: replace remaining ShellCommands? with custom NoOp? step. (#184) Blocker test: replace "date" ShellCommands? with custom Timestamp step. (#184) Blocker test: add 'reconfig' Makefile target. (#184): Blocker test: replace "sleep" ShellCommands? with custom Delay step. (#184) Blocker: Trim comments from testing master.cfg. (#184) Blocker: flesh out testing infrastructure. This should enough for any clueful Buildbot hacker on a Unix box to systematically (not automatically!) test Blocker. ...
Fixes #184.
Changeset: 20e598d1b45832c354d5b3b519e84d323a5383d2
comment:24 Changed 2 years ago by dustin
Bug for tests is #1901
![[Buildbot Logo]](/chrome/site/header-text-transparent.png)
initial version (2008-01-23) of the patch to implement Blocker