wiki:Meeting29April2010

Agenda

  1. Buildbot's Successes - where do we stand?
  2. What should we do (and not do) in 0.8.1?
  3. Who will do it?

Buildbot's Successes

0.8.0 has a lot of great new code!

  • Database backend (schedulers, changes, buildsets, buildrequests, sourcestamps)
  • Jinja template engine for the web status interface
  • Authentication / Authorization for web status
  • HTTP Push Status
  • JSON status
  • Project and repository in SourceStamps and Changes (+ ChangeFilter filtering)
  • Reorganized documentation

Potential 0.8.1 Projects

  • Web UI as first-class citizen
    • Tomas, Marcus: JS, ETA, better interactivity
    • Flexible filtering for SourceStamps?
    • External to buildbot?
  • Build status in DB
    • better / faster access to build history
  • Expand Latent-Worker support
    • KVM/VMware/Xen/QEMU via libvirt?
  • Project / Repository
    • Support for cross-project builds and dependencies
    • Dynamic manipulation of SourceStamps
  • Build coordination
    • Beyond Dependent and Triggerable
    • traversing multiple builds in a DAG (top-down? bottom-up?)
  • Windows compatibility
    • need a windows guru
  • Source step mode cleanup (#669)
    • new set of Source steps with a new model?
  • Expanded notification framework
    • flexible filtering
    • flexible notification channels
  • Logging
    • Compression of logs on the wire
    • Worker-side logging
  • Graceful Shutdown of Master
  • Worker Administrative help
    • Graceful don't-schedule-stuff-here option (lower worker priority) (with some kind of scheduling)
    • communication with worker admin
    • remote shell

Meeting Video

Mozilla was kind enough to record the proceedings; see http://videos.mozilla.org/serv/air_mozilla/buildbot080event.ogg

Notes from the Meeting

These are hardly minutes, since I was taking notes while talking, but here goes. Others who were present: please add your own notes. "I" in the below refers to Dustin.

We had a few technical difficulties. I learned one thing: use a laptop that can connect to IRC (really SSH in this case). It turns out I was unable to see half of the conversation! John O'Duinn will be uploading a video of the meeting, although apparently the video system died a few times, so it may be partial. Also, the overhead microphones were not working, so it was difficult to hear a lot of the people in the room. So I will try to summarize here. The notes I displayed onscreen (with Safari! Horror of horrors!) are above.

We began by talking about the major improvements to Buildbot in the 0.8.0 release, as a basis for where Buildbot stands and what projects are in motion. This provoked little discussion.

Then we moved into looking at proposed work for 0.8.1, looking both at *whether* to do it and *how* to do it.

Web UI as a first-class citizen

The web interface used to be a simple status display, a "peer" to other status displays. Over the years, it's become a significant piece of Buildbot, and its official position should probably be updated. There are some proposed enhancements to the existing web status, which saw no opposition. We agreed that it should be possible to add other, more sophisticated web frontends to Buildbot, but didn't really discuss how to go about making that pluggable.

We talked quite a bit about the various web services interfaces that Buildbot now sports: HTTP push, HTTP/HTML, XMLRPC, and JSON. It was proposed to add a REST API, and I suggested that at least one other API should be removed. So far, it looks like XMLRPC is on the chopping block (this is an open mailing-list thread right now). As far as I know, nobody has stepped up to take on this task. If someone does, I would like to see all of the APIs sport the same set of methods, parameters, and results, with a common implementation and documentation.
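To illustrate the "common implementation" idea, here is a minimal sketch in which each status method is registered once and exposed through two thin front-ends, one JSON and one XML-RPC. This is not Buildbot code: `STATUS_METHODS`, `status_method`, `get_builder_names`, `get_last_result`, and the builder names are all hypothetical, and only the standard-library `json` and `xmlrpc.client` modules are real.

```python
import json
import xmlrpc.client

# Hypothetical registry: each status method is defined once and shared
# by every front-end, so names, parameters, and results stay in sync.
STATUS_METHODS = {}


def status_method(func):
    """Register a function under its own name."""
    STATUS_METHODS[func.__name__] = func
    return func


@status_method
def get_builder_names():
    # Placeholder data; a real master would consult its status objects.
    return ["linux-full", "win32-quick"]


@status_method
def get_last_result(builder):
    # Placeholder data, assumed for illustration only.
    results = {"linux-full": "success", "win32-quick": "failure"}
    return results[builder]


def call_json(name, params):
    """JSON front-end: returns a JSON-encoded result string."""
    return json.dumps({"result": STATUS_METHODS[name](*params)})


def call_xmlrpc(request_body):
    """XML-RPC front-end: decodes a request, returns an encoded response."""
    params, name = xmlrpc.client.loads(request_body)
    result = STATUS_METHODS[name](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)


if __name__ == "__main__":
    print(call_json("get_last_result", ["linux-full"]))
    request = xmlrpc.client.dumps(("win32-quick",), "get_last_result")
    print(call_xmlrpc(request))
```

With this shape, adding or removing a front-end (REST, say) would not touch the method definitions, and documentation could be generated from the same registry.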

Remote Server Shutdown

This got added to the list during the meeting, with discussion focusing on adding an API method to shut down the master gracefully. There is already code to support this, although it is not merged. Authentication is obviously important here! Brian suggested that authentication here should be identical to "who can access the buildmaster from the shell", and thus SSH and a command-line interface should be adequate. Others pointed out that in a distributed environment SSH can be difficult to script.

This brought up the question of control via web services APIs - that is, doing more than just reading status. The web UI already has a number of control features built in, and their implementation is pretty natural, so I don't have significant objections to adding these features to the web services APIs - as long as they are properly authenticated and authorized.

Build status in DB

There was a lot of discussion here of writing to the status db via a status listener, rather than writing to the DB directly from the buildbot core. This was seen as useful for those (hi, Mozilla) creating a highly distributed buildmaster, because each buildmaster could feed status to a single status-writer. This would simplify the core, since any status events would simply be calls to a notification system, but it's not clear how the core would deal with events created while the listener was not available. Likewise, it would preclude the core using any data in the history, e.g. building only builders whose most recent build failed.
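One possible answer to the "listener not available" question is for the listener to buffer events and replay them in order once the writer is reachable again. The sketch below shows that option only; it was not something the meeting settled on. `StatusWriter`, `BufferingListener`, the `build_events` table, and the `available` flag are all hypothetical names, and an in-memory SQLite database stands in for the shared status DB.

```python
import sqlite3
from collections import deque


class StatusWriter:
    """Hypothetical single status-writer backed by SQLite."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE build_events (master TEXT, builder TEXT, result TEXT)"
        )
        self.available = True  # toggled here to simulate an outage

    def write(self, event):
        if not self.available:
            raise ConnectionError("status writer is not reachable")
        self.db.execute("INSERT INTO build_events VALUES (?, ?, ?)", event)
        self.db.commit()


class BufferingListener:
    """Receives status events from the core and forwards them in order."""

    def __init__(self, writer):
        self.writer = writer
        self.pending = deque()

    def build_finished(self, master, builder, result):
        self.pending.append((master, builder, result))
        self.flush()

    def flush(self):
        # Deliver oldest-first; stop at the first failure and keep the rest.
        while self.pending:
            try:
                self.writer.write(self.pending[0])
            except ConnectionError:
                return
            self.pending.popleft()


if __name__ == "__main__":
    writer = StatusWriter()
    listener = BufferingListener(writer)
    writer.available = False
    listener.build_finished("master1", "linux-full", "failure")
    writer.available = True
    listener.build_finished("master2", "linux-full", "success")
    rows = writer.db.execute("SELECT * FROM build_events ORDER BY rowid")
    print(rows.fetchall())
```

The buffer here lives in memory, so events would still be lost if the master restarted during an outage. The sketch also does nothing about the second concern above, the core wanting to read build history.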

There was some brief discussion of support for other databases - Murali suggested Berkeley DB, but there are no Python bindings beyond the simple key/value interface.

Mozilla will be working on this project, so we'll see how it turns out soon enough.

Latent-Worker Support

Nobody objected to this idea. Chris Atlee mentioned wanting the ability to generate an arbitrary number of workers without naming them all individually: currently every EC2 worker must have a distinct AMI that knows, at a minimum, its unique worker name.

Multi-project / Multi-repository support

We now have the capacity to build multiple projects in Buildbot, even from different repositories. There are lots of optimizations to make to this support, mostly in the Source steps. But we also need to think about how to support interaction *between* projects - bundling projects together, projects that are dependencies of other projects, and so on. There was general agreement that some sort of "aggregate" SourceStamp is a good idea, the hard part being *building* the SourceStamp in the scheduler and then *interpreting* it in the Source steps.

VC support for submodules / externals came up as an alternative, but these options are not useful in all cases.

Zmanda will be working on this.

Build Coordination

Similar to the multi-project / multi-repository support, we need more expressive tools to describe the relationship of builds to one another. We have Dependent and Triggerable schedulers, but these offer limited flexibility. There have been discussions of describing builds as DAGs, or inventing a domain-specific language for the purpose.
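To make the DAG idea concrete, here is a small sketch that groups builds into "waves", where every build in a wave depends only on builds in earlier waves. It is an illustration rather than a proposed Buildbot design: `BUILD_DEPS`, `build_waves`, and the build names are hypothetical. It relies on the standard-library `graphlib` module, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical builds mapped to the builds they depend on.
BUILD_DEPS = {
    "compile": set(),
    "unit-tests": {"compile"},
    "docs": {"compile"},
    "package": {"unit-tests", "docs"},
}


def build_waves(deps):
    """Group builds into waves; each wave only needs earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps are cyclic
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(build_waves(BUILD_DEPS), start=1):
        print(number, wave)
```

This walks the graph dependencies-first. Walking it the other way (start from the build you want and pull in what it needs) would use the same dependency map, just restricted to the ancestors of the target.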

Murali brought up the idea of providing Buildbot configuration in some structured format other than Python, expressing the required relationships between builds. He also mentioned putting a web-based, drag-and-drop GUI on top of this format. I suggested that it's already quite possible for users to generate Buildbot configuration from structured data (by parsing that data in the master.cfg). I also said that a configuration GUI would be interesting, but will not be a part of Buildbot, because it would of necessity reduce the flexibility of the configuration. There are already some configuration-generation apps out there (notably Loki), and certainly within narrower environments it may even make business sense to provide such a frontend.
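As a rough sketch of generating configuration from structured data: because master.cfg is ordinary Python, it can parse a data file and build its configuration in a loop. The example below stops at plain dictionaries and does not use Buildbot's own classes. The JSON layout, the `builders_from_json` helper, and the `workernames` and `commands` keys are all assumptions made for illustration.

```python
import json

# Hypothetical structured description of builds; in practice this might
# live in a separate file read by master.cfg.
BUILDS_JSON = """
[
  {"name": "linux-full", "workers": ["w1", "w2"], "steps": ["make", "make check"]},
  {"name": "docs", "workers": ["w1"], "steps": ["make html"]}
]
"""


def builders_from_json(text):
    """Turn the structured data into plain builder descriptions."""
    builders = []
    for entry in json.loads(text):
        if not entry["workers"]:
            raise ValueError(f"builder {entry['name']!r} has no workers")
        builders.append({
            "name": entry["name"],
            "workernames": list(entry["workers"]),
            # Each shell command is split into an argument list.
            "commands": [step.split() for step in entry["steps"]],
        })
    return builders


if __name__ == "__main__":
    for builder in builders_from_json(BUILDS_JSON):
        print(builder)
```

A real master.cfg would map these descriptions onto Buildbot's builder and factory objects. The point is only that the structured format can sit in front of the Python configuration without replacing it.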

Windows Compatibility

Buildbot needs a Windows coordinator - someone who can judge the sanity of a patch, test it out locally, and so on. We also need some Windows workers. I heard a volunteer for this who turns out not to be the person I thought -- who was that? Hopefully Mozilla will be supplying a dozen or so workers, so we'll be able to run Windows on some of those in a controlled fashion.

Source Step Mode Cleanup

I think that everyone wants to see this happen, but nobody has yet stepped up to the challenge.

An interesting tangent came up here: moving the VC smarts from the worker to the master. There are several possible approaches here: continue to use buildbot on the worker side, with the master telling the worker to run 'git this' and 'svn that'; or replace the PB connection with a simpler, non-Python-specific protocol that could be implemented with minimal worker-side requirements (e.g., http://github.com/djmitche/remsh/). It would be possible to implement the latter in a "transitional" fashion, so that remsh workers could act just like regular buildbot workers, without running buildbot/twisted/python on the worker side.

We briefly touched on the idea of "splitting" buildbot into a master (requiring Jinja, SQLite, etc.) and a worker (with minimal requirements).

Expanded Notification Framework

It would be nice to see the various status listeners use a common filtering mechanism to decide which changes are important, and perhaps even a common formatting mechanism to describe those changes. Nice, but nobody spoke up to work on this.
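A minimal sketch of what a shared filter and formatter might look like, assuming each status listener receives the same kind of event object. Nothing here is existing Buildbot API: `BuildEvent`, `EventFilter`, `format_event`, and `Channel` are hypothetical names, and `Channel` simply collects messages where a real notifier would send mail or post to IRC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildEvent:
    builder: str
    result: str           # e.g. "success" or "failure"
    previous_result: str  # result of the prior build on this builder


@dataclass(frozen=True)
class EventFilter:
    """Hypothetical shared filter; an empty set means 'match anything'."""
    builders: frozenset = frozenset()
    results: frozenset = frozenset()
    only_on_change: bool = False

    def matches(self, event):
        if self.builders and event.builder not in self.builders:
            return False
        if self.results and event.result not in self.results:
            return False
        if self.only_on_change and event.result == event.previous_result:
            return False
        return True


def format_event(event):
    """Shared formatter used by every channel."""
    return f"{event.builder}: {event.previous_result} -> {event.result}"


class Channel:
    """Stand-in for a mail, IRC, or other notifier; collects messages."""

    def __init__(self, name, event_filter):
        self.name = name
        self.event_filter = event_filter
        self.sent = []

    def notify(self, event):
        if self.event_filter.matches(event):
            self.sent.append(format_event(event))


if __name__ == "__main__":
    mail = Channel("mail", EventFilter(results=frozenset({"failure"})))
    irc = Channel("irc", EventFilter(only_on_change=True))
    for event in [
        BuildEvent("linux-full", "failure", "success"),
        BuildEvent("linux-full", "failure", "failure"),
        BuildEvent("linux-full", "success", "failure"),
    ]:
        mail.notify(event)
        irc.notify(event)
    print(mail.sent)
    print(irc.sent)
```

In this run the mail channel keeps both failures, while the IRC channel keeps only the two events where the result changed.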

Logging

Chris has already added a good deal of optimization to the log-handling code, and while some were surprised and excited to find out about this, I didn't hear any suggestions of new features to be added here.

Community Supplied Workers

A few ideas were suggested to make it easier for a buildmaster admin to manage a set of community-supplied workers (see above). Nobody spoke up to work on these ideas, though. As I understand it, most of the organizations using Buildbot supply their own workers.

Tests

I failed utterly to mention this during the meeting, but Buildbot's tests are lacking, and it's my fault. So I'll fix those up.

Releases

Going into this meeting, I hoped to get a small scope set for 0.8.1, and then make the release when those features were done. I was convinced, instead, to plan a release when each feature gets merged. John O'Duinn referred to this as a "short cadence" - a phrase I like! The list of adopted projects is, roughly and with no authority:

  • master shutdown - Mozilla
  • build database {33}/{29} - Mozilla
  • build coordination {34} - Zmanda
  • multi-project {35} - Zmanda
  • windows {17} - ?? + Mozilla workers
  • tests - Dustin

0.8.1 will be released when any one of these (or anything else, really) is completed.

Last modified on Jan 4, 2017, 2:24:10 AM