Opened 7 years ago

Last modified 3 years ago

#2437 assigned enhancement

New Master-Slave Protocol

Reported by: dustin
Owned by:
Priority: major
Milestone: 0.9.+
Version: 0.8.7p1
Keywords:
Cc: jaredgrubb@…, rutsky.vladimir@…

Description (last modified by tardyp)

Buildbot currently uses Twisted Python's perspective broker to communicate with slaves. This is a remote procedure call library that operates on TCP connections, and has a few disadvantages:

  • It is Python-only, so slaves must be implemented in Python
  • It keeps the TCP connection open for many hours, and does not handle connection failure well
  • The RPC model is complex and does not map well to the operations Buildbot performs
  • The RPC implementation is inefficient and imposes some arbitrary limits on Buildbot's flexibility.

Scope

Here be dragons! Several people have attempted this before:

Either of these may be a great starting point for this project, but in any case this is a challenging project that will require a lot of thoughtful design work. The current plan is to use AMP over SSH as the default slave protocol.

It's probably worth looking at off-the-shelf message-queuing projects like ZeroMQ, RabbitMQ or Celery.

The best approach is to find a way to get some working code put together quickly, while still allowing Buildbot to do everything its users expect. For example, if you can write your new protocol such that Buildbot can use perspective broker or your protocol, but you only get as far as a simple slave that can run echo hello world, that's great! We can ship that code, and someone else can pick up where you've left off to add more capabilities to your protocol -- assuming your design does not make that impossible.

It would be great to select a communication protocol that is not Python-specific, so that a non-Python slave could be used to run Buildbot on more limited hardware (e.g., mobile devices).

Remaining Work

Stop leaking PB references out of the protocol. Running grep pb.referenceable gives a good idea of what remains:

  • process.builder.Builder: <tardyp> thinks this is not needed anymore. A docstring in that file also still documents a removed method argument.
  • process.remotecommand.RemoteCommand: protocol.Base.startCommand takes a RemoteCommand argument on which it calls remote_* methods. This needs a proxyRemoteCommand that forwards those calls to the process implementation.
  • process.slavebuilder.AbstractSlaveBuilder: <tardyp> did not find any reason why this is referenceable.
  • schedulers.trysched: This is the old PB interface to the try command line. It should be removed as part of #896.
  • steps.slave, steps.transfer: There is a FileWriter/FileReader interface used for transferring files, which is passed via the RemoteCommand args for the commands 'uploadDirectory', 'uploadFile' and 'downloadFile'. Proxies of those interfaces should be created.
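The proxy idea in the RemoteCommand and FileWriter/FileReader items could be sketched roughly as follows. This is only an illustration, not Buildbot code: RemoteProxy and FakeRemoteCommand are hypothetical names, and the assumption is that the protocol layer should only ever see the remote_* methods of the object it is handed.

```python
class RemoteProxy:
    """Expose only the remote_* methods of a wrapped implementation object.

    Hypothetical helper: the protocol layer holds the proxy, so the
    implementation itself no longer has to be a PB referenceable.
    """

    def __init__(self, impl):
        self._impl = impl

    def __getattr__(self, name):
        # Only reached for attributes that are not found on the proxy itself.
        if not name.startswith("remote_"):
            raise AttributeError(name)
        return getattr(self._impl, name)


class FakeRemoteCommand:
    """Stand-in for a process-side command object (all names are made up)."""

    def __init__(self):
        self.updates = []
        self.secret = "not reachable through the proxy"

    def remote_update(self, updates):
        self.updates.extend(updates)

    def remote_complete(self, failure=None):
        self.updates.append(("complete", failure))


command = FakeRemoteCommand()
proxy = RemoteProxy(command)

# The protocol side talks to the proxy; calls are forwarded to the command.
proxy.remote_update([("stdout", "hello world\n")])
proxy.remote_complete()
```

With this shape, anything on the implementation that is not a remote_* method (such as secret above) stays invisible to the protocol, which is the point of not passing the real object around.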

Change History (11)

comment:1 Changed 7 years ago by jaredgrubb

  • Cc jaredgrubb@… added

comment:2 Changed 7 years ago by marchael

Greetings!

I'm interested in this idea as a possible candidate for my GSoC project, and I hope the ideas discussed here can serve as the basis for my proposal.

The first question is which protocol the master and slave will use to exchange messages.

I went through djmitche's code and found that it constructs protocol messages in a way very similar to the one described at http://amp-protocol.net/ (sorry if I'm wrong).

In any case, Twisted already has AMP support (https://twistedmatrix.com/documents/current/core/howto/amp.html), so we could implement the message exchange without writing our own AMP client and server.

AMP over SSH seems a good choice because Twisted already supports it and allows SSH to be used for authentication and encryption (and it would be awesome if user credentials could be stored separately from, e.g., /etc/passwd on UNIX-like systems or SAM files on Windows).

In my view, that's the preferable way to transfer messages between master and slave.

But before we move on (or back, if I missed something), I want to propose a possible alternative to Twisted AMP: JSON over TCP.

The protocol is quite simple. The code below illustrates the idea of working with messages and does not pretend to be production code.

import json
import struct

message = {"key1": "value1", "key2": "value2"}

# "s" is a connected socket.socket() instance

def send_message(s, message):
    # Prefix the UTF-8 JSON payload with its length as a 4-byte
    # big-endian unsigned int ("!I" has the same size on every platform).
    payload = json.dumps(message).encode("utf-8")
    s.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exactly(s, n):
    # recv() may return fewer bytes than requested, so loop until done.
    buf = b""
    while len(buf) < n:
        chunk = s.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def receive_message(s):
    msg_len = struct.unpack("!I", recv_exactly(s, 4))[0]
    return json.loads(recv_exactly(s, msg_len).decode("utf-8"))

So the first 4 bytes are the message length n, sent as a big-endian unsigned integer so that the size does not depend on the platform, and the following n bytes are the message itself. To protect the service against some kinds of network attacks, we could encrypt the message (for instance, symmetric encryption with a pre-shared key) before sending it over the wire.
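Because TCP delivers a byte stream rather than discrete messages, an event-driven receiver may see a length-prefixed message arrive in pieces. The sketch below shows one way to reassemble such frames incrementally. It assumes a 4-byte big-endian length prefix and UTF-8 JSON payloads; encode_frame and FrameDecoder are hypothetical names, not part of Buildbot or Twisted.

```python
import json
import struct

HEADER = struct.Struct("!I")  # assumed: 4-byte big-endian length prefix


def encode_frame(message):
    """Serialize a message as <length><UTF-8 JSON payload>."""
    payload = json.dumps(message).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Accumulate raw bytes and return each complete JSON message."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add newly received bytes; return the list of completed messages."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this message
            payload = self._buffer[HEADER.size:end]
            messages.append(json.loads(payload.decode("utf-8")))
            self._buffer = self._buffer[end:]
        return messages


# Two frames delivered in three arbitrary chunks.
stream = encode_frame({"key1": "value1"}) + encode_frame({"key2": "value2"})
decoder = FrameDecoder()
first = decoder.feed(stream[:3])     # less than a full header
second = decoder.feed(stream[3:25])  # completes frame one, starts frame two
rest = decoder.feed(stream[25:])     # completes frame two
```

A decoder like this fits a callback-style API where data arrives in arbitrary chunks, whereas the blocking recv() approach suits a simple threaded client.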

As I said, Twisted AMP is preferable, but I would be very interested in discussing the JSON-over-TCP approach with Buildbot's developers.

Last edited 7 years ago by marchael

comment:3 Changed 7 years ago by dustin

  • Description modified

You actually mention *four* protocols - AMP, SSH, JSON, and TCP. I'd like to focus on AMP over sockets (so, TCP or TLS) for Buildbot's default implementation, but write the specification and implementation in a generic fashion that could also be implemented with JSON-over-AMQP or XML-over-HTTP or YAML-over-email or whatever a user would like. So the choice of protocol isn't really the question to be focusing on up-front. Rather, let's describe a message-based interaction between master and slave that can then be implemented in one of several on-the-wire formats.
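As a rough illustration of that separation, the sketch below keeps the message definition independent of the encoding. Everything in it is an assumption made for the example: Message, JsonCodec and LiteralCodec are hypothetical names, the "shell" command and its arguments are made up, and the second codec is a toy that only exists to show that the wire format can be swapped without touching the message layer.

```python
import ast
import json
from dataclasses import dataclass, field


@dataclass
class Message:
    """A protocol-level message, independent of any wire format."""
    command: str
    args: dict = field(default_factory=dict)


class JsonCodec:
    """One possible encoding: a UTF-8 JSON object."""

    def encode(self, message):
        obj = {"command": message.command, "args": message.args}
        return json.dumps(obj).encode("utf-8")

    def decode(self, data):
        obj = json.loads(data.decode("utf-8"))
        return Message(obj["command"], obj["args"])


class LiteralCodec:
    """Toy alternative encoding: a Python literal tuple."""

    def encode(self, message):
        return repr((message.command, message.args)).encode("utf-8")

    def decode(self, data):
        command, args = ast.literal_eval(data.decode("utf-8"))
        return Message(command, args)


original = Message("shell", {"command": "echo hello world"})

# The same message survives a round trip through either wire format.
decoded = [codec.decode(codec.encode(original))
           for codec in (JsonCodec(), LiteralCodec())]
```

An AMP-based codec would expose the same encode/decode pair, so the code that decides which messages to send would not need to know which format is in use.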

In IRC, you asked

09:10 < marchael> for now we determined protocol for exchanging messages, but I still confused about how I should try to implement this without breaking existing perspective broker

and this is indeed a difficult question. In general, the approach should be:

  • maintain a PB implementation on the master side, but deprecate it
  • add a new AMP-based implementation on the master side, parallel to the PB implementation
  • replace the slave-side PB implementation with an AMP implementation

The tricky bits will be in maintaining both the existing PB implementation (which has complex interactions with lots of classes on the master) and the AMP-based implementation (which should avoid much of that mess) in parallel. It's hard to say what that will look like without digging into the code. Tom, do you have some thoughts on that based on last year's work?

comment:4 Changed 6 years ago by rutsky

  • Cc rutsky.vladimir@… added

comment:5 Changed 6 years ago by dustin

  • Owner set to tomprince
  • Status changed from new to assigned
  • Type changed from project-idea to enhancement

I'm removing this from the project ideas list, as marchael implemented most of it in GSoC 2013, and tomprince is working on cleaning that up (right, tomprince?) before we merge it.

comment:6 Changed 5 years ago by dustin

  • Description modified
  • Owner tomprince deleted

Tom isn't actively working on this. There are a few issues remaining here, which should probably be broken out into separate bugs. I've updated the description accordingly.

comment:7 Changed 5 years ago by tardyp

  • Description modified
  • Owner set to tardyp

I'll try to pick the low-hanging fruit here, as there seems to be a lot of need for a better slave protocol. So far I've at least tried to work out what is really missing. @dustin and @tomprince, please help and tell me if I'm wrong.

comment:8 Changed 4 years ago by tardyp

  • Owner tardyp deleted

comment:9 Changed 3 years ago by rutsky

Is this wiki page related to this ticket? http://trac.buildbot.net/wiki/MasterSlaveCommunication

comment:10 Changed 3 years ago by rutsky

comment:11 Changed 3 years ago by dustin

Yes, both are related. I tried to rename, but there's a Trac bug preventing it. I think the updates you've made are sufficient.
