Opened 5 years ago

Closed 4 years ago

#2855 closed task (fixed)

Use a configuration-management system for *all* configuration

Reported by: dustin
Owned by: dustin
Priority: major
Milestone: sys - on-bb-infra
Version: 0.8.9
Keywords:
Cc: koobs


I'm familiar with puppet, but happy to learn Chef or any other system.

Putting our infrastructure in code would greatly facilitate administration by more than one person:

  • ease of adding/removing accounts, updating passwords
  • self-documenting configuration ("who changed this? when?")
  • disaster recovery (just rebuild)
  • change review

Change History (14)

comment:1 Changed 5 years ago by sa2ajj

I would not rule out ansible :) It's in Python after all.

comment:2 Changed 4 years ago by sa2ajj

And, as a side note, Fedora Project uses Ansible to maintain their infrastructure: git repo.

comment:3 Changed 4 years ago by ewong

I've looked at Ansible, Chef, CFEngine and Puppet. (By "look", I literally mean look, via the docs; I've only had a minuscule bit of experience with Ansible and Puppet.)

Ansible is the only CMS that doesn't require anything to be installed on the hosts aside from SSH. (Windows support, however, is in beta, whereas Windows support in Chef and CFEngine seems to be good, and Puppet works as well.)

Ansible is in Python; Chef and Puppet are in Ruby; CFEngine is in C. (I don't know if that's relevant to the issue here.)

Puppet is 'proven' to work (ask Dustin :) ).

AIUI, all configuration changes would just be pushed to a central repo, which the main 'master' machine would poll, updating the hosts accordingly.

Maybe set up a test environment to try out all four?

comment:4 Changed 4 years ago by sa2ajj

Test environment would be necessary in any case :)

We are just waiting for verm to sort out the initial setup/access control.

comment:5 Changed 4 years ago by ewong

dustin mentioned that windows management is also a requirement.

At first I thought Ansible's Windows support was in beta; but it looks like it works on Windows as well (or see the referring doc): it just needs the winrm module on the master and PowerShell 3.0 installed on the Windows host. (I don't know the extent of the support; if you use PowerShell remoting, what's limited is presumably how the winrm module interacts with PowerShell.)

Last edited 4 years ago by ewong

comment:6 Changed 4 years ago by dustin

  • Summary changed from Use a configuration-management system to Use a configuration-management system for *all* configuration

I'd like to very quickly get to the point where all changes to the buildbot infra are made by ansible, and not by hand. I don't really want to burn time trying multiple tools -- Ansible puts us all on an equal (n00b!) footing, so let's just use that.

That will involve:

  • setting up ansible
  • building out ansible configurations for each host and each jail within the host
  • rebuilding jails one by one from ansible
  • rebuilding hosts (perhaps in a "fake" way in a jail) to ensure that all details are handled by ansible. In other words, we need to demonstrate or prove convincingly that the *entire* infrastructure can be rebuilt from the buildbot-infra repository. This is particularly important since we're running on non-redundant hardware and should expect a catastrophic failure at some point. Recovery by re-installation in that case is OK, but recovery by guessing is not.
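The per-host/per-jail build-out above could start from a minimal top-level playbook; a sketch only, in which the file name, host groups, and role names are all illustrative, not anything that exists in the buildbot-infra repo yet:

```yaml
# site.yml -- hypothetical top-level playbook; group and role names are illustrative
- hosts: service_hosts
  roles:
    - common        # base packages, accounts, sshd config
    - jail_host     # jail definitions for this particular host

- hosts: jails
  roles:
    - common
```

Each host and jail would then be fully described by the roles applied to it, which is what makes the "rebuild from the repository" test meaningful.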

comment:7 Changed 4 years ago by dustin

  • Milestone changed from sys - future to sys - on-bb-infra
  • Owner set to dustin
  • Status changed from new to assigned

I've played with Ansible a little, and here's my basic idea:

  • We use a single playbook + hosts file, in the buildbot-infra repo
  • All hardware systems - service hosts and VM hosts - configure themselves with 'ansible-pull' on a cron task. We should be able to write a quick shell script, run once on a newly installed FreeBSD host, that installs Ansible and runs ansible-pull for the first time; that first run then re-installs the cron task (and everything else).
  • In order to avoid running sshd in jails, we'll configure jails using the jexec connector for Ansible, from the service host, also on a crontask. So basically each service host will be responsible for keeping its jails up to date.
  • Virtual machines will be configured by the VM host using regular SSH-based ansible.
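The ansible-pull cron task could itself be managed by Ansible; a hedged sketch, where the repo URL, schedule, and checkout path are illustrative rather than the real buildbot-infra locations:

```yaml
# Sketch only: keep ansible-pull scheduled on every hardware host.
# URL, paths, and interval are illustrative assumptions.
- hosts: all
  tasks:
    - name: schedule periodic ansible-pull
      cron:
        name: ansible-pull
        minute: "*/30"
        job: "ansible-pull -U https://example.org/buildbot-infra.git -d /var/ansible-local >> /var/log/ansible-pull.log 2>&1"
```

Because the task is in the playbook that ansible-pull runs, a host that has pulled once keeps itself up to date thereafter.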

I still have a lot to learn about password vaults, etc. etc., but how does this sound for a start?

comment:8 Changed 4 years ago by skelly

This sounds like a good start.

For initializing a host, I think running a playbook from another host in the network, or even from someone's personal computer, would be best. Multiple things (ansible, git) will need to be installed just to make ansible-pull work.
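That push-style bootstrap could be a tiny playbook of its own; a sketch, assuming FreeBSD's pkg and an illustrative group name:

```yaml
# Sketch: run once from an admin machine against a freshly installed host.
# 'new_hosts' is an illustrative group name, not a real inventory group.
- hosts: new_hosts
  gather_facts: no        # the target may not have Python/Ansible bits ready yet
  tasks:
    - name: install the prerequisites for ansible-pull (FreeBSD)
      raw: pkg install -y ansible git
```

The `raw` module only needs SSH, so it works before anything else is installed; after this, ansible-pull can take over.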

comment:9 Changed 4 years ago by skelly

Another thing that would be nice to have: a staging environment. It looks like nested jails have been possible since FreeBSD 8.0, so we would not necessarily need a VM to host staging.

There are a couple of ways to do this in Ansible:

  1. Single playbook, multiple segregated inventories. Here, each environment gets its own inventory (e.g. inventory/production, inventory/staging), and selecting the inventory determines the destination.
  2. Multiple playbooks, single inventory. This is convenient from a variables point of view. The playbooks would be largely the same, possibly using a variable to choose the destination. Duplication (or worse, divergence) between the two can be minimized by including tasks or using roles.
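Option 1 might look like this on disk; a sketch with illustrative hostnames and group names:

```
# inventory/staging -- mirrors the group names in inventory/production
[service_hosts]
staging-sv1.example.org

[vm_hosts]
staging-vm1.example.org
```

The environment is then chosen at invocation time, e.g. `ansible-playbook -i inventory/staging site.yml`, with the same playbook used for both.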

comment:10 Changed 4 years ago by Ben

I'm coming late to the party, but I didn't see Salt mentioned anywhere. It's Python too, and it does well with Windows...

comment:11 Changed 4 years ago by sa2ajj

It's never too late :)

comment:12 Changed 4 years ago by sa2ajj

As a very first step, I'd suggest having a simple inventory and a simple playbook that does 'ping': making sure that Ansible can access the hosts it should.

I understand that we'll have two groups of hosts:

  • those that can/should be deployed using ssh
  • those that will have to use the ansible-pull mechanism

I'd start with the first group (if we have any, of course).
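The 'ping' check suggested above can be either an ad-hoc command (`ansible all -i inventory -m ping`) or a minimal playbook; a sketch, with an illustrative file name:

```yaml
# ping.yml -- verify that ansible can reach every host in the inventory
- hosts: all
  tasks:
    - ping:
```

Any host that fails here has an SSH, Python, or inventory problem to sort out before real configuration work starts.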

comment:13 Changed 4 years ago by dustin

We landed a start today (similar to comment 12), so I'm going to call this fixed.

comment:14 Changed 4 years ago by dustin

  • Resolution set to fixed
  • Status changed from assigned to closed