by Hagai Kariti | July 14, 2014

Easy Modeling of Distributed Production with Vagrant & Ansible

Modeling your production environment accurately matters for development. Developers need to be able to run and test their code locally for the development process to be efficient, and this often requires setting up, on their local machines, infrastructure that exists in production. The basic solution is a simple Vagrant box containing all your infrastructure and application code, like the one we mentioned in our Devbox post.

In our previous post on Vagrant we covered a basic everything-on-one-server setup. In production, you may have 3 database servers, 2 application servers and 2 caching servers. Pretending that a one-machine-to-rule-them-all model is accurate is misleading. You can’t test for scaling issues, catch race conditions, or spot design decisions that distribute poorly until you reach production.

What if you could model clustered or distributed systems as multiple machines as they would be in real life? While making it easy enough to customize that notoriously lazy developers actually use it? Without duplicating your production scripts? This post provides a solution to this problem using Vagrant and Ansible.

Before we start, here are the topics that won’t be discussed, as each could fill a post of its own:

  • What is Vagrant (seriously, read the previous devbox post)
  • Ansible roles
  • Using Ansible to provision a new machine and add it to a cluster (including config updates/generation)

Ok then. We can start with the multi-machine feature of Vagrant and create this Vagrantfile for a 4-machine environment, with 2 app servers and 2 database servers:

# -*- mode: ruby -*-
# vi: set ft=ruby :
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "playbooks/dev/devbox.yml"
  end

  config.vm.define "appserver-1"
  config.vm.define "appserver-2"

  config.vm.define "dbserver-1"
  config.vm.define "dbserver-2"
end

Notice the path for the playbook. In my case, ‘playbooks’ is a copy of the playbooks git repository I use for production. It contains roles, playbooks, custom modules, you name it. The ‘dev’ dir inside contains the playbooks and vars relevant to the Vagrant environment. I also have ‘prod’ and ‘stage’ dirs, if you’re wondering. Side note: If you put your Vagrantfile in git, you may want to use git submodules for the ‘playbooks’ dir.

The playbook devbox.yml is fairly simple:

---
- hosts: appserver-*
  roles:
    - appserver

- hosts: dbserver-*
  roles:
    - dbserver

While simple and short, it’s actually not that helpful. Because we aren’t using hostgroups, we can’t:

  • Generate configuration that uses lists of hosts
  • Use group vars

Both are pretty bad if we want to use the same scripts that manage production. The Vagrant Ansible provisioner allows us to specify groups, but I don’t like it:

ansible.groups = {
  "appserver" => ["appserver-1", "appserver-2"],
  "dbserver" => ["dbserver-1", "dbserver-2"],
}

Why don’t I like it? Remember that developers should be able to customize their modeled environment easily. Specifically, I see these use cases:

  • Adding/removing a machine
  • Adding/removing a hostgroup (i.e. a type of machine)

Let’s see how we accomplish these. To add a machine, say a third database server, we need to:

  • Define a new machine. That means adding this line: config.vm.define "dbserver-3"
  • Update the group memberships. That means adding dbserver-3 to the dbserver group.

To later remove this machine:

  • vagrant destroy it, then remove its definition line.
  • Update the group memberships. That means removing dbserver-3 from the dbserver group.

To add a new type of machine to our model, say a caching server, we need to:

  • Define a new machine: config.vm.define "cacheserver-1"
  • Add a new group to the group list that contains the new machine. That’s a line like this in the ansible.groups variable: "cacheserver" => ["cacheserver-1"]
  • Update our devbox.yml with a play that configures cacheservers

Ugh. That doesn’t fly with my laziness. It’s easy to imagine people forgetting one of these steps and running into problems. The proper solution is a Vagrant inventory plugin for Ansible that groups machines based on their Vagrant name. Until I write that, however, there’s always the warm embrace of ugly hacks. I changed my playbook to this:

---
- name: Devbox Galore
  hosts: all
  gather_facts: no
  tasks:
    - name: Grouping hosts
      group_by: key="{{ inventory_hostname | regex_replace('-[0-9]+$', '') }}"
      tags: groups

- hosts: appserver
  roles:
    - appserver

- hosts: dbserver
  roles:
    - dbserver

The magic is in the group_by module, which dynamically adds a host to a group. This play takes each machine in our Vagrant env, strips the numbered suffix and treats what’s left as the hostgroup name; after that it adds the machine to that hostgroup. So appserver-1 and appserver-2 will be put in the appserver hostgroup, dbserver-1 and dbserver-2 in the dbserver hostgroup, and so on. Note that we also changed the host pattern in the plays that follow to use the hostgroup instead of the wildcard pattern we had used before. If you followed the best practices, this playbook may start to look like your master site.yml. No need for duplication:

---
- name: Devbox Galore
  hosts: all
  gather_facts: no
  tasks:
    - name: Grouping hosts
      group_by: key="{{ inventory_hostname | regex_replace('-[0-9]+$', '') }}"
      tags: groups

- include: ../site.yml
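To see what the grouping play computes, here is a minimal Ruby sketch of the same suffix-stripping rule. It is only an illustration of the regex, not something Ansible or Vagrant runs; the helper name group_for is made up for this example, and the machine names are the ones from the Vagrantfile above.

```ruby
# Hypothetical helper mirroring regex_replace('-[0-9]+$', '') from the playbook:
# strip a trailing "-<number>" and treat the rest as the hostgroup name.
def group_for(hostname)
  hostname.sub(/-[0-9]+$/, "")
end

# Machine names as defined in the Vagrantfile in this post.
machines = %w[appserver-1 appserver-2 dbserver-1 dbserver-2]

# Collect the machines per derived hostgroup, similar to what the
# "Grouping hosts" task leaves behind in the Ansible inventory.
groups = machines.group_by { |name| group_for(name) }

groups.each { |group, hosts| puts "#{group}: #{hosts.join(', ')}" }
```

A name without a numbered suffix passes through unchanged, so a machine called loadbalancer would simply land in a hostgroup named loadbalancer.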

Now let’s revisit our customization cases:

  • If we want to add another db machine, just add one line to our Vagrantfile: config.vm.define "dbserver-3"
  • Want to remove it? vagrant destroy it and remove that line.
  • Added a new cacheserver hostgroup? You can already model it with Vagrant. Just add a cacheserver-1 machine. No need to update Vagrant’s playbooks, because they’re the same as production’s.
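If even one line per machine feels like too much, the machine names could be generated from a small table of counts. The sketch below is not from my actual setup; MACHINE_COUNTS and machine_names are hypothetical names, and the naming scheme assumes the "-<number>" suffix convention that the grouping play relies on.

```ruby
# Hypothetical per-developer settings: hostgroup name => number of machines.
MACHINE_COUNTS = { "appserver" => 2, "dbserver" => 2 }

# Expand the counts into Vagrant machine names such as "dbserver-1".
def machine_names(counts)
  counts.flat_map do |group, count|
    (1..count).map { |index| "#{group}-#{index}" }
  end
end

# In a Vagrantfile, each generated name would be handed to config.vm.define:
#   machine_names(MACHINE_COUNTS).each { |name| config.vm.define name }
puts machine_names(MACHINE_COUNTS)
```

With something like this, adding a third database server would mean changing a 2 to a 3, and a new machine type would be one more entry in the hash.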

We still have one problem left. In production, we have a load balancer. Sometimes we manage it, sometimes we use ELB or something similar and we don’t manage it. Either way, we have to use a load balancer if we model two or more application servers. We can solve it by adding one machine with nginx, haproxy or whatever, and generate its configuration using Ansible templates. A simple site template for nginx can be:

upstream appservers {
{% for h in groups['appserver'] %}
    server {{ h }};
{% endfor %}
}

server {
    listen 80;

    location / {
        proxy_pass http://appservers/;
    }
}
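To get a feel for what the loop in the upstream block expands to, here is a rough Ruby approximation using ERB instead of Ansible’s Jinja2. It is only an illustration: the groups hash is a stand-in for Ansible’s groups variable, filled with the two app servers from this post, and the real rendering is done by Ansible’s template module.

```ruby
require "erb"

# Stand-in for Ansible's `groups` variable (an assumption for this sketch);
# the host names are the app servers defined earlier in the post.
groups = { "appserver" => %w[appserver-1 appserver-2] }

# ERB version of the upstream section: one "server" line per app server.
# trim_mode "-" keeps the loop tags from leaving blank lines behind.
template = ERB.new(<<~TEMPLATE, trim_mode: "-")
  upstream appservers {
  <%- groups["appserver"].each do |h| -%>
      server <%= h %>;
  <%- end -%>
  }
TEMPLATE

rendered = template.result(binding)
puts rendered
```

Every machine that the grouping play puts in the appserver hostgroup gets its own server line, so a newly added appserver-3 would show up in the load balancer config on the next provision while the nginx template stays the same.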

You can create a role for it and add it to the devbox.yml file. When adding to the Vagrantfile, don’t forget to forward the port:

config.vm.define "loadbalancer" do |loadbalancer|
  loadbalancer.vm.network :forwarded_port, host: 8000, guest: 80
end

Now you can access your not-so-unrealistic model through http://localhost:8000/. If you need a proper hostname (e.g. for multiple sites), use the vagrant-hostmanager plugin:

config.vm.define "loadbalancer" do |loadbalancer|
  loadbalancer.vm.network :forwarded_port, host: 8000, guest: 80
  loadbalancer.hostmanager.aliases = %w(
end

I hope the tricks in this post can help you solve your modeling problems. As you might have noticed, I didn’t really invent anything new here, just tied some useful tools together. Happy hacking!

Hagai Kariti

Hagai works in Operations at BigPanda.