Like at most other places, we at Chartbeat strive to have the development environment resemble the production environment as much as possible. It should also be easy to maintain and keep up to date. So we set out to implement a “Development Virtual Machine” (Dev VM), with the following requirements:
same OS as production
same software as production
same networking access as inside EC2
easy to set up for new devs
easy way to reset to “known good” state
easy to keep up-to-date as the production environment changes
serve as the day-to-day development environment for all development (ops, backend, and frontend)
be able to clone/emulate all server/services types in our production environment
This blog post describes our setup, which gives us an easily maintainable environment and makes it possible for new developers to push code on their first day.
Our production infrastructure is all Ubuntu Linux 10.04 running on Amazon EC2. Everything is managed by Puppet, and all Puppet manifests are kept in a git repository. So to get the same configuration on a local VM, it is basically just a question of:
booting an Ubuntu 10.04 image in the VM
getting the Puppet repo checked out in the VM
starting a puppet server serving from that repo
assigning the right puppet type to the VM hostname in puppet
We could also be running masterless puppet, but since we are using a puppet server in production, we want to do the same inside the VM.
Code/Repo Checkout
To get access to our repo inside the VM, the natural thing would be to check out the repo inside the VM. There are some downsides to that, though. First, it means that ssh key/git access would have to be set up for each developer inside the VM. Second, it would mean that all code editing would need to happen inside the VM (or the VM file system would have to be exposed to the host). Thus, we opted for setting up a VirtualBox shared folder and mounting it in the standard user’s home directory inside the VM. This makes setup much easier, and also allows each developer to keep using their existing editor and tools on the host machine.
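As a rough illustration of the shared folder approach, the sketch below builds the two commands involved: one run on the host to register the folder, and one run as root in the guest to mount it. It is not taken from our scripts. The VM name, share name, paths, and uid/gid values are hypothetical placeholders, and the guest is assumed to have the VirtualBox Guest Additions installed.

```python
import shlex


def shared_folder_commands(vm, share, host_path, guest_mount, uid=1000, gid=1000):
    """Return (host_cmd, guest_cmd) as argv lists.

    host_cmd registers a permanent shared folder on the VM definition.
    guest_cmd mounts that share inside the guest, owned by uid/gid
    (assumed here to be the standard user's ids).
    """
    host_cmd = [
        "VBoxManage", "sharedfolder", "add", vm,
        "--name", share,
        "--hostpath", host_path,
    ]
    guest_cmd = [
        "mount", "-t", "vboxsf",
        "-o", f"uid={uid},gid={gid}",
        share, guest_mount,
    ]
    return host_cmd, guest_cmd


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    host_cmd, guest_cmd = shared_folder_commands(
        "devvm", "repo", "/Users/alice/repo", "/home/dev/repo"
    )
    print(shlex.join(host_cmd))   # run on the host
    print(shlex.join(guest_cmd))  # run as root inside the guest
```

Returning argv lists rather than running anything keeps the sketch safe to execute on a machine without VirtualBox.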
For networking, it gets a little bit more interesting. The reason is that we want to be able to:
1) access the VM locally from our internal network (so everybody internally can hit your server/API/etc)
2) give it access to our internal Amazon network
1) is pretty easy to solve, as it is just a question of making the network interface a “bridged network”, which makes the VM show up like any other machine on the network. Using avahi/bonjour, it will also automatically be accessible through its .local hostname.
2) is a question of setting up a VPN. Ideally, though, we don’t want to run the VPN software inside each VM, as a) it goes against the idea of keeping the machine as much like a production machine as possible, and b) it means setting up the VPN config and connection inside each VM too. After some pondering, we figured out that we could use the host machine’s VPN with a little “trick”: setting up a secondary network interface on the VM that is “NATed”. Traffic that goes through that interface is treated just like traffic from the host itself, so it’ll go through any VPN solution that the host has. Then it was just a question of routing all Amazon traffic (i.e. 10.0.0.0/8) through that network interface. Naturally, this only works if your own local network is not in 10.0.0.0/8.
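The two-interface layout and the caveat about the local network can be sketched as follows. This is an illustration under assumptions, not our actual configuration: the VM name, the bridged adapter name, the guest device name, and the gateway address are placeholders. The gateway in particular should be whatever the guest is given on its NAT interface.

```python
import ipaddress

# The range routed through the NATed interface, as described above.
AMAZON_NET = ipaddress.ip_network("10.0.0.0/8")


def nat_trick_is_usable(local_network):
    """True if the local LAN does not overlap the routed Amazon range."""
    return not ipaddress.ip_network(local_network).overlaps(AMAZON_NET)


def nic_commands(vm, bridge_adapter):
    """VBoxManage argv: NIC 1 bridged to a host adapter, NIC 2 NATed.

    modifyvm only works while the VM is powered off.
    """
    return [
        "VBoxManage", "modifyvm", vm,
        "--nic1", "bridged", "--bridgeadapter1", bridge_adapter,
        "--nic2", "nat",
    ]


def guest_route_command(gateway, device):
    """Guest-side argv that sends Amazon traffic via the NAT interface."""
    return ["ip", "route", "add", str(AMAZON_NET), "via", gateway, "dev", device]


if __name__ == "__main__":
    print(nat_trick_is_usable("192.168.1.0/24"))  # a LAN outside 10.0.0.0/8
    print(nat_trick_is_usable("10.1.0.0/16"))     # a LAN inside 10.0.0.0/8
    print(nic_commands("devvm", "en0"))           # hypothetical names
    print(guest_route_command("10.0.3.2", "eth1"))  # hypothetical gateway/device
```

The overlap check is the part worth automating: a setup script can refuse to continue on a 10.x local network instead of leaving the developer with confusing routing behavior.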
Our final setup ended up being a VirtualBox Appliance (basically an exported disk image), which is a plain Ubuntu 10.04 install with the above things added. The steps for setting up a new user are thus:
1) set up the VPN
2) check out the git repo
3) download the VirtualBox appliance
4) run the ‘vmnew script’ (clones the template image into a new VM, and changes the hostname for the new VM)
5) set the puppet node type for their VM to ‘localdev’ in their repo
6) ssh into the VM
Voila, the developer now has a fully working machine with all dependencies and tools for developing.
(Step 4 is not strictly necessary, but it allows the developer to set up multiple VMs on the same machine, or to easily start over completely from scratch.)
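To give a feel for what the cloning step involves, here is a minimal sketch of cloning a template VM under a new name with VBoxManage. It is not the vmnew script itself (the real scripts are in the repository linked further down), and it does not cover changing the hostname inside the guest. The template and VM names are hypothetical, and the hostname pattern is simply a conservative assumption.

```python
import re

# Conservative hostname check: lowercase letters, digits, inner hyphens.
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def clone_commands(template, new_name):
    """Return the VBoxManage argv lists that clone and boot a new VM.

    Raises ValueError if new_name would not make a valid hostname,
    since the VM name is assumed to double as the guest hostname.
    """
    if not HOSTNAME_RE.match(new_name):
        raise ValueError(f"not a valid hostname: {new_name!r}")
    return [
        ["VBoxManage", "clonevm", template, "--name", new_name, "--register"],
        ["VBoxManage", "startvm", new_name, "--type", "headless"],
    ]


if __name__ == "__main__":
    # Hypothetical template and VM names.
    for cmd in clone_commands("devvm-template", "alice-dev2"):
        print(" ".join(cmd))
```

Keeping the template untouched and always working in a clone is what makes starting over cheap: delete the clone and run the same two commands again.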
What is Not Working
Overall the Dev VM is working really well for us, but we have these three known bugs:
The i/o throughput of the shared folder file system inside the VM seems incredibly low. Mass-erasing files, for example, or doing almost any git operation, is close to impossible because of this.
Our host (OS X) file system is case insensitive, so the shared folder exposed in the VM exhibits the same behavior. This is definitely not what you normally see on Linux, and it has caused issues at least once. It is unclear what the easiest fix is.
There have been some weird networking issues occasionally, where it seems like one or both of the guest network interfaces loses contact with the network. This can sometimes be solved by doing a reconnect of the interface (VBoxManage controlvm $VM setlinkstate1 off && sleep 10 && VBoxManage controlvm $VM setlinkstate1 on). It’s slightly annoying, but it seems like it gets better with each VirtualBox release.
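The reconnect one-liner above can be wrapped in a small helper so that it works for either interface. This is a sketch rather than one of our scripts. The `run` and `sleep` parameters are illustration-only hooks that let the function be exercised without VirtualBox installed.

```python
import subprocess
import time


def relink(vm, nic=1, delay=10, run=subprocess.check_call, sleep=time.sleep):
    """Take a guest NIC's virtual cable out and plug it back in.

    Mirrors: VBoxManage controlvm $VM setlinkstateN off && sleep 10 && ... on
    check_call raises on a non-zero exit, so, as with '&&', the link is
    not switched back on if switching it off failed.
    """
    base = ["VBoxManage", "controlvm", vm, f"setlinkstate{nic}"]
    run(base + ["off"])
    sleep(delay)
    run(base + ["on"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    relink("devvm", nic=2, delay=0, run=print, sleep=lambda seconds: None)
```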
All the scripts we use can be found here: https://github.com/chartbeat/vmutils
For VM software we started with VirtualBox, for no other reason than that it is free. We could just as well have used VMware, but we have seen no good reason to switch. Unless, of course, it could solve some of the above issues…
Why did we go ahead and do all this, and not just use Vagrant? Good question. We actually looked at it at the start, but there were a few magic things happening, like the automatic ssh key setup and the chosen user name, that conflicted with our production environment. We could probably have extended or tweaked Vagrant to suit our needs, but it didn’t seem worth the effort at the time. We should probably revisit that decision at some point. Mitchell Hashimoto has done a lot of work on it since we first looked at it, and it almost always pays off to leverage a community-backed project.
Instead of using a local VM, we could also just have spun up development EC2 instances and gotten almost the same functionality. A few things kept us from that:
snapshotting (and restoring) a VM disk state is a lot easier than managing EBS volumes
network connectivity to EC2 is not always optimal from a plane/train/rooftop
latency to EC2 can occasionally be annoying
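For comparison, taking and restoring a local VM snapshot comes down to a handful of VBoxManage calls. The sketch below only builds the commands. The VM name and the snapshot name are hypothetical, and the restore sequence assumes the VM is running and can simply be powered off first.

```python
def snapshot_commands(vm, name="known-good"):
    """Return (take, restore) where take is one argv list and restore
    is the sequence of argv lists that resets the VM to the snapshot."""
    take = ["VBoxManage", "snapshot", vm, "take", name]
    restore = [
        # A snapshot can only be restored while the VM is not running.
        ["VBoxManage", "controlvm", vm, "poweroff"],
        ["VBoxManage", "snapshot", vm, "restore", name],
        ["VBoxManage", "startvm", vm, "--type", "headless"],
    ]
    return take, restore


if __name__ == "__main__":
    take, restore = snapshot_commands("devvm")
    print(" ".join(take))
    for cmd in restore:
        print(" ".join(cmd))
```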
We’re pretty happy with the current setup, but of course nothing is perfect. The two main things that would be nice to fix are:
1) Making the initial setup a bit quicker. The download of the VM image, the cloning, and the initial puppet run take a while.
2) Fixing some of the issues that are, potentially, caused by VirtualBox. The most annoying one is the slow i/o. It would be nice to be able to run git on the shared git folder inside the VM.
The dev vm makes it possible to give new employees a clean machine, and have them up and running and pushing code on the first day (really!). It also removes the need for manual fiddling with packages and config files to keep your environment up to date (i.e. you just need to update your git repo and run puppet). It has also made it possible for us to test puppet manifest changes in a sane way, whereas before it was a question of committing and praying it worked (of course, usually it didn’t, and a couple of follow-up commits ensued). Plus so many more benefits…
It is hard to imagine we ever developed without the dev vm.
(Oh, and by the way, we are hiring)