Automated Raspbian Setup for Raspberry Pi

Update (2019): https://github.com/schlomo/rpi-image-creator is the new home of the code.

Recently we got a whole bunch of Raspberry Pi systems at work - the cheapest platform for building Dashboards.

Everybody loves those little cute boxes - but nobody wants to deal with the setup and maintenance. The kiosk-browser package is of course also the base of our Pi-based setup. But how does it get onto the Pi?

The solution is a Bash script: rpi-image-creator, available on the ImmobilienScout24 GitHub project. It automates the setup of a Pi with Raspbian by downloading a Raspbian image, customizing it a bit and writing it to an SD card. I wrote my own script because I wanted the following features:

  1. It creates the file systems on the SD card instead of dumping a raw image onto the SD card. That way the partitions are all aligned properly and have the right size from the beginning. No later resizing is needed.
  2. It removes all the stuff that Raspbian runs at the first boot so that the resulting SD cards are production ready (IMHO the goal of any automation)
  3. It does some basic setup, adds my SSH keys, disables the password of the standard user
  4. It configures fully automated updates
  5. It adds our internal DEB repo with the internal GPG key and installs packages from there. In our case those packages do most of the customization work so that this script stays focused on the Pi-specific stuff
As a result I now have a one-step process that turns a fresh SD card into a ready system.

The script uses some tricks to do its work:
  • Use qemu-arm-static to chroot into the Raspbian image on my Desktop. There are a whole load of other recipes on the Internet for doing that.
  • Temporarily disable the optimized memcpy and memset functions in /etc/ld.so.preload as otherwise qemu-arm-static will just crash without a clear error message
  • Mount the SD card file systems with aggressive caching to speed up writing to it. The ext4 filesystem is configured with the journal_data mount option to increase the resilience against sudden power losses
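The first two tricks can be sketched roughly as follows. This is a minimal illustration and not the code from rpi-image-creator: the function names, the #QEMU# marker and the location /usr/bin/qemu-arm-static (where Debian-style qemu-user-static packages usually install the binary) are assumptions, and the image root file system is expected to be mounted already.

```shell
# Hypothetical helpers, not taken from rpi-image-creator.
# $1 is always the directory where the Raspbian root file system is mounted.

disable_preload() {
    # Comment out every active line in /etc/ld.so.preload of the image
    # so that the optimized memcpy/memset library is not preloaded.
    sed -i 's/^\([^#]\)/#QEMU#\1/' "$1/etc/ld.so.preload"
}

restore_preload() {
    # Remove the marker again so that the Pi uses the original file.
    sed -i 's/^#QEMU#//' "$1/etc/ld.so.preload"
}

run_in_image() {
    # Run a command inside the image via chroot and qemu-arm-static (needs root).
    local root="$1" rc
    shift
    # Assumed path of the static emulator on the desktop system.
    cp /usr/bin/qemu-arm-static "$root/usr/bin/"
    disable_preload "$root"
    chroot "$root" "$@"
    rc=$?
    restore_preload "$root"
    rm -f "$root/usr/bin/qemu-arm-static"
    return "$rc"
}
```

With the image mounted on, say, /mnt/pi, a call like `run_in_image /mnt/pi apt-get update` would then run the ARM binary on the desktop and leave /etc/ld.so.preload as it was afterwards.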
The script can be customized through a rpi-image-creator.config file in the current directory:


RSH Pitfall: Remote Exit Code

While writing some test scripts that use rsh (see below about why) to run commands on a remote test server I noticed that rsh and ssh have a significant difference:

ssh reports the remote exit code and rsh does not

As a result my tests did not test anything, because the error condition was never triggered:

My solution is this rsh wrapper:
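The original listing is not reproduced here. A minimal sketch of how such a wrapper can work, under a few assumptions: the remote login shell is sh-compatible, the remote output ends with a newline, and the marker string does not occur in the regular output. The names rsh_run, RSH and __REMOTE_EXIT__ are made up for this illustration.

```shell
# Hypothetical wrapper, not the original script.
# RSH can be overridden, e.g. to use a different remote shell client.
RSH="${RSH:-rsh}"

rsh_run() {
    # Usage: rsh_run HOST COMMAND...
    local host="$1" marker="__REMOTE_EXIT__" output rc
    shift
    # Let the remote side print its exit code as the last line of the output.
    output=$("$RSH" "$host" "( $* ); echo \"$marker \$?\"")
    # Extract the exit code from the marker line.
    rc=$(printf '%s\n' "$output" | sed -n "s/^$marker \([0-9][0-9]*\)\$/\1/p" | tail -n 1)
    if [ -z "$rc" ]; then
        echo "rsh_run: no exit code received from $host" >&2
        return 255
    fi
    # Pass on the regular output without the marker line.
    printf '%s\n' "$output" | sed "/^$marker [0-9][0-9]*\$/d"
    return "$rc"
}
```

A test can then rely on the exit code again, for example `rsh_run testserver test -f /etc/passwd || echo "check failed"`. The sketch buffers the whole output before printing it, which is fine for short test commands but not for long-running ones.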

The reason for using rsh instead of ssh is very simple: In a fully automated environment it provides the same level of security as ssh without the added trouble of maintaining it:

With ssh I need to make sure that it continues to work after the SSH host keys change (e.g. after I reinstall the target). Also, to allow unattended operation I must deploy the SSH private keys, which means that others could extract them and use them as well. In the end I would be using IP/hostname restrictions on the SSH server side to restrict the use of the private key.

With rsh I don't need to worry about deploying or maintaining keys and just configure the RSH server to allow connections from my build hosts. Reinstalling either the build hosts or the target host does not change the name so that the RSH connection continues to work.

A couple of months ago I also co-authored an article about SSH security in automated environments in the German Linux Magazin called "Automatische Schlüsselverifikation". The article goes deeper into the challenge of using SSH in a fully automated environment and suggests exploring Kerberized RSH as an easier-to-maintain alternative.


Test Driven Infrastructure

Yesterday I was at the Berlin DevOps meetup and we had a very nice fishbowl about Test Driven Infrastructure (TDI). I used my Lightning Talk from the PyCon in Köln as an introduction to the topic, but quickly realized that the term does not fully explain itself.

Test Driven Development in itself is not a new thing, but it is not yet common to apply it to platform operations. As an old Ops guy I had a lot to learn when I started to work at ImmobilienScout24, which is a real software development company.

The bottom line is really simple:
Untested = Broken
My idea of TDI is to apply most of the basic ideas of TDD to the development process of the software that runs our platform as well. Again, this is the same thing that the developers have been doing with their code for a long time. Let's just say that we start to test all code that goes on a server, no matter who wrote it or what it actually does.

Some specific examples that we did in the last month:

  • A service that mounts SAN LUNs (via file system labels) is tested on a server with a mock SAN LUN. The mock is just a loop-back device with a filesystem on it that has a suitable label.

    The tests check that the service gets installed and activated properly and that starting/stopping the service actually mounts/umounts the file system.
  • A package containing configuration files for a Squid proxy server is tested on a server to make sure that the rules actually apply correctly. The test server has no Internet access because even a 5xx return code means that the proxy allowed the request to pass.
  • The packages that make up a cluster of subversion servers are put through a large integration test chain on a test cluster to make sure that the automated backup and restore, the cluster failover, the various post-commit hooks and other scripts involved there actually work correctly.
All these cases (and many others) have one thing in common: If the packages pass the tests then they are automatically propagated to our production environment and automatically installed on the respective servers.
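To illustrate the first example, a mock SAN LUN can be built from an image file. This is a sketch with made-up names (create_mock_lun, the TESTDATA label, the file path) and not our actual test code. It assumes e2fsprogs and util-linux are installed and uses ext4 only as an example file system.

```shell
# Hypothetical helper: create an image file that holds a labeled ext4 file system.
create_mock_lun() {
    local image="$1" label="$2" size_mb="${3:-64}"
    # Create an empty image file of the requested size.
    dd if=/dev/zero of="$image" bs=1M count="$size_mb" 2>/dev/null || return 1
    # -F allows a regular file as target, -L sets the file system label.
    mkfs.ext4 -q -F -L "$label" "$image"
}

# Usage on the test server (as root):
#   create_mock_lun /var/tmp/mock-lun.img TESTDATA
#   losetup --find --show /var/tmp/mock-lun.img   # attaches the image and prints the loop device
# After that the service under test should be able to find the file system by its label.
```

When the test is done, `losetup -d` with the printed loop device detaches the mock again.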

And exactly that is the main motivation for investing the effort to create those tests for infrastructure components:
  • Continuous Delivery for everything that goes on a server.
  • Trusting the tests means trusting other people - even those who don't know so much about a system - to fix things and add features. If their changes pass the tests, most likely no harm will be done. If something breaks in production, we first fix the test and then the code.
  • Testing each commit and deploying all successful commits to production gives us small changes with very little risk for each change.
  • Having tests means that our systems run much more stably and reliably. Even if no customer is hurt by a short outage of a server, our colleagues really value a stable environment for their own work.
DevOps is not only about Devs doing more Ops work, it is much more about Ops starting to think like Devs about their work.

Some ideas that helped us to start doing more tests with our infrastructure components:
  • Do unit tests at package build time: Syntax check everything. Run tools on test data and check the result.
  • Do system tests on test servers: Try to mock everything that is not directly relevant to the test subject. Linux is a great tool for mocking stuff. Be creative in using firewall tricks, loop-back connections / mounts, fake services or data etc. to reduce the need for external systems to run a test.
  • Do integration tests by running more complex scenarios on test servers with test data.
The longer a test runs the later in the delivery chain it should come. Beware of making huge test scenarios where you will spend a lot of time debugging problems. Small tests fail faster and tell you immediately where to look for the problem.
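As an example of a small and fast test at package build time, a syntax check for shell scripts could look like the following sketch. The function name and the choice to check only *.sh files directly in one directory are assumptions for this illustration. The interpreter defaults to bash and can be changed with the second argument.

```shell
# Hypothetical build-time check: fail if any *.sh file in a directory has a syntax error.
check_shell_syntax() {
    local dir="$1" shell="${2:-bash}" failed=0 script
    for script in "$dir"/*.sh; do
        # Skip the unexpanded pattern if the directory contains no scripts.
        [ -e "$script" ] || continue
        # -n parses the script without executing it.
        if ! "$shell" -n "$script"; then
            echo "Syntax error in $script" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Called as `check_shell_syntax src/scripts || exit 1` in the build script, a broken script stops the package build before it ever reaches a test server.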

I hope to collect more information about TDI and to present my findings at a conference next year.

Update October 2014: I gave an improved talk about DevOps Risk Mitigation at EuroPython 2014 (video).
Update August 2014: I wrote a Linux Magazin article "Testgetrieben".
Update April 2014: I gave a talk about TDI at the Open Source Data Center Conference 2014 (video).