Automated Raspbian Setup for Raspberry Pi

Update (2019): https://github.com/schlomo/rpi-image-creator is the new home of the code.

Recently we got a whole bunch of Raspberry Pi systems at work - the cheapest platform for building dashboards.

Everybody loves those little cute boxes - but nobody wants to deal with the setup and maintenance. The kiosk-browser package is of course also the base of our Pi-based setup. But how does it get onto the Pi?

The solution is a Bash script, rpi-image-creator, available in the ImmobilienScout24 GitHub project. It automates the setup of a Pi with Raspbian by downloading a Raspbian image, customizing it a bit and writing it to an SD card. The reasons for writing my own script were the following features:

  1. It creates the file systems on the SD card instead of dumping a raw image onto the SD card. That way the partitions are all aligned properly and have the right size from the beginning. No later resizing needed
  2. It removes all the stuff that Raspbian runs at the first boot so that the resulting SD cards are production ready (IMHO the goal of any automation)
  3. It does some basic setup, adds my SSH keys, disables the password of the standard user
  4. It configures fully automated updates
  5. It adds our internal DEB repo with the internal GPG key and installs packages from there. In our case those packages do most of the customization work so that this script stays focused on the Pi-specific stuff
As a result, I now have a one-step process that turns a fresh SD card into a ready-to-use system.

The script uses some tricks to do its work:
  • Use qemu-arm-static to chroot into the Raspbian image on my Desktop. There are a whole load of other recipes on the Internet for doing that.
  • Temporarily disable the optimized memcpy and memset functions in /etc/ld.so.preload as otherwise qemu-arm-static will just crash without a clear error message
  • Mount the SD card file systems with aggressive caching to speed up writing to it. The ext4 filesystem is configured with the journal_data mount option to increase the resilience against sudden power losses
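The chroot preparation from the last two bullet points can be sketched like this (a minimal sketch; the function names are mine and the paths are the standard ones, not necessarily what the script uses):

```shell
#!/bin/bash
# qemu-arm-static lets an x86 host chroot into the ARM image; Raspbian's
# /etc/ld.so.preload pulls in ARM-optimized memcpy/memset that make
# qemu-arm-static crash, so we disable it around the chroot.

prepare_chroot() {  # $1 = mount point of the image's root filesystem
    local mnt=$1
    # copy the emulator into the image if qemu-user-static is installed
    if [ -r /usr/bin/qemu-arm-static ]; then
        cp /usr/bin/qemu-arm-static "$mnt/usr/bin/"
    fi
    # comment out the optimized preload libraries
    sed -i 's/^\([^#]\)/#\1/' "$mnt/etc/ld.so.preload"
}

restore_chroot() {  # re-enable the preload before the image boots on a Pi
    sed -i 's/^#//' "$1/etc/ld.so.preload"
}

# Typical use (as root):
#   prepare_chroot /mnt/raspbian
#   chroot /mnt/raspbian apt-get -y install kiosk-browser
#   restore_chroot /mnt/raspbian
```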
The script can be customized through a rpi-image-creator.config file in the current directory:


RSH Pitfall: Remote Exit Code

While writing some test scripts that use rsh (see below about why) to run commands on a remote test server I noticed that rsh and ssh have a significant difference:

ssh reports the remote exit code and rsh does not

As a result, none of my tests actually tested anything; the error condition was never triggered:

My solution is this rsh wrapper:
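In sketch form (a reconstruction of the idea, assuming a marker-based approach; the function name and marker are mine): the remote side appends its exit code behind a marker, and the wrapper parses it back out and returns it.

```shell
#!/bin/bash
# rsh_with_exitcode <host> <command...>
# rsh exits 0 regardless of the remote command's status, so we make the
# remote shell print its exit code behind a marker and recover it locally.
rsh_with_exitcode() {
    local host=$1; shift
    local marker="RSH_RC:" output rc
    # RSH_CMD allows substituting the transport (also handy for testing)
    output=$("${RSH_CMD:-rsh}" "$host" "$*; echo $marker\$?")
    rc=${output##*"$marker"}            # everything after the marker
    printf '%s' "${output%"$marker"*}"  # the real remote output
    return "$rc"
}
```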

The reason for using rsh instead of ssh is very simple: In a fully automated environment it provides the same level of security as ssh without the added trouble of maintaining it:

I need to make sure that ssh continues to work after all the SSH host keys change (e.g. after I reinstall the target). Also, to allow unattended operation I must deploy the SSH private keys, meaning that in the end others could also extract and use them. So I would end up using IP/hostname restrictions on the SSH server side to restrict the use of the private key anyway.

With rsh I don't need to worry about deploying or maintaining keys and just configure the RSH server to allow connections from my build hosts. Reinstalling either the build hosts or the target host does not change the name so that the RSH connection continues to work.

A couple of months ago I also co-authored an article about SSH security in automated environments in the German Linux Magazin called "Automatische Schlüsselverifikation" ("Automatic Key Verification"). The article goes deeper into the challenge of using SSH in a fully automated environment and suggests exploring Kerberized RSH as an easier-to-maintain alternative.


Test Driven Infrastructure

Yesterday I was at the Berlin DevOps meetup and we had a very nice fishbowl about Test Driven Infrastructure (TDI). I used my Lightning Talk from the PyCon in Köln as an introduction to the topic, but quickly realized that the term does not fully explain itself.

Test Driven Development is not a new thing in itself, but it is not yet common to apply it to platform operations. As an old Ops guy I had a lot to learn when I started to work at ImmobilienScout24, which is a real software development company.

The bottom line is really simple:
Untested = Broken
My idea of TDI is to apply most of the basic ideas of TDD to the development process of the software that runs our platform. In other words: the same thing that developers have been doing with their code for a long time. Let's just say that we start to test all code that goes onto a server, no matter who wrote it or what it actually does.

Some specific examples that we did in the last month:

  • A service that mounts SAN LUNs (via file system labels) is tested on a server with a mock SAN LUN. The mock is just a loop-back device with a filesystem on it that has a suitable label.

    The tests check that the service gets installed and activated properly and that starting/stopping the service actually mounts/unmounts the file system.
  • A package containing configuration files for a Squid proxy server is tested on a server to make sure that the rules actually apply correctly. The test server has no Internet access because even a 5xx return code means that the proxy allowed the request to pass.
  • The packages that make up a cluster of subversion servers are put through a large integration test chain on a test cluster to make sure that the automated backup and restore, the cluster failover, the various post-commit hooks and other scripts involved there actually work correctly.
All these cases (and many others) have one thing in common: If the packages pass the tests then they are automatically propagated to our production environment and automatically installed on the respective servers.
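The loop-back mock from the first example can be sketched like this (label, size and paths are illustrative, not the ones we actually use):

```shell
#!/bin/bash
# Build a fake "SAN LUN": a sparse file with an ext4 filesystem whose
# label is what the service under test looks for.
set -e
IMG=/tmp/mock-lun.img
truncate -s 64M "$IMG"                # sparse backing file
mkfs.ext4 -q -F -L SAN_LUN_01 "$IMG"  # -F: operate on a regular file
# Attaching and mounting requires root; the service under test then
# sees a block device carrying the expected label:
#   losetup -fP "$IMG"
#   mount -L SAN_LUN_01 /mnt/test
```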

And exactly that is the main motivation for investing the effort to create those tests for infrastructure components:
  • Continuous Delivery for everything that goes on a server.
  • Trusting the tests means trusting other people - even those who don't know so much about a system - to fix things and add features. If their changes pass the tests, most likely no harm will be done. If something breaks in production, we first fix the test and then the code.
  • Testing each commit and deploying all successful commits to production gives us small changes with very little risk for each change.
  • Having tests means that our systems run much more stably and reliably. Even if no customer is hurt by a short outage of a server, our colleagues really value a stable environment for their own work.
DevOps is not only about Devs doing more Ops work; it is just as much about Ops starting to think like Devs about their work.

Some ideas that helped us to start doing more tests with our infrastructure components:
  • Do unit tests at package build time: Syntax check everything. Run tools on test data and check the result.
  • Do system tests on test servers: Try to mock everything that is not directly relevant to the test subject. Linux is a great tool for mocking stuff. Be creative in using firewall tricks, loop-back connections / mounts, fake services or data etc. to reduce the need for external systems to run a test.
  • Do integration tests by running more complex scenarios on test servers with test data.
The longer a test runs the later in the delivery chain it should come. Beware of making huge test scenarios where you will spend a lot of time debugging problems. Small tests fail faster and tell you immediately where to look for the problem.

I hope to collect more information about TDI and to present my findings at a conference next year.

Update October 2014: I gave an improved talk about DevOps Risk Mitigation at EuroPython 2014 (video).
Update August 2014: I wrote a Linux Magazin article "Testgetrieben".
Update April 2014: I gave a talk about TDI at the Open Source Data Center Conference 2014 (video).


Magic ISO Image Booting with GNU GRUB 2

Recently I needed to prepare a USB thumb drive with several Ubuntu installations. A little research quickly yielded many setup instructions, for example the one from Pendrivesystem.com. I was really surprised at how well this works and wanted to understand it better.

In essence, all recipes rely on GNU GRUB 2 with its loopback feature and on the OS's ability to work off an ISO image. The loopback command mounts a CD or HDD image so that GRUB can load the kernel and initrd directly from the ISO. As a result one can put several ISO images on the boot media without extracting them. The booted OS then also mounts the ISO image and uses it instead of a CD/DVD drive.

So here is my version of the recipe, the USB thumb drive is in /dev/sdc in my examples:

1. partition & format device

I prefer to partition the device with parted because it aligns the partition at 1MB so that it leaves enough space for GRUB to embed itself into the first sectors of the disk:

$ export DEV=/dev/sdc
$ sudo parted -s $DEV mklabel msdos mkpart primary ext2 0% 100% print
Model: Generic Flash Disk (scsi)
Disk /dev/sdc: 1035MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1035MB  1034MB  primary  ext4

$ sudo mkfs.ext4 -qL INSTALL ${DEV}1

Note: It seems that many BIOSes boot from this only if the partition is not marked active!

2. mount & install grub

$ sudo mount -vL INSTALL /mnt
mount: you didn't specify a filesystem type for /dev/sdc1
       I will try type ext4
/dev/sdc1 on /mnt type ext4 (rw)

$ sudo grub-install --boot-directory /mnt /dev/sdc
Installation finished. No error reported.

3. Add some ISOs that support loopback booting

$ sudo wget -P /mnt \
Length: 726663168 (693M) [application/x-iso9660-image]
Saving to: ‘/mnt/xubuntu-12.04.3-desktop-i386.iso’

100%[==============>] 726,663,168 2.71MB/s   in 6m 12s 

4. Create GRUB menu entry

$ sudo tee /mnt/grub/grub.cfg >/dev/null <<'EOF'
menuentry "XUbuntu 12.04.3 i386" {
    set iso=/xubuntu-12.04.3-desktop-i386.iso
    loopback loop $iso
    linux (loop)/casper/vmlinuz boot=casper iso-scan/filename=$iso noeject noprompt splash locale=de_DE bootkbd=de console-setup/layoutcode=de --
    initrd (loop)/casper/initrd.lz
}
EOF

Here the trick is to set a variable (iso) pointing to the ISO file, loopback-mount the image and load kernel/initrd from it. The same ISO path is also passed as a boot parameter to the booting OS.

5. Unmount & try it out

Thermionix published a nice example of a fancy grub.cfg, it contains many examples for loopback booting with various distros and tools.


Setting hostname from DHCP in Debian

For our team monitors that use the Kiosk Browser on a Raspberry Pi I am building a Raspbian wheezy image. One requirement is that the system will configure the hostname from DHCP so that we can use the same image for all devices.

Sadly, setting the hostname from DHCP is not trivial in Debian; here is the result of my research into the topic. I found two things to be essential and learned both of them by analysing dhclient-script.

1. Set the hostname to localhost

The first thing to do is to set the system hostname to localhost:

    # echo localhost >/etc/hostname

2. Workaround for broken dhclient-script

The dhclient-script has (IMHO) a bug: If there is an old DHCP lease with a hostname in the lease database (e.g. in /var/lib/dhcp/dhclient.eth0.leases), then the script will not set the hostname from DHCP even if the system name is still localhost.

To circumvent this bug simply create a dhclient enter hook to unset the old host name variable:

    # echo unset old_host_name \

The result is that now the hostname will be always set from DHCP, even if something went wrong or if there are old lease infos.
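Put together, the hook from step 2 looks like this (the hook file name is my choice; the directory is the standard dhclient hook directory on Debian):

```shell
#!/bin/bash
# Create a dhclient enter hook that forgets the hostname cached from a
# previous lease, so that dhclient-script applies the DHCP-supplied one.
HOOK_DIR=/etc/dhcp/dhclient-enter-hooks.d
mkdir -p "$HOOK_DIR"
cat >"$HOOK_DIR/unset_old_hostname" <<'EOF'
unset old_host_name
EOF
```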


Idea: Electric Family Van

Today I was walking part of the way home. After the third silent taxi (Toyota Prius Hybrid) passed by, I started to think about what would be the ideal electric car for my use case. We are a large household and can't make do with the standard 4- or 5-seat cars that are now being offered as electric or hybrid cars. Since the car makers don't offer what I need, I am posting my idea here.

This is an approximation of our current family car:

It is nice, fairly large, seats 8 people comfortably and even has some extra luggage space in the back. We use it for short trips in the city to go shopping, drive the kids around and do the occasional family Sunday trip to the surrounding countryside.

When we go travelling, we usually need a larger trunk so we have this add-on trunk:

This is a trailer which is about half as long as the car and of the same height and width. It fits everything we need even for a long vacation. Or all our bikes for a shorter one. Or the insanely large family shopping at IKEA. Or ...

So, in our case it almost never happens that we travel far without our trailer attached. So why not use the trailer to house an extra engine that boosts an electric van? It would reduce the trailer's capacity a little and increase the range of the electric family van a lot.

For us the combination of an electric car with an add-on range extender would be the best. In the city we charge the car from a wall socket and don't drive too much per day. For trips we hook up the trailer, which extends both the luggage-carrying capacity and the driving range. With this combination we could still cover a good distance on the batteries, e.g. in an electric-only zone.

At the destination of our trip we could again take off the trailer and do short trips purely electrical.

Dear car makers: Start to produce useful and affordable electric cars and I'll be most happy to buy one. Think about modularization and reusability to go beyond electrifying standard cars.

Update: I just found www.eptender.com who offer a tiny trailer (for small cars) with this idea, which proves that the idea is working :-)


Always good for a surprise: PyConDE 2013

I was again at the PyConDE, this year in Cologne.

As before, the conference was a mix of different types of talks. It seems we had fewer people attending this year compared to last year; at least the crowd in the main hall looked much smaller (see photo).

Many talks were really interesting, but the lightning talks were the real highlight, with lots of funny, useful or astonishing contributions.

The conference included a beginners' programming competition for school students (13-21); the 2 winners showed their projects in the opening keynotes. The project had to do something with Blender and Python, and one of the winners (a 13-year-old boy!) presented a generated animation of a Skat game with a very solid software design. Achieving all that after just 9 months of learning Python was really stunning.

Andreas Schreiber most certainly gave up a lot of personal information in his very practical talk about the "Internet of Things", in which he showed how to connect various devices to the cloud via the Message Queuing Telemetry Transport and the Raspberry Pi. In a live demo he even checked his weight on an Internet-enabled scale that pushes the result to his phone. Besides giving this talk Andreas was the main organizer of the event.

Maybe one of the more useful talks was about using the Cython compiler to speed up Python code. The talk by Stefan Behnel gave a practical introduction to using Cython and showed with a diffsort example that Cython can already bring a 30% speed improvement just by compiling the Python code into C. Manual optimization can yield even bigger speedups. In many cases it is enough to map locally used variables like array indices onto simple C data types (e.g. int or long).

The PyCon is always good for a surprise. Andreas Klostermann showed us in his talk Brain Waves for Hackers how to use Open Source tools to analyze brain waves from common brain wave detectors. The live demo showed how this technology can help to improve mental abilities like concentration and attention. He even programmed his own presentation software to embed the live output from his brain!

Several people on this year's organization team work at the DLR, the national aeronautics and space research centre of the Federal Republic of Germany. As a result there was a whole bunch of really interesting talks from their field of work, e.g. about programming robots, using Python for scientific purposes and other topics. Not widely known: the DLR actually publishes a lot of interesting stuff as open source on http://software.dlr.de and seems to be an open source friendly place.

ImmobilienScout24 also sponsored the event, we put up some recruiting info and brought a big box of giveaways. I tried a new slogan:


My hope is that this slogan will give people something to think about and maybe help them to think more about DevOps-style IT.

I was there with two colleagues. Maximilien Riehl and Marcel Wolf gave a talk about "Continuous building with PyBuilder". PyBuilder is an open source project to simplify automated building and testing of Python software.

Max also gave a lightning talk with the demo video about YADT, our deployment toolkit. Marcel gave a lightning talk about "Agile Software Development for small teams".

My own contribution was a talk about Open Source Sponsoring - how to convince your boss. This is the expanded version (thanks for the full hour!) of my talk at the last LinuxTag, with much more focus on the "boss" part. The main thing is to understand how the boss thinks and what worries him/her.

My lightning talks were about Distributed Monitoring Configuration and Test Driven Infrastructure:

Both will probably return as full conference talks next year.


Video Converter

Modern video cameras produce beautiful videos - but the resulting files take up a lot of space. For an internal video portal I had to automatically convert such camera videos (.m2ts or .mov at up to 30 MBit/s) to "regular" MP4 videos that can be used with web-based video players like Flow Player.

The result is a conversion script that uses HandBrake to convert the videos. I run it as a nightly cron job on the file share with the videos. People drop their camera videos onto the share and my Linux box automatically converts them to reasonable sizes.
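A hedged sketch of the per-file conversion (the quality and bitrate values are my assumptions, not necessarily what the original script uses; the sketch only prints the command when HandBrakeCLI is not installed):

```shell
#!/bin/bash
# Convert one camera video (.m2ts/.mov) to a web-friendly MP4.
convert_video() {
    local in=$1 out="${1%.*}.mp4"
    [ -e "$out" ] && return 0   # skip files that are already converted
    local cmd=(HandBrakeCLI -i "$in" -o "$out"
               -e x264 -q 22 --maxWidth 1280   # 720p-class AVC video
               -E av_aac -B 160)               # AAC audio at 160 kbit/s
    if command -v HandBrakeCLI >/dev/null; then
        "${cmd[@]}"
    else
        echo "would run: ${cmd[*]}"   # dry run when HandBrake is absent
    fi
}

# nightly cron job: convert everything new on the share, e.g.
#   for f in /srv/videos/*.m2ts /srv/videos/*.mov; do convert_video "$f"; done
```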

Example sizes for a 23-minute video:

  1080p Camera Video      3.9 GB   M2TS with AVC Video and AC3 Audio   24 MBit/s
  320p for iPod           141 MB   MP4 with AVC Video and AAC Audio    900 KBit/s
  720p for Small Screen   373 MB   MP4 with AVC Video and AAC Audio    2.2 MBit/s
  1080p for Large Screen  1.2 GB   MP4 with AVC Video and AAC Audio    7.4 MBit/s

I am no expert on H264 tuning, so these values can probably be optimized quite a bit (please provide your feedback as a comment).


Thank You LinuxTag 2013!

LinuxTag is over and I am almost dead.

Besides my own talk about Open Source Sponsoring I also gave the "Data Center Automation with YADT" and "Relax and Recover for UEFI Systems" talks, one each day.

But, despite the smaller venue this year, LinuxTag is still an Open Source highlight for me. Our YADT booth was quite busy, even during the sessions.

It is really good to meet so many people again who come every year. Better yet, each time I also meet new people who make the whole effort worthwhile. I also think that we nicely managed to raise the general awareness of YADT.

So, big thanks to the organization team and see you next year!



How To Create a Debian/Ubuntu Repository for DEB Packages

Initial Setup

Required Debian packages: reprepro
  1. Create a directory for the repository and its configuration:
    $ mkdir -p repo/conf
  2. Create a conf/distributions configuration file like this:
  3. Put my putinrepo script into the repo or next to it:
  4. "Export" the repo to create the metadata files for the empty repo:
    $ repo/putinrepo.sh
    $ tree repo
    ├── putinrepo.sh
    ├── conf
    │   └── distributions
    ├── db
    │   ├── checksums.db
    │   ├── contents.cache.db
    │   ├── packages.db
    │   ├── references.db
    │   ├── release.caches.db
    │   └── version
    └── dists
        ├── precise
        │   ├── main
        │   │   ├── binary-amd64
        │   │   │   ├── Packages
        │   │   │   ├── Packages.gz
        │   │   │   └── Release
        │   │   ├── binary-armel
        │   │   │   ├── Packages
        │   │   │   ├── Packages.gz
        │   │   │   └── Release
        │   │   └── binary-i386
        │   │       ├── Packages
        │   │       ├── Packages.gz
        │   │       └── Release
        │   └── Release
        ├── quantal
        │   ├── main
        │   │   ├── binary-amd64
        │   │   │   ├── Packages
        │   │   │   ├── Packages.gz
        │   │   │   └── Release
        │   │   ├── binary-armel
        │   │   │   ├── Packages
        │   │   │   ├── Packages.gz
        │   │   │   └── Release
        │   │   └── binary-i386
        │   │       ├── Packages
        │   │       ├── Packages.gz
        │   │       └── Release
        │   └── Release
        └── raring
            ├── main
            │   ├── binary-amd64
            │   │   ├── Packages
            │   │   ├── Packages.gz
            │   │   └── Release
            │   ├── binary-armel
            │   │   ├── Packages
            │   │   ├── Packages.gz
            │   │   └── Release
            │   └── binary-i386
            │       ├── Packages
            │       ├── Packages.gz
            │       └── Release
            └── Release
    18 directories, 37 files
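The conf/distributions file from step 2 could look like this (matching the codenames and architectures in the tree above; these are typical minimal entries, not necessarily the original ones):

```
Codename: precise
Components: main
Architectures: amd64 armel i386

Codename: quantal
Components: main
Architectures: amd64 armel i386

Codename: raring
Components: main
Architectures: amd64 armel i386
```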
Now the repo is ready for use. To add the repo to the local system (assuming you use quantal):
$ sudo tee /etc/apt/sources.list.d/my_repo.list <<EOF
deb file:///$(pwd)/repo quantal main
EOF

Adding Packages

putinrepo.sh <deb-files> is used to add packages to a distribution in the repo. It will automatically add the package to all codenames. Set REPREPRO_CODENAMES to an array of codenames if you want to select the targets.

Signing the Repository

apt complains if you don't sign the repo. Luckily that is really simple: Just add SignWith: yes to each section in the conf/distributions file to sign the repo with your default GPG key.

Adding a new Distribution

Adding a new distribution, e.g. wheezy, is simple:
  1. Create a new block in conf/distributions
  2. Run reprepro copymatched wheezy raring \* to copy all packages from raring to wheezy or use one of the other copy* functions to populate the new codename with content
  3. Run putinrepo.sh to export and sign the repo

See Also

These links were useful for me:


Simple Command Line Download Speed Test

I got a server that started to have a bad network connection. To help debug this I needed to collect some data, e.g. by running a download test every 5 minutes.

A quick search revealed nothing suitable, so I devised a simple way to do it with curl and Google Drive.

1. Google Drive Form

I set up a simple Google Drive form that accepts two text values:
Next I needed to find out the HTML form field names of these two fields, which takes a little digging: each input field has a label attached to it, connected via the for attribute of the label.
With this information I can now construct a simple curl script.

2. CURL script to collect some data and post

I want curl to try to download a URL and record the download speed. The -w, -y and -Y options come in handy for this purpose:
  • -w lets me output only the information I really care about (the speed of the download in Bytes per second)
  • -y gives a timeout after which curl will abort the download
  • -Y gives a minimum download speed (in B/s). If after -y seconds this is not achieved then the download is aborted. Choosing a very high number ensures that the download will never run longer than 60 seconds.
curl -w "%{speed_download}" -y 60 -Y 100000000000000 -s -o /dev/null "$1"
runs for at most 60 seconds and prints the average download speed (e.g. 546746.000) to standard output.

The result is posted to Google Drive with another curl call. I use -s -o /dev/null as I don't care about the output of the download or the data post. The complete script looks like this:
This is it! The relevant part is curl -d <form field id>=<form field data> and the right URL of the form. Filling the form fields with a subshell $(...) is just a short form that does not need extra variables.
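A hedged reconstruction of the complete script (the form URL and the entry.* field ids are placeholders to be taken from your own form):

```shell
#!/bin/bash
# Measure download speed with curl and post it to a Google Drive form.
FORM_URL="https://docs.google.com/forms/d/PLACEHOLDER_FORM_ID/formResponse"

measure_speed() {  # prints the average download speed in bytes/s
    curl -w "%{speed_download}" -y 60 -Y 100000000000000 \
         -s -o /dev/null "$1"
}

post_result() {    # the entry.* ids are placeholders for the real field ids
    curl -s -o /dev/null "$FORM_URL" \
         -d "entry.1111111=$(hostname)" \
         -d "entry.2222222=$1"
}

if [ $# -eq 1 ]; then   # usage: speedtest.sh <download-url>
    post_result "$(measure_speed "$1")"
fi
```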

Call the script from a cron job and watch how data starts collecting in your Google Drive. I can even run it on different systems (giving me different probes to provide data) and later aggregate or filter the data.

I don't need to send a timestamp as this is added automatically by Google Drive Forms.

3. Google Drive spreadsheet for data analysis

In the Google Drive form one must select a destination spreadsheet for the results. In that spreadsheet I created a chart, marked the two columns timestamp and speed and set the chart type to Trend/Timeline. The result already looks very useful:
The chart has its own publish button which gives a JavaScript block. In this block I only had to adjust the width: and height: settings to format the chart and replace the range with A:B so that it would take all the data.

Note: The spreadsheet should be set to a US or UK locale so that the XXX.000 numbers created by curl will be parsed correctly. For other locales you might have to strip the .000 before posting the data.

4. Doing this for multiple targets

The whole point of this exercise was to monitor the service quality of a VPS hoster. As the service quality was still below my needs, I got another VPS at another hoster, so I had to set up the same monitoring there as well.

It turns out that this is really simple. I can just copy the form in Google Drive and get a new form that uses the same keys for the input fields! That makes it really easy to customize my little script like this:
And the results of the other hoster look like this:

As we can see here, the other hoster is doing much better :-)