2015-03-25

Exploring Academia

Last week I attended the Multikonferenz Software Engineering & Management 2015 in Dresden hosted by the Gesellschaft für Informatik:

My topic was Test Driven Development, but I had to rework my original talk to fit into 20 minutes and to be much less technical. As a result I created a completely new fast paced talk which draws a story line from DevOps over Test Driven Infrastructure Development into Risk Mitigation:

The conference is very different from the tech conferences I usually attend. First, I really was the only person in a T-Shirt :-/. Second, I apparently was invited as the "practitioner" while everybody else was there to talk about academic research, mostly in the form of a bachelor or master thesis.

As interesting as the topics were, there was hardly anything even remotely related to my "practical" work :-(

As much as I find it worthwhile to bring the two worlds (academic and practical) closer together, this conference still has some way to go if it wants to achieve that goal. Maybe it would help to team up with an established tech conference and simply hold the two conferences at the same time and place so that people can freely wander between the worlds.

I also had some spare time and visited the Gläserne Manufaktur where VW assembles Phaeton and Bentley cars. They take pride in the fact that 95% of the work is done manually, but sadly nobody asked me about my T-Shirt:
I am squinting so much because the sun was really bright that day. In the background is an XL1, a car that consumes less than 1 ℓ of fuel per 100 km.

2015-03-17

A Nice Day at CeBIT 2015

After many years of abstinence I went back to visit the CeBIT today. And actually enjoyed it a lot. It is funny to see how everything is new but nothing has changed. From the oversized booths of the big players like IBM and Microsoft to the tiny stalls of Asian bank note counting machine vendors. From the large and somewhat empty government-IT-oriented booths to meeting old acquaintances and friends.
But there are also several notable new things to see: for example, Huawei presents itself as an important global player with a huge booth next to IBM.
I managed to visit only a third of the exhibition, but it was more than I could absorb in a single day. Nevertheless, my mission was accomplished by giving a talk about “Open Source, Agile and DevOps at ImmobilienScout24”. The talk is much more high-level than my usual talks and tries to give a walk-through overview. About 60-80 people attended my talk, and the questions showed that the topic was relevant for the audience. So maybe giving management-level talks is the right thing to do at CeBIT.
Meeting people is the other thing that still works really well at the CeBIT. Without a prior appointment I was able to meet with Jürgen Seeger from iX magazine about my next ideas for articles and with people from SEP about better integrating their backup tool SESAM and Relax-and-Recover.
The new CeBIT concept of focusing on the professional audience seems to work: I noticed far fewer bag-toting, swag-hunting people than last time. All in all I think that attending for one day is worth the trouble and enough to cover the important meetings.

Random Impressions

IBM's Watson wants to be a physician.

Video conferencing with life-sized counterparts. 4K really does make a difference!

Why buy 4 screens if you can buy 1 (QM85D)? Samsung has a lot more to offer than just phones.

Definitely my next TV (QM105D). 105", 21:9 ratio and 2.5 meters wide.

Another multimedia vendor? WRONG! This is "just" a storage box!

Though it seems like storage is no longer the main focus for QNAP.

Cyber crime is big - cyber police still small

Virtual rollercoaster at Heise - barf bags not included.

Deutsche Telekom always has a big booth and represents the top of German IT development. To underline the "Internet of Things" a bunch of robot arms was dancing with magenta umbrellas.

Dropbox comes to CeBIT in an attempt to win business customers. The data is still hosted in the USA, but the coffee was great.

And finally, even the weather was nice today.

2015-03-05

Injecting a Layer of Automation

Relax and Recover is the leading Open Source solution for automated Linux disaster recovery. It was once the pride of my work and is now totally irrelevant at my current job at ImmobilienScout24.

Why? Simply because at ImmobilienScout24 we invest our time into automating the setup of our servers instead of investing into the ability to automatically recover a manually configured system. Sounds simple, but it is actually a large amount of work and not done in a few days. However, if you persist and manage to achieve the goal, the rewards are much bigger: thanks to our automation we don't need to fear trouble because we can be sure to reinstall our servers in a very short time.

The following idea can help to bridge the gap if you cannot simply automate all your systems but still want to have a simplified backup and disaster recovery solution:

Inject a layer of automation under the running system.

The provisioning and configuration of the automation layer should of course be fully automated. The actual system stays manually configured but runs inside a Linux container (LXC, Docker, plain chroot ...) and stays as it was before. The resource overhead introduced by the Linux container and an additional SSH daemon is negligible for most setups.
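To make the idea more concrete, here is a minimal sketch using a plain chroot (assuming the unchanged legacy root filesystem lives in /srv/legacy-root and that reaching its SSH daemon on port 2222 is acceptable; LXC or Docker would be the more complete variants):

# minimal sketch, not a complete solution: run the unchanged legacy system
# inside a chroot on top of the automated host
for fs in proc sys dev dev/pts; do
    mount --bind /$fs /srv/legacy-root/$fs   # kernel interfaces the legacy system expects
done
chroot /srv/legacy-root /usr/sbin/sshd -p 2222   # keep the legacy system reachable via SSH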

The problem of backup and disaster recovery for systems is converted into a problem of backup and restore for data, which is fundamentally simpler because one can always restore into the well-known environment of the Linux container. The ability to run the backup in the automation layer also allows using smarter backup technologies like LVM or file system snapshots with much less effort.
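For example, run from the automation layer, a snapshot-based data backup could look roughly like this sketch (the volume group, volume and target names are placeholders):

# take a consistent point-in-time copy of the container's data volume
lvcreate --size 5G --snapshot --name legacy-snap /dev/vg0/legacy-data
mkdir -p /mnt/snap
mount -o ro /dev/vg0/legacy-snap /mnt/snap
tar -czf /backup/legacy-data-$(date +%F).tar.gz -C /mnt/snap .   # ship this archive off-host
umount /mnt/snap
lvremove -f /dev/vg0/legacy-snap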

I don't mean to belittle the effort that it takes to build a proper backup and restore solution, especially for servers that have a high change rate in their persistent data. This holds true for any database like MySQL and is even more difficult for distributed database systems like MongoDB. The challenge of creating a robust backup and restore solution stays the same regardless of the disaster recovery question. Disaster recovery is always an on-top effort that complements the regular backup system.

The benefit of this suggestion lies in the fact that the effort for disaster recovery can be replaced by an effort invested into systems automation, which will yield much more value: a typical admin will use systems automation much more often than disaster recovery. Another way to see this difference is that disaster recovery optimizes the past while systems automation optimizes the future.

The automation layer can also be based on one of the minimal operating systems like CoreOS, Snappy Ubuntu Core or Red Hat Atomic Host. In that case new services can be established fully automated as Docker images, opening up a natural road to migrating the platform to full automation while gracefully handling the manually set up legacy systems without disturbing the idea of an automated platform.

If you already have a fully automated platform but suffer from a few manually operated legacy systems then this approach can also serve as a migration strategy to encapsulate those legacy systems in order to keep them running as-is.

Update 12.03.2015: Added a short introduction to Relax and Recover and explained better why it pays more to invest into automation instead of disaster recovery.

2015-02-27

mod_remoteip backport for Apache HTTPD 2.2

Apache HTTPD 2.4 has a very useful new feature for large deployments: replacing the remote IP of a request with an IP taken from a request header, e.g. set by a load balancer or reverse proxy. Users of Apache HTTPD 2.2 as found on RHEL6 can now use the backport found at https://github.com/ImmobilienScout24/mod_remoteip-httpd22.

I pulled this backport together from various sources found on the Internet and "it seems to work". Working with C code (which I had not done for 14 years!) taught me again the value of test driven development and modern programming languages. Unfortunately I still can't explain a change like this without a lot of thinking:
You can easily build an RPM from the code on GitHub. The commit history shows the steps I had to undertake to get there. Configuration is as simple as this:


LoadModule remoteip_module modules/mod_remoteip.so
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 10.100.15.33

with the result that a reverse proxy on 10.100.15.33 can set the X-Forwarded-For header. Apache configuration like Allow from can then use the real client IP even though the client does not talk directly to the web server.
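For illustration, an access rule based on the restored client IP could then look like this (a hedged example; the /admin location and the 10.200.0.0/16 client network are assumptions and not part of the backport):

<Location /admin>
    Order deny,allow
    Deny from all
    Allow from 10.200.0.0/16
</Location>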

2015-02-18

Simplified DEB Repository



Two years ago I wrote about creating a repository for DEB packages with the help of reprepro. Since then I have suffered from the complexity of the process and the cumbersome reprepro usage:
  • Complicated to add support for a new Ubuntu version, which happens every 6 months
  • Need to specifically handle new architectures
  • I actually don't need most of the features that reprepro supports, e.g. managing multiple repos in one or package staging
This week I realized that there is a much simpler solution for my needs: apt-ftparchive. This tool creates a trivial repo with just enough information to make apt happy. For my purposes that is enough. All I actually want from a DEB repo is:
  • Work well with 50-500 packages
  • Easy to add new Debian/Ubuntu/Raspbian versions or architectures
  • Simple enough for me to understand
  • GPG signatures
It turns out that the trivial repo format is enough for all of that. It even makes adding new distro versions simpler because the repo does not contain any information about distro versions. That means that the repo does not change for new distros and that I don't need to change the sources.list line after upgrades.

The following script maintains the repo. It copies DEB packages given as command line parameters into the repository or simply recreates the metadata. The script uses gpg to sign the repo with your default GPG key. If you want to maintain a GPG key as part of the repo then you can create a key sub-directory which will be used as the GPG home directory.
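The original script was embedded here as a gist. As a rough stand-in, a minimal sketch of the same approach could look like this (error handling and the optional key sub-directory support are omitted; run it inside the repository directory):

#!/bin/bash
# maintain a trivial DEB repository with apt-ftparchive (minimal sketch)
set -e

# copy DEB packages given as command line parameters into the repo
if [ $# -gt 0 ]; then
    cp -v "$@" .
fi

# recreate the trivial repo metadata
apt-ftparchive packages . > Packages
gzip -9c Packages > Packages.gz
apt-ftparchive -c config_for_release release . > Release

# sign with the default GPG key (detached signature for older apt, inline for newer)
gpg --armor --detach-sign --yes --output Release.gpg Release
gpg --clearsign --yes --output InRelease Release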


The script expects a config_for_release file in the repo that contains some extra information:
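The original file was embedded here. It is an apt-ftparchive configuration that fills in the Release metadata; a hedged example with placeholder values:

APT::FTPArchive::Release::Origin "example";
APT::FTPArchive::Release::Label "My Packages";
APT::FTPArchive::Release::Suite "stable";
APT::FTPArchive::Release::Description "Simple trivial DEB repository";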

To add this repo to your system add something like this to your /etc/apt/sources.list:
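The original snippet was embedded here. For a trivial (flat) repo the entry has this form (the URL is a placeholder), and the repo's GPG key needs to be imported as well, e.g. with apt-key add:

deb http://packages.example.com/repo ./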

2015-02-07

Ubuntu Guest Session Lockdown

The guest session is a very important feature of Ubuntu Linux. It makes it very simple to give other people temporary computer or Internet access without compromising the permanent users of the computer.

Unfortunately the separation is not perfect: the guest user can actually modify critical configuration settings of the computer and even access the files of the other users if they don't take precautions.

The following scripts and files help to lock down the guest session so that no harm can be done.

How It Works

The guest session is actually a feature of the LightDM Display Manager that is used in Ubuntu and in Xubuntu. The guest session is enabled by default.

When a user chooses a guest session the following happens:
  1. LightDM uses the /usr/sbin/guest-account script to set up a temporary guest account. The home directory is created in memory (via tmpfs) and can occupy at most half of the computer's RAM.
    Optionally, /etc/guest-session/prefs.sh is included as root to further customize the guest account.
  2. LightDM starts a new session as this account.
  3. The guest account runs the /usr/lib/lightdm/guest-session-auto.sh start script.
    Optionally, /etc/guest-session/auto.sh is included to run other start tasks as the guest user.
The guest session behaves like a regular Ubuntu session including full access to removable storage media, printers etc.

Upon session termination LightDM uses the /usr/sbin/guest-account script to remove the temporary account and its home directory.

Securing Other Users

The default umask is 022, so by default all user files are world-readable. That makes them also readable by the guest session user. For more privacy the umask should be set to 007, for example in /etc/profile.d/secure-umask.sh:
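The original file was embedded here; in essence it only needs to set the umask (a minimal sketch):

# /etc/profile.d/secure-umask.sh
# new files get no permissions for "other" users, so the guest cannot read them
umask 007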

Preventing System Modifications

Much worse is the fact that by default every user - including the guest account - can modify a lot of system settings like the network configuration. The following PolicyKit Local Authority policy prevents the guest account from changing anything system-related. It still permits the user to handle removable media and even to use encrypted disks. It should be installed as /etc/polkit-1/localauthority/90-mandatory.d/guest-lockdown.pkla:
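The original policy file was embedded here. A hedged sketch of how such a .pkla file could look (guest accounts are matched by the guest-* glob; the exact udisks2 action names may differ between Ubuntu releases, and the second section is meant to override the blanket deny for the listed actions):

[Deny all system actions for guest accounts]
Identity=unix-user:guest-*
Action=*
ResultAny=no
ResultInactive=no
ResultActive=no

[Allow guest accounts to use removable and encrypted media]
Identity=unix-user:guest-*
Action=org.freedesktop.udisks2.filesystem-mount;org.freedesktop.udisks2.encrypted-unlock;org.freedesktop.udisks2.eject-media
ResultAny=no
ResultInactive=no
ResultActive=yes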

Preventing Suspend and Hibernate

LightDM has, IMHO, a bug: if one opens a guest session and then locks it, it is impossible to get back into that guest session. Choosing "guest session" from the LightDM unlock menu will actually create a new guest session instead of taking the user back to the existing one. In the customization below we therefore disable session locking for the guest altogether.

The same happens after a system suspend or hibernate because that also locks all sessions and shows the LightDM unlock screen. The only "safe" solution is to disable suspend and hibernate for all users with this policy. It should go into /etc/polkit-1/localauthority/90-mandatory.d/disable-suspend-and-hibernate.pkla:
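The original policy file was embedded here; a hedged sketch (both the upower and the login1 action names are listed because different Ubuntu releases use different ones):

[Disable suspend and hibernate for all users]
Identity=unix-user:*
Action=org.freedesktop.upower.suspend;org.freedesktop.upower.hibernate;org.freedesktop.login1.suspend;org.freedesktop.login1.hibernate;org.freedesktop.login1.suspend-multiple-sessions;org.freedesktop.login1.hibernate-multiple-sessions
ResultAny=no
ResultInactive=no
ResultActive=no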

Customizing the Guest Session

To customize the guest session the following /etc/guest-session/prefs.sh and /etc/guest-session/auto.sh scripts are recommended. The prefs.sh script is run as root before switching to the guest account. It creates a custom Google Chrome icon without Gnome Keyring integration (which would ask for a login password that is not set) and disables various autostart programs that are not needed for non-admin users.

This auto.sh script is run at the start of the guest session under the guest account and configures the session behaviour. Most important is to disable screen locking because it is impossible to return to a locked guest session. I decided to also completely disable the screen saver since I expect guest users to terminate their session rather than leave it running for a long time. A sketch of such an auto.sh follows after the list below.

The other customizations are mostly for convenience or to set useful defaults:

  • Disable the shutdown and restart menu items.
  • Configure the keyboard layout with 2 languages (you probably want to adjust that as needed).
  • Show full date and time in the top panel.
  • Set the order of launcher icons in the Unity launcher.
It is fairly easy to find out other settings with gsettings list-recursively if required.
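For illustration, the core of such an auto.sh could look like this sketch (written with Unity on Ubuntu 14.04/14.10 in mind; schema and key names vary between releases, so treat them as assumptions and verify them with gsettings list-recursively):

# /etc/guest-session/auto.sh - runs as the guest user at session start
# never lock the guest session (there is no way back into a locked one)
gsettings set org.gnome.desktop.lockdown disable-lock-screen true
gsettings set org.gnome.desktop.screensaver lock-enabled false
gsettings set org.gnome.desktop.screensaver idle-activation-enabled false
# hide the shutdown and restart menu items
gsettings set com.canonical.indicator.session suppress-shutdown-menuitem true
gsettings set com.canonical.indicator.session suppress-restart-menuitem true
# keyboard layouts (adjust the languages as needed)
gsettings set org.gnome.desktop.input-sources sources "[('xkb', 'us'), ('xkb', 'de')]"
# show full date and time in the top panel
gsettings set com.canonical.indicator.datetime show-date true
gsettings set com.canonical.indicator.datetime show-day true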

With these additions the guest session can be a very useful productivity tool. At our school the students use only the guest session of the Ubuntu computers. They quickly got used to the fact that they must store everything on their USB thumb drives, and the teachers enjoy "unbreakable" computers that work reliably.

2015-01-29

No Site VPN for Cloud Data Centers

A site to site VPN is the standard solution for connecting several physical data center locations. Going to the Cloud, the first idea that comes to mind is to also connect the Cloud "data center" with a site VPN to the existing physical data centers. All Cloud providers offer such a feature.

But is such a VPN infrastructure also a "good idea"? Will it help us or hinder us in the future?

I actually believe that with many data centers a site VPN infrastructure is a dangerous tool. On the good side, it is very convenient to have and to set up, and it simplifies a lot of things. On the other side, it is also very easy to build a world-wide mesh of dependencies where a VPN failure can severely inhibit data center operations or even take down services. It also lures everybody into creating undocumented backend connections between services.

The core problem is in my opinion one of scale. Having a small number (3 to 5) of locations is fundamentally different from having 50 or more locations. With 3 locations it is still feasible to build a mesh layout where each location talks to every other one. With 5 locations a mesh already needs 10 connections, which starts to be a lot, while a star layout always has a central bottleneck. With 50 locations a full mesh needs 50×49/2 = 1225 connections.

With the move from the world of (few) physical data centers to the world of cloud operations it is quite common to have many data centers in the cloud. For example, it is "best practice" to use many accounts to separate development and production or to isolate applications or teams (see Yavor Atanasov from the BBC explaining this at Velocity Europe 2014). What is not so obvious from afar is that each cloud account is actually a separate data center of its own! So having many teams can quickly lead to having many accounts; I have talked to many companies who have between 30 and 100 cloud accounts!

Another problem is the fact that all the data centers would need to have different IP ranges, plus one needs another (small) IP range for each connection, and so on. All of this is a lot of stuff to handle.

As an alternative approach I suggest not using site VPNs at all. Instead, each data center should be treated as a completely independent data center. To make this fact transparent, I would even suggest using the same IP range in all data centers!

I see many advantages to such a setup:

  • All connections between services in different data centers must use the public network.
    As a result all connections have to be properly secured, audited and supervised. They have to be part of the production environment and will be fully documented. If a connection for one application fails, other applications are not impaired.
  • Standard resources can be offered under standard IPs to simplify bootstrapping (e.g. 192.168.10.10 is always the outgoing proxy or the DNS server or some other vital service).
  • If development and production are in separate "data centers" (e.g. cloud accounts), then they can be much more similar.
  • A security breach in one account does not easily spill over into other accounts.
  • Operations and setup of the cloud accounts and physical data centers become much simpler (fewer external dependencies).
  • It will be easy to build up systemic resilience as a failure in one data center or account does not easily affect other data centers or accounts.
  • Admins will be treated as road warriors and connect to each data center independently as needed.
Currently I am looking for arguments in favor and against this idea, please share your thoughts!



2015-01-22

PPD - Pimp your Printer Driver

I recently got myself a new printer, the HP Officejet Pro X476dw. A very nice and powerful machine, it can not only print double sided but also scan, copy and send faxes.

And of course it has very good Linux support, thanks to the HP Linux Printing and Imaging Open Source project. On my Ubuntu 14.10 desktop everything is already included to use the printer.

However, the first printouts were very disappointing. They looked coarse and ugly, much worse than prints from my old HP LaserJet 6 printer. After overcoming the initial shock I realized that only prints from my Ubuntu desktop were bad while prints via Google Cloud Print were crisp and good looking.

So obviously something had to be wrong with the printer driver on Ubuntu!

After some debugging I was able to trace this down to the fact that by default CUPS converts the print job to 300 dpi PostScript before giving it to the HP driver, as the CUPS logs show:

D [Job 261] Printer make and model: HP HP Officejet Pro X476dw MFP
D [Job 261] Running command line for pstops: pstops 261 schlomo hebrew-test.pdf 1 'finishings=3 media=iso_a4_210x297mm output-bin=face-down print-color-mode=color print-quality=4 sides=one-sided job-uuid=urn:uuid:c1da9224-d10b-3c2f-6a99-487121b8864c job-originating-host-name=localhost time-at-creation=1414128121 time-at-processing=1414128121 Duplex=None PageSize=A4'
D [Job 261] No resolution information found in the PPD file.
D [Job 261] Using image rendering resolution 300 dpi
D [Job 261] Running command line for gs: gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ps2write -sOUTPUTFILE=%stdout -dLanguageLevel=3 -r300 -dCompressFonts=false -dNoT3CCITT -dNOINTERPOLATE -c 'save pop' -f /var/spool/cups/tmp/066575456f596

I was able to fix the problem by adding this resolution setting to the PostScript Printer Definitions (PPD):

*DefaultResolution: 600x600dpi

As a result the print job is converted at 600 dpi instead of 300 dpi, which leads to the expected crisp result:

D [Job 262] Printer make and model: HP HP Officejet Pro X476dw MFP
D [Job 262] Running command line for pstops: pstops 262 schlomo hebrew-test.pdf 1 'Duplex=None finishings=3 media=iso_a4_210x297mm output-bin=face-down print-color-mode=color print-quality=4 sides=two-sided-long-edge job-uuid=urn:uuid:83e69459-c350-37e5-417d-9ca00f8c6bd9 job-originating-host-name=localhost time-at-creation=1414128153 time-at-processing=1414128153 PageSize=A4'
D [Job 262] Using image rendering resolution 600 dpi
D [Job 262] Running command line for gs: gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ps2write -sOUTPUTFILE=%stdout -dLanguageLevel=3 -r600 -dCompressFonts=false -dNoT3CCITT -dNOINTERPOLATE -c 'save pop' -f /var/spool/cups/tmp/0666d544aec68

Isn't it really nice that one only needs a text editor to fix printer driver problems on Linux (and Mac)?
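If you want to apply the same fix to an already configured queue: the PPD lives under /etc/cups/ppd/ (the queue name below is an assumption, check lpstat -p for yours):

sudo nano /etc/cups/ppd/HP_Officejet_Pro_X476dw.ppd   # add the line: *DefaultResolution: 600x600dpi
sudo service cups restart                             # make sure CUPS picks up the change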

On github.com/schlomo/HP_Officejet_Pro_X476dw I maintain an improved version of the PPD file with the following features:
  • Set printing resolution to 600dpi
  • Use printer for multiple copies, not CUPS
  • Default to duplex printing
The corresponding Launchpad bug is still open and unresolved. Apparently it is not so simple to submit improvements upstream.

2015-01-16

Comparing Amazon Linux

Since ImmobilienScout24 decided to migrate to a public cloud I have been busy looking at various cloud offerings in detail. Amazon Web Services (AWS) has an interesting special feature: Amazon Linux is a fully supported, "RHEL-like", RPM-based Linux distribution.

While not being a true Red Hat Enterprise Linux clone like CentOS or Scientific Linux (which is the standard OS in the ImmobilienScout24 data centers), it is derived from some Fedora version and comes with a nice choice of current software. To me it feels like "RHEL+" because so far all our internal stuff has worked well, yet a lot of software packages are much newer than on RHEL 6 or RHEL 7. The 2014.09 release updated a lot of components to very recent versions.

On the other hand, we also found packages missing from Amazon Linux, most notably desktop-file-utils. This package is required to install Oracle Java RPMs. I found a thread about this on the AWS Forums and added a request for desktop-file-utils in September 2014. In the meantime the package was added to Amazon Linux, although the forum thread does not mention it (yet).

To find out in advance if there are any other surprises waiting for us on Amazon Linux, I created a little tool to collect RPM Provides lists from different Linux distros on AWS. github.com/ImmobilienScout24/aws-distro-rpm-comparison takes a VPC and one or several AMI IDs and spins up an EC2 instance for each to collect the list of all RPM Provides from all available YUM repositories.

$ ./aws-distro-rpm-comparison.py -h
Create EC2 instances with different Linux distros and compare
the available RPMs on them.

Usage:
  aws-distro-rpm-comparions.py [options] VPC_ID USER@AMI_ID...

Arguments:
  VPC_ID        VPC_ID to use
  USER@AMI_ID   AMI IDs to use with their respective SSH user

Options:
  -h --help            show this help message and exit
  --version            show version and exit
  --region=REGION      use this region [default: eu-west-1]
  --type=TYPE          EC2 instance type [default: t2.micro]
  --defaultuser=USER   Default user to use for USER@AMI_ID [default: ec2-user]
  --verbose            Verbose logging
  --debug              Debug logging
  --interactive        Dump SSH Key and IPs and wait for before removing EC2 instances

Notes:

* The AMI_IDs and the EC2 instance type must match (HVM or PV)

This list can then be used to compare with the RPM Requires in our data centers. To get a better picture of Amazon Linux I created such lists for Red Hat Enterprise Linux 6 and 7, CentOS 6 and Amazon Linux in the GitHub project under results. For online viewing I created a Google Spreadsheet with these lists; you can copy it and modify it for your own needs.

Open the results in Google Drive Sheets.
At first glance it seems very difficult to say how compatible Amazon Linux really is, as there are a lot of RPM Provides missing on both sides. But these lists should prove useful for analyzing our existing servers and understanding whether they would also work on Amazon Linux. The tool can also be used for any kind of RPM distro comparison.
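As a rough first pass, such a comparison could look like this sketch (file names are placeholders; the version constraints in the Requires output need extra handling, so expect false positives):

# on an existing server: collect its RPM Requires
rpm -qa --requires | sort -u > our-server-requires.txt
# compare against the collected Amazon Linux Provides list
sort -u amazon-linux-provides.txt > amazon-provides-sorted.txt
comm -23 our-server-requires.txt amazon-provides-sorted.txt   # requirements without a matching Provides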

In any case, Amazon Linux is exactly what RHEL cannot be: a stable RPM-based distribution with a lot of recent software and regular updates.

2014-10-26

DevOpsDays Berlin 2014

Update: Read my (German) conference report on heise developer.

Last week I was at the DevOps Days Berlin 2014, this time at the Kalkscheune, a much better location than the Urania last year. With 250 people the conference was not too full, and the location was well equipped to handle this number of attendees.

Proving that DevOps is more about people and culture, most talks were not so technical but emphasized the need to take all the people along on the journey to DevOps.

A technical bonus was the talk by Simon Eskildsen about "Docker at Shopify", which was the first time that I heard about a successful Docker implementation in production.

Always good to know is the difference between effective and efficient, as explained by Alex Schwartz in "DevOps means effectiveness first". DevOps is actually a way to optimize for effectiveness before optimizing for efficiency.

Microsoft and SAP gave talks about DevOps in their world - quite impressive to see DevOps becoming mainstream.

My own contribution was an ignite talk about ImmobilienScout24 and the Cloud:


And I am also a certified DevOps now: