2015-12-30

Docker Appliance as Linux Service RPM

Docker provides a convenient way to package entire applications into runnable containers. OTOH in the data center we use RPM packages to deliver software and configuration to our servers.

This wrapper builds a bridge between Docker appliances and Linux services by packaging a Docker image as a Linux service in an RPM package.

The resulting service can simply be used like any other Linux service, for example starting it with service schlomo start.
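To illustrate the idea, the generated service wrapper essentially maps the usual service verbs onto docker commands. This is only a minimal sketch and not the actual wrapper from the repo; the service and image names are made up:

# /etc/init.d/schlomo - hypothetical, simplified sketch of such a service wrapper
case "$1" in
    start)  docker run -d --name schlomo schlomo-appliance ;;   # image name is an assumption
    stop)   docker stop schlomo && docker rm schlomo ;;
    status) docker inspect --format '{{.State.Running}}' schlomo ;;
    *)      echo "Usage: $0 {start|stop|status}"; exit 1 ;;
esac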


See the GitHub repo at https://github.com/ImmobilienScout24/docker-service-rpm for code and more details and please let me know if you find this useful.

2015-08-21

Signet Ring = Early 2 Factor Authentication

Photo: A. Reinkober / pixelio.de
I recently met somebody who had a signet ring and suddenly realized that this is a very early form of 2-factor-authentication (2FA):

Signet Ring                                  2FA
Unique                                       Unique
Difficult to copy                            Supposedly impossible to copy
Seal proves personal involvement of bearer   2FA token proves personal interaction of owner

The main difference is of course that 2FA is commonly available to everybody who needs it while signet rings were and remain a special feature. But it is still nice to know that the basic idea is several thousand years old.

2015-08-07

Cloud Exit Strategy

As ImmobilienScout24 moves to the cloud a recurring topic is the question about the exit strategy. An exit strategy is a plan for migrating away from the cloud, or at least from the chosen cloud vendor.

Opinions range from "why would I need one?" to "how can we not have one?" with a heavy impact on our cloud strategy and how we do things in the cloud.

When talking about exit scenarios it is worth distinguishing between a forced and a voluntary exit. A forced exit happens due to external factors that don't leave you any choice about when to go. A voluntary exit happens at your own choice, both when and how.

Why would one be forced to have an exit strategy? Simply because running a business on cloud services carries different types of risks compared to running a business in your own data center:
  • Cloud accounts can be disabled for alleged violation of terms
  • Cloud accounts can be terminated
  • There are no guaranteed prices. Running costs can explode as a result of a new pricing model
  • The cloud vendor can discontinue a service that you depend on
  • Lost cloud credentials combined with weak security can be disastrous (learn from Codespaces)
  • If the cloud vendor is down you can either hope and wait or start your website somewhere else, if you were prepared. In the data center you can try all sorts of workarounds and fixes - but you must do that all yourself.
  • ... fill in your own fear and bias against the cloud ...
A voluntary exit can easily happen after some time because:
  • Another cloud vendor is cheaper, better or solves problems that your current vendor doesn’t care about
  • You are bought by another company and they run everything in another cloud, forcing you to migrate
  • ... who knows what the future will bring?
Probably there is no perfect answer that fits everybody. Besides just ignoring the question I personally see two major options:
  1. Use only IaaS (e.g. servers, storage, network) from the cloud and avoid PaaS (the fancy managed services) so that it is easy to migrate to another cloud vendor or to a private cloud. The big disadvantage is that you won't be able to benefit from all the cool managed services that make the cloud an interesting place to be.
  2. Use many cloud providers or accounts (e.g. matching your larger organisational units) to reduce the "blast radius" and keep the communication between them vendor independent. If something happens to one of them the damage is limited in scope and everything else keeps working. The disadvantage is that you add complexity and other troubles by dealing with a widely distributed platform.
I prefer the second option because it lays the ground for a voluntary exit while still keeping most of the advantages of the cloud as an environment and ecosystem. In case of a forced exit there is a big problem, but that could be solved with lots of resources. A forced exit for a single account can be handled without harming the other accounts and their products. As another benefit there is not much premature optimization for the exit case.

Whatever you do - I believe that having some plan is better than not having any plan.

2015-07-15

DevOps Berlin Meetup 2015-07

Is Amazon good for DevOps? Maybe yes, maybe no. But for sure the new Berlin office is good for a Berlin DevOps Meetup.

Jonathan Weiss gave a short overview over the engineering departments found here: AWS OpsWorks, AWS Solution Architects, Amazon EC2, Machine Learning.

Michael Ducy (Global Partner Evangelist at Chef Software) talked about DevOps and told the usual story. Michael used goats and silos as a metaphor and built his talk around the famous goat and silo problem. He sees the "IT manufacturing process" as silos (read History of Silos for more about that) and DevOps-minded people as goats: multi-purpose, versatile, smart and stubborn at reaching their goals.
The attendees of the DevOps event probably did not need much convincing, but the talk was nevertheless very entertaining. Michael has an MBA and also gave some useful insights into how organisations evolve into silos and how organisational "kingdoms" develop.

The talk is available as video: 15min from Jan 2015 and 24min from Dec 2013. The slides are available on Slideshare.

As a funny side note it turns out that Amazon even rents out goats: Amazon Hire a Goat Grazer. However it seems that this offer is about real goats and not DevOps engineers.

2015-07-10

ImmobilienScout24 Social Day at the GRIPS Theater

Today I went to the GRIPS Theater (English) instead of the office. Once a year ImmobilienScout24 donates its workforce to social projects, on the so-called Social Day. I used the opportunity to catch a glimpse behind the stage. The theater in turn got a workshop from us about their web site and social media channels.

But first we watched a very nice children's show (Ein Fest bei Baba Dengiz) about a German guy who learned respect for foreigners - from another German with a Turkish background. The show was well adapted to the school-age audience.

The theater follows a somewhat unusual concept and places the stage in the middle of the audience:
Photo courtesy of the GRIPS Theater
This was my first visit to the GRIPS Theater, but not the last. Besides a rich children's programme the theater also offers shows for adults and is most famous for the show Linie 1.

2015-05-22

Meetup Marathon

This week was my Meetup Marathon:

Software Memories and Simulated Machines was above my head. Scaling Logstash made me wonder how many engineers you actually need to run that "properly". Nix is something we hopefully don't need; Rok actually said that if you package everything you don't need it.

STUPS is the "Cloud Ops" stack from Zalando, nicely published on GitHub:

The STUPS platform is a set of tools and components to provide a convenient and audit-compliant Platform-as-a-Service (PaaS) for multiple autonomous teams on top of Amazon Web Services (AWS).

It contains a lot of tools that work together to solve a lot of the challenges related to running a large company on AWS. For me that was most definitely the highlight of this week.

Hennig explaining STUPS at the AWS User Group.

2015-05-15

OpenTechSummit 2015

Yesterday was the first OpenTechSummit in Berlin, a new conference that came partially in place of the LinuxTag. The conference squeezed a large number of talks into a single day. The talks were either 10 or 20 minutes long and covered many non-technical topics related to open knowledge or open technology.

One thing impressed me especially: all day long there were workshops for children and youth. While some kids took their first steps in coding, others came to work together on advanced programming or hardware projects.
The date (a German state holiday) made sure that children had time to attend, and many IT people came together with their children. The organizers were actually surprised by the large number of children who registered for a free kids ticket.

I gave my "DevOps, Agile and Open Source at ImmobilienScout24" talk and put up some ImmobilienScout24 posters for our sponsoring.

2015-04-09

Better Package Than Copy



Today I realized that for me it is easier to create a small package than to copy a single file.

The example is glabels-schlomo, a Debian package I created just now to store extra gLabels templates for the label sheets that I use at home. The motivation was that I spent half an hour looking through old backups to find a template definition that I had not copied over when I reinstalled my desktop.

Creating the package took another half an hour and now I can be sure that I won't forget to copy that file again. And I will also have the template definition at work in case I need to print a sheet of labels there.

If you also feel that packaging is better than copying then feel free to use this package as a template for your own stuff. It contains a Makefile and uses git-dch to automatically build a DEB release from the git commits.
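If you have never built a DEB package from a git repository, the workflow looks roughly like this (a sketch assuming a standard debian/ directory; the actual Makefile targets in glabels-schlomo may differ):

git-dch --release --auto     # update debian/changelog from the git commits
dpkg-buildpackage -us -uc    # build the unsigned .deb package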

2015-04-03

WARNING is a waste of my time

How many log levels do you know? How many log levels are actually useful? At Relax and Recover we had an interesting discussion about the use of the WARNING log level.

I suddenly realized that in a world of automation, I need only two log levels:

ERROR and everything else.

ERROR means that I as a human should take action. Everything else is irrelevant for me.

So much for the user side. As a programmer, the choice of log level is sometimes much more difficult: I might not want to decide for the user whether some problem is an ERROR or not. The obvious solution is to issue a WARNING in an attempt to shed the responsibility of making a decision.

But in an automated world that does not help me as an admin to run the software better. WARNINGs in most cases only create extra manual work because somebody needs to go and check some log file and decide if there actually is a problem. I would rather have the software make that decision, and I would be happy to fix or readjust the software if that decision was wrong. So, please no WARNINGs.

Apparently others see that differently and prefer to get a large number of WARNINGs. The only way out is that software should be written so that the user can configure the behaviour of WARNINGs. If necessary, it should even be possible to configure the behaviour for different events.
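A minimal sketch of what I mean, in shell (the WARNING_IS_ERROR variable is a made-up example, not taken from any real tool):

# let the user decide whether a WARNING should be treated as fatal
WARNING_IS_ERROR="${WARNING_IS_ERROR:-no}"

LogWarn() {
    echo "WARNING: $*" >&2
    if [ "$WARNING_IS_ERROR" = "yes" ]; then
        echo "ERROR: treating warning as error (WARNING_IS_ERROR=yes)" >&2
        exit 1
    fi
}

LogWarn "disk usage above 80%"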

So why are there so many logging levels? I think that the main reason is that it is simpler and less work for software developers to use many log levels than to implement a sophisticated configuration scheme for which events should be considered an ERROR and which not.

Together with a Zero-Bug-Policy, eliminating WARNINGs goes a long way towards being more productive.

DevOpsDays 2015 Presentation:

2015-03-25

Exploring Academia

Last week I attended the Multikonferenz Software Engineering & Management 2015 in Dresden hosted by the Gesellschaft für Informatik:

My topic was Test Driven Development, but I had to rework my original talk to fit into 20 minutes and to be much less technical. As a result I created a completely new fast paced talk which draws a story line from DevOps over Test Driven Infrastructure Development into Risk Mitigation:

The conference is very different from the tech conferences I usually attend. First, I really was the only person in a T-Shirt :-/. Second, I apparently was invited as the "practitioner" while everybody else was there to talk about academic research, mostly in the form of a bachelor or master thesis.

As interesting as the topics were, there was hardly anything even remotely related to my "practical" work :-(

While I still find it important to better combine the different worlds (academic and practical), this conference still has some way to go if it wants to achieve this goal. Maybe it would help to team up with an established tech conference and simply hold two conferences at the same time and place to allow people to freely wander between the worlds.

I also had some spare time and visited the Gläserne Manufaktur where VW assembles Phaeton and Bentley cars. They take pride in the fact that 95% of the work is done manually, but sadly nobody asked me about my T-Shirt:
I am squinting so much because the sun was really bright that day. In the background is an XL1, a car that consumes less than 1ℓ of fuel per 100km.

2015-03-17

A Nice Day at CeBIT 2015

After many years of abstinence I went back to visit the CeBIT today. And actually enjoyed it a lot. It is funny to see how everything is new but nothing changed. From the oversized booths of the big players like IBM and Microsoft to the tiny stalls of Asian bank note counting machine vendors. From the large and somewhat empty government-IT-oriented booths to meeting old acquaintances and friends.
But there are also several notably new things to see: for example, Huawei presents itself as an important global player with a huge booth next to IBM.
I managed to visit only a third of the exhibition, but it was more than I could absorb in a single day. Nevertheless, my mission was accomplished by giving a talk about “Open Source, Agile and DevOps at ImmobilienScout24”. The talk is much more high-level than my usual talks and tries to give a walk-through overview. There were about 60-80 people attending my talk and the questions showed that the topic was relevant for the audience. So maybe giving management-level talks is the right thing to do for CeBIT.
Meeting people is the other thing that still works really well at the CeBIT. Without a prior appointment I was able to meet with Jürgen Seeger from iX magazine about my next ideas for articles and with people from SEP about better integrating their backup tool SESAM and Relax-and-Recover.
The new CeBIT concept of focusing on the professional audience seems to work: I noticed far fewer bag-toting, swag-hunting people than last time. All in all I think that attending for one day is worth the trouble and enough to cover the important meetings.

Random Impressions

IBM's Watson wants to be a physician.

Video conferencing with life-sized counterparts. 4K really does make a difference!

Why buy 4 screens if you can buy 1 (QM85D)? Samsung has a lot more to offer than just phones.

Definitely my next TV (QM105D). 105", 21:9 ratio and 2.5 meters wide.

Another multimedia vendor? WRONG! This is "just" a storage box!

Though it seems like storage is no longer the main focus for QNAP.

Cyber crime is big - cyber police still small

Virtual rollercoaster at Heise - barf bags not included.

Deutsche Telekom always has a big booth and represents the top of German IT development. To underline the "Internet of Things" a bunch of robot arms was dancing with magenta umbrellas.

Dropbox comes to CeBIT in an attempt to win business customers. The data is still hosted in the USA, but the coffee was great.

And finally, even the weather was nice today.

2015-03-05

Injecting a Layer of Automation

Relax and Recover is the leading Open Source solution for automated Linux disaster recovery. It was once the pride of my work and is now totally irrelevant at my current job at ImmobilienScout24.

Why? Simply because at ImmobilienScout24 we invest our time into automating the setup of our servers instead of investing into the ability to automatically recover a manually configured system. Sounds simple, but this is actually a large amount of work and not done in a few days. However, if you persist and manage to achieve the goal, the rewards are much bigger: we don't need to be afraid of trouble because, based on our automation, we can be sure to reinstall our servers in a very short time.

The following idea can help to bridge the gap if you cannot simply automate all your systems but still want to have a simplified backup and disaster recovery solution:

Inject a layer of automation under the running system.

The provisioning and configuration of the automation layer should of course be fully automated. The actual system remains manually configured, runs inside a Linux container (LXC, Docker, plain chroot ...) and stays as it was before. The resource loss introduced by the Linux container and an additional SSH daemon is negligible for most setups.
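As a rough illustration (a minimal sketch with made-up paths, not a complete recipe), injecting such a layer with a plain chroot could look like this:

# copy the manually configured system onto the freshly automated host
rsync -aHAXx old-server:/ /srv/legacy-root/
# provide the kernel interfaces inside the container
mount --bind /proc /srv/legacy-root/proc
mount --bind /sys  /srv/legacy-root/sys
mount --bind /dev  /srv/legacy-root/dev
# start the legacy system's own SSH daemon on an alternative port inside the chroot
chroot /srv/legacy-root /usr/sbin/sshd -p 2222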

The problem of backup and disaster recovery for systems is converted to a problem of backup and restore for data, which is fundamentally simpler because one can always restore into the same environment of a Linux container. The ability to run the backup in the automation layer also allows using smarter backup technologies like LVM or file system snapshots with much less effort.
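For example, a file-level backup taken from the automation layer could use an LVM snapshot for consistency (a sketch; volume and host names are assumptions):

lvcreate --snapshot --size 5G --name legacy-snap /dev/vg0/legacy-root
mount -o ro /dev/vg0/legacy-snap /mnt/snap
rsync -aHAX /mnt/snap/ backup-server:/backup/legacy/
umount /mnt/snap
lvremove -f /dev/vg0/legacy-snap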

I don't mean to belittle the effort that it takes to build a proper backup and restore solution, especially for servers that have a high change rate in their persistent data. This holds true for any database like MySQL and is even more difficult for distributed database systems like MongoDB. The challenge of creating a robust backup and restore solution stays the same regardless of the disaster recovery question. Disaster recovery is always an on-top effort that complements the regular backup system.

The benefit of this suggestion lies in the fact that it is possible to replace the effort spent on disaster recovery with an investment in systems automation. That approach will yield much more value: a typical admin will use systems automation much more often than disaster recovery. Another way to see this difference is that disaster recovery is optimizing the past while systems automation is optimizing the future.

The automation layer can also be based on one of the minimal operating systems like CoreOS, Snappy Ubuntu Core or Red Hat Atomic Host. In that case new services can be established with full automation as Docker images, opening up a natural road towards a fully automated platform while gracefully handling the manually set up legacy systems without disturbing the idea of an automated platform.

If you already have a fully automated platform but suffer from a few manually operated legacy systems then this approach can also serve as a migration strategy to encapsulate those legacy systems in order to keep them running as-is.

Update 2015-03-12: Added a short info about Relax and Recover and explained better why it pays more to invest in automation instead of disaster recovery.

2015-02-27

mod_remoteip backport for Apache HTTPD 2.2

Apache HTTPD 2.4 has a very useful new feature for large deployments: replacing the remote IP of a request with the value of a request header, e.g. set by a load balancer or reverse proxy. Users of Apache HTTPD 2.2 as found on RHEL6 can now use the backport found on https://github.com/ImmobilienScout24/mod_remoteip-httpd22.

I pulled this backport together from various sources found on the Internet and "it seems to work". Working with C code (which I had not done for 14 years!) taught me again the value of test driven development and modern programming languages. Unfortunately I still can't explain a change like this without a lot of thinking:
You can easily build an RPM from the code on GitHub. The commit history shows the steps I had to undertake to get there. Configuration is as simple as this:


LoadModule remoteip_module modules/mod_remoteip.so
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 10.100.15.33

with the result that a reverse proxy on 10.100.15.33 can set the X-Forwarded-For header. Apache configuration directives like Allow from can then use the real client IP even though the client does not talk directly to the web server.
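For example, an access rule keeps working as if the client connected directly (a hypothetical snippet, not part of the backport itself):

<Location /admin>
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/16
</Location>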

2015-02-18

Simplified DEB Repository



2 years ago I wrote about creating a repository for DEB packages with the help of reprepro. And since then I have suffered from the complexity of the process and the cumbersome reprepro usage:
  • Complicated to add support for a new Ubuntu version, which happens every 6 months
  • Need to specifically handle new architectures
  • I actually don't need most of the features that reprepro supports, e.g. managing multiple repos in one or package staging
This week I realized that there is a much simpler solution for my needs: apt-ftparchive. This tool creates a trivial repo with just enough information to make apt happy. For my purposes that is enough. All I actually want from a DEB repo is:
  • Work well with 50-500 packages
  • Easy to add new Debian/Ubuntu/Raspbian versions or architectures
  • Simple enough for me to understand
  • GPG signatures
It turns out that the trivial repo format is enough for that. It even makes it simpler to add new distro versions because the repo does not contain any information about the distro versions. That means that the repo does not change for new distros and that I don't need to change the sources.list line after upgrades.

The following script maintains the repo. It will copy DEB packages given as command line parameters into the repository or simply recreate the metadata. The script uses gpg to sign the repo with your default GPG key. If you want to maintain a GPG key as part of the repo then you can create a key sub-directory which will be used as GPG home directory.
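The essential apt-ftparchive calls behind such a trivial repo look roughly like this simplified sketch (not the original script; paths and file names are assumptions):

#!/bin/bash
# minimal sketch of updating a trivial (flat) DEB repo
set -e
cd /path/to/repo                        # assumption: the repo lives here
cp -v "$@" .                            # copy the DEB packages given as arguments
apt-ftparchive packages . > Packages    # generate the package index
gzip -9c Packages > Packages.gz
apt-ftparchive -c config_for_release release . > Release
gpg --armor --detach-sign --output Release.gpg --yes Release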


The script expects a config_for_release file in the repo that contains some extra information:
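Such a file sets the Release metadata that apt-ftparchive writes out; a hypothetical example (all values are placeholders):

APT::FTPArchive::Release::Origin "schlomo";
APT::FTPArchive::Release::Label "schlomo";
APT::FTPArchive::Release::Suite "stable";
APT::FTPArchive::Release::Architectures "i386 amd64 armhf";
APT::FTPArchive::Release::Description "My private DEB repository";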

To add this repo to your system add something like this to your /etc/apt/sources.list:
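For a trivial (flat) repository the entry only needs the base URL and ./ as the distribution, for example (the URL is a placeholder):

deb http://repo.example.com/ ./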

2015-02-07

Ubuntu Guest Session Lockdown

The guest session is a very important feature of Ubuntu Linux. It makes it very simple to give other people temporary computer or Internet access without compromising the permanent users of the computer.

Unfortunately the separation is not perfect: the guest user can actually modify critical configuration settings on the computer and even access the files of other users if they don't take precautions.

The following scripts and files help to lock down the guest session so that no harm can be done.

How It Works

The guest session is actually a feature of the LightDM Display Manager that is used in Ubuntu and in Xubuntu. The guest session is enabled by default.

When a user chooses a guest session the following happens:
  1. LightDM uses the /usr/sbin/guest-account script to set up a temporary guest account. The home directory is created in memory (via tmpfs) and can occupy at most half the RAM of the computer.
    Optionally, /etc/guest-session/prefs.sh is included as root to further customize the guest account.
  2. LightDM starts a new session as this account.
  3. The guest account runs the /usr/lib/lightdm/guest-session-auto.sh start script.
    Optionally, /etc/guest-session/auto.sh is included to run other start tasks as the guest user.
The guest session behaves like a regular Ubuntu session including full access to removable storage media, printers etc.

Upon session termination LightDM uses the /usr/sbin/guest-account script to remove the temporary account and its home directory.

Securing Other Users

The default umask is 022 so that by default all user files are world-readable. That makes them also readable by the guest session user. For more privacy the umask should be set to 007 for example in /etc/profile.d/secure-umask.sh:
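A minimal version of that file could simply be:

# /etc/profile.d/secure-umask.sh: keep user files readable only by owner and group
umask 007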

Preventing System Modifications

Much worse is the fact that by default every user - including the guest account - can modify a lot of system settings like the network configuration. The following PolicyKit Local Authority policy prevents the guest account from changing anything system related. It still permits the user to handle removable media and even to use encrypted disks. It should be installed as /etc/polkit-1/localauthority/90-mandatory.d/guest-lockdown.pkla:
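The .pkla format is straightforward; a shortened sketch along these lines (the actual file covers more actions, and the Identity glob assumes Ubuntu's guest-* account names):

[Guest Session Lockdown]
Identity=unix-user:guest*
Action=org.freedesktop.NetworkManager.settings.modify.system;org.debian.apt.*
ResultAny=no
ResultInactive=no
ResultActive=no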

Preventing Suspend and Hibernate

LightDM has IMHO a bug: if one opens a guest session and then locks it, it is impossible to go back to that guest session. Choosing "guest session" from the LightDM unlock menu will actually create a new guest session instead of taking the user back to the existing one. In the customization below we disable session locking for the guest altogether for this reason.

The same happens after a system suspend or hibernate because that also locks all sessions and shows the LightDM unlock login screen. The only "safe" solution is to disable suspend and hibernate for all users with this policy. It should go to /etc/polkit-1/localauthority/90-mandatory.d/disable-suspend-and-hibernate.pkla:
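A sketch of such a policy (the action names cover both upower and logind, which handle suspend/hibernate on different Ubuntu versions):

[Disable Suspend and Hibernate for All Users]
Identity=unix-user:*
Action=org.freedesktop.upower.suspend;org.freedesktop.upower.hibernate;org.freedesktop.login1.suspend;org.freedesktop.login1.hibernate
ResultAny=no
ResultInactive=no
ResultActive=no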

Customizing the Guest Session

To customize the guest session the following /etc/guest-session/prefs.sh and /etc/guest-session/auto.sh scripts are recommended. The prefs.sh script is run as root before switching to the guest account. It creates a custom Google Chrome icon without Gnome Keyring integration (which would otherwise ask for a login password that is not set) and disables various autostart programs that are not needed for non-admin users.
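A heavily shortened, hypothetical sketch of such a prefs.sh (the variable holding the guest home directory depends on the guest-account script and is assumed to be $HOME here; the autostart names are just examples):

#!/bin/sh
# /etc/guest-session/prefs.sh - sketch, runs as root while the guest account is set up
mkdir -p "$HOME/.local/share/applications" "$HOME/.config/autostart"
# Chrome launcher that skips the Gnome Keyring password store (no login password is set)
cat > "$HOME/.local/share/applications/google-chrome.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=Google Chrome
Exec=google-chrome --password-store=basic %U
EOF
# hide some autostart programs that non-admin users don't need
for prog in update-notifier deja-dup-monitor ; do
    printf '[Desktop Entry]\nType=Application\nHidden=true\n' > "$HOME/.config/autostart/$prog.desktop"
done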

This auto.sh script is run at the start of the guest session under the guest account. It configures the session behaviour. Most important is to disable the screen locking because it is impossible to return to a locked guest session. I decided to also completely disable the screen saver since I expect guest users to terminate their session and not let it run for a long time.

The other customizations are mostly for convenience or to set useful defaults:

  • Disable the shutdown and restart menu items.
  • Configure the keyboard layout with 2 languages (you probably want to adjust that as needed).
  • Show full date and time in the top panel.
  • Set the order of launcher icons in the Unity launcher.
It is fairly easy to find out other settings with gsettings list-recursively if required.
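The screen-lock setting and the customizations above translate into gsettings calls like these (a sketch; the schema and key names are from Ubuntu 14.x with Unity and may need adjusting):

# never lock the screen and never start the screen saver
gsettings set org.gnome.desktop.lockdown disable-lock-screen true
gsettings set org.gnome.desktop.screensaver lock-enabled false
gsettings set org.gnome.desktop.session idle-delay 0
# hide the shutdown and restart menu items
gsettings set com.canonical.indicator.session suppress-shutdown-menuitem true
gsettings set com.canonical.indicator.session suppress-restart-menuitem true
# two keyboard layouts and full date/time in the top panel
gsettings set org.gnome.desktop.input-sources sources "[('xkb','us'),('xkb','de')]"
gsettings set com.canonical.indicator.datetime show-date true
# order of the launcher icons in the Unity launcher
gsettings set com.canonical.Unity.Launcher favorites "['application://google-chrome.desktop','application://nautilus.desktop']"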

With these additions the guest session can be a very useful productivity tool. At our school the students use only the guest session of the Ubuntu computers. They quickly got used to the fact that they must store everything on their USB thumb drives, and the teachers enjoy "unbreakable" computers that work reliably.

2015-01-29

No Site VPN for Cloud Data Centers

A site to site VPN is the standard solution for connecting several physical data center locations. Going to the Cloud, the first idea that comes to mind is to also connect the Cloud "data center" with a site VPN to the existing physical data centers. All Cloud providers offer such a feature.

But is such a VPN infrastructure also a "good idea"? Will it help us or hinder us in the future?

I actually believe that with many data centers a site VPN infrastructure is a dangerous tool. On the good side, it is very convenient to have and to set up and it simplifies a lot of things. On the other side, it is also very easy to build a world-wide mesh of dependencies where a VPN failure can severely inhibit data center operations or even take down services. It also lures everybody into creating undocumented backend connections between services.

The core problem is in my opinion one of scale. Having a small number (3 to 5) of locations is fundamentally different from having 50 or more locations. With 3 locations it is still feasible to build a mesh layout, each location talking to every other. With 5 locations a mesh already needs 10 connections (n·(n-1)/2), which starts to be "a lot". But a star layout always has a central bottleneck. With 50 locations a mesh network already needs 1225 connections.

With the move from the world of (few) physical data centers to the world of cloud operations it is quite common to have many data centers in the cloud. For example, it is "best practice" to use many accounts to separate development and production or to isolate applications or teams (see Yavor Atanasov from the BBC explaining this at Velocity Europe 2014). What is not so obvious from afar is that each cloud account is actually a separate data center of its own! So having many teams can quickly lead to having many accounts; I have talked to many companies who have between 30 and 100 cloud accounts!

Another problem is the fact that all the data centers would need to have different IP ranges plus one needs another (small) IP range for each connection etc. All of this is a lot of stuff to handle.

As an alternative approach I suggest not using site VPNs at all. Instead, each data center should be handled as an independent data center. To make this fact transparent, I would also suggest using the same IP range in all data centers!

I see many advantages to such a setup:

  • All connections between services in different data centers must use the public network.
    As a result all connections have to be secured and audited and supervised properly. They have to be part of the production environment and will be fully documented. If a connection for one application fails, other applications are not impaired.
  • Standard resources can be offered under standard IPs to simplify bootstrapping (e.g. 192.168.10.10 is always the outgoing proxy or DNS server or some other vital service).
  • If development and production are in separate "data centers" (e.g. cloud accounts), then they can be much more similar.
  • A security breach in one account does not easily spill over into other accounts.
  • Operations and setup of the cloud accounts and physical data centers becomes much simpler (fewer external dependencies).
  • It will be easy to build up systemic resilience as a failure in one data center or account does not easily affect other data centers or accounts.
  • Admins will be treated as road warriors and connect to each data center independently as needed.
Currently I am looking for arguments in favor and against this idea, please share your thoughts!



2015-01-22

PPD - Pimp your Printer Driver

I recently got myself a new printer, the HP Officejet Pro X476dw. A very nice and powerful machine, it can not only print double sided but also scan, copy and send faxes.

And of course it has very good Linux support, thanks to the HP Linux Printing and Imaging Open Source project. On my Ubuntu 14.10 desktop everything is already included to use the printer.

However, the first printouts were very disappointing. They looked coarse and ugly, much worse than prints from my old HP LaserJet 6 printer. After overcoming the initial shock I realized that only prints from my Ubuntu desktop were bad while prints over Google Cloud Print were crisp and good looking.

So obviously something had to be wrong with the printer driver on Ubuntu!

After some debugging I was able to trace this down to the fact that by default CUPS converts the print job to 300 dpi PostScript before giving it to the hp driver, as shown in the CUPS logs:

D [Job 261] Printer make and model: HP HP Officejet Pro X476dw MFP
D [Job 261] Running command line for pstops: pstops 261 schlomo hebrew-test.pdf 1 'finishings=3 media=iso_a4_210x297mm output-bin=face-down print-color-mode=color print-quality=4 sides=one-sided job-uuid=urn:uuid:c1da9224-d10b-3c2f-6a99-487121b8864c job-originating-host-name=localhost time-at-creation=1414128121 time-at-processing=1414128121 Duplex=None PageSize=A4'
D [Job 261] No resolution information found in the PPD file.
D [Job 261] Using image rendering resolution 300 dpi
D [Job 261] Running command line for gs: gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ps2write -sOUTPUTFILE=%stdout -dLanguageLevel=3 -r300 -dCompressFonts=false -dNoT3CCITT -dNOINTERPOLATE -c 'save pop' -f /var/spool/cups/tmp/066575456f596

I was able to fix the problem by adding this resolution setting to the PostScript Printer Definitions (PPD):

*DefaultResolution: 600x600dpi

As a result the print job is converted at 600 dpi instead of 300 dpi which leads to the expected crisp result:

D [Job 262] Printer make and model: HP HP Officejet Pro X476dw MFP
D [Job 262] Running command line for pstops: pstops 262 schlomo hebrew-test.pdf 1 'Duplex=None finishings=3 media=iso_a4_210x297mm output-bin=face-down print-color-mode=color print-quality=4 sides=two-sided-long-edge job-uuid=urn:uuid:83e69459-c350-37e5-417d-9ca00f8c6bd9 job-originating-host-name=localhost time-at-creation=1414128153 time-at-processing=1414128153 PageSize=A4'
D [Job 262] Using image rendering resolution 600 dpi
D [Job 262] Running command line for gs: gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ps2write -sOUTPUTFILE=%stdout -dLanguageLevel=3 -r600 -dCompressFonts=false -dNoT3CCITT -dNOINTERPOLATE -c 'save pop' -f /var/spool/cups/tmp/0666d544aec68

Isn't it really nice that one only needs a text editor to fix printer driver problems on Linux (and Mac)?
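If you want to apply the same fix by hand, the PPD of an installed print queue lives under /etc/cups/ppd/ (the queue name below is just an example):

sudo nano /etc/cups/ppd/HP_Officejet_Pro_X476dw.ppd   # add the line: *DefaultResolution: 600x600dpi
sudo service cups restart                             # make CUPS reload the changed PPD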

On github.com/schlomo/HP_Officejet_Pro_X476dw I maintain an improved version of the PPD file with the following features:
  • Set printing resolution to 600dpi
  • Use printer for multiple copies, not CUPS
  • Default to duplex printing
The corresponding Launchpad bug is still open and unresolved. Apparently it is not simple to submit improvements upstream.

2015-01-16

Comparing Amazon Linux

Since ImmobilienScout24 decided to migrate to a public cloud I have been busy looking at various cloud offerings in detail. Amazon Web Services (AWS) has a special feature which is interesting: Amazon Linux is a fully supported, "RHEL like", RPM-based Linux distribution.

While not being a true Red Hat Enterprise Linux clone like CentOS or Scientific Linux (which is the standard OS for the ImmobilienScout24 data centers), it is derived from some Fedora version and comes with a nice choice of current software. To me it feels like "RHEL +" because so far all our internal stuff worked well but a lot of software packages are much newer than on RHEL 6 or RHEL 7. The 2014.09 release updated a lot of components to very recent versions.

On the other hand, we also found packages missing from Amazon Linux, most notably desktop-file-utils. This package is required to install Oracle Java RPMs. I found a thread about this on the AWS Forums and added a request for desktop-file-utils in September 2014. In the meantime the package was added to Amazon Linux, although the forum thread does not mention it (yet).

To find out in advance if there are any other surprises waiting for us on Amazon Linux, I created a little tool to collect RPM Provides lists from different Linux distros on AWS. github.com/ImmobilienScout24/aws-distro-rpm-comparison takes a VPC and one or several AMI IDs and spins up an EC2 instance for each to collect the list of all the RPM provides from all available YUM repositories.

$ ./aws-distro-rpm-comparison.py -h
Create EC2 instances with different Linux distros and compare
the available RPMs on them.

Usage:
  aws-distro-rpm-comparions.py [options] VPC_ID USER@AMI_ID...

Arguments:
  VPC_ID        VPC_ID to use
  USER@AMI_ID   AMI IDs to use with their respective SSH user

Options:
  -h --help            show this help message and exit
  --version            show version and exit
  --region=REGION      use this region [default: eu-west-1]
  --type=TYPE          EC2 instance type [default: t2.micro]
  --defaultuser=USER   Default user to use for USER@AMI_ID [default: ec2-user]
  --verbose            Verbose logging
  --debug              Debug logging
  --interactive        Dump SSH Key and IPs and wait for before removing EC2 instances

Notes:

* The AMI_IDs and the EC2 instance type must match (HVM or PV)
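A typical invocation then looks like this (the VPC and AMI IDs are placeholders):

$ ./aws-distro-rpm-comparison.py --region eu-west-1 vpc-12345678 \
    ec2-user@ami-11111111 cloud-user@ami-22222222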
The resulting lists can then be compared with the RPM Requires in our data centers. To get a better picture of Amazon Linux I created such lists for Red Hat Enterprise Linux 6 and 7, CentOS 6 and Amazon Linux in the GitHub project under results. For online viewing I created a Google Spreadsheet with these lists; you can copy it and modify it for your own needs.

Open the results in Google Drive Sheets.
At first glance it seems very difficult to say how compatible Amazon Linux really is, as there are a lot of RPM Provides missing on both sides. But these lists should prove useful in order to analyze our existing servers and to understand whether they would also work on Amazon Linux. The tools can also be used for any kind of RPM distro comparison.

In any case, Amazon Linux is exactly what RHEL cannot be: a stable RPM-based distribution with a lot of recent software and regular updates.