2016-02-07

Go Faster - DevOps & Microservices

At microXchg 2016 last week, Fred George - who takes pride in having been called a hand grenade - gave a very inspiring talk about how all the things that we do right now have one primary goal:


Go Faster

Reducing cycle time for deployments, automation everywhere, downsizing services to "microservices", building resilient and fault-tolerant platforms and more are all facets of a bigger journey: Provide value faster and find out faster what works and what does not.

DevOps

Most developers see DevOps as an Ops movement to catch up with developers before Ops jobs become obsolete. At the various DevOps Days I attended in Germany and the USA, the developers who were also there always complained about the lack of developers and the lack of developer topics. They observed that the conference seems to be by and for Ops people. Consequently, DevOps conferences usually have two tracks: Methods and Tools.

Methods talks teach us how to do "proper" software development in infrastructure engineering as well, and how to follow agile software development practices. Tools talks try to make us believe that you cannot be a good DevOps unless you use Puppet, Chef or Ansible. The success story talks all emphasize how "DevOps Tools", shared responsibility and a newly formed "DevOps Team" saved the day. In more recent years the tools focus on building private clouds with Docker and on managing distributed storage.

In fact, DevOps is all about being faster through shared responsibility, mutual respect between different knowledge bearers and building cross-functional teams with full vertical responsibility for their work.

Microservices

Microservices is definitely an important hype amongst developers. Seasoned Ops people see it as the obvious thing to do, just like the well-known Unix philosophy teaches:

The Unix philosophy emphasizes building simple, short, clear, modular, and extensible code that can be easily maintained and repurposed by developers other than its creators. The Unix philosophy favors composability as opposed to monolithic design. Source: Wikipedia

Applying all that to systems design is a straight road to microservices. When going from millions of lines of code to just a few thousand, and from 5 applications to 500, the glue code between all those applications suddenly becomes the governing system.

Service discovery, managing large amounts of micro instances, network latency, cyclic dependency trees etc. are all areas of expertise of Ops people. They have been dealing with these questions for the last 20 years!

Microservice success stories, like the one from SoundCloud that was also shown at microXchg 2016, show how more and more glue and abstraction layers were introduced into the emerging microservices architecture to compensate for the degradation that came along with the exploding complexity of their microservices landscape.

Much of that could also be learned from modern Linux operating systems. A nice example is systemd which drives its own "microservices" revolution, just on a smaller scale within a single Linux computer.

Looking at the tools track of microservices events, it is no surprise to find Docker in a dominating role here as well.

Common Values & Concepts

I don't want to argue that Docker is the common topic that everybody should care about. After all, Docker is just this year's hype implementation of an operating concept. Savvy sysadmins were doing the same thing with chroot or OpenVZ a long time ago. And in a few years we will probably have something even better for the same job.

What really brings these topics together are a lot of shared values and concepts (in no particular order):
  • KISS approach
  • Right-sizing everything to an easily managed size: microservices, two pizza teams, iterative solutions to problems
  • Full stack responsibility
  • Automate everything, especially the glue between all those small components
  • Observe-Orient-Decide-Act loops in different forms and fashions
As long as we keep our core values in mind, the actual technology or methodology doesn't matter so much. We will still achieve our goals. Just much faster.

2016-02-05

Cloud Migration ≈ Microservices Migration

Day two at the microXchg 2016 conference. After listening to yet another talk detailing the pitfalls and dangers of "doing it wrong" I see more and more similarities between the Cloud migration at ImmobilienScout24 and the microservices journey that most speakers present.
The Cloud migration moves us from a large data center into many smaller AWS accounts. A (legacy) monolithic application is cut into many smaller microservices.

Internal data center communication becomes exposed communication between different AWS accounts and VPCs. Internal function calls are replaced with remote API calls. Both require much more attention to security, necessitate an authentication framework and add significant latency to the platform.

A failed data center takes down the entire platform while a failed AWS account will only take down some function. An uncaught exception will crash the entire monolith while a crashed microservice will leave the others running undisturbed.

Internal service dependencies turn into external WAN dependencies. Library dependencies inside the monolith turn into service dependencies between microservices. Cyclic dependencies remain a deadly foe.
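Whether the dependencies are libraries inside a monolith or calls between microservices, a cycle can be found the same way. The following is a minimal sketch, assuming the dependencies are available as a simple mapping from a service name to the services it calls; the service names are made up for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if there is none."""
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {name: UNVISITED for name in dependencies}
    path: List[str] = []

    def visit(service: str) -> Optional[List[str]]:
        state[service] = ON_PATH
        path.append(service)
        for dep in dependencies.get(service, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == ON_PATH:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[service] = DONE
        return None

    for service in list(dependencies):
        if state[service] == UNVISITED:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical services, not taken from any real platform
    services = {
        "search": ["listing"],
        "listing": ["billing"],
        "billing": ["search"],
        "mail": [],
    }
    print(find_cycle(services))
```

Running such a check in the build pipeline is one way to notice a new cycle before it reaches production.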

Team responsibilities shift from feeling responsible for a small part of the platform to being responsible for entire AWS accounts or only their own microservices.

And much more...

Learnings

If it looks similar, maybe we can learn something from this. I strongly believe that many structural and conceptual considerations apply equally to a Cloud migration and to a microservices journey:
  • Fighting complexity through downsizing.
  • Complexity shifts from inside to outside. New ways to manage this complexity emerge.
  • Keeping latency in check is a key factor to success.
  • Much more advanced tooling is needed to properly handle the scale out of managed entities.
  • Less centralization of common concerns leads to more wasted effort and resources. Accept this.
  • Success and failure hang on finding the right seams to cut.
  • "Just put it somewhere" usually doesn't work at all.
  • Integration tests become more important and difficult.
I learned a lot at this conference, both about microservices and about the direction our Cloud migration should go.

Please add your learnings in the comments.

2016-02-04

AWS Account Right-Sizing

Today I attended the microXchg 2016 conference in Berlin. I suddenly realized that going to the cloud allows us to ask completely new questions that are impossible to ask in the data center.

One such question is this: What is the optimum size for a data center? Microservices are all about downsizing - and in the cloud we can and should downsize the data center!

In the world of physical data centers the question is usually governed by two factors:

  • Ensuring service availability by having at least two physical data centers.
  • Packing as much hardware into as little space as possible to keep the costs in check.
As long as we are smaller than the average Internet giant there is no point in asking about the optimum size. The tooling which we build has to be designed for both large data centers and for having more than one. But in the "1, 2, many" series "2" is just the worst place to be. It entails all the disadvantages of "more than 1" without any of the benefits of "many".

In the cloud the data center is purely virtual. On AWS the closest thing to a "data center" is a Virtual Private Cloud (VPC) in an AWS Region in an AWS Account. But unlike a physical data center that VPC is already highly available and offers solid redundancy through the concept of Availability Zones.

If an AWS Account has multiple VPCs (either in the same region or in different regions), then we should see it as actually being several separate data centers. All the restrictions of multiple data centers also apply to having multiple VPCs: Higher (than local) latency, traffic costs, traversing the public Internet etc.

To understand more about the optimum size of a cloud data center we can compare three imaginary variants. I combine EC2 instances, Lambda functions, Beanstalk etc. all into "code running" resources. IMHO it does not matter how the code runs in order to estimate the management challenges involved.



|  | Small VPC | Medium VPC | Large VPC |
| Number of code running resources | 50 | 200 | 1000 |
| Number of CloudFormation stacks (10 VMs per stack) | 5 | 20 | 100 |
| Service Discovery | manually | simple tooling e.g. git repo with everything in it | elaborate tooling, Etcd, Consul, Puppet ... |
| Which application is driving the costs? | Eyeball inspection - just look at it | Tagging, Netflix ICE ... | Complex tagging, maintain an application registry, pay for Cloudhealth ... |
| Deployment | CloudFormation manually operated viable option | Simple tooling like cfn-sphere, autostacker24 ... | Multi-tiered tooling like Spinnaker or other large solutions |
| Security model | Everyone related is admin | Everyone related is admin, must have strong traceability of changes | Probably need to have several levels of access, separation of duty and so on |
| ... whatever ... | dumb and easy | simple | complex and complicated |
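
The stack counts in the comparison above are plain arithmetic on the stated assumption of 10 VMs per stack. As a small sketch, the same arithmetic also shows what happens when a large estate is split into many small accounts; the function names and the split size of 50 resources per account are my own illustration, not a recommendation.

```python
import math

VMS_PER_STACK = 10  # assumption from the comparison: 10 VMs per CloudFormation stack


def stacks_needed(resources: int, vms_per_stack: int = VMS_PER_STACK) -> int:
    """Number of CloudFormation stacks needed for a given number of resources."""
    return math.ceil(resources / vms_per_stack)


def split_into_accounts(total_resources: int, resources_per_account: int):
    """Return (number of accounts, stacks per full account) for a given split."""
    accounts = math.ceil(total_resources / resources_per_account)
    per_account = min(total_resources, resources_per_account)
    return accounts, stacks_needed(per_account)


if __name__ == "__main__":
    for size in (50, 200, 1000):
        print(size, "resources ->", stacks_needed(size), "stacks")
    # 1000 resources cut into small accounts of 50 resources each
    print(split_into_accounts(1000, 50))
```

With these numbers, one large VPC with 100 stacks becomes 20 small accounts with 5 stacks each, so every team stays in the "small" column.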

Having a large VPC with a lot of resources obviously requires much more elaborate tooling while a small VPC can be easily managed with simple tooling. In our case we have a 1:1 relationship between a VPC and an AWS account. Accounts that work in two regions (Frankfurt and Ireland) have 2 VPCs but that's it.

I strongly believe that scaling small AWS accounts together with the engineering teams who use them will still allow us to keep going with simple tooling. Even if the absolute total of code running resources is large, splitting it into many small units reduces the local complexity and allows the responsible team to manage their area with fairly simple tooling. Here we use the power of "many" and invest in handling many AWS accounts and VPCs efficiently.

On the overarching level we can then focus on aggregated information (e.g. costs per AWS account) without bothering about the internals of each small VPC.
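
As a sketch of what "aggregated information" could look like, the following sums cost line items per account and deliberately ignores which resource caused them. The record layout (account, resource, cost) and the account names are assumptions for illustration and do not match any real billing export format.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical line item: (account name, resource type, cost)
LineItem = Tuple[str, str, Decimal]


def costs_per_account(line_items: Iterable[LineItem]) -> Dict[str, Decimal]:
    """Sum all costs per account without looking inside the account."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for account, _resource, cost in line_items:
        totals[account] += cost
    return dict(totals)


if __name__ == "__main__":
    items = [
        ("team-search", "ec2", Decimal("120.50")),
        ("team-search", "s3", Decimal("9.50")),
        ("team-mail", "lambda", Decimal("3.25")),
    ]
    for account, total in sorted(costs_per_account(items).items()):
        print(account, total)
```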

I therefore strongly advise keeping your data centers small. This will also nicely support an affordable Cloud Exit Strategy.

2015-12-30

Docker Appliance as Linux Service RPM

Docker provides a convenient way to package entire applications into runnable containers. OTOH in the data center we use RPM packages to deliver software and configuration to our servers.

This wrapper builds a bridge between Docker appliances and Linux services by packaging a Docker image as a Linux service into an RPM package.

The resulting Linux service can be used like any other Linux service, for example you start it with service schlomo start.
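
To illustrate the general idea of mapping service actions to Docker commands, here is a minimal sketch. It is not the code from the GitHub repo, and the image name is a hypothetical placeholder; it only shows how start, stop and status could translate into standard docker CLI calls.

```python
from typing import List


def docker_commands(action: str, name: str, image: str) -> List[List[str]]:
    """Translate a service action into the docker CLI calls that implement it."""
    if action == "start":
        # Run the image in the background under the service name
        return [["docker", "run", "--detach", "--name", name, image]]
    if action == "stop":
        # Stop the container and remove it so that the next start can reuse the name
        return [["docker", "stop", name], ["docker", "rm", name]]
    if action == "status":
        return [["docker", "inspect", "--format", "{{.State.Running}}", name]]
    raise ValueError("unsupported action: " + action)


if __name__ == "__main__":
    # "example/schlomo-image" is a made-up image name for illustration
    for command in docker_commands("start", "schlomo", "example/schlomo-image"):
        print(" ".join(command))
```

A real service wrapper would additionally execute these commands and report their exit codes back to the init system.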


See the GitHub repo at https://github.com/ImmobilienScout24/docker-service-rpm for code and more details and please let me know if you find this useful.

2015-08-21

Signet Ring = Early 2 Factor Authentication

Photo: A. Reinkober / pixelio.de
I recently met somebody who had a signet ring and suddenly realized that this is a very early form of 2-factor-authentication (2FA):

| Signet Ring | 2FA |
| Unique | Unique |
| Difficult to copy | Supposedly impossible to copy |
| Seal proves personal involvement of bearer | 2FA token proves personal interaction of owner |

The main difference is of course that 2FA is commonly available to everybody who needs it, while signet rings were and remain a special feature. But it is still nice to know that the basic idea is several thousand years old.

2015-08-07

Cloud Exit Strategy

As ImmobilienScout24 moves to the cloud a recurring topic is the question about the exit strategy. An exit strategy is a plan for migrating away from the cloud, or at least from the chosen cloud vendor.

Opinions range from "why would I need one?" to "how can we not have one?" with a heavy impact on our cloud strategy and how we do things in the cloud.

When talking about exit scenarios it is worth distinguishing between a forced and a voluntary exit. A forced exit happens due to external factors that don't leave you any choice about when to go. A voluntary exit happens at your own choice, both when and how.

Why would one be forced to have an exit strategy? Simply because running a business on cloud services carries other types of risks compared to running a business in your own data center:
  • Cloud accounts can be disabled for alleged violation of terms
  • Cloud accounts can be terminated
  • There are no guaranteed prices. Running costs can explode as a result of a new pricing model
  • The cloud vendor can discontinue a service that you are based on
  • Lost cloud credentials combined with weak security can be disastrous (learn from Code Spaces)
  • If the cloud vendor is down you can either hope and wait or start your website somewhere else, if you were prepared. In the data center you can try all sorts of workarounds and fixes - but you must do that all yourself.
  • ... fill in your own fear and bias against the cloud ...
A voluntary exit can easily happen after some time because:
  • Another cloud vendor is cheaper, better or solves problems that your current vendor doesn’t care about
  • You are bought by another company and they run everything in another cloud, forcing you to migrate
  • ... who knows what the future will bring?
Probably there is no perfect answer that fits everybody. Besides just ignoring the question I personally see two major options:
  1. Use only IaaS (e.g. servers, storage, network) or PaaS (fancy services) from the cloud so that it is easy to migrate to another cloud vendor or to a private cloud. The big disadvantage is that you won't be able to benefit from all the cool managed services that make the cloud an interesting place to be.
  2. Use many cloud providers or accounts (e.g. matching your larger organisational units) to reduce the "blast radius" and keep the communication between them vendor independent. If something happens to one of them the damage is limited in scope and everything else keeps working. The disadvantage is that you add complexity and other troubles by dealing with a widely distributed platform.
I prefer the second option because it lays the ground for a voluntary exit while still keeping most of the advantages of the cloud as an environment and ecosystem. In case of a forced exit there is a big problem, but that could be solved with lots of resources. A forced exit for a single account can be handled without harming the other accounts and their products. As another benefit there is not much premature optimization for the exit case.

Whatever you do - I believe that having some plan is better than not having any plan.

2015-07-15

DevOps Berlin Meetup 2015-07

Is Amazon good for DevOps? Maybe yes, maybe no. But for sure the new Berlin office is good for a Berlin DevOps Meetup.

Jonathan Weiss gave a short overview of the engineering departments found here: AWS OpsWorks, AWS Solution Architects, Amazon EC2, Machine Learning.

Michael Ducy (Global Partner Evangelist at Chef Software) talked about DevOps and told the usual story. Michael uses goats and silos as a metaphor and builds his talk from the famous goat and silo problem. He sees the "IT manufacturing process" as silos (read History of Silos for more about that) and DevOps minded people as goats: Multi-purpose, versatile, smart and stubborn at reaching their goals.
The attendees of the DevOps event probably did not need much convincing, but the talk was nevertheless very entertaining. Michael has an MBA and also gave some useful insights into how organisations evolve into silos and how organisational "kingdoms" develop.

The talk is available as video: 15min from Jan 2015 and 24min from Dec 2013. The slides are available on Slideshare.

As a funny side note it turns out that Amazon even rents out goats: Amazon Hire a Goat Grazer. However it seems that this offer is about real goats and not DevOps engineers.

2015-07-10

ImmobilienScout24 Social Day at the GRIPS Theater

Today I went to the GRIPS Theater (English) instead of the office. Once a year ImmobilienScout24 donates its workforce to social projects, called Social Day. I used the opportunity to catch a glimpse behind the stage. The theater in turn got a workshop from us about their web site and social media channels.

But first we watched a very nice children's show (Ein Fest bei Baba Dengiz) about a German guy who learned respect for foreigners - from another German with a Turkish background. The show was well adapted to the school-age audience.

The theater follows a somewhat unusual concept and places the stage in the middle of the audience:
Photo courtesy of the GRIPS Theater
This was my first visit to the GRIPS Theater, but not the last. Besides a rich children's programme the theater also offers shows for adults and is best known for the show Linie 1.

2015-05-22

Meetup Marathon

This week was my Meetup Marathon:

Software Memories and Simulated Machines was over my head. Scaling Logstash made me wonder how many engineers you actually need to run that "properly". Nix is something we hopefully don't need; Rok actually said that if you package everything you don't need it.

STUPS is the "Cloud Ops" stack from Zalando, nicely published on GitHub:

The STUPS platform is a set of tools and components to provide a convenient and audit-compliant Platform-as-a-Service (PaaS) for multiple autonomous teams on top of Amazon Web Services (AWS).

It contains a lot of tools that work together to solve many of the challenges related to running a large company on AWS. For me that was most definitely the highlight of this week.

Hennig explaining STUPS at the AWS User Group.

2015-05-15

OpenTechSummit 2015

Yesterday was the first OpenTechSummit in Berlin, a new conference that partially took the place of the LinuxTag. The conference squeezed a large number of talks into a single day. The talks were either 10 or 20 minutes long and covered many non-technical topics related to open knowledge or open technology.

One thing impressed me especially: All day long there were workshops for children and youth. While some kids took their first steps in coding, others came to work together on advanced programming or hardware projects.
The date (a German state holiday) made sure that children had time to attend, and many IT people came together with their children. The organizers were actually surprised by the large number of children who registered for a free kids ticket.

I gave my "DevOps, Agile and Open Source at ImmobilienScout24" talk and put up some ImmobilienScout24 posters for our sponsoring.