2017-06-23

Eliminating the Password of Shared Accounts

Following up on "Lifting the Curse of Static Credentials", everybody should look closely at how they handle shared accounts, robot users or technical logins. Do you really rotate passwords, tokens and keys each time somebody who had access to the account leaves your team or the company? Do you know who has access? How do you know that they didn't pass on those credentials or put them in an unsafe place?

For all intents and purposes, a shared account is like anonymous access for your employees. If something bad happens, the perpetrator can point to the group and deny everything. As an employer you will find it nearly impossible to prove who actually used the password that was known to so many. Or even to prove that it was one of your own employees and not an outside attacker who "somehow" stole the credentials.

Thanks to identity federation and federated login protocols like SAML2 and OpenID Connect it is now much easier to completely eliminate passwords for shared accounts. Without passwords the shared accounts are much less risky. You can actually be sure that only active and authorized users can use the shared accounts, both now and in the future.
The concept is fairly simple: the federated identity provider decouples the login identity used for authentication from the account identity that results from authorization.

In the following I use the specific case of shared accounts for GitHub Enterprise (GHE) as an application and G Suite (Google) as the federated identity provider to illustrate the idea. The concepts are however universal and can easily be adapted for other applications and identity providers.

A typical challenge in GitHub (both the public service and on-premise) is the fact that GitHub as an application only deals with users. There is no concept of services or service accounts. If you want to build sustainable automation then you must, according to the GitHub documentation, create a "machine user", which is a regular user re-purposed as a shared account. GitHub even points out that this is the recommended solution, even though their Terms of Service otherwise require GitHub users to be real people.

Normal Logins for Real and Shared Users

Before we deal with shared accounts we first look at the normal federated login process in figure 1. GHE uses SAML2 to delegate user authentication to Google.


Fig 1: Normal Federated User Login
User John Doe wants to use the GHE web interface to work with GHE. He points ➊ his browser to GHE, which does not have a valid session for him. GHE redirects ➋ John to sign in at his company's Google account. If he is not signed in already, he authenticates with his own identity ➌ john@doe.com. Google redirects him back to GHE and signals ➍ to GHE that this user is John Doe. With the authorization from Google, John is now logged in ➎ as @jdoe in the GHE application.

As users sign in to GHE, their respective user account is created if it does not exist. This "Just In Time Provisioning" is a feature of SAML2 that greatly simplifies the integration of 3rd party applications.

The traditional way to introduce shared accounts in this scenario is to create regular users within the identity provider (here Google) and to hand out the login credentials (username & password) to the groups of people who need access to the shared account. They can then log in with those credentials instead of their own and thereby become the machine user in the application. For all the involved systems there is no technical difference between a real user and a shared account user; the difference comes only from how we treat them.

The downsides of this approach include significant inconvenience for the users, who have to sign out globally from the identity provider before they can switch users, or use an independent browser window just for that purpose. The biggest threats come from the way the users manage the password and 2-factor tokens of the shared user and from the organization's (in-)ability to rotate these after every personnel change.

Logging In Shared Users Without a Password

Many applications (GHE really only serves as an example here) do not have a good concept for service accounts and rely on regular user accounts to be used for automation. As we cannot change all those applications we must accommodate their needs and give them such service users.

The good news is that in the world of federated logins only the user authentication (the actual login) is federated. Each application maintains its own user database. This database is filled and updated through the federated logins, but it is still an independent user database. That means that while all users in the identity provider will have a corresponding user in the application (if they used it), there can be additional users in the application's user database without a matching user in the identity provider. Of course the only way to access (and create) such users is through the federated login. The identity provider must authorize somebody to be such a "local" user when signing in to the application.

To introduce shared accounts in the application without adding them as real users in the identity provider we have to introduce two changes to the standard setup:
  1. The identity provider must be able to signal different usernames to the application during the login process.
  2. The real user must be able to choose which user they will be in the application after the next login.
Figure 2 shows this extended login process. For our example use case the user John Doe is a member of the team Alpha. John wants to access the team's account in GHE to connect the team's git repositories with the company's continuous delivery (CD) automation.
Fig 2: Login to GHE as Team User
For regular logins as himself John would "just use" GHE as described above. To log in as the team account, John first goes to the GHE User Chooser, a custom-built web application where John can select ➊ the GHE user he wants to be signed in as at GHE. Access to the chooser is of course also protected with the same federated login; the figure omits this detail for clarity.

John selects the team user for his team Alpha. The chooser stores ➋ John's selection (team-alpha) in a custom attribute in John's user data within the identity provider.
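In G Suite such an attribute can live in a custom user schema that the chooser updates through the Admin SDK Directory API. The following call is only an illustration of that idea; the schema name SSO and the field ghe_username are hypothetical, not part of the setup described here:

$ curl -X PATCH \
    -H "Authorization: Bearer ${ADMIN_ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"customSchemas": {"SSO": {"ghe_username": "team-alpha"}}}' \
    "https://www.googleapis.com/admin/directory/v1/users/john@doe.com"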

Next John goes to GHE as before. If he still has an active session at GHE he needs to sign out from GHE first, but this does not sign him out at the identity provider or any other company application.

Then John accesses GHE again ➌, which redirects ➍ him to the identity provider, in this example Google. There John signs in ➎ with his own identity john@doe.com. Most likely John still has an active session with Google, so he won't actually see this part: Google will confirm his identity without user interaction.

The identity provider reads ➏ the username to send to the application from the custom attribute. When the identity provider redirects ➐ John back to the application, it also sets the GHE user from this custom attribute. In this case the custom attribute contains team-alpha and not jdoe as it would for a personal login. This redirect is the place where the identity switch actually happens. As a result, John retains his personal identity in Google and is signed in to GHE as his team account ➑ @team-alpha.

The main benefit of this approach is the radical removal of shared account passwords and the solid audit trail for the shared accounts. It applies the idea of service roles to the domain of standard applications that do not support such roles on their own. So far only a few applications have a concept of service roles, most notably Amazon AWS with its IAM Roles. This approach brings the same convenience and security to all other applications as well.

Outlook

Unfortunately this concept only protects the access to the shared account itself, not the access to tokens and keys that belong to such an account. Improving the general security is a step-by-step process, and the user chooser takes us a major step towards truly securing applications like GHE.

The next step would be addressing the static credentials that are generated within the application. In the case of GHE these are the personal access tokens and SSH keys that are the only way for external tools to use the shared account. These tokens and keys are perpetual shared secrets that can easily be copied without anybody noticing.

To get rid of all of this we will have to create an identity mapping proxy that sits in front of the application to translate the authentication of API calls. To the outside (the side that is used by the users and other services) the proxy uses the company authentication scheme, e.g. OAuth2. To the inside (towards the application) it uses the static credentials that the application supports. In order to fully automate this mapping, the proxy also has to maintain those static credentials on behalf of the users so that the users do not need to deal with them at all.

In this scenario there is also no need for a user account chooser as described above: users will have no need to act on behalf of the service accounts; at most they will grant those service accounts permission to access shared resources.

Figure 3 shows how such a proxy for GitHub Enterprise and the company's OAuth2 identity provider, e.g. Google, could be built. It is surely a much larger engineering effort than the user account chooser, but it solves the entire problem of static credentials, not only the problem of shared account passwords.
Fig 3: Identity Mapping Proxy to remove static credentials from API authentication

It really is possible to get rid of static credentials, even for applications where the vendor does not support such ideas. While these concepts can be adapted for any kind of application, the account chooser and identity mapping proxy will be somewhat custom tailored. Depending on the threat model and risk assessment in your own organisation, the development effort might be very cheap compared to the alternative of continuing to live with the risks.

I hope that both application vendors and identity providers will eventually understand that static credentials are the source of a lot of trouble and that it is their job to provide us users with good integrations based on centrally managed identities, especially for the integration of different services.

2017-06-16

Using Kubernetes with Multiple Containers for Initialization and Maintenance

Kubernetes is a great way to run applications because it allows us to manage single Linux processes with a real cluster manager. A computer with multiple services is typically implemented as a pod with multiple containers sharing communication and storage.
Ideally every container runs only a single process. On Linux, most applications go through three phases, each typically driven by its own program or script:
  1. The initialization phase, typically an init script or a systemd unit file.
  2. The run phase, typically a binary or a script that runs a daemon.
  3. The maintenance phase, typically a script run as a CRON job.
While it is possible to put the initialization phase into a Docker container as part of the ENTRYPOINT script, that approach gives much less control over the entire process and makes it impossible to use different security contexts for each phase, e.g. to prevent the main application from directly accessing the backup storage.

Initialization Containers

Kubernetes offers initContainers to solve this problem: Regular containers that run before the main containers within the same pod. They can mount the data volumes of the main application container and "lay the ground" for the application. Furthermore they share the network configuration with the application container. Such an initContainer can also contain completely different software or use credentials not available to the main application.

Typical use cases for initContainers are
  • Performing an automated restore from backup in case the data volume is empty (initial setup after a major outage) or contains corrupt data (automatically recover the last working version).
  • Doing database schema updates and other data maintenance tasks that depend on the main application not running.
  • Ensuring that the application data is in a consistent and usable state, repairing it if necessary.
The same logic also applies to maintenance tasks that need to happen repeatedly during the run time of an application. Traditionally CRON jobs are used to schedule such tasks. Kubernetes does not (yet) offer a mechanism to start a container periodically on an existing pod. Kubernetes Cron Jobs are independent pods that cannot share data volumes with running pods.

One widespread solution is running a CRON daemon together with the application in a shared container. This not only breaks the Kubernetes concept but also adds a lot of complexity, as now you also have to take care of managing multiple processes within one container.

Maintenance Containers

A more elegant solution is using a sidecar container that runs alongside the application container within the same pod. Like the initContainer, such a container shares the network environment and can also access data volumes from the pod. A typical application with init phase, run phase and maintenance phase looks like this on Kubernetes.
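A minimal pod manifest sketching this layout could look like the following; all names, images and scripts are hypothetical placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
    - name: init
      # hypothetical image: check the data volume, restore from backup if needed
      image: example/myapp-init
      command: ["/init.sh"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      # the main application daemon
      image: example/myapp
      volumeMounts:
        - name: data
          mountPath: /data
    - name: maintenance
      # hypothetical sidecar: waits for the RUNAT time, then backs up /data to S3
      image: example/myapp-maintenance
      command: ["/backup.sh"]
      env:
        - name: RUNAT
          value: "03:30"
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}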
This example also shows an S3 bucket that is used for backups. The initContainer has exclusive access before the main application starts. It checks the data volume and restores data from backup if needed. Then both the main application container and the maintenance container are started and run in parallel. The maintenance container waits for the appropriate time and performs its maintenance task. Then it again waits for the next maintenance time and so on.

Simple CRON In Bash

The maintenance container can now contain a CRON daemon (for example Alpine Linux ships with dcron) that runs one or several jobs. If you have just a single job that needs to run once a day you can also get by with this simple Bash script. It takes the maintenance time in the RUNAT environment variable.
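A minimal sketch of such a script, assuming GNU date for the -d option and with /maintenance.sh standing in for the actual maintenance job:

#!/bin/bash
# Sketch only: run a single maintenance job once a day at the time
# given in RUNAT, e.g. RUNAT="03:30".
while true ; do
    now=$(date +%s)
    next=$(date -d "$RUNAT" +%s)
    # if today's run time has already passed, wait until tomorrow
    if [ "$next" -le "$now" ] ; then
        next=$(date -d "tomorrow $RUNAT" +%s)
    fi
    sleep $(( next - now ))
    echo "$(date): starting maintenance job"
    /maintenance.sh
done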


All of that also holds true for other Docker cluster environments like Docker Swarm mode. However you package your software, Kubernetes offers a great way to simplify our Linux servers by dealing directly with the relevant Linux processes from a cluster perspective. Everything else from a traditional Linux system becomes obsolete for applications running on Kubernetes or other Docker environments.

2017-06-12

Working with IAM Roles in Amazon AWS

Last week I wrote about understanding IAM Roles; let's follow up with some practical aspects. The following examples and scripts all use the aws-cli, which you should already have installed. The scripts work on Mac and Linux, and probably on Windows under Cygwin.

To illustrate the examples I use the case of an S3 backup bucket in another AWS account. For that scenario it is recommended to use a dedicated access role in the target AWS account to avoid trouble with S3 object ownership.

AWS Who Am I?

Sometimes the most important question is to ascertain one's current identity. Luckily the aws-cli provides a command for that:
$ aws sts get-caller-identity
{
    "Account": "123456789",
    "UserId": "ABCDEFG22L2KWYE5WQ:sschapiro",
    "Arn": "arn:aws:sts::123456789:assumed-role/PowerUser/sschapiro"
}
From this we can learn our AWS account and the IAM Role that we currently use, if any.

AWS Assume Role Script

The following Bash script is my personal tool for jumping IAM Roles on the command line:
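A minimal sketch of its core logic could look like this (not the original script):

#!/bin/bash
# Sketch: walk through the given IAM Roles and print the temporary
# credentials of the last one. The INFO messages go to stderr so that
# eval only picks up the credentials on stdout.
set -e
for role in "$@" ; do
    # expand a bare role name into a role ARN in the current account
    if [[ "$role" != arn:aws:iam::* ]] ; then
        account=$(aws sts get-caller-identity --query Account --output text)
        role="arn:aws:iam::$account:role/$role"
    fi
    read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN < <(
        aws sts assume-role \
            --role-arn "$role" \
            --role-session-name "${USER:-aws-assume-role}" \
            --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
            --output text
    )
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
    echo "INFO: Switched to role $role" >&2
done
echo "AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY"
echo "AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN"
echo "AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID"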
It takes any number of arguments, each a role name in the current account or a full role ARN. It tries to go from role to role and returns the temporary AWS credentials of the last role as environment variables:
$ aws-assume-role ec2-worker arn:aws:iam::987654321:role/backup-role
INFO: Switched to role arn:aws:iam::123456789:role/ec2-worker
INFO: Switched to role arn:aws:iam::987654321:role/backup-role
AWS_SECRET_ACCESS_KEY=DyVFtB63Om+uihwuieufzud/w5vm7Lhp3lx
AWS_SESSION_TOKEN=FQoDYXdzEHgaDAgVN…✂…tyHZrYSibmLbJBQ==
AWS_ACCESS_KEY_ID=ABCDEFGFWEIRFJSD6PQ
The first role ec2-worker is in the same account as the credentials with which we start, therefore we can specify it just by its name. The second role is in another account and must be fully specified. If we were to switch to a third role in the same account, we could again use the short form.

Single aws-cli Command

To run a single aws-cli or other command as a different role we can simply prefix it like this:
$ eval $(aws-assume-role \
    ec2-worker \
    arn:aws:iam::987654321:role/backup-role \
  ) aws sts get-caller-identity
INFO: Switched to role arn:aws:iam::123456789:role/ec2-worker
INFO: Switched to role arn:aws:iam::987654321:role/backup-role
{
    "Arn": "arn:aws:sts::987654321:assumed-role/backup-role/sschapiro",
    "UserId": "ABCDEFGEDJW4AZKZE:sschapiro",
    "Account": "987654321"
}
Similarly you can start an interactive Bash by giving bash -i as the command. aws-cli also supports switching IAM Roles via configuration profiles. This is a recommended way to permanently switch to another IAM Role, e.g. on EC2.

Docker Container with IAM Role

The same script also helps us to run a Docker container with AWS credentials for the target role injected:
$ docker run --rm -it \
  --env-file <(
    aws-assume-role \
      ec2-worker \
      arn:aws:iam::987654321:role/backup-role \
    ) \
  mesosphere/aws-cli sts get-caller-identity
INFO: Switched to role arn:aws:iam::123456789:role/ec2-worker
INFO: Switched to role arn:aws:iam::987654321:role/backup-role
{
    "Arn": "arn:aws:sts::987654321:assumed-role/backup-role/sschapiro",
    "UserId": "ABCDEFGRWEDJW4AZKZE:sschapiro",
    "Account": "987654321"
}
This example just calls aws-cli within Docker. The main trick is to feed the output of aws-assume-role into Docker via the --env-file parameter.

I hope that these tools also help you to work with IAM Roles. Please add your own tips and tricks in the comments.

2017-06-08

Understanding IAM Roles in Amazon AWS

One of the most important security features of Amazon AWS is IAM Roles. They provide a security umbrella that can be adjusted to an application's needs in great detail. As I keep forgetting the details, I summarize here everything that helps me, along with some useful tricks for working with IAM Roles. This is part one of two.

Understanding IAM Roles

From a conceptual perspective an IAM Role is a sentence like Alice may eat apples: It grants or denies permissions (in the form of an access policy) on specific resources to principals. Alice is the principal, may is the grant, eat is the permission (to eat, but not to look at) and apples is the resource, in this case any kind of apples.
IAM Roles can be much more complex. For example, this rather complex sentence is still a very easy-to-read IAM Role: Alice and Bob from Hamburg may find, look at, smell, eat and dispose of apples № 5 and bananas. Here we grant permissions to our Alice and to some Bob from another AWS account, we permit a whole bunch of useful actions and we allow them on one specific type of apples and on all bananas.

On AWS, Alice and Bob will be Principals, either services that run your code, like EC2, or IAM Users and Roles; find, look at, smell … are specific API calls on AWS services like s3:GetObject and s3:PutObject; and apples and bananas are AWS resource identifiers like arn:aws:s3:::my-backup-bucket.

IAM Roles as JSON

IAM Roles and Policies are typically shown as JSON data structures. There are two main components:
  1. IAM Role with RoleName and AssumeRolePolicyDocument
  2. IAM Policy Document with one or several policy Statements
The AssumeRolePolicyDocument defines who can use this role and the Statements in the Policy Document define what this role can do.

A typical IAM Role definition looks like this:
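The following is a minimal sketch, with illustrative account IDs and role names rather than a real definition:

{
    "RoleName": "backup-role",
    "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "arn:aws:iam::123456789:role/ec2-worker",
                        "arn:aws:iam::123456789:role/PowerUser"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
}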
The really important part here is the AssumeRolePolicyDocument, which defines who can actually use the role. In this case there are two other IAM Roles that can make use of this role. AWS allows specifying all kinds of Principals from the same or other AWS accounts. So far this Role does not yet allow anything, but it already provides an AWS identity. To fill the Role with life you have to attach one or more Policy Documents to it. They can either be inline and stored within the Role, or they can be separate IAM Policies that are attached to the Role. AWS also provides a large number of predefined policies for common jobs.

A PolicyDocument definition looks like this:
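Again a minimal sketch, with an illustrative bucket name:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-backup-bucket",
                "arn:aws:s3:::my-backup-bucket/*"
            ]
        }
    ]
}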
Here we have one Statement (there could be several) that gives read and write access to a single S3 bucket. Note that it does not allow deleting objects from the bucket as this example is for a backup bucket that automatically expunges old files.

Creating IAM Roles with CloudFormation

We typically create AWS resources through CloudFormation. This example creates an S3 bucket for backups together with a matching IAM Role that grants access to the bucket:
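A condensed sketch of such a template might look like this; the logical names, the 30-day expiry and the PowerUser principal are illustrative assumptions:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  BackupBucket:
    Type: AWS::S3::Bucket
    Properties:
      LifecycleConfiguration:
        Rules:
          - Status: Enabled
            ExpirationInDays: 30   # automatically expunge old backups
  BackupRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
          - Effect: Allow
            Principal:
              AWS: !Sub arn:aws:iam::${AWS::AccountId}:role/PowerUser
            Action: sts:AssumeRole
      Policies:
        - PolicyName: backup-bucket-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:ListBucket
                  - s3:GetObject
                  - s3:PutObject
                Resource:
                  - !GetAtt BackupBucket.Arn
                  - !Sub '${BackupBucket.Arn}/*'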
The role can be used either by EC2 instances or by the PowerUser role which our people typically have. This allows me to test the role from my desktop during development and for troubleshooting.

Read also Working with IAM Roles in Amazon AWS for the second part of this article with practical aspects and some tips and tricks.

2017-06-02

Root for All - A DevOps Measure?

Who has root access in your IT organization? Do you "do" DevOps? Even though getting root access was once my personal motivation for pushing DevOps, I never considered the relationship between the two until my last conference visit triggered the question.

Last week I attended the 10th Secure Linux Administration Conference - a small but cherished German event catering to Linux admins - and there were two DevOps talks: DevOps in der Praxis (Practical DevOps) by Matthias Klein and my own DevOps for Everybody talk. I found it very interesting that we both talked about DevOps from a "been there, done it" perspective, although with a very different message.

DevOps ≠ DevOps

For me DevOps is most of all a story of Dev and Ops being equal, sitting in the same boat and working together on shared automation to tackle all problems. My favourite image replaces humans as the gateway to the servers with tooling that all humans use to collaboratively deliver changes to the servers. In my world the question of root access is not one of Dev or Ops but one of areas of responsibility, without discrimination.

For Matthias DevOps is much more about getting Dev and Ops to work together on the system landscape and on bigger changes, about learning to use really useful tools from the other (e.g. continuous delivery) and about developing a common understanding of the platform that both feel responsible for.

We both agree in general on the cultural aspects of DevOps: it is not a tool you can buy but rather a way of putting the emphasis on people, on how they work together with respect and trust, on setting the project/product/company before personal interests, and on ascribing success or failure to a team and not to individuals.

Demystifying Root Access

So why is root access such a big deal? It only lets you do anything on a server. Anything means not only doing your regular job but also all sorts of blunders or even mischief. I suspect that the main reason for organisations to restrict root access is the question of trust. Whom to trust not to mess things up, or whom to trust not to do bad things?

So root access is considered a risk, and one of the simplest risk avoidance strategies is limiting the number of people with root access. If a company were able to blindly trust all its employees with root access without significantly increasing the overall risk, then this might not be such a big topic.

Interestingly, root access to servers is handled differently from database access with credentials that can read, write and delete any database entry. I find this very surprising, as in most cases a problem with the master database has a much bigger impact than a problem with a server.

Root = Trust

If root access is a question of trust then that gives us the direct connection to DevOps. DevOps is all about people working together and sharing responsibilities. It is therefore most of all about building up trust between all people involved and between the organisation and the teams.

The other question is: do we place more trust in our people or in the automation we build? Traditional trust models are entirely people based. DevOps thinking strongly advocates building up automation that takes over the tedious and the risky tasks.

If there is a lot of trust then the question of root access becomes meaningless. In an ideal DevOps world all the people only work on the automation that runs the environment. The few manual interactions that are still required can be handled by everyone on the team.

As both trusting people and building trustworthy automation lead to a situation where it is acceptable to grant root access to everyone, you can actually use root access as a simple and clear measure of your organisation's progress on the DevOps journey.

Current Status

To find out the current status I ran a small poll on Twitter.

As I did not ask about the status of DevOps adoption, we cannot correlate the answers. What I do take from the result is that root access is handled differently enough across companies to try to learn something from that. And I am very happy to see that 34% of the answers report broad root access.

Embrace the Challenge

You will hear a lot of worries or meet the belief that in your organisation giving root to everybody is impossible, e.g. due to regulatory requirements. These are valid fears and one should take them seriously. However, the solution is not to stop automating but rather to incorporate the requirements into the automation and make it so good that it can also solve those challenges.

The question of root access is still very provocative. Use that fact to start the discussion that will lead to real DevOps-style automation, build a dashboard to show root access as a KPI, and start to build the trust and the automation that you need to give root access to everyone.

I'll be happy if you share your own example for the DevOps - root correlation (or the opposite) in the comments.