2016-02-19

You can't control internal public data

Everywhere there is some data that is relevant either for all applications or for many applications in different parts of the platform.

The "obvious" solution to this problem is to make such data internally public or world-readable, meaning that the entire platform can read it.

The "obvious" solution to security in this case is actually having no security beyond ensuring the "are you part of us?" question.

Common implementations of this pattern are world-readable NFS shares, S3 buckets readable by all "our" AWS accounts, HTTP APIs that use the client IP as their sole access control mechanism etc.

This is approach is really dangerous and should be used with care. The risks include:

  • You most likely don't know who actually needs the data and who not. If you ever need to restrict access you will have a very long and tedious job ahead of you.
  • You don't know who accessed the data for which purpose.
  • After a data leak, you probably won't know how it happened.
  • The data is only as safe as the weakest application in your platform.
This last danger is the real problem here. Imagine that you share personally identifiable information to all your internal systems. Either out of convenience or even unknowingly. The weakest app in your portfolio will be used to hack your platform and it will be used to copy all that sensitive data.

It is not a question of "if" but rather a question of "when" as we can learn from Facebook: Instagram's Million Dollar Bug is a must read for everybody! And please also read the case study for defense that detailed the learning from it.

One key take away is the folly of having a central, internally world readable S3 bucket with the configuration of the entire platform. A simple grep over that data will instantly yield all the other S3 buckets in use and especially those that are internally world readable. A real attacker would have proceeded to copy all "interesting" data he could find to an S3 bucket of his own. In most setups, this operation would not leave any trace except that the exploited system read all the S3 buckets. Facebook was just very lucky that their flaw was found by a security researcher who played by the rules.

Instead of going the convenient way it pays to invest early into a minimum level of security, for example doing everything with a least privilege model. The result is a detailed list of access grants that you can use for security audits and to draw a risk map. The risk map will give you all the systems that have a risk of leaking the data.
Risk Map: Services 1, 2, 4, 5 and 8 define the safety of the personal data.
Yes, this means a bit more work but the payoff is almost immediate. Especially if you partition your platform into smaller parts you should be careful not to destroy the security benefits of that partition. At ImmobilienScout24 we use many AWS accounts, among other things to have a "blast radius" for potential hazard. The blast radius concept applies not only to infrastructure but also to data. Everybody with a partitioned platform should pay attention that the partitioning covers all aspects of the platform and not only the infrastructure level.

Finally, if you are subject to German Data Privacy Laws (Bundesdatenschutzgesetzt) then there are special regulations against uncontrolled spread of data (Weitergabekontrolle). Having personally identifiable information that is internally world-readable probably defies this law. If you can't be 100% sure that your internal public data is clean then it is much safer and easier to just automate the access control.

To conclude, I strongly advise to only put such data world-readable that you can risk to leak into the open.