But is such a VPN infrastructure also a "good idea"? Will it help us or hinder us in the future?
I actually believe that with many data centers a site VPN infrastructure is a dangerous tool. On the plus side, it is very convenient to have and to set up, and it simplifies a lot of things. On the other hand, it is also very easy to build a world-wide mesh of dependencies where a VPN failure can severely inhibit data center operations or even take down services. It also lures everybody into creating undocumented backend connections between services.
The core problem is, in my opinion, one of scale. Having a small number (3 to 5) of locations is fundamentally different from having 50 or more locations. With 3 locations it is still feasible to build a mesh layout, with each location talking to every other one. With 5 locations a mesh already needs 10 connections, which starts to be a lot. The alternative, a star layout, needs fewer connections but always has a central bottleneck. With 50 locations a mesh network needs 1225 connections.
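To make the growth tangible, here is a minimal sketch of the arithmetic behind these numbers. It assumes one connection per pair of locations in a full mesh, so n locations need n(n-1)/2 connections, and one connection per non-hub location in a star. The function names are my own, chosen just for illustration.

```python
def mesh_connections(locations: int) -> int:
    """Connections in a full mesh: one link for every pair of locations."""
    return locations * (locations - 1) // 2


def star_connections(locations: int) -> int:
    """Connections in a star: every location except the hub links to the hub."""
    return max(locations - 1, 0)


# Compare both layouts for a few sizes, including the ones from the text.
for n in (3, 5, 50, 100):
    print(f"{n} locations: mesh={mesh_connections(n)}, star={star_connections(n)}")
```

The mesh grows quadratically while the star grows linearly, which is exactly why the star looks tempting at scale despite its central bottleneck.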
With the move from the world of (few) physical data centers to the world of cloud operations, it is quite common to have many data centers in the cloud. For example, it is "best practice" to use many accounts to separate development and production or to isolate applications or teams (see Yavor Atanasov from the BBC explaining this at Velocity Europe 2014). What is not so obvious from afar is that each cloud account is actually a separate data center of its own! So having many teams can quickly lead to having many accounts. I have talked to many companies that have between 30 and 100 cloud accounts!
Another problem is that all the data centers need to have different, non-overlapping IP ranges, and each connection needs another (small) IP range of its own. All of this is a lot to manage.
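As a rough illustration of that bookkeeping, the following sketch hands out one range per data center and one small range per mesh connection using Python's standard `ipaddress` module. The pools (`10.0.0.0/8` and `172.16.0.0/16`), the prefix lengths (/16 per site, /30 per connection) and the function name `plan_address_space` are assumptions made for the example, not a recommendation.

```python
import ipaddress
from itertools import combinations


def plan_address_space(sites, site_pool="10.0.0.0/8", site_prefix=16,
                       link_pool="172.16.0.0/16", link_prefix=30):
    """Assign a distinct range to every site and to every site-to-site link."""
    site_nets = ipaddress.ip_network(site_pool).subnets(new_prefix=site_prefix)
    link_nets = ipaddress.ip_network(link_pool).subnets(new_prefix=link_prefix)

    # One range per data center, taken in order from the site pool.
    site_plan = dict(zip(sites, site_nets))
    if len(site_plan) < len(sites):
        raise ValueError("site pool is too small for this many sites")

    # One small transfer range per connection in a full mesh.
    links = list(combinations(sites, 2))
    link_plan = dict(zip(links, link_nets))
    if len(link_plan) < len(links):
        raise ValueError("link pool is too small for this many connections")

    return site_plan, link_plan


sites = [f"dc{i}" for i in range(50)]
site_plan, link_plan = plan_address_space(sites)
print(len(site_plan), "site ranges and", len(link_plan), "link ranges to keep track of")
```

Even in this toy form, every new data center adds one site range plus one link range for each existing data center, and all of them must stay unique across the whole organization.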
As an alternative approach I suggest not using site VPNs at all. Instead, each data center should be treated as a fully independent data center. To make this fact transparent, I would also suggest using the same IP range in all data centers!
I see many advantages to such a setup:
- All connections between services in different data centers must use the public network.
As a result all connections have to be secured and audited and supervised properly. They have to be part of the production environment and will be fully documented. If a connection for one application fails, other applications are not impaired.
- Standard resources can be offered under standard IPs to simplify bootstrapping (e.g. 192.168.10.10 is always the outgoing proxy or DNS server or some other vital service).
- If development and production are in separate "data centers" (e.g. cloud accounts), then they can be much more similar.
- A security breach in one account does not easily spill over into other accounts.
- Operations and setup of the cloud accounts and physical data centers become much simpler (fewer external dependencies).
- It will be easy to build up systemic resilience as a failure in one data center or account does not easily affect other data centers or accounts.
- Admins will be treated as road warriors and connect to each data center independently as needed.
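The "standard resources under standard IPs" point from the list above could look like the following minimal sketch. It assumes every data center reuses the same private range `192.168.0.0/16`. The proxy address `192.168.10.10` is the example from the list, while the DNS address and the function name `bootstrap_config` are hypothetical.

```python
import ipaddress

# Assumption: every data center reuses this same private range.
SHARED_RANGE = ipaddress.ip_network("192.168.0.0/16")

# 192.168.10.10 is the example from the text; the DNS entry is hypothetical.
WELL_KNOWN = {
    "outgoing-proxy": ipaddress.ip_address("192.168.10.10"),
    "dns": ipaddress.ip_address("192.168.10.11"),
}


def bootstrap_config(data_center: str) -> dict:
    """Return bootstrap settings; only the name differs between data centers."""
    for name, address in WELL_KNOWN.items():
        if address not in SHARED_RANGE:
            raise ValueError(f"{name} ({address}) is outside {SHARED_RANGE}")
    config = {"data_center": data_center, "network": str(SHARED_RANGE)}
    config.update({name: str(address) for name, address in WELL_KNOWN.items()})
    return config


print(bootstrap_config("development"))
print(bootstrap_config("production"))
```

Because nothing in the settings depends on the location, the same machine image or provisioning script can boot unchanged in any account.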
Currently I am looking for arguments for and against this idea. Please share your thoughts!