I am looking for options (and their pros and cons) for protecting the network traffic between the components of a time-critical application in a data center. The aim is to minimize the damage an attacker can cause after compromising a VM. It should be impossible for the attacker to read the traffic between other (non-compromised) VMs. This could be achieved by encryption or by limiting network access.
We have a VMware environment, several ESXi hosts and a Fortigate firewall. Parts of the internal traffic are not encrypted yet because the application opens several connections, one after the other. And there is a latency limit on the whole process.
Due to the latency limit, the (trivial) approach of using TLS for each connection is not an option. Maybe it could be done with proxies on all systems that keep the TLS connections open regardless of what the application is doing.
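To illustrate the proxy idea, here is a minimal sketch of the connection-reuse part only. It assumes a local proxy process that holds one long-lived TLS tunnel per peer, so the handshake cost is paid once rather than per application connection. The names `PersistentTunnels` and `open_tls` are hypothetical, and the sketch leaves out everything a real proxy would also need (accepting local connections, framing or multiplexing requests over the tunnel, reconnecting after failures).

```python
import socket
import ssl
from typing import Callable, Dict, Tuple

Peer = Tuple[str, int]  # (hostname, port) of the remote proxy


def open_tls(peer: Peer) -> ssl.SSLSocket:
    """Open a TCP connection and complete one TLS handshake."""
    host, port = peer
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=5)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


class PersistentTunnels:
    """Keeps one open tunnel per peer and reuses it (hypothetical helper)."""

    def __init__(self, connect: Callable[[Peer], object] = open_tls):
        self._connect = connect
        self._tunnels: Dict[Peer, object] = {}
        self.handshakes = 0  # counts how often a new tunnel was set up

    def get(self, peer: Peer):
        """Return the existing tunnel for peer, opening one only if needed."""
        tunnel = self._tunnels.get(peer)
        if tunnel is None:
            tunnel = self._connect(peer)
            self._tunnels[peer] = tunnel
            self.handshakes += 1
        return tunnel

    def drop(self, peer: Peer) -> None:
        """Close and forget a tunnel, e.g. after an I/O error."""
        tunnel = self._tunnels.pop(peer, None)
        if tunnel is not None:
            tunnel.close()
```

With this structure, the application's short-lived connections would terminate at the local proxy in plaintext on the loopback interface, and only the proxy-to-proxy leg would be encrypted.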
I guess using a VPN between all involved systems (about 50) would be a management nightmare. We use keepalived which probably makes a VPN solution even worse.
I also think about using permanent ARP entries as a protection against ARP spoofing. VMware prevents MAC spoofing. This would not add any latency and should avoid the need for encryption. But it does not work well with the Fortigate and not with virtual IPs either.
I am interested in opinions about the mentioned approaches and other approaches which I am not aware of yet.
What do other organizations with microservices and timing restrictions do? I do not require a statement what the best solution is. I would like to know what has proven (not) to be feasible.
It sounds like you're looking for what is nowadays called "micro-segmentation" as a part of a zero-trust architecture.
Basically, with traditional segmentation (firewalls and VLANs) you limit the communication between subnets: you prevent a host in one subnet from communicating with a host in another unless the traffic meets the given criteria. In micro-segmentation you limit and/or inspect traffic between applications or services that live within the same subnet. This is most readily accomplished at the hypervisor level, since the hypervisor can natively see all the traffic passing to and from its guests.
And zero-trust refers to the age-old security adage of "trust no one," even your own systems. You extend the bare minimum level of trust, basically the operational equivalent of "need to know" for information. For example, app servers can communicate with the outside world only as responses to TLS requests, and nothing else is permitted. You limit trust to explicit requirements for operations, allowing nothing more.
As an example, you could use micro-segmentation to say that Front-End-App-Server-A can talk to Mid-Tier-Logic-Server-A which can, in turn, talk to Database-Server-A. But the Front-End-Server can't talk directly to the DB server. And Front-End-A can't talk to Front-End-B, even though they're in the same subnet.
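The example above boils down to a default-deny allow list. The following is only a sketch of that logic in Python, not how any particular product expresses its rules; the server names and port numbers are made up for illustration.

```python
from typing import FrozenSet, NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    port: int


# Hypothetical allow list: anything not listed here is denied.
ALLOWED: FrozenSet[Flow] = frozenset({
    Flow("front-end-a", "mid-tier-a", 8443),
    Flow("mid-tier-a", "database-a", 5432),
})


def is_allowed(src: str, dst: str, port: int,
               rules: FrozenSet[Flow] = ALLOWED) -> bool:
    """Default deny: permit a new connection only if it is explicitly listed.

    Rules are directional and describe who may initiate a connection;
    a stateful firewall would handle the reply traffic separately.
    """
    return Flow(src, dst, port) in rules


print(is_allowed("front-end-a", "mid-tier-a", 8443))   # listed tier-to-tier flow
print(is_allowed("front-end-a", "database-a", 5432))   # skips the middle tier
print(is_allowed("front-end-a", "front-end-b", 8443))  # same subnet, not listed
```

The point of the design is that being in the same subnet grants nothing: a compromised front-end can only reach the one peer and port its rule names.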
VMware has been making a rather big deal about using micro-segmentation on their platform lately. Sounds like you should check it out: https://www.vmware.com/solutions/micro-segmentation.html
Micro-segmentation avoids the session-creation time constraints that encryption runs into. Plus it does what encryption can't. (VPNs or SSL/TLS protect data in transit, but don't actually limit the harm an attacker can do once inside the "secured" network. Segmentation with limited trust restricts the attacker's possible next steps: basically, they have to defeat a new firewall each time they try to pivot to a new target.) And it is all done at the hypervisor/network level, which means that you don't have to rewrite your applications to use it. Set it up at the infrastructure level and let your applications keep doing whatever they're doing.
A VPN would basically have a performance impact similar to TLS, so you need a solution that is cheaper in terms of latency.
You can use the ESXi firewall. (VMware's technology is fairly close to Linux in this respect, and L2 and L3 filtering on bridges and other virtual network components is possible.)
You can use a form of network segmentation: multiple network adapters assigned to different groups of guests might create some additional barriers.
You can use a VLAN setup, but IMHO it is a somewhat overrated strategy.