I'm planning on replicating a web server for high availability purposes. The server is running as an Ubuntu 15.04 virtual machine on Hyper-V and has MariaDB 10.0, Apache 2.4 and PHP-FPM 5.6 installed.
The 2nd virtual machine will start as a direct copy of this virtual machine but will be located in the United States; the 1st virtual machine is located in Europe (latency between them will be about 80-120 ms).
I'd like to keep the servers in sync so they can both serve the same content to my clients, and so that clients will use the closest server (for this I will use Route 53).
It's important that the data exchange between the servers is secure, both to protect contact details and other information in the database and to prevent tampering with the files in the /var/www directory.
I've considered using the following options for this:

- OpenVPN
- SSH tunnel
- SSL (TLS)

My question is: which is the most reliable, quickest (latency, throughput) and safest method? Easy maintenance is also nice to have ;)
I've considered using SSH for both the database and file replication. However, I'm not sure which application to use for the file replication part.

- How should I set up the replication of the files, and which application should I use for this?
SSL could be used for the database replication, but it requires generating certificates, which need to be replaced every now and then and which may cost money.
My final option is OpenVPN, but I'm not sure whether I can set it up as an additional network instead of routing all my traffic over it. This method also seems to require generating certificate files.
- I'd like to have the ability to add extra servers to the replication process at a later moment, possibly Windows servers.
You don't need paid TLS certificates for your own private communication. You can set up your own CA (with very long-lived certs; in case of compromise you just throw away the entire CA), make your servers trust it, and then issue as many certs as you want for different services. Paid certificates are only needed when you can't reliably make the remote hosts trust your CA (your website's visitors, for example).
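As a sketch of what running your own CA looks like with the `openssl` CLI (the CN values, key sizes and lifetimes below are placeholder choices, not recommendations):

```shell
# 1. Create the CA: a long-lived self-signed root certificate (20 years here)
openssl req -x509 -newkey rsa:4096 -nodes -days 7300 \
    -subj "/CN=My Internal CA" \
    -keyout ca.key -out ca.crt

# 2. Create a key and signing request for one server (placeholder hostname)
openssl req -newkey rsa:2048 -nodes \
    -subj "/CN=db1.example.internal" \
    -keyout server.key -out server.csr

# 3. Sign the server certificate with your CA
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 3650 -out server.crt

# 4. Any host that trusts ca.crt will now accept server.crt
openssl verify -CAfile ca.crt server.crt
```

Distribute only ca.crt to the hosts that should trust the certificates; ca.key stays offline.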
If you just need a single service and it supports TLS (like MySQL does), go with that and add an additional layer of security by only allowing connections from your servers' IPs at the firewall level.
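With MariaDB that combination could look roughly like this (a sketch, not a full config; the certificate paths are placeholders):

```
[mysqld]
# Enable TLS for client and replication connections
ssl-ca   = /etc/mysql/ssl/ca.crt
ssl-cert = /etc/mysql/ssl/server.crt
ssl-key  = /etc/mysql/ssl/server.key
```

At the firewall level, something like `iptables -A INPUT -p tcp --dport 3306 -s 203.0.113.10 -j ACCEPT` (203.0.113.10 being a placeholder for the other server's IP) followed by a matching DROP rule for everyone else on port 3306. You can also add `REQUIRE SSL` to the replication account's GRANT so unencrypted logins are refused outright.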
If you need more than one service, you're better off with a VPN solution. Don't waste your time with OpenVPN: your kernel has built-in IPsec support and you can use that. Plus, it's supported out of the box on Windows, so if you ever deploy such servers it'll be easy to set up.
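As a sketch, a site-to-site tunnel with strongSwan (one common IPsec stack on Ubuntu) needs little more than an /etc/ipsec.conf entry like the following; all addresses, subnets and certificate names here are placeholders:

```
conn eu-to-us
    left=198.51.100.5         # this server's public IP (placeholder)
    leftcert=eu-server.crt    # issued by your own CA
    leftsubnet=10.0.1.0/24    # private range used for replication traffic
    right=203.0.113.10        # the US server (placeholder)
    rightsubnet=10.0.2.0/24
    auto=start                # bring the tunnel up at boot
```

Only traffic between the two private subnets goes through the tunnel, so normal Internet traffic is unaffected; the same conn block, with the sides swapped, goes on the other server.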
That was the easy part. The real hard part is keeping your app's files in sync. It's easy if your app only uses a database, but if it's a general-purpose CMS there's a good chance it also modifies its own files for whatever reason (plugin updates, for example) or creates new ones (user-uploaded content, etc.), and I don't know of any reliable way of keeping those in sync. The only solutions that come to mind are NFS (with a single server hosting the files, which is against your HA requirement) or GlusterFS, both of which will perform quite poorly with this kind of latency.
`rsync` is a great tool for keeping files in sync. I would use it in combination with SSH (and public keys). For multiple servers, multiple uses of `rsync` might be the best option. Another option is `pdcp -r`, but that copies all the files every time instead of doing delta-transfers; in other words, it's better for small amounts of data and many servers.

How best to do database replication depends greatly on what your application does. There's a lot of good advice in the MariaDB docs and in other questions around here.
You don't mention it, but I assume you are after a multi-master setup. This rules out master-slave database setups.
I believe the choice of security technology will be the least of your worries.
1. Database
Keeping a multi-master database setup in sync is tricky. You can go with an active/active setup like MariaDB with Galera Cluster, but I'm not so sure this is a good idea considering the geographic distance: all writes are synchronous across all database nodes, so latency will greatly impact database performance. Clusters such as Galera are generally described as good candidates within a single LAN, but as performing much worse in "real" high-availability setups where database nodes are spread across multiple physical locations over a WAN. Before you jump into a multi-master database setup, read up on the topic first; begin with a look at The Scale-Out Blog.
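For orientation, the Galera side of such a setup is mostly a handful of wsrep settings in the MariaDB config. The node names and addresses below are placeholders, and note that no setting here removes the synchronous-write latency cost just described:

```
[mysqld]
binlog_format         = ROW
wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = "web_cluster"
wsrep_cluster_address = "gcomm://198.51.100.5,203.0.113.10"
wsrep_node_name       = "eu-node"
wsrep_node_address    = "198.51.100.5"
```

Each node gets the same cluster address list but its own node name and address; the first node is bootstrapped on its own and the others join it.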
You could also take another approach and look into SymmetricDS, which lets you keep databases in sync. It uses triggers to capture database CRUD operations. It will not replicate schema changes, users, or anything other than pure data. Note that this is asynchronous replication, and it will not take care of things like auto-incremented primary keys the way a MariaDB Galera cluster will.
2. Files
You could use a central NFS server, but that would defeat the purpose of multiple masters (and of having no single point of failure). I've used csync2 (with lsyncd) to keep web nodes in sync with success (rsync under the hood).
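For reference, a csync2 configuration for that kind of setup is compact (the hostnames and key path below are placeholders); lsyncd then watches /var/www and invokes csync2 whenever something changes:

```
group web
{
    host eu-node us-node;       # placeholder hostnames, resolvable on both sides
    key  /etc/csync2.key_web;   # pre-shared key, identical on all hosts
    include /var/www;
    auto younger;               # conflict resolution: the newer file wins
}
```

The same file goes on every node; csync2 figures out which host it is from the host list.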
General Advice
Are you doing this for performance or for high availability? If you do it for performance, a single location with a Varnish server in front will take you far. You'll still have the latency, but you'll be able to "shave off" server overhead through caching. You could even add Varnish servers in other locations. Complexity will be greatly reduced (never underestimate this; keep it simple). Top this off with a CDN for assets (CSS, JS, images, etc.) and you will most likely be able to give US users just as good a user experience as European users (if you base your service in Europe).
If you do this for high availability, then you will have to add complexity by introducing replication.