I've recently set up a load-balanced solution for our websites. We host about 200 sites; most run off our custom application, but some are running WordPress blogs (in which files can be uploaded/deleted). The setup is basic:
         |-------------------> Apache1
         |
HAProxy -|
         |
         |-------------------> Apache2
I've set up Apache1 as a 'master', so that most of the changes made on it are rsync'd over to Apache2 every minute using the following command:
rsync -av --delete apache1:/var/www/html/ /var/www/html/
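(That command is scheduled from Apache2's crontab, with an entry along these lines:)

* * * * * rsync -av --delete apache1:/var/www/html/ /var/www/html/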
The problem is, as mentioned earlier, that in some cases files are added/removed on Apache2. The only solution I've come up with so far is to have Apache1 pull all files in certain directories (wp-content, for instance) from Apache2 to itself (without --delete), then push everything back to Apache2.
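Spelled out, that workaround would run on Apache1 as something like this (the site path is just an example):

# pull anything new in wp-content from Apache2, without deleting what Apache1 already has
rsync -av apache2:/var/www/html/site/wp-content/ /var/www/html/site/wp-content/

# then push the whole tree back out, same options as the regular master sync
rsync -av --delete /var/www/html/ apache2:/var/www/html/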
This has its flaws, the main ones being:
- The two servers will eventually end up with extra files that have been deleted on Apache2 (the deletions never propagate)
- As I add more servers, the rsync script will take longer to complete.
Are there any ways to keep 2+ web servers synched, taking into account that both servers can have files added, updated and deleted?
I'm using OCFS2 with DRBD.
A DRBD resource is defined in /etc/drbd.d/r0.res, and the resulting /dev/drbd1 device is formatted as an OCFS2 filesystem. OCFS2 is configured without Pacemaker via /etc/ocfs2/cluster.conf. DRBD status can be looked at with the drbd-overview utility or from /proc/drbd.
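A minimal sketch of those pieces, assuming two nodes named apache1 and apache2, a backing partition /dev/sdb1, peer addresses 10.0.0.1/10.0.0.2 and a cluster named web (all of these are placeholder values, not the real ones):

/etc/drbd.d/r0.res (dual-primary, so both nodes can mount the volume at the same time):

resource r0 {
    net {
        allow-two-primaries yes;   # required for a clustered filesystem like OCFS2
    }
    on apache1 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.0.0.1:7789;
        meta-disk internal;
    }
    on apache2 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}

Format the device once, from one node only:

mkfs.ocfs2 -L web /dev/drbd1

/etc/ocfs2/cluster.conf, identical on both nodes (the key = value lines must be indented):

cluster:
    node_count = 2
    name = web

node:
    ip_port = 7777
    ip_address = 10.0.0.1
    number = 0
    name = apache1
    cluster = web

node:
    ip_port = 7777
    ip_address = 10.0.0.2
    number = 1
    name = apache2
    cluster = web

Checking replication status:

drbd-overview
cat /proc/drbd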
We are currently using rsync also, but I'm not crazy about it.
We have been experimenting with fileconveyor, which not only syncs between two servers but can also sync up with S3, Cloudfiles or other cloud storage. This will obviously give us a lot more flexibility.
I don't have any config setups to share at this moment, but we are liking what we see.
I have not used it in a server setup, but you might try Unison. It deals with changes on either side and will automatically sync things that aren't conflicting. I believe it is limited to 2 hosts, so it wouldn't scale past your current solution.
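If you do try it, a minimal invocation run from Apache1 might look like this (hostname and paths are placeholders; -batch and -auto let it sync non-conflicting changes without prompting):

unison /var/www/html ssh://apache2//var/www/html -batch -auto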
The only way I know how to scale past 2 hosts would be to set up NFS, or some other shared/distributed filesystem.
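For the NFS route, each web server would simply mount the shared docroot, e.g. with an /etc/fstab entry along these lines (server and export path are placeholders):

nfsserver:/export/www  /var/www/html  nfs  defaults  0  0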
Another option would be to build an "authoritative" replica of the content apart from the front-facing webservers and make sure all updates and changes are made on that replica.
Then, you deploy from that server to any number of front-facing servers on a set schedule.
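That deploy step can be as simple as a scheduled loop on the authoritative server pushing to each frontend (hostnames and paths are placeholders):

for host in web1 web2; do
    rsync -av --delete /srv/www/authoritative/ "$host":/var/www/html/
done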
Yes, it's an extra copy of the content but it does give you some potential benefits:
1) Control of when the updates go live
2) Less complexity in handling multi-direction sync between any number of servers
3) The ability to make changes and preview them without impacting your front-facing production servers.
Other options include some type of shared storage spread across as much hardware as you need for reliability, performance, and scalability.
I've been having this same conundrum and have come across a few solutions depending on the specifics of the application in question.
NFS: While NFS, or some other sort of shared drive, would certainly work, in my case I wanted to avoid it because it concentrates everything on one machine, a single point of failure that can bring down the whole system. A big part of my reason for load balancing is redundancy, and NFS takes that out of the equation. Although, if all other options fail, this might be the only one left.
DB Files: Most of what I try to do is build the application to store its files in a database. That way I don't have to worry about any of the clustered web servers having to write any data. This seems by far the best solution, but it is often not an option if you are not developing the software.
DNS control: For some sites or applications that have an "admin" section that only a few users use (like a WordPress blog), I sometimes use a separate DNS name pointing to the master server to ensure that the admin only performs writes on the correct server. With a few modifications, you can redirect wp-admin to use the admin DNS name. The downside here is that while the front face of the blog remains load balanced and redundant, the admin section is reliant on one system. For most bloggers, this is probably OK, though.
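As a rough illustration, an Apache mod_rewrite rule on the load-balanced vhosts could bounce admin traffic to the hostname that points straight at the master (admin.example.com is a placeholder):

RewriteEngine On
RewriteCond %{HTTP_HOST} !^admin\.example\.com$ [NC]
RewriteRule ^/?wp-admin https://admin.example.com%{REQUEST_URI} [R=302,L]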
Two-way rsync: The last option, which I try to avoid, is multiple direction rsyncing. Deleting becomes the biggest problem here: when a file only exists on one server, rsync can't tell whether it's a new file that hasn't been copied yet or a file that was deleted on the other side. Generally, if I have to do multi-direction rsyncing, I target a specific folder where the data is stored, remove it from the rest of the structure using a symlink, then rsync it both ways without delete. Most applications don't ever need to delete a file, unless they are creating temp files, which should probably be stored outside of your site's structure anyway. This can still work with deleting files, but I'd still try to target the specific directories where files are stored.
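A sketch of that isolation trick, assuming the uploads live in a shared directory outside the normal docroot sync (hostname and paths are placeholders):

# symlink the writable directory into the site so the main sync job never touches it
ln -s /var/www/shared/uploads /var/www/html/site/wp-content/uploads

# then sync just that directory in both directions, without --delete
rsync -av apache2:/var/www/shared/uploads/ /var/www/shared/uploads/
rsync -av /var/www/shared/uploads/ apache2:/var/www/shared/uploads/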
Take a look at lsyncd, which does support deletes:
- Set up SSH authorization without a password: https://www.shellhacks.com/ssh-login-without-password/
- Set up lsyncd (also present in the Debian/Ubuntu repos by default): https://github.com/axkibe/lsyncd
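A rough sketch of both steps, pushing from Apache1 to Apache2 (hostname, user and paths are placeholders):

ssh-keygen -t ed25519
ssh-copy-id root@apache2

/etc/lsyncd/lsyncd.conf.lua:

settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsyncssh,
    source    = "/var/www/html/",
    host      = "apache2",
    targetdir = "/var/www/html/",
    delete    = true,
}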