I've recently set up a load-balanced solution for our websites. We host about 200 sites; most run off our custom application, but some are running WordPress blogs (in which files can be uploaded/deleted). The setup is basic:
         |-------------------> Apache1
         |
HAProxy -|
         |
         |-------------------> Apache2
I've set up Apache1 as a 'master', so that most of the changes made on it are rsync'd over to Apache2 every minute using the following command:
rsync -av --delete apache1:/var/www/html/ /var/www/html/
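(That command is scheduled from Apache2's crontab, with an entry along these lines:)

* * * * * rsync -av --delete apache1:/var/www/html/ /var/www/html/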
The problem is, as mentioned earlier, that in some cases files are added/removed on Apache2. The only solution I've come up with so far is to have Apache1 pull all files in certain directories (wp-content, for instance) from Apache2 to itself (without --delete), then push everything back to Apache2.
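Spelled out, that workaround would run on Apache1 as something like this (the site path is just an example):

# pull anything new in wp-content from Apache2, without deleting what Apache1 already has
rsync -av apache2:/var/www/html/site/wp-content/ /var/www/html/site/wp-content/

# then push the whole tree back out, same options as the regular master sync
rsync -av --delete /var/www/html/ apache2:/var/www/html/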
This has its flaws, the main ones being:
- The two servers will eventually end up with extra files that have been deleted on Apache2 (the deletions never propagate)
- As I add more servers, the rsync script will take longer to complete.
Are there any ways to keep 2+ web servers synched, taking into account that both servers can have files added, updated and deleted?
I'm using OCFS2 with DRBD.
A DRBD resource is defined in /etc/drbd.d/r0.res, and the resulting /dev/drbd1 device is formatted as an OCFS2 filesystem. OCFS2 is configured without Pacemaker via /etc/ocfs2/cluster.conf. DRBD status can be looked at with the drbd-overview utility or from /proc/drbd.
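A minimal sketch of those pieces, assuming two nodes named apache1 and apache2, a backing partition /dev/sdb1, peer addresses 10.0.0.1/10.0.0.2 and a cluster named web (all of these are placeholder values, not the real ones):

/etc/drbd.d/r0.res (dual-primary, so both nodes can mount the volume at the same time):

resource r0 {
    net {
        allow-two-primaries yes;   # required for a clustered filesystem like OCFS2
    }
    on apache1 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.0.0.1:7789;
        meta-disk internal;
    }
    on apache2 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}

Format the device once, from one node only:

mkfs.ocfs2 -L web /dev/drbd1

/etc/ocfs2/cluster.conf, identical on both nodes (the key = value lines must be indented):

cluster:
    node_count = 2
    name = web

node:
    ip_port = 7777
    ip_address = 10.0.0.1
    number = 0
    name = apache1
    cluster = web

node:
    ip_port = 7777
    ip_address = 10.0.0.2
    number = 1
    name = apache2
    cluster = web

Checking replication status:

drbd-overview
cat /proc/drbd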
We are currently using rsync also, but I'm not crazy about it.
We have been experimenting with fileconveyor, which not only syncs between two servers but can also sync up with S3, Cloudfiles or other cloud storage. This will obviously give us a lot more flexibility.
I don't have any config setups to share at this moment, but we are liking what we see.
I have not used it in a server setup, but you might try Unison. It deals with changes on either side and will automatically sync things that aren't conflicting. I believe it is limited to 2 hosts, so it wouldn't scale past your current solution.
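If you do try it, a minimal invocation run from Apache1 might look like this (hostname and paths are placeholders; -batch and -auto let it sync non-conflicting changes without prompting):

unison /var/www/html ssh://apache2//var/www/html -batch -auto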
The only way I know how to scale past 2 hosts would be to set up NFS, or some other shared/distributed filesystem.
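For the NFS route, each web server would simply mount the shared docroot, e.g. with an /etc/fstab entry along these lines (server and export path are placeholders):

nfsserver:/export/www  /var/www/html  nfs  defaults  0  0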
Another option would be to build an "authoritative" replica of the content apart from the front-facing webservers and make sure all updates and changes are made on that replica.
Then, you deploy from that server to any number of front-facing servers on a set schedule.
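That deploy step can be as simple as a scheduled loop on the authoritative server pushing to each frontend (hostnames and paths are placeholders):

for host in web1 web2; do
    rsync -av --delete /srv/www/authoritative/ "$host":/var/www/html/
done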
Yes, it's an extra copy of the content but it does give you some potential benefits:
1) Control of when the updates go live
2) Less complexity in handling multi-direction sync between any number of servers
3) The ability to make changes and preview them without impacting your front-facing production servers.
Other options include some type of shared storage spread across as much hardware as you need for reliability, performance, and scalability.
I've been having this same conundrum and have come across a few solutions depending on the specifics of the application in question.
NFS: While NFS, or some other sort of shared drive, would certainly work, in my case I wanted to avoid it because it concentrates everything on one machine, a single point of failure that can bring down the whole system. A big part of my reason for load balancing is redundancy, and NFS takes that out of the equation. Although, if all other options fail, this might be the only one left.
DB Files: Most of what I try to do is build the application to store its files in a database. That way I don't have to worry about any of the clustered web servers having to write any data. This seems by far the best solution, but it is often not an option if you are not developing the software.
DNS control: For some sites or applications that have an "admin" section that only a few users use (like a WordPress blog), I sometimes use a separate DNS name pointing to the master server to ensure that the admin only performs writes on the correct server. With a few modifications, you can redirect wp-admin to use the admin DNS name. The downside here is that while the front face of the blog remains load balanced and redundant, the admin section is reliant on one system. For most bloggers, this is probably OK, though.
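As a rough illustration, an Apache mod_rewrite rule on the load-balanced vhosts could bounce admin traffic to the hostname that points straight at the master (admin.example.com is a placeholder):

RewriteEngine On
RewriteCond %{HTTP_HOST} !^admin\.example\.com$ [NC]
RewriteRule ^/?wp-admin https://admin.example.com%{REQUEST_URI} [R=302,L]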
Two-way rsync: The last option, which I try to avoid, is multiple direction rsyncing. Deleting becomes the biggest problem here: when a file only exists on one server, rsync can't tell whether it's a new file that hasn't been copied yet or a file that was deleted on the other side. Generally, if I have to do multi-direction rsyncing, I target a specific folder where the data is stored, remove it from the rest of the structure using a symlink, then rsync it both ways without delete. Most applications don't ever need to delete a file, unless they are creating temp files, which should probably be stored outside of your site's structure anyway. This can still work with deleting files, but I'd still try to target the specific directories where files are stored.
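A sketch of that isolation trick, assuming the uploads live in a shared directory outside the normal docroot sync (hostname and paths are placeholders):

# symlink the writable directory into the site so the main sync job never touches it
ln -s /var/www/shared/uploads /var/www/html/site/wp-content/uploads

# then sync just that directory in both directions, without --delete
rsync -av apache2:/var/www/shared/uploads/ /var/www/shared/uploads/
rsync -av /var/www/shared/uploads/ apache2:/var/www/shared/uploads/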
Take a look at lsyncd, which does support deletes:
- Set up SSH authorization without a password: https://www.shellhacks.com/ssh-login-without-password/
- Set up lsyncd (also present in the Debian/Ubuntu repos by default): https://github.com/axkibe/lsyncd
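A rough sketch of both steps, pushing from Apache1 to Apache2 (hostname, user and paths are placeholders):

ssh-keygen -t ed25519
ssh-copy-id root@apache2

/etc/lsyncd/lsyncd.conf.lua:

settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsyncssh,
    source    = "/var/www/html/",
    host      = "apache2",
    targetdir = "/var/www/html/",
    delete    = true,
}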