I am trying to get my head around load balancing as a way to ensure availability and redundancy, keeping users happy when things go wrong, rather than load balancing for the sake of offering blistering speed to millions of users.
We're on a budget and trying to stick to the stuff where there's plenty of knowledge available, so running Apache on Ubuntu VPSes seems like the strategy until some famous search engine acquires us (Saturday irony included, please note).
At least to me, it's a complete jungle of different solutions out there. Apache's own mod_proxy and HAProxy are two that we found with a quick Google search, but having zero experience with load balancing, I have no idea what would be appropriate for our situation, or what we should look for when choosing a solution to solve our availability concerns.
What is the best option for us? What should we do to keep availability high while staying within our budget?
HAProxy is a good solution. The config is fairly straightforward.
You'll need another VPS instance to sit in front of at least two other VPSes, so for load balancing / failover you need a minimum of three VPSes.
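A minimal sketch of what the front VPS's haproxy.cfg might look like. The backend addresses (10.0.0.11 and 10.0.0.12) are placeholders for your two web VPSes:

```
# /etc/haproxy/haproxy.cfg -- minimal sketch, addresses are placeholders
global
    maxconn 2048

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend web_nodes

backend web_nodes
    balance roundrobin
    option httpchk GET /
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```

The `check` keyword makes HAProxy health-check each backend and stop sending traffic to a node that goes down, which is the failover part you're after.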
A few other things to think about:
SSL termination. If you use HTTPS, the connection should terminate at the load balancer; behind the load balancer, traffic can travel over an unencrypted connection.
File storage. If a user uploads an image, where does it go? Does it just sit on one machine? You need some way to share files instantly between machines - you could use Amazon's S3 service to store all your static files, or you could have another VPS act as a file server, but I would recommend S3 because it's redundant and insanely cheap.
Session info. Each machine behind the load balancer needs to be able to access the user's session info, because you never know which machine they will hit.
DB. Do you have a separate DB server? If you only have one machine right now, how will you make sure your new machine has access to the DB server? And if it's a separate VPS DB server, how redundant is that? It doesn't make sense to have highly available web front ends and a single point of failure in one DB server; now you need to consider DB replication and slave promotion as well.
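To tie the SSL-termination and session points back to HAProxy: a hedged sketch of how both can be handled at the balancer, assuming a reasonably recent HAProxy (1.5+, which added native SSL). The certificate path, cookie name, and addresses are made up:

```
# Terminate TLS at the balancer; backends see plain HTTP
frontend www_ssl
    bind *:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend web_nodes

backend web_nodes
    balance roundrobin
    # Insert a cookie so a returning user sticks to the same backend --
    # an alternative (or complement) to sharing session storage
    cookie SRV insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```

Cookie stickiness papers over the session problem but doesn't solve it: if web1 dies, its sessions are still gone unless the session store itself is shared.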
So I've been in your shoes; that's the trouble with taking a website that does a few hundred hits a day to a real operation. It gets complex quickly. Hope that gave you some food for thought :)
The solution I use, and one that can be easily implemented with VPSes, is the following:
This arch has the following advantages, in my biased opinion:
In your case, having physically separated VPSes is a good idea, but it makes IP sharing more difficult. The objective is a fault-resistant, redundant system, and some load balancing/HA configurations end up messing that up by adding a single point of failure (like a single load balancer that receives all traffic).
I also know you asked about Apache, but these days we have specific tools better suited to the job (like nginx and varnish). Leave Apache to run the applications on the backend and serve them using other tools (not that Apache can't do good load balancing or reverse proxying; it's just a question of offloading different parts of the job to more services, so each one can do its share well).
My vote is for Linux Virtual Server as the load balancer. This makes the LVS director a single point of failure as well as a bottleneck, but both can be mitigated by running a second, hot-standby director that takes over (via heartbeat) if the first fails.
Cost can be kept down by having the first director be on the same machine as the first LVS node, and the second director on the same machine as the second LVS node. Third and subsequent nodes are pure nodes, with no LVS or HA implications.
This also leaves you free to run any web server software you like, as the redirection's taking place below the application layer.
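For reference, a sketch of what the director-side LVS config looks like with ipvsadm. The virtual IP (203.0.113.10) and real-server addresses are placeholders, and this assumes direct-routing mode (must be run as root on the director):

```shell
# Create the virtual service on the VIP with round-robin scheduling
ipvsadm -A -t 203.0.113.10:80 -s rr

# Add the real servers in direct-routing (gatewaying) mode
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -g
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -g
```

In practice you'd usually let a tool like ldirectord or keepalived manage these entries for you, since they also health-check the real servers and remove dead ones.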
How about this chain?
round robin DNS > HAProxy on both machines > nginx to separate out static files > Apache
Possibly also use ucarp or heartbeat to ensure HAProxy always answers. Stunnel would sit in front of HAProxy if you need SSL too.
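The nginx step in that chain (serving static files itself and proxying everything else to Apache) might look like this sketch; the paths, ports, and hostname are assumptions:

```
# nginx server block: serve static files directly, proxy the rest to Apache
server {
    listen 8080;
    server_name example.com;

    # static content served straight off disk, with client-side caching
    location /static/ {
        root /var/www;
        expires 7d;
    }

    # everything else goes to Apache on a local port
    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

This keeps Apache's heavyweight worker processes for dynamic requests only, which is the main point of putting nginx in front of it.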
You may want to consider using proper clustering software: Red Hat's (or CentOS's) Cluster Suite, or Oracle's ClusterWare. These can be used to set up active-passive clusters, to restart services, and to fail over between nodes when there are serious issues. This is essentially what you're looking for.
All of these cluster solutions are included in the respective OS licenses, so you're probably cool on cost. They do require some manner of shared storage -- either an NFS mount, or physical disk accessed by both nodes with a clustered file system. An example of the latter would be SAN disks with multiple host access allowed, formatted with OCFS2 or GFS. I believe you can use VMWare shared disks for this.
The cluster software is used to define 'services' that run on nodes all the time, or only when that node is 'active'. The nodes communicate via heartbeats, and also monitor those services. They can restart them if they notice failures, and reboot if they can't be fixed.
You would basically configure a single 'shared' IP address that traffic would be directed to. Then apache, and any other necessary services, can be defined as well, and only run on the active server. Shared disk would be used for all your web content, any uploaded files, and your apache configuration directories. (with httpd.conf, etc)
In my experience, this works incredibly well.
--Christopher Karel
Optimal load balancing can be very expensive and complicated. Basic load balancing should just ensure that each server is servicing roughly the same number of hits at any time.
The simplest load-balancing method is to provide multiple A records in DNS. By default, resolvers will hand out the addresses in round-robin fashion, which results in users being distributed relatively evenly across the servers. This works well for stateless sites. A somewhat more complex method is required when you have a stateful site.
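In a BIND-style zone file, round-robin DNS is nothing more than multiple A records for the same name (addresses here are placeholders):

```
; both servers answer for www; resolvers rotate through the answers
www     IN  A   203.0.113.10
www     IN  A   203.0.113.11
```

Keep the TTL on these records short if you plan to pull a dead server's address out of rotation, since clients cache the answers.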
To handle stateful requirements, you can use redirects. Give each web server an alternate address such as www1, www2, www3, etc. Redirect the initial www connection to the host's alternate address. You may end up with bookmark issues this way, but they should be evenly dispersed across the servers.
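One hedged way to do that initial redirect with Apache's mod_rewrite; the hostnames are illustrative, and each server would carry a variant of this rule pointing at its own alternate name:

```
# On the server known as www2: redirect initial hits on the shared
# www name to this host's own alternate name, pinning the session here
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www2.example.com$1 [R=302,L]
```

Requests that arrive directly at www2.example.com don't match the condition, so only the first (shared-name) hit gets redirected.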
Alternatively, using a different path to indicate which server is handling the stateful session would allow proxying sessions that have switched hosts back to the original server. This may be a problem when a session for a failed server arrives at the server that has taken over from it; however, barring clustering software, that state will be missing anyway. Due to browser caching, you may not experience a lot of sessions changing servers.
Failover can be handled by configuring a server to take over the IP address of a failed server. This will minimize the downtime if a server fails. Without clustering software, stateful sessions will still be lost if a server fails.
Without failover users will experience a delay until their browser fails over to the next IP address.
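A sketch of the IP-takeover piece using keepalived's VRRP; the interface name, virtual IP, and password are placeholders, and the standby machine would run the same config with `state BACKUP` and a lower priority:

```
# keepalived.conf on the primary; 203.0.113.10 is the shared (virtual) IP
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        203.0.113.10
    }
}
```

When the primary stops sending VRRP advertisements, the backup claims the virtual IP within a few seconds, which is the "take over the IP address of a failed server" behavior described above.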
Using RESTful services rather than stateful sessions should do away with clustering issues on the front end. Clustering issues on the storage side would still apply.
Even with load balancers in front of the servers, you will likely have round-robin DNS in front of them. This will ensure all your load balancers get utilized. They will add another layer to your design, with additional complexity and another point of failure. However, they can provide some security features.
The best solution will depend on the relevant requirements.
Implementing image servers to serve up content like images, CSS files, and other static content can ease the load on the application servers.
I generally use a pair of identical OpenBSD machines:
OpenBSD is light, stable, and quite secure - Perfect for network services.
To start, I recommend a layer 7 setup, as it avoids complicating the firewall (PF) setup. Here is an example /etc/relayd.conf file that shows the setup of a simple relay load balancer with monitoring of the backend webservers:
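A minimal sketch of such a file; the listen address and backend addresses are placeholders:

```
# /etc/relayd.conf -- minimal sketch, addresses are placeholders
table <webhosts> { 10.0.0.11 10.0.0.12 }

http protocol "httpproxy" {
    # pass the real client address through to the backends
    match request header set "X-Forwarded-For" value "$REMOTE_ADDR"
}

relay "wwwrelay" {
    listen on 203.0.113.10 port 80
    protocol "httpproxy"
    # health-check each backend: fetch "/" and expect a 200
    forward to <webhosts> port 80 check http "/" code 200
}
```

relayd marks a host in the table down as soon as the HTTP check fails and stops relaying to it, so backend failures are handled automatically.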
Have you given EC2 with Cloud Foundry, or maybe Elastic Beanstalk, or just plain old AWS Auto Scaling a thought? I have been using that and it scales pretty well, and being elastic it can scale up/down without any human intervention.
Given that you say you have zero experience with load balancing, I would suggest these options as they require minimal brain "frying" to get up and running.
It might be a better use of your time.