I have a web site that communicates via XMLRPC with a web service. (The web service is written in Python using xmlrpclib.) I believe the server blocks while it is handling one request.
So if there are three users with an xmlrpclib request ahead of you, your response takes four times as long.
How do I handle it if I receive too many XMLRPC requests and the web service gets bogged down and has slow response time?
If I am getting slashdotted, my preferred behavior is that the first users get good response times and everyone else is told to come back later. I think this is superior to giving everyone terrible response times.
How do I create this behavior? Is this called load balancing? I am not actually balancing anything, though, until I have multiple servers.
Well, first off, is there a way you can re-work the XMLRPC server to handle multiple simultaneous requests? Having multiple web sessions depend on a service that can only do one request at a time is probably not going to cut it. An internal corporate site with low traffic might get away with it, but a public-facing site in the real world? No way.
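For example, Python's stock XML-RPC server can be made to handle each request in its own thread by mixing in ThreadingMixIn. Here is a minimal sketch (module names are Python 3's; under Python 2 the same pieces live in SimpleXMLRPCServer and SocketServer):

    from socketserver import ThreadingMixIn
    from xmlrpc.server import SimpleXMLRPCServer

    class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
        """Handle each incoming XML-RPC request in its own thread."""
        pass

    def ping():
        return "pong"

    server = ThreadedXMLRPCServer(("localhost", 8000))
    server.register_function(ping)
    server.serve_forever()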
Now, that said, I think you'll need to provide a little more info about the web service platform in order to get answers you can use. The best I can say at this point is 'count the outstanding XMLRPC requests in the web server and fail if there are too many', but that's so generic as to be useless.
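For what it's worth, here is a rough Python sketch of that counting idea, building on the threaded server above. The cap of 10 in-flight requests and the fault code 503 are illustrative choices, not anything the platform dictates:

    import threading
    from xmlrpc.client import Fault

    MAX_IN_FLIGHT = 10  # illustrative cap on concurrent requests

    # Acquiring the semaphore in non-blocking mode doubles as the
    # "count the outstanding requests" check.
    slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

    def guarded(func):
        """Reject a call outright when too many requests are in flight."""
        def wrapper(*args):
            if not slots.acquire(blocking=False):
                # Extra callers are told to come back later, not queued.
                raise Fault(503, "Server busy, try again later")
            try:
                return func(*args)
            finally:
                slots.release()
        return wrapper

A method registered as server.register_function(guarded(ping), "ping") then fails fast with an XML-RPC fault once the cap is reached, which is the "first users win, everyone else comes back later" behavior you describe.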
You can limit the rate of connections using iptables with statements like the following (the numbers are illustrative: these rules drop a source address that opens more than four new connections to port 22 within 60 seconds):
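    iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
        -m recent --set --name SSH
    iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
        -m recent --update --seconds 60 --hitcount 4 --name SSH -j DROP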
As you can see, we use this to limit the rate at which ssh connections can be made to our servers. By changing the port number or other parameters, you could use this for just about any situation.
You can address this by putting a reverse proxy in front of the XMLRPC server. The reverse proxy can be configured to hold just one connection to the XMLRPC server while accepting a larger number of connections from the public, say 10.
For example, a backend Apache server might have MaxClients set to "1", while a front-end Nginx reverse proxy might have its connection limit set to 10 clients using worker_processes and worker_connections.
So up to 10 clients can connect to the reverse proxy and be queued until the XMLRPC server is available (subject to some time-out value). If there are more than 10 connections at once, the reverse proxy might simply fail to respond. So you probably want to tune this so that you queue as many connections as you can responsibly handle.
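As a rough illustration of the Nginx side (the port, path, and time-out are assumptions for the sketch, not recommendations):

    # Illustrative front end for a single-threaded XML-RPC backend.
    worker_processes  1;

    events {
        # Hard cap on simultaneous connections for the worker. Each proxied
        # request uses two connections (client side plus upstream side), so
        # this allows roughly five concurrent clients.
        worker_connections  10;
    }

    http {
        server {
            listen 80;

            location /RPC2 {
                proxy_pass http://127.0.0.1:8000;  # the XML-RPC backend
                proxy_read_timeout 30s;            # the time-out mentioned above
            }
        }
    }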
The details are implementation dependent. I have used both HAProxy and Nginx as reverse proxies successfully, and find Nginx more pleasant to work with.
I also agree with the above feedback that being able to serve only one request at a time on the backend sounds like it could be a problem!