Infrastructure: Servers in Datacenter, OS - Debian Squeeze, Webserver - Apache 2.2.16
Situation:
The live server is in use by our customers every day, which makes it impossible to test adjustments and improvements. Therefore we would like to duplicate the inbound HTTP traffic on the live server to one or multiple remote servers in realtime. The traffic has to be passed to the local webserver (in this case Apache) AND to the remote server(s). That way we can adjust configurations and use different/updated code on the remote server(s) for benchmarking and comparison with the current live server. Currently the webserver is listening on approx. 60 additional ports besides 80 and 443, because of the client structure.
Question: How can this duplication to one or multiple remote servers be implemented?
We have already tried:
- agnoster duplicator - this would require one open session per port which is not applicable. (https://github.com/agnoster/duplicator)
- kklis proxy - only forwards traffic to the remote server, but does not pass it to the local webserver. (https://github.com/kklis/proxy)
- iptables - DNAT does only forward the traffic, but does not pass it to the local webserver
- iptables - TEE does only duplicate to servers in the local network -> the servers are not located in the same network due to the structure of the datacenter
- suggested alternatives provided for the question "duplicate tcp traffic with a proxy" at stackoverflow (https://stackoverflow.com/questions/7247668/duplicate-tcp-traffic-with-a-proxy) were unsuccessful. As mentioned, TEE does not work with remote servers outside the local network. teeproxy is no longer available (https://github.com/chrislusf/tee-proxy) and we could not find it anywhere else.
- We have added a second IP address (which is in the same network) and assigned it to eth0:0 (primary IP address is assigned to eth0). No success with combining this new IP or virtual interface eth0:0 with iptables TEE function or routes.
- suggested alternatives provided for the question "duplicate incoming tcp traffic on debian squeeze" (Duplicate incoming TCP traffic on Debian Squeeze) were unsuccessful. The cat|nc sessions (cat /tmp/prodpipe | nc 127.0.0.1 12345 and cat /tmp/testpipe | nc 127.0.0.1 23456) are interrupted after every request/connect by a client without any notice or log. Keepalive did not change this situation. TCP packets were not transported to the remote system.
- Additional tries with different options of socat (HowTo: http://www.cyberciti.biz/faq/linux-unix-tcp-port-forwarding/ , https://stackoverflow.com/questions/9024227/duplicate-input-unix-stream-to-multiple-tcp-clients-using-socat) and similar tools were unsuccessful, because the provided TEE function only writes to the file system.
- Of course, googling and searching for this "problem" or setup was unsuccessful as well.
We are running out of options here.
Is there a method to disable the enforcement of "server in local network" of the TEE function when using IPTABLES?
Can our goal be achieved by different usage of IPTABLES or Routes?
Do you know a different tool for this purpose which has been tested and works for these specific circumstances?
Is there a different source for tee-proxy (which would fit our requirements perfectly, AFAIK)?
Thanks in advance for your replies.
----------
edit: 05.02.2014
here is the python script, which would function the way we need it:
# Python 2 script (print statements, 'thread' module, file()).
import socket
import SimpleHTTPServer
import SocketServer
import sys, thread, time

def main(config, errorlog):
    # Send error output to the log file, then start one listener thread per config line.
    sys.stderr = file(errorlog, 'a')
    for settings in parse(config):
        thread.start_new_thread(server, settings)
    # Keep the main thread alive so the listener threads keep running.
    while True:
        time.sleep(60)

def parse(configline):
    # Each line: <listen port> <local forward port> <remote ip> <remote port>
    settings = list()
    for line in file(configline):
        parts = line.split()
        settings.append((int(parts[0]), int(parts[1]), parts[2], int(parts[3])))
    return settings

def server(*settings):
    try:
        dock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        dock_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        dock_socket.bind(('', settings[0]))
        dock_socket.listen(5)
        while True:
            # Note: only the first 1024 bytes of each request and response are handled.
            client_socket = dock_socket.accept()[0]
            client_data = client_socket.recv(1024)
            sys.stderr.write("[OK] Data received:\n %s \n" % client_data)
            print "Forward data to local port: %s" % (settings[1])
            local_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            local_socket.connect(('', settings[1]))
            local_socket.sendall(client_data)
            print "Get response from local socket"
            client_response = local_socket.recv(1024)
            local_socket.close()
            print "Send response to client"
            client_socket.sendall(client_response)
            print "Close client socket"
            client_socket.close()
            # The remote server gets a copy of the request; its response is not read.
            print "Forward data to remote server: %s:%s" % (settings[2],settings[3])
            remote_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            remote_socket.connect((settings[2], settings[3]))
            remote_socket.sendall(client_data)
            print "Close remote sockets"
            remote_socket.close()
    except:
        print "[ERROR]: ",
        print sys.exc_info()
        raise

if __name__ == '__main__':
    main('multiforwarder.config', 'error.log')
Notes on using this script:
This script forwards a number of configured local ports to another local socket server and to a remote socket server.
Configuration:
Add lines to the config file port-forward.config with contents as follows:
Error messages are stored in the file 'error.log'.
The script splits the parameters of the config file:
Split each config line on whitespace:
0: local port to listen on
1: local port to forward to
2: remote IP address of the destination server
3: remote port of the destination server
and return the settings
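To make the four-field config format above concrete, here is a minimal sketch of the parsing step. It is written for Python 3 (the script in the question is Python 2), and the ports and the 203.0.113.10 address are made-up examples, not values from the original setup. Skipping blank lines and lines starting with # is an addition that the original script does not have.

```python
def parse_config(text):
    """Parse lines of the form: <listen port> <local port> <remote ip> <remote port>."""
    settings = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: blank lines and '#' comments are ignored (not in the original script).
        if not parts or parts[0].startswith("#"):
            continue
        settings.append((int(parts[0]), int(parts[1]), parts[2], int(parts[3])))
    return settings


# Hypothetical example: listen on 8080/8443, pass traffic to the local
# webserver on 80/443, and copy it to a remote test server.
EXAMPLE_CONFIG = """\
# listen  local  remote-ip      remote-port
8080      80     203.0.113.10   80
8443      443    203.0.113.10   443
"""

for entry in parse_config(EXAMPLE_CONFIG):
    print(entry)
```

Each tuple corresponds to one listener, in the same field order as the list above.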
From what you describe, GOR seems to fit your needs: https://github.com/buger/gor/ "HTTP traffic replay in real-time. Replay traffic from production to staging and dev environnements."
It is impossible. TCP is a stateful protocol. The end user's computer is involved in every step of the connection, and it will never answer two separate servers trying to communicate with it. All you can do is collect all HTTP requests on the webserver or on some proxy and replay them. But that will not reproduce the exact concurrency or traffic conditions of a live server.
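The "collect and replay" idea could look roughly like the sketch below. It assumes the requests were collected in Apache's default common/combined access log format, and it only replays GET and HEAD, because an access log does not contain request bodies or the original headers. The function names and the sample addresses are hypothetical.

```python
import http.client
import re

# Matches the start of an Apache common/combined log line (assumed format).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+'
)


def requests_from_log(lines, methods=("GET", "HEAD")):
    """Yield (method, path) for each log line whose method is in `methods`."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") in methods:
            yield match.group("method"), match.group("path")


def replay(host, port, requests, timeout=5.0):
    """Send each (method, path) to host:port one by one; return the status codes."""
    statuses = []
    for method, path in requests:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            response = conn.getresponse()
            response.read()  # consume the body so the connection ends cleanly
            statuses.append(response.status)
        finally:
            conn.close()
    return statuses


# Hypothetical usage against a test server:
# with open("/var/log/apache2/access.log") as log:
#     print(replay("203.0.113.10", 80, requests_from_log(log)))
```

Because the requests are sent sequentially, this illustrates exactly the limitation mentioned above: the timing and concurrency of the live traffic are lost.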
Teeproxy could be used to replicate traffic. The usage is really simple:
a: production server
b: testing server
When you put HAProxy (with roundrobin) in front of your webserver, you can easily redirect 50% of your traffic to the testing site.
TCP, being a stateful protocol, isn't amenable to simply blasting copies of the packets at another host, as @KazimierasAliulis points out.
Picking up the packets at the layer of TCP termination and relaying them as a new TCP stream is reasonable. The duplicator tool you linked to looks like your best bet. It operates as a TCP proxy, allowing the TCP state machine to operate properly. The responses from your test machines will just be discarded. That sounds like it fits the bill for what you want exactly.
It's unclear to me why you've written off the duplicator tool as unacceptable. You will have to run multiple instances of the tool since it only listens on a single port but, presumably, you want to relay each of those different listening ports to different ports on the back-end system. If not, you could use iptables DNAT to direct all the listening ports to a single listening copy of the duplicator tool.
Unless the applications you're testing are dirt simple I expect that you're going to have problems with this testing methodology relating to timing and internal application state. What you want to do sounds deceptively simple-- I expect you're going to find a lot of edge cases.
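The TCP-proxy approach described in this answer (terminate the client connection, relay it as new TCP streams, discard what the test machine answers) can be sketched with Python 3's asyncio. This is not the code of the duplicator tool, only an illustration of the same idea; the function names are made up, and it has known limits: a slow mirror can delay the production path, and the mirror connection is closed as soon as the primary finishes.

```python
import asyncio

CHUNK = 65536


async def copy(reader, writers):
    """Copy bytes from reader to every writer; a writer that fails is dropped."""
    writers = list(writers)
    while True:
        data = await reader.read(CHUNK)
        if not data:
            break
        for w in list(writers):
            try:
                w.write(data)
                await w.drain()
            except OSError:
                writers.remove(w)
    for w in writers:
        if w.can_write_eof():
            w.write_eof()  # pass the end-of-stream on to the other side


async def handle(client_r, client_w, primary, mirror):
    """Relay one client connection to `primary` and copy its bytes to `mirror`."""
    try:
        primary_r, primary_w = await asyncio.open_connection(*primary)
    except OSError:
        client_w.close()
        return
    writers, tasks = [primary_w], []
    try:
        mirror_r, mirror_w = await asyncio.open_connection(*mirror)
        writers.append(mirror_w)
        # Read and throw away whatever the mirror answers.
        tasks.append(asyncio.ensure_future(copy(mirror_r, [])))
    except OSError:
        pass  # mirror unreachable: keep serving production traffic
    tasks.append(asyncio.ensure_future(copy(client_r, writers)))
    try:
        # Only the primary's response is sent back to the client.
        await copy(primary_r, [client_w])
    except OSError:
        pass
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        for w in writers + [client_w]:
            w.close()


async def serve(listen_port, primary, mirror, host="0.0.0.0"):
    """Start listening on listen_port; primary and mirror are (host, port) tuples."""
    async def on_client(reader, writer):
        await handle(reader, writer, primary, mirror)
    return await asyncio.start_server(on_client, host, listen_port)


# Hypothetical usage: listen on 8080, serve from local Apache, mirror to a test host.
# async def main():
#     server = await serve(8080, ("127.0.0.1", 80), ("203.0.113.10", 80))
#     async with server:
#         await server.serve_forever()
# asyncio.run(main())
```

One such listener is needed per port, which matches the point above about running multiple instances. Because the mirror sees the same byte stream but keeps its own application state, the timing and state caveats from this answer still apply.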
I'm trying to do something similar, however, if you are simply trying to simulate the load on a server I would look at something like a load-testing framework. I've used locust.io in the past and it worked really well for simulating a load on a server. That should allow you to simulate a large number of clients and let you play with the configuration of the server without having to go through the painful process of forwarding traffic to another server.
As far as "we would like to duplicate the inbound HTTP traffic on the live server to one or multiple remote servers in realtime", there's one way not mentioned above, which is configuring a mirror port on the switch it's connected to.
In the case of Cisco Catalyst switches, this is called SPAN (more info here). In a Cisco environment you can even have the mirrored port on a different switch.
But the purpose of this is for traffic-analysis so it will be uni-directional - keyword in quoted text in first paragraph above: inbound. I don't think that port will allow any return traffic, and if it did, how would you deal with duplicate return traffic? That will probably just wreak havoc with your network.
So... just wanted to add one possibility to your list, but with the caveat that it will indeed be for one-way traffic. Maybe you can put a hub on that mirror port and have the duplicate server's replies handled by some local client simulator that would pick up initiated sessions and respond to them, but then you would be duplicating incoming traffic to your duplicate server... probably not what you want.
I have also written a reverse proxy / load balancer for a similar purpose with Node.js (it is just for fun, not production ready at the moment).
https://github.com/losnir/ampel
It is very opinionated, and currently supports:
GET: using round-robin selection (1:1)
POST: using request splitting. There is no concept of "master" and "shadow": the first backend that responds is the one that will serve the client request, and all of the other responses will be discarded.
If someone finds it useful then I can improve it to be more flexible.
My company had a similar requirement: to clone a packet and send it to another host (we run market data simulators and needed a temporary solution that would listen to a market data TCP feed, ingest each packet, and also send a clone of each packet to another simulator server).
This binary runs very well. It's a version of TCP Duplicator written in Go instead of JavaScript, so it's way faster, and it works as advertised:
https://github.com/mkevac/goduplicator
There is a tool created by a developer at a Chinese company that may be what you need: https://github.com/session-replay-tools/tcpcopy