I have a box running Mikrotik RouterOS, which is set up to do transparent web proxying, as described here.
In short, I have a destination NAT firewall rule that redirects all port 80 traffic to port 8080 on the router, where the Mikrotik local web proxy receives it. The local web proxy then makes the web request on the client's behalf, in this case to a parent web proxy server (which in turn makes the real web request).
My question is, how will this two-part process get reported in the logging of traffic flow information (netflow)?
Looking at the logged information, what I seem to be seeing is this:
- One flow recorded from client machine (private IP address) to remote proxy (8080)
- Another flow recorded from router to remote proxy (8080)
The original request that the client made to port 80 isn't recorded.
I want to write code to analyse traffic usage, so I want to be sure I'm not losing information if I discard the latter of these.
You can check the URLs passed in the HTTP requests. If the URLs behind the two flows match, the flows are duplicates of the same traffic and you can simply discard one of them. As you said, the first flow is the more useful one to keep, because it records the client's IP address.
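Once you are satisfied that the router-to-proxy flows only duplicate the client flows, dropping them in your analysis code could look roughly like the sketch below. It assumes each flow record has already been parsed into a dict with `src`, `dst`, `src_port`, `dst_port` and `bytes` keys (hypothetical field names, not what any particular netflow collector emits), and that `ROUTER_IP`, `PROXY_IP` and `PROXY_PORT` are placeholders for your own router and parent proxy. It skips any flow between the router and the parent proxy, in either direction, and totals the remaining bytes per client.

```python
from collections import defaultdict

# Placeholder addresses: substitute your router and parent proxy.
ROUTER_IP = "192.0.2.1"
PROXY_IP = "198.51.100.10"
PROXY_PORT = 8080


def is_router_proxy_flow(flow):
    """Return True for the router's own leg to (or from) the parent proxy."""
    outbound = (
        flow["src"] == ROUTER_IP
        and flow["dst"] == PROXY_IP
        and flow["dst_port"] == PROXY_PORT
    )
    inbound = (
        flow["src"] == PROXY_IP
        and flow["dst"] == ROUTER_IP
        and flow["src_port"] == PROXY_PORT
    )
    return outbound or inbound


def bytes_per_client(flows):
    """Sum bytes per client, ignoring the duplicate router-to-proxy leg."""
    totals = defaultdict(int)
    for flow in flows:
        if is_router_proxy_flow(flow):
            continue
        # The client is whichever endpoint is not the parent proxy.
        client = flow["dst"] if flow["src"] == PROXY_IP else flow["src"]
        totals[client] += flow["bytes"]
    return dict(totals)


# Made-up records: one client request and response, plus the router's copy.
sample_flows = [
    {"src": "10.0.0.5", "dst": PROXY_IP, "src_port": 51000, "dst_port": 8080, "bytes": 1200},
    {"src": ROUTER_IP, "dst": PROXY_IP, "src_port": 40000, "dst_port": 8080, "bytes": 1200},
    {"src": PROXY_IP, "dst": "10.0.0.5", "src_port": 8080, "dst_port": 51000, "bytes": 50000},
    {"src": PROXY_IP, "dst": ROUTER_IP, "src_port": 8080, "dst_port": 40000, "bytes": 50000},
]

print(bytes_per_client(sample_flows))
```

Note that this filter also drops any traffic the router itself sends to the parent proxy for its own purposes, so it is only safe if the router has no other reason to talk to that address and port.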