I'm implementing a network monitoring solution for a very large network (approximately 5000 network devices). We'd like to have all devices on our network send SNMP traps to a single box (technically this will probably be an HA pair of boxes) and then have that box pass the SNMP traps on to the real processing boxes. This will allow us to have multiple back-end boxes handling traps, and to distribute load among those back-end boxes.
One key feature that we need is the ability to forward the traps to a specific box depending on the source address of the trap. Any suggestions for the best way to handle this?
Among the things we've considered are:
- Using snmptrapd to accept the traps, and having it pass them off to a custom-written Perl handler script that rewrites the trap and sends it to the proper processing box
- Using some sort of load balancing software running on a Linux box to handle this (having some difficulty finding many load balancing programs that will handle UDP)
- Using a Load Balancing Appliance (F5, etc)
- Using IPTables on a Linux box to route the SNMP traps with NATing
We've currently implemented and are testing the last solution: a Linux box with IPTables configured to receive the traps and then, depending on the source address of the trap, rewrite it with a destination NAT (DNAT) so the packet gets sent to the proper server. For example:
# Range: 10.0.0.0/19 Site: abc01 Destination: foo01
iptables -t nat -A PREROUTING -p udp --dport 162 -s 10.0.0.0/19 -j DNAT --to-destination 10.1.2.3
# Range: 10.0.33.0/21 Site: abc01 Destination: foo01
iptables -t nat -A PREROUTING -p udp --dport 162 -s 10.0.33.0/21 -j DNAT --to-destination 10.1.2.3
# Range: 10.1.0.0/16 Site: xyz01 Destination: bar01
iptables -t nat -A PREROUTING -p udp --dport 162 -s 10.1.0.0/16 -j DNAT --to-destination 10.3.2.1
This should work with excellent efficiency for basic trap routing, but it limits us to what IPTables can match and filter on, so we're concerned about flexibility for the future.
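For comparison, the same source-based decision can be sketched in user space, where the match logic is free to grow beyond what IPTables offers. This is only a minimal sketch: the `ROUTES` table and the `pick_backend` helper are hypothetical names, and the ranges and destinations are copied from the example rules above.

```python
import ipaddress

# Hypothetical routing table mirroring the example rules: (source range, destination).
ROUTES = [
    ("10.0.0.0/19", "10.1.2.3"),   # abc01 -> foo01
    ("10.0.33.0/21", "10.1.2.3"),  # abc01 -> foo01
    ("10.1.0.0/16", "10.3.2.1"),   # xyz01 -> bar01
]

# strict=False masks off host bits, so 10.0.33.0/21 is treated as 10.0.32.0/21.
_TABLE = [(ipaddress.ip_network(cidr, strict=False), dest) for cidr, dest in ROUTES]


def pick_backend(source_ip, default=None):
    """Return the destination for a trap's source address (first match wins)."""
    addr = ipaddress.ip_address(source_ip)
    for network, dest in _TABLE:
        if addr in network:
            return dest
    return default
```

A relay built this way would call `pick_backend` on the source address of each received datagram and forward the payload to the result; unmatched sources fall through to `default`.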
Another feature that we'd really like, but that isn't quite a "must have", is the ability to duplicate or mirror the UDP packets. Being able to take one incoming trap and route it to multiple destinations would be very useful.
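To make the mirroring idea concrete, here is a rough user-space sketch that receives one datagram and re-sends the unchanged payload to several destinations. The function names `fan_out` and `relay_once` are hypothetical. Note one assumption that matters: copies sent this way carry the relay's own address as their UDP source, so preserving the original source would need raw sockets or a different mechanism.

```python
import socket


def fan_out(payload, destinations, sock=None):
    """Send the same payload to every (host, port) pair; return how many were sent."""
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for dest in destinations:
            sock.sendto(payload, dest)
        return len(destinations)
    finally:
        if own_socket:
            sock.close()


def relay_once(listen_sock, destinations):
    """Receive one datagram on listen_sock and mirror it to all destinations."""
    payload, source = listen_sock.recvfrom(65535)
    fan_out(payload, destinations)
    return source  # (ip, port) of the original sender, useful for logging or routing
```

In a real relay, `relay_once` would run in a loop on a socket bound to UDP port 162, which normally requires elevated privileges.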
Has anyone tried any of the possible solutions above for SNMP traps (or Netflow, general UDP, etc) load balancing? Or can anyone think of any other alternatives to solve this?
A co-worker just showed me samplicator. This tool looks to be just about a perfect solution for what I was looking for. From the tool's website:
I would implement the solution myself, as I doubt you will find an existing tool as specific as you want.
I would use a high-level language like Ruby to implement the balance rules and even the trap listener. For instance, using this library seems easy.
Listen to traps:
You should add the balance logic in the on_trap_default block.
Send traps:
To build the daemon you could use the daemon-kit Ruby gem.
If you keep it simple and define good objects, you can maintain the software without much effort.
Your main problem is going to be: how do you know the actual IP of the device you are receiving the traps from?
If you are using SNMP v1, you can get the IP from the header of the trap. If you are using v2 or v3 traps, you will need to correlate the SNMP engine ID with the IP that you have previously fetched from the device. The engine ID is typically not a mandatory config item in most SNMP implementations, so you can't fully rely on it alone.
The fallback is to use the source IP from the UDP packet header. Of course, this will fail if your trap is routed through another EMS/NMS or if you have a NAT between the device and your management application.
If you don't need to support NAT or traps forwarded from another NMS, then just make a copy of the UDP packet and route based on the source IP.
If you do need to support that, you have to parse the SNMP trap and check for an engine ID match for v2/v3; for v1 you can read the address from the agent-address field in the SNMP header.
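As a rough illustration of the v1 case, the sketch below walks just enough of the BER encoding to pull the agent-address field out of an SNMPv1 Trap-PDU, and falls back to the UDP source address otherwise. The helper names (`_read_tlv`, `v1_agent_addr`, `routing_key`) are hypothetical, the parser ignores everything after agent-addr, and engine ID correlation for v3 is not attempted. Treating 0.0.0.0 as "no address" is an assumption, since some agents fill the field with zeros.

```python
def _read_tlv(buf, pos):
    """Return (tag, value_bytes, next_pos) for the BER element starting at pos."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length octets
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def v1_agent_addr(datagram):
    """Return the agent-addr of an SNMPv1 trap as a dotted quad, or None."""
    try:
        tag, message, _ = _read_tlv(datagram, 0)
        if tag != 0x30:                      # outer SEQUENCE
            return None
        tag, version, pos = _read_tlv(message, 0)
        if tag != 0x02 or version != b"\x00":  # INTEGER 0 means SNMPv1
            return None
        _, _, pos = _read_tlv(message, pos)  # skip the community string
        tag, pdu, _ = _read_tlv(message, pos)
        if tag != 0xA4:                      # [4] Trap-PDU (v1 only)
            return None
        _, _, pos = _read_tlv(pdu, 0)        # skip the enterprise OID
        tag, addr, _ = _read_tlv(pdu, pos)
        if tag != 0x40 or len(addr) != 4:    # IpAddress, application tag 0
            return None
        return ".".join(str(octet) for octet in addr)
    except IndexError:                       # truncated or malformed packet
        return None


def routing_key(datagram, udp_source_ip):
    """Prefer the v1 agent-addr; otherwise fall back to the UDP source address."""
    addr = v1_agent_addr(datagram)
    if addr is None or addr == "0.0.0.0":  # assumption: zeros mean "not set"
        return udp_source_ip
    return addr
```

The value returned by `routing_key` is what a relay would feed into its source-to-backend lookup, so traps forwarded by another NMS can still be routed by the originating device where the v1 field is present.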
one more netfilter-based hack:
[ assumption - all traps are sent to 10.0.0.1, which then redirects them to 10.0.0.2, 10.0.0.3, 10.0.0.4 ]
as long as you have one-packet-long snmp traps - this should spread load nicely - in this case across 3 machines. [ although i have not tested it ].
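The spreading idea itself can be sketched in user space. This is not the netfilter rule set, just a round-robin analogue of it: each single-datagram trap is independent, so consecutive datagrams can go to different machines. The backend addresses come from the assumption above; `make_spreader` is a hypothetical helper name.

```python
import itertools

# Backend addresses taken from the stated assumption.
BACKENDS = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]


def make_spreader(backends):
    """Return a function that yields the next backend in round-robin order."""
    ring = itertools.cycle(list(backends))

    def next_backend():
        return next(ring)

    return next_backend
```

A relay would call the returned function once per received datagram and forward the payload to whichever address comes back.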
I think the answer from chmeee is the right way to go. Get rid of UDP and SNMP as early in the process as you can; they are horrible to manage.
I'm now building a system that will put all events (including traps) on a JMS queue and then use all the wonders of enterprise messaging to do load balancing and failover.
To get the original sender's IP, you could try to patch snmptrapd with this patch - https://sourceforge.net/p/net-snmp/patches/1320/#6afe.
That modifies the payload and leaves the IP headers intact, so it doesn't interfere with your routing and/or NATting.