We have a J2EE webapp on Tomcat which uses ehcache with multicast discovery, except it's not discovering anything. There appears to be no multicast traffic visible on the network, but we're a little unclear how to really troubleshoot it. We have a few positive things so far...
if I tcpdump for the relevant multicast IP address on one device and ping it from the other then I see the echo requests coming in.
if we run a simple java multicast listener tool that our web developers provided then it can see the multicast requests from the machine that it is on.
But the negatives also...
tcpdumping, we see no other traffic on that address at all. Even whilst the java tool is printing periodic multicast summaries, "tcpdump -vn -i eth3 ether multicast" still shows nothing, where eth3 is the NIC on which the route for 224.0.0.0/4 lives. There is no routing or IGMP involved; all traffic is on a single VLAN, on which unicast traffic is flowing fine.
the java tool sees no other traffic either, only that of the local machine.
It may be relevant that these are CentOS VMs running on a series of ESX 4.1 systems. We intend, come the morning, to install systems on the same ESX machine and see whether they can communicate when only the vSwitch is involved, but as our level of multicast-specific knowledge is pretty low compared to our unicast knowledge, we're kinda stumped. As far as we are aware there is nothing relevant to do on either the Cisco switches or the ESX-level networking, yet the multicast discovery is just not happening.
If anyone can give pointers to useful tools for troubleshooting this, if not outright pointers for resolution, it'd be really appreciated.
I've used tcpdump a fair bit but don't find it very easy, so I use ngrep, which supports regex too.
ngrep -t -d eth3 '' broadcast
and
ngrep -t -d eth3 '' multicast
should both put the interface into promiscuous mode and achieve the same thing.
I'm not suggesting that this is much different from what you're trying with tcpdump, but in case tcpdump is misbehaving for some reason, ngrep is another tool to try on top of the Java tool.
Sadly it wasn't a network problem at all. It turned out our developers had put a TTL of 0 in the config files for ehcache and only ever bothered to test their code on a single system. Changing it to 1 lets the traffic reach the whole IP subnet. Job done. Thanks
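
For anyone hitting the same thing, here is a rough Java sketch of why the TTL mattered. It is not our developers' tool or ehcache's own code, just a minimal probe that sends one multicast datagram with a chosen TTL. The class name, the helper scopeFor, and the default group/port (230.0.0.1:4446) are made up for illustration, so substitute whatever your ehcache config actually uses. With a TTL of 0 the datagram should never leave the sending host, which matches what we saw: the local listener got traffic, but tcpdump on the wire and the other machines saw nothing.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical probe class, for illustration only.
final class MulticastTtlProbe {

    /** Rough description of how far a multicast datagram with this TTL can travel. */
    static String scopeFor(int ttl) {
        if (ttl < 0 || ttl > 255) {
            throw new IllegalArgumentException("TTL must be 0-255, got " + ttl);
        }
        if (ttl == 0) {
            return "host-only";     // never leaves the sending machine
        }
        if (ttl == 1) {
            return "local-subnet";  // goes on the wire, but routers will not forward it
        }
        return "routed";            // may cross multicast routers, up to ttl hops
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults; pass your real group, port and TTL as arguments.
        String group = args.length > 0 ? args[0] : "230.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4446;
        int ttl = args.length > 2 ? Integer.parseInt(args[2]) : 1;

        byte[] payload = "ttl-probe".getBytes(StandardCharsets.UTF_8);
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(ttl); // the setting that was 0 in our case
            socket.send(new DatagramPacket(
                    payload, payload.length, InetAddress.getByName(group), port));
        }
        System.out.println("Sent probe to " + group + ":" + port
                + " with TTL " + ttl + " (" + scopeFor(ttl) + ")");
    }
}
```

Running it once with a TTL of 0 and once with 1 while watching tcpdump or ngrep on a second machine should show the difference, assuming nothing else on the network is filtering multicast.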