My developer's hypothesis is that packets are being dropped at the ARP layer. We increased neigh.default.gc_thresh3 from its default of 1024 to 2048, and now everything looks good. But I want to understand whether there is a way to detect ARP packet loss. I searched for SystemTap scripts and other tools but didn't find anything. Any help is really appreciated.
net.ipv4.neigh.default.gc_thresh3=<n>
There are several articles about ARP table overflow. This article has a good explanation. Also, you could check this bug.
Both articles say that you should see the error
neighbour: arp_cache: neighbor table overflow!
in your dmesg output, as mentioned in the comment by user188737.
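Besides grepping dmesg, the kernel also exposes neighbour table counters in /proc/net/stat/arp_cache (the same data that lnstat reads). The following is a minimal sketch, not a definitive tool: it assumes a Linux host, assumes the file has a header row of counter names followed by one row of hexadecimal values per CPU, and assumes the `entries` column is a table-wide value repeated on every row while the other counters are per-CPU. The function names `parse_neigh_stat` and `report` are made up for this example, and the `table_fulls` column may be missing on older kernels.

```python
from pathlib import Path

# Paths assumed to exist on a Linux host with IPv4 enabled.
STAT_PATH = Path("/proc/net/stat/arp_cache")
THRESH_PATH = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def parse_neigh_stat(text):
    """Parse the header + per-CPU hex rows into a dict of counters.

    'entries' is taken from the first row only (assumed to be table-wide);
    every other counter is summed across the per-CPU rows.
    """
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    totals = {}
    for row in rows:
        for name, value in zip(header, row):
            n = int(value, 16)  # values are printed in hexadecimal
            if name == "entries":
                totals.setdefault(name, n)
            else:
                totals[name] = totals.get(name, 0) + n
    return totals


def report():
    """Print the current table usage and the overflow-related counters."""
    stats = parse_neigh_stat(STAT_PATH.read_text())
    thresh3 = int(THRESH_PATH.read_text())
    print(f"entries:              {stats.get('entries')} (gc_thresh3 = {thresh3})")
    print(f"forced_gc_runs:       {stats.get('forced_gc_runs')}")
    print(f"unresolved_discards:  {stats.get('unresolved_discards')}")
    # None here means the running kernel does not export this column.
    print(f"table_fulls:          {stats.get('table_fulls')}")
```

Calling `report()` periodically and watching whether `table_fulls` or `forced_gc_runs` keeps growing while `entries` sits close to gc_thresh3 should give roughly the same signal as the dmesg message, without depending on the kernel's log rate limiting.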