My organization has a Juniper SSG20-WLAN that routes our traffic to the outside world. We've been having intermittent problems with our internet connection so I wrote up a Python script to ping the internal interface of the router, the external interface, a couple of our internal servers, the ISP router our router talks to, their upstream provider, and Google and Yahoo for good measure. It does that about every minute.
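A monitor along these lines might look like the sketch below. It is not the actual script described above: the target names and addresses are placeholders (documentation-range IPs), and the `ping` flags assume a Linux `ping` (`-c` for count, `-W` for the reply timeout in seconds).

```python
import subprocess
import time
from datetime import datetime

# Hypothetical targets, ordered nearest to farthest; replace with real addresses.
TARGETS = [
    ("router-internal", "192.0.2.1"),
    ("router-external", "198.51.100.1"),
    ("isp-router", "198.51.100.254"),
    ("google", "www.google.com"),
]


def ping(host, timeout_s=2):
    """Return True if a single echo request to host gets a reply (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_all(targets, ping_fn=ping):
    """Ping every (name, host) pair once and return {name: reachable}."""
    return {name: ping_fn(host) for name, host in targets}


def format_line(results, now=None):
    """Render one timestamped log line, e.g. '<timestamp> a=up b=DOWN'."""
    now = now or datetime.now()
    states = " ".join(
        f"{name}={'up' if ok else 'DOWN'}" for name, ok in results.items()
    )
    return f"{now.isoformat(timespec='seconds')} {states}"


if __name__ == "__main__":
    while True:
        print(format_line(check_all(TARGETS)), flush=True)
        time.sleep(60)  # roughly once a minute, as in the question
```

Logging every target on one timestamped line makes it easy to see afterwards which hops dropped out together.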
What I have found is that when our internet goes out, our Juniper router ceases responding to pings on the external interface. Everything past that is, of course, unreachable. The internal interface and our internal servers continue to echo back without interruption.
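The reasoning here, that the nearest failing hop is the one worth looking at, can be written down as a small helper. This is only an illustration: the hop names are made up, and the reachability values would come from whatever the monitoring script records.

```python
def first_failure(results):
    """Given [(name, reachable), ...] ordered nearest to farthest,
    return the name of the first unreachable hop, or None if all replied."""
    for name, reachable in results:
        if not reachable:
            return name
    return None


# Hypothetical snapshot taken during an outage, matching the symptoms described.
outage = [
    ("router-internal", True),
    ("internal-server", True),
    ("router-external", False),
    ("isp-router", False),
    ("google", False),
]

print(first_failure(outage))  # router-external
```

Everything beyond the first failing hop fails regardless, so those results add no information. Note that this only narrows down where replies stop; it does not say whether that hop itself or the path leading to it is at fault.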
None of the counters indicate dropped packets of any type. They all look normal. The logs complain about VIP servers being unavailable but otherwise nothing indicative of network issues.
My questions are these:
- Does this exonerate our ISP? Or, contrariwise, might a problem with the connection be causing the external interface to go down?
- Is there somewhere else in the SSG20, besides the system log and counters, that might help me track down info on the problem?
UPDATE: It turned out that one of the switches between my monitoring box and the router was a router itself, and it was occasionally diverting traffic from the gateway to itself. Kudos to those who made suggestions along those lines. I'm not really sure which answer to mark as accepted, as it was really the suggestions in the comments that turned out to be right.
Thanks for the suggestions.
I work for an ISP, and what I can tell you about the routers we provide for our T1s is that when the internet connection goes down, both the WAN and LAN interfaces stop responding to pings. We don't use Junipers, but this is the case on the Cisco 1841s, Samsung Ubigate 1000s, and Netopias that we use. It has to do with the way the IP is provided and the way the routed block is provided through the WAN IP, which makes them unreachable without a connection to our core routers.
How often do the drops occur? Is there any pattern you can determine (time of day, traffic load, etc.)? Did the problem appear after a period of things working correctly? What type of media is your WAN interface (Ethernet, T1 WIC, etc.)?
If it does happen to be Ethernet, then you might check whether it is set to auto-negotiate. If so, you might try "hard coding" the line settings (speed and duplex), just in case it is an auto-negotiation issue, which occurs often enough.
If it is a T1 interface, then you should start by going through the T1 stats/counters, looking for resets, FECN (forward explicit congestion notification), BECN (backward explicit congestion notification), etc. High values in these counters may indicate issues with the carrier (need to reset LMI, LMI setting/line encoding issues, etc.).