We have a web server in our DMZ serving an ASP.NET web application. The application has been live for around 2 months and is working great, but we are getting periodic emails from users saying that they cannot access the site because they get a timeout, a broken link, a page-not-found error, etc.
My first thought is that it is something on their end as we have had people test the site from literally around the world with no issues and there have been no known downtimes. My problem is I don't want to just tell the user, "Problem is on your end, figure it out." I would like to have a way to prove it to them or maybe some steps for them to prove it to themselves.
Any suggestions for myself or the users with issues?
Edit: To clarify a bit more, the problem is with the same users over and over (about 5 total so far), and they can't access the site at all, so it's not a page-specific issue. Some great answers so far; I wish I could mark more than one as the answer, as they are all good.
Thanks for the quick turnaround as well. It was faster to ask and get an answer here than to contact my in-house server/networking group :)
You could use this site to verify your site is up from 'outside'
If the user is more savvy and willing to do a little extra work for you, have them load the 'YSlow' plug-in into Firefox and then visit your site. That will help identify performance bottlenecks, broken links etc.
There are many services that will test your site from multiple locations, and give you a report of response times from each. One such site is mon.itor.us
You also might want to check your server settings and logs. Perhaps you are maxing out your allowed number of connections, or bandwidth, memory or CPU capacity?
Use a remote monitoring service that checks from multiple locations. Website Pulse is a good one that's cheap and easy to set up yourself. It can give you an idea of how your site performs from different parts of the world and on different networks.
They can send you emails and SNMP traps to let you know when your site is slow or not answering.
I ran into this problem at my last company. I would remote into my machine at home and check the website from there. The one problem with doing it this way was that I was on the same ISP as our servers, so it wasn't a true check from the outside, just from outside our firewall.
Now I use downforeveryoneorjustme.com in addition to remoting into my home machine to check.
Also, if your company has multiple sites, you can use Nagios or even PowerShell to check websites and alert you if they go down. Just make sure to use your public-facing address in the check.
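To make the scripted check concrete, here is a minimal sketch in Python rather than PowerShell. It follows the Nagios plugin convention of exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The function names, the 2-second warning threshold and the timeout are my own assumptions for illustration, not anything Nagios prescribes.

```python
import time
import urllib.error
import urllib.request

# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(status, elapsed, warn_after=2.0):
    """Map an HTTP status (None if unreachable) and response time to a state."""
    if status is None:
        return CRITICAL, "CRITICAL - no response"
    if status >= 400:
        return CRITICAL, f"CRITICAL - HTTP {status}"
    if elapsed > warn_after:
        return WARNING, f"WARNING - HTTP {status} in {elapsed:.2f}s"
    return OK, f"OK - HTTP {status} in {elapsed:.2f}s"


def check(url, timeout=10.0):
    """Fetch url once and return (exit_code, message)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None      # DNS failure, connection refused, timeout, ...
    return classify(status, time.monotonic() - start)
```

To use it as a check, call check() with your public-facing URL, print the message and exit with the returned code. Run it from a machine at another site so the request actually crosses the public internet.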
I have used Pingdom with great success. You can create checks that test DNS, ICMP and HTTP connectivity, as well as HTTP GET and POST methods, to ensure that your site is returning a valid response and that scripting/forms are working properly.
Pricing is reasonable, and the checks originate from either 5 or 30 locations (Basic or Business account, respectively). They also track response times from each location, so you get a sense of how your site performs from around the world.
If it's an intermittent problem, it could be caused by transient network issues anywhere along the way, or by server performance issues. The number of possible causes is HUGE! You need to eliminate the server as the cause of the problem: check the OS event logs, IIS error logs, etc. Try asking the affected users to contact you immediately when the problem happens, and ask them to run a tracert or pathping to the server to diagnose network issues. Check the server for high load while the issue is happening.
For a more conclusive answer, we would need more information.
Of course there is a "normal" amount of downtime for websites... you could try monitoring the site yourself from outside and checking its uptime. As long as it's above the 4 or 5 nines in your SLA, that's sometimes all that matters.
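For a sense of scale, here is the arithmetic behind "4 or 5 nines" as a small Python sketch. The function names are made up for illustration, and it assumes a 365-day year and evenly spaced monitoring checks.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime permitted over period_days at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


def measured_availability(checks_ok, checks_total):
    """Availability in percent, from evenly spaced monitoring checks."""
    return 100.0 * checks_ok / checks_total


for target in (99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

Four nines works out to roughly 52.6 minutes of downtime per year and five nines to roughly 5.3 minutes, so a handful of users with occasional failures can easily fit inside an SLA that is technically being met.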
If you already have people from around the world using the site you should point the users having the issue to something like http://downforeveryoneorjustme.com/. This is to prove to them that whatever the issue is, it's not you. You should also have set up some sort of internal monitoring that actually loads the webpages. Monitoring a website is not simply getting a 200 OK message. You need a monitoring solution that actually looks for something on the correct page. If there is a backend connection of some sort (SQL DB, ADAM authorization) the monitoring solution should be able to load a page using that as well.
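As a rough illustration of "looking for something on the correct page", here is a minimal Python sketch. The marker string and function names are assumptions: pick some text that only appears when the page rendered correctly, ideally something that comes out of the backend (a value read from the database, a logged-in user name, etc.).

```python
import urllib.error
import urllib.request


def evaluate(status, body, marker):
    """Return (healthy, reason) for one fetched page."""
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    if marker not in body:
        return False, f"marker {marker!r} not found in page"
    return True, "ok"


def check_page(url, marker, timeout=10.0):
    """Fetch url and verify that the expected marker text is present."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered with 4xx/5xx; keep the body for the report.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        return False, f"request failed: {err}"
    return evaluate(status, body, marker)
```

The point of separating evaluate() from the fetch is that a friendly error page served with a 200 status still fails the check, because the marker is missing.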
Proving it one way or another is difficult, as internet routing can be interesting, as can DNS caching.
Remember that it may be neither you nor them that has a problem; it may be a bit of the internet between you. Like your ISP's ISP's DNS. Or their ISP's ISP's ISP's routing. Or your web server's prefix might be blackholed by someone else's AS advertising too wide a netmask (it happened to Google!).
To know definitively, you need to do an nslookup test of your site from their location, ideally starting at the root nameservers (e.g. a.root-servers.net.) and working down. You want to try all of the DNS servers in the path between the root and the authoritative DNS servers for your web server's hostname, as it occasionally happens that one of the authoritative servers is fubar while the others are OK, so it works for, say, you but not for them.
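If you want to script the "ask each server directly" part instead of typing nslookup commands, something like the following Python sketch may help. It uses only the standard library, sends a non-recursive A query over UDP to one server and decodes just the 12-byte DNS header, so it tells you whether a given server answers, refers you onward or errors out, but it does not parse the records themselves. All names here are my own, and the 3-second timeout is an assumption.

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Flags are 0: a standard query with recursion NOT requested, which is
    # appropriate when asking root or authoritative servers directly.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(response):
    """Decode the fixed 12-byte DNS header of a response."""
    _id, flags, _qd, ancount, nscount, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "rcode": flags & 0x000F,                # 0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN
        "authoritative": bool(flags & 0x0400),  # AA bit
        "answers": ancount,
        "authority_records": nscount,           # non-zero with no answers suggests a referral
    }


def ask(server_ip, name, timeout=3.0):
    """Send one query to server_ip and return the decoded header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

Call ask() once per nameserver IP for your hostname and compare the results. A server that times out or returns a non-zero rcode while its siblings answer authoritatively is the kind of one-bad-server case described above. To see the actual referral names you still need nslookup or dig.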
Assuming DNS is OK, they then need to see if packets can get to you, i.e. a ping in the first instance. And you need to be sure your packets can get back, so ping them back. Assuming all is well, you probably want to get them to use wget, curl or telnet against your web server and do a GET by hand (to eliminate browser caches). Then it's probably reasonable to say your site is reachable, assuming it is. And if not, you will have a reasonable idea of where the problem is.
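For anyone who has not done a GET by hand before, this is roughly what the telnet session amounts to, sketched in Python over a plain TCP socket. It assumes plain HTTP on port 80 (HTTPS would need the ssl module on top), and the function names are just for illustration.

```python
import socket


def build_get_request(host, path="/"):
    """The same text you would type into a telnet session to port 80."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_code(raw_response):
    """Extract the numeric status from a raw response, e.g. 200."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    return int(status_line.split(" ", 2)[1])


def raw_get(host, port=80, path="/", timeout=10.0):
    """Open a plain TCP connection, send the request, return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because nothing here goes through a browser, there is no browser cache or proxy setting in the way. If raw_get() fails to connect from the user's machine but works from yours, the problem is somewhere on the path between you rather than in the application.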
As you can see, this is a non-trivial matter.
As others have suggested, some commercial services can help, as can locating your own servers in multiple locations on multiple networks and having them check each other.
If you need a fairly polished monitoring service, I'd recommend looking at something along the lines of Keynote or Gomez.
You need an external service to check it if you host the web server, or you could house the monitoring service yourself if the web server is external.
In any case (well, except perhaps in your exact case), you normally need to do more than just check for a response...
Configure checks for specific content in the responses, so that a response that is just an error page or some other site failure doesn't slip through as a green light, leaving you walking around thinking all is fine when it's really not ^^
Monitoring any service isn't as easy as it sounds, which is why really useful suites are often extremely complex and/or expensive. Of course, the simple methods may cover the basics, especially perhaps in this specific case.