We host a web application that is customized for multiple customers. Each customer accesses their version of the site via their own custom subdomain. We are running into an issue where ONE customer claims there is a problem with their site -- site unavailable, certain functions timing out, or just generically 'slow'.
Our support representatives get a report/email/call from the customer when this happens and try to reproduce the issue. More often than not, they cannot. If the issue persists, they end up engaging our IT group and setting up a troubleshooting call with the customer's IT group.
The customers are usually on their own corporate networks, although their sites can be accessed via the internet as well. Once our support rep CANNOT reproduce the issue AND no other customers are reporting problems, we assume the issue is not systemic and has a high probability of being within the customer's network.
Since we cannot ping/traceroute to/from the clients making the requests and see all the hops, it turns into both sides pointing the finger at each other, with seemingly no way to PROVE where the issue is.
- What can we do to provide EVIDENCE that our service and network are not the issue?
- What is the best way to show that from our webserver to our ISP (in both directions) is OK?
- Are there any recommended tools we can put in place to monitor #2?
There are many services (some even free) that will "monitor" your site from various parts of the world and report metrics, response times and the like. Personally, I would use one or two of these so I could point the customer at them. (The ones that come to mind immediately are pingdom, uptrends and loadimpact, but this is not my specialty, so I'm sure there are better, more specific services out there that would fit your use case better.)
In any event, the point of these tools and services is to get you to a place where you can credibly say: "Look, it's working and pretty zippy for the rest of the world, none of our other customers have issues, and these worldwide results pretty firmly suggest it's an issue specific to your setup."
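If you want something in-house to supplement those services, even a tiny scheduled probe run from a neutral vantage point (say, a cheap VPS outside your own network) builds a useful response-time log. A minimal sketch in Python -- the URL is a placeholder, substitute one of your real customer subdomains:

```python
import time
from urllib.request import urlopen

def check_site(url, timeout=10, opener=urlopen):
    """Fetch `url` once and return (status_code, elapsed_seconds).

    Returns (None, elapsed) if the request fails, so the caller can
    log outages alongside slow responses.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except Exception:
        status = None
    return status, time.monotonic() - start

if __name__ == "__main__":
    # "customer1.example.com" is a made-up placeholder hostname.
    status, elapsed = check_site("https://customer1.example.com/")
    print(f"status={status} elapsed={elapsed:.3f}s")
```

Run it from cron every minute or two and keep the output; a log showing steady sub-second responses during the exact window a customer reported "slow" is the kind of evidence you're after.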
No, it's not hard evidence, but if they're not going to let you host a monitoring box on their network, it's about as good as you'll get. You could also try talking reason to them (though that sounds like a waste of time): compare how little data is involved in a Google search versus a hosted service like yours. Of course Google and Bing are going to seem fast -- they seem fast on a 56k modem. That says nothing about the state of the customer's network utilization, because so little data crosses it.
At the end of the day, bear in mind that not all customers are worth having. If they're costing you more money in support troubleshooting their network bullshit than they generate as customers, it's time to tell them to take their business elsewhere.
I recommend asking the customer to install TeamViewer and politely requesting that they share the TeamViewer key, so that you can see what's going on and (under the customer's supervision) check all the relevant settings from 'their' point of view.
Also remind them to uninstall TeamViewer, or generate new TeamViewer keys, afterwards for their own security.
Edited to add: I've seen this happen firsthand on my own network, though with my network on the 'customer' side. It turned out there was a mistake in our proxy.pac that caused parts of the website to be fetched "DIRECT" instead of through the corporate proxy, so those parts of the site always failed.
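For what it's worth, that kind of proxy.pac mistake is often a too-broad match rule. PAC files are JavaScript by specification; this is a made-up illustrative sketch (hostnames and proxy address are invented), not the actual file from my incident:

```javascript
function FindProxyForURL(url, host) {
    // Intended: only truly internal hosts bypass the proxy.
    if (host.endsWith(".intranet.example.com")) {
        return "DIRECT";
    }
    // Bug of the kind described above: this condition is too broad,
    // so a hosted app's subdomain (e.g. customer1.example.com) also
    // matches and goes DIRECT, bypassing the corporate proxy.
    if (host.indexOf("example.com") !== -1) {
        return "DIRECT";
    }
    return "PROXY proxy.corp.example:8080";
}
```

If the firewall only permits outbound web traffic through the proxy, any host caught by the broad rule silently fails for users on that network while looking perfectly healthy from everywhere else -- which matches the symptom in the question.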