I'm using Analog 6 for web stats, and I'm surprised to see over a million 404s over 54 days. Am I reading this correctly? Is this an unusual ratio of 404s to "200 OK" page views? I don't see any 404s in the list of actual URLs; where would I find a list of the broken URLs? The site is a mix of HTML, WordPress, and ASP pages on Unix/Apache, if that matters.
 Requests  Status code
6,548,392  200 OK
      807  206 Partial content
1,830,136  301 Document moved permanently
   61,795  302 Document found elsewhere
3,091,342  304 Not modified since last retrieval
    3,042  400 Bad request
   49,012  403 Access forbidden
1,043,694  404 Document not found
    2,936  500 Internal server error
      411  503 Service temporarily unavailable
General stats:
Successful requests: 9,640,541
Average successful requests per day: 183,490
Successful requests for pages: 1,620,543
Failed requests: 1,099,095 (20,066)
The list of broken URLs would be in the actual log files. Right now roughly 8% of all requests to your site are coming back 404 (about one 404 for every six 200 OK responses). That does seem unusually high.
If I were to guess, I'd bet that your page template includes a link to a broken image, JavaScript, or CSS file.
A quick grep of the log files will probably reveal most of the details.
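For example, something along these lines will surface the most frequently requested missing URLs. This is a sketch only: the log path and the combined log format (status code in field 9, requested path in field 7) are assumptions; adjust both for your setup.

    # Count 404s per URL and show the top 20 offenders.
    # Assumes Apache combined log format and this log location.
    awk '$9 == 404 {print $7}' /var/log/apache2/access.log \
        | sort | uniq -c | sort -rn | head -20

If a handful of URLs account for most of the million hits, a broken asset referenced from a shared template is the likely culprit.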
I agree that's a rather high number of 404s, but it might just be automated bots probing for known holes in common software.
Granted it's not quite the same scale, but I see tens of thousands of 404s a month on our web server, and analysing the URLs, it mostly looks like bots trying known SQL injections against hundreds of different products (none of which we have installed).
It's a sizable task up front, but once you exclude the exploit URLs from whatever method you use to find genuine 404s, the picture gets much more accurate.
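For instance, you could drop the obvious scanner noise before counting. A sketch only: the exclusion patterns and log path below are illustrative guesses; build the real list from what you actually see in your own logs.

    # Tally 404s again, this time ignoring requests for software we don't run.
    # The patterns are examples, not a definitive exploit-scanner list.
    awk '$9 == 404 {print $7}' /var/log/apache2/access.log \
        | grep -v -E 'phpmyadmin|\.env|\.git|setup\.php' \
        | sort | uniq -c | sort -rn | head -20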
If you can't get access to the raw logs as already suggested, consider crawling your site to find broken links. See the W3C Link Checker: tick "Check linked documents recursively" and set a recursion depth that makes sense for your site.
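If the web-based checker is awkward for a site this size, one rough alternative (not the W3C tool) is to spider the site with wget and pull the 404s out of its log; the depth and hostname below are placeholders.

    # Crawl without saving anything (--spider), recurse 3 levels deep,
    # and log every response; then list the requests that came back 404.
    wget --spider -r -l 3 -o spider.log http://www.example.com/
    grep -B 2 '404 Not Found' spider.log

Either way, a crawl will only find 404s that are actually linked from your own pages; bot noise like the SQL-injection probes mentioned above won't show up in it.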