(Preface: This isn't precisely a "server, network, or infrastructure" question, but it's even further from the preferred topics on SF's sister sites, so I'll take a small risk and post it here.)
One of my clients wants to put a Google Custom Search on their web site. The goal is to restrict the custom search to results from only five specific web sites. We don't want advertisements displayed next to the results, so we're paying for the Business Edition. On the last page of the process there is this somewhat cryptic question: "Approximately how many web pages will you need to search?"
At first blush, this question seems self-explanatory. I assumed it meant approximately how many pages there are, combined, on the five target web sites. But either I've overanalyzed the grammar or the question is a little ambiguous (I suspect a little of both), because I'm unsure how to answer it. I'd like to know more about what this question is probing for, because it's the difference between $100 and $2,000 USD. The options listed are:
- Up to 1,000 web pages: one year is just $100
- Up to 5,000 web pages: one year is $250
- Up to 25,000 web pages: one year is $750
- Up to 100,000 web pages: one year is $2,000
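If the question does mean the combined page count of all five sites, choosing an option reduces to summing the per-site estimates and taking the smallest tier that covers the total. A minimal sketch of that reading, assuming the tiers are exactly the four listed above; the per-site page counts are hypothetical placeholders, and `pick_tier` is an illustrative name, not anything Google provides:

```python
# Tiers as listed in the question: (page limit, price in USD for one year).
TIERS = [
    (1_000, 100),
    (5_000, 250),
    (25_000, 750),
    (100_000, 2_000),
]


def pick_tier(page_counts):
    """Return (limit, price) for the smallest tier covering the combined total.

    Assumes the question means the total number of pages across all the
    sites being searched; returns None if the total exceeds the largest tier.
    """
    total = sum(page_counts)
    for limit, price in TIERS:
        if total <= limit:
            return limit, price
    return None


# Hypothetical page counts for the five target sites.
estimates = [300, 1_200, 4_500, 800, 9_000]
print(sum(estimates), pick_tier(estimates))  # prints: 15800 (25000, 750)
```

Under this reading, one large site is enough to push the whole search into a higher tier, which is why the estimate matters so much.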
Oddly, I've Googled that phrase and found nothing about it. There's no FAQ page, forum question (except one that I started), or other manual that I could read. I'd gladly RTM... that is, if I could find it. Any help would be greatly appreciated.
Bonus question: If the question means what I first thought it did, how on earth am I supposed to estimate the approximate number of pages on a web site? I know that several of them are rather huge and I'm at a total loss to figure that one out.
EDIT: Unless I can find definitive information otherwise, I'll accept Zypher's answer. We decided to forge ahead using the free Google Custom Search, which inserts ads into the search results. We'll see whether the ads are poor or point to our competitors. If it becomes a problem, we can always pull the plug in an instant and fork out the cash for the ad-free search... or buy a Google Mini. =)
To answer the bonus question: if the site is already being indexed by Google, search for "site:host.domain", adjusting "host.domain" as appropriate. The figure at the top right ("Results 1-10 of about xxx from host.domain") gives you a page count, but it includes more than just web pages. If you want only HTML pages, add "filetype:html" to the search query.

It seems to me you have it exactly right. Another phrasing would be "How many individual pages does Google have to index to provide searching across all of your content?"
As far as how to figure out how many pages there are on those sites, your best option is probably to ask your client. Otherwise the tools here seem like a good starting place.
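Another way to get a rough count, different from the "site:" query, is to count the entries in each site's XML sitemap. A minimal sketch, assuming the site publishes a sitemap following the sitemaps.org protocol (commonly at a path like `/sitemap.xml`, but that location is an assumption). A sitemap lists only the URLs the site owner chose to include, so treat the result as an estimate; gzipped `.xml.gz` sitemaps are not handled. The function names are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(xml_bytes, fetch=None, depth=0):
    """Count <url> entries in a sitemap, following sitemap-index entries.

    fetch is a callable that takes a URL and returns the document as bytes;
    it is only needed when xml_bytes is a sitemap index pointing at child
    sitemaps. depth caps recursion in case of self-referencing indexes.
    """
    root = ET.fromstring(xml_bytes)
    # A <urlset> document lists pages directly as <url> children.
    count = len(root.findall("sm:url", NS))
    # A <sitemapindex> document lists child sitemaps as <sitemap><loc>.
    if fetch is not None and depth < 3:
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            child = fetch(loc.text.strip())
            count += count_sitemap_urls(child, fetch, depth + 1)
    return count


def fetch_url(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


# Usage (hypothetical host; requires network access):
# print(count_sitemap_urls(fetch_url("https://host.domain/sitemap.xml"), fetch=fetch_url))
```

Running this for each of the five sites and adding the results gives a total that can be compared against the tiers in the question.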