And yet one more SharePoint question from MCM yesterday.
What kind of HA do you provision for your SharePoint search databases? I heard from one admin of a very, very large SharePoint farm that they don't have a redundant copy of the search database and they repopulate it if it gets lost. The problem is that to recrawl their 43 million items takes 8 weeks.
I'm not a SharePoint expert, but I'm guessing that without a fully functional search database, functionality is going to be degraded. Does 8 weeks sound right for this? That seems astronomically slow. What's your experience?
Thanks!
I work with a system that does about 112 million files. A full crawl takes about 3 weeks.
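For what it's worth, the implied crawl throughput of the two farms mentioned in this thread can be compared with some quick arithmetic. This is only a back-of-the-envelope sketch; real crawl rates vary with content types, crawler count, and hardware, and the function name here is just illustrative:

```python
# Rough average crawl throughput from the two data points in this thread.
SECONDS_PER_WEEK = 7 * 24 * 3600

def crawl_rate(items, weeks):
    """Average items crawled per second over a full crawl."""
    return items / (weeks * SECONDS_PER_WEEK)

rate_43m = crawl_rate(43_000_000, 8)     # 43M items in 8 weeks
rate_112m = crawl_rate(112_000_000, 3)   # 112M items in 3 weeks

print(f"43M items over 8 weeks:  {rate_43m:.1f} items/sec")   # ~8.9
print(f"112M items over 3 weeks: {rate_112m:.1f} items/sec")  # ~61.7
```

So the 8-week farm is crawling roughly seven times slower per item than the 3-week one, which suggests the 8-week figure reflects farm configuration (crawler count, index servers, hardware) rather than an inherent SharePoint limit.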
I would say this depends on how you set up the crawlers for the farm. Multiple index servers would be best here, to help spread the load out.
My best advice is to put all the DBs on the same HA cluster if it is that important to you. However, search databases are, by design, meant to be regenerated.
Will functionality be degraded? Only search, in that queries will only return results for data that has already been crawled. Content itself will still be fine.
The problem with the MOSS Search database is that it's very tightly coupled with the index file that physically resides on the file system of the farm's index server; I believe transactions are synced down to the millisecond. So if you lose your Search database, your only option (unless you have a specialized SharePoint DR tool) is to rebuild your index and start over with a new Search database, because your index file will be out of sync with the restored database and become corrupt.
The latest version of Microsoft Data Protection Manager 2007 is able to back up the search index and database, but you have to run a special script to enable that functionality. I'm not sure which tools from other vendors can do it; I think several can, but I can't remember them off the top of my head. The only way to restore your index if you're using SQL backups or SharePoint's out-of-the-box backup/restore tools is to rebuild it from scratch.
The previous answer is spot on about managing the size of the search corpus with multiple indexes, although it does add some overhead and complexity to the farm. An additional index server will need to be built, and it can be a challenge to effectively route user queries so they hit the correct index and/or merge the results.
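To illustrate the merge problem the previous paragraph mentions: when each index server returns its own relevance-ordered hit list, the query layer has to interleave them into one ranked list. This is only a generic sketch of that idea (the data and names are hypothetical; SharePoint's actual query federation is internal and not exposed like this):

```python
# Illustrative merge of ranked query results from two hypothetical
# index servers. Each server returns hits already sorted by relevance
# score, highest first.
import heapq

index_a = [("report.docx", 0.92), ("budget.xlsx", 0.75)]
index_b = [("slides.pptx", 0.88), ("notes.txt", 0.40)]

# heapq.merge interleaves the pre-sorted lists in a single pass,
# keeping the combined list ordered by score (descending).
merged = list(heapq.merge(index_a, index_b,
                          key=lambda hit: hit[1], reverse=True))
print(merged)
# [('report.docx', 0.92), ('slides.pptx', 0.88),
#  ('budget.xlsx', 0.75), ('notes.txt', 0.40)]
```

The merge itself is cheap; the hard part in practice is that relevance scores from separate indexes aren't necessarily comparable, which is part of the overhead the answer above warns about.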
The recovery story for the SSP database, which hosts the BDC, Search, and User Profiles, is really bad, because part of it is in SQL Server and part of it is in files on the servers. It is a really poor architecture. If everything were in SQL Server, recovery would be doable, but because part of the search index is on the file system, recovery becomes a nightmare.
We actually had to recover the SSP already. To do this we attached the content database to a new farm and pulled out information for User Profiles (we do not use BDC).
Then we rebuilt the search index by recrawling the content. It is a pain, and it means the SLA for search recovery is poor, but we felt this was the most reliable solution.