I wonder what the common guidelines for large-scale SharePoint 2007 architecture are? Mostly, I'd like to plan how many Site Collections are needed, and understand why I should split my intranet into multiple Site Collections instead of just having one to rule them all.
The question that follows from that is what the disadvantages of separate Site Collections are. From what I see, they cannot talk to each other (i.e. the Content Query Web Part (CQWP) cannot read data from another Site Collection) and I have to deploy my features on every Site Collection.
Are there any good resources and white papers on larger-scale SharePoint architecture? I found some material that tells me how many servers I need, but I'm looking for something one level higher, covering Application Pools, Content Databases and Site Collections.
It depends on what you're planning to achieve.
If you're going to have a small farm with a small number of users, then this is something you don't have to take into consideration. However, when you're deploying a really large farm you should pay attention to software boundaries.
As you can see in the TechNet article "Plan for software boundaries", SharePoint performance degrades sharply when certain limits are not taken into account.
I would suggest you pay attention to these performance issues rather than to the number of site collections. Site collections have their pros and cons; however, when you're working with large farms (100GB+), Microsoft suggests using a single site collection and quotas for content databases. Hope this helps.
-Máximo
Among other things, site collections provide an administrative and security boundary. Site collection administrators for site collection A don't have any rights by default on site collection B. You can specify those administrators independently. Also, SharePoint groups exist at the site collection level. So every SharePoint group you create is available for use in all of the sites in a site collection.
For our main intranet that our IT group administers, we've found that one site collection works pretty well. IT controls the security and delegates control of the content. However, for department- or branch-level portals, we've found that creating a site collection with a quota allows us to provide our users with a collaboration portal without requiring too much IT time investment or worrying that they're going to mess up another group's data. The downside of doing that is that users either have to be trained or just tossed in the deep end and told to figure it out.
The flip side we've seen of separating the administration of the site collections is that a single point of administration would sometimes be more convenient. However, we've been able to work around that with some PowerShell commands and scripts. Gary LaPointe's stsadm extensions and PowerShell cmdlets are very helpful.
I'm new to SharePoint administration, and so is my organization. We're struggling with this issue right now, so take this answer with a grain of salt.
The chief concern with keeping everything in a single site collection is that you can't break the collection across multiple databases. Once a single content database gets too large, you'll begin to see performance issues, not to mention all the headaches that come along with backing up and restoring big databases.
Various resources will tell you different stories about how big you can grow a particular database before this becomes a problem, but the largest guideline I've heard is 100GB. That's SharePoint guru Joel Oleson's recommendation. So as a rule, if you think your content may one day grow above 100GB, break it into separate site collections now. Moving content into a new site collection after the fact is by all accounts very painful.
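To make the sizing rule concrete, here is a rough planning sketch in Python. It is not SharePoint code and calls no SharePoint API; the function name, the site collection names and the projected sizes are all made up for illustration. It treats each site collection as an indivisible unit (because a site collection cannot span content databases) and groups them into databases that stay at or under the 100GB guideline, using a simple first-fit-decreasing pass.

```python
from typing import Dict, List

# Guideline quoted in the answer above; an assumption, not a hard product limit.
DB_LIMIT_GB = 100.0


def plan_content_databases(projected_gb: Dict[str, float],
                           limit_gb: float = DB_LIMIT_GB) -> List[Dict[str, float]]:
    """Group site collections into content databases (first-fit decreasing).

    A site collection lives in exactly one content database, so each
    entry in projected_gb is placed whole or not at all.
    """
    databases: List[Dict[str, float]] = []
    # Place the largest site collections first so they claim space early.
    ordered = sorted(projected_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > limit_gb:
            # One collection over the limit cannot be fixed by regrouping.
            raise ValueError(f"{name} alone exceeds {limit_gb} GB; split it up")
        for db in databases:
            if sum(db.values()) + size <= limit_gb:
                db[name] = size
                break
        else:
            # No existing database has room, so open a new one.
            databases.append({name: size})
    return databases


# Hypothetical projected sizes in GB.
projected = {"intranet": 60.0, "hr": 30.0, "engineering": 70.0, "sales": 25.0}
plan = plan_content_databases(projected)
for number, db in enumerate(plan, start=1):
    print(f"content db {number}: {sorted(db)} ({sum(db.values()):.0f} GB)")
```

The ValueError branch is the case the answer warns about: a single site collection projected to outgrow the limit has to be split into several collections up front, because regrouping databases later won't help.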
This slide deck from Joel Oleson hits the high points of capacity planning and farm architecture.
I just saw that TechNet has an updated book available for download.
Size is important for performance, but don't forget how long it will take to back up and restore a broken site collection.
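A back-of-the-envelope way to put a number on that, sketched in Python. The throughput figure is purely an assumption for illustration; measure your own backup and restore rates before relying on any estimate like this.

```python
def estimate_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Rough time to copy size_gb at a sustained throughput, in hours."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1024 / throughput_mb_per_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600


# Assumed sustained restore rate; real rates depend on disks, network and SQL Server load.
ASSUMED_MB_PER_S = 50.0

for size_gb in (25, 100, 400):
    hours = estimate_hours(size_gb, ASSUMED_MB_PER_S)
    print(f"{size_gb} GB site collection: about {hours:.1f} h to restore")
```

The estimate scales linearly with size, so a site collection four times as large keeps its users offline roughly four times as long during a restore.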