I'm interested in how well Active Directory would fare as the authentication backend for a website, scaled for ~1 million users. Do you have experience with AD in web environments of this scale, and if so what level of hardware would we need?
[Update] Regarding frequency of login: I agree that this is a key factor, but we don't have that information yet. Assume a regular commerce/banking site setup: log in via a form once, then carry your identity in a session (i.e. no authentication calls to AD on pages other than the login page).
The AD will not store a significant amount of user information beyond what's needed for authentication.
How busy are you expecting the website to be: Assume a normal commerce/banking site. No further information on this.
Will this AD be partitioned: It can be, although the simplest architecture is preferred.
Will this AD be serving anything else: No.
How complicated will your OU structure be: The OU structure will be fairly simple.
Will you be extending the schema: The standard schema will be used.
Will you be performing many searches on it: Only to look up the username / email for a subsequent bind.
Will you be storing a lot of information against the User Objects: No
Will Exchange be involved with this AD: No
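For the "look up username / email, then bind" step, the search filter is usually built from whatever the user typed into the form, so that value should be escaped first. A minimal sketch, assuming the lookup matches on the `sAMAccountName` and `mail` attributes (the actual attributes used by the site are not stated in the question) and using hypothetical helper names:

```python
# Characters that must be escaped inside an LDAP search filter value (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value):
    """Escape user input so it cannot change the structure of the filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def user_lookup_filter(login):
    """Build a filter matching the login against account name or email.

    Assumption: sAMAccountName and mail are the attributes being searched.
    """
    safe = escape_filter_value(login)
    return f"(|(sAMAccountName={safe})(mail={safe}))"


print(user_lookup_filter("jane.doe@example.com"))
```

The DN returned by that search would then be used for the bind with the password from the form; that part depends on the LDAP client library and is left out here.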
Could you? Yes. Should you? No.
First, scaling load - 1M users with an average of 1 login per second is a LOT different than 1M users with an average of 100-1000 logins per second.
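To put rough numbers on that, here is a back-of-the-envelope sketch. The logins-per-day figure and the peak factor are pure assumptions for illustration, since the real login frequency isn't known yet:

```python
def peak_logins_per_second(users, logins_per_user_per_day, peak_factor=10.0):
    """Rough peak login rate: the daily average scaled by an assumed peak factor."""
    seconds_per_day = 24 * 60 * 60
    average = users * logins_per_user_per_day / seconds_per_day
    return average * peak_factor


# Assumption: 1M users, one login per user per day, peak traffic 10x the mean.
print(round(peak_logins_per_second(1_000_000, 1.0), 1))
```

With those assumed inputs the average is only about 12 logins per second, so the peak factor and the real login frequency matter far more than the headline user count.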
Just some general thoughts on this though - While technically it could, I don't know that Active Directory would be the ideal vehicle to store 1M users all within one domain. If you were to use this for your web application and started having performance issues, it would be pretty difficult to troubleshoot. Personally, for something supporting 1M users, it really needs to be something more dedicated to that particular task.
If this is the benchmark you need to hit and you really want to use AD, you probably need to get Microsoft involved to make sure your architecture is absolutely correct and get your load/performance testing in place at minimum.
The amount of "other things" that Active Directory does and introduces (layers, replication, extensions, security concerns of the accounts being on your "production" network domain) when you just need it as an authentication database is, IMHO, not appropriate for the number of users and the relative simplicity required. It's overkill and too complex.
Generally, you should use AD LDS (formerly ADAM) for application users. There isn't a licensing fee, and the users in LDS can't be used as security credentials for the servers themselves. This is a good thing: if your user directory is compromised, your operational directory is still functional.
YOU SHOULD use AD for managing the machines. Make sure there aren't local accounts, and use Group Policy to restrict security settings. (I think most people would be surprised how tight this can be.)
These include:
Truth is, if this is a new system, LDS would be a great place to put your users. It has a good password policy, password complexity rules, and so on. You should really be looking at using SAML or OpenID. If you have a mix of users that federate and users that don't, you should still code against the claims model and abstract out the authentication-provider-specific code.
You can use AD for this, but I would recommend you go with ADAM (now called AD LDS). It should give you many of the benefits of AD, i.e. pre-existing techie knowledge and stuff like FRS for replication. If the AD benefits for this are low on your "pros" list, then you should really consider a different LDAP package, like SiteMinder, although that will require pulling together more bits of tech in order to build a scalable system.
The biggest performance issue you will need to look at is concurrent logon requests, as many posters have pointed out. The easiest way around this is to build your DCs on 64-bit hardware and make sure that each DC has enough RAM to hold the entire .dit file. This will significantly improve your performance, as it will entirely eliminate paging while processing LDAP queries. You could go with 32-bit hardware if your .dit file is less than 1.5GB, but why bother?
Also, if you are looking for some type of high availability, be aware that the replication and the site-awareness in AD are not really designed to provide this at the level you might need for a commercial application. You will need to be aware of the limitations of locating a DC and write your applications to use the Windows API to correctly handle offline/unavailable DCs. I see this issue a lot, where an app dev just points their LDAP auth package at fqdn.ad.domain, but that address is just a simple round-robin and won't be updated if you take a DC offline.
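If the app can't use the Windows DC locator API, the fallback is to handle unavailable DCs yourself rather than trusting the round-robin name. A minimal sketch of that idea, assuming you maintain an explicit list of candidate DC hostnames (the function name is hypothetical, and an open TCP port only shows the DC is reachable, not that it is healthy):

```python
import socket


def first_reachable(hosts, port=389, timeout=2.0, connect=socket.create_connection):
    """Return the first host that accepts a TCP connection on the LDAP port.

    `connect` is injectable so the logic can be exercised without a network.
    Returns None when no candidate is reachable.
    """
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            continue  # DC offline or unreachable: try the next candidate
        conn.close()
        return host
    return None


# Usage (hostnames are placeholders):
# dc = first_reachable(["dc1.ad.example.test", "dc2.ad.example.test"])
```

The LDAP auth package would then be pointed at the returned host, with the check repeated whenever a bind fails with a connection error.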
I don't want to take this into a direction you already considered or rejected for other reasons, but what will AD give you that something like RADIUS won't? You could probably easily set up a scalable RADIUS server and I think I've seen some that will use a database like MySQL for a backend, allowing you to easily scale it and replicate it without the added features that AD would give you that you sounded like you weren't going to use. RADIUS was kind of made just for authentication tasks...but please feel free to correct me in the comments...
We used to use it for dialup users at some businesses I worked in many moons ago, didn't try it with web authentication.
This note claims that even old Windows Server 2000 was performing 2,376 LDAP-based full-tree searches per second on a 5 million-object directory. And they had quite simple hardware for their tests.
Anyway, I think that AD is the best solution for reliable authentication because it is very scalable (you can have as many domain controllers as you need, wherever you need them) and secure. It was designed for authentication and account management, and it is now quite mature.
I'm not sure about licensing, but I think you don't need CALs for users if you use AD for authentication only. You should check with Microsoft about your specific scenario.
The number of users is fairly irrelevant in comparison to the number of users you'll be serving at the same time. If you had a million users, but only one login a day, you wouldn't need a lot of hardware. It also depends on how you're doing the authentication. If you're using HTTP auth, you'll probably have to do a lookup on every request. If, however, you use an HTML form, you can do the lookup on the login request and return a cookie for the following requests, meaning far fewer authentication lookups.
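A rough sketch of the form-plus-cookie approach: the directory is only consulted at login, and later requests just verify a signed token. Everything here (the token layout, the 30-minute lifetime, the function names) is an assumption for illustration, and most web frameworks already ship session handling that does this for you:

```python
import hashlib
import hmac
import time

# Assumption: a random server-side key, kept out of source control.
SECRET = b"replace-with-a-random-server-side-key"


def issue_token(username, now=None, secret=SECRET):
    """Call this once, after the directory bind at login has succeeded."""
    issued = int(time.time() if now is None else now)
    payload = f"{username}|{issued}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_token(token, max_age=1800, now=None, secret=SECRET):
    """Check the cookie on later requests without touching the directory."""
    try:
        username, issued, sig = token.rsplit("|", 2)
        issued_at = int(issued)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, f"{username}|{issued}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # signature mismatch: tampered or forged
    current = time.time() if now is None else now
    if current - issued_at > max_age:
        return None  # expired: send the user back to the login form
    return username
```

With this shape, directory load scales with logins rather than with page views.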
You'd want to look at how well AD scales out: how many servers can you have serving requests before replication becomes a problem, either for maintenance or through diminishing returns from adding more?
Also, can you add caching to speed up your lookups? This would be much more important for an HTTP auth setup.
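As a sketch of what that caching could look like, here is a small time-limited cache in front of a lookup function. The class name, the TTL value and the `loader` callback are all hypothetical; it is unbounded, so a real deployment would want a size limit, and caching password check results (rather than plain lookups) has security trade-offs of its own:

```python
import time


class TTLCache:
    """Cache lookup results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now + self.ttl, value)
        return value


# Usage (lookup_dn is a placeholder for the real directory search):
# cache = TTLCache(ttl_seconds=300)
# dn = cache.get_or_load("alice", lookup_dn)
```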
Don't forget the licensing. You should get a Windows Server External Connector license, as 1M CALs would be very costly.
Microsoft's Licensing Lady helps explain what an external user is.
There is a lot of detail missing from your question:
That's a lot of information we realistically need before we can make a judgement call on hardware requirements.
I don't know if the article on Scaling Active Directory with Commerce Server contains any generic AD scaling hints and tips. It may be worth a quick browse?
If this is purely to authenticate external users -- why even bother with AD?