I handle much of the IT for a company of around 100 people, spread across about five sites worldwide. We're using Active Directory for authentication, mostly served to Linux (CentOS 5) systems via LDAP.
We've been suffering through a spate of events where the IP tunnel between the two major sites goes down and the secondary domain controller at one site can't contact the primary domain controller at the other. It seems that the secondary domain controller starts denying user authentication within minutes of losing connectivity to the primary.
How do we make the secondary domain controller more resilient to downtime? Is there a way for it to cache the entire directory and/or at least keep enough information locally to survive a multi-hour disconnection?
(We're all in a single organizational unit, if that makes any difference.)
(The servers here are Windows Server 2003 R2; don't assume that we set this up correctly. I'm a software engineer, not an IT specialist.)
I'm thinking that Universal Group Membership caching is what you need to look into.
http://technet.microsoft.com/en-us/library/cc816797(WS.10).aspx
http://www.windowsnetworking.com/kbase/WindowsTips/Windows2003/AdminTips/ActiveDirectory/Whentouseandnotuseuniversalgroupmembershipcaching.html
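Roughly, Universal Group Membership Caching lets a DC in a site without a Global Catalog remember each user's universal group memberships after that user's first logon there, so later logons don't need a round trip to a GC. The sketch below is a toy model of that behaviour, not Microsoft's implementation. The class, the `lookup_gc` callable, and the 8-hour refresh interval (which I believe is the documented default) are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 8 hours, which I believe is the documented default refresh interval.
REFRESH_INTERVAL = timedelta(hours=8)


class UniversalGroupCache:
    """Toy model of Universal Group Membership Caching on a branch-site DC.

    Not Microsoft's implementation: it only shows why a user who has already
    logged on at the site can keep logging on while the GC is unreachable.
    """

    def __init__(self, lookup_gc):
        self.lookup_gc = lookup_gc  # hypothetical callable: user -> set of groups
        self.entries = {}           # user -> (groups, fetched_at)

    def logon(self, user, now, gc_reachable):
        """Return the user's universal groups, or None if they cannot be resolved."""
        if user in self.entries:
            return self.entries[user][0]   # cache hit: no GC round trip needed
        if not gc_reachable:
            return None                    # first logon at this site and no GC
        groups = self.lookup_gc(user)
        self.entries[user] = (groups, now)
        return groups

    def refresh(self, now, gc_reachable):
        """Periodic background refresh; entries are kept as-is while the GC is down."""
        if not gc_reachable:
            return
        for user, (_, fetched_at) in list(self.entries.items()):
            if now - fetched_at >= REFRESH_INTERVAL:
                self.entries[user] = (self.lookup_gc(user), now)


if __name__ == "__main__":
    directory = {"alice": {"Engineering"}, "bob": {"Sales"}}  # hypothetical users
    cache = UniversalGroupCache(lambda u: directory[u])
    t0 = datetime(2024, 1, 1, 9, 0)
    print(cache.logon("alice", t0, gc_reachable=True))                         # {'Engineering'}
    print(cache.logon("alice", t0 + timedelta(hours=3), gc_reachable=False))   # {'Engineering'}
    print(cache.logon("bob", t0 + timedelta(hours=3), gc_reachable=False))     # None
```

The point of the model is the limitation as much as the benefit: a user who has never logged on at that site cannot be resolved while the link to the GC is down.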
You only have 100 users, so your Active Directory is tiny. Just make sure your locations are split into separate Sites in Active Directory Sites and Services, and make the DCs at all your remote branches Global Catalog servers. By today's standards the additional disk space, CPU usage, and bandwidth will be negligible. If the link goes down, users will still be able to authenticate against the domain.
This sounds like a good use case for Active Directory Sites. Sites are how Active Directory is made aware of the physical network underneath the directory tree. There are several answers here on how to deal with Sites, but here is a quick summary.
Your Domain Controllers are all equal(1); unlike Windows NT, Active Directory has no Primary/Backup domain controller distinction. Active Directory uses IP subnets to determine which Site a machine, including a DC, belongs to, which means you have to define those subnets in Active Directory Sites and Services and associate each one with a Site. If DCs are in different Sites, a site link between those Sites sets the replication schedule, i.e. how often replication happens between them. By default, replication over a site link happens every 180 minutes (3 hours), and this is easily changed.
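As an illustration of the subnet-to-Site mapping: when more than one defined subnet contains an address, Active Directory uses the most specific one (longest prefix). The sketch below assumes hypothetical subnets and Site names; in a real domain these would be the Subnet objects you create in Active Directory Sites and Services.

```python
import ipaddress

# Hypothetical subnets and Site names, standing in for the Subnet objects
# defined in Active Directory Sites and Services.
SUBNETS = {
    "10.1.0.0/16": "HQ",
    "10.2.0.0/16": "Branch-East",
    "10.2.50.0/24": "Branch-East-Lab",
}


def site_for(ip, subnets=SUBNETS):
    """Return the Site whose subnet most specifically contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, site) with the longest matching prefix so far
    for cidr, site in subnets.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None


if __name__ == "__main__":
    print(site_for("10.1.4.20"))    # HQ
    print(site_for("10.2.50.7"))    # Branch-East-Lab (the /24 beats the /16)
    print(site_for("192.168.1.5"))  # None: no matching subnet, so no Site
```

The last case is the one to watch for: an address that matches no defined subnet is not associated with any Site, so make sure every office's subnets are entered.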
Sites are how you let your network tolerate extended outages on your WAN links. Generally speaking, if an office sits at the far end of a WAN link and must keep working while that link is down, that office needs both its own Site and its own DC. Because it is in a different Site, that DC can tolerate up to a 3-hour outage without even noticing it happened(2).
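To make the "without even noticing" point concrete: an outage only affects inter-site replication if a scheduled replication cycle falls inside it. This sketch assumes a 180-minute interval (the default on the DEFAULTIPSITELINK site link) and a hypothetical schedule anchored at midnight; real cycle times depend on your site link's schedule, so treat it as arithmetic rather than a model of the actual scheduler.

```python
from datetime import datetime, timedelta

# Assumptions: the default 180-minute site link interval, and cycles
# hypothetically anchored at midnight.
INTERVAL = timedelta(minutes=180)
ANCHOR = datetime(2024, 1, 1, 0, 0)


def missed_cycles(outage_start, outage_end, interval=INTERVAL, anchor=ANCHOR):
    """List the scheduled inter-site replication times that fall inside the outage."""
    # Index of the first scheduled cycle at or after outage_start (ceiling division).
    steps = -(-(outage_start - anchor) // interval)
    t = anchor + steps * interval
    missed = []
    while t < outage_end:
        missed.append(t)
        t += interval
    return missed


if __name__ == "__main__":
    # 01:00-02:30: no cycle is scheduled in that window, so nothing is missed.
    print(missed_cycles(datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 2, 30)))  # []
    # 01:00-07:00: the 03:00 and 06:00 cycles fail.
    print(len(missed_cycles(datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 7, 0))))  # 2
```

A missed cycle only delays replication of changes between the Sites; the local DC keeps authenticating users from its own copy of the domain.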
If you're willing to upgrade your DCs to Windows Server 2008, you have another option in the form of Read-Only Domain Controllers (RODCs).
(1) Unless you're running Server 2008 or later, which introduced Read-Only Domain Controllers. But you're not there yet.
(2) There are some things that are replicated immediately, like password changes, but by design they're small things.