We have always had problems with DFS, but recently it has gotten worse with no apparent cause, and it is becoming harmful. We have one master server with DFS connections to four other servers. The four servers don't modify any files, so all replication propagates from the master to the other four. The replicated directory contains about 900,000 files. In recent weeks, every time we check, the DFS backlogs contain hundreds of thousands of files. For instance, at the moment the master server is replicating about 700,000 files to three of the four servers, while the fourth one is fine. Sometimes only one server is off, sometimes two, and this time three. It is also never the same set of servers. It is inconceivable that something periodically touches all 900,000 files. The biggest change that happens is a scheduled update of several thousand files every six hours.
Does anybody have the same problem? Is it a known issue?
Update: (This is also an answer to some of the questions raised by Jeff Miles.) The problem happened again a few hours ago. I set up some probes in the morning and monitored the servers during the day. At a seemingly random time, three backlogs ballooned to 3 million changes (more than the total number of files) within a minute. There is nothing interesting in the DFS Event Log, not even a "started initial replication" entry. There are only a couple of "DFS connection lost or unresponsive" errors, but they were logged about 10 minutes after the fact, most likely because something choked on the huge backlogs. More importantly, the fourth server is fine, which suggests that the 3 million changes are most likely bogus. I also can't imagine anything changing that many files within such a short interval. Regarding the technical setup: it is a combination of Win2003R2 and Win2008R2 servers. Could that be the problem?
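For anyone who wants to watch for the same symptom, here is a minimal sketch of the kind of sanity check described above. It is not the exact probe I ran. It assumes the backlog is read from captured `dfsrdiag backlog` output and that the count appears on a line containing "Backlog File Count: N". That wording, the function names, and the thresholds are all assumptions to adjust for your environment.

```python
import re
from typing import Optional

# Assumed wording of the count line in captured `dfsrdiag backlog` output.
# Adjust the pattern if your servers print something different.
BACKLOG_RE = re.compile(r"Backlog File Count:\s*(\d+)", re.IGNORECASE)


def parse_backlog(output: str) -> Optional[int]:
    """Return the backlog count found in the text, or None if none is found."""
    match = BACKLOG_RE.search(output)
    if match:
        return int(match.group(1))
    # Assumption: an in-sync pair is reported with a "No Backlog" message.
    if "no backlog" in output.lower():
        return 0
    return None


def classify(count: int, total_files: int) -> str:
    """Label a backlog count relative to the size of the replicated folder."""
    if count > total_files:
        # More pending changes than files exist, as with 3 million vs 900,000.
        return "exceeds_file_count"
    if count > total_files // 2:
        # Arbitrary illustrative threshold: over half the folder is pending.
        return "large"
    return "normal"


if __name__ == "__main__":
    sample = "Member <SERVER2> Backlog File Count: 3000000"
    count = parse_backlog(sample)
    if count is not None:
        print(count, classify(count, total_files=900_000))
```

Running this against output captured every minute or so, and logging the timestamp whenever the label changes, would at least pin down when the backlog jumps and on which servers.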