TL;DR in bold
Hello, I have set up a new DFS Namespace and Replication for a software and user share from an older 2012R2 server to a newer 2019 server that we wish to migrate to. I've left the servers running for several days now but there does not appear to be any progress in the replication.
Running `dfsrdiag backlog` shows "No Backlog member x is in sync with partner y" but that's clearly not the case. Running a DFS replication report shows that no files or folders have been copied and the drives on the new server have not changed size since the initial 753 MB consumed when the replication was set up.
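For anyone retracing this, a minimal sketch of checking the backlog in both directions rather than just one. The group, folder and server names are hypothetical placeholders, not the ones from my environment. The state table is the one reported by the `DfsrReplicatedFolderInfo` WMI class (for example via `Get-WmiObject -Namespace "root\MicrosoftDFS" -Class DfsrReplicatedFolderInfo`). A folder still in "Initial Sync" might explain a "No Backlog" answer that does not match reality, though that is an assumption on my part and not something I confirmed.

```python
import shutil
import subprocess

# Hypothetical placeholder names; substitute the real replication group,
# replicated folder and member server names.
GROUP = "ExampleGroup"
FOLDER = "ExampleFolder"
OLD_SERVER = "OLD2012R2"
NEW_SERVER = "NEW2019"

# State values reported by the DfsrReplicatedFolderInfo WMI class.
FOLDER_STATES = {
    0: "Uninitialized",
    1: "Initialized",
    2: "Initial Sync",
    3: "Auto Recovery",
    4: "Normal",
    5: "In Error",
}


def backlog_command(group, folder, sender, receiver):
    """Build the dfsrdiag backlog command line for one direction."""
    return [
        "dfsrdiag", "backlog",
        f"/rgname:{group}",
        f"/rfname:{folder}",
        f"/smem:{sender}",
        f"/rmem:{receiver}",
    ]


def both_directions(group, folder, server_a, server_b):
    """Return the backlog commands for a->b and b->a."""
    return [
        backlog_command(group, folder, server_a, server_b),
        backlog_command(group, folder, server_b, server_a),
    ]


def describe_state(state):
    """Translate a numeric replicated-folder state into its name."""
    return FOLDER_STATES.get(state, f"Unknown ({state})")


if __name__ == "__main__":
    for cmd in both_directions(GROUP, FOLDER, OLD_SERVER, NEW_SERVER):
        print(" ".join(cmd))
        # Only run where dfsrdiag is actually installed.
        if shutil.which("dfsrdiag"):
            subprocess.run(cmd, check=False)
```

If the state on the old server is anything other than "Normal", I would treat the backlog count with suspicion until it changes.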
In the DFS Replication event log on the old server, I have a rolling log of approximately 25000 events per minute, all indicating event 4308: "The DFS Replication service has successfully recovered from sharing violations encountered on a file." It appears to be triggering the event for every file in the share one by one.
At one point it reached a folder which is corrupted and just spammed the name of the folder continuously. I stopped and restarted the DFS service and it seems to restart the whole spam process on files I've already seen go by. I've scheduled a `chkdsk /f` on reboot to handle the corruption this evening. However, it would seem logical to me that the system could replicate the non-corrupted files in the meantime.
Why am I getting so many 4308 events in the log? It's rolling over the 15MB log basically every minute and there's nothing in the logging other than 4308 events. I don't want to make the log larger or force it to archive as that's just going to fill up my system with more useless events. Meanwhile, I'm seeing all these recoveries and no initial errors. There's no visible progress whatsoever.
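Since the log rolls over too fast to read by eye, here is a rough sketch of how a sample of the 4308 events could be summarized to tell "one event per file, walking the share" apart from "the same path over and over". A sample can be pulled with something like `wevtutil qe "DFS Replication" /q:"*[System[(EventID=4308)]]" /c:1000 /rd:true /f:text`. Turning that output into (timestamp, file path) pairs is left out here, and the input format below is an assumption for illustration only.

```python
from collections import Counter
from datetime import datetime


def summarize(events):
    """Summarize 4308 events.

    events: iterable of (iso_timestamp, file_path) pairs, one per event.
    Returns (events per minute, paths that appear more than once).
    """
    per_minute = Counter()
    per_path = Counter()
    for stamp, path in events:
        # Truncate the timestamp to the minute to get an event rate.
        minute = datetime.fromisoformat(stamp).replace(second=0, microsecond=0)
        per_minute[minute] += 1
        # Windows paths are case-insensitive, so normalize before counting.
        per_path[path.lower()] += 1
    repeated = {p: n for p, n in per_path.items() if n > 1}
    return per_minute, repeated


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real log.
    sample = [
        ("2024-01-01T10:00:05", r"D:\Shares\Software\setup.exe"),
        ("2024-01-01T10:00:45", r"D:\Shares\Users\BadFolder"),
        ("2024-01-01T10:01:10", r"D:\Shares\Users\BadFolder"),
    ]
    rate, repeated = summarize(sample)
    for minute, count in sorted(rate.items()):
        print(minute, count)
    for path, count in repeated.items():
        print("repeated:", path, count)
```

A path that keeps reappearing in `repeated` would point at a specific file or folder the service cannot get past, which is what the corrupted folder looked like in my case.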
More details (probably not relevant): the old server is in a Hyper-V Failover cluster using iSCSI cluster disks for the drives. The shares are hosted from direct iSCSI mounts inside the server. The new server is VMware with drives provided from vSAN (virtual datastore replicated across local hosts). Both servers have Windows Backup set up. The old server's backup takes about 18 hours and leaves only 6 hours a day for replication (Windows Backup appears to pause DFS Replication). No progress is made during those 6 hours, however, so I do not think the backup is the main culprit. I have run `dfsrdiag pollad` on both machines, which changed nothing. The recipient mostly sits there doing nothing in the logs, except for a daily complaint about Windows Backup taking the replication offline for ~30 minutes. The old server does the above spam.
TL;DR: looks like the corruption on the drives was causing the issue.
After performing the `chkdsk /f` on reboot on the drives, I still saw errors with chkdsk. After performing `chkdsk /f` with a force dismount and then confirming there were no more errors with `chkdsk /f` and `chkdsk /scan`, DFS-R started working. It took a while to read files on disk (I presume hashing for the DB?), then it started staging files and I'm starting to see data flow to the destination server.
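For reference, a small sketch of the repair-then-verify sequence described above. The drive letter is a hypothetical placeholder. The exit-code table follows the documented chkdsk return codes; whether `/scan` reports through exactly the same codes is an assumption I have not verified. The script only prints the commands, since a forced dismount is not something to run unattended.

```python
# Documented chkdsk exit codes.
CHKDSK_EXIT_CODES = {
    0: "no errors found",
    1: "errors found and fixed",
    2: "cleanup performed, or skipped because /f was not given",
    3: "could not check the disk, or errors were left unfixed",
}


def repair_command(drive):
    """Offline repair; /x forces the volume to dismount first."""
    return ["chkdsk", drive, "/f", "/x"]


def scan_command(drive):
    """Online scan of an NTFS volume, used here as the verification pass."""
    return ["chkdsk", drive, "/scan"]


def describe_exit(code):
    """Translate a chkdsk exit code into a short description."""
    return CHKDSK_EXIT_CODES.get(code, f"unknown exit code {code}")


def is_clean(code):
    """Only exit code 0 means the pass found nothing to fix."""
    return code == 0


if __name__ == "__main__":
    drive = "D:"  # hypothetical drive letter for the replicated volume
    for cmd in (repair_command(drive), scan_command(drive)):
        print(" ".join(cmd))
```

The point of the second pass is that a repair run returning "errors found and fixed" is not the same as a clean volume, and in my case DFS-R only started moving once a follow-up pass came back with nothing left to fix.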