SnapOverflow

Questions tagged [dfs-r] (server)

Jeff Sacksteder
Asked: 2014-06-10 12:49:00 +0800 CST

Does DFSR replicate shadow copies?

  • 10

There are a few questions related to using DFSR and Shadow Copies together, but none that indicates whether Shadow Copies replicate. That is, if I have a pair of DFS replicas with Shadow Copies on Server-A, can I revert a file to a previous version on Server-B? If so, will that reversion be replicated back to Server-A?

I suspect not, since VSS is a local NTFS feature and outside the scope of replication, but I cannot verify that myself at the moment.

dfs-r
  • 2 Answers
  • 5398 Views
Rob Nicholson
Asked: 2014-03-05 05:14:34 +0800 CST

Force DFS-R into a dirty state

  • 5

We use Windows Server 2012 for our file servers, running DFS-R between our two sites as part of our business continuity systems. Last week, DFS-R failed at one site, requiring the file server to be rebooted. The same thing has just happened at the other site, causing several hours of downtime whilst we tried to resolve it. We now know that a simple reboot fixes it, which isn't nice.

The DFS-R service is currently disabled whilst we diagnose the root cause (timeout errors in ESENT) but I'd like to bring it back online overnight.

I'd like to be able to force the same code path that runs after a dirty shutdown, i.e. a check of the database when the service is restarted. I know this takes many hours, but I'd prefer that to bringing up a service that might instantly fail again.

Is this possible?

dfs-r
  • 1 Answer
  • 318 Views
Emmaly
Asked: 2012-05-23 17:55:33 +0800 CST

How to monitor DFSR backlog more efficiently than dfsrdiag

  • 6

Is there a way to monitor the DFSR backlog in a manner more efficient than using dfsrdiag.exe backlog?

I wrote a program that just slurps in the backlog count via dfsrdiag.exe backlog /smem:alpha /rmem:beta /rgname:domain\namespace\foldername /rfname:foldername at five-minute intervals. Each time it runs, it takes quite a while (between 2 and 5 minutes) to get the resulting value. That means that in the end, it runs for a few minutes to collect the info and then delays for five minutes. It feels like getting this info is probably expensive in some fashion. It also returns the top 100 files in the backlog. I really only want the backlog count and don't care about the files themselves. This is being used to create historical graphs.
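
For reference, a minimal sketch of the kind of collector described above, written in Python. It does not make the query any cheaper; it only keeps the count and throws away the file list. The output wording it matches ("Backlog File Count: N" and "No Backlog") is an assumption about what dfsrdiag prints, and the function names are hypothetical, so adjust the patterns to whatever your servers actually return.

```python
import re
import subprocess

# Assumed output line, e.g. "Member <beta> Backlog File Count: 350".
_COUNT_RE = re.compile(r"Backlog File Count:\s*(\d+)", re.IGNORECASE)
# Assumed wording when nothing is queued for replication.
_NO_BACKLOG_RE = re.compile(r"No Backlog", re.IGNORECASE)


def parse_backlog_count(output):
    """Return the backlog count found in dfsrdiag text, or None if absent."""
    match = _COUNT_RE.search(output)
    if match:
        return int(match.group(1))
    if _NO_BACKLOG_RE.search(output):
        return 0
    return None


def query_backlog(smem, rmem, rgname, rfname):
    """Run dfsrdiag with the flags from the question and parse the count.

    Hypothetical helper: it only works on a Windows host where
    dfsrdiag.exe is on the PATH.
    """
    result = subprocess.run(
        ["dfsrdiag.exe", "backlog", f"/smem:{smem}", f"/rmem:{rmem}",
         f"/rgname:{rgname}", f"/rfname:{rfname}"],
        capture_output=True, text=True, check=False,
    )
    return parse_backlog_count(result.stdout)
```

Returning None for unrecognised output lets the graphing side record a gap instead of a misleading zero.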

Info for these DFSR peers: Windows 2008 R2 on four servers, three distant offices connected via 50-100Mb Internet connections, 30 replication groups, several replication groups are very large in file total size (1-2TB each) though most are small (500MB-10GB).

dfs-r replication wmi windows-server-2008-r2
  • 2 Answers
  • 40104 Views
Emmaly
Asked: 2012-05-12 19:07:55 +0800 CST

Windows DFSR - Changed replicated directory permissions and now have a 350,000 backlog for more than a week

  • 11

Question: Is there a way to make this 350,000 file backlog complete faster? For nearly every affected file, the only change was to its ACL. Some files have changed content, but that is not the common case in this situation.

This might be fixed. I'll edit this text to confirm success/fail after a period of time and verification. Toward the end of this question text I have detailed the changes made recently that might have fixed it.

We have a DFSR replication group with about 450,000 files that takes up 1.5TB of space. In this situation, there are two Windows Server 2008 R2 servers that are about 500 miles apart. There are other servers, but they aren't involved in this replication group. Server ALPHA is the main server and is the one used by most of the staff. Server BETA is the server in the remote office and is less busy.

Here is a graph of backlog for this replication group (PNG hosted on Google Drive) showing the slow sync progress.

I needed to remove a permission entry that was in the root directory of that replication group, which of course was inherited across most of the subfolders. I made this change on server ALPHA. Right away after that, DFSR had a 350,000 file backlog. It has been more than a week and now it is at 267,000. The only thing that changed (initially) was the single permission change.

This is what happened (this is not the solution, just another explanation of what happened to cause this issue): http://blogs.technet.com/b/askds/archive/2012/04/14/saturday-mail-sack-because-it-turns-out-friday-night-was-alright-for-fighting.aspx#dfsr

Any changes that occur on server BETA are replicated to server ALPHA very quickly since there is no backlog in that direction. Any files changed on BETA do make it to ALPHA without trouble.

It's replicating 24/7 at full speed between a 50Mbps connection on one end and a 100Mbps fiber connection on the other. The staging area is 100GB on each server. There is nothing interesting in the event logs at all. A high watermark event does show up, but it belongs to an unrelated replication group that involves neither this replicated folder nor this ALPHA/BETA server pair. In particular, there are no event log entries for high watermark or for connection errors on this group.

ALPHA's view of the replication group:

Bandwidth Savings: 99.83% reduction (30.85 MB replicated instead of 18.1 GB)

I believe that the 30.85MB/18.1GB happened since I last restarted the DFSR service on ALPHA and BETA. If so, this shows that even though it is taking a very long time (longer than I believe it should take) it isn't actually transferring the file contents across the wire.
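
As a quick sanity check on that figure, here is a small Python sketch that recomputes the savings percentage from the two sizes in the report. The function name is hypothetical, and 1024 MB per GB is an assumption about how the report converts units; using 1000 instead rounds to the same value.

```python
def bandwidth_savings_percent(replicated_mb, original_gb, mb_per_gb=1024):
    """Percentage of data that did not have to cross the wire."""
    original_mb = original_gb * mb_per_gb
    return 100.0 * (1.0 - replicated_mb / original_mb)


savings = bandwidth_savings_percent(30.85, 18.1)
print(f"{savings:.2f}%")  # 99.83%
```

The result matches the reported 99.83%, which is consistent with only metadata-sized deltas being sent rather than file contents.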

Replicated folder: 1.46TB (actual size), 439,387 (files), 52,886 (folders)

Conflict and Deleted folder: 100.00GB (configured size), 34.01GB (actual size), 19,620 (files), 2,393 (folders)

Staging folder: 200.00GB (configured size), 92.54GB (actual size)

I got one high watermark error in the logs (May 14, 7pm) and so have upped the staging quota to 200GB from 100GB. I know that the Microsoft-approved route is to increase by 20%, but I'm not playing around on this. We have plenty of disk space to spare on the staging disk arrays.

Disabling anti-virus on all servers did not help, though I thought it would have helped a little bit. For now I have re-enabled anti-virus but set the replication group's path to be excluded from scanning in order to remove that variable from the equation.

Is there a way to get this to go faster? I would just make this change on server BETA as well, but there are files that have changed on ALPHA and haven't yet replicated to BETA, and making the inherited permission change on BETA would push OLD files from BETA to ALPHA (because DFSR seems to ignore file timestamps when deciding which file wins a collision). Having that happen would be rather bad.

The backlog is reducing slowly. Very, very slowly. It is going forward, though. But at this rate, it will be weeks before it finishes. I'm contemplating just shoving a copy of the data set onto a 3TB drive and shipping it to the remote office. Is there a better way?
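
To put a rough number on "weeks", here is a back-of-the-envelope Python sketch using the figures given earlier (350,000 down to 267,000). It assumes exactly seven days elapsed and a constant drain rate; since it has actually been more than a week, the real rate is slower and this is closer to a lower bound. The function name is hypothetical.

```python
def days_remaining(start_backlog, current_backlog, elapsed_days):
    """Linear estimate of days left at the average rate observed so far."""
    cleared = start_backlog - current_backlog
    if cleared <= 0 or elapsed_days <= 0:
        raise ValueError("backlog must have shrunk over a positive time span")
    rate_per_day = cleared / elapsed_days
    return current_backlog / rate_per_day


estimate = days_remaining(350_000, 267_000, 7)
print(f"about {estimate:.1f} days left")  # about 22.5 days left
```

That works out to roughly 11,900 files per day, so a bit over three more weeks at the current pace.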

May 16, 4am US PT: What might have fixed the problem (assuming it's honestly fixed, anyway):

I made multiple changes to the DCs that should have been made a long time ago. The problem is that this network was inherited from someone else who probably inherited it from someone else, etc. I can't promise which change fixed the problem. Here they are in no particular order:

  • The DCs were not in the "Domain Controllers" OU. I've never seen a Windows domain that had its DCs elsewhere. I moved them back to where they belonged. They were previously in OUs that were segregated by the name of the city each office is in. (I have a feeling I've got some plumbing work to deal with now that I moved those, but all seems okay at present...)
  • AVG Anti-Virus is running on all DCs and DFSR-participating servers. I excluded the replicated folders and the staging folders from active/on-access scanning. I don't think this fixed the problem and I'm likely to test this issue later on to see if undoing that change will interfere with the replication speed of DFSR. That's a challenge for another day.
  • dcdiag.exe complained of a DNS issue with regard to RODCs. I remedied that problem even though we have no RODCs on the domain at all. I doubt this fixed anything.
  • One of the _ldap._tcp.domain.GUID._msdcs.DOMAIN.NET SRV records was missing for one of the DCs (not one of the DFSR servers) and I remedied that. I don't think this helped either.
  • One of the times I rebooted server BETA it complained of a bad shutdown of the DFSR database (event 2212) and it then proceeded to take hours to rebuild the database. When finished it reported event 2214 to let me know it finished. After that, replication was still running extremely slowly, but it might have helped unstick whatever was stuck.
  • One of the DCs didn't have 127.0.0.1 as a secondary DNS server in its interface configuration. I added it. This wasn't one of the DFSR servers, so that probably had nothing to do with it.
  • I followed the recommended Registry settings for DFSR servers from the TechNet Blog post "Tuning replication performance in DFSR". I used all of the "tested high performance value" values, except that I set AsyncIoMaxBufferSizeBytes to 4194304, which is one notch lower than the high value. This could have helped with the problem... or maybe not. It's difficult to tell when one changes too many variables.
  • dcdiag.exe complained about a problem with communicating with the RPC service on BETA, but only after already making the above changes. This seemed to be the most likely issue going on, but there was nothing I did to correct it. The VPN was running properly and the firewall wasn't blocking it. It's possible that one of the above items is what caused and then remedied the RPC issue or it could have been simple coincidence. I am not getting that error now and replication is running smoothly at present.

The moral of the story is: change one thing at a time or you'll never really know what fixed it. But I was desperate and was running out of time to fix it, so I just fired a bunch of bullets at the problem. If I ever pinpoint the fix, I'll report that here. Don't bank on me narrowing it down, though.

EDIT 5/21/2012: I solved this by driving for about seven hours with a spare server (GAMMA) to the remote office yesterday. GAMMA is now acting as their primary local server while their usual server (BETA) catches up on the replication. Since I put it into place, the servers have been replicating at about double the speed. While this suggests it could be a VPN-related issue, I'm less inclined to believe that, since all new updates replicating from ALPHA to GAMMA have been very quick and going well.

EDIT 5/22/2012: It's at 12000 right now and should be finished in a few hours. I'll post a nice graph of the progress from slow start to fast finish. The problem is that the only thing that really actually "fixed" it is the local server connection. I'm presently thinking that maybe the VPN is part of the problem. And if that's the case, I feel that this question isn't quite answered yet. After I've had some more time to check out how things are replicating via the VPN and seeing any failures, I'll debug and report the progress.

If something changes I'll update here.

windows dfs-r replication windows-server-2008-r2
  • 3 Answers
  • 10216 Views
TJF
Asked: 2012-03-15 18:02:16 +0800 CST

DFS-R alternatives?

  • 6

I was wondering what alternatives to DFS-R exist on Windows machines for real-time bidirectional file and folder replication. DFS-R requires Active Directory and a domain controller, which I don't want to use in this particular environment.

Thanks!

Tom

windows-server-2008 dfs-r replication
  • 2 Answers
  • 4846 Views

© 2022 SOF-TR. All Rights Reserved