Our NetApp filer shows huge amounts of lag for snapvault operations. What could be possible causes of such large snapshot lag?
Snapvault is ON.
Source Destination State Lag Status
home:/vol/home/ vault1:/vol/home_backup/ Source 1602:06:04 Idle
home:/vol/h_root/- vault1:/vol/home_backup/h_root Source 1578:07:40 Idle
Listing of the .snapshot directory reveals an mtime of Dec 2, even on directories where the live filesystem has had modified files since that date.
drwx--x--x 13 user staff 8192 Dec 2 17:35 Feb-01
Running snapvault status -l on the source reveals the following:
Snapvault is ON.
Source: home:/vol/home/
Destination: vault1:/vol/home_backup/
Status: Idle
Progress: -
State: Source
Lag: 1602:15:24
Mirror Timestamp: Fri Dec 2 00:01:56 PST 2011
Base Snapshot: sv.7
Current Transfer Type: -
Current Transfer Error: could not read from socket
Contents: -
Last Transfer Type: -
Last Transfer Size: 28618316 KB
Last Transfer Duration: 04:02:46
Last Transfer From: -
Source: home:/vol/h_root/-
Destination: vault1:/vol/home_backup/h_root
Status: Idle
Progress: -
State: Source
Lag: 1578:17:00
Mirror Timestamp: Sat Dec 3 00:00:20 PST 2011
Base Snapshot: sv.7
Current Transfer Type: -
Current Transfer Error: could not read from socket
Contents: -
Last Transfer Type: -
Last Transfer Size: 1488 KB
Last Transfer Duration: 00:00:04
Last Transfer From: -
On the destination end, snapvault status shows:
Source Destination State Lag Status
home:/vol/home/ vault1:/vol/home_backup/ Snapvaulted 1602:24:40 Quiescing
home:/vol/h_root/- vault1:/vol/home_backup/h_root Snapvaulted 1578:26:16 Quiescing
Both source and destination filers are running NetApp Release 8.0.2.
The destination status says it's Quiescing, meaning it's trying to stop the transfer cleanly... and it has clearly been stuck like that for quite a while, unless you recently tried to quiesce the transfer yourself.
First I would try to create a new relationship with dummy volumes and see whether a fresh transfer completes correctly; that will help narrow things down. Assuming that works, I would break the existing snapvault relationships and then try to reinitialize them one at a time.
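A rough sketch of that test on 7-Mode, assuming hypothetical volumes testvol (source) and test_backup (destination); the volume and qtree names here are placeholders, not anything from your config:

```
# On the destination, pull a fresh baseline from a dummy qtree:
vault1> snapvault start -S home:/vol/testvol/q1 vault1:/vol/test_backup/q1
vault1> snapvault status vault1:/vol/test_backup/q1

# If the dummy transfer works, tear down a stuck relationship before
# reinitializing it (paths below match your existing relationship):
vault1> snapvault abort vault1:/vol/home_backup
vault1> snapvault stop vault1:/vol/home_backup
home>   snapvault release /vol/home vault1:/vol/home_backup
```

snapvault stop deletes the destination qtree's relationship, so only run it once you're committed to re-baselining that qtree.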
You may also want to check whether you have any throttling set up; the throttle configuration can be confusing, and you may have accidentally set it much lower than intended.
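On 7-Mode you can inspect the per-qtree configuration, including the -k throttle (in KB/s), from the destination; the 10000 below is just an illustrative value:

```
# List configured relationships and their options (look for a -k value):
vault1> snapvault status -c

# Raise the throttle if one was set too low (value in KB/s; pick a
# number appropriate for your network, or consult na_snapvault(1)
# for your release's syntax to remove the cap entirely):
vault1> snapvault modify -k 10000 vault1:/vol/home_backup
```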
Did anything change in your network that might prevent the source from talking to the destination? I've run into a bug in Cisco code that caused our snapmirror traffic to take an asymmetric route, and our firewall kept dropping the packets erroneously.
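Given the "could not read from socket" error in your status output, connectivity is worth ruling out first. A quick sketch of the basic checks on 7-Mode (SnapVault transfers run over TCP port 10566, so make sure any firewall between the filers passes it):

```
# On the source: confirm the destination is allowed to pull, and reachable.
home> options snapvault.enable
home> options snapvault.access
home> ping vault1

# On the destination: confirm snapvault is enabled there too.
vault1> options snapvault.enable
```

If snapvault.access is restricted by host, verify it actually names the destination filer (e.g. host=vault1).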
Is the CPU high on the system? A 3050 is a pretty old system; it's possible you're overloading it and it can't keep up with the snapvault overhead while also serving user data. You'd only see this if CPU is consistently up in the high 90s, so it's not a very common issue.
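You can watch CPU (and disk/network utilization) live on the filer with sysstat; the 1 is the sampling interval in seconds:

```
# Extended per-second stats: CPU %, NFS/CIFS ops, net and disk throughput.
home> sysstat -x 1
```

If the CPU column sits in the high 90s while transfers are supposed to be running, the box may simply be too busy to service the snapvault work.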