I've been trying to figure out this problem for several weeks now. We have an Exadata that runs nightly RMAN jobs to back up the database to an NFS mount. When these jobs run, the average latency on our fibre SAN goes through the roof. The backup is only a few thousand IOPS; 90% of the time the rest of our infrastructure is running around 20k IOPS, so it's a drop in the bucket, and latency doesn't jump even during those spikes. When I run dd tests against the same NFS server, there is no increase in SAN latency.
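For reference, the dd tests I ran against the NFS mount were along these lines (the mount path and sizes here are just placeholders, not our actual values):

    # Sequential write test against the NFS mount (example path/size)
    dd if=/dev/zero of=/mnt/rmanbackup/ddtest.out bs=1M count=10240 oflag=direct
    # Read it back to exercise the read path as well
    dd if=/mnt/rmanbackup/ddtest.out of=/dev/null bs=1M iflag=direct

Neither direction produced any visible latency increase on the SAN, unlike the RMAN runs.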
We are running a SPARC blade as the NFS server, with 8Gb fibre connections to an AMS SAN. The storage presented to that server is on SATA, but the latency is affecting our VMware and Oracle systems, which are on Fibre Channel drives and on a different controller.
I'm running out of ideas. Has anyone else ever seen anything like this?
update 12/17
After doing some research, it looks like the mount options on the Exadata are set to 32k transfer sizes for reads and writes. I'm working with the DB team to move to saner transfer sizes; however, Oracle recommends 32k...
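For context, a mount entry with 32k transfer sizes would look roughly like this (hostname and export path are made up, and the other options are just typical NFS settings, not necessarily what the Exadata uses):

    # Example /etc/fstab entry with 32k rsize/wsize, similar in spirit to what we found
    nfsserver:/export/rmanbackup  /mnt/rmanbackup  nfs  rw,hard,tcp,vers=3,rsize=32768,wsize=32768,timeo=600  0 0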
update 12/31
It was the NFS mount sizes. We upped the transfer sizes to 1 MB each, and also dropped the RMAN channels to 2 instead of 32 (I have no idea what the DBAs were thinking).
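For anyone hitting the same thing, the changes amounted to roughly the following (hostname and paths are placeholders again, and the right channel count depends on your own hardware, so treat this as a sketch rather than a recipe):

    # 1 MB NFS transfer sizes instead of 32k (example fstab entry)
    nfsserver:/export/rmanbackup  /mnt/rmanbackup  nfs  rw,hard,tcp,vers=3,rsize=1048576,wsize=1048576,timeo=600  0 0

    # Drop RMAN disk parallelism from 32 channels to 2 (run as the Oracle owner)
    rman target / <<'EOF'
    CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;
    EOF

With fewer channels hammering the NFS server and larger transfer sizes per request, the backup stopped flooding the SAN with small I/Os and the latency spikes went away.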