I'd like to do some analysis of our NFS server to help track down potential bottlenecks in our applications. The server is running SUSE Enterprise Linux 10.
The kind of things I'm looking to know are:
- Which files are being accessed by which clients
- Read/write throughput on a per-client basis
- Overhead imposed by other RPC calls
- Time spent waiting on other NFS requests, or disk I/O, to service a client
I already know about the statistics available in /proc/net/rpc/nfsd, and in fact I wrote a blog post describing them in depth. What I'm looking for is a way to dig deeper and help understand what factors are contributing to the performance seen by a particular client. I want to analyze the role the NFS server plays in the performance of an application on our cluster so that I can think of ways to best optimize it.
Just an idea: try sniffing the NFS traffic with wireshark. It might tell you which user accessed which file.
collectl (especially its NFS subsystem) is a very nice utility which might be useful for your analysis but it does not match your requirements list. I'm not aware of any Linux utility that does.
(Please let me add this off-topic note: there is software which matches your requirements, Sun's DTrace-based Analytics (pdf), but unfortunately it is not available on Linux. You'll find lots of great examples in Brendan Gregg's blog which illustrate the capabilities of this tool.)
I have to say of all the different *stat utilities available to one, nfsstat is by far the worst! It gives you the ability to look at a bunch of counters but that's all. If you look at them twice, YOU have to do the work of trying to figure out by how much each counter changed and if you want to know the rate of change you then need to divide by the number of seconds between samples. In all fairness, nfsstat does date back many years when things were still pretty crude and is now hampered by nobody wanting to change the output format because it would probably break a lot of things.
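To illustrate the manual work described above, here is a minimal Python sketch that takes two samples of counter text and turns them into per-second rates. It assumes a modern Python 3 and the simple "label value value ..." line layout of /proc/net/rpc/nfsd; the function names are made up for this example and are not part of nfsstat or collectl.

```python
def parse_counters(text):
    """Parse 'label n1 n2 ...' lines into {label: [numbers]}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            counters[fields[0]] = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # skip any line with a non-numeric field
    return counters


def counter_rates(before_text, after_text, seconds):
    """Return the per-second change of every field seen in both samples.

    Caveat: not every field is a monotonic counter (some are sizes or
    gauges), so a rate is only meaningful for the true counters.
    """
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    result = {}
    for label, new in after.items():
        old = before.get(label)
        if old is None or len(old) != len(new):
            continue
        result[label] = [(n - o) / seconds for o, n in zip(old, new)]
    return result
```

To try it live, read /proc/net/rpc/nfsd twice a few seconds apart and pass both strings plus the elapsed time to counter_rates.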
As for using collectl to monitor nfs, it does provide nfsstat output in a much easier to read format, but what's even better you can let it run for hours or days and play back the data you had collected in the background. As for the request to see what processes are doing, collectl can also gather process data including how much I/O each process is doing and even play it back showing the top I/O users. You can also use the top feature in real time.
If you want to watch the disks themselves, collectl can do that too and display everything in a coordinated display.
Check it out... -mark
I don't have better answers at the moment, however you can follow disk IO quite precisely with
It gives very useful figures, particularly the average queue size and wait time (in ms) for your IOs. It shows quite readily if your disks are a bottleneck, and if the bottleneck is IO count or throughput.
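For reference, figures like average queue size and wait time can be derived from two samples of /proc/diskstats. The following is a rough Python sketch of that arithmetic, not a replacement for the tool mentioned above; the function names are hypothetical, and it assumes the classic 14-field diskstats layout (extra fields on newer kernels are ignored).

```python
def parse_diskstats(text):
    """Extract the counters needed for queue/wait figures, keyed by device."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # partition lines on very old kernels have fewer fields
        stats[f[2]] = {
            "reads": int(f[3]),         # reads completed
            "read_ms": int(f[6]),       # ms spent reading
            "writes": int(f[7]),        # writes completed
            "write_ms": int(f[10]),     # ms spent writing
            "weighted_ms": int(f[13]),  # weighted ms spent doing I/O
        }
    return stats


def queue_and_wait(before_text, after_text, interval_s):
    """Compute IOPS, average wait (ms) and average queue size per device."""
    before = parse_diskstats(before_text)
    after = parse_diskstats(after_text)
    out = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue
        ios = (new["reads"] - old["reads"]) + (new["writes"] - old["writes"])
        wait = (new["read_ms"] - old["read_ms"]) + (new["write_ms"] - old["write_ms"])
        out[dev] = {
            "iops": ios / interval_s,
            "await_ms": wait / ios if ios else 0.0,
            "avg_queue": (new["weighted_ms"] - old["weighted_ms"]) / (interval_s * 1000.0),
        }
    return out
```

A high await_ms with low iops points at slow individual requests, while a long avg_queue with high iops suggests the disks are saturated by request count.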
Then with
You'll see the client connections and the bytes transferred from each client in real time. Loop on it for continuous data. It would be quite easy to make a script that provides continuous data... I'm working on it :)
Now to get IO per process, you can use the excellent iotop. You still have to find a way to match nfsd processes with the clients, though.
As to which files are being accessed by which client, I'm stuck. Actually, files currently being read or written by an NFS client don't even appear in lsof output.
Just to expand on the netstat, use watch -d to see how things change & sort by host
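As one way of sorting by host, here is a small Python sketch that groups the output of `netstat -tn` by client. The exact netstat invocation used in the answers above is not shown, so this assumes the standard Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and NFS over TCP on port 2049; the function name is hypothetical. Note that Recv-Q and Send-Q are bytes currently queued, not totals transferred.

```python
from collections import defaultdict


def connections_by_client(netstat_output, port="2049"):
    """Group established TCP connections to the local NFS port by remote host."""
    per_host = defaultdict(lambda: {"connections": 0, "recv_q": 0, "send_q": 0})
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not an established TCP socket.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        recv_q, send_q, local, remote = fields[1:5]
        # rsplit handles both "10.0.0.1:2049" and "::ffff:10.0.0.1:2049".
        if local.rsplit(":", 1)[-1] != port:
            continue
        host = remote.rsplit(":", 1)[0]
        entry = per_host[host]
        entry["connections"] += 1
        entry["recv_q"] += int(recv_q)
        entry["send_q"] += int(send_q)
    # Return a plain dict ordered by host name.
    return {host: per_host[host] for host in sorted(per_host)}
```

Feeding it the text captured from netstat (for example via subprocess) in a loop gives a per-client view that is easier to scan than the raw connection list.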
Check out nfsstat. It doesn't show everything you want but at least a good subset.
http://linux.die.net/man/8/nfsstat
In my opinion this exactly highlights the problem with today's tools. Here we've mentioned at least 3, including nfsstat, iostat and iotop. Then there was passing mention of wireshark and nfsreplay. Does this really sound like a normal way to do things? Other than wireshark, which is a category all its own, wouldn't you prefer 1 tool?
For openers, while I find the output of iostat very useful, it's too hard to read with all those .00 in the numbers. Collectl reports the exact same data but formatted much easier on the eyes. You already know what I think of nfsstat, and since collectl can play back any data there's no need for a 'replay' utility. As for 'iotop', collectl can also show processes sorted by anything, including I/O.
So there you have it all in one tool, complete with timestamps. If you need a finer monitoring interval you can always crank the sampling down to 0.1 or 0.5 seconds or anything in between, though you will generate more overhead if you monitor processes this fast, as you would with any process monitoring utility.
AND the final bonus is anything you collect with collectl you can load into a spreadsheet and easily plot OR use colplot which is part of collectl-utils.
-mark
You might want to try nfswatch from http://nfswatch.sourceforge.net. You can see some sample output at http://prefetch.net/blog/index.php/2009/06/16/monitoring-nfs-operations-with-nfswatch/
nfswatch is kind of like top (though I'm not sure if there's a batch mode). Once it's running you can change the display by hitting a key (e.g. "c" to display NFS clients using your NFS server).
In my brief testing, however, nfswatch doesn't seem to work with NFSv4.
You might want to check out nfsreplay. It might help you figure out what is happening. Also you might find the information and links here useful