I have a large multi-user system that NFS-mounts a large piece of shared storage. Thanks to nfsstat, I can identify the source of all my network traffic and narrow it down to a particular host.
However, I'm still having a little difficulty tracing which of the users is doing this. There are a few hundred of them, and there aren't any obvious culprits in the process list. (I usually start by looking for instances of find.)
But we definitely do have 2.5k IOPS coming from it, and it's causing resource issues on the host.
Can anyone suggest how to figure out the culprit processes or users?
The box runs Red Hat Linux and talks to a NetApp filer.
Assuming that there is indeed a culprit (a single user) responsible for the majority of the 2.5k IOPS:
I'd start with top. At that rate you should see one or a few users and processes standing out among the active processes on the box: mostly sleeping, but quite often in the ready state as well. I'd press i (to hide inactive processes), then press and hold the space bar for very fast updates. I'd single out the users showing up most often and display only their processes in top for a more stable view.
If you see many processes from the same user (for example, a complex build performed on NFS), check the process tree for that user to confirm: pstree -ps <user>. It may be difficult to prove that such a collection of processes is the cause, other than by starting and stopping it and watching for correlated changes in activity on the NetApp side.
If the culprit is a single process, I'd expect it to be a steady presence in the top output. In addition to find, I'd also look for:
But it could also be something completely custom, which you won't find "by name".
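The "which users keep showing up" part of that top routine can also be scripted. The following is a minimal Python sketch, not a standard tool: it reads /proc/<pid>/stat repeatedly and counts, per user, how many processes are running (R) or in uninterruptible sleep (D), which is where processes blocked on NFS I/O usually show up. It assumes a Linux /proc, treats the owner of /proc/<pid> (normally the process's effective UID) as the process's user, and all function names are hypothetical.

```python
import os
import pwd
import time
from collections import Counter

# Assumption: R (running/runnable) and D (uninterruptible sleep, typical of
# NFS waits) are the states worth counting as "active".
ACTIVE_STATES = {"R", "D"}


def parse_state(stat_text):
    """Return the state letter from the text of /proc/<pid>/stat.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the LAST ')' and take the next field.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def active_uids_once():
    """Count processes per owner UID that are in an active state right now."""
    own_pid = os.getpid()
    counts = Counter()
    for name in os.listdir("/proc"):
        if not name.isdigit() or int(name) == own_pid:
            continue  # skip non-process entries and this sampler itself
        try:
            with open(f"/proc/{name}/stat") as f:
                state = parse_state(f.read())
            uid = os.stat(f"/proc/{name}").st_uid
        except OSError:
            continue  # process exited between listdir() and open()
        if state in ACTIVE_STATES:
            counts[uid] += 1
    return counts


def sample_active_users(samples=50, interval=0.1):
    """Sum the per-UID active counts over repeated snapshots."""
    total = Counter()
    for _ in range(samples):
        total.update(active_uids_once())
        time.sleep(interval)
    return total


def user_name(uid):
    """Map a UID to a login name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


if __name__ == "__main__":
    for uid, hits in sample_active_users().most_common(10):
        print(f"{hits:6d}  {user_name(uid)}")
```

Users who dominate the tally across a few runs are the ones worth filtering top down to; /proc/<pid>/stat is world-readable, so this does not need root.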
It's also possible that the rate is the collective effect of many users (is the NFS server holding their home directories and/or shared project partitions?). There isn't much you can do about that; maybe it's time to scale up the NFS storage?
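To tell one heavy process apart from many users each contributing a little, a different technique from the top approach is to diff the kernel's per-process I/O counters. This minimal Python sketch assumes a kernel with task I/O accounting (so /proc/<pid>/io exists) and root privileges, since other users' io files are not readable otherwise. It sums syscr and syscw (read-type and write-type system calls) per UID over an interval. Those counters include local files and sockets, not just NFS, and they do not count metadata calls such as stat() or open(), which a find-style scan generates in bulk, so treat the result as a hint rather than an NFS IOPS figure. The function names are hypothetical.

```python
import os
import time
from collections import Counter


def parse_proc_io(text):
    """Parse the "key: value" lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return fields


def syscalls_by_pid():
    """Snapshot {pid: (uid, syscr + syscw)} for every process we may read."""
    snapshot = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/io") as f:
                io = parse_proc_io(f.read())
            uid = os.stat(f"/proc/{name}").st_uid
        except OSError:
            continue  # process exited, or its io file is not readable by us
        snapshot[int(name)] = (uid, io.get("syscr", 0) + io.get("syscw", 0))
    return snapshot


def per_user_delta(before, after):
    """Sum, per UID, the read/write syscalls each process made in between."""
    totals = Counter()
    for pid, (uid, count) in after.items():
        prev = before.get(pid)
        # Skip PIDs absent from the first snapshot and PIDs that look reused
        # (different owner, or a counter that went backwards).
        if prev is None or prev[0] != uid or count < prev[1]:
            continue
        totals[uid] += count - prev[1]
    return totals


if __name__ == "__main__":
    INTERVAL = 5.0  # seconds between the two snapshots
    first = syscalls_by_pid()
    time.sleep(INTERVAL)
    second = syscalls_by_pid()
    for uid, calls in per_user_delta(first, second).most_common(10):
        print(f"{calls / INTERVAL:10.1f} read/write syscalls/s  uid {uid}")
```

If one UID accounts for most of the total, you are probably looking at a single culprit; a long, flat list points to the collective case.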
Maybe a good hint is to find how many open NFS files each user has (from the client side).
I would use lsof -N. This one-liner might help you:
There is a great tool, Wireshark. With its terminal version, tshark, you can find the client and uid:
You will get output like:
with the timestamp, client IP, NFS operation and uid. Collect data for a few minutes, then use your favorite tools to find the top users:
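For that final tally, here is a minimal Python sketch. It assumes the capture was saved as tab-separated fields (tshark's -T fields output uses tabs by default) in the order given above, with the uid last, and that repeated occurrences of a field within one packet are joined with commas, which is tshark's default aggregator. The script name and function are hypothetical.

```python
import sys
from collections import Counter


def top_uids(lines, n=10):
    """Return the n most frequent uids in tab-separated tshark field output.

    Assumes the uid is the last field on each line. Empty uid fields are
    skipped, and a comma-joined field (several RPC calls in one packet)
    counts once per uid listed.
    """
    counts = Counter()
    for line in lines:
        uid_field = line.rstrip("\n").split("\t")[-1]
        for uid in uid_field.split(","):
            uid = uid.strip()
            if uid:
                counts[uid] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 top_uids.py capture.txt  (hypothetical file names)
    with open(sys.argv[1]) as capture:
        for uid, calls in top_uids(capture):
            print(f"{calls:8d}  uid {uid}")
```

Running getent passwd <uid> on the client then turns the busiest uid into a login name.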