I have an 8-core Linux server dedicated to serving NFS to 80 Linux clients that run batch jobs. The clients have 400 cores in total, so they are generally running 400 single-core batch jobs concurrently.
Occasionally, lots of batch jobs try to do I/O at the same time, and exhaust the number of nfsd threads on the server, of which there are currently 80. The batch job gets an I/O error (such as Permission Denied) and bails out.
I'd like to increase the number of nfsd threads, but want to know:
- What rules of thumb are there for setting the number of threads in this situation?
- What drawbacks are there to setting it too high?
References
- This NFS tuning guide from Sun suggests some rules of thumb for Solaris, but gives no rationale for those particular numbers, so I'm not sure how they apply to my Linux server.
- This other one gives an approach to this type of tuning, but is highly subjective.
In an ideal world, your batch jobs would have some backoff logic and you'd stick to 80 threads.
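To make "backoff logic" concrete, here is a minimal Python sketch of a retry wrapper with exponential backoff and jitter. The function name `with_backoff`, the set of errno values treated as transient, and the delay parameters are all assumptions for illustration, not something from your setup; tune them to whatever errors your jobs actually see.

```python
import errno
import random
import time

# Assumption: these errno values are treated as "server busy, try again".
# Which errors are truly transient depends on your mount options and workload.
TRANSIENT_ERRNOS = {errno.EIO, errno.EACCES, errno.ESTALE, errno.ETIMEDOUT}


def with_backoff(operation, max_attempts=6, base_delay=0.5, max_delay=30.0,
                 sleep=time.sleep, rng=random.random):
    """Call operation(); on a transient OSError, wait and retry.

    The wait is capped exponential backoff scaled by a random factor
    ("full jitter"), so that hundreds of jobs do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.errno not in TRANSIENT_ERRNOS or last_attempt:
                raise  # not retryable, or out of attempts
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())
```

A job would then wrap its I/O, for example `data = with_backoff(lambda: open(path, "rb").read())`, where `path` is whatever file on the NFS mount the job needs.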
I'm by no means an expert in NFSd, but the rules of threading that apply to all Linux applications should apply here too. Each thread takes a certain amount of memory, but on an average production server (with double-digit gigabytes of RAM) that amount is small enough to be pretty much inconsequential. The more pressing concern is the way threads are coordinated in applications like NFSd: semaphores. Counting semaphores are an excellent way to avoid locking problems in threaded code. The catch is that a semaphore keeps track of threads by incrementing and decrementing a counter that reflects 'free' vs. 'locked' threads. To do this, it must index the available threads and check them against the locked ones in order to provision execution time appropriately. This is done in a semi-efficient manner whose cost grows exponentially with the number of threads. If your NFSd needs to be very fast, you'll notice an increase in computation time roughly equivalent to double the execution time needed to register a new thread. Luckily, the lookup time is so small to begin with (one instruction; this is the 'base', if you remember your algebra :) ) that you can have very large exponents without any major problems.
The Too-Long; Didn't Read summary: if I were you, I'd limit the number of threads to your expected maximum number of concurrent hosts, but I'd also do some testing to ensure that execution time is sane with your expected values. I'm aware that's probably not a whole lot of help to you, but it's very difficult to analyse an appropriate configuration without knowing the expected usage scenarios.
Also, on a side note, if you extrapolate Sun's numbers, a 2.2 GHz processor should be able to run somewhere in the realm of 800 threads without problems. Even if these numbers are essentially arbitrary, it gives me the feeling that you'll be fine with my prior suggestion.
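For the testing part, one thing you could watch on the server is the `th` line in `/proc/net/rpc/nfsd`. As far as I know, its first number is the configured thread count (the same value as in `/proc/fs/nfsd/threads`) and its second is a counter of how often all threads were busy at once. Below is a rough Python sketch that pulls those two numbers out; the function names are made up for illustration, and the field meanings are an assumption you should check against your kernel, since recent kernels may always report zero for the busy counter.

```python
def parse_th_line(stats_text):
    """Extract thread stats from the text of /proc/net/rpc/nfsd.

    Assumed layout of the relevant line (verify on your kernel):
        th <thread count> <times all threads were busy> <histogram...>
    Returns None if no usable 'th' line is present.
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "th":
            return {
                "threads": int(fields[1]),
                "all_busy_count": int(fields[2]),
            }
    return None


def read_nfsd_thread_stats(path="/proc/net/rpc/nfsd"):
    """Read and parse the stats file on a running NFS server."""
    with open(path) as stats_file:
        return parse_th_line(stats_file.read())
```

If the busy counter keeps climbing while the batch jobs run, that suggests the thread count is the bottleneck; if it stays flat after you raise the count, you have probably gone high enough.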
Don't use NFS. NFS is great for minor file access, but crumbles under any kind of load. Have you investigated some of the other technologies like AFS or Hadoop?