I have a Windows 2003 file server that is massively busy. Tens of millions of files come and go on the server every day. I'm looking for statistics to gather to help me size and configure the Windows 2008 R2 replacement for this machine. I have a good handle on the basics (memory, CPU, network), but I'm particularly concerned about file serving, file movement, etc. Any guidance?
Added detail: most of the file movement is internal to the server and driven by scripts, though there is a SQL server dropping off and picking up files remotely. I have a good handle on metrics once the files are served to outside clients (which is a very small percentage). We push the limits of path depth (dozens of subfolders), number of files in a directory (not uncommon to find 50k+) and total number of files (50M+ at rest, 100k+ in motion).
Not a lot has changed between 2003 and 2008 in the way of simple file-serving. The one key difference is SMB2 for the Vista/Win7 clients out there. SMB2 offers different file access semantics and is supposed to be more efficient. It still doesn't fix the serialization SMB forces, so client programs that rely on parallel access will continue to perform as they did with Server 2003.
For my money, sufficient memory to cache the working set of open files is the biggest performance gain when sizing a file server. One way to estimate that is to take a snapshot of the open files, stat each one for its size, and then sum the sizes to get the total size of open files. Do it at various times during the day to get a feel for the ebb and flow of open files.
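As a rough sketch of that snapshot-and-sum step, something like the Python below would do. It assumes you have already exported the list of open file paths to some iterable of strings (how you capture that list is up to you and not shown here), and the function name total_open_size is made up for illustration. Files that vanish between the snapshot and the stat are counted as missing rather than treated as errors, since on a server this busy that will happen.

```python
import os


def total_open_size(paths):
    """Sum the sizes of the given files, counting each distinct path once.

    Returns (total_bytes, files_counted, files_missing).
    """
    seen = set()
    total = 0
    missing = 0
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # skip blank lines in the snapshot
        # Normalise so the same file opened by several clients is counted once.
        key = os.path.normcase(os.path.abspath(path))
        if key in seen:
            continue
        seen.add(key)
        try:
            total += os.stat(path).st_size
        except OSError:
            # File was moved or deleted after the snapshot was taken.
            missing += 1
    return total, len(seen) - missing, missing
```

Run it against snapshots taken at different times of day and compare the totals; the largest one is a starting point for the cache size, not a precise figure.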
If your file consumers don't keep files open, and instead copy them locally and then write them back, cache sizing is harder to predict. Files that are hit a lot should fit in cache if possible, but if your accesses are simply too random for that, other metrics should be used. One option is to monitor outbound network traffic volume over 15-, 30- and 60-minute windows. That volume is a proxy for the amount of data read from your server, and sizing the cache to one of these values would be a reasonable approximation.
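To turn traffic samples into those window figures, a sliding-window sum is enough. The sketch below assumes you already have one sample per minute of bytes sent by the server (for example, derived from whatever counter logging you use); the function name peak_window_bytes and the choice to report the busiest window rather than an average are my own assumptions, not the only reasonable way to do it.

```python
def peak_window_bytes(per_minute_bytes, window_minutes):
    """Return the largest total of bytes sent in any run of window_minutes samples.

    per_minute_bytes: iterable of bytes sent during each one-minute interval.
    If there are fewer samples than the window, the sum of all samples is returned.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    samples = list(per_minute_bytes)
    if len(samples) <= window_minutes:
        return sum(samples)
    # Start with the first full window, then slide it one minute at a time.
    current = sum(samples[:window_minutes])
    peak = current
    for i in range(window_minutes, len(samples)):
        current += samples[i] - samples[i - window_minutes]
        peak = max(peak, current)
    return peak


def window_report(per_minute_bytes, windows=(15, 30, 60)):
    """Map each window length in minutes to its peak outbound byte count."""
    samples = list(per_minute_bytes)
    return {w: peak_window_bytes(samples, w) for w in windows}
```

Feeding a day's worth of samples to window_report gives three candidate cache sizes. Note that this traffic only reflects reads by remote clients, so on a server where most movement is local and script-driven it will understate the data actually being touched.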