Hardware: Sun Fire V20z, dual Opteron 1 GHz, 2 GB RAM, 73 GB 10k rpm SCSI.
Is there a "reasonable" threshold I can use in my monitoring software? It's currently at 500 warning, 1000 critical.
What number of interrupts/sec would be better to use? I know this depends on a lot of things, but hoping there's a ballpark value so I'm not randomly picking numbers from a hat...
There is no fixed limit. The rate depends on interrupt coalescing, the number of requests, threads, and processes, the kernel type, the clock source, and the kernel configuration. That variability is also why interrupt timing is used as an entropy source for the random number generator.
On my Fibre Channel QLogic cards (SAN), I got my best bandwidth at about 2,000 interrupts per second. Each card had two ports, so each port's interrupt was firing about 2,000 times per second.
From what I have read, interrupt coalescing is handled on a per-driver basis, and each driver can do it completely differently. For example, here are instructions on configuring the e1000 (Intel PRO/1000) network card driver.
If a particular driver doesn't allow coalescing, then you need to compute the ideal rate as a function of work units in a fixed amount of time. Mircea Vutcovici gives the outline in his comments. Consider an 8 Gb/sec card (assuming one port).
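A rough sketch of that computation, under loud assumptions: the card is treated as delivering an idealized 8e9 bits/sec of payload (no encoding or protocol overhead), and it raises exactly one interrupt per completed transfer of a fixed size. The function name and the transfer sizes are hypothetical, chosen only for illustration.

```python
def interrupts_per_second(link_bits_per_sec, bytes_per_interrupt):
    """Interrupt rate if the link is saturated and each interrupt
    signals completion of one transfer of bytes_per_interrupt bytes."""
    if bytes_per_interrupt <= 0:
        raise ValueError("bytes_per_interrupt must be positive")
    bytes_per_sec = link_bits_per_sec / 8  # bits -> bytes
    return bytes_per_sec / bytes_per_interrupt


# Assumption: one port at an idealized 8 Gb/sec payload rate.
LINK_BITS_PER_SEC = 8e9

# Hypothetical transfer sizes: 4 KiB, 64 KiB, 512 KiB.
for size in (4 * 1024, 64 * 1024, 512 * 1024):
    rate = interrupts_per_second(LINK_BITS_PER_SEC, size)
    print(f"{size:>7} bytes per interrupt: {rate:,.0f} interrupts/sec")
```

Under these assumptions, small transfers push the rate into the hundreds of thousands per second, while 512 KiB transfers land at roughly 1,900 per second. A real device will differ, so treat the result as an order-of-magnitude estimate, not a threshold.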
Now, each device and controller has a maximum number of input/output operations per second that it can sustain. If my guess is right, that should be directly proportional, if not equal, to the number of interrupts per second.
So, how many interrupts can you handle? I would find the optimum buffer size for the communication you are performing on your device (disk in this case) and note the interrupt rate at which the device peaks out. Anything near or above that rate means that someone is really abusing the device. Since you are using an internal controller, anything goes. You will have to use empirical analysis to make a guess.
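One way to gather that empirical data is sketched below, assuming a Linux host where `/proc/stat` contains an `intr` line whose first number is the total count of interrupts serviced since boot. The function names are hypothetical, and this measures the system-wide rate, not the rate of a single device.

```python
import time


def total_interrupts(proc_stat_text):
    """Return the total interrupt count from the text of /proc/stat."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "intr":
            return int(fields[1])  # first number is the grand total
    raise ValueError("no 'intr' line found")


def interrupt_rate(before_text, after_text, elapsed_seconds):
    """Interrupts per second between two /proc/stat snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = total_interrupts(after_text) - total_interrupts(before_text)
    return delta / elapsed_seconds


def sample_rate(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the rate."""
    with open(path) as f:
        before = f.read()
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return interrupt_rate(before, after, time.monotonic() - start)
```

Calling `sample_rate(5.0)` while the disk is idle, and again while it is running your heaviest realistic workload, gives two reference points for choosing warning and critical values.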
This means that your thresholds are tied to a) your controller, b) your disks, and c) your CPU (higher frequency equals the ability to get work done despite high interrupt rates).
For posterity, a lot of devices have interrupts. The ones most likely to be a bottleneck are I/O related, specifically: storage, network (not just TCP), video, audio (some).