When trying to choose a SATA magnetic hard drive (not SSD) for high performance for both random and linear access, which should be the primary factor?
For example: would a 10k RPM drive with 16MB of cache perform better than a 7200 RPM drive with 32MB of cache?
The short answer is yes. Your total hard drive latency is the [seek latency] + [rotation latency].
The 10K RPM drive will have a smaller rotational latency because it spins faster, and it will also be able to read data off the platter faster. The larger cache mostly helps with writes and with repeated reads. A cache is similar to a buffer: when the drive reads data from the disk, it keeps recently accessed data, plus data in the near vicinity, for quicker access. This exploits temporal and spatial locality. The larger cache will be useful if your access pattern is such that you read the same file a lot or the data you read is stored close together.
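To put numbers on the rotational part of that sum, here is a minimal sketch. It assumes the usual approximation that, on average, the platter has to turn half a revolution before the wanted sector arrives under the head. The function names are made up for illustration, and any seek time you pass in has to come from the drive's spec sheet.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms, assuming half a revolution on average."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def avg_access_latency_ms(seek_ms: float, rpm: float) -> float:
    """Total latency as described above: seek latency + rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


if __name__ == "__main__":
    for rpm in (7200, 10_000):
        latency = avg_rotational_latency_ms(rpm)
        print(f"{rpm} RPM: {latency:.2f} ms average rotational latency")
```

That works out to roughly 4.17 ms at 7200 RPM versus 3.00 ms at 10k RPM, before any difference in seek time is counted.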
Wikipedia has a decent page on disk caches.
Depends on the likelihood of cache hits. If you have a small amount (8/16/32MB) of data across your disk that you're always reading from and writing to, then you'll get a very high cache hit rate, so the bigger the cache the better. Of course your OS may be able to cache much more than that, and using faster memory too. If cache hits are unlikely, i.e. your data set is many times larger than your disk cache, then I'd go for as low a random seek time as possible given the data set size.
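A rough way to see why the cache stops mattering once the data set outgrows it: the sketch below assumes accesses are spread uniformly at random over the working set, so the hit ratio is roughly cache size divided by working-set size (capped at 1). Real workloads with hot spots will do better than this. The function names and the timings in the usage note are hypothetical, not measurements.

```python
def estimated_hit_ratio(cache_mb: float, working_set_mb: float) -> float:
    """Hit ratio under the (assumed) uniform random access model."""
    if working_set_mb <= 0:
        raise ValueError("working_set_mb must be positive")
    return min(1.0, cache_mb / working_set_mb)


def effective_access_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Weighted average of the cache-hit time and the disk (miss) time."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Hypothetical timings: 0.1 ms for a cache hit, 12 ms for a trip to the platter.
    for working_set_mb in (16, 320, 32_000):
        h = estimated_hit_ratio(32, working_set_mb)
        t = effective_access_ms(h, 0.1, 12.0)
        print(f"{working_set_mb} MB working set: hit ratio {h:.3f}, about {t:.2f} ms")
```

With a 32MB cache and a 32GB working set, the estimated hit ratio is about 0.001, so nearly every access pays the full mechanical latency and the cache size is irrelevant.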
Either way just get a mirrored pair of Velociraptors if you need 270GB or less or a pair of Seagate Barracuda 7200.11's if you need more. We could dance around it all day but these will sort you out :)
That's a very difficult question to answer, and it will be affected by other factors such as NCQ (native command queuing) support.
I think a rule of thumb is: for lots of small accesses (random I/O), go for RPM. For linear access, go for cache.
Ultimately it depends on the data you're using. Cache improves performance for things that get accessed again and again, like files on a webserver. It also improves write performance, since data only needs to be written to the cache, which then gets spooled to disk when the disk is available (or the cache is full). Higher RPMs improve "random" access (rotational latency, for instance), which is what a database needs.
I would go with a higher RPM, all things being equal.
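For a database-style workload, a back-of-the-envelope figure is the number of random I/Os per second a single drive can sustain, which is roughly one over the average access time. This is a sketch only: it ignores command queuing, caching and transfer time, and the seek times in the usage note are assumed placeholders rather than specs of any particular drive.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Half a revolution on average, in milliseconds."""
    return (60_000.0 / rpm) / 2.0


def estimated_random_iops(seek_ms: float, rpm: float) -> float:
    """Approximate random IOPS: 1000 ms divided by (seek + rotational latency)."""
    return 1000.0 / (seek_ms + avg_rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Assumed average seek times, for illustration only.
    drives = {"7200 RPM": (8.5, 7200), "10k RPM": (4.5, 10_000)}
    for name, (seek_ms, rpm) in drives.items():
        print(f"{name}: roughly {estimated_random_iops(seek_ms, rpm):.0f} random IOPS")
```

Under those assumed seek times the 10k drive comes out at about 133 random IOPS against about 79 for the 7200 RPM drive. A few extra megabytes of cache cannot close that gap when the accesses are scattered.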
Don't forget that areal density is also a factor in performance. All else being equal, you will realize faster data transfer rates as the areal density of a drive platter increases. Rotational speeds are primarily related to access times, but once the data is located, the areal density becomes a factor in the throughput of reading the data.
So, also go big on capacity. Big capacity drives tend to be the ones with higher areal density per platter.
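To illustrate the areal density point: sequential throughput is roughly the amount of data on one track multiplied by how many times per second that track passes under the head. The sketch below ignores head and track switch time, and the bytes-per-track figures are invented for the example, not taken from any real drive.

```python
def sequential_mb_per_s(bytes_per_track: float, rpm: float) -> float:
    """Approximate sustained transfer rate in MB/s (1 MB = 10**6 bytes)."""
    revolutions_per_second = rpm / 60.0
    return bytes_per_track * revolutions_per_second / 1e6


if __name__ == "__main__":
    # Hypothetical track capacities: a dense 7200 RPM platter vs. a sparser 10k one.
    dense_7200 = sequential_mb_per_s(1_000_000, 7200)
    sparse_10k = sequential_mb_per_s(600_000, 10_000)
    print(f"dense 7200 RPM: {dense_7200:.0f} MB/s, sparse 10k RPM: {sparse_10k:.0f} MB/s")
```

With those made-up numbers the slower-spinning but denser drive streams data faster (about 120 MB/s versus about 100 MB/s), even though it would still lose on access time.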
It depends on your data and how it is used. If you have data that is not re-used very often I think you're gonna be better off with a fast spin, since the cache only really helps if it gets hits. If your data is changing a lot then the cache won't buy you much relief.
I'm not aware that cache size is documented as having any significant impact on performance, and when you compare the few dozen megs the drive is caching to the gig or more cached by the OS it's easy to see why.
Rotational speed, as other answers say, has a huge impact.
I would say it mainly depends on what you want to do with your HDD. If you use it as a boot drive, then go for the cache! If you use it for storage, go for the RPM! If you're in the middle, it's more complicated and more of a feeling thing. But I think the main question people should ask themselves is: will you boot from it or not? If yes, then your question is answered ^^