Today I was trying to execute a long-running (but highly optimized) query on a server in our company. I've run the same query on the same data on my home computer (5400rpm harddisk) and it took about 40min. The server is about 10 times slower, maybe still running! The server does about 30 inserts/second, my home computer 300 inserts/second.
As the CPU load is very low (3%), memory is plenty and network is not used as the query is executed locally I suspect the disk I/O to be the bottleneck. At my home computer instead (which got better CPU power) two cores were nearly fully loaded.
The relevant disk configuration is as follows:
Windows 2003
SEAGATE ST3500631NS (7200 rpm, 500 GB)
LSI MegaRAID based RAID 5
4 disks, 1 hot spare
Write Through
- No read-ahead
- Direct Cache Mode
- Harddisk-Cache-Mode: off
Surely, RAID5 may not be the fastest, but the disks compensate by being faster as at home.
Some further interesting data:
- Avg. Queue Length: 30
- Avg. Queue Length (Read) : 2
Avg. Queue Length (Write): 26
Bytes/s read: 1.3MB/s
Bytes/s write: 1.2MB/s
Sec/Read: 0.007
Sec/Write: 0.500
Writes per Second: 36
How can I resolve this write bottleneck? Is it because of the many small writes? Is there some malfunctioning driver or a possibility to cumulate some transactions for bigger writes?
The seconds per write are very large!
Thanks!
The write-through cache could be a performance hit. Every write to the cache must be written to the disk straight away. A write-back cache can wait for the optimal time to write the data to disk. Depending on the hardware, this may require an battery-backup (on the array controller), and some controllers disable write caching if the battery is absent or needs replacing. You may want to verify if the array cache is in fact enabled. Some controllers don't make this very obvious.
I would suspect the RAID5 to be the main culprit. Every block write to your array could result in a reads followed by two writes - at best (if the extra block is already read into cache) it will be two writes. On top of that, if your transaction logs and data files are on the same array each write the database performs will result in two writes (one to the log and once to the data file) - while this will be the case on your home system too so it won't make much odds in your comparison on its own, it will compound any performance difference that there is otherwise.
Your home drive probably has its cache turned on, which will help by allowing it to reorder writes slightly to improve performance by reducing head movement. Turning on read-ahead may help you by reducing head contention with the write operations but that depends on the exact I/O pattern your query imposes.
RAID10 is usually recommended for databases rather than 5 for write performance reasons.
When you say "the same data" - is one system running a backed up and restored database from the other? If there is a difference in the history of the two databases it could be that the slower one needs to have its index stats updated (unexpected query speed problems can be a result of bas stats causing the query planner to chose a less optimal path than it otherwise would).
Also, do you have the same number of CPUs/cores on both environments? Some time ago I saw SQL Server 2000 take longer to run a disk-bottlenecked query slower when using two CPUs than when it was told just to use the one, I presume because it tried to split the load over two CPUs but this resulted in increasing I/O contention due to two threads reading together from different places on the disk (I've not seen this more recently, so it may have been a bug in that edition+SP that is long since fixed, but it might be worth looking in to - if you are giving explicit index hints that help in a low-core environment try removing them in case the query planner could make choices better suited to the extra cores, or try manually reoptimising the hints yourself).
I had serious IO issue on our BI datamart. I found that someone resitted one of HDD configured in RAID 5 and rebuilding parity-according to HP support; I still don't know what started rebuilding.
It caused so much trouble, but once rebuilding is finished, I have the performance back. This may not be most insightful answer but it is one of the consideration you should make.