I use PostgreSQL 8.4 and I wanted to know which RAID level is most commonly used for databases. I've read everywhere that RAID10 is the best suited and that RAID5 is not a good option, e.g. http://www.revsys.com/writings/postgresql-performance.html
My server is a Dell PowerEdge 2950. Dell support told me that they don't have a lot of customers using RAID10; most of the time their customers use RAID5 with the RAID controller cache turned on.
What do you think about that? What RAID level do you use for your database servers?
Thanks!
Pol
I'd probably go with RAID10 if your database takes a lot of writes.
Here's a good real-world case to consider: replacing a file server that had only a single 1TB HDD with a new server with four 1TB hard drives. The decision was made to go with RAID5.
Performance was NOTICEABLY slower on the new machine. It was discovered later that the server handled A LOT more writes than originally thought, and the write penalty from RAID5 was pretty bad.
The correct decision in this case would have been RAID10.
The best answer depends on which profile your database fits more closely: OLTP or OLAP. To oversimplify: does your database do more writes or more reads? A database doing more writes than reads will typically perform better on RAID10 than on RAID5 (assuming the same number of drives). As Karl mentioned, if your dataset (or your drives) are large, stay away from RAID5 and use RAID6 if RAID10 isn't an option.
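To see why the read/write mix matters, here's a back-of-the-envelope sketch using the classic write-penalty model; the per-drive IOPS figure and the 50/50 mix are assumptions for illustration, not measurements:

```python
# Classic write-penalty model: each logical write costs 2 physical I/Os on
# RAID10 (both mirrors) and 4 on RAID5 (read old data, read old parity,
# write new data, write new parity). Figures below are made up.
DRIVE_IOPS = 150  # assumed per-spindle figure for a 15k SAS drive

WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(level, drives, write_fraction):
    """Logical IOPS the array sustains for a given read/write mix."""
    raw = drives * DRIVE_IOPS
    # Reads cost 1 physical I/O each; writes cost the level's penalty each.
    return raw / ((1 - write_fraction) + write_fraction * WRITE_PENALTY[level])

for level in ("raid10", "raid5"):
    print(level, round(effective_iops(level, drives=8, write_fraction=0.5)))
# 8 drives, 50% writes: raid10 -> ~800 logical IOPS, raid5 -> ~480
```

Even with this crude model, the gap widens as the write fraction grows, which is the whole OLTP-vs-OLAP point.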
It's workload-dependent and you must benchmark.
In theory, RAID 5 is slow for small writes. A big cache can mitigate this tremendously (or even make it identical in performance to RAID 1+0 for a given setup).
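A minimal sketch of why a big cache helps, assuming the classic parity read-modify-write model: if the controller buffers enough dirty data to write whole stripes, RAID 5 skips the read-modify-write cycle entirely (the stripe width here is illustrative):

```python
# Physical I/Os per logical chunk written on RAID 5:
#   partial-stripe (read-modify-write): read old data, read old parity,
#     write new data, write new parity = 4 I/Os per chunk
#   full-stripe (enough cached data to rewrite the whole stripe): write all
#     data chunks plus one parity chunk, amortized across the stripe
def raid5_ios_per_chunk(data_drives, full_stripe):
    return (data_drives + 1) / data_drives if full_stripe else 4.0

print(raid5_ios_per_chunk(5, full_stripe=False))  # 4.0
print(raid5_ios_per_chunk(5, full_stripe=True))   # 1.2 on a 5+1 array
```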
Often RAID 1+0 is politically difficult, as the extra space is seen as wasted. Try explaining that empty space sitting unused on a RAID 5 array, when it could have been traded for performance, is just a different form of wasting space.
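To make the space trade-off concrete, here's a quick tally of usable capacity at the same drive count; the drive count and size below are placeholders, not anyone's actual setup:

```python
# Usable capacity for the same drive count (sizes are placeholders).
def usable_tb(level, drives, size_tb):
    overhead = {"raid10": drives / 2, "raid5": 1, "raid6": 2}
    return (drives - overhead[level]) * size_tb

for level in ("raid10", "raid5", "raid6"):
    print(level, usable_tb(level, drives=8, size_tb=1.0), "TB usable")
# 8x1TB: raid10 -> 4.0 TB, raid5 -> 7.0 TB, raid6 -> 6.0 TB
```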
If you can't benchmark and you already have the disk space, always go with RAID 1+0. If you can't benchmark and you are forced onto RAID 5 or RAID 6, make sure you state clearly in writing that RAID 5 and RAID 6 carry the biggest performance risk for write-intensive loads, and make sure the insistence that the array be set up in that riskier manner without testing first is also put in writing.
When you do benchmark, make it clear to the application group doing the benchmarking how RAID 5 works, that the controller cache can mask the small-write penalty, and that the benchmark needs to run against a data set sized identically to production.
Do all this in writing. Storage configuration mistakes are the most common blame game in many environments. Realistically, you're not going to be allowed to benchmark and you're going to have to guess, so make sure you have evidence that you made the best guess you could, or that you were told to do something contrary to best practice.
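If you do get to benchmark, even something as simple as pgbench (it ships with PostgreSQL) run against each candidate array is a reasonable start. A minimal sketch, assuming a throwaway database named benchdb lives on the array under test (the database name, scale, client count, and duration are all placeholders):

```python
# Sketch: run pgbench against each candidate array and compare reported TPS.
# Assumes pgbench is on PATH and the "benchdb" database already exists.
import subprocess

def run_pgbench(db, clients=10, seconds=600, scale=100):
    # -i populates the test tables; raise the scale factor until the data
    # set is as big as production, per the advice above.
    subprocess.run(["pgbench", "-i", "-s", str(scale), db], check=True)
    result = subprocess.run(
        ["pgbench", "-c", str(clients), "-T", str(seconds), db],
        check=True, capture_output=True, text=True,
    )
    return result.stdout  # includes the "tps = ..." summary lines

print(run_pgbench("benchdb"))
```

Run it once per array configuration, keep everything else identical, and file the output with the rest of your paper trail.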
You can find here some performance and scaling reports for the HP ProLiant DL380 G5.
The tests cover several file systems (JFS, XFS, ReiserFS, ext2, and ext3).
Honestly, in my experience, I haven't seen a practical difference between RAID5 and RAID10 from a data-redundancy perspective. However, RAID10 offers a performance boost because the array is striped. RAID10 is basically RAID1 plus RAID0: drives are grouped into mirrored pairs (RAID1), and those pairs are striped together (RAID0). We use RAID5 rather than RAID10, but we don't require the performance gain RAID10 offers. It just depends on your environment and what you are working with. I would recommend trying RAID5 first; if performance seems too slow, move to RAID10 and see if the gain helps. For small to medium workloads, RAID5 should do the trick.
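If it helps, here's a toy sketch of that stripe-of-mirrors layout on a hypothetical four-drive set (the drive numbering is made up for illustration):

```python
# Toy RAID10 layout on four drives: drives pair up into mirrors, and logical
# chunks are striped across the pairs (a "stripe of mirrors").
def raid10_targets(chunk, drives=4):
    pair = chunk % (drives // 2)     # which mirrored pair gets this chunk
    return (2 * pair, 2 * pair + 1)  # both drives in the pair store a copy

for chunk in range(4):
    print("chunk", chunk, "-> drives", raid10_targets(chunk))
# chunk 0 -> (0, 1), chunk 1 -> (2, 3), chunk 2 -> (0, 1), chunk 3 -> (2, 3)
```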
For best performance use SSDs, then RAID10, then RAID1 or RAID0, then RAID5, then RAID6.
RAID5 gets used because it's easier to set up and to reason about than RAID10: you don't need an even number of disks, and more people are familiar with it.
In the past we have always done RAID5 (Dell PowerEdge 2650-2950), but on our latest machine (running MS-SQL, not PostgreSQL) I tested both RAID10 and RAID5. I found that for our workload, RAID10 gave us a moderate performance increase (~10%).
If you have the time, I would suggest setting the server up both ways and running normal DB tasks (backups and restores, whatever jobs or reports you might do).
SSDs don't always provide the best performance: they read and write in fixed-size pages (and erase in even larger blocks), which may not match the database's access patterns.
When deciding which RAID level and filesystem to use for a database, the important questions are how large the dataset is and whether the workload is read-heavy or write-heavy.
If the dataset is large, i.e. > 10 TB, you will want to avoid RAID5: you could lose a second drive while rebuilding the array, resulting in total data loss. RAID6 and RAID10 are good profiles, but be aware of the restrictions with RAID10. Also: pick a journaled filesystem that doesn't require regular fsck-ing; in other words, avoid ext3 and go with something like XFS... or better yet, go Solaris and use ZFS. Do you have any idea how long it takes to fsck a 10 TB volume?
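To put a rough number on that rebuild risk, here's a back-of-the-envelope sketch; the 10^-14 errors-per-bit URE rate is a commonly quoted spec-sheet figure for consumer SATA drives, assumed here rather than measured:

```python
# Odds of completing a RAID5 rebuild without hitting an unrecoverable read
# error (URE). A rebuild must read every surviving drive end to end, so the
# exposure grows with array size. The URE rate is an assumed spec figure.
def rebuild_survival(drives, size_tb, ure_per_bit=1e-14):
    bits_read = (drives - 1) * size_tb * 1e12 * 8  # all surviving drives
    return (1 - ure_per_bit) ** bits_read

print(f"{rebuild_survival(drives=6, size_tb=2.0):.0%}")  # ~45% for 6x2TB
```

Under these assumptions, a 6x2TB RAID5 rebuild finishes cleanly less than half the time, which is why large arrays push you toward RAID6 or RAID10.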
If you have more money, you can buy an external cabinet and might get some additional speed boosts since the machine won't be trying to do system activity (logging, etc.) while reading from the dataset. You can also get a better RAID controller with more RAM and higher throughput, or faster disks. Basically, you get what you pay for.
If the data is read-heavy, RAID5 or RAID6 can serve you well; if it's balanced read/write or write-heavy, you're probably better off sticking with RAID10.