People keep telling me that in order to improve an SQL server's performance, I should buy the fastest hard disks possible, put them in RAID 5, etc.
So I was thinking, instead of spending all the money for RAID 5 and super-duper fast hard disks (which isn't cheap by the way), why not just get tonnes of RAM? We know that an SQL server loads the database into memory. Memory is wayyyy faster than any hard disks.
Why not stuff in like 100 GB of RAM on a server? Then just use regular SCSI hard disks in RAID 1. Wouldn't that be a lot cheaper and faster?
Your analysis is fine -- to a point -- in that it absolutely will make things faster. You still have to account for a couple of other issues though:
Not everyone can afford enough memory; when you have multiple terabytes of data, you have to put it on disk some time. If you don't have much data, anything is fast enough.
Write performance for your database is still going to be constrained by the disks, because that is how the database keeps its promise that committed data was actually stored.
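As a concrete illustration of that promise (a sketch only, assuming Microsoft SQL Server 2014 or later; the database name is a placeholder), SQL Server only lets you relax the guarantee explicitly via delayed durability - by default the log must reach disk before a commit returns:

    -- Sketch: MyDatabase is a placeholder. FORCED trades a small window of
    -- possible data loss for faster commits; DISABLED restores full durability,
    -- where the log is flushed to disk before COMMIT returns to the client.
    ALTER DATABASE MyDatabase SET DELAYED_DURABILITY = FORCED;
    ALTER DATABASE MyDatabase SET DELAYED_DURABILITY = DISABLED;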
If you have a small data set, or don't need to persist it on disk, there is nothing wrong with your idea. Tools like VoltDB are working to reduce the overheads that older assumptions in RDBMS implementations made which constrain pure in-memory performance.
(As an aside, people telling you to use RAID-5 for database performance are probably not great folks to listen to on the subject, since it is almost never the best choice - it has good read performance, but bad write performance, and writes are almost always the production constraint - because you can throw RAM into caching to solve most read-side performance issues.)
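To put rough, back-of-the-envelope numbers on that: a small random write on RAID 5 typically costs four disk I/Os (read old data, read old parity, write new data, write new parity), while on RAID 10 it costs two (one per mirror). So ten disks at roughly 150 IOPS each give on the order of 10 * 150 / 4 = 375 random write IOPS in RAID 5 versus 10 * 150 / 2 = 750 in RAID 10, before controller cache helps.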
Short version: consider the working set size. Long version: how big is your data? If it can fit in the memory of a modern server, then yes, you're absolutely right. Unfortunately, the biggest Xeon can address 2TB of RAM right now, and that's not that big a dataset any more. If you can't buy a machine big enough to house your entire working set in RAM, you're forced to solve problems with your brain, not your wallet.
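If you're on SQL Server, one rough way to gauge this (a sketch, assuming you have VIEW SERVER STATE permission) is to check how much of each database is currently sitting in the buffer pool and compare that against the database's size on disk:

    -- Buffer pool pages are 8 KB, so page count * 8 / 1024 gives megabytes cached.
    SELECT DB_NAME(database_id) AS database_name,
           COUNT(*) * 8 / 1024  AS cached_mb
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY cached_mb DESC;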
If you want speed:
Follow those steps, and SQL Server will fly.
Then if you want, add more RAM... but do the above first, and you may well find you are done.
The motto from http://www.tbray.org/ongoing/When/200x/2006/05/24/On-Grids applies here: memory is the new disk, and disk is the new tape. Note that was six years ago. Yes, we have database systems that try (and try hard) to keep the entire dataset in RAM, and that would rather shard across multiple machines than use the disk, because disk is orders of magnitude slower anyway. You still need to write the dataset out to disk, but as the motto suggests, that's more akin to a background backup task than an online operation. Durability is achieved through append-only logs in these databases (I am thinking of MongoDB and Redis, but there are tons more).
This question is similar to a basic one that has led to a lot of research and development in database architectures over the past 5-10 years. Now that it is feasible to store an entire database in RAM for many use cases, the database needs to be designed around working in RAM, rather than simply applying older inherited architectures to RAM-based storage.
Just as many smaller and more special-purpose languages have been widely adopted in recent years, we are entering an era in which more special-purpose databases will be needed.
For some further reading on this topic, I recommend the academic paper The End of an Architectural Era (It’s Time for a Complete Rewrite). It's not a difficult read.
It's unclear if this question was specifically about SQL Server. The original poster should clarify this.
Daniel Pittman wrote:
Reducing the overheads from older assumptions in RDBMS implementations was exactly the design goal of VoltDB, but it does scale horizontally with no architectural limit on the data size, and it can persist to disk for full durability using snapshotting and command-logging.
If you can get a server with enough RAM to hold at least the hot part of your dataset, you'll be fine. Also, RAID 1 and RAID 5 are not the fastest way to arrange your data - RAID 0 is faster, but then you have to consider the higher odds of a single disk failure wiping out your whole database, which is not a nice thing to happen. You can RAID 1 or RAID 5 your RAID 0 array, provided you have enough drives and controllers.
You can even play with replication here - do your writes to a disk-heavy server which replicates to one or more memory-heavy servers where you run complicated queries.
Sadly, RDBMSs appear to be in the big-iron realm - they aren't that easy to grow horizontally.
This is a case of "it depends what you are doing." Perhaps the "right" advice is to avoid SQL altogether and use memcache/redis/etc!
I agree with you that extra RAM will help a lot, especially if you are able to read the whole working set into RAM. Yes, it will still have to write data, but if your workload is mostly reads, those writes will face little contention for disk I/O.
However, disk performance is often the bottleneck on SQL servers, and it is harder to upgrade later than things like RAM (assuming your server isn't already fully populated with DIMMs).
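A quick way to see which side you're on (a sketch for SQL Server; counter names can vary slightly by version and instance) is to watch Page life expectancy: if it stays low, pages are being evicted from memory quickly and reads keep falling back to the disks:

    -- Low, steadily dropping values suggest the working set doesn't fit in RAM.
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name = 'Page life expectancy';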
There were a number of comments about RAID5 being slow, but I would say this is not always the case, so be careful before making sweeping statements. Really high end servers with fast RAID cards and lots of BBWC sometimes go much faster in RAID5 (or RAID50 with >4 disks) than they do in RAID10...
Over the years I've personally experienced slow RAID5 arrays, but after benchmarking a DL360 G5 with four 146 GB SAS disks in ~2009, we had to double-check our tests. Indeed, the array went faster with RAID5 than RAID10 in nearly every test. BBWC and fast parity calculations let the server use the four disks much more effectively as a RAID5 array than as RAID10. Some of the tests showed 50% better throughput with RAID5, and almost none were slower. The tests that were slower were only 5-10% off.
I would caution people against making blanket statements that RAID5 is slow; everybody says it online, but it is simply not true in every case.
You have a mixed bag of candy to select from, and the right choice really depends on what flavor you want.
Simply put, invest in the knowledge (free) before forking out cash.

1. Learn the configuration options for your database and look at your current config for things to optimize.
2. Look at the programming and the SQL statements; unit test with simple scripts that mimic the operations involved - it may not even be what you think is the issue (see the timing sketch after this list). If the simple scripts spend their time in SQL joins, split the work up and do the same thing with a programmed loop. This is where memory can help.
3. Look at the hosting plan and the server. Use ps aux in a Linux console and see if there is something sucking up your memory and processor.
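For step 2, a minimal timing harness on SQL Server might look like the following (the tables and join are made-up placeholders; STATISTICS TIME/IO report elapsed time and page reads in the messages output):

    -- Placeholder query: swap in the statement you actually suspect is slow.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.created_at >= '2012-01-01';

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;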
The absolutes: a hard drive upgrade improves speed, but that isn't under your control in a virtual server environment. Memory does not improve speed unless you configure your services to use it, period. Striped RAID (0, 5), higher RPM, and synchronous read/write over a fast bus help on the disk side. A multi-core processor with good L1/L2/L3 cache will help with a processing bottleneck. Can I hear it for Xeon!
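As one example of configuring the service to actually use the memory (a sketch assuming Microsoft SQL Server; the 64 GB figure is arbitrary and should reflect what you want to leave for the OS and other processes):

    -- 'show advanced options' must be on before 'max server memory' can be set.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 65536;
    RECONFIGURE;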
Overall, you must keep size and scalability in mind. While you may start with small storage needs, your data can grow quickly, even exponentially. Databases work best with atomic data, that is, data broken down to the smallest practical unit; because of the small size, it travels faster within the data warehouse. Then you also have to factor in the database structure. In the future you could be linking to outside databases, which is why structure is crucial: in that scenario it should make little difference to your query if half of the data lives outside your data mart. When data is queried, the point is not simply to keep stored data in RAM; rather, the query should be quick in accessing and returning the data.