We're in the process of building a new SQL Server box and would like some recommendations on hardware. Currently we build all systems in house. With the information below, what would you recommend for drives and an external SAS cage? Also, for this workload, how beneficial would SAS be vs. SATA2?
Basic specs so far:
- AMD Phenom II X4 955 CPU
- 8GB RAM
- 2TB RAID6 storage (thinking 450GB Seagate Cheetah 15K.6 drives)
- 1TB RAID1 storage (reason explained below) (1TB Seagate Barracuda ES.2)
- 3ware 9690SA SAS Controller
Software:
- Windows 2003 or 2008 server (haven't yet decided)
- SQL Server 2005 or 2008 (will depend on application compatibility)
Workload:
This server is fairly read/write heavy for most applications (about 1TB of data: financials, internal apps, etc.). We also have GIS data on the server that is used by ArcGIS. Our plan is to keep the vector data on the main RAID6 array and the raster data (about 750GB) on the RAID1 array, since its use isn't as frequent.
I would suggest RAID10, not RAID5 or 6, for heavy database work that isn't mainly read-only. On a 4-drive RAID6 array, each block write (or partial block write) can turn into a read (to get the other data block) followed by three writes (one for the data block itself, two for the two parity blocks), which can have a significant write performance impact. With RAID10, each (partial) block write is just two writes to disk.
If your database access really does include very few writes then RAID6 may be preferable for redundancy, as a RAID6 array can survive any two drives failing, whereas a four-drive RAID10 array will only survive 4 of the 6 possible combinations of two failed drives. But you state that you expect the activity to be both read and write intensive (and you have good backup and disaster recovery plans, right?).
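To put rough numbers on that write penalty (these figures are illustrative, assuming roughly 175 random IOPS per 15K SAS spindle; they're not measurements from your hardware):

    -- Back-of-the-envelope comparison for a 4-drive array.
    -- Assumed figure: ~175 random IOPS per 15K SAS spindle.
    DECLARE @DiskIOPS int, @Disks int;
    SET @DiskIOPS = 175;
    SET @Disks = 4;

    -- RAID10: each logical write costs 2 physical writes (one per mirror side).
    SELECT (@Disks * @DiskIOPS) / 2 AS Raid10WriteIOPS;  -- ~350

    -- RAID6 (4 drives): a small write costs ~4 physical I/Os
    -- (read the other data block, then write data + both parity blocks).
    SELECT (@Disks * @DiskIOPS) / 4 AS Raid6WriteIOPS;   -- ~175

So for a purely random write workload, the RAID10 set could sustain roughly twice the write rate of the RAID6 set on the same four spindles.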
Obviously make sure you go for 64-bit editions of Windows and SQL Server, otherwise you'll not be able to use all that RAM without performance-draining hacks like PAE.

While we are on memory: I would suggest more if you can. RAM is not expensive these days, even good known-brand ECC-capable RAM, so going up from 8GB to however much you can fit on the board isn't a bad idea. For most applications in SQL Server (with big enough databases) you will notice the benefit under any appreciable load, as it will greatly reduce I/O for read queries because a larger working set can be held in memory. From your description it sounds like you have a data size and activity pattern that would benefit from however much RAM you can practically throw at it. Many new servers support 16GB of RAM these days, and 32GB is not uncommon (and increasingly neither is support for 64GB, though that is getting into the "specialist" market so you may have to pay a premium for the extra).
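If you do load the box up with RAM, it's also worth capping SQL Server's memory so Windows keeps some headroom. A minimal sketch (the 6656 MB value is just an example for an 8GB box; pick a figure that suits whatever RAM you actually install):

    -- Cap the buffer pool so the OS keeps some memory for itself.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 6656;  -- example value for an 8GB box
    RECONFIGURE;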
If your database is at all write intensive, use RAID 10 with really fast, really good SAS disks, not RAID 6.
Buy a bigger chassis than you need so you can grow into it. I recently added a quad core to my production database server, and I can tell you, for once in my career I was really glad I bought the dual-socket motherboard even though I only needed four cores to start. Having CPU utilization go from a 60% average with long spikes at 100% to a 30% average with only rare 100% spikes had a huge impact on our performance. Not having to completely migrate from one machine to another to get this was truly awesome; popping another chip into the extra socket took about 20 minutes. As for RAM, put in as much as the machine can take.
As a note, our production system uses SAS and our development system uses SATA. We can definitely feel and quantify the difference: queries that might need 1.5 seconds on our loaded production machine might take 3 seconds on our non-loaded development server. This is anecdotal, of course, and there are other differences between the machines, but we've noticed that IO is definitely the killer. 1.5 seconds is not a big deal, right? Except in production that's a 1.5-second difference * x users * y requests per hour.
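If you want to put numbers on the SAS vs. SATA difference rather than rely on feel, one option (just a suggestion, not something from our setup) is to compare per-file I/O latency on both boxes using sys.dm_io_virtual_file_stats, which is available from SQL Server 2005 on:

    -- Average read/write latency per database file since the instance started.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id;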
One more thought on IO: we were careful to set up our biggest, most often used tables on filegroups with multiple files in them. One of the performance benefits of this is that SQL Server will thread requests to each file in the filegroup. So if BigOverUsedTable is on FileGroup1, FileGroup1 has four files in it, and your DB server has 8 cores, it will actually use four cores to do "select big number crunching nasty query from BigOverUsedTable", whereas otherwise it will only use one CPU. We got this idea from this MSDN article:
http://msdn.microsoft.com/en-us/library/ms944351.aspx
From TFA:
"Filegroups use parallel threads to improve data access. When a table is accessed sequentially, the system creates a separate thread for each file in parallel. When the system performs a table scan for a table in a filegroup with four files, it uses four separate threads to read the data in parallel. In general, using multiple files on separate disks improves performance. Too many files in a filegroup can cause too many parallel threads and create bottlenecks."
We have four files in our filegroup on an 8-core machine due to this advice, and it's working out well.
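For reference, the setup looks roughly like this (the database name, file paths and sizes below are made up for illustration, not our actual values):

    -- Create a filegroup with four files and put the hot table on it.
    ALTER DATABASE MyAppDB ADD FILEGROUP FileGroup1;

    ALTER DATABASE MyAppDB ADD FILE
        (NAME = FG1_File1, FILENAME = 'E:\SQLData\FG1_File1.ndf', SIZE = 10GB),
        (NAME = FG1_File2, FILENAME = 'E:\SQLData\FG1_File2.ndf', SIZE = 10GB),
        (NAME = FG1_File3, FILENAME = 'E:\SQLData\FG1_File3.ndf', SIZE = 10GB),
        (NAME = FG1_File4, FILENAME = 'E:\SQLData\FG1_File4.ndf', SIZE = 10GB)
    TO FILEGROUP FileGroup1;

    -- Tables created ON FileGroup1 get spread across all four files.
    CREATE TABLE dbo.BigOverUsedTable
    (
        Id int IDENTITY(1,1) PRIMARY KEY,
        Payload varchar(200) NOT NULL
    ) ON FileGroup1;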
SAS is a huge advantage over SATA2 for random access operations; I would avoid SATA at all costs. You mention building in house; have you looked at rack systems like this one?
http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021
It would give you 20 SAS drive bays in a single rackmount unit. Add your motherboard and SAS controllers of choice.