The databases that I am interested in are: SQL Server 2008, MySQL, and PostgreSQL 9.0.
In general, I am wondering which of those would "scale up" the best. I have read that PostgreSQL used to scale up better than MySQL, but that the difference has shrunk with newer versions of MySQL.
In addition to general information, I am also looking for advice for my specific situation:
I have a 64-bit SQL Server 2008 R2 Developer Edition database with 20 years of stock data and 2 years of options data. The hardware is an Intel i7 Extreme with 6 cores, 12 GB of RAM, and 64-bit Windows 7.
The database is fairly large, and intense calculations, such as moving averages, can take as long as 7 minutes. Furthermore, there's considerable disk activity even after the query is complete. I'm doing all the calculations in stored procedures right now.
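To give a rough idea of what these calculations look like, here is a simplified version of the kind of query involved (the table and column names are just placeholders, not my actual schema):

```sql
-- Simplified 20-day trailing moving average on SQL Server 2008.
-- (2008 has no ROWS BETWEEN frame for window aggregates, so this uses
--  ROW_NUMBER plus a self-join; all names are placeholders.)
;WITH Ranked AS (
    SELECT Symbol, TradeDate, ClosePrice,
           ROW_NUMBER() OVER (PARTITION BY Symbol ORDER BY TradeDate) AS rn
    FROM   dbo.DailyPrice
)
SELECT   cur.Symbol,
         cur.TradeDate,
         AVG(prev.ClosePrice) AS MovAvg20
FROM     Ranked AS cur
JOIN     Ranked AS prev
         ON  prev.Symbol = cur.Symbol
         AND prev.rn BETWEEN cur.rn - 19 AND cur.rn
GROUP BY cur.Symbol, cur.TradeDate;
```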
My system is too slow and I am trying to improve its performance and efficiency. I am currently improving my data model and tweaking the software setup. Any suggestions?
Also, when should someone consider using MySQL Cluster? (Since I'm asking, I'm sure the answer is, "Not yours!")
Too little memory.
And, most important - like most people who don't really know much about databases - you talk a lot about cores and RAM (and Win 7 - get rid of it and install a Windows Server, please), but totally ignore the one thing that matters most for database performance: DISCS. How many discs do you run?
As an example, I run a futures database - and my SQL Server has a set of 6 Velociraptors JUST for the data, plus 2 more discs for tempdb and the logs. This is on a SAS infrastructure with a hardware RAID controller. And I am not sure I like the IO performance ;)
Get discs - or better, SSDs - to give yourself a powerful disc subsystem.
You're eventually going to hit a brick wall in terms of performance if you rely on stored procedures for large datasets. If you need faster response times, you probably want to look at offloading these calculations from the DBMS.
EDIT:
I made the incorrect assumption that you were talking about some sort of transactional system where it is difficult to make changes to the data model. I work at a big enterprise where doing that is very difficult. You can also "offload" calculations by doing them once, ahead of time.
Before you do anything, you should study the query plans very carefully and understand which queries are using the most resources and why. Think about what you're actually doing -- with the example of computing moving averages, consider that you're referencing historical data that doesn't change. If you need to plot the 52-week moving average of IBM from 1982-1992... why compute it on demand? Do it in advance! Storage capacity is generally cheap -- IOPS and CPU are generally expensive.
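For instance, a pre-computed summary table could look something like this (table and column names are only assumptions for the sketch, not anything from the question):

```sql
-- Compute the 52-week moving average once and persist it, so reports just
-- read the stored result instead of recomputing it on demand.
-- All object names here are invented for illustration.
IF OBJECT_ID('dbo.MovingAverage52w', 'U') IS NULL
    CREATE TABLE dbo.MovingAverage52w (
        Symbol     varchar(10)   NOT NULL,
        TradeDate  date          NOT NULL,
        MovAvg52w  decimal(18,4) NOT NULL,
        CONSTRAINT PK_MovingAverage52w PRIMARY KEY (Symbol, TradeDate)
    );

INSERT INTO dbo.MovingAverage52w (Symbol, TradeDate, MovAvg52w)
SELECT   p.Symbol,
         p.TradeDate,
         AVG(h.ClosePrice)
FROM     dbo.DailyPrice AS p
JOIN     dbo.DailyPrice AS h
         ON  h.Symbol    = p.Symbol
         AND h.TradeDate >  DATEADD(WEEK, -52, p.TradeDate)
         AND h.TradeDate <= p.TradeDate
WHERE    NOT EXISTS (SELECT 1
                     FROM  dbo.MovingAverage52w AS m
                     WHERE m.Symbol    = p.Symbol
                     AND   m.TradeDate = p.TradeDate)
GROUP BY p.Symbol, p.TradeDate;
```

Run it from a nightly job and the expensive aggregation happens once per day instead of once per chart.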
Another thing that I've seen happen is that folks over-normalize (because that is taught as the "right thing to do"). Sometimes, especially for static data, de-normalizing improves performance by eliminating joins.
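A trivial sketch of the idea, with invented names:

```sql
-- Normalized: every sector-level query pays for the join to the lookup table.
SELECT s.Sector, AVG(p.ClosePrice) AS AvgClose
FROM   dbo.DailyPrice AS p
JOIN   dbo.Security   AS s ON s.Symbol = p.Symbol
GROUP BY s.Sector;

-- De-normalized: Sector is copied onto the price table (safe enough here,
-- since the historical data never changes), and the join goes away.
SELECT Sector, AVG(ClosePrice) AS AvgClose
FROM   dbo.DailyPrice
GROUP BY Sector;
```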