What sort of computer specs (CPU, RAM, disk speed) should I use for running queries on a database of 200+ million records? The queries are for a research project, so there is only one "user" and only one query will be running at a time.
I tried it on my own laptop with SQL Server with an i3 processor, 2GB RAM, 5400 RPM disk and a simple query didn't finish even after 8+ hours. I have an option to connect a SSD via eSata and upgrade to 4GB RAM, but not sure if this will be enough...
Thanks!
Edit:
The database is about 25 GB and the indexes are not setup properly. When I tried to add an index, I let it run for about 8 hours and it still hadn't finished so I gave up. Should I have more patience :)?
In general, the queries will run once in a while and its ok even if it takes a couple hours to complete....
Also, the queries will produce probably about 10 million records which I need to process using Stata/Matlab and I'm concerned that my current laptop is not strong enough, but unsure of the bottleneck....
You didn't complain about how long it took to fill the DB with 200 million records, so I conclude your lapttop is fast enough.
Databases work because they store the data records with certain "key" fields indexed to provide fast access. (Otherwise every query would have to read every record every time, and 25Gb takes a long time to read).
If your query is supposed to produce a modest number of results (say, 10K records), I'd guess your problem is that your records aren't indexed by the principal element of the query.
Speccing hardware for a huge database is something that's incredibly difficult to get right. We've got several large databases (9M+ records in some tables) and one massive database (300M+ records in most tables), and even for us it's been a bit of a hit and miss affair.
But just as important as hardware specs is your database schema, and as Ira mentioned, Indexes are king in this scenario.
It will take much longer than 8 hours to create the index, and the table has to be completely unlocked, as creating an index locks the entire table. An index taking a long time to create is often a sign of a locked table, so make sure that nothing else is using the database when you try to do it.
What will make things harder for you in this scenario is the speed of your disk. A single 5400 RPM hard drive (I've seen laptops with even slower spindle speeds, 5400 is pretty good for a laptop) will take a LOT longer to load the data into cache for indexing than a 15,000 RPM drive. As a comparison, our SQL server has 14x 15,000 RPM disks (for a grand total of 210,000 RPM, or a whopping 38x faster spindle speed, and I would guess maybe 50x faster access speed in real life) and I would expect creating an index on 200M records would still take over an hour.
So, get that SSD if you can. It will help dramatically. One SSD's performance gain over a 5400 RPM drive is almost too great to measure.
If you can, get that 4Gb of RAM. Not only will it help your SQL server speed (not as much as SSD, but it will still help), but from my (brief) experience with Matlab, it will help there too.