I have already selected httperf as the load testing tool to use. I'm trying to figure out some realistic parameters to use, and if I need to use multiple machines to create more simultaneous connections, etc. I've done some basic load testing before but nothing very sophisticated.
The only information I have about the estimated load is that I'd like to be able to handle 3 million hits in a day. Part of my problem is that I don't have a good rule of thumb for how web traffic is "bursty". Clearly, it depends on the specific site and is never the same twice. But, maybe there is a rule of thumb that says, if your average load for a day is X requests per second, then you should plan for Y simultaneous connections and a peak rate of Z requests per second.
I've done a fair bit of searching around, and while I've found a number of explanations of the various load testing tools and their parameters, I've never seen a decent write up of how you can come up with realistic values to use for the parameters.
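As a starting point, you can at least anchor the parameters to the average rate implied by 3 million hits a day. A rough sketch (the hostname, URI, and connection counts below are placeholders I made up, not values from any real setup):

```shell
#!/bin/sh
# Back-of-envelope baseline for a 3M-hits/day target.
HITS_PER_DAY=3000000
AVG_RATE=$((HITS_PER_DAY / 86400))   # averaged over 24h: ~34 req/s

echo "average rate: ${AVG_RATE} req/s"

# One plausible httperf invocation at that average rate.
# --rate is new connections per second; --num-conns bounds the run length.
echo "httperf --server www.example.com --port 80 --uri /index.html \
  --rate ${AVG_RATE} --num-conns 20000 --timeout 5"
```

The average is only a floor, of course; the whole question is how far above it the peaks go.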
Although this might not be the answer you are looking for, I have found JMeter to be an excellent cross-platform tool for various performance tests.
You can read more on JMeter from the Apache site @ http://jakarta.apache.org/jmeter/
As it differs for every site and situation, here's my experience:
For a local audience, unless your site is aimed at kids or the elderly, you'll find traffic is distributed roughly along (office drone) working hours: 8/24ths of the day will carry nearly all your traffic. Peak is roughly 2 times the normal traffic in that period, usually around lunchtime.
For a global audience, the traffic is distributed around the clock, with some drops (the Pacific Ocean is sparsely populated).
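Applying the local-audience rule of thumb above to the 3 million hits/day target gives concrete numbers. The 8-hour window and the 2x peak factor come from the observation above; the 2-second response time used for the concurrency estimate is purely an assumption, so substitute your own measurement:

```shell
#!/bin/sh
# Peak-rate estimate: all daily hits land in an 8-hour window,
# and peak is ~2x the within-window average.
HITS_PER_DAY=3000000
WINDOW_SECONDS=$((8 * 3600))                   # the 8/24ths carrying traffic
BUSY_RATE=$((HITS_PER_DAY / WINDOW_SECONDS))   # ~104 req/s in the window
PEAK_RATE=$((BUSY_RATE * 2))                   # ~208 req/s at lunchtime

# By Little's law, concurrency ~= arrival rate * response time.
# 2 seconds per request is an assumed figure, not a measured one.
RESP_TIME=2
CONCURRENT=$((PEAK_RATE * RESP_TIME))          # ~416 simultaneous connections

echo "busy-window rate: ${BUSY_RATE} req/s"
echo "peak rate: ${PEAK_RATE} req/s"
echo "estimated concurrency: ${CONCURRENT} connections"
```

So under those assumptions, your X/Y/Z would be roughly 35 req/s average, ~400 simultaneous connections, and a ~200 req/s peak to plan for.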
If you are linked from a major news site, or run a big TV/radio ad, you can expect traffic that's off the charts. There's no point in making predictions for that case.