I am setting up an nginx/php-fpm-only server that will just redirect requests in a very simple way.
I am planning to do it like this:
- the request should be as simple as, for example, http://server-url.com?redirection-url.com,
- save the URL in a log file,
- log it with a real-time analytics tool (piwik.org),
- and at the end proceed with the redirection via php-fpm.
This will be a very fast server used only for this purpose.
The question is: is there a way to calculate the right server size if this server receives, let's say, from 5k to 100k visits a day?
Would a Linux Bootstrap JeOS VPS be a solution to this problem?
Thanks.
As pQd mentions, the distribution of the visits will matter more than the total amount. If you expect a Slashdot Effect, you'll need to plan for the moment of peak requests. However, if the visits are spread out over the day, the numbers you mention should be no problem for something as simple as providing a redirect.
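To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The visit counts come from the question; the peak scenario (half of the day's visits arriving within ten minutes) is purely an assumption for illustration, and the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(visits_per_day: int) -> float:
    """Requests per second if visits were spread evenly over the day."""
    return visits_per_day / SECONDS_PER_DAY


def peak_rps(visits_per_day: int, peak_share: float, peak_window_s: int) -> float:
    """Requests per second if `peak_share` of the daily visits
    arrive within a window of `peak_window_s` seconds."""
    return visits_per_day * peak_share / peak_window_s


if __name__ == "__main__":
    for visits in (5_000, 100_000):
        avg = average_rps(visits)
        # Assumed spike: 50% of the daily traffic within 10 minutes.
        peak = peak_rps(visits, peak_share=0.5, peak_window_s=600)
        print(f"{visits} visits/day: avg {avg:.2f} req/s, assumed peak {peak:.2f} req/s")
```

Even at 100k visits a day the evenly spread average is only a little over one request per second, so it is the assumed spike, not the daily total, that decides the sizing.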
Which Linux distribution you choose doesn't really matter, but the software does. Traffic is hard to predict, so you'll need to benchmark whether your solution does what you need, and benchmarking realistically is hard in itself.
PHP-FPM is definitely a better choice than a regular process-based PHP server, but you don't need PHP at all in the case you describe. After all, you don't want to return any content, and your request is not really dynamic. It's just some HTTP headers for the redirect, depending on the incoming URL. Just install a Varnish server and have it process the incoming URL and return the redirect header. Varnish can write standard NCSA logfiles (via varnishncsa), which you can then process with your analytics tool. Varnish is extremely fast and can handle thousands of simultaneous requests while using very little CPU and memory. A simple VPS will suffice.
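To show how little logic the redirect itself needs, here is a minimal sketch in Python (not Varnish VCL) of turning a request like the one in the question into response headers. The function name, the choice of a 302 status, the 400 for a missing target, and defaulting to `http://` when no scheme is given are all assumptions for illustration.

```python
from urllib.parse import unquote


def build_redirect(request_target: str):
    """Return (status, headers) for a request target such as '/?redirection-url.com'."""
    # Everything after the first '?' is treated as the destination.
    _, sep, query = request_target.partition("?")
    target = unquote(query).strip()
    if not sep or not target:
        # No destination given: nothing to redirect to.
        return 400, {}
    # Assumption: bare host names are sent to plain HTTP.
    if not target.startswith(("http://", "https://")):
        target = "http://" + target
    # A redirect is only a status line plus headers; there is no body.
    return 302, {"Location": target, "Content-Length": "0"}


if __name__ == "__main__":
    print(build_redirect("/?redirection-url.com"))
```

Whatever serves the redirect (Varnish, Nginx, or PHP) only has to do the equivalent of this one function per request.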
In fact, your requirement for real-time analytics is the tricky one. As far as I know, the Piwik.org analytics tool works like Google Analytics and requires a JavaScript code-snippet to log the request. Where are you going to trigger that code? A redirect consists only of HTTP headers, so there's no room for executing JavaScript in the scenario you describe. If your logs don't have to be real-time, and Piwik supports reading Apache logfiles, you could process the logs every hour or so and I would go with the Varnish solution I mentioned above.
If you expect spikes of traffic, the I/O load of writing the logfiles may become a bottleneck. In that case you could log to memory, and process those logs into results later when it's less busy or even on another machine. Have a look at Redis for storing the log. It's a very fast key-value store that can handle a high write-speed. You could write a Varnish module that logs to Redis, or if you're more comfortable with Nginx I'm sure that could be made to work as well.
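The "log to memory, process later" idea can be sketched as follows. This is only an illustration: a plain in-process `deque` stands in for Redis (where a list with RPUSH and LPOP would play the same role), and the class and function names are hypothetical.

```python
import time
from collections import deque


class BufferedRequestLog:
    """Keeps log entries in memory so the request path never waits on disk."""

    def __init__(self):
        self._buffer = deque()

    def record(self, target, timestamp=None):
        # Cheap append during the request; no file I/O here.
        ts = time.time() if timestamp is None else timestamp
        self._buffer.append((ts, target))

    def drain(self, max_items=1000):
        # Called later (e.g. from a periodic job) to take a batch for processing.
        batch = []
        while self._buffer and len(batch) < max_items:
            batch.append(self._buffer.popleft())
        return batch


def count_targets(batch):
    """Aggregate a drained batch into hits per redirect target."""
    counts = {}
    for _, target in batch:
        counts[target] = counts.get(target, 0) + 1
    return counts


if __name__ == "__main__":
    log = BufferedRequestLog()
    for url in ("a.example", "b.example", "a.example"):
        log.record(url)
    print(count_targets(log.drain()))
```

An in-process buffer like this is lost if the process dies, which is one reason to use an external store such as Redis instead; the sketch only shows the split between the fast write path and the deferred processing.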
As you see, it pretty much depends on your requirements and the expected traffic.
It's quite difficult to answer, especially without knowing the distribution within a day. But a VPS plus monitoring should give you a fairly good way to know when to scale, and the ability to scale (provided the host doesn't oversubscribe too much).