I'm building a URL shortener web application and I would like to know the best architecture for it, so I can provide a fast and reliable service.
I would like to have the services separated on different machines:
- The first machine will have the application itself, served by Apache, nginx, whatever..
- The second one will contain the database.
- The third one will be responsible for handling the short URL requests.
UPDATE:
The service is not a URL shortener at all; it was just easier to explain it like that.
I just need one machine that receives one HTTP request and inserts a record into a database, and I need this machine to do this simple task very efficiently. The system will run on Linux (I don't know the distro yet) and I'm totally open to any language or technology. I was thinking of using Yaws, Tornado or Snap for that service, but I don't know yet, and it's time to plan the architecture for that part. The database will be built on Hadoop.
For the third machine I just need to accept one kind of HTTP request (GET www.domain.com/shorturl), but it has to do it really fast and it should be stable enough.
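The job described above can be sketched in a few lines. This is only an illustration, assuming Python's stdlib `http.server` and an in-memory sqlite3 table as stand-ins; the real system would presumably use Tornado (or Yaws/Snap) in front of the Hadoop-backed store, and the table/column names here are made up:

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in datastore; the real one would be the Hadoop-backed database.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE hits (shorturl TEXT)")

class HitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /abc123 -> record "abc123" and return an empty response
        shorturl = self.path.lstrip("/")
        db.execute("INSERT INTO hits (shorturl) VALUES (?)", (shorturl,))
        db.commit()
        self.send_response(204)  # recorded; nothing to send back
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the hot path quiet

def serve(port=8000):
    # Blocks forever; this is the whole job of the third machine.
    HTTPServer(("", port), HitHandler).serve_forever()
```

The point is how little this service has to do: one route, one INSERT, an empty reply.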
Do you really think there is need for yet another URL shortener? There are just so many of them around... unless you've by chance managed to acquire a very short and appropriate domain name, I just don't think your site is going to be noticed by anyone. Just my two cents, of course.
Anyway, to the technical part:
It's difficult to answer your question without knowing anything about the expected load.
The only answer that can make any sense here is "avoid Java like the plague". A Java application server is overkill for many applications, and it would certainly be overkill for such a simple one.
I'd go for Linux/Apache/MySQL/PHP here... if I could think of any good reason to even start the project, of course.
Edit:
Ok, now it makes a little more sense; but the suggestion to start as simple as possible and then worry about scaling up is still valid. If your application really is that simple, any decent web server/language/database combination should be able to process lots of requests per second on modern hardware (but I still strongly suggest avoiding Java).
If performance is paramount, I'd go with a CGI application written in C; that will be the fastest possible solution, orders of magnitude faster than any interpreted or VM language, and having it do simple INSERTs and SELECTs against a database shouldn't be difficult. But I think LAMP is more than enough for your needs... they actually run Facebook on it, you know?
Are these just recording data, or do they also send back something of interest? If they're just logging, then just use apache and fling the apache logs into hadoop. If they have to return some sort of data, then it's not at all clear to me how they get the data that they're returning.
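If logging is really all they do, Apache's access log already captures everything, and a batch job can parse it before loading into Hadoop. A sketch of parsing the Common Log Format (the group names are my own choice, not anything Apache mandates):

```python
import re

# One line of Apache's Common Log Format:
# host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = CLF.match(line)
    return m.groupdict() if m else None
```

That way the web tier stays dumb and fast, and all the heavy lifting happens offline.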
Still, apache set up to just return a static file for any request is pretty damned fast.
First, I know you said it's not a URL shortener, but if it's anything similar, an RDBMS is a terrible way to store this data; since there's no real relationship between any two pieces of data, you want a flat storage engine. Consider Mongo (or Couch, depending on your actual solution space).
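To see why flat storage fits, note that the whole workload is "short code in, record out". Python's stdlib `dbm` shows the flavor (the path and values here are made-up examples; Mongo or Couch would play the same role at scale):

```python
import dbm
import os
import tempfile

# A flat key/value store: no schema, no joins, just a lookup.
path = os.path.join(tempfile.mkdtemp(), "shorturls")
store = dbm.open(path, "c")               # "c": create if missing
store[b"abc123"] = b"hit-count=1"         # key -> opaque record
store.close()
```

Every operation the service needs is a single get or put by key, which is exactly what these engines optimize for.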
As for your solution, beware of premature optimization. There are a lot of ways to go crazy with this; since you asked, the craziest that I can think of offhand might be to fire up Varnish, write all your pages in the VCL, and have it connect to memcache on the backend to store and retrieve the corresponding data. But realistically, that's batshit crazy unless you're under patently absurd loads.