My friend runs a popular YouTube-to-GIF conversion site. Right now he has converted 250,000 YouTube videos to GIFs (each video gets 6 thumbnails, so 1.5 million GIF files in total) and serves about 80 TB of bandwidth per month.
His server is I/O bound -- I'm not a guru admin, but it seems to be the hard-drive seek time on non-sequential GIF reads that's clogging everything up. He has a server with 100tb.com for $300/mo, which comes with 100 TB of free bandwidth. At first I advised him to get a CDN, because then the GIFs would be served without consuming his server's resources and his main box could just handle the encoding. We found one CDN for $600/mo that was too slow and unreliable, and the rest wanted at least $2,000/mo for 80 TB of bandwidth. We're trying to keep the whole project under $900/mo right now.
So the cheapest bandwidth we can find is with 100tb.com, but we're outgrowing a single server. We could add another server, but I don't really know how to partition the GIF storage so that the load is distributed evenly between the two boxes. Our host recommended software like Aflexi.net, but I'm sure there must be a cheaper solution.
Can anyone help? I'm a programmer by trade, not a sysadmin, but trying to learn the ropes. Thanks!
S3 is no alternative; the bandwidth bill for 80 TB alone would be over $8k per month.
It looks like you serve the GIFs straight off the filesystem. Why not split the GIFs across the 2 machines, using a hash algorithm that maps each filename to one of them, and serve every file from the machine it hashes to? This scales easily to more machines as long as your load balancer holds up…
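A minimal sketch of that mapping in PHP, assuming the page code builds the image URLs itself (the host names and paths here are placeholders):

    <?php
    // Illustrative only: deterministically map a GIF filename to one of N hosts,
    // so every file lives on -- and is always requested from -- exactly one box.
    $hosts = array('gif1.example.com', 'gif2.example.com');

    function hostForFile($filename, $hosts) {
        // First 6 hex chars of md5 -> small integer bucket; stable across requests.
        $bucket = hexdec(substr(md5($filename), 0, 6)) % count($hosts);
        return $hosts[$bucket];
    }

    // At page-render time, point the <img> tag at whichever box owns the file:
    $file = 'dQw4w9WgXcQ-3.gif';
    $url  = 'http://' . hostForFile($file, $hosts) . '/gifs/' . $file;

The catch with a plain modulo is that adding a third box remaps most filenames; consistent hashing avoids that, but with only two or three servers a one-off reshuffle script is usually simpler.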
Dump the files to S3 and serve them from there. The poor man's CDN :)
If you need more processing power, you can do the conversions on EC2 instances and dump straight to your "CDN" as well.
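For what it's worth, the upload side is only a couple of calls with the AWS SDK for PHP. A rough sketch -- the bucket name, region, and paths are made up, and the client setup varies by SDK version:

    <?php
    // Rough sketch: push a finished GIF to S3 with a public-read ACL so it can
    // be served straight from the bucket. Bucket, region and paths are placeholders.
    require 'vendor/autoload.php';

    use Aws\S3\S3Client;

    $s3 = new S3Client(array(
        'version' => 'latest',
        'region'  => 'us-east-1',
    ));

    $s3->putObject(array(
        'Bucket'      => 'my-gif-bucket',
        'Key'         => 'gifs/dQw4w9WgXcQ-3.gif',
        'SourceFile'  => '/var/gifs/dQw4w9WgXcQ-3.gif',
        'ACL'         => 'public-read',
        'ContentType' => 'image/gif',
    ));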
I can't add much to the other comments, but they sound good. I would look at lifting some of the load off the file servers by keeping your most commonly accessed (i.e. most popular) files in a memory cache, i.e. an HTTP handler along the lines of the sketch below:
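A rough sketch of that handler, using Memcached as the in-memory store (APC or anything similar would work too); the paths, query parameter, and TTL are made up, and keep in mind Memcached's default 1 MB per-item limit, which larger GIFs will exceed:

    <?php
    // Sketch of the handler idea: serve popular GIFs from RAM, fall back to disk.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $file = basename($_GET['f']);   // e.g. gif.php?f=abc123-2.gif; basename() strips path components
    $key  = 'gif:' . $file;

    $data = $mc->get($key);
    if ($data === false) {
        // Cache miss: read from disk and keep it hot for an hour.
        $path = '/var/gifs/' . $file;
        if (!is_file($path)) {
            header('HTTP/1.1 404 Not Found');
            exit;
        }
        $data = file_get_contents($path);
        $mc->set($key, $data, 3600);
    }

    header('Content-Type: image/gif');
    header('Content-Length: ' . strlen($data));
    echo $data;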
If you can get a machine with a crap-load of RAM, you're laughing, as it's quite likely you'll be able to fit a large percentage of your popular files in memory.
And when you saturate that, add another image-handler server and round-robin between them. Keep doing this until something breaks -- throughput, scalability, or economics.
I've done something like this before to good effect.
If it's just 2 machines, you could consider using DRBD to keep the files in sync between them. Then just use PHP to decide, randomly or algorithmically, which server to pull from on each request. Simple but workable.
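A minimal sketch of the PHP side, assuming the GIF directory ends up identical and readable on both boxes so either one can serve any file (host names are placeholders):

    <?php
    // Sketch: with the same GIFs available on both boxes, pick a host per
    // request when building the image URL. Host names are placeholders.
    $hosts = array('gif1.example.com', 'gif2.example.com');

    // A random pick spreads requests roughly evenly across the two servers.
    $host = $hosts[array_rand($hosts)];

    $file = 'dQw4w9WgXcQ-3.gif';
    $url  = 'http://' . $host . '/gifs/' . $file;

The "algorithmic" variant would be the hash mapping from the earlier answer, which has the nice side effect that each file always hits the same box's page cache.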