I'm looking to split our image serving functionality off from our main server. We have nearly half a million images - any of which could be accessed at any time. I've been investigating using a W2k8 cluster connected to an iSCSI SAN and was wondering if there is a better way to provide redundant serving of a large number of images. It has been suggested that losing the SAN and having a copy of the pictures on each file server would be a better solution - is this the case? The OS must be Windows based.
Thanks,
Andrew
Might I suggest going over to Amazon S3 for image hosting? Depending on your bandwidth, the storage and hosting is cheap and I would think much more reliable and cost efficient than having your own redundant SAN and distribution system. There are many success stories.
I know you say the OS must be Windows based, but I'm not sure of your exact requirements; S3 can be accessed from Windows, so it may still fit.
You might find that a distributed file store is a better method for storing a large number of files redundantly. These usually involve a special API for storing and accessing the files, rather than using standard file operations. The storage system is then responsible for storing your files redundantly. The classic example is Amazon S3, but that probably wouldn't be the ideal solution for you. There are a number of products that can be used. I've not used any Windows products, but you might look at Facebook's Haystack, which may be written in Java, so could potentially run on Windows. LiveJournal's MogileFS is another example, but that is written in Perl and, last time I looked, had a single point of failure. I'm sure you can easily find several more similar products.
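To make the "special API instead of file operations" idea concrete, here is a toy sketch in Python. It is not how S3, Haystack or MogileFS work internally; the class name `ReplicatedStore`, its `put`/`get` methods and the `demo` helper are all made up for illustration, and local directories stand in for separate storage nodes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class ReplicatedStore:
    """Hypothetical key/value file store that writes every object to all replicas."""

    def __init__(self, replica_dirs):
        # Each directory stands in for a separate storage node.
        self.replicas = [Path(d) for d in replica_dirs]
        for replica in self.replicas:
            replica.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _name(key):
        # Hash the key so any string maps to a flat, filesystem-safe name.
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def put(self, key, data):
        # The store, not the caller, takes care of redundancy.
        for replica in self.replicas:
            (replica / self._name(key)).write_bytes(data)

    def get(self, key):
        # Read from the first replica that still has the object.
        for replica in self.replicas:
            path = replica / self._name(key)
            if path.exists():
                return path.read_bytes()
        raise KeyError(key)


def demo():
    """Store one object, lose a whole replica, and read the object back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = ReplicatedStore([Path(tmp) / "node_a", Path(tmp) / "node_b"])
        store.put("photos/cat.jpg", b"fake image bytes")
        shutil.rmtree(Path(tmp) / "node_a")  # simulate a failed node
        return store.get("photos/cat.jpg")
```

Calling `demo()` returns the stored bytes even though one replica has been deleted. A real product would also handle re-replication, consistency and network access, none of which is attempted here.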
It depends a lot on what your goals are.
A failover cluster will make the images highly available but will not necessarily increase performance.
Using DFS Replication, you can make your images both highly available and geographically load balanced, which will provide MUCH improved performance in a global company. (This is how Microsoft handles their software share.) However, if the images change a lot, DFS Replication can lag behind a bit.
You could also use multiple clusters that are load balanced through an NLB cluster.
And then you could go as far as to use a combination of Failover Cluster, NLB, AND DFS!!!
It really all depends on what your goals are.
In our case, we share our files directly from the SAN. Our NetApp storage array is a 3020 cluster that also acts as a CIFS file server. Files live directly on the SAN and are shared to clients. Not sure if your project is in the market for a solution like that, but a SAN with CIFS sharing has been a great advantage to us.
A SAN means you get more storage for a given number of disks.
E.g. if you have 2 1G disks mirrored in each of 10 hosts, and every host holds a full copy of the images, you can store roughly 1G of images.
OTOH if you have the same 20 1G disks in a mirrored SAN you can store roughly 10G of images.
Even better though - you can put those 20 disks in two SAN enclosures mirrored over 2 sites, e.g. 10 disks in site A and 10 disks in site B, with A and B as mirrors. You still get your 10G of storage but increase your data's resilience.
I.e. site A can go down and you can still be serving data from site B. (Actually you'd probably have 9G, i.e. 9 active disks and a hot spare disk in each SAN enclosure.)
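The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `usable_capacity_gb` and its parameters are made up for illustration, and it ignores filesystem and RAID overhead.

```python
def usable_capacity_gb(total_disks, disk_size_gb, copies, hot_spares=0):
    """Return usable capacity in GB when each piece of data is kept `copies` times.

    Hot spares sit idle until a disk fails, so they are excluded first.
    """
    active_disks = total_disks - hot_spares
    return active_disks * disk_size_gb / copies


# 10 hosts x 2 mirrored 1G disks, every host holding the full image set:
# all 20 disks end up storing the same data.
per_host_copies = usable_capacity_gb(total_disks=20, disk_size_gb=1, copies=20)

# The same 20 disks in a mirrored SAN: each block is stored twice.
mirrored_san = usable_capacity_gb(total_disks=20, disk_size_gb=1, copies=2)

# Two mirrored enclosures with one hot spare each: 18 active disks.
mirrored_with_spares = usable_capacity_gb(20, 1, copies=2, hot_spares=2)

print(per_host_copies, mirrored_san, mirrored_with_spares)  # 1.0 10.0 9.0
```

The gap between the first and second figures is the whole argument: per-host copies spend every disk on the same data, while the mirrored SAN only spends half.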
If you also spread your servers out, 5 at each site, you also increase your entire site's resilience. If site A goes down, you'll lose 5 of your servers but the other 5 will still be serving and you'll still have access to all of your data.
You also remove the need to sync the data on the 10 servers with a definitive source of images. Depending on how you plan to back up, backups may be easier with a SAN too, as you'll only need to do one data backup.
The only reason I can see for losing the SAN is if you can't afford it and/or your data requirements are small and you don't anticipate they'll grow a great deal.
The more data you have, the greater your savings should be with the SAN approach, as SAN enclosures get cheaper per G the more trays/disks you hang from them.
If you go iSCSI, make sure you have a dedicated VLAN and ideally dedicated switches for your iSCSI network.