I currently have a client that will be adding replicated data from satellite locations at roughly 80TB per year, so in year 2 we will have 160TB, and so on year after year. I want to do some sort of RAID 10 or RAID 6 setup, and I want to keep the servers to approximately 4U high and rack mounted. All suggestions on a replication strategy are welcome. We want to keep one instance of the data in house and the other co-located (any suggestions on co-location sites too?).
The obvious hardware will be something like a rack-mount server with hot-swap trays and dual Xeon processors. The data is archival, and it is made up of small files.
I can expand on this question if it is too vague. Thanks for looking.
Sun Fire X4540 - 96TB in 4u.
Buy one box* each year; don't buy before you need it. (*The full storage system should have some level of redundancy, typically one box for work in progress and one box for each archive set.)
ZFS with deduplication on RAIDZ3. Deduplication could reduce your data storage needs by a large factor depending on data patterns. Add compression if data type and usage permits.
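To put rough numbers on the redundancy overhead, here is a minimal Python sketch comparing usable capacity under RAID 10, RAID 6 and RAIDZ3. The 48 x 2TB drive count (one way to reach 96TB raw) and the 12-drive parity groups are assumptions for illustration, not vendor figures, and the sketch ignores hot spares, filesystem overhead and any deduplication or compression gains.

```python
# Parity drives consumed per group for each parity-based layout.
PARITY_DRIVES = {"raid6": 2, "raidz3": 3}


def usable_tb(drives, drive_tb, layout, group_size=None):
    """Approximate usable capacity in TB for a given layout.

    drives     -- total number of data-bearing drives in the chassis
    drive_tb   -- capacity of one drive in TB
    layout     -- "raid10", "raid6" or "raidz3"
    group_size -- drives per parity group (defaults to one big group)
    """
    if layout == "raid10":
        # Mirrored pairs: half of the drives hold copies.
        return (drives // 2) * drive_tb
    parity = PARITY_DRIVES[layout]
    group_size = group_size or drives
    groups = drives // group_size
    # Each group loses `parity` drives' worth of space.
    return groups * (group_size - parity) * drive_tb


if __name__ == "__main__":
    raw = 48 * 2.0  # assumed: 48 drives of 2TB each
    for layout, group in [("raid10", None), ("raid6", 12), ("raidz3", 12)]:
        usable = usable_tb(48, 2.0, layout, group)
        print(f"{layout}: {usable:.0f}TB usable of {raw:.0f}TB raw")
```

Under these assumptions RAID 10 keeps half of the raw space, while the parity layouts keep noticeably more of it.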
HP LeftHand storage arrays will do all of what you want: RAID 10, or 6, or 60 (probably the best choice given the data density you'll need). They do off-site replication and data deduplication, and you can have your choice of iSCSI or FC connections. The only drawback is the price; be prepared to pay for all that storage and those features.
If you're willing to roll your own solution, you might consider a Chenbro case. A 4U can hold 48 SFF drives (with 500GB drives that's 20+ TB per 4U), and they have SAS expander boards that allow a chassis with no motherboard (just full of drives) to be used as an external case.
For most situations I highly recommend going with a "professional" solution as it will be supported and more likely to continue to be supported 4+ years from now.
We are using CORAID shelves for some of our stuff. The latest shelf we are setting up is a 24-port unit filled with 2TB drives: http://www.coraid.com/PRODUCTS/SR-Series/SR2421-EtherDrive-Storage-Appliance_2 . We have 4 shelves so far. Each takes up 4U, is certified with VMware, and has Linux and Windows drivers available (I currently use both).
The cost/GB is pretty low. I deployed the first shelf in 2006 and have never had any problems with CORAID equipment.
Considering the data growth it's a good idea to think about a storage device that can grow with your needs. (like an XIV/DS????/StorageTek/StorageWorks etc.).
Also is it possible to apply deduplication before or even after replication (in the event it makes sense, of course)?
You could consider an HSM implementation with the final layer of storage being tape, not necessarily disk.
I don't know if 4U servers are the correct approach in this case. What will the capacity per server be, and how many 4U servers do you think you will need in, let's say, 6 years (for almost half a petabyte of data)?
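A quick way to answer the capacity question is to project the growth. The sketch below assumes a flat 80TB per year (from the original question), two full copies (in house and co-located), and a hypothetical 72TB usable per 4U chassis; swap in the real figure for whatever hardware is chosen.

```python
import math


def chassis_needed(years, growth_tb_per_year=80, usable_tb_per_chassis=72, copies=2):
    """Return one row per year: (year, total_tb, chassis_per_site, chassis_total).

    usable_tb_per_chassis is a hypothetical figure, not a vendor spec.
    """
    rows = []
    for year in range(1, years + 1):
        total_tb = growth_tb_per_year * year
        # Round up: a partly filled chassis still has to be bought.
        per_site = math.ceil(total_tb / usable_tb_per_chassis)
        rows.append((year, total_tb, per_site, per_site * copies))
    return rows


if __name__ == "__main__":
    print("year  total_TB  4U_per_site  4U_total")
    for year, total_tb, per_site, total in chassis_needed(6):
        print(f"{year:>4}  {total_tb:>8}  {per_site:>11}  {total:>8}")
```

The point of running it year by year is that the rack space and power at both sites grow along with the data, which is worth settling with the co-location provider up front.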
Are you willing to build your own? Or are you looking at a vendor solution? Do you need to access the data as a single volume, or, will your system handle figuring out which server your data is on? Are you concerned with having at least two copies of your data on disparate systems or are you handling backups in addition? Does this need to act as a SAN, or is data archived and able to deal with slightly slower access times?
That said, if you are looking at a vendor provided solution, their salespeople are paid very well to identify what you need and provide a quotation. If you are looking to build your own, consider ZFS.
The densest 4U solution I've used is http://www.supermicro.com/products/chassis/4U/?chs=847
If you were to build your own, similar to a LeftHand product or with GlusterFS, you could build a cluster with redundancy on top of a number of these nodes.
I'd seriously consider using HP's MDS 600 disk enclosures - you can get 140TB in 5U, they're very fast, reliable, can serve multiple servers or have multiple boxes connected to a single server and can easily be RAID 10'd.
This is a fairly large and tricky subject, but since I'm not here for points, I'll give it a try.
You will have to decide on an upgrade cycle. Let's say you want to set this up and run it on the same hardware for 5 years: your client is going to pay a lot for unused space. I say unused because the headroom for the next 5 years will sit idle and won't be cheap, especially if you choose a SAN. That option is tempting (i.e. buying a large SAN with only 1/5 of the disks in use), but in 3 years you may still be able to buy the same hardware while having less chance of being able to use the newer and usually bigger disks (let's say 6TB).
If on the other hand you plan for upgrades every 18 to 30 months, the client may save a lot. Hardware changes fast, and SAN and storage technology move fast too. In the last year, most vendors have started offering SAN deduplication, which for example turns a 512kb block into a 128kb encrypted hash. If this storage solution is for disaster recovery, then depending on the budget, tape still does a good job. Tapes are slower to restore, but they still work and do not cost much. If the data needs to be accessed, either daily or from time to time, tape is probably not a great idea.
I suggest you call some software vendors like Commvault and Symantec, and also Dell(EMC), HP, ... for the hardware solution that they would suggest. Make sure it is clear with your client what kind of restore time they need and are willing to pay for; restoring that much data means a couple of days of downtime. Hope this helps.
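For a back-of-the-envelope feel for restore time, a sketch like the following helps when talking to the client. The sustained throughput figures are assumptions (real restores of many small files are often slower), and it uses decimal units (1TB = 1,000,000MB).

```python
def restore_days(data_tb, throughput_mb_s):
    """Days needed to restore data_tb terabytes at a sustained rate in MB/s."""
    seconds = data_tb * 1_000_000 / throughput_mb_s  # TB -> MB, then MB / (MB/s)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Assumed sustained rates in MB/s; replace with measured numbers.
    for rate in (100, 500, 1000):
        for size_tb in (80, 160):
            days = restore_days(size_tb, rate)
            print(f"{size_tb}TB at {rate}MB/s: {days:.1f} days")
```

Even at an optimistic sustained rate, a full restore of one year's data lands in the range of days rather than hours, which is why the restore-time requirement needs to be agreed before choosing hardware.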
The ULTRASTOR™ RS16 JS is an enterprise-class JBOD storage system designed to provide storage expansion to the award-winning ULTRASTOR™ RS family. Using point-to-point host/disk connectivity and high-performance SAS disks, the ULTRASTOR™ RS16 JS offers an enterprise-grade fault-tolerant solution and improved performance for applications that demand nothing short of superior reliability. Designed to accept both high-reliability SAS disks and high-capacity SATA disks, the new ULTRASTOR™ RS16 JS is becoming the new standard for enterprise storage expansion.
UltraStor™ Versatility: Featuring two SAS channels and one SAS expandable port, the RS16 JS provides expandability critical to a company's IT infrastructure. ULTRASTOR™ RS16 JS is ideal for data-intensive applications, and is powered by dual cutting-edge SAS host connections that maximize disk performance, allowing for a broader range of storage possibilities. The RS16 JS works best when connected to an advanced ULTRASTOR™ RS series RAID system with dynamic on-line RAID expansion, allowing additional capacity to be added without total system downtime, increasing your department's productivity.
Key Features: Enterprise-class, cable-less enclosure design with integrated power and cooling; next-generation Serial Attached SCSI dual ports produce up to 2,400MB/s bandwidth; 3U modular rack design easily integrates into existing storage infrastructures; accepts a mix of high-reliability SAS drives or high-capacity SATA drives; extended storage management through Web GUI via TCP or LCD panel.
Larry Aguilar, Enhance Technology, (562) 777-3488, http://www.enhance-tech.com/products/ultrastor/index.html