I am in charge of a new website in a niche industry that stores lots of data (10+ TB per client, growing to 2 or 3 clients soon). We are considering ordering about $5,000 worth of 3 TB drives (10 in a RAID 6 configuration and 10 for backup), which will give us approximately 24 TB of production storage. The data will be written once and remain unmodified for the lifetime of the website, so we only need to back it up once.
I understand basic RAID theory, but I have no hands-on experience with it. My question is: does this sound like a good configuration? What potential problems could this setup cause?
Also, what is the best way to do a one-time backup? Have two RAID 6 arrays, one for offsite backup and one for production? Or should I back up the RAID 6 production array to a JBOD?
EDIT: The data server is running Windows Server 2008 x64.
EDIT 2: To reduce rebuild time, what would you think about using two RAID 5 arrays instead of one RAID 6?
I currently support 220 servers of up to 96 TB each (totalling 2 PB or so), some in clusters of up to 240 TB, that my team built. Here is my advice:
Honestly, I think $5k for the drives is a bit steep... but that's a whole other subject. The setup sounds sound enough, but in the event of a drive failure, a single 24 TB volume will take FOREVER to rebuild (ever tried to read 3 TB of data split across 9 other disks?). It would be better to have smaller RAID sets and join them together to form a bigger volume. If a drive fails, it doesn't kill the performance of the entire volume while the whole thing rebuilds, but only the performance of the one RAID set.
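Purely as an illustration of the smaller-RAID-sets idea, here is a sketch assuming Linux software RAID with mdadm and hypothetical device names (on Windows the equivalent would be done in your RAID controller's management tools):

    # Two smaller RAID 6 sets instead of one 10-disk set (devices hypothetical)
    mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sd[b-f]
    mdadm --create /dev/md1 --level=6 --raid-devices=5 /dev/sd[g-k]
    # Join md0 and md1 into one big volume (e.g. with LVM or a spanned volume);
    # a failed disk then only degrades and rebuilds its own 5-disk set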
Also, the OS your website runs on (Linux/Windows/OSX/Solaris/???) will dictate what tools and configuration you can use.
What do you mean by a "one-time backup?" If you mean a "one-way archive" (i.e. new files are written to the backup server, but nothing is ever read from it), I highly recommend using rsync in *nix-flavored environments (Linux/Unix/etc.), or something like SyncToy or xxcopy if it's IIS (Windows) based. If you need a LIVE copy (zero delay between when a file is written and when it appears on the other server), you'll need to provide more information about your environment: Linux and Windows work completely differently, and the tools are 100% different. For stuff like that, you'll probably want to look into clustered file systems, and should probably look more towards a SAN rather than host-based storage.
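As a minimal sketch of a one-way archive with rsync (the paths and host name are hypothetical; assumes the backup server is reachable over SSH):

    # Push new files to the backup server; -a preserves permissions/times
    rsync -av /srv/www/data/ backup-host:/archive/data/
    # On later runs, --ignore-existing skips files already archived,
    # so nothing on the backup side is ever overwritten
    rsync -av --ignore-existing /srv/www/data/ backup-host:/archive/data/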
We generally use RAID 5 or 6 for backup disks, as it gives the best bang for the buck once you ignore RAID 0 :-) so I'd go for that rather than JBOD.
One thing you might consider is buying your disks in separate batches rather than all 20 at once: if there is a manufacturing defect in a batch, the drives may all fail at similar times.
You may also wish to consider mirroring rather than conventional backups if the data is only being written once. There are quite a few software and hardware storage systems that allow that to be set up, and you may also get the benefit of failover in the event of your primary storage failing.
One option that would fit well with your use case, especially if your requirements keep growing, is an HSM (Hierarchical Storage Manager). I've installed several HSMs ranging up to 150 TB of disk and 4 PB of tape.
The idea is that an HSM manages the lifecycle of data to reduce the overall cost of storage. Data is initially stored on disk but almost immediately archived to tape (which is much cheaper per byte). Archive policies can be configured to store multiple copies on tape for extra safety, and most people take a second copy offsite. The migration to and from tape is transparent to the end user - the files still appear in the filesystem.
When the end user requests the file in future, the data is automatically staged back from tape and served to the user. With a tape library, the staging process only adds about a minute to the retrieval time.
One huge benefit of an HSM is the recovery time if your disks fail or you have filesystem corruption. If you ever have a catastrophic disk or filesystem failure, you can just find some more disk and restore a recent backup of the filesystem metadata (a tiny fraction of the total data volume). At that point, all of the data is available on demand as usual.
When determining the RAID configuration for a SAN, you have to weigh performance against the reliability and recovery time you require. Because RAID 6 doubles the number of parity writes (depending on your particular flavor of RAID 6), it's usually best done on a SAN with custom ASICs to handle the calculations. Since your data is static, your real concern is how long you can afford to be in a degraded state should a drive fail. Also of note: drives tend to fail in multiples, so it's best to install drives with some time between sets.
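To put rough numbers on the RAID 6 vs. two-RAID 5 question from the edit: with 10 × 3 TB drives, one RAID 6 array yields (10 − 2) × 3 TB = 24 TB and survives any two simultaneous drive failures, while two 5-drive RAID 5 arrays also yield 2 × (5 − 1) × 3 TB = 24 TB and rebuild by reading only 4 drives instead of 9, but lose half the data if a second drive in the same set fails during a rebuild.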
As far as backups go, I see no need for redundancy in the backup set, so JBOD is fine.
I have filesystems in that scale range, currently totaling 58 TB onsite, plus a separate copy offsite.
I've had a few drive failures, and yes, the bigger the drives, the longer the rebuild. To alleviate it somewhat, I split the storage into several RAID sets of 5-7 drives each. They're currently RAID 5, but when I get 3 TB drives I plan to start using RAID 6.
It's all joined and resplit with LVM, so I don't have to think about what goes where; I simply add extra boxes when needed and remove old drives when they're too small to justify the slots they occupy.
The hardware is mostly Coraid AoE boxes (though some iSCSI targets will join soon), managed with LVM. The filesystems are ext3/4 if under 4-6 TB, or XFS if over that (up to 34 TB, currently). All backup is handled with rsync, plus DVD for offline archive. Besides some monitoring software (mostly Zabbix), it's a nearly maintenance-free setup.
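A minimal sketch of that LVM layering (the AoE device names, volume names, and sizes are made up for illustration):

    # Each RAID box becomes an LVM physical volume in one big volume group
    pvcreate /dev/etherd/e0.0 /dev/etherd/e1.0
    vgcreate storage /dev/etherd/e0.0 /dev/etherd/e1.0
    lvcreate -L 20T -n archive storage
    # Later, add a new box and grow the logical volume online
    pvcreate /dev/etherd/e2.0
    vgextend storage /dev/etherd/e2.0
    lvextend -L +10T /dev/storage/archive
    xfs_growfs /mnt/archive   # then grow the filesystem too (XFS here)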
Another point to add to what everyone is saying here: with Windows and huge file systems, if you do decide to break a filesystem up but want to retain the same file structure you would have had, look at mounting the volumes to folder paths.
http://technet.microsoft.com/en-us/library/cc753321.aspx
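For a rough idea of the command-line equivalent (the folder path is made up and the volume GUID is a placeholder; the same thing can be done graphically in Disk Management):

    rem Run with no arguments to list volume GUIDs and current mount points
    mountvol
    rem Mount a volume into an empty NTFS folder instead of a drive letter
    mountvol D:\Storage\Array2 \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\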
I'm surprised nobody has suggested using MogileFS (github).
MogileFS will mirror data on different servers automatically and each disk is just a "JBOD" dumb disk. There are many production installations with many TBs (100+) of data.
For the server hardware there are many options for "lots of disks in an enclosure": for example, a Backblaze Pod (a bit do-it-yourself and relatively unsupported) or a Super Micro server (we use Silicon Mechanics). I believe at wordpress.com they use regular 2U Dell servers with MD1000 enclosures for the disks.