I want to build, let's call it, a low-cost SAN which would host the images for our social site (many millions of them). We store 5 sizes of every photo, at 3 KB, 7 KB, 15 KB, 25 KB and 80 KB per image.
My idea is to build a server with 24x consumer 240 GB SSDs in RAID 6, which would give me some 5 TB of disk space for the photo storage. For HA I can add a second server and use DRBD.
I'm looking to get above 150,000 IOPS (4K random reads).
As we mostly have read-only access and rarely delete photos, I'm thinking of going with consumer MLC SSDs. I've read many endurance reviews and don't see a problem there as long as we don't rewrite the cells.
What do you think about my idea?

- I'm not sure whether to go with RAID 6 or RAID 10 (RAID 10 gives more IOPS but costs more SSDs).
- Is ext4 OK as the filesystem?
- Would you use 1 or 2 RAID controllers, with an expander backplane?
If anyone has built something similar, I would be happy to get real-world numbers.
UPDATE
I have bought 12 (plus some spares) OCZ Talos 480 GB SAS SSDs. They will be placed in a 12-bay DAS and attached to a PERC H800 controller (1 GB NV cache, manufactured by LSI, with FastPath). I plan to set up RAID 50 with ext4. If anyone is wondering about benchmarks, let me know what you would like to see.
Use RAID 6 over RAID 10. For mainly read-based I/O loads the throughput should be similar while the array is not degraded; you get better redundancy (any two drives can fail at the same time with RAID 6, whereas RAID 10 cannot survive if both failed drives land in the same mirrored pair, so in a 4-drive array it only survives four of the six possible two-drive failure combinations; I wasn't sure off the top of my head how that 4/6 figure scales for larger arrays, but the sketch below works it out); and you get a larger usable array size, unless you arrange the drives in 4-drive sub-arrays (see below).
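As a rough check on how that figure scales (a minimal sketch, assuming a plain RAID 10 built from two-way mirrors):

```python
from math import comb

def raid10_two_drive_survival(n_drives: int) -> float:
    """Fraction of two-drive failure combinations a RAID 10 of n_drives
    (paired into two-way mirrors) survives. The array is lost only if
    both failed drives belong to the same mirrored pair."""
    pairs = n_drives // 2          # number of mirrored pairs
    total = comb(n_drives, 2)      # all possible two-drive failure combinations
    fatal = pairs                  # exactly one fatal combination per pair
    return 1 - fatal / total

for n in (4, 8, 24):
    print(n, f"{raid10_two_drive_survival(n):.1%}")
# 4 -> 66.7% (the 4/6 figure), 8 -> 85.7%, 24 -> 95.7%
```

So RAID 10 gets closer to "any two drives" as the array grows, but never quite reaches the guarantee RAID 6 gives.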
Your space calculation is off, certainly for RAID 10. 24 * 240 GB is 5760 GB with no redundancy (RAID 0 or JBOD). With RAID 10 you'll get only 2880 GB, as there are (usually) two exact copies of every block. If you use all the drives as one large RAID 6 array you will get your ~5 TB (5280 GB, with two drives' worth of parity information spread over the array), but I personally would be more paranoid and create smaller RAID 6 arrays and join them with RAID 0 or JBOD: that way you get shorter rebuild times when drives are replaced, and in many cases you can survive more drives failing at once (two drives per leg can die, rather than two drives out of the total 24, without the array becoming useless). With four drives per leg you get the same amount of space as RAID 10. Four 6-drive arrays may be a good compromise (4*4*240 = 3840 GB usable space), or three 8-drive arrays (3*6*240 = 4320 GB usable space).
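To make the capacity trade-offs concrete, a small sketch of the usable space for the layouts mentioned above (24 x 240 GB drives, ignoring filesystem and metadata overhead):

```python
DRIVE_GB = 240
DRIVES = 24

def raid6_plus_0(drives_per_leg: int) -> int:
    """Usable GB when the drives are split into RAID 6 legs of
    drives_per_leg each and the legs are striped/concatenated."""
    legs = DRIVES // drives_per_leg
    return legs * (drives_per_leg - 2) * DRIVE_GB   # each leg loses 2 drives to parity

layouts = {
    "RAID 0 / JBOD":          DRIVES * DRIVE_GB,
    "RAID 10":                DRIVES // 2 * DRIVE_GB,
    "single RAID 6":          (DRIVES - 2) * DRIVE_GB,
    "6x 4-drive RAID 6 legs": raid6_plus_0(4),
    "4x 6-drive RAID 6 legs": raid6_plus_0(6),
    "3x 8-drive RAID 6 legs": raid6_plus_0(8),
}
for name, gb in layouts.items():
    print(f"{name:<24} {gb} GB")
# 5760, 2880, 5280, 2880, 3840, 4320 GB respectively
```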
With regard to controllers: these can be a single point of failure for RAID. If the controller dies, you lose all the drives attached to it at once. While such failures are pretty rare (random corruption is more common), there is no harm in taking care to reduce the impact should it happen to you. If you use RAID 10, make sure that no mirrored pair has both of its drives on the same controller (which means having at least two controllers). If splitting into 4-drive RAID 6 arrays, use four controllers and put one drive of each array on each controller. This of course assumes you are using software RAID and simple controllers, which might be unlikely (if you are spending this much on drives, you may as well get some decent hardware RAID controllers to go with them!).
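As an illustration of that layout (a sketch only; the drive and controller numbering is hypothetical and it assumes software RAID):

```python
CONTROLLERS = 4
DRIVES = 24

# 4-drive RAID 6 legs: put drive j of each leg on controller j, so losing
# one controller degrades every leg by exactly one drive - which RAID 6 tolerates.
legs = [list(range(i, i + 4)) for i in range(0, DRIVES, 4)]
controller_of = {drive: j for leg in legs for j, drive in enumerate(leg)}

for leg in legs:
    print("leg", leg, "-> controllers", [controller_of[d] for d in leg])
```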
You should give some thought to a backup solution too, if you have not already. RAID will protect you from certain hardware failures, but not from many human errors and other potential problems.
I would consider a hybrid solution, which could be achieved with OpenSolaris, Solaris Express 11, OpenIndiana, or Nexenta. A hybrid pool would be a lot less costly, and with a few thousand bucks' worth of RAM you can get your 150k+ IOPS with mostly normal spinning disks. At Nexenta we have many, many customers who do exactly this. ZFS is a robust filesystem, and with enough RAM and/or SSDs for additional read/write caching you can build a very solid solution at a relatively low cost. With Nexenta Core, the community edition, you get 18 TB at no cost at all. Of course, a recent release of OpenIndiana would give you much of the same functionality. Add to this snapshots, cloning, and replication using ZFS send/recv, and you can build a SAN that will give any EMC box a run for its money at a far lower cost. Lots of SSDs are nice, but there are other options, some not half bad.
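To make the hybrid-pool argument concrete, here is a rough back-of-envelope model (all the numbers are assumptions, not measurements): the spindles only have to absorb the cache misses, so the achievable read IOPS depend almost entirely on the ARC/L2ARC hit rate.

```python
def effective_read_iops(hit_rate: float,
                        cache_iops: float = 400_000,   # assumed RAM/L2ARC capability
                        disk_iops: float = 3_000) -> float:
    """Crude model of a ZFS hybrid pool: a fraction hit_rate of reads is
    served from RAM/L2ARC, the rest falls through to the spinning disks.
    Whichever tier saturates first limits the total."""
    if hit_rate <= 0:
        return disk_iops
    if hit_rate >= 1:
        return cache_iops
    return min(cache_iops / hit_rate, disk_iops / (1 - hit_rate))

for h in (0.90, 0.98, 0.995):
    print(f"hit rate {h:.1%}: ~{effective_read_iops(h):,.0f} read IOPS")
# 90% -> ~30,000; 98% -> ~150,000; 99.5% -> cache-limited (~400k)
```

The point being that a mostly-read photo workload whose hot set fits in RAM plus a couple of cache SSDs can plausibly reach the 150k target without an all-flash pool.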
Answering your key questions:
RAID 6 vs. RAID 10: You almost certainly do not need to worry about IOPS if you are using SSDs as primary storage.
SLC vs. MLC: There are subtler differences. If you are going to use MLC, I would suggest buying Intel. The Intel 320 series has a SMART counter that you can use to track the wear level percentage and replace the drive before it fails.
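If it helps, a minimal sketch of polling that counter (assuming smartmontools is installed and the drive exposes Intel's Media_Wearout_Indicator, SMART attribute 233; the device path and threshold are placeholders):

```python
import subprocess

def media_wearout(device: str) -> int | None:
    """Return the normalized Media_Wearout_Indicator (100 = new, lower = more worn)
    as reported by smartctl, or None if the attribute is not present."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == "233":   # Intel Media_Wearout_Indicator
            return int(fields[3])           # normalized VALUE column
    return None

wear = media_wearout("/dev/sda")            # placeholder device
if wear is not None and wear < 20:          # arbitrary replacement threshold
    print(f"/dev/sda wear level at {wear}; schedule a replacement")
```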
However, you may want to look at ZFS on the Nexenta OS (or possibly FreeBSD, unsure of development status) if you want to use SSDs to improve storage performance in a reliable way:
ZFS allows you to build a "RAID-Z2" array (somewhat like RAID 6) out of conventional disks, with SSDs acting as a large read cache (L2ARC) and write log (ZIL), giving you the performance benefits you're looking for without the cost of an all-flash array.
Frequently accessed blocks will be served from the SSDs, while less frequently used blocks will still be read from disk. Synchronous writes are logged to the SSD first and committed to the disks shortly afterwards.
Because you will need fewer SSDs, you can afford higher-quality devices, and you will not have the kind of catastrophic failure that is to be expected if you build a RAID array out of consumer-grade MLC devices from OCZ (or whoever).
Even if you don't use high-quality devices, the consequences are less severe. If you use MLC devices for your ZFS L2ARC and they fail, you still have your data preserved on disk.
Just buy two Fusion-io Octal cards and mirror them: far simpler, far faster (though it might be a bit more expensive).
150k IOPS with 4K blocks is roughly 585 MiB/s of throughput. Make sure your controller and backplane can handle that. As for RAID, remember that protection against SSD failures is all it will buy you. A controller failure (or a memory fault, processor outage, or the failure of any other single point of failure in the server) will render your data unusable. Keeping another identical server around and in sync would be needed to avoid downtime and potentially having to go back to tape.
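For reference, the arithmetic behind that figure (a quick sketch):

```python
iops = 150_000
block_bytes = 4 * 1024                      # 4 KiB random reads

throughput = iops * block_bytes             # bytes per second
print(f"{throughput / 2**20:,.0f} MiB/s")   # ~586 MiB/s
print(f"{throughput / 1e6:,.0f} MB/s")      # ~614 MB/s in decimal units
```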
If that second server is filled with SSDs like the first one, it might end up almost cheaper to buy a centralized storage device with SSD support that has no single point of failure. If, however, you keep the second server in sync using ordinary hard drives, you can save a large chunk of change without affecting performance. Since most of the I/O is reads, the load on those drives will be minimal except when the primary server is offline. That would give you the financial flexibility to buy more than one replication target, and maybe even put one offsite in case of a site failure.
You can avoid the RAID controller issue entirely by using ZFS: it can detect AND CORRECT silent corruption (data errors that get past ECC checks), which virtually no RAID controller is able to do (detect, yes, but not fix), and on large drives (2 TB+) you can expect 1-2 such errors per year, per drive.
Unfortunately, if you want this with vendor support you'll need to use Solaris. Some Linux vendors support it, but it's still a beta product there. (That said, I use it on Linux and have found it virtually impossible to kill, right up to pulling several drives out of their bays while hot. At worst the array shuts down, but there is no data corruption.)
Doing all of this as a single server with expensive disks might not be the best answer. Given your budget and needs, I would recommend looking at STF. It was designed as image storage for one of the largest blogging services in Japan:
https://github.com/stf-storage/stf