EDIT: I cannot get my hs1.8xlarge AWS instance to deliver high-performance I/O from its 24 local drives. Please don't tell me how to make EBS volumes faster.
Context: After a couple of years of running Greenplum single-node edition 4.0.4.0 with great success on an Amazon cc1.4xlarge instance (let's call it gp), I figured it would be really nice to take advantage of the hs1.8xlarge instance with its 24 locally mounted HDDs (48 TB raw) plus 120 GB of RAM. Let's call this new setup hsgp.
On gp, I had 20 EBS volumes mounted in RAID-0 (given that EBS volumes are backed up and relatively robust against bit errors, I figured I would go for maximum speed).
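For reference, a RAID-0 over 20 EBS volumes along those lines would be created like this (the device names and filesystem options here are placeholders, not the exact commands used):
# RAID-0 over 20 EBS volumes (device names are placeholders)
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=20 /dev/xvd[f-y]
mkfs.ext4 /dev/md0
mount -o noatime /dev/md0 /data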
Now, I figured, the shiny new hs1.8xlarge would handsomely top that setup. So far I have been wrong: a bunch of small, simple queries (a few million rows each) average around 900 ms on gp versus 2,800 ms on hsgp. Larger queries (6 billion rows) also show at least a 2 to 3x advantage for gp.
I am by no stretch of the imagination an expert on RAID levels, but I figured RAID-10 was a reasonable choice for the 24x 2 TB drives. I use ext4 on the RAID array, created with -m .1 -b 4096, and it is mounted with -o noatime.
One thing I've noticed is that, even after the three days it took for mdadm to settle ("resync the drives"), the array is not as fast as Amazon claims an hs1.8xlarge can deliver: I get roughly 305 MB/s write and 705 MB/s read, whereas Amazon states it is possible to get up to 2.4 GiB/s sequential write and 2.6 GiB/s sequential read.
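To pin down whether a single drive or the md array is the limiting factor, fio runs along these lines can be compared against Amazon's figures (fio has to be installed first, e.g. via yum; the block sizes and job counts are arbitrary starting points, not a definitive benchmark):
# sequential read from one raw member (non-destructive) and from the whole array
fio --name=one-drive --filename=/dev/xvdb --direct=1 --rw=read --bs=1M --runtime=60 --time_based --group_reporting
fio --name=array-read --filename=/dev/md0 --direct=1 --rw=read --bs=1M --numjobs=8 --runtime=60 --time_based --group_reporting
# sequential write to files on the mounted filesystem, so the array itself is not clobbered
fio --name=array-write --directory=/data --direct=1 --rw=write --bs=1M --size=8G --numjobs=8 --runtime=60 --time_based --group_reporting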
Any ideas to get a more performant setup?
Should I abandon the unified disk space (one array across all 24 drives) and instead build smaller arrays, one per Greenplum slice?
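For illustration, a per-slice layout could look something like this (the segment count, device grouping, chunk size, and mount points are guesses, not a tested configuration):
# hypothetical: eight 3-drive RAID-0 arrays, one filesystem per Greenplum segment
devs=(/dev/xvd{b..y})   # the 24 local drives
for i in $(seq 0 7); do
  mdadm --create --verbose /dev/md$i --chunk=256 --level=0 --raid-devices=3 "${devs[@]:$((i*3)):3}"
  mkfs.ext4 -m .1 -b 4096 /dev/md$i
  mkdir -p /data$i
  mount -o noatime /dev/md$i /data$i
done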
Below are the details of the hsgp setup:
I used the HVM Amazon Linux image (amzn-ami-hvm-2013.09.1.x86_64-ebs, ami-d1bfe4b8) and updated the kernel to vmlinuz-3.4.71-63.98.amzn1.
The parameters to tune the system are given below.
sysctl.conf:
# greenplum specifics in /etc/sysctl.conf
kernel.sem = 250 64000 100 512
kernel.shmmax = 68719476736
kernel.shmmni = 4096
kernel.shmall = 4294967296
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
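These can be reloaded and spot-checked without a reboot, to rule out settings simply not being applied:
sysctl -p /etc/sysctl.conf
sysctl vm.overcommit_memory kernel.shmmax kernel.sem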
limits:
# greenplum specifics in /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
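Since the limits only apply to new sessions, they are worth verifying from a fresh login for the database user (gpadmin is assumed here):
su - gpadmin -c 'ulimit -n; ulimit -u'   # should print 65536 and 131072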
RAID array details:
mdadm --create --verbose /dev/md0 --chunk=2048 --level=raid10 --raid-devices=24 /dev/xvd[b-y]
mkfs.ext4 -v -m .1 -b 4096 /dev/md0
mount -o noatime /dev/md0 /data
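Note that the mkfs line above does not pass the RAID geometry to ext4. An untested variant with stride/stripe-width and a larger array read-ahead would look roughly like this (stride = 2048 KiB chunk / 4 KiB block = 512 blocks; stripe-width = 512 x 12 data drives = 6144 blocks for the 24-drive RAID-10):
mkfs.ext4 -v -m .1 -b 4096 -E stride=512,stripe-width=6144 /dev/md0
blockdev --setra 65536 /dev/md0   # read-ahead of 32 MiB (the value is in 512-byte sectors)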
A number of things may account for this performance gap:
You've also left out details about your 20-volume EBS-backed setup. Without the volume size or type (General Purpose SSD, Provisioned IOPS SSD, or magnetic) we're left guessing about that side of the equation entirely.
If disk I/O is your bottleneck, you may get much better performance and easier management by running a Provisioned IOPS volume at 4,000 IOPS. This is easier to manage than RAID-0 on regular EBS volumes, and the ability to take EBS snapshots makes recovery easy. My preliminary benchmarks show a 4,000 IOPS volume beating RAID-0 over six 100 GB volumes, but I have not tested thoroughly and consistently enough to give exact numbers.
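For comparison, a single Provisioned IOPS volume like the one described above can be created along these lines (the size, availability zone, and IOPS figure below are placeholders):
aws ec2 create-volume --volume-type io1 --iops 4000 --size 500 --availability-zone us-east-1a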