EDIT: I cannot get my hs1.8xlarge AWS instance to deliver high-performance I/O from its 24 local drives. Please don't tell me how to make EBS volumes faster.
Context: After a couple of years of running Greenplum single-node edition 4.0.4.0 with great success on an Amazon cc1.4xlarge instance (let's call it gp), I figured it would be really nice to take advantage of the hs1.8xlarge instance with its 24 locally mounted HDDs (48 TB raw) plus 120 GB of RAM. Let's call this new setup hsgp.
On gp, I had 20 EBS volumes mounted in RAID-0 (given that EBS volumes are backed up and relatively robust against bit errors, I figured I would go for maximum speed).
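For reference, a RAID-0 over 20 EBS volumes along those lines would be created like this (the device names and filesystem options here are placeholders, not the exact commands used):
# RAID-0 over 20 EBS volumes (device names are placeholders)
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=20 /dev/xvd[f-y]
mkfs.ext4 /dev/md0
mount -o noatime /dev/md0 /data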
Now, I figured, the shiny new hs1.8xlarge would handsomely top that setup. So far I have been wrong: a bunch of small, simple queries (a few million rows each) average around 900 ms on gp versus 2,800 ms on hsgp. Larger queries (6 billion rows) also show at least a 2 to 3x advantage for gp.
I am by no stretch of the imagination an expert on RAID levels, but I figured RAID-10 was a reasonable choice for the 24x 2 TB drives. I use ext4 on the RAID array, created with -m .1 -b 4096, and it is mounted with -o noatime.
One thing I've noticed is that, even after the three days it took for mdadm to settle ("resync the drives"), the array is not as fast as Amazon claims an hs1.8xlarge can deliver: I get roughly 305 MB/s write and 705 MB/s read, whereas Amazon states it is possible to get up to 2.4 GiB/s sequential write and 2.6 GiB/s sequential read.
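To pin down whether a single drive or the md array is the limiting factor, fio runs along these lines can be compared against Amazon's figures (fio has to be installed first, e.g. via yum; the block sizes and job counts are arbitrary starting points, not a definitive benchmark):
# sequential read from one raw member (non-destructive) and from the whole array
fio --name=one-drive --filename=/dev/xvdb --direct=1 --rw=read --bs=1M --runtime=60 --time_based --group_reporting
fio --name=array-read --filename=/dev/md0 --direct=1 --rw=read --bs=1M --numjobs=8 --runtime=60 --time_based --group_reporting
# sequential write to files on the mounted filesystem, so the array itself is not clobbered
fio --name=array-write --directory=/data --direct=1 --rw=write --bs=1M --size=8G --numjobs=8 --runtime=60 --time_based --group_reporting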
Any ideas to get a more performant setup?
Should I abandon the unified disk space (one array across all 24 drives) and instead build smaller arrays, one per Greenplum slice?
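For illustration, a per-slice layout could look something like this (the segment count, device grouping, chunk size, and mount points are guesses, not a tested configuration):
# hypothetical: eight 3-drive RAID-0 arrays, one filesystem per Greenplum segment
devs=(/dev/xvd{b..y})   # the 24 local drives
for i in $(seq 0 7); do
  mdadm --create --verbose /dev/md$i --chunk=256 --level=0 --raid-devices=3 "${devs[@]:$((i*3)):3}"
  mkfs.ext4 -m .1 -b 4096 /dev/md$i
  mkdir -p /data$i
  mount -o noatime /dev/md$i /data$i
done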
Below are the details of the hsgp setup:
I used the HVM Amazon Linux image (amzn-ami-hvm-2013.09.1.x86_64-ebs, ami-d1bfe4b8) and updated the kernel to vmlinuz-3.4.71-63.98.amzn1.
The parameters to tune the system are given below.
sysctl.conf:
# greenplum specifics in /etc/sysctl.conf
kernel.sem = 250 64000 100 512
kernel.shmmax = 68719476736
kernel.shmmni = 4096
kernel.shmall = 4294967296
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
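These can be reloaded and spot-checked without a reboot, to rule out settings simply not being applied:
sysctl -p /etc/sysctl.conf
sysctl vm.overcommit_memory kernel.shmmax kernel.sem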
limits:
# greenplum specifics in /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
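Since the limits only apply to new sessions, they are worth verifying from a fresh login for the database user (gpadmin is assumed here):
su - gpadmin -c 'ulimit -n; ulimit -u'   # should print 65536 and 131072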
RAID array details:
mdadm --create --verbose /dev/md0 --chunk=2048 --level=raid10 --raid-devices=24 /dev/xvd[b-y]
mkfs.ext4 -v -m .1 -b 4096 /dev/md0
mount -o noatime /dev/md0 /data
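Note that the mkfs line above does not pass the RAID geometry to ext4. An untested variant with stride/stripe-width and a larger array read-ahead would look roughly like this (stride = 2048 KiB chunk / 4 KiB block = 512 blocks; stripe-width = 512 x 12 data drives = 6144 blocks for the 24-drive RAID-10):
mkfs.ext4 -v -m .1 -b 4096 -E stride=512,stripe-width=6144 /dev/md0
blockdev --setra 65536 /dev/md0   # read-ahead of 32 MiB (the value is in 512-byte sectors)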
A number of things may account for this performance gap:
You've also left out details about your 20-volume EBS-backed setup. Without the volume size or type (General Purpose SSD, Provisioned IOPS SSD, or magnetic) we're left guessing about that side of the equation entirely.
If disk I/O is your bottleneck, you may get much better performance and easier management by running a Provisioned IOPS volume at 4,000 IOPS. This is easier to manage than RAID-0 on regular EBS volumes, and the ability to take EBS snapshots makes recovery easy. My preliminary benchmarks show a 4,000 IOPS volume beating RAID-0 over six 100 GB volumes, but I have not tested thoroughly and consistently enough to give exact numbers.
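For comparison, a single Provisioned IOPS volume like the one described above can be created along these lines (the size, availability zone, and IOPS figure below are placeholders):
aws ec2 create-volume --volume-type io1 --iops 4000 --size 500 --availability-zone us-east-1a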