On my local machine (i7 9200) running Debian Wheezy amd64 I can get some significant speedups on some "big data"/HPC type stuff by:
- Following the instructions here to reserve some RAM for huge pages and setting up hugetlbfs.
- Running my application using libhugetlbfs' (2.17) nifty HUGETLB_MORECORE=yes to redirect its mallocs to 2M pages.
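For anyone wanting to reproduce the second step from a script, here is a minimal sketch of how the environment might be built. It assumes libhugetlbfs is pulled in via LD_PRELOAD, which is the usual way to use it with an unmodified binary (the post does not say whether the application is preloaded or linked against the library). The helper names `hugetlb_env` and `run_with_huge_pages` and the `./my_app` path are hypothetical.

```python
import os
import subprocess


def hugetlb_env(base_env=None, preload="libhugetlbfs.so"):
    """Return a copy of the environment with libhugetlbfs morecore enabled.

    Assumption: the library is injected with LD_PRELOAD rather than linked in.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Ask libhugetlbfs to back the malloc heap with huge pages.
    env["HUGETLB_MORECORE"] = "yes"
    # Prepend the library, keeping anything that was already preloaded.
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{preload} {existing}".strip()
    return env


def run_with_huge_pages(argv):
    """Run argv with the huge-page environment and return its exit code."""
    return subprocess.run(argv, env=hugetlb_env(), check=False).returncode


# Hypothetical usage (not executed here):
#   run_with_huge_pages(["./my_app", "--some-flag"])
```

This is roughly equivalent to prefixing the command line with the two variables by hand; the wrapper only makes it easier to switch huge pages on and off when timing runs.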
The application also runs reasonably well on Debian Wheezy on EC2 (I'm using the latest wheezy AMI) with normal 4k pages (some scalability testing tried on c3.2xlarge, c3.4xlarge and c3.8xlarge instances). But I'm curious whether I'd see similar benefits from huge pages on EC2, if it's possible.
I fired up a c3.3xlarge instance and set up huge pages as usual. After that, /proc/meminfo does indeed report
HugePages_Total: 4096
HugePages_Free: 4095
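A minimal sketch of checking those counters programmatically, in case it helps when comparing instances. It assumes the standard Linux /proc/meminfo layout (`HugePages_*` lines carrying a bare count, `Hugepagesize` carrying a value in kB); the function names are hypothetical.

```python
def parse_hugepages(meminfo_text):
    """Extract the huge page fields from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        parts = rest.split()
        if not parts:
            continue
        # HugePages_* are page counts; Hugepagesize is reported in kB.
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields[key] = int(parts[0])
    return fields


def pool_bytes(fields):
    """Size of the reserved huge page pool in bytes."""
    return fields["HugePages_Total"] * fields["Hugepagesize"] * 1024


def read_hugepages(path="/proc/meminfo"):
    """Read and parse the huge page fields from the running kernel."""
    with open(path) as f:
        return parse_hugepages(f.read())
```

With 4096 pages of 2048 kB each, `pool_bytes` works out to 8 GiB reserved. As the rest of this post shows, non-zero counters only mean the pool was reserved, not that faulting those pages in will actually work.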
However, after compiling libhugetlbfs, its make func self-testing triggers some kernel errors. The system seems to lock up soon after, but not before I'd had time to inspect dmesg and see a bunch of call stacks with various xen_ and hugetlb_fault symbols in them. Once it became unresponsive, the system needed a forcible stop from the AWS console to get it to halt.
I did try booting up again and just running my app with HUGETLB_MORECORE=yes anyway (in case the make func testing was breaking on something obscure I didn't actually need), but much the same thing happened again.
Any success stories with libhugetlbfs on EC2 (preferably with Debian), or recipes for getting it working correctly?
Research: there's scant googleable info about huge pages on EC2 (or Xen) out there. I did find this, which seems to report the same problem: /proc/meminfo reports huge pages available, but attempting to use them causes a kernel panic. The article predates the new c3 instances but suggests a cc2.8xlarge might be worth a go because it uses HVM rather than paravirtual (PV) virtualization.
Update: I couldn't find an up-to-date Debian AMI for HVM, but tried an Ubuntu one (13.04 "raring") on a cc2.8xlarge, and libhugetlbfs with HUGETLB_MORECORE=yes does seem to work fine on that. The only thing is, it actually slows my application down a little!