I noticed a strange behaviour on a setup I created on AWS (eu-south-1 region) to evaluate a migration from our current provider. I want to deploy 1 to n EC2 instances that serve web requests through Apache, with the files stored on a shared EFS volume. I have already implemented and tested the RDS part, the PHP performance, the ElastiCache integration and so on. However, I noticed a delay on sporadic requests that looks suspiciously deterministic: it is always just over 5 seconds. The EFS volume is in bursting mode, the burst credits are high (2T) and the percentage of usage is really low, so running out of credits should not be the cause.
I mounted the EFS volume with the suggested options, both with the "EFS mount helper" and with the plain NFS client, and nothing changed. So I started again from scratch: I installed just the default Apache web server (I also tried Nginx, with similar results), mounted the EFS volume, and benchmarked from another EC2 instance with the following command:
siege -c 2 -r 20 -b http://35.152.48.17/efs-mount-point/efs-test/logo.png
With Ubuntu 18.04 and Ubuntu 20.04, the longest transaction is always above 5 seconds (5.12 - 5.42 seconds). With Amazon Linux, instead, the longest transaction is fast (0.15 seconds). Interestingly, if I lower the number of parallel clients from 2 to 1:
siege -c 1 -r 20 -b http://35.152.48.17/efs-mount-point/efs-test/logo.png
the longest transaction is fine on Ubuntu too, even if I let "siege" run for many more repetitions:
siege -c 1 -r 10000 -b http://35.152.48.17/efs-mount-point/efs-test/logo.png
However, if I take EFS out of the picture on Ubuntu and serve the files from the local EBS volume, the longest transaction is blazing fast (a few milliseconds), so the problem arises only with EFS on Ubuntu (both 18.04 and 20.04). Maybe the suggested mount options work for Amazon Linux but lack something for the Ubuntu AMIs?
The repro steps are so simple that the behaviour seems really strange to me:
- choose the Ubuntu 18.04 AMI;
- mount the EFS volume (either with the "EFS mount helper" or with the NFS client);
- install Apache, changing only the directory it serves files from.
Any suggestions?
I finally found the solution.
The problem happens only with kernels "5.4.0-1029-aws" and "5.4.0-1032-aws". The issue seems to be solved in kernel versions "5.4.0-1034-aws" and "5.4.0-1035-aws".
So, you just need to upgrade the kernel:
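As a rough sketch, on a stock Ubuntu AMI something like the following should pull in the newer kernel. It assumes the instance tracks the "linux-aws" metapackage (adjust the package name if yours uses a different kernel flavour); the function name and the DRY_RUN switch are made up for illustration.

```shell
# Sketch: upgrade the AWS-flavoured Ubuntu kernel and reboot.
# Assumption: the kernel is provided by the "linux-aws" metapackage.
# upgrade_aws_kernel and DRY_RUN are hypothetical names, not part of any tool.
upgrade_aws_kernel() {
    _run=""
    # With DRY_RUN=1 the commands are only printed, not executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        _run="echo"
    fi
    $_run sudo apt-get update &&
    $_run sudo apt-get install -y linux-aws &&
    $_run sudo reboot
}
```

Running "DRY_RUN=1 upgrade_aws_kernel" prints the three commands so you can review them first; running "upgrade_aws_kernel" on its own executes them and reboots the instance.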
Then, after the reboot, you should have the new kernel in place. Check it with the following command:
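The running kernel release is printed by "uname -r". The sketch below also compares it against the two kernel versions reported as affected in this answer; the helper name is made up for illustration.

```shell
# Print the running kernel release.
uname -r

# Hypothetical helper: succeeds only if the given release is one of the
# two kernels reported as affected in this answer.
is_affected_kernel() {
    case "$1" in
        5.4.0-1029-aws|5.4.0-1032-aws) return 0 ;;
        *) return 1 ;;
    esac
}

if is_affected_kernel "$(uname -r)"; then
    echo "affected kernel, upgrade needed"
else
    echo "kernel not in the affected list"
fi
```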
You should see one of the fixed versions, for example "5.4.0-1034-aws" or "5.4.0-1035-aws".
After that, the 5-second latency should be gone.