We are currently using EMR for easy submission of our Spark jobs. Recently I came across the "FSx for Lustre + S3" solution, which is advertised as ideal for HPC workloads. However, EMRFS is also said to be optimized for this scenario, since it makes S3 look like a local Hadoop filesystem.
So I am wondering: why would anyone choose one of these two over the other, in terms of cost and performance?
This question could be a follow-up to "AWS S3 costs for when AWS EMR uses it", but unfortunately I don't have the reputation to post a comment there.
Thanks in advance for the help.
You are using EMR for compute and S3 for storage.
FSx for Lustre, when linked to S3, would give your jobs higher throughput because of its high IOPS, which should shorten your execution times. That comes at a higher cost, though, because you pay for the FSx file system in addition to S3.
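To make the trade-off concrete, here is a rough break-even sketch. It rests on assumptions that are mine, not AWS figures: the FSx file system exists only while the job runs, cost scales linearly with runtime, and all rates and the speedup factor are hypothetical placeholders that you would replace with your own measurements and current pricing.

```python
def job_cost(runtime_hours, cluster_rate, storage_rate=0.0):
    """Cost of one run: cluster plus any extra storage layer, both billed per hour.

    All rates are hypothetical inputs, not real AWS prices.
    """
    return runtime_hours * (cluster_rate + storage_rate)


def break_even_speedup(cluster_rate, fsx_rate):
    """Smallest speedup at which the FSx run costs the same as the EMRFS run.

    From (T / s) * (C + F) = T * C it follows that s = 1 + F / C.
    """
    return 1.0 + fsx_rate / cluster_rate


def fsx_pays_off(emrfs_runtime_hours, speedup, cluster_rate, fsx_rate):
    """True if the shorter FSx run is cheaper than the EMRFS run."""
    emrfs_cost = job_cost(emrfs_runtime_hours, cluster_rate)
    fsx_cost = job_cost(emrfs_runtime_hours / speedup, cluster_rate, fsx_rate)
    return fsx_cost < emrfs_cost


if __name__ == "__main__":
    # Hypothetical numbers: 4 h job on EMRFS, cluster at 10/h, FSx at 2/h.
    print(break_even_speedup(10.0, 2.0))
    print(fsx_pays_off(4.0, 1.5, 10.0, 2.0))
    print(fsx_pays_off(4.0, 1.1, 10.0, 2.0))
```

In other words, under these assumptions FSx only saves money if it speeds the job up by more than the ratio of its hourly cost to the cluster's hourly cost. If the file system stays provisioned between jobs, the bar is higher.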
https://www.youtube.com/watch?v=ZADHiZa3Hjo&list=WL&index=21&t=2752s
The link above is one of the best re:Invent talks.