I am running Linux on Amazon EC2 servers. I need to copy millions of files, totaling hundreds of gigabytes, between two EC2 instances in the same availability zone. I don't need to sync directories; I just need to copy all the files from one directory into an empty directory on the other machine.
What is the fastest way to do this? Has anyone seen or run performance tests?
rsync? scp? Should I zip them first? Should I detach the drive they are on and re-attach it to the machine I'm copying to, then copy them? Does transferring over EC2's private IP addresses speed things up?
Any thoughts would be appreciated.
NOTE: Sorry this was unclear, but I'm copying data between two EC2 instances, both in the same AWS availability zone.
If the files are already on an EBS volume (and if you care about them, why aren't they?):
Create a snapshot of the EBS volume containing the files on the first instance.
Create an EBS volume from that snapshot.
Attach the EBS volume to the second instance.
The new EBS volume may be a bit slow at first, while it lazily loads blocks from the snapshot, but it will be usable right away.
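A minimal sketch of those three steps with the AWS CLI; the volume, snapshot, and instance IDs below are placeholders, and the availability zone must match the second instance's:

    # snapshot the source volume and wait for it to complete
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "bulk file copy"
    aws ec2 wait snapshot-completed --snapshot-ids snap-0123456789abcdef0
    # create a new volume from the snapshot, in the same AZ as the target instance
    aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a
    # attach the new volume to the second instance, then mount it there as usual
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf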
ALTERNATIVE (If the files are not already on an EBS volume):
Attach a new EBS volume to the first instance.
Copy the files from other disks to the new EBS volume.
Move the EBS volume to the second instance.
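If you go this route, moving the volume is also scriptable; a sketch with placeholder IDs (device names vary by AMI, e.g. /dev/sdf may show up as /dev/xvdf or an NVMe device):

    # on the first instance: unmount cleanly before detaching
    sudo umount /mnt/data
    aws ec2 detach-volume --volume-id vol-0fedcba9876543210
    # attach to the second instance and mount it there
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0abcdef0123456789 --device /dev/sdf
    sudo mount /dev/xvdf /mnt/data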
Use tar and netcat. If they're in the same subnet and you're not too concerned about security, this is a pretty neat solution. You can add stages to the pipeline if you want security (gpg, for example) or compress first with gzip.
On the receiving end do:
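Something along these lines (a sketch: port 1234 is arbitrary but must be allowed by the security group, and the -p flag is for traditional netcat; the OpenBSD variant takes just -l 1234):

    # listen on port 1234 and unpack the incoming tar stream into the current directory
    nc -l -p 1234 | tar xvf -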
And on the sending end do:
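Again a sketch, with receiver-ip and /path/to/files as placeholders:

    # stream a tar of the directory over the network; nothing is written to disk in between
    tar cvf - /path/to/files | nc receiver-ip 1234

To compress on the fly, add z on both sides (tar czvf - on the sender, tar xzvf - on the receiver), or put gzip/gpg stages into the pipe.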
You can use the Amazon AWS Import/Export service. Ship a drive to them and let them do the copy for you. It's more expensive, but perfect when you have many, many GB of data to transfer and don't want to wait weeks for the job to finish. Their link: http://aws.amazon.com/importexport/