I am running Linux on Amazon EC2 servers. I need to copy millions of files, totaling hundreds of gigabytes, between two EC2 instances in the same availability zone. I don't need to sync directories; I just need to copy all the files from one directory into an empty directory on the other machine.
What is the fastest way to do this? Has anyone seen or run performance tests?
rsync? scp? Should I zip them first? Should I detach the drive they are on and re-attach it to the machine I'm copying to, then copy them? Does transferring over EC2's private IP addresses speed things up?
Any thoughts would be appreciated.
NOTE: Sorry this was unclear, but I'm copying data between two EC2 instances, both in the same AWS availability zone.
If the files are already on an EBS volume (and if you care about them, why aren't they?):
Create a snapshot of the EBS volume containing the files on the first instance.
Create an EBS volume from that snapshot.
Attach the EBS volume to the second instance.
The new EBS volume may be a bit slow at first, while it lazily loads blocks from the snapshot, but it will be usable right away.
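A minimal sketch of those three steps with the AWS CLI; the volume, snapshot, and instance IDs below are placeholders, and the availability zone must match the second instance's:

    # snapshot the source volume and wait for it to complete
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "bulk file copy"
    aws ec2 wait snapshot-completed --snapshot-ids snap-0123456789abcdef0
    # create a new volume from the snapshot, in the same AZ as the target instance
    aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a
    # attach the new volume to the second instance, then mount it there as usual
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf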
ALTERNATIVE (If the files are not already on an EBS volume):
Attach a new EBS volume to the first instance.
Copy the files from other disks to the new EBS volume.
Move the EBS volume to the second instance.
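If you go this route, moving the volume is also scriptable; a sketch with placeholder IDs (device names vary by AMI, e.g. /dev/sdf may show up as /dev/xvdf or an NVMe device):

    # on the first instance: unmount cleanly before detaching
    sudo umount /mnt/data
    aws ec2 detach-volume --volume-id vol-0fedcba9876543210
    # attach to the second instance and mount it there
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0abcdef0123456789 --device /dev/sdf
    sudo mount /dev/xvdf /mnt/data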
Use tar and netcat. If they're in the same subnet and you're not too concerned about security, this is a pretty neat solution. You can add stages to the pipeline if you want security (gpg, for example) or compress first with gzip.
On the receiving end do:
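Something along these lines (a sketch: port 1234 is arbitrary but must be allowed by the security group, and the -p flag is for traditional netcat; the OpenBSD variant takes just -l 1234):

    # listen on port 1234 and unpack the incoming tar stream into the current directory
    nc -l -p 1234 | tar xvf -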
And on the sending end do:
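Again a sketch, with receiver-ip and /path/to/files as placeholders:

    # stream a tar of the directory over the network; nothing is written to disk in between
    tar cvf - /path/to/files | nc receiver-ip 1234

To compress on the fly, add z on both sides (tar czvf - on the sender, tar xzvf - on the receiver), or put gzip/gpg stages into the pipe.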
You can use the Amazon AWS Import/Export service. Ship a drive to them and let them do the copy for you. It's more expensive, but perfect when you have many, many GB of data to transfer and don't want to wait weeks for the job to finish. Their link: http://aws.amazon.com/importexport/