I have a Linux dedicated server with 300GB of uploaded files that I need to transfer to AWS S3, since I am changing the uploads to be saved in an S3 bucket instead of on the local disk. I read that I can do the transfer using an AWS CLI command to copy the directory over to the S3 bucket. My questions are:
- When I run the `cp` command from the AWS CLI, roughly how long could it take for a dedicated server to transfer 300GB of data over to an S3 bucket? Both S3 and the server are in the same region.
These are my server specs:

- RAID Policy: RAID 1
- Operating System: CloudLinux
- HDD Bay 1: 480GB SSD
- HDD Bay 2: 480GB SSD
- Network Bandwidth: 10TB
- CPU: 6-core E5-2620v2 @ 2.00GHz x2
- RAM: 64GB
I completely understand there are many variables, but I want to get a rough estimate from people who have migrated data from a Linux server to S3.
- When I use the AWS CLI `cp` command, does it show progress while it runs?
- What happens if I get disconnected from SSH while the command is still running? Is it safer to run the AWS CLI `cp` command inside a `screen` session?
- During the transfer, will the server's performance take a hit? This server runs a couple of websites, so do I need to take the sites offline during the data transfer, or can I safely run it while the sites are live?
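For reference, the command I'm planning to run is something like this (the bucket name and source path are placeholders, not my real ones):

```bash
# recursively copy the whole uploads directory into the bucket
aws s3 cp /var/www/uploads s3://my-bucket/uploads/ --recursive
```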
300GB is not that much. The SSD disks can do circa 100MB/s read, and if you are on a 1Gbps network that's roughly 100MB/s as well, so your 300GB should take around an hour to upload.
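Spelled out: 300GB is about 300,000MB, and 300,000MB ÷ 100MB/s = 3,000 seconds, i.e. roughly 50 minutes, assuming the link stays saturated the whole time.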
Yes, it will show progress; yes, run it in `screen`; and yes, it will load up the server. On the other hand, it's only for about an hour. Hope that helps :)
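A minimal `screen` recipe, in case you haven't used it before (bucket name and source path are placeholders):

```bash
# start a named session so the transfer survives an SSH disconnect
screen -S s3upload

# inside the session, run the recursive copy
aws s3 cp /var/www/uploads s3://my-bucket/uploads/ --recursive

# detach with Ctrl-A then d; reattach later with:
screen -r s3upload
```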
MLu's answer is good; this is in addition to it rather than instead of it.
As MLu said, 300GB is not much and won't take long. I've copied 1TB from New Zealand to S3 in Sydney over a connection with 35ms latency and about 350Mbps of available bandwidth; from memory it took about 4-6 hours. You likely have more bandwidth and less latency. Using about 80 threads it consumed, from memory, about 100% of one Xeon core, so not much.
You might consider the `s3 sync` command instead: if you need to stop it, you can simply run it again and it will pick up where it left off, rather than copying everything from scratch.
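For example (placeholder path and bucket name again):

```bash
# sync only uploads files that are missing or changed in the bucket,
# so an interrupted run can just be re-run
aws s3 sync /var/www/uploads s3://my-bucket/uploads/
```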
On a busy production server I would tune the S3 section of the AWS CLI config file, something like the examples below. It will reduce the bandwidth and CPU usage at the expense of speed. This goes into `~/.aws/config` on Linux, or `C:\Users\username\.aws\config` on Windows. If you use a CLI profile, the settings go under that profile's section, not under `[default]`.
Config for a few larger files:
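Something like this, for example; the 50MB/s cap is the limit I mention below, while the other numbers are starting points to tweak rather than magic values (swap `[default]` for your profile section if you use one):

```ini
[default]
s3 =
  max_concurrent_requests = 5
  max_bandwidth = 50MB/s
  max_queue_size = 100
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
```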
Config for many small files:
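The same idea without the queue and multipart overrides, for example:

```ini
[default]
s3 =
  max_concurrent_requests = 5
  max_bandwidth = 50MB/s
```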
This reduces CPU / bandwidth from the default 10 concurrent requests and 1000 queue size, and imposes a 50MB/sec bandwidth limit (400Mbps). Tweak those however you like; 10 threads might be fine. I tend to upload large data files of 1GB or more, so I use larger chunks and a smaller queue, but if your files are smaller delete the last three lines.
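If you'd rather not edit the file by hand, the same settings can be written with `aws configure set`; for example:

```bash
# each command writes one s3 setting into ~/.aws/config under [default]
aws configure set default.s3.max_concurrent_requests 5
aws configure set default.s3.max_bandwidth 50MB/s
```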
To directly answer your questions:
1. One to four hours.
2. Yes, it shows progress. Use `s3 sync` so you can more easily restart. If you run e.g. `aws s3 sync /opt/data s3://bucket-name/ &` (note the `&`) it may keep running if your SSH session drops, but a plain background job is often killed when the session closes, so `nohup` or `screen` is safer.
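For example (placeholder path and bucket name):

```bash
# nohup ignores the hangup signal sent when the SSH session closes,
# so the sync keeps running; output is captured in sync.log
nohup aws s3 sync /opt/data s3://bucket-name/ > sync.log 2>&1 &
```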
3. No idea - MLu says yes. As I said above, I used 60-80 threads and it used about one full Xeon core; fewer threads will use fewer resources. All in all it's not very resource intensive: it works hardest for the first few minutes while it queues up files, then the CPU spikes occasionally as it queues more.