I have a Linux dedicated server with 300GB of uploaded files that I need to transfer to AWS S3, since I am changing the uploads to be saved in an S3 bucket instead of on the local disk. I read that I can do the transfer using an AWS CLI command to copy the directory over to the S3 bucket. My questions are:
- When I run the `cp` command from the AWS CLI, roughly how long could it take for a dedicated server to transfer 300GB of data over to an S3 bucket? Both S3 and the server are in the same region.
These are my server specs:

- RAID Policy: RAID 1
- Operating System: CloudLinux
- HDD Bay 1: 480GB SSD
- HDD Bay 2: 480GB SSD
- Network Bandwidth: 10TB
- CPU: 6-core E5-2620v2 @ 2.00GHz x2
- RAM: 64GB
I completely understand there are many variables, but I want to get a rough estimate from people who have migrated data from a Linux server to S3.
- When I use the AWS CLI `cp` command, does it show progress while it runs?
- What happens if I get disconnected from SSH while the command is still running? Is it safer to run the AWS CLI `cp` command inside a `screen` session?
- During the transfer, will the server's performance take a hit? This server runs a couple of websites, so do I need to take the sites offline during the data transfer, or can I safely run it while the sites are live?
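For reference, the command I'm planning to run is something like this (the bucket name and source path are placeholders, not my real ones):

```bash
# recursively copy the whole uploads directory into the bucket
aws s3 cp /var/www/uploads s3://my-bucket/uploads/ --recursive
```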
300GB is not that much. The SSD disks can do circa 100MB/s read, and if you are on a 1Gbps network that's roughly 100MB/s as well, so your 300GB should take around an hour to upload.
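Spelled out: 300GB is about 300,000MB, and 300,000MB ÷ 100MB/s = 3,000 seconds, i.e. roughly 50 minutes, assuming the link stays saturated the whole time.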
Yes, it will show progress; yes, run it in `screen`; and yes, it will load up the server. On the other hand, it's only for about an hour. Hope that helps :)
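A minimal `screen` recipe, in case you haven't used it before (bucket name and source path are placeholders):

```bash
# start a named session so the transfer survives an SSH disconnect
screen -S s3upload

# inside the session, run the recursive copy
aws s3 cp /var/www/uploads s3://my-bucket/uploads/ --recursive

# detach with Ctrl-A then d; reattach later with:
screen -r s3upload
```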
MLu's answer is good; this is in addition to it rather than instead of it.
As MLu said, 300GB is not much and won't take long. I've copied 1TB from New Zealand to S3 in Sydney over a connection with 35ms latency and about 350Mbps of available bandwidth; from memory it took about 4-6 hours. You likely have more bandwidth and less latency. Using about 80 threads it consumed, from memory, about 100% of one Xeon core, so not much.
You might consider the `s3 sync` command instead: if you need to stop it, you can simply run it again and it will pick up where it left off, rather than copying everything from scratch.
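For example (placeholder path and bucket name again):

```bash
# sync only uploads files that are missing or changed in the bucket,
# so an interrupted run can just be re-run
aws s3 sync /var/www/uploads s3://my-bucket/uploads/
```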
On a busy production server I would tune the S3 section of the AWS CLI config file, something like the examples below. It will reduce the bandwidth and CPU usage at the expense of speed. This goes into `~/.aws/config` on Linux, or `C:\Users\username\.aws\config` on Windows. If you use a CLI profile, the settings go under that profile's section, not under `[default]`.
Config for a few larger files:
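Something like this, for example; the 50MB/s cap is the limit I mention below, while the other numbers are starting points to tweak rather than magic values (swap `[default]` for your profile section if you use one):

```ini
[default]
s3 =
  max_concurrent_requests = 5
  max_bandwidth = 50MB/s
  max_queue_size = 100
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
```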
Config for many small files:
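The same idea without the queue and multipart overrides, for example:

```ini
[default]
s3 =
  max_concurrent_requests = 5
  max_bandwidth = 50MB/s
```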
This reduces CPU / bandwidth from the default 10 concurrent requests and 1000 queue size, and imposes a 50MB/sec bandwidth limit (400Mbps). Tweak those however you like; 10 threads might be fine. I tend to upload large data files of 1GB or more, so I use larger chunks and a smaller queue, but if your files are smaller delete the last three lines.
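If you'd rather not edit the file by hand, the same settings can be written with `aws configure set`; for example:

```bash
# each command writes one s3 setting into ~/.aws/config under [default]
aws configure set default.s3.max_concurrent_requests 5
aws configure set default.s3.max_bandwidth 50MB/s
```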
To directly answer your questions:
1. One to four hours.
2. Yes, it shows progress. Use `s3 sync` so you can more easily restart. If you run e.g. `aws s3 sync /opt/data s3://bucket-name/ &` (note the `&`) it may keep running if your SSH session drops, but a plain background job is often killed when the session closes, so `nohup` or `screen` is safer.
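For example (placeholder path and bucket name):

```bash
# nohup ignores the hangup signal sent when the SSH session closes,
# so the sync keeps running; output is captured in sync.log
nohup aws s3 sync /opt/data s3://bucket-name/ > sync.log 2>&1 &
```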
3. No idea - MLu says yes. As I said above, I used 60-80 threads and it used about one full Xeon core; fewer threads will use fewer resources. All in all it's not very resource intensive: it works hardest for the first few minutes while it queues up files, then the CPU spikes occasionally as it queues more.