I have a list of files I need to copy on a Linux system - each file ranges from 10 to 100GB in size.
I only want to copy to the local filesystem. Is there a way to do this in parallel - with multiple processes each responsible for copying a file - in a simple manner?
I can easily write a multithreaded program to do this, but I'm interested in finding out if there's a low level Linux method for doing this.
If your system is not thrashed by it (e.g. maybe the files are in cache), then GNU Parallel http://www.gnu.org/software/parallel/ may work for you:
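A minimal sketch, assuming the files to copy are listed one per line in files.txt and destdir/ is the target directory (both are placeholder names, not from the original answer):

cat files.txt | parallel -j10 cp {} destdir/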
This will run 10 concurrent cp processes.
Pro: It is simple to read.
Con: GNU Parallel is not standard on most systems - so you probably have to install it.
If you want to keep the directory structure:
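A sketch, assuming you run it from inside the source tree and that /path/to/destdir is the target (both placeholders); cp --parents recreates the relative directory path under the destination:

find . -type f -print0 | parallel -0 -j10 cp --parents {} /path/to/destdir/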
Watch the intro video for more info: http://www.youtube.com/watch?v=OpaiGYxkSuQ
See also https://oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster/ for a discussion of parallel disk I/O.
There is no low-level mechanism for this for a very simple reason: doing this will destroy your system performance. With platter drives each write will contend for placement of the head, leading to massive I/O wait. With SSDs, this will end up saturating one or more of your system buses, causing other problems.
As mentioned, this is a terrible idea. But I believe everyone should be able to implement their own horrible plans, sooo...
for FILE in *; do cp "$FILE" <destination> & done
The asterisk can be replaced with a shell glob matching your files, or with $(cat <listfile>) if you've got them all listed in a text file. The ampersand kicks off each command in the background, so the loop continues immediately, spawning more copies. As mentioned, this will completely annihilate your I/O. So... I really wouldn't recommend doing it.
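If you go the list-file route, here is a slightly safer sketch that reads the list line by line (so file names with spaces survive word splitting) while still backgrounding each cp; <listfile> and <destination> are the same placeholders used above:

# read one file name per line from <listfile>, start each copy in the background
while IFS= read -r FILE; do
  cp "$FILE" <destination> &
done < <listfile>
wait    # block until every background cp has finished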
--Christopher Karel
The only answer that will not trash your machine's responsiveness isn't exactly a 'copy', but it is very fast. If you won't be editing the files in the new or old location, then a hard link is effectively like a copy, and (only if you're on the same filesystem) they are created very, very fast. Check out cp -l and see if it will work for you.
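A minimal sketch of the hard-link approach, assuming the source files sit in srcdir/ and destdir/ is on the same filesystem (both names are placeholders):

cp -l srcdir/* destdir/    # create hard links in destdir instead of copying data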
Here's a distributed/parallel and decentralized file copy tool that will chunk up each file and copy all of the chunks in parallel. It'll probably only help you if you have an SSD that supports multiple streams or some sort of setup with multiple disk heads.
https://github.com/hpc/dcp
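dcp is MPI-based, so it is normally launched through an MPI runner; a usage sketch with hypothetical paths, assuming dcp is built and mpirun is available (check the project's README for the exact options):

# four MPI ranks cooperating on the same copy, splitting files into chunks
mpirun -np 4 dcp /path/to/source /path/to/destination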
For the people who think that's not a great idea, I would say it depends. You can have a big RAID system or a parallel filesystem that will deliver far better performance than one cp process can handle. Then yes, you need to use a "parallel tool".
Let's take this example:
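The command itself was not preserved here; a sketch that would produce this kind of measurement, assuming a large source file named bigfile, is to count cp's write() syscalls over a 10-second window:

# hypothetical reconstruction: tally the write() calls made by cp for 10 s
strace -f -c -e trace=write timeout 10 cp bigfile /dev/null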
The syscall trace shows 166222 write calls, and each write made by cp in this case is 64 KiB, so over those 10 s my system delivers this bandwidth: 65536 * 166222 / 10 = 1089352499 B/s =~ 1.08 GB/s.
Now, let's launch this workload with 2 processes (I have 4 cores, but my desktop is used for other stuff, and here it's just an example):
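Again, the exact commands are missing; a sketch of the same measurement run twice concurrently, assuming two large source files bigfile1 and bigfile2:

# hypothetical reconstruction: two cp processes measured at the same time
strace -f -c -e trace=write timeout 10 cp bigfile1 /dev/null &
strace -f -c -e trace=write timeout 10 cp bigfile2 /dev/null &
wait    # each strace prints its own write() count after the 10 s window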
So we see that we are able to nearly double the performance by using 2 cores to run this.
So if we are in a context other than one hard drive copying to another, such as a RAID array (or multiple NVMe drives; not the most common case, I agree, but I work with this every day), running multiple cp commands in parallel definitely performs better.
You should try this:
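The original command is not preserved; one way to do it, sketched with xargs (the destination names passwd.1 through passwd.3 are hypothetical):

# copy /etc/passwd three times into $HOME, with up to 3 cp processes in parallel
seq 1 3 | xargs -P 3 -I{} cp /etc/passwd "$HOME"/passwd.{}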
This will copy the file passwd 3 times from the /etc/ directory to your $HOME.
Or, if the file is already in your home directory:
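Same idea, sketched under the assumption that the source file passwd already sits in $HOME:

# three parallel copies of a file that is already in $HOME
seq 1 3 | xargs -P 3 -I{} cp "$HOME"/passwd "$HOME"/passwd.{}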
This will copy the file passwd 3 times into your $HOME