I have millions of images on my Ubuntu cloud server. When I move a complete folder containing 12 million images using the mv command, it happens almost instantaneously. However, when I mv only the images (not the folder), it takes some time. Is there a way to move all the images as quickly as the folder?
This is what is happening:
src folder has 12 million images and I move this to dst folder using
$ mv src ../dst
Happens immediately
Inside src folder I do this to move:
find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +
This takes some time.
Is there a way to speed up the second process?
TL;DR: No
For a smaller number of files you would not need find, but even in this simplified, smaller case, if you just mv the files themselves it will take more time than moving the whole directory at once.
Why? The point is to understand what mv does. Briefly speaking, mv moves a number (which identifies a directory or a file) from one inode (the directory containing it) to another, and these indices are updated in the journal of the file system or in the FAT (if the file system is implemented that way). If source and destination are on the same file system, there is no actual movement of data; mv just changes the point where the data is attached.
So, when you mv one directory, you are doing this operation one time. But when you move 1 million files, you are doing this operation 1 million times.
To give you a practical example, picture a tree with many branches. In particular, there is one node to which 1 million branches are attached.
To cut down these branches and move them somewhere else, you can either cut each one of them, so you make 1 million cuts, or you cut just before the node, thus making just one cut (this is the difference between moving the files and the directory).
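The same contrast can be tried on a small scale. This is only a sketch: the scratch directory from mktemp, the names src, dst and moved, and the count of 1000 files are all made up for illustration.

```shell
#!/bin/sh
# Scratch area; every name below is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"

# Create 1000 empty stand-in "images".
i=1
while [ "$i" -le 1000 ]; do
    : > "$demo/src/img$i.jpg"
    i=$((i + 1))
done

# Many cuts: a single mv process, but still one rename per file (1000 renames).
mv "$demo"/src/*.jpg "$demo/dst/"

# One cut: moving the directory that holds the files is a single rename.
mv "$demo/dst" "$demo/moved"

moved_count=$(ls "$demo/moved" | wc -l)
echo "files now under moved/: $moved_count"
```

Prefixing each mv with time (where your shell provides it) and raising the file count should make the gap visible; with only 1000 files both finish almost at once.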
It will still be slow because, as noted, the file system has to relink each file name to its new location.
However, you can speed it up from what you have now.
Your find command runs the exec once for each file. So it launches the mv command 12 million times for 12 million files. This can be improved in two ways.

Add a plus to the end:
find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +
Check the man page to make sure it's supported in your version of find. The effect should be to run a series of mv commands with as many filenames as will fit on each command line.

Use find and xargs together:
find -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t ../../dst/
The -print0 will use NUL, aka zero bytes, to separate the file names. This plus xargs -0 fixes any problems xargs would otherwise have with spaces in file names. The xargs command will read the list of file names from the find command and run the mv command on as many file names as will fit.

Your confusion comes from the file system abstraction, which makes you believe that a folder contains files and other folders in a tree-like fashion. This is not actually true: all files and directories within a file system are located on the same level and identified with numbers of some sort, dependent on implementation. Directories are just special files which contain lists of other files.
When you "move" files inside a file system, actual files don't go anywhere. Rather, lists inside directories are updated to reflect the change.
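One way to see this for yourself is to compare a file's inode number before and after a move. The sketch below assumes both directories sit on the same file system (they share one mktemp scratch directory here); the names a, b and photo.jpg are made up.

```shell
#!/bin/sh
# Scratch area; every name below is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
echo "pixels" > "$demo/a/photo.jpg"

# ls -i prints the inode number in the first column.
before=$(ls -i "$demo/a/photo.jpg" | awk '{print $1}')

mv "$demo/a/photo.jpg" "$demo/b/"

after=$(ls -i "$demo/b/photo.jpg" | awk '{print $1}')

# On a single file system the two numbers should match:
# only the directory lists changed, the file itself stayed put.
echo "inode before: $before, after: $after"
```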
mv src ../dst moves a single list entry from directory . to directory ../dst, so it's fast. find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ has to move millions of entries, so it's slower. It may potentially be sped up if you call mv only once and not once per file, and the mv command itself may be optimized to move several directory entries in one step, but there is no way to make it as fast as when you move a single directory.

A Simplified answer
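Moving a file is done in 3 steps: add() the entry to the destination directory, check that it worked, and remove() the entry from the source directory.
This process is the same for a file or a folder, and obviously doing this for 1 file is 100 times faster than doing it for 100 files.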
man link is the add().
man unlink is the remove().
mv in effect uses those two operations above and adds a check in between to prevent data loss.
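The simplified model can be walked through by hand with the link and unlink commands. This is an illustration of the model only, not a claim about how mv is implemented internally. It assumes the coreutils link and unlink commands are installed (they are on a stock Ubuntu) and that both directories are on the same file system; all file and directory names are made up.

```shell
#!/bin/sh
# Scratch area; every name below is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"
echo "pixels" > "$demo/src/photo.jpg"

# Step 1, add(): give the same file a second name inside dst.
link "$demo/src/photo.jpg" "$demo/dst/photo.jpg"

# Step 2, check: only go on if the new name really exists.
if [ -e "$demo/dst/photo.jpg" ]; then
    # Step 3, remove(): drop the old name; the data is untouched.
    unlink "$demo/src/photo.jpg"
fi

ls "$demo/src" "$demo/dst"
```

Repeating these three steps once per file is the work that a bulk move of 12 million files has to do, while a directory move does them once.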