
Question

sankit

Asked: 2016-06-01 22:40:38 +0800 CST

Why does moving some files in a folder take longer than moving the whole folder?

772

I have millions of images on my Ubuntu cloud server. When I move a complete folder containing 12 million images using the mv command, it happens almost instantaneously. However, when I mv only the images (not the folder), it takes some time. Is there a way to move all the images as quickly as the folder?

This is what is happening:

src folder has 12 million images and I move this to dst folder using
```
$ mv src ../dst
```
Happens immediately

Inside src folder I do this to move:

find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +

This takes some time.

Is there a way to speed up the second process?

4 Answers


dadexix86 · Answer 1 · 2016-06-01T23:15:46+08:00

TL;DR: No

For a smaller number of files you would not need find but, even in this simplified and smaller case, if you just run

mv *.jpg ../../dst/

it will take more time than moving the whole directory at once.

Why? The point is to understand what mv does.

Briefly speaking, mv moves a directory entry (a name paired with the inode number that identifies a file or a directory) from one directory to another, and the change is recorded in the journal of the file system or in the FAT (if the file system is implemented in such a way).

If source and destination are on the same file system, no data is actually moved; only the point where the entry is attached changes.

So, when you mv one directory, you are doing this operation one time.

But when you move 1 million files, you are doing this operation 1 million times.

To give you a practical example, picture a tree with many branches. In particular, there is one node to which 1 million branches are attached.
To move these branches somewhere else, you can either cut each one of them, making 1 million cuts, or cut just before the node, making a single cut. This is the difference between moving the files and moving the directory.
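
The claim that nothing is copied when source and destination share a file system can be checked by comparing inode numbers before and after a move. The following is a minimal sketch, assuming a POSIX shell with mktemp, ls and awk available; the directory and file names are throwaway placeholders, not the asker's real paths.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
echo "pretend image data" > "$work/src/a.jpg"

# Record the inode numbers of the directory and of the file inside it.
dir_before=$(ls -di "$work/src" | awk '{print $1}')
file_before=$(ls -i "$work/src/a.jpg" | awk '{print $1}')

# Move the whole directory within the same file system.
mv "$work/src" "$work/dst/src"

# Look the same objects up again under their new path.
dir_after=$(ls -di "$work/dst/src" | awk '{print $1}')
file_after=$(ls -i "$work/dst/src/a.jpg" | awk '{print $1}')

echo "directory inode: $dir_before -> $dir_after"
echo "file inode:      $file_before -> $file_after"
```

If both pairs of numbers match, the directory and its contents are the same objects on disk as before; only the name under which the directory is reachable has changed.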

Zan Lynx · Answer 2 · 2016-06-02T00:28:18+08:00


It will still be slow because, as noted, the file system has to relink each file name to its new location.

However, you can speed it up from what you have now.

If your find command terminates -exec with \; it runs the exec once for each file, so it launches the mv command 12 million times for 12 million files. This can be improved in two ways.

1. Terminate -exec with a plus instead:

   find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +

   Check the man page to make sure it's supported in your version of find. The effect should be to run a series of mv commands with as many file names as will fit on each command line.

2. Use find and xargs together:

   find -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t ../../dst/

   The -print0 will use NUL (zero) bytes to separate the file names. This plus xargs -0 fixes any problems xargs would otherwise have with spaces in file names. The xargs command will read the list of file names from the find command and run the mv command on as many file names as will fit.
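
To see how much batching reduces the number of commands launched, the two -exec terminators can be compared on a small sample. This is only a sketch under stated assumptions: echo stands in for mv so that nothing is moved, each output line corresponds to one launched command, and the 50 sample files live in a throwaway directory.

```shell
# Create 50 empty sample files in a throwaway directory.
work=$(mktemp -d)
for i in $(seq 1 50); do
    : > "$work/img$i.jpg"
done

# echo stands in for mv: every line of output is one command launched by find.
per_file=$(find "$work" -maxdepth 1 -name '*.jpg' -exec echo {} \; | wc -l | tr -d ' ')
batched=$(find "$work" -maxdepth 1 -name '*.jpg' -exec echo {} + | wc -l | tr -d ' ')

echo "one command per file: $per_file invocations"
echo "batched with +:       $batched invocation(s)"
```

With so few short names everything should fit on one command line, so the batched count is expected to be far smaller than the per-file count. With millions of files find would still need several batches, but far fewer than one per file.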

13

Dmitry Grigoryev · Answer 3 · 2016-06-02T01:54:09+08:00


Your confusion comes from the file system abstraction which makes you believe that a folder contains files and other folders in a tree-like fashion. This is not actually true: all files and directories within a file system are located on the same level and identified with numbers of some sort, dependent on implementation. Directories are just special files which contain lists of other files.

When you "move" files inside a file system, actual files don't go anywhere. Rather, lists inside directories are updated to reflect the change.

mv src ../dst moves a single list entry from directory . to directory ../dst, so it's fast.

find -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ has to move millions of entries, so it's slower. It may be sped up if you call mv once for many files rather than once per file, and the mv command itself may be optimized to move several directory entries in one step, but there is no way to make it as fast as moving a single directory.

7

user257256 · Answer 4 · 2016-06-02T09:03:31+08:00


A simplified answer

Moving a file is done in 3 steps:

1. add() a link to the file to the inode list of the destination folder
2. check that the link was successfully added
3. remove() the link from the inode list of the source folder if the check above succeeded

This process is the same for a file or a folder, and obviously doing it for 1 file is 100 times faster than doing it for 100 files.

man link documents the add()
man unlink documents the remove()
Conceptually, mv amounts to those two calls with a check in between to prevent data loss. In practice, on Linux, mv within one file system uses the rename() system call, which performs both steps as a single atomic operation.
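
The three steps can be reproduced by hand with the coreutils link and unlink commands. This is an illustration of the idea rather than a replacement for mv, and it assumes source and destination are on the same file system (hard links cannot cross file systems); the paths are throwaway placeholders.

```shell
# Set up a throwaway source and destination.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
echo "pretend image data" > "$work/src/a.jpg"
inode_before=$(ls -i "$work/src/a.jpg" | awk '{print $1}')

# Step 1: add a second name for the same inode in the destination directory.
link "$work/src/a.jpg" "$work/dst/a.jpg"

# Step 2: check that the new name exists.
# Step 3: only then remove the old name from the source directory.
if [ -e "$work/dst/a.jpg" ]; then
    unlink "$work/src/a.jpg"
fi

inode_after=$(ls -i "$work/dst/a.jpg" | awk '{print $1}')
echo "inode: $inode_before -> $inode_after"
```

The file ends up reachable only under dst, with the same inode number it had under src, which is the same end state a same-file-system mv produces.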

4
