Ping a Specific Port

Question

user13185

Asked: 2010-04-01 01:24:19 +0800 CST2010-04-01 01:24:19 +0800 CST 2010-04-01 01:24:19 +0800 CST

How to make `rm` faster on ext3/linux?

772

I have ext3 filesystem mounted with default options. On it I have some ~ 100GB files.

Removal of any of such files takes long time (8 minutes) and causes a lot of io traffic, which increases load on server.

Is there any way to make the rm not as disruptive?

11 Answers

Voted

janneb · Answer 1 · 2010-04-01T02:57:46+08:00

janneb

2010-04-01T02:57:46+08:002010-04-01T02:57:46+08:00

Upgrade to ext4 or some other modern filesystem that uses extents. Since ext3 uses the indirect blocks scheme rather than extents, deleting large files inevitably entails lots of work.

18

Matt McClure · Answer 2 · 2012-01-04T19:33:41+08:00

Best Answer

Matt McClure

2012-01-04T19:33:41+08:002012-01-04T19:33:41+08:00

The most interesting answer was originally buried in a comment on the question. Here it is as a first class answer to make it more visible:

Basically no method from here worked, so we developed our own. Described it in here: http://www.depesz.com/index.php/2010/04/04/how-to-remove-backups/ – depesz Apr 6 '10 at 15:15

That link is an incredibly thorough analysis of the exploration for and discovery of a workable solution.

Note also:

The article says:

As you can see, I used -c2 -n7 options to ionice, which seem sane.

which is true, but user TafT says if you want no disruption then -c3 'idle' would be a better choice than -c2 'best-effort'. He has used -c3 to build in the background and has found it to work well without causing the build to wait for ever. If you really do have 100% io usage then -c3 will not let the delete ever complete but he doesn't expect that is what you have based on the worked test.

14

Dennis Williamson · Answer 3 · 2010-04-01T01:46:07+08:00

Dennis Williamson

2010-04-01T01:46:07+08:002010-04-01T01:46:07+08:00

You can give ionice a try. It won't make it faster but it might make it less disruptive.

6

user126330 · Answer 4 · 2010-04-01T01:36:25+08:00

user126330

2010-04-01T01:36:25+08:002010-04-01T01:36:25+08:00

In terms of efficiency, using one rm per file is not optimal, as it requires a fork and exec for each rm.

Assuming you have a list.txt containing the files you want to remove this would be more efficient but it's still gonna be slow:

xargs -i rm {} < list.txt

Another approach would be to : nice -20 xargs -i rm {} < list.txt
(this will take less time but will affect your system greatly :)

or

I don't know how fast this would be but:

mv <file-name> /dev/null

or

Create a special mount point with a fast filesystem (using a loop device ?) , use that to store and delete your Huge files.
(maybe move the files there before you delete them, maybe it's faster or maybe just unmount it when you want files gone)

or

cat /dev/null > /file/to/be/deleted (so it's zero-sized now) and if you want it to disappear just rm -rf <file> now

or even better

drop the cat and just do # > /file/to/be/emptied

4

Nick Woodhams · Answer 5 · 2012-12-24T07:50:21+08:00

Nick Woodhams

2012-12-24T07:50:21+08:002012-12-24T07:50:21+08:00

I had problems getting the directory to delete at a reasonable pace, turns out the process was locking the disk and creating a pileup of processes trying to access the disk. ionice didn't work, it just continued to use 99% of the disk IO and locked all the other processes out.

Here's the Python code that worked for me. It deletes 500 files at a time, then takes a 2 second break to let the other processes do their work, then continues. Works great.

import os, os.path
import time

for root, dirs, files in os.walk('/dir/to/delete/files'):
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        os.remove(fullpath)
        if file_num%500 == 1:
            time.sleep(2)
            print "Deleted %i files" % file_num
        file_num = file_num + 1

1

Emmanuel Devaux · Answer 6 · 2015-01-31T01:51:53+08:00

Emmanuel Devaux

2015-01-31T01:51:53+08:002015-01-31T01:51:53+08:00

My two cents.

I ve already got this issue. "In sequential script that have to run fast, the process do remove a lot of file" .. So the "rm" will make that script speed close to the IO wait/exec time.

So to make thing quicker , I ve added another process (bash script) launched per cron.. like a garbage collector it remove all files in a particular directory.

Then I've updated the original script by replacing the "rm" by a mv to a "garbage folder" (rename the file by adding a counter at the end of its name to avoid collision).

This works for me, the script run a least 3 time faster. but it works well only if garbage folder and original file are under the same mount point (same device) to avoid file copy. (mv on same device consume less IO than rm)

Hope that help..

1

famzah · Answer 7 · 2010-04-01T07:10:57+08:00

famzah

2010-04-01T07:10:57+08:002010-04-01T07:10:57+08:00

Also note that the answer by Dennis Williamson, who suggests ionice as a workaround for the load, will work only if your block device uses the CFQ io scheduler.

0

Felipe Alvarez · Answer 8 · 2010-05-22T02:22:39+08:00

Felipe Alvarez

2010-05-22T02:22:39+08:002010-05-22T02:22:39+08:00

mv <file-name> /dev/null

/dev/null is a file not a directory. Can't move a file, to a file, or you risk overwriting it.

Create a special mount point with a fast filesystem (using a loop device ?) , use that to store and delete your Huge files. (maybe move the files there before you delete them, maybe it's faster or maybe just unmount it when you want files gone)

I don't think this is practical. It would use unnecessarily more I/O than the OP would like.

0

amphetamachine · Answer 9 · 2011-02-22T17:40:11+08:00

amphetamachine

2011-02-22T17:40:11+08:002011-02-22T17:40:11+08:00

You could try creating a loop file system to store your backups on.

# dd if=/dev/zero of=/path/to/virtualfs bs=100M count=1024 # 100 MB * 1024 = 100 GB
# mke2fs /path/to/virtualfs
# mount -t ext2 /path/to/virtualfs /mnt/backups -o loop

Then, when you want to clear out the backups:

# umount /mnt/backups
# mke2fs /path/to/virtualfs
# mount -t ext2 /path/to/virtualfs /mnt/backups -o loop

Presto! The entire virtual file system is cleared out in a matter of moments.

0

Juan Carlos · Answer 10 · 2012-03-13T01:57:24+08:00

Juan Carlos

2012-03-13T01:57:24+08:002012-03-13T01:57:24+08:00

You can use multitheading whith xargs

find . -type f | xargs -P 30 rm -rf

where 30 is the number of threads that you want to create. If you are using zero, the system creates maximum threads available to the user executing the task.

0

How to make `rm` faster on ext3/linux?

Ping a Specific Port

How do I tell Git for Windows where to find my private RSA key?

How do you restart php-fpm?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

How can I sort du -h output by size

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?