I have an ext3 filesystem mounted with default options. On it I have some ~100 GB files.
Removing any of these files takes a long time (about 8 minutes) and causes a lot of I/O traffic, which increases the load on the server.
Is there any way to make the rm less disruptive?
Upgrade to ext4 or some other modern filesystem that uses extents. Since ext3 uses the indirect blocks scheme rather than extents, deleting large files inevitably entails lots of work.
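If you want to check whether a given file is extent-mapped, or to convert the filesystem in place, something along these lines should work (a sketch with a hypothetical /dev/sdb1 and mount point; note that files written before the conversion keep their indirect blocks, so existing 100 GB files will still delete slowly):

# extent-mapped files show the 'e' attribute; ext3-style indirect-block files do not
lsattr /path/to/huge.file

# in-place ext3 -> ext4 conversion (back up first, and unmount the filesystem)
umount /dev/sdb1
tune2fs -O extents,uninit_bg,dir_index /dev/sdb1
e2fsck -fD /dev/sdb1
mount -t ext4 /dev/sdb1 /mnt/point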
The most interesting answer was originally buried in a comment on the question. Here it is as a first class answer to make it more visible:
That link is an incredibly thorough analysis of the exploration for and discovery of a workable solution.
Note also:
The article says:
which is true, but user TafT says that if you want no disruption then -c3 'idle' would be a better choice than -c2 'best-effort'. He has used -c3 to build in the background and has found it to work well without causing the build to wait forever. If you really do have 100% I/O usage then -c3 will not let the delete ever complete, but he doesn't expect that is what you have based on the worked test.

You can give ionice a try. It won't make it faster but it might make it less disruptive.
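For instance (the path is hypothetical):

# best-effort class at the lowest priority within it
ionice -c2 -n7 rm /path/to/huge.file

# idle class: the delete only gets the disk when nothing else is using it
ionice -c3 rm /path/to/huge.file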
In terms of efficiency, using one rm per file is not optimal, as it requires a fork and exec for each rm.
Assuming you have a list.txt containing the files you want to remove, this would be more efficient, but it's still going to be slow:
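xargs -i rm {} < list.txt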
Another approach would be to:
nice -20 xargs -i rm {} < list.txt
(this will take less time but will affect your system greatly :)
or
I don't know how fast this would be but:
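mv <file-name> /dev/null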
or
Create a special mount point with a fast filesystem (using a loop device?), and use that to store and delete your huge files.
(Maybe move the files there before you delete them, maybe it's faster, or maybe just unmount it when you want the files gone.)
or
cat /dev/null > /file/to/be/deleted
(so it's zero-sized now) and if you want it to disappear just rm -rf <file> now

or even better
drop the cat and just do
# > /file/to/be/emptied
I had problems getting a directory to delete at a reasonable pace; it turned out the process was locking up the disk and creating a pileup of processes trying to access it. ionice didn't work: the delete just kept using 99% of the disk I/O and locked all the other processes out.
Here's the Python code that worked for me. It deletes 500 files at a time, then takes a 2-second break to let the other processes do their work, then continues. It works great.
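A minimal sketch of that approach, assuming a hypothetical target directory (remove files in batches of 500, then pause for 2 seconds):

import os
import time

TARGET_DIR = '/path/to/dir/with/many/files'  # hypothetical path

count = 0
for root, dirs, files in os.walk(TARGET_DIR):
    for name in files:
        os.remove(os.path.join(root, name))
        count += 1
        if count % 500 == 0:
            # give other processes a chance at the disk
            time.sleep(2)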
My two cents.
I've already had this issue: a sequential script that has to run fast removes a lot of files, so the rm calls drag the script's speed down to the I/O wait/exec time.
To make things quicker, I added another process (a bash script) launched from cron; like a garbage collector, it removes all files in a particular directory.
Then I updated the original script, replacing the rm with a mv to a "garbage folder" (renaming the file by appending a counter to its name to avoid collisions).
This works for me; the script runs at least 3 times faster. But it only works well if the garbage folder and the original file are on the same mount point (same device), so the mv does not turn into a copy. (A mv on the same device consumes far less I/O than an rm.)
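A minimal sketch of the pattern, with hypothetical /data/output and /data/garbage directories on the same filesystem:

# in the time-critical script: a cheap same-device rename instead of rm
i=0
for f in /data/output/*.done; do
    [ -e "$f" ] || continue   # skip the literal pattern when there are no matches
    mv "$f" "/data/garbage/$(basename "$f").$i"
    i=$((i + 1))
done

# garbage-collector script run from cron: do the slow deletes out of band
find /data/garbage -type f -delete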
Hope that helps.
Also note that the answer by Dennis Williamson, who suggests ionice as a workaround for the load, will work only if your block device uses the CFQ I/O scheduler.
/dev/null is a file, not a directory. You can't move a file to a file, or you risk overwriting it.
I don't think this is practical. It would use far more I/O than the OP would like.
You could try creating a loop file system to store your backups on.
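For example (a sketch with a hypothetical backing file and mount point; the same names are reused in the clearing step below):

# create a 100 GB backing file, put a filesystem on it, and loop-mount it
dd if=/dev/zero of=/backups.img bs=1M count=102400
mke2fs -F /backups.img
mkdir -p /mnt/backups
mount -o loop /backups.img /mnt/backups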
Then, when you want to clear out the backups:
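# unmount, recreate the filesystem on the backing file, and mount it again
umount /mnt/backups
mke2fs -F /backups.img
mount -o loop /backups.img /mnt/backups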
Presto! The entire virtual file system is cleared out in a matter of moments.
You can use multithreading with xargs, for example something like:
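find . -type f -print0 | xargs -0 -P 30 rm -f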
where 30 is the number of parallel rm processes you want xargs to run (-P). If you pass zero, xargs runs as many processes at a time as it can.