I have an nginx cache directory, which I quickly purged by:
mv cache cache.bak
mkdir cache
service nginx restart
Now I have a cache.bak folder containing 2 million files. I'd like to delete it without disturbing the server.
A simple rm -rf cache.bak
trashes the server: even the simplest HTTP response takes 16 seconds while rm is running, so I can't do that.
I tried ionice -c3 rm -rf cache.bak
, but it didn't help. The server has an HDD, not an SSD; on an SSD this probably wouldn't be a problem.
I believe the best solution would be some kind of throttling, like what nginx's built-in cache manager does.
How would you solve this? Is there any tool which can do exactly this?
ext4 on Ubuntu 16.04
Make a bash script like this:
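The script body itself wasn't preserved in this post; a minimal sketch matching the description (delete the files passed as arguments, then sleep) would be:

```shell
#!/bin/bash
# Delete every file passed as an argument, then pause briefly
# so the disk has time to service other requests between batches.
rm -f -- "$@"
sleep 0.5
```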
Save it as deleter.sh, for example. Run chmod u+x deleter.sh to make it executable. This script deletes all files passed to it as arguments and then sleeps for 0.5 seconds.
Then, you can run
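The exact command wasn't preserved here; one invocation matching the description below, assuming deleter.sh is in the current directory:

```shell
# Batch filenames five at a time into deleter.sh; -print0/-0 keep
# names containing whitespace safe.
find cache.bak -type f -print0 | xargs -0 -n 5 ./deleter.sh
```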
This command retrieves a list of all files in cache.bak and passes five filenames at a time to the delete script.
So you can adjust how many files are deleted at a time, and how long the delay between delete operations is.
You should consider keeping your cache on a separate filesystem that you can mount/unmount, as someone suggested in the comments. Until you do, you can use this one-liner
/usr/bin/find /path/to/files/ -type f -print0 -exec sleep 0.2 \; -exec echo \; -delete
assuming your find binary is located under /usr/bin and you want to see progress on screen. Adjust the sleep accordingly, so you don't over-stress your HDD. You may also want to try ionice on a command consuming the output of find. Something like the following:
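One way to combine the two (the exact pipeline is my sketch, not the answer's):

```shell
# Run the actual deletions in the idle I/O scheduling class (-c3)
# so that nginx's disk requests take priority over the cleanup.
find cache.bak -type f -print0 | ionice -c3 xargs -0 rm -f
```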
Depending on the filesystem, each file deletion may result in rewriting that entire directory. For large directories that can be quite a hit. There are additional updates required to the inode table, and possibly a free-space list.
If the filesystem has a journal, changes are written to the journal, applied, and then removed from the journal. This increases the I/O required for write-intensive activity.
You may want to use a filesystem without a journal for the cache.
Instead of ionice, you can use a sleep command to rate-limit the operations. This works even where ionice does not, but it will take a long time to delete all your files.
I got many useful answers and comments here, which I'd like to summarize along with my own solution.
Yes, the best way to prevent this from happening is to keep the cache dir on a separate filesystem. Nuking / quick-formatting a filesystem always takes a few seconds (maybe minutes) at most, regardless of how many files/dirs it contained.
The ionice/nice solutions didn't do anything, because the deleting process itself caused almost no I/O. What caused the I/O, I believe, was kernel/filesystem-level queues/buffers filling up when files were deleted too quickly by the deleting process.

The way I solved it is similar to Tero Kilkanen's solution, but it didn't require calling a shell script. I used rsync's built-in --bwlimit switch to limit the speed of deleting. The full command was:
Now, bwlimit specifies bandwidth in kilobytes, which in this case applied to the filenames or paths of the files. By setting it to 1 KB/s, it was deleting around 100,000 files per hour, or about 27 files per second. Files had relative paths like
cache.bak/e/c1/db98339573acc5c76bdac4a601f9ec1e
which are 47 characters long, so that gives 1000/47 ~= 21 files per second, kind of similar to my measured 100,000 files per hour.

Now why --bwlimit=1? I tried various values before settling on 1.

I like the simplicity of rsync's built-in method, but this solution depends on the length of the relative paths. Not a big problem, as most people would find the right value via trial and error.
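The back-of-the-envelope rate above can be reproduced in a few lines (the path is taken from the post; treating 1 KB as 1000 bytes is my assumption, matching the 1000/47 arithmetic):

```python
# Estimate the deletion rate when rsync's --bwlimit budget is spent on
# the relative path of each deleted file, as observed in the post.
path = "cache.bak/e/c1/db98339573acc5c76bdac4a601f9ec1e"
budget_bytes_per_sec = 1000      # --bwlimit=1, counting 1 KB as 1000 B

files_per_sec = budget_bytes_per_sec / len(path)
print(len(path))                 # 47 characters per path
print(round(files_per_sec))      # ~21 files per second
```

A longer or shorter relative path shifts this rate proportionally, which is why the right --bwlimit value depends on your directory layout.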