I have written a buggy program that has accidentally created about 30M files under /tmp. (The bug was introduced some weeks ago, and it was creating a couple of subdirectories per second.) I could rename /tmp to /tmp2, and now I need to delete the files. The system is FreeBSD 10, the root filesystem is zfs.
Meanwhile, one of the drives in the mirror failed, and I have replaced it. The mirror consists of two 120 GB SSDs.
Here is the question: replacing the hard drive and resilvering the whole array took less than an hour. Deleting the files in /tmp2 is another story. I have written another program to remove the files, and it can only delete 30-70 subdirectories per second. It will take 2-4 days to delete all of the files.
How is it possible that resilvering the whole array takes an hour, but deleting from the disk takes 4 days? Why is the performance so bad? 70 deletions per second seems like very, very bad performance.
I could delete the inode for /tmp2 manually, but that will not free up the space, right?
Could this be a problem with ZFS, with the hard drives, or something else?
Deletes in ZFS are expensive. Even more so if you have deduplication enabled on the filesystem (since dereferencing deduped files is expensive). Snapshots could complicate matters too.
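If you want to rule those out, something along these lines should show whether they apply (a sketch; the dataset name zroot/tmp is an assumption, so substitute whatever zfs list reports for your /tmp or /tmp2):

    zfs get dedup zroot/tmp             # is deduplication enabled on this dataset? (dataset name is a guess)
    zfs list -t snapshot -r zroot/tmp   # are snapshots still pinning the data you delete?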
You may be better off deleting the /tmp directory instead of the data contained within. If /tmp is a ZFS filesystem, delete it and create it again.
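A minimal sketch of that, assuming /tmp is its own dataset and that it is named zroot/tmp (check the real name with zfs list first):

    zfs destroy -r zroot/tmp                  # drops the dataset and all 30M files in one operation (name is an assumption)
    zfs create -o mountpoint=/tmp zroot/tmp   # recreate an empty dataset mounted at /tmp
    chmod 1777 /tmp                           # restore the usual sticky, world-writable /tmp permissions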
Consider an office building.
Removing all of the computers and furniture and fixings from all the offices on all the floors takes a long time, but leaves the offices immediately usable by another client.
Demolishing the whole building with RDX is a whole lot quicker, but the next client is quite likely to complain about how drafty the place is.
There are a number of things going on here.
First, all modern disk technologies are optimised for bulk transfers. If you need to move 100MB of data, the disks will do it much faster if it is in one contiguous block instead of scattered all over the place. SSDs help a lot here, but even they prefer data in contiguous blocks.
Second, resilvering is pretty optimal as far as disk operations go. You read a massive contiguous chunk of data from one disk, do some fast CPU ops on it, then rewrite it in another big contiguous chunk to another disk. If power fails partway through, no big deal - you'll just ignore any data with bad checksums and carry on as per normal.
Third, deleting a file is really slow. ZFS is particularly bad, but practically all filesystems are slow to delete. They must modify a large number of different chunks of data on the disk and time it correctly (i.e. wait) so the filesystem is not damaged if power fails.
Resilvering is something that disks are really fast at, and deletion is something that disks are slow at. Per megabyte of disk, you only have to do a little bit of resilvering. You might have a thousand files in that space which need to be deleted.
Is 70 deletions per second unexpectedly bad? It depends; I would not be surprised by it. You haven't mentioned what type of SSD you're using. Modern Intel and Samsung SSDs are pretty good at this sort of operation (read-modify-write) and will perform better. Cheaper/older SSDs (e.g. Corsair) will be slow. The number of I/O operations per second (IOPS) is the determining factor here.
ZFS is particularly slow to delete things. Normally, it will perform deletions in the background so you don't see the delay. If you're doing a huge number of them it can't hide it and must delay you.
Appendix: why are deletions slow?
Ian Howson gives a good answer on why it is slow.
If you delete files in parallel you may see an increase in speed, because separate deletions may touch the same metadata blocks, which saves rewriting the same block many times.
So try:
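For example, something along these lines (a sketch only: the worker count of 8 and the assumption that the junk sits in top-level subdirectories of /tmp2 are guesses to adjust):

    # feed batches of top-level entries to several rm processes running in parallel
    find /tmp2 -mindepth 1 -maxdepth 1 -print0 | xargs -0 -n 100 -P 8 rm -rf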
and see if that performs better than your 70 deletes per second.
It is possible because the two operations work on different layers of the file system stack. Resilvering can run low-level and does not actually need to look at individual files, copying large chunks of data at a time.
It does have to do a lot of bookkeeping...
I don't know for ZFS, but if it could automatically recover from that, it would likely, in the end, do the same operations you are already doing, in the background.
Does zfs scrub say anything?
Deleting lots of files is never really a fast operation.
In order to delete a file on any filesystem, you need to read the file index, remove (or mark as deleted) the file entry in the index, remove any other metadata associated with the file, and mark the space allocated for the file as unused. This has to be done individually for each file to be deleted, which means deleting lots of files requires lots of small I/Os. To do this in a manner which ensures data integrity in the event of power failure adds even more overhead.
Even without the peculiarities ZFS introduces, deleting 30 million files typically means over a hundred million separate I/O operations. This will take a long time even with a fast SSD. As others have mentioned, the design of ZFS further compounds this issue.
Very simple if you invert your thinking.
Get a second drive (you seem to have this already)
Copy everything from drive A to drive B with rsync, excluding the /tmp directory (a sketch of this step follows below). Rsync will be slower than a block copy.
Reboot, using drive B as the new boot volume
Reformat drive A.
This will also defragment your drive and give you a fresh directory (fine, defrag is not so important with an SSD but linearizing your files never hurt anything)
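A rough sketch of the copy step, assuming drive B's new root filesystem is already created and mounted at /mnt (the mount point and flags are assumptions, rsync comes from ports/packages on FreeBSD, and installing a boot loader on drive B is a separate step):

    # archive copy of the root filesystem only, skipping the runaway tmp directories
    rsync -avxHS --exclude=/tmp/ --exclude=/tmp2/ / /mnt/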
You have 30 million entries in an unsorted list. You scan the list for the entry you want to remove, and you remove it. Now you have only 29,999,999 entries in your unsorted list. If they are all in /tmp, why not just reboot?
Edited to reflect the information in the comments: Statement of problem: Removing most, but not all, of the 30M+ incorrectly created files in /tmp is taking a long time.
Problem 1) Best way to remove large numbers of unwanted files from /tmp.
Problem 2) Understanding why it is so slow to delete files.
Solution 1) - /tmp is reset to empty at boot by most *nix distributions. FreeBSD, however, does not do this by default.
Step 1 - copy interesting files somewhere else.
Step 2 - As root, set clear_tmp_enable="YES" in /etc/rc.conf (a sketch follows after these steps).
Step 3 - reboot.
Step 4 - change clear_tmp_enable back to "No".
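A minimal sketch of steps 2-4, using sysrc to edit /etc/rc.conf (sysrc ships with FreeBSD 10; editing the file by hand works just as well):

    sysrc clear_tmp_enable="YES"   # step 2: have the rc scripts empty /tmp on the next boot
    shutdown -r now                # step 3: reboot
    sysrc clear_tmp_enable="NO"    # step 4: afterwards, turn the option back off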
Unwanted files are now gone, because ZFS on FreeBSD has the feature that "Destroying a dataset is much quicker than deleting all of the files that reside on the dataset, as it does not involve scanning all of the files and updating all of the corresponding metadata," so all it has to do at boot time is reset the metadata for the /tmp dataset. This is very quick.
Solution 2) Why is it so slow? ZFS is a wonderful file system which includes such features as constant-time directory access. This works well if you know what you are doing, but the evidence suggests that the OP is not a ZFS expert. The OP has not indicated how they were attempting to remove the files, but at a guess, I would say they used a variation on "find regex -exec rm {} \;". This works well with small numbers of files, but it does not scale, because three serial operations are going on: 1) get the list of available files (which returns 30 million files in hash order), 2) use the regex to pick the next file to be deleted, and 3) tell the OS to find and remove that file from a list of 30 million. Even if ZFS returns the list from memory and even if 'find' caches it, the regex still has to identify the next file to be processed from the list, then tell the OS to update its metadata to reflect that change, and then update the list so it isn't processed again.
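For comparison, two variants that at least avoid starting one rm process per file (sketches only; substitute /tmp2 or wherever the files actually are, and whether they help much depends on the ZFS metadata work more than on process overhead):

    find /tmp2 -mindepth 1 -delete                        # let find unlink entries itself, depth-first
    find /tmp2 -mindepth 1 -maxdepth 1 -exec rm -rf {} +  # or batch many paths into each rm invocation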