Background: physical server, about two years old, 7200-RPM SATA drives connected to a 3Ware RAID card, ext3 FS mounted noatime and data=ordered, not under crazy load, kernel 2.6.18-92.1.22.el5, uptime 545 days. Directory doesn't contain any subdirectories, just millions of small (~100 byte) files, with some larger (a few KB) ones.
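(Aside: those mount options are easy to double-check against the live mount table rather than fstab; on a box like this it's just something along the lines of the following, where sda3 is the device from the df output further down.)

# shows the active mount options (noatime etc.) straight from the kernel's view of the mount
grep sda3 /proc/mounts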
We have a server that has gone a bit cuckoo over the course of the last few months, but we only noticed it the other day when it started being unable to write to a directory due to it containing too many files. Specifically, it started throwing this error in /var/log/messages:
ext3_dx_add_entry: Directory index full!
The disk in question has plenty of inodes remaining:
Filesystem            Inodes   IUsed    IFree IUse% Mounted on
/dev/sda3           60719104 3465660 57253444    6% /
So I'm guessing that means we hit the limit of how many entries can be in the directory file itself. No idea how many files that would be, but as you can see it can't be more than three million or so. Not that that's good, mind you! But that's part one of my question: exactly what is that upper limit? Is it tunable? Before I get yelled at: yes, I want to tune it down; this enormous directory caused all sorts of issues.
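For what it's worth, the one sanity check I know of on the filesystem side is whether the htree index (dir_index) feature is even enabled, plus how large the directory file itself has grown. Something like the following should show that (/dev/sda3 comes from the df output above; the directory path is obviously a placeholder):

# feature flags; look for dir_index in the list
tune2fs -l /dev/sda3 | grep -i 'filesystem features'
# the on-disk size of the directory file is a rough proxy for how big its index has grown
ls -ld /path/to/the/huge/directory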
Anyway, we tracked down the issue in the code that was generating all of those files, and we've corrected it. Now I'm stuck with deleting the directory.
A few options here:
rm -rf (dir)
I tried this first. I gave up and killed it after it had run for a day and a half without any discernible impact.
while [ true ]; do ls -Uf | head -n 10000 | xargs rm -f 2>/dev/null; done
This is actually the shortened version; the real one I'm running, which just adds some progress-reporting and a clean stop when we run out of files to delete, is:
export i=0; time ( while [ true ]; do ls -Uf | head -n 3 | grep -qF '.png' || break; ls -Uf | head -n 10000 | xargs rm -f 2>/dev/null; export i=$(($i+10000)); echo "$i..."; done )
This seems to be working rather well. As I write this, it has deleted 260,000 files in the past thirty minutes or so.
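For anyone squinting at that one-liner, here is the same loop spelled out with comments; it should be functionally identical to what I'm actually running, minus the time wrapper and the exports:

i=0
while true; do
    # peek at the first few entries (unsorted; with -f that includes . and ..);
    # if no .png shows up among them, assume we're done and stop cleanly
    ls -Uf | head -n 3 | grep -qF '.png' || break
    # -U/-f disable sorting, so each pass skips sorting ~3 million names;
    # grab the next 10,000 entries and unlink them
    # (2>/dev/null hides rm's complaints about . and ..)
    ls -Uf | head -n 10000 | xargs rm -f 2>/dev/null
    # progress counter; assumes each batch really was 10,000 files
    i=$((i + 10000))
    echo "$i..."
done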
A related question: why did it take perhaps ten minutes to delete the first 10,000 entries with the ls -U loop above, yet now it's hauling along quite happily? For that matter, it deleted 260,000 in about thirty minutes, but it has since taken another fifteen minutes to delete 60,000 more. Why the huge swings in speed?

I know there are plenty of "do it this way" suggestions involving find (I've sketched a couple at the very bottom of this post), but they are not going to be significantly faster than my approach, for several self-evident reasons. Does the delete-via-fsck idea (deleting the directory out from under the filesystem and letting fsck clean up the orphans afterwards) have any legs? Or something else entirely? I'm eager to hear out-of-the-box (or inside-the-not-well-known-box) thinking.

Final script output:
2970000...
2980000...
2990000...
3000000...
3010000...
real 253m59.331s
user 0m6.061s
sys 5m4.019s
So, three million files deleted in a bit over four hours.
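And for reference, the find-style suggestions I alluded to above are the usual variations on the theme below. They still have to walk the same three-million-entry directory, which is why I didn't expect them to be dramatically faster (the -name pattern and -maxdepth are just illustrative):

# let find unlink matches itself, no rm processes at all
find . -maxdepth 1 -type f -name '*.png' -delete
# or push the names through xargs instead
find . -maxdepth 1 -type f -name '*.png' -print0 | xargs -0 rm -f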