I have a 3-level directory structure, where each level is named by 2 hex digits, like so:
0A/FF/2B/someimagefile.gif
I have 300M small files in 1.5TB of compressed archives that will populate these directories (more files will come in the future, so I chose this directory structure to keep the mass of files from crashing a typical extX filesystem).
Unpacking these files moves at 1MB per second (or ~18 days to unpack). Ouchie!
I guess it was slow because I was creating the directory structure and then the files (all done through Java APIs). So I set out to create just the directory structure on its own, in a bash loop.
Creating the directories alone is about a 5-day task at the current rate.
Any ideas on improving the speed that this moves?
UPDATE
One part of the puzzle is solved: using perl rather than bash creates the directories over 200 times faster. Now it's an operation that gives you a coffee break, not an extended weekend off.
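For reference, something along these lines (a sketch, not the exact script; /data/images is just a placeholder base path). The full tree is 256^3, about 16.7 million, directories, and the win over bash is presumably that perl issues the mkdir calls in-process instead of forking an external mkdir for every single directory:

    #!/usr/bin/perl
    # Sketch: build the full 00/00/00 .. FF/FF/FF tree with in-process mkdir calls.
    use strict;
    use warnings;

    my $base = '/data/images';                       # placeholder base path
    my @hex  = map { sprintf '%02X', $_ } 0 .. 255;  # '00' .. 'FF'

    for my $d1 (@hex) {
        mkdir "$base/$d1";
        for my $d2 (@hex) {
            mkdir "$base/$d1/$d2";
            for my $d3 (@hex) {
                mkdir "$base/$d1/$d2/$d3" or warn "mkdir $base/$d1/$d2/$d3: $!";
            }
        }
    }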
But file creation is still extremely slow, even without needing to create the directories.
My final answer to this: "Don't do it".
I could not find a way to improve the speed beyond about 2 MB/sec when creating many small files. For terabyte data volumes this is just too much inertia to work against.
We are following in the footsteps of Facebook and dumping the files to a binary data store (or using a massive MySQL/MyISAM table with BLOBs, experimenting now...).
It's a bit more complex, but it eliminates the random-seek problem associated with small files, and I can work with terabyte volumes of data in a matter of hours, or a day, rather than weeks.
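To give an idea of the shape of it, here is a rough sketch of the MySQL/MyISAM route, again in perl via DBI (the connection details, table name and column sizes are placeholders, and MEDIUMBLOB caps each file at 16MB):

    #!/usr/bin/perl
    # Sketch: one row per file, keyed by the old 0A/FF/2B/... style path.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=filestore;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # Placeholder schema for the BLOB store.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS files (
            path VARCHAR(255) NOT NULL PRIMARY KEY,
            data MEDIUMBLOB   NOT NULL
        ) ENGINE=MyISAM
    });

    my $path = '0A/FF/2B/someimagefile.gif';
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $blob = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    $dbh->do('INSERT INTO files (path, data) VALUES (?, ?)', undef, $path, $blob);
    $dbh->disconnect;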
MongoDB has come in as another good option to investigate.
Remount the filesystem with the noatime and nodiratime options.
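For example (the mount point is a placeholder):

    mount -o remount,noatime,nodiratime /data

With these options the kernel no longer writes back an access-time update every time a file or directory is read, which saves a lot of metadata writes when you are touching millions of inodes.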