I've got an ext3-formatted drive on a Linux CentOS server. It's a web app data drive and contains a directory for every user account (there are 25,000 users). Each folder contains the files that user has uploaded. Overall, this drive has roughly 250 GB of data on it.
Does structuring the drive with all these directories impact drive read/write performance? Does it impact some other performance aspect I'm not aware of?
Is there anything inherently wrong or bad with structuring things this way? Perhaps just the wrong choice of filesystem?
I've recently tried merging two data drives and realized that ext3 is limited to 32,000 subdirectories per directory. This got me wondering why. It seems silly that I built it this way, considering each file has a unique ID that corresponds to an ID in the database. Alas ...
It's easy to test the options for yourself in your environment and compare the results. Yes, there is a negative impact on performance as the number of directories increases, and yes, other filesystems can help get around those barriers or reduce the impact.

The XFS filesystem is better for this type of directory structure; ext4 is probably just fine nowadays. Access and operations on a directory will simply slow down as the number of subdirectories and files increases. This is very pronounced under ext3 and not so much on XFS.
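A minimal sketch of such a test, with placeholder paths and counts (point it at a scratch directory on each filesystem you want to compare; the numbers here are just illustrative):

DIR=/mnt/test/bench                                          # placeholder scratch directory on the filesystem under test
mkdir -p "$DIR"
time for i in $(seq 1 100000); do : > "$DIR/file-$i"; done   # mass file creation
time stat "$DIR/file-50000" > /dev/null                      # single-entry lookup
time ls -f "$DIR" | wc -l                                    # full, unsorted directory scan
time ls "$DIR"/file-1234* > /dev/null                        # wildcard expansion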
The answer isn't as simple as the choice of filesystem. Sane filesystems stopped using linear lists for directories long ago, meaning that the number of entries in a directory doesn't affect file access time....
except when it does.
In fact, each operation stays fast and efficient no matter the number of entries, but some tasks involve a growing number of operations. Obviously, doing a simple ls takes a long time, and you don't see a thing until all inodes have been read and sorted. Doing ls -U (unsorted) helps a little because you can see it's not dead, but it doesn't reduce the time perceptibly. Less obvious is that any wildcard expansion has to check each and every filename, and it seems that in most cases the whole inode has to be read too.

In short: if you can be positively sure that no application (including shell access) will ever use a wildcard, then you can have huge directories without any remorse. But if there might be some wildcards lurking in the code, better to keep directories below a thousand entries each.
edit:
All modern filesystems use good data structures for big directories, so a single operation that has to find the inode of a specific file will be quite fast even on humongous directories.
But most applications don't do just single operations. Most of them will do either a full directory listing or wildcard matching. Those are slow no matter what, because they involve reading all the entries.
For example: let's say you have a directory with a million files called 'foo-000000.txt' through 'foo-999999.txt' and a single 'natalieportman.jpeg'. These will be fast:
ls -l foo-123456.txt
open "foo-123456.txt"
delete "foo-123456.txt"
create "bar-000000.txt"
open "natalieportman.jpeg"
create "big_report.pdf"
these will fail, but fail fast too:
ls -l bar-654321.txt
open bar-654321.txt
delete bar-654321.txt
these will be slow, even if they return very few results; even those that fail do so only after scanning all entries (see the timing sketch after this list):
ls
ls foo-1234*.txt
delete *.jpeg
move natalie* /home/emptydir/
move *.tiff /home/seriousphotos/
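If you want to see the difference for yourself, timing a direct lookup against a wildcard match in a directory like the hypothetical one above makes it obvious:

time stat foo-123456.txt > /dev/null   # one indexed lookup; fast regardless of directory size
time ls foo-1234*.txt > /dev/null      # the shell expands the glob by reading and testing every entry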
First make sure that the ext3 partition has the dir_index flag set. If it is missing, you can enable it. You need to unmount the filesystem, then run:
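For example, assuming the device is /dev/sdb1 (substitute your own; tune2fs -l /dev/sdb1 | grep dir_index shows whether the flag is already set):

tune2fs -O dir_index /dev/sdb1   # enable hashed directory indexes
e2fsck -fD /dev/sdb1             # optimize existing directories so they get indexed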
Then mount the filesystem.
It makes no difference until you hit the ext3 limit of 32,000 subdirectories per directory. Upgrading to ext4 gets around that, along with the other benefits ext4 brings.
The more entries (files and directories) you have inside a single directory, the slower access is going to be. This is true for every filesystem, though some are worse than others.
A better solution is to create a directory hierarchy, like this:
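For instance (the exact scheme below is purely illustrative; any stable prefix of the user name or file ID works as the split key):

/users/a/andrew/
/users/b/betty/
/users/c/carl/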
And if you still need better performance, you can extend multiple levels:
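(Again, purely illustrative; with the numeric file IDs mentioned in the question you could split on the leading digits instead.)

/users/a/an/andrew/
/users/b/be/betty/
/users/c/ca/carl/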
Most mail systems use this trick with their mail queue files.
Also, I've found that with some filesystems, just having had many entries in a directory in the past will make access to that directory slow. Do an ls -ld on the directory to see the size of the directory entry itself. If it's several MB or more and the directory is relatively empty, then you may be getting poor performance. Rename the directory out of the way, create a new one with the same name, permissions, and ownership, and then move the contents of the old directory into the new one. I've used this trick many times to significantly speed up mail servers that had gotten slowed down by the filesystem.
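A rough sketch of that rename-and-recreate procedure, assuming the bloated directory is /var/spool/bigdir (a placeholder; adapt the paths to your setup):

mv /var/spool/bigdir /var/spool/bigdir.old
mkdir /var/spool/bigdir
chown --reference=/var/spool/bigdir.old /var/spool/bigdir
chmod --reference=/var/spool/bigdir.old /var/spool/bigdir
# using find avoids hitting the shell's argument-length limit on a huge directory
find /var/spool/bigdir.old -mindepth 1 -maxdepth 1 -exec mv -t /var/spool/bigdir {} +
rmdir /var/spool/bigdir.old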
I developed a storage server recently that needed to create tens of millions of files and hundreds of thousands of directories. I compared XFS with ext4 and ReiserFS. I found that in my case ext4 was slightly faster than XFS. Reiser was interesting but had limitations, so it was dropped. I also found ext4 was significantly faster than ext3.

When you get lots of files per directory, file open time starts to suffer. File I/O does not. File deletion time also suffers. However, it's not too slow on ext4; it's quite noticeable under ext3, though. XFS and ext4 are quite fast here.
When I last looked at XFS and was weighing up the advantages and disadvantages of using XFS over ext4, I found reports of data loss with XFS. I'm not sure whether this is still a problem, or if it ever was, but it made me nervous enough to steer clear. As ext4 is the default filesystem in Ubuntu, it won out easily over XFS.
So, in addition to tylerl's suggestion, which will help from the management perspective, I suggest you upgrade to ext4. The per-directory limit goes up to 64,000 subdirectories with ext4.
Another benefit is that the fsck time is substantially quicker. I've never had any issues with corruption.
The nice thing about ext4 is that you can mount an existing ext3 volume as ext4 to try it out. See: Migrating a live system from ext3 to ext4 filesystem
A quote from that link:
So, go ahead and try it. I suggest you back up first.
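As a minimal sketch of that try-before-you-convert approach (device and mount point are placeholders, and this assumes a kernel whose ext4 driver will mount an unconverted ext3 volume):

umount /srv/data
mount -t ext4 /dev/sdb1 /srv/data   # mount the existing ext3 volume with the ext4 driver
# If you later decide to convert permanently, the usual route looks like:
#   tune2fs -O extents,uninit_bg,dir_index /dev/sdb1
#   e2fsck -fD /dev/sdb1
# and then update /etc/fstab to say ext4.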
There are DEFINITELY going to be consequences of doing this. The primary one is IO read/write performance. Beyond that, it's just a very scary way of dealing with that type of data (at that scale).
In the past I've used XFS to get around the limits of Ext3 with success.
The first listing of the filesystem's contents will take a while, until the system has read all the directory/file information. Subsequent operations will be faster because the kernel then has that information cached.
I've seen admins run 'find /somepath > /dev/null 2>&1' from cron on a regular basis to keep the cache active, resulting in better performance.
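A sketch of what such a cron entry might look like (path, schedule, and file name are placeholders):

# /etc/cron.d/warm-dentry-cache
*/30 * * * * root find /data/users > /dev/null 2>&1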
I have some questions and some possible bottleneck findings.
First, is this a CentOS 5 or 6 system? Because on 6 we have an incredible tool called blktrace, which is ideal for measuring impact in this kind of situation.
We can then parse the output with btt and see where the bottleneck is: application, filesystem, scheduler, or storage, i.e. at which component the IO is spending most of its time.
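A minimal sketch of such a capture (the device name and duration are placeholders):

blktrace -d /dev/sdb -w 60 -o trace   # record 60 seconds of block-layer events
blkparse -i trace -d trace.bin        # merge the per-CPU trace files into one binary dump
btt -i trace.bin                      # summarize where requests spend their time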
Now, coming to your question in theory: this layout will obviously increase the number of inodes, and as you keep creating or accessing new or existing files or directories inside those directories, access time will increase. The kernel has to traverse a larger filesystem hierarchy, and that is without a doubt an overhead.
Another point to note is that as you increase the number of directories, inode and dentry cache usage will climb, meaning more RAM is consumed. This comes out of slab memory, so if your server is running low on memory, that is another point to consider.
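You can watch that growth with slabtop or directly in /proc/slabinfo (the cache names below are the usual ext3 ones; they differ per filesystem):

slabtop -o | head -n 15                            # one-shot view of the largest slab caches
grep -E 'dentry|ext3_inode_cache' /proc/slabinfo   # dentry and ext3 inode cache usage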
Speaking of a real-world example, I recently saw that on a highly nested ext3 filesystem, creating a subdirectory for the first time took around 20 seconds, whereas on ext4 it took around 4 seconds. That is because of how block allocation is structured in the different filesystems. If you use XFS or ext4, it is needless to say that you will get some performance boost, however minimal it might be.
So, if you are just asking what the right choice of filesystem is, ext3 is a bit outdated. That's all I can offer without further data and benchmarking.
It's not an option on CentOS 5, and I'm not sure how much of an option it is on CentOS 6, but I have a gut feeling that a B-tree or B*-tree based solution, i.e. BTRFS, would provide consistent, if not significantly better, performance in your particular scenario, if only one could entrust it with one's precious data with a clear conscience (I still wouldn't).
But if you can afford to, you could test it.