I am using CentOS 5 with Plesk 9 (64-bit), running a site where users will be uploading pictures. With a 64-bit OS, are there any limits to how many files I can store? All I care about is performance and serving up the files. I'd prefer not to have files scattered four directories deep, but I am hoping that at some point I could have 200-300 thousand images.
If you are using ext3, I found this quote (warning: Spanish-language site).
Further reading showed that ext3 doesn't have a 32K limit on the number of files per directory, which can be proven empirically.
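Something along these lines demonstrates it (a sketch; the path and file count are placeholders):

mkdir /tmp/manyfiles && cd /tmp/manyfiles
for i in $(seq 1 100000); do touch "file_$i"; done
ls | wc -l    # reports 100000, well past 32K files in a single directory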
It does, however, have a limit of roughly 32K subdirectories per directory, which can also be tested.
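A sketch of that test (the path is again a placeholder); on ext3 the mkdir calls typically start failing at 31,998 subdirectories, since a directory can have at most 32,000 links and two are already taken by its own "." entry and its entry in the parent:

mkdir /tmp/manydirs && cd /tmp/manydirs
for i in $(seq 1 40000); do mkdir "dir_$i" || { echo "mkdir failed at $i"; break; }; done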
This (unfounded) claim says that
This question from sister site stackoverflow.com could help too.
In general:
This depends greatly on the filesystem you use. Certain older versions of ext3 were atrocious at this, which is how the b-tree directory indexes came about. ReiserFS is a lot more performant with large numbers of files like that. Back in the day I had a Novell NSS directory on a NetWare server with 250,000 4 KB files in it due to a GroupWise flub, and it worked just fine. Enumerating the directory sucked a lot, but accessing a specific file in that directory was as fast as you'd hope. As this was 8 years ago, I must presume modern Linux filesystems can handle that with aplomb.
It depends on the filesystem you're using, not the 64-bit'ness of the operating system. With every filesystem, there's going to be some point at which the big-O costs of the algorithm used to search the directory are going to get the better of the computer.
If you can break the files up into even just a two-tier hierarchy, you'll see better long-term scalability.
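For instance, a minimal sketch of a two-tier layout that hashes the file name to pick the directories (the paths and the hashing choice are just assumptions for illustration):

name="photo_12345.jpg"                        # hypothetical upload
hash=$(echo -n "$name" | md5sum)              # e.g. "a1b2c3..."
dir="/var/uploads/${hash:0:2}/${hash:2:2}"    # two tiers, e.g. /var/uploads/a1/b2
mkdir -p "$dir"
mv "/tmp/incoming/$name" "$dir/$name"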
Filesystems in Linux store directory contents in basically one of two ways:
As a flat list of files.
As a data structure (usually a B+Tree or something similar).
The former gets progressively slower as files are added; the latter does not. Note that ls might still take forever, since it has to look up the inodes of all those files; the directory entries themselves only contain the filename and inode number.
Ext3 directories are flat lists, with an option for a hashed tree index to speed things up.
XFS uses B+Trees.
But for either of these filesystems, an ls -l will need to hit as many inodes as there are files. For name lookups (when opening a file, for example), a B+Tree or similar index will be much faster on large directories.
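A quick way to see the difference on an existing large directory (a sketch; the directory and file names are placeholders):

time ls -l /var/uploads/bigdir > /dev/null       # touches every inode: slow
time stat /var/uploads/bigdir/file_123456.jpg    # single name lookup: fast with an index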
A hierarchy of directories makes the files easier to manage, however, so you might want to consider that possibility. Even a single layer of directories with, say, a 4,000-file limit each would make things much easier to manage.
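A sketch of that single-layer scheme, assuming each image already has a numeric ID (the path and bucket size are placeholders):

id=123456
bucket=$(( id / 4000 ))               # IDs 0-3999 land in 0/, 4000-7999 in 1/, and so on
mkdir -p "/var/uploads/$bucket"
cp photo.jpg "/var/uploads/$bucket/$id.jpg"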
If you're going beyond a few hundred images, definitely consider two things:
I'd recommend using XFS, or, failing that, ReiserFS, with a two- or three-deep directory hierarchy divided up by two-byte pairs, e.g. something like the layout sketched below.
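Assuming each image is named after an MD5 hash of its contents (the base path here is just an illustration):

/var/www/images/d4/1d/d41d8cd98f00b204e9800998ecf8427e.jpg

where d4 and 1d are the first two byte pairs of the hash.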
This gives you 256 directories at each level, splitting images over a total of 65,536 (256 × 256) separate directories, which is more than enough for 100-200k images and beyond. It will make things much faster and more scalable, and a lot easier to maintain later on as well.
Most default configurations of ext3 have a limit of about 32K subdirectories per directory (I can't remember the exact number now, but we ran into exactly that issue a couple of weeks ago; the system was Debian/Etch at the time).
Might also hit you in some applications that use a lot of caching.
Certainly consider not using ext3. http://kernelnewbies.org/Ext4#head-97cbed179e6bcc48e47e645e06b95205ea832a68 (which lists ext4's new features) might be a helpful jumping-off point.
I'd also say have a look at how squid organises its cache (multiple layers of directories), as many files in one directory can prove tough to maintain. Long lists (generally) suck.
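For reference, squid's cache_dir directive configures exactly this kind of fan-out; the last two numbers are how many first- and second-level directories the cached objects are spread across (the values below are squid's usual defaults):

# squid.conf
cache_dir ufs /var/spool/squid 100 16 256    # 100 MB cache, 16 L1 dirs, 256 L2 dirs each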
ext3 filesystems have htrees (hashed b-tree directory indexes) for big directories by default on most distros. Do a
tune2fs -l /dev/sda1
(or whatever block device you're using) and check the "Filesystem features:" line. If "dir_index" is among them, you're golden. Note, however, that even the best directory structure can only make it fast to find one specific file. Doing
ls
on a huge directory is going to be terrible, as would any pattern matching, even if you know it matches a single file. For these reasons, it's usually better to add one or two levels of directories, usually using some bits of an ID to name them.
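If dir_index isn't in that list, it can usually be enabled after the fact (a sketch; /dev/sda1 is a placeholder, and the filesystem should be unmounted before running e2fsck):

tune2fs -O dir_index /dev/sda1    # turn on hashed b-tree indexing for new directories
e2fsck -fD /dev/sda1              # -D rebuilds/optimizes indexes for existing directories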
It's going to depend somewhat on which filesystem you're using on your Linux server.
Assuming you're using ext3 with dir_index, you should be able to search large directories quite fast so speed shouldn't be much of a problem. Listings (obviously) will take longer.
As for the max number of files you can put in one directory, I'm pretty sure you can work reliably with up to 32,000 (the hard 32K limit in ext3 is actually on subdirectories rather than files). I'm not sure I'd want to exceed that, even though you probably can.