I'm going to be generating about 50 million HTML files that I'd like to serve. Each file has a unique id (e.g., thingy) and I'd like to serve them up as if they were all in one directory (e.g., example.com/thingy).
I suspect that putting 50 million files in one directory is asking the gods to strike me down, so I'm inclined to do it with nested directories (e.g., thingy is in /t/h/i/thingy). I think I could do this with Apache and mod_rewrite without too much pain, but I'm wondering if there are other options that make more sense.
If it matters, I intend to do this on Linux.
Are you certain that all (or a majority of) the 50M files will be requested? If not, and if your problem domain allows it, you could consider taking a "lazy computation" approach. That is, only generate (and then cache) those files that are actually requested.
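For example, a rough .htaccess-style sketch of the lazy half in mod_rewrite, assuming the /t/h/i/thingy layout from your question and a hypothetical generator script at /generate.php that builds the file, writes it into the nested tree, and returns it (the rule that maps ids onto that tree is sketched after the next paragraph):

    RewriteEngine On

    # If the nested file for this id has not been generated yet, hand
    # the request to the generator script instead (skip the script
    # itself so the rule cannot loop).
    RewriteCond %{REQUEST_URI} !^/generate\.php
    RewriteCond %{DOCUMENT_ROOT}/$2/$3/$4/$1 !-f
    RewriteRule ^(([^/])([^/])([^/])[^/]*)$ /generate.php?id=$1 [L,QSA]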
Still, yes, you will want to use a nested directory structure (say 3+ levels deep), so that no single directory gets more than a few thousand files in it. Then, use mod_rewrite to convert requests to the actual physical file names, something like the following (but probably with more checks and logic):
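Continuing the .htaccess-style sketch above, the mapping rule itself could look something like this, again assuming every id is at least three characters long and contains no slashes:

    RewriteEngine On

    # Rewrite /thingy to /t/h/i/thingy: $2, $3 and $4 capture the first
    # three characters of the id, $1 captures the whole id.
    RewriteRule ^(([^/])([^/])([^/])[^/]*)$ /$2/$3/$4/$1 [L]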
Finally, some filesystems handle large numbers of files more efficiently than others, so you may want to do some testing and benchmarking with a few candidates (e.g. ext4, xfs, jfs, reiserfs) before going into production.
No, mod_rewrite is how you would do this.
For this kind of thing I would use a database and serve the pages from it. You may be able to template the pages so that you don't have to store the full page in the database.
Depending on how you are generating the pages, consider storing the source you generate each page from in the database and generating the pages on demand. Caching techniques can then prevent a page from being regenerated every time it is requested.
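If the pages end up being generated by an application behind Apache, one such caching option is Apache's disk cache (mod_cache plus mod_cache_disk in 2.4); a minimal sketch, with the cache root and expiry time as placeholder values:

    # Cache generated pages on disk so the application only has to
    # render each page once until the cached copy expires
    # (requires mod_cache and mod_cache_disk to be loaded).
    CacheEnable disk "/"
    CacheRoot "/var/cache/apache2/mod_cache_disk"
    CacheDirLevels 2
    CacheDirLength 1
    CacheDefaultExpire 3600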