I need to store a lot of data, contained in SQLite databases, on a file server. I have the opportunity to split the data up into many files: there is less risk of a large part of the data getting corrupted, the data is easier to move, there are fewer problems with locking, etc. My question is: how many files are too many? 100,000? 1,000,000? 10,000,000? In other words, what is the overhead of creating a file on a file server? By overhead I mean the number of disk rotations needed to create a file. I know about blocks and block sizes, and I am not concerned about the storage wasted by spreading the data over many files.
My question is not about whether it would be better to avoid the file server altogether and use a proper database server running other database software.
The environment is a Microsoft environment, but I do not know anything specific about the file server.
More than 10,000 files in a folder will give you trouble accessing them with Explorer. This can be avoided by breaking the files down into a tree of folders.
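A minimal sketch of such a folder tree, assuming you are free to choose the layout: hash each file name and use the first characters of the digest as two levels of subfolders. The share path, file name, and two-level scheme below are purely illustrative.

```python
# Hypothetical sketch: spread the database files over a tree of folders so
# that no single directory holds more than a few thousand entries.
# The two-level, 256 x 256 bucket scheme is an assumption, not a requirement.
import hashlib
from pathlib import Path

def shard_path(root: Path, name: str) -> Path:
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # e.g. "customer42.sqlite" -> root / "a3" / "7f" / "customer42.sqlite"
    return root / digest[:2] / digest[2:4] / name

target = shard_path(Path(r"\\fileserver\share"), "customer42.sqlite")
target.parent.mkdir(parents=True, exist_ok=True)
print(target)
```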
Also, if a file's size is not a multiple of the cluster size (usually 4 KB), it will waste the remainder of its last cluster. Depending on the file sizes, this can be significant or not.
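As a rough worked example, assuming a 4 KB cluster size (the usual NTFS default), the slack per file and across many files can be estimated like this:

```python
# Rough estimate of slack space, assuming 4 KB clusters (NTFS default).
CLUSTER = 4096  # bytes per cluster (assumed)

def slack(file_size: int) -> int:
    """Bytes wasted in the last, partially filled cluster."""
    remainder = file_size % CLUSTER
    return 0 if remainder == 0 else CLUSTER - remainder

print(slack(10 * 1024))                        # a 10 KB file wastes 2048 bytes
print(slack(10 * 1024) * 1_000_000 / 2**30)    # ~1.9 GiB across 1,000,000 such files
```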
Access to many little files is also slow because of per-file overhead, which can limit the speed of things like backups. If you can design your usage to read larger files sequentially and do the random access in memory, you will be better off.
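Purely as an illustrative sketch of that idea (the combined file name and fixed record size are assumptions): read one larger file in a single sequential pass and serve the random access from the in-memory copy rather than from many small files on the share.

```python
# Hypothetical sketch: one sequential read, then random access in memory.
from pathlib import Path

data = Path("combined.dat").read_bytes()   # assumed combined data file
RECORD_SIZE = 512                          # assumed fixed-size records

def record(index: int) -> bytes:
    offset = index * RECORD_SIZE
    return data[offset:offset + RECORD_SIZE]

print(len(record(1234)))                   # random access served from RAM
```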
SQLite is a very cool product, but if you are accessing a database over a network, it is a VERY bad idea to do so using file-based access. Even if the DBs are read-only and you don't have any concurrency to worry about, performance will be awful. You must have a very good reason for doing it this way.
In practice, assuming that performance, concurrency, and locking are not issues, I would not expect any significant difference between creating 1,000 files and writing the same data to 10 files as a batch, though this will vary massively depending on the nature of the underlying filesystem. On the other hand, with lots of transactions occurring randomly across the files, I'd expect the smaller number of files to be more efficient. For reads, I'd expect a similar pattern. But there's only one way to find out for sure: try it.
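If you do want to try it, a small benchmark along these lines gives a first impression of the per-file overhead. The paths and sizes are assumptions; point ROOT at the actual share to get meaningful numbers.

```python
# Hypothetical benchmark sketch: write the same amount of data as many small
# files versus a few larger ones and compare the elapsed time.
import os
import time
from pathlib import Path

ROOT = Path("bench_tmp")          # replace with a path on the file server share
PAYLOAD = os.urandom(8 * 1024)    # 8 KB chunk of random data

def write_files(folder: Path, n_files: int, chunks_per_file: int) -> float:
    folder.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for i in range(n_files):
        with open(folder / f"f{i:06d}.bin", "wb") as fh:
            for _ in range(chunks_per_file):
                fh.write(PAYLOAD)
    return time.perf_counter() - start

# Same total data: 1,000 small files vs 10 files that are 100x larger.
print("1000 files:", write_files(ROOT / "many", 1000, 1), "s")
print("  10 files:", write_files(ROOT / "few", 10, 100), "s")
```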