I need to store a lot of data, contained in SQLite databases, on a file server. I have the opportunity to split the data up into many files: there is less risk of a large part of the data getting corrupted, the data is easier to move, there are fewer problems with locking, etc. My question is: how many files are too many? 100,000? 1,000,000? 10,000,000? In other words, what is the overhead of creating a file on a file server? By overhead I mean the number of disk rotations needed to create a file. I know about blocks and block sizes, and I am not concerned about the storage wasted by spreading the data over many files.
My question is not about whether it would be better to avoid the file server altogether and use a proper database server running other database software.
The environment is a Microsoft environment, but I do not know anything specific about the file server.
More than 10,000 files in a folder will give you trouble accessing them with Explorer. This can be avoided by breaking the files down into a tree of folders.
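A minimal sketch of such a folder tree, assuming you are free to choose the layout: hash each file name and use the first characters of the digest as two levels of subfolders. The share path, file name, and two-level scheme below are purely illustrative.

```python
# Hypothetical sketch: spread the database files over a tree of folders so
# that no single directory holds more than a few thousand entries.
# The two-level, 256 x 256 bucket scheme is an assumption, not a requirement.
import hashlib
from pathlib import Path

def shard_path(root: Path, name: str) -> Path:
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # e.g. "customer42.sqlite" -> root / "a3" / "7f" / "customer42.sqlite"
    return root / digest[:2] / digest[2:4] / name

target = shard_path(Path(r"\\fileserver\share"), "customer42.sqlite")
target.parent.mkdir(parents=True, exist_ok=True)
print(target)
```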
Also, if a file's size is not a multiple of the cluster size (usually 4 KB), it will waste the remainder of its last cluster. Depending on the file sizes, this can be significant or not.
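As a rough worked example, assuming a 4 KB cluster size (the usual NTFS default), the slack per file and across many files can be estimated like this:

```python
# Rough estimate of slack space, assuming 4 KB clusters (NTFS default).
CLUSTER = 4096  # bytes per cluster (assumed)

def slack(file_size: int) -> int:
    """Bytes wasted in the last, partially filled cluster."""
    remainder = file_size % CLUSTER
    return 0 if remainder == 0 else CLUSTER - remainder

print(slack(10 * 1024))                        # a 10 KB file wastes 2048 bytes
print(slack(10 * 1024) * 1_000_000 / 2**30)    # ~1.9 GiB across 1,000,000 such files
```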
Access to many little files is also slow because of per-file overhead, which can limit the speed of things like backups. If you can design your usage to read larger files sequentially and do the random access in memory, you will be better off.
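Purely as an illustrative sketch of that idea (the combined file name and fixed record size are assumptions): read one larger file in a single sequential pass and serve the random access from the in-memory copy rather than from many small files on the share.

```python
# Hypothetical sketch: one sequential read, then random access in memory.
from pathlib import Path

data = Path("combined.dat").read_bytes()   # assumed combined data file
RECORD_SIZE = 512                          # assumed fixed-size records

def record(index: int) -> bytes:
    offset = index * RECORD_SIZE
    return data[offset:offset + RECORD_SIZE]

print(len(record(1234)))                   # random access served from RAM
```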
SQLite is a very cool product, but if you are accessing a database over a network, it is a VERY bad idea to do so using file-based access. Even if the DBs are read-only and you don't have any concurrency to worry about, performance will be awful. You must have a very good reason for doing it this way.
In practice, assuming that performance, concurrency, and locking are not issues, I would not expect any significant difference between creating 1,000 files and writing the same data to 10 files as a batch, though this will vary massively depending on the nature of the underlying filesystem. On the other hand, with lots of transactions occurring randomly across the files, I'd expect the smaller number of files to be more efficient. For reads, I'd expect a similar pattern. But there's only one way to find out for sure: try it.
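If you do want to try it, a small benchmark along these lines gives a first impression of the per-file overhead. The paths and sizes are assumptions; point ROOT at the actual share to get meaningful numbers.

```python
# Hypothetical benchmark sketch: write the same amount of data as many small
# files versus a few larger ones and compare the elapsed time.
import os
import time
from pathlib import Path

ROOT = Path("bench_tmp")          # replace with a path on the file server share
PAYLOAD = os.urandom(8 * 1024)    # 8 KB chunk of random data

def write_files(folder: Path, n_files: int, chunks_per_file: int) -> float:
    folder.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for i in range(n_files):
        with open(folder / f"f{i:06d}.bin", "wb") as fh:
            for _ in range(chunks_per_file):
                fh.write(PAYLOAD)
    return time.perf_counter() - start

# Same total data: 1,000 small files vs 10 files that are 100x larger.
print("1000 files:", write_files(ROOT / "many", 1000, 1), "s")
print("  10 files:", write_files(ROOT / "few", 10, 100), "s")
```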