I am looking for a DFS (distributed file system) that is fault tolerant and easy to maintain. I will have tons (100M+) of small files (from 1K to 500K). The files will be located in directories that build up a logical structure of the data.
I will have an average read load of 100Mb/s and an average write load of 100Mb/s.
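For a sense of scale, here is a rough back-of-the-envelope sketch of what those numbers imply (the ~250K average file size is an assumption; the real distribution may well be skewed toward the small end):

```python
# Back-of-the-envelope numbers for the workload described above.
# Assumptions: average file size is roughly the midpoint of 1K..500K,
# and "Mb/s" means megabits per second (multiply by 8 if megabytes were meant).
file_count = 100_000_000
avg_file_size = 250 * 1000                    # ~250K per file, in bytes

total_tb = file_count * avg_file_size / 1e12
print(f"total data: ~{total_tb:.0f} TB")      # ~25 TB before replication

throughput_bytes = 100e6 / 8                  # 100 Mb/s is 12.5 MB/s each way
files_per_sec = 2 * throughput_bytes / avg_file_size
print(f"files read+written per second: ~{files_per_sec:.0f}")   # ~100 files/s
```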
I would like some input as to which file system works best for the given requirements.
Any thoughts?
Ceph is a pretty interesting one, with some neat features. One that's particularly cool is that the replication function (which decides which OSDs data goes to) is really flexible, and can be tuned for your reliability needs.
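To give a feel for what "the replication function decides where data goes" means, here is a toy Python sketch of a deterministic placement function. This is only an illustration of the idea, not Ceph's actual CRUSH algorithm, and the host/OSD layout is made up:

```python
import hashlib

# Toy cluster layout: OSDs grouped by host (the failure domain). Names are made up.
HOSTS = {
    "host-a": ["osd.0", "osd.1"],
    "host-b": ["osd.2", "osd.3"],
    "host-c": ["osd.4", "osd.5"],
}

def place(object_name, replicas=3):
    """Deterministically map an object to `replicas` OSDs on distinct hosts.

    Any client can compute the same answer without asking a central server,
    which is the key property of Ceph's (far more sophisticated) placement code.
    """
    chosen = []
    # Rank hosts by a hash of (object, host) so each object gets its own ordering.
    for host in sorted(HOSTS, key=lambda h: hashlib.md5((object_name + h).encode()).hexdigest()):
        osds = HOSTS[host]
        # Pick one OSD per host so replicas never share a failure domain.
        idx = int(hashlib.md5((object_name + host + "-osd").encode()).hexdigest(), 16) % len(osds)
        chosen.append(osds[idx])
        if len(chosen) == replicas:
            break
    return chosen

print(place("some/dir/file-0001"))   # e.g. ['osd.3', 'osd.4', 'osd.1']
```

Changing the layout or the replica count changes where data lands, which is the sort of knob the real placement rules expose.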
The general idea is that there are 3 types of daemons:
- monitors, which track cluster membership and hold the authoritative cluster map
- metadata servers (MDSs), which manage the filesystem namespace (directories and file names)
- object storage daemons (OSDs), which store the actual data and handle replication
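As a concrete illustration of how a client talks to those daemons, here is a minimal sketch against the underlying object store using the python-rados bindings (the filesystem sits on top of this layer; the pool name "data" and the config path are assumptions):

```python
import rados  # python-rados bindings shipped with Ceph

# Reads monitor addresses and keys from the config file; the path is an assumption.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()           # contacts the monitors to fetch the current cluster map

# "data" is a hypothetical pool name; objects written here are spread across the
# OSDs by the placement function mentioned above, with no central data server.
ioctx = cluster.open_ioctx("data")
ioctx.write_full("example-object", b"hello from librados")
print(ioctx.read("example-object"))

ioctx.close()
cluster.shutdown()
```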
The client has been upstream in the Linux kernel for some time now, and the server stuff runs entirely in userspace.
As far as performance goes, the original PhD thesis on Ceph noted that at 24 OSDs the bottleneck was the throughput of the network switch, and that performance scaled linearly with the number of nodes (see the publications section on the Ceph site). That was five years ago, and there has been a great deal of tuning since then.
On the subject of reliability, the project was started by a co-founder of DreamHost and is being rolled out in their own infrastructure.
GlusterFS, Lustre, etc. See http://en.wikipedia.org/wiki/List_of_file_systems for a list.
It also depends on what you're trying to do. Workstations in a business accessing it? Internet-accessible? ...?