We have a distributed application which uses large amounts of content (all kinds of files). Several servers need to access this content. Right now the content is stored redundantly on each server, but this is getting ugly.
We want to store the content on a single storage instance with large hard disks and then mount that instance's file system from each of our servers.
I thought about using NFS, but the security scheme doesn't seem to fit. Right now I'm looking at Samba, but I'm not sure it is the right choice: all our servers run Linux, and Samba is mainly aimed at mixed Windows/Linux environments. What makes Samba interesting to me is its user-level security.
Aside from security, the other major requirement is performance: our servers need access to the content that is as fast as possible over a LAN.
Is Samba a good choice? What other options are there? What about WebDAV?
EDIT: What I need to do: We have a varying number of servers that need to access a growing number of files; we expect this to reach several TB. We call these files the 'content'. All servers have to use the same version of the content, and they need concurrent read-only access to it. The content is updated relatively seldom, somewhere between once a week and once a month, but that will likely become much more frequent. Right now it would be possible to sync the content onto each server, but that will become a pain in the near future, and the update has to be quite snappy. We think it would be convenient to update/sync the content on only one server (the storage server) and let all other servers mount it as a remote filesystem.
All the best
Jan
Samba will almost certainly do what you want, and with fairly reasonable performance. It should have the necessary security controls to handle whatever use cases you've got in mind (your question is a bit short on details there).
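For illustration only, a read-only share with user-level access on the storage side might look roughly like this (share name, paths, group and credentials file are all made-up placeholders):

    # /etc/samba/smb.conf on the storage server
    [content]
        path = /srv/content
        read only = yes
        browseable = no
        valid users = @content-readers    # only members of this Unix group may connect

    # on each application server (needs cifs-utils); the credentials file
    # holds the username=/password= lines for a user in that group
    mount -t cifs //storage/content /mnt/content -o ro,credentials=/etc/content.cred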
It's hard to provide other recommendations, since you don't give a really good description of what you need to do, and what your constraints are. WebDAV probably isn't useful; it's not anything like a POSIX filesystem, and if you think you need blazingly fast performance, then you're probably wanting something that'll act like a full filesystem (arbitrary seeks, that sort of thing) which is going to be painful on WebDAV.
You also haven't talked about concurrent access to individual files, which has a strong bearing on your possible solution space. If only one client is accessing a given file at once, and especially if only one client will ever be updating a given file, then don't necessarily give up on periodic sync solutions -- they can do a good job, in the right conditions.
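If you do stick with periodic sync, plain rsync from wherever the master copy lives stays reasonably snappy, since only changed files are transferred (hostnames and paths below are invented):

    # push the updated content from the master copy to each application server
    for host in app1 app2 app3; do
        rsync -a --delete /srv/content/ "$host":/srv/content/
    done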
Finally, if it's mostly (or all) read-only, then consider making your data access higher-level. Rather than thinking that you have to have files, why not think in terms of useful application-specific abstractions? A common example of this is the humble SQL database -- rather than storing data in flat files and grovelling through it with custom code, some clever clods came up with the idea of a more specialised storage engine and the necessary verbiage to intelligently access it. It's not as flexible as a filesystem, but (in its narrow niche) it's a damned sight quicker. Perhaps with a bit of imagination, you can come up with a similar abstraction for your data, which could save quite a lot of trouble?
NFS - Your storage instance is the NFS server, and the servers that want to mount its file systems are the NFS clients. The storage instance does have to know the IP addresses of its clients, i.e. your servers, but you already know those; you can also allow a whole subnet at a time, and you at least know which subnet your servers live on. Note that companies such as NetApp sell exactly this sort of thing, and they work pretty well.
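As a sketch of that setup (paths, hostname and subnet are placeholders), a read-only export to the servers' subnet would look something like this:

    # /etc/exports on the storage instance
    /srv/content  192.168.10.0/24(ro,sync,no_subtree_check)

    # apply it with: exportfs -ra
    # then, on each application server:
    mount -t nfs storage:/srv/content /mnt/content -o ro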
Samba - My experience is slightly different from how you want to use it (i.e. users supply username/password combos when they mount the shares), so I can't comment on your proposed use.
Both should work fine, and both should be able to saturate a 1Gb Ethernet interface with no problem. I suspect that will be your upper bound on how much data you can get off of your storage instance. You can, of course, use multiple Ethernet interfaces to work around that, and then you will probably be limited by how fast you can move data off of whatever you buy for disks.
I think one of the key numbers you need to know before you start is how much data each of your varying number of servers needs to read per second, and what the maximum number of servers will be. Does your proposed centralized solution supply that much data per second? Right now you have solved this by having each server be independent, with its own copy of the data.
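As a rough, back-of-the-envelope version of that calculation (all numbers invented):

    #   10 servers * 20 MB/s average reads each  = 200 MB/s required
    #   1 Gb/s Ethernet                          ~= 110-118 MB/s usable
    # => a single GbE link on the storage instance would not keep up;
    #    you'd need bonded links / 10GbE, or local caching on the servers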
Regarding your specific requirements of performance and security, SSH would fit well. SFTP (file transfer over SSH) uses the native Linux user and permission model for access control. You can use a machine with large storage volumes (possibly with hardware- or software-level RAID or encryption; ext4 is a solid, fast choice of file system) as a dedicated storage server. Set up an SSH server on it and define the different access levels on your data just the way you do on each machine you currently have; accessing the content on this server is then almost as easy as accessing local data. Setting up SSH key pairs on each machine is essential to make authentication against the server secure.
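If you go this route, sshfs is the usual way to turn SFTP access into a mounted filesystem; a minimal sketch (host, user, key and paths are placeholders):

    # on each application server: mount the content read-only over SFTP
    sshfs -o ro,reconnect,IdentityFile=/root/.ssh/content_key \
          appuser@storage:/srv/content /mnt/content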
Consolidating on a single server can make life easier, but how do you ensure availability? Duplicate network cards? RAID? Sometimes replication can be a good thing.
Since we're talking about server-to-server communication, is user-level security really such a hard requirement? User authentication in NFS is certainly weak, but what about using NFS plus authentication at a lower level in the network, such as IPsec? Or a shared filesystem on top of iSCSI on top of a VPN?
Depending on the access pattern, and if local storage capacity is not a problem, the fastest solution might be something like AFS, where you effectively get a very large local cache, which has the added advantage of being usable when the server goes down.