Our company currently stores a lot of automatically generated files on disk: well over 200,000 at present. They're PDFs, each around 100 KB to 1 MB in size.
I've been asked to provide some evidence for the pros and cons of storing this data in files vs. storing it as database records.
I would like to see us, where possible, store this data in an MS SQL or MySQL database rather than having 200,000 files knocking around a pile of local directories.
What I'd like from you guys is some good, solid reasons for using either system so I can weigh up the difference and put my case forward.
I honestly cannot see any advantage gained by storing these documents in a database. Since the docs don't get altered, neither version control systems nor document management systems will add any value.
The best you can really do is to store them on a separate server with a file system that excels at fast retrieval (possibly XFS). What might help is a well-organised folder structure, e.g. in the case of insurance claims a superstructure by year and month, or in the case of insurance contracts a superstructure ordered by the first few digits/characters of the policy number.
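To make the folder layout idea concrete, here is a minimal sketch in Python. The function names, the `.pdf` naming scheme, and the three-character prefix length are assumptions for illustration only, not anything from the original poster's system.

```python
from datetime import date
from pathlib import Path


def claim_path(root: Path, claim_id: str, created: date) -> Path:
    """Place a claim PDF under root/YYYY/MM/<claim_id>.pdf (hypothetical layout)."""
    return root / f"{created.year:04d}" / f"{created.month:02d}" / f"{claim_id}.pdf"


def policy_path(root: Path, policy_number: str, prefix_len: int = 3) -> Path:
    """Place a policy PDF under root/<prefix>/<policy_number>.pdf.

    The prefix is the first `prefix_len` characters of the policy number,
    upper-cased so that 'abc...' and 'ABC...' share one directory.
    """
    prefix = policy_number[:prefix_len].upper()
    return root / prefix / f"{policy_number}.pdf"
```

With a scheme like this, no single directory ends up holding all 200,000 files, and the location of any document can be computed from its key without a lookup table.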
There is some value in storing the files in a document-oriented database, but it depends entirely on how you are using the files, how often they are accessed, and how fast they need to be accessed. There are also document management systems that may be a good fit. You need to detail your use case first.
Is there anything broken or cumbersome about your current storage scheme? The transition cost of moving your files into a database will be nontrivial. Putting the pain of switching aside, here are some things to consider:
Data Consistency: you didn't specify what file system/platform you're using, but a database might provide better integrity checks for individual files.
Off-site Recovery: most DBAs worth anything know how to use the replication features of their database.
Backup: depending on the situation, your database vendor may provide you with backup options (log-assisted backups, snapshots, consistent hot backups) that your OS may not provide.
Logging/Auditing: security features of most modern databases should provide you with a record of who has accessed each file.
Data Privacy: is encrypting data in your database of choice easier than on your OS?
Technically, there's a file system/OS-based solution for each of the points I listed (e.g., rsync, kernel-level auditing, file system encryption). If what you have is adequate for your current and projected needs, you can't beat the simplicity of a file system. However, if your organization has strong DBA skills and a thin sysadmin team, you might be better off with a database. The decision might be easy if your DBAs already have established, proven procedures to meet all of your requirements.
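As one illustration of a file-system-side answer to the data consistency point, here is a rough Python sketch of a checksum manifest. It is only a sketch under assumptions: the helper names are made up, it only looks at `*.pdf` files, and where the manifest itself gets stored (and how it is protected) is left open.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded whole into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each PDF's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*.pdf"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content has changed."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Since the documents are never altered after generation, a manifest built once at write time and re-verified periodically would flag silent corruption or accidental deletion. It does not detect extra files that were never recorded.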
It seems like it may not necessarily be "file system" vs. "database" as much as possibly more of a data management, access and protection comparison? Maybe around things like: