My lab is setting up a small server to hold the data (mostly video and image data, plus a few documents) for whatever project our group is working on at the time. Historically, once a research project ends, its data has been archived haphazardly: on a single hard drive, on a big pile of DVDs (or CDs in the olden days), on Sony DV cassettes or even VHS tapes for some of the video (this lab has been active since the early '90s), or on a mixture of all of the above...
Question: (1) What is the best way to consolidate ALL of it into the same format AND storage medium, and (2) what is the best medium for long-term archiving of such data with very occasional access (say, 30+ years)? Unfortunately we don't have an enterprise-level budget (we are just a ~10-person lab), so we can't do anything that costs hundreds of thousands of dollars.
Thanks!
P.S. Considering our old video and images are of smaller resolution, but recent ones are huge, I think we are talking about 30~40 TB for the really old data, another 10~20 TB for recent data, then yearly additions of about 5 TB.
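To put rough numbers on the P.S., here is a minimal back-of-the-envelope sketch. It assumes additions stay flat at 5 TB per year for the whole 30 years, which is probably optimistic given how much larger recent data already is. The function name and the LOW/HIGH dictionaries are only illustrative.

```python
def projected_capacity_tb(legacy_tb, recent_tb, yearly_tb, years):
    """Return total terabytes held after `years` of steady yearly additions."""
    return legacy_tb + recent_tb + yearly_tb * years


# Low and high ends of the ranges given in the question.
LOW = dict(legacy_tb=30, recent_tb=10, yearly_tb=5)
HIGH = dict(legacy_tb=40, recent_tb=20, yearly_tb=5)

for years in (0, 10, 20, 30):
    low = projected_capacity_tb(years=years, **LOW)
    high = projected_capacity_tb(years=years, **HIGH)
    print(f"after {years:2d} years: {low}-{high} TB")
```

Under those assumptions the archive starts at 40 to 60 TB and reaches roughly 190 to 210 TB after 30 years.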
Unfortunately, there is no best way for you. Thirty-year archival of digital media is a very hard problem and takes routine investment. About the only formats guaranteed to be readable in 30 years are ASCII and UTF-8, which are not video formats. Storage formats change: the 8 track reel-to-reel tapes we were using 30 years ago are nigh impossible to read these days, even though the data is still on the tape (there is an interesting story about NASA rebuilding a 40-year-old tape drive to get at some newly recovered/discovered Apollo data tapes). Your best bet is to commit to periodic assessments of your archival environment, I'd say every 5 years, with sufficient budget to migrate old formats to newer ones.
You probably know better than I do, but the video landscape is changing rapidly. Realtime online editing is now possible, whereas even 10 years ago it was only doable on seriously good kit. Who knows how things will look 30 years hence.
That should get you to 30 years.
I totally agree with sysadmin1138's post in every way bar one caveat - I don't think you're going to have the budget to really achieve what you want.
There are 5 main functions you need to create;
So what you want to do can be done; I've done it myself a number of times over the past two decades or so, but none of them were cheap, I'm afraid.
Good luck.
The others have given good advice about how to back your media up. I would suggest you spend some quality time looking at the Library of Congress guidelines:
http://www.digitalpreservation.gov/formats/index.shtml
You might also consider building a cheap whitebox ZFS array. You could probably do something to fit your needs for under $10k. As the drives die, replace them with larger ones, and so your storage capacity grows as you generate data. That would probably keep you going for quite a while, and you can replace it with a higher capacity device when it gets old. The advantage is that your data is online (and so it can be accessed as necessary), and is relatively well protected against bitrot, a serious problem when you have this much data.
A decent build option was put together here:
http://www.zfsbuild.com/
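ZFS checksums its own blocks and can verify them with a scrub, but any copies you keep on other media (external drives, offsite copies) get no such protection. A minimal sketch of a filesystem-independent fixity check follows: record a SHA-256 digest per file once, then re-check later. The function names and the manifest layout are my own assumptions, not part of any particular archiving tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

You would build the manifest when a project is archived, keep a copy of it somewhere other than the archive itself, and run find_damage against each copy during the periodic reviews. Anything it reports gets restored from a copy that still matches.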
As difficult as it is for technologists, I would recommend immediately stopping thoughts about disks and technology. Break out your business problem into things that you have to make decisions about.
Example:
Be aware that if you store data in a lossy format, and then convert to another lossy format, and then another, your video quality will degrade with each transition.
The following is talking about audio, but the same generally applies:
http://www.vorbis.com/faq/#transcode
So it's probably best to pick a lossless format, because once you pick one lossy format, you're stuck with it.
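A toy illustration of that generation loss, as a sketch only: the "lossy codecs" here are just quantizers that truncate samples to multiples of different step sizes, which is a deliberate simplification and not how real video codecs work. zlib stands in for a lossless format. All names are made up for the example.

```python
import zlib


def lossy_encode(samples, step):
    """Toy 'lossy codec': truncate every sample down to a multiple of step."""
    return [s // step * step for s in samples]


def lossless_roundtrip(samples):
    """Compress and decompress with zlib; the output is bit-identical."""
    return list(zlib.decompress(zlib.compress(bytes(samples))))


def total_error(original, current):
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(a - b) for a, b in zip(original, current))


original = list(range(256))  # stand-in for 8-bit pixel or audio samples

lossy = original
lossless = original
for generation, step in enumerate([6, 10, 6, 10], start=1):
    lossy = lossy_encode(lossy, step)        # re-encode with a different lossy codec
    lossless = lossless_roundtrip(lossless)  # re-encode losslessly
    print(generation, total_error(original, lossy), total_error(original, lossless))
```

In this toy model the lossy column grows with each change of step size, while the lossless column stays at zero no matter how many times the data is re-encoded.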
Perhaps there's something I'm missing, but couldn't you encode everything using an open format where the source code for the codecs is available, and then just stick it all on Amazon S3?
That way Amazon have to worry about the actual storage of the data, and, unless there are no computers that can compile C/C++ in 30 years' time, you'll be able to get at the information...