So we have a file share that was started 10 years or so ago, and it started off with the best intentions. But now it's gotten bloated, there are files in there that nobody knows who put there, it's hard to find information, etc., etc. You probably know the problem. So what I'm wondering is what people do in this situation. Does anyone know of a decent program that can go through a file share and find files that nobody has touched? Duplicate files? Any other suggestions on cleaning this mess up?
Well, the file share is Windows-based and it's nearly 3TB. Is there a utility out there that can do some reporting for me? We like the idea of being able to find anything older than 6 months and then taking it to archive; the only problem is that with a file share this big that could be really hard to do by hand.
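Even a rough script that walks the share and dumps everything with an old last-access date into a CSV would be a start. Here's a minimal sketch in Python, assuming Python is available on a machine that can see the share, and that last-access timestamps haven't been disabled on the server; the cutoff and paths are just placeholders:

```python
import csv
import os
import sys
import time

SIX_MONTHS = 182 * 24 * 60 * 60  # roughly six months, in seconds

def report_stale_files(root, out_csv="stale_files.csv"):
    """Write a CSV of files whose last-access time is older than ~6 months.

    Note: NTFS last-access timestamps are only meaningful if access-time
    updates haven't been disabled on the server; fall back to st_mtime
    (last modified) if in doubt.
    """
    cutoff = time.time() - SIX_MONTHS
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes", "last_accessed"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # unreadable file; skip it
                if st.st_atime < cutoff:
                    writer.writerow([
                        path,
                        st.st_size,
                        time.strftime("%Y-%m-%d", time.localtime(st.st_atime)),
                    ])

if __name__ == "__main__":
    report_stale_files(sys.argv[1])  # e.g. r"\\fileserver\share"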
We counsel Customers to "scorch the earth" and start fresh, oftentimes.
I have yet to see a good solution that works that doesn't involve non-IT stakeholders. The best scenario I've seen yet is a Customer whose management identified "stewards" of various data areas and delegated control of the AD groups that control access to those shared areas to those "stewards". That has worked really, really well, but it has required some training on the part of the "stewards".
Here's what I know doesn't work:
Things that I've seen work (some well, others not-so-well):
I agree with Evan that starting over is a good idea. I've done 4 "file migrations" over the years at my current company, and each time we set up a new structure and copied (some) files over, backed up the old shared files and took them offline.
One thing we did on our last migration might work for you. We had a somewhat similar situation with what we called our "Common" drive, which was a place where anyone could read/write/delete. Over the years, a lot of files accumulated there as people shared things across groups. When we moved to a new file server, we set up a new Common directory, but we didn't copy anything to it for the users. We left the old Common in place (and called it Old Common), made it read-only, and told everyone they had 30 days to copy anything they wanted to the new directories. After that, we hid the directory, but we would un-hide it on request. During this migration, we also worked with all the departments, created new shared directories, and helped people identify duplicates.
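For the duplicate-spotting part, a rough Python sketch like this can flag candidate duplicates by grouping on size and then checksumming; the share path is a placeholder and it assumes you can run Python on a box that sees the share:

```python
import hashlib
import os
import sys
from collections import defaultdict

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't blow up memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    # Group by size first; only files that share a size need hashing.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                pass  # skip files we can't stat (permissions, broken links)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                pass
    return [paths for paths in by_hash.values() if len(paths) > 1]

if __name__ == "__main__":
    for group in find_duplicates(sys.argv[1]):  # e.g. r"\\fileserver\share"
        print("Possible duplicates:")
        for p in group:
            print("   ", p)
```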
We've used TreeSize for years for figuring out who's using disk space. We've tried SpaceHound recently and some of my co-workers like it, but I keep going back to TreeSize.
After our most recent migration, we tried setting up an Archive structure that people could use on their own, but it hasn't worked very well. People just don't have the time to keep track of what's active and what's not. We're looking at tools that could do the archiving automatically, and in our case it would work to periodically move all the files that haven't been touched for 6 months off to another share.
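If you end up scripting that kind of sweep yourself rather than buying a tool, a minimal Python sketch might look like the following. It runs as a dry run by default so you can review the list before anything moves; the paths and the six-month cutoff are placeholders, and it assumes last-access times are still being updated on the volume:

```python
import os
import shutil
import sys
import time

CUTOFF_DAYS = 182  # roughly six months

def archive_stale_files(src_root, archive_root, dry_run=True):
    """Move files not accessed in ~6 months to an archive share,
    preserving the relative directory structure."""
    cutoff = time.time() - CUTOFF_DAYS * 86400
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                st = os.stat(src)
            except OSError:
                continue  # unreadable file; skip it
            if st.st_atime >= cutoff:
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(archive_root, rel)
            if dry_run:
                print("would move", src, "->", dst)
            else:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)

if __name__ == "__main__":
    # e.g. archive_stale_files(r"\\fileserver\share", r"\\archive\share")
    archive_stale_files(sys.argv[1], sys.argv[2])
```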
At 3TB you probably have a lot of huge unnecessary files and duplicated junk in there. One useful method I've found is to do searches, starting with files > 100MB (I might even go up to 500MB in your case) and then working the threshold down. It makes the job of finding the real space wasters more manageable.
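If you don't have a reporting tool handy, even a small Python sketch can produce that list; the 500MB starting threshold and the share path are just placeholders:

```python
import os
import sys

def files_larger_than(root, threshold_mb):
    """Return (size, path) for every file above the threshold, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable file; skip it
            if size > threshold_mb * 1024 * 1024:
                hits.append((size, path))
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    # Start high (e.g. 500 MB) and work the threshold down as you clean up.
    for size, path in files_larger_than(sys.argv[1], threshold_mb=500):
        print(f"{size / (1024 ** 2):10.1f} MB  {path}")
```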
My first order of business would be to use an enterprise file manager/analyzer/reporter/whatever-you-want-to-call-it such as TreeSize Professional or SpaceObServer. You can see what files are where, and sort by creation date, access date and a host of other criteria, including statistics on file types and owners. SpaceObServer can scan various file systems including remote Linux/UNIX systems via an SSH connection. That can give you great visibility into your collection of files. From there, you can "Divide and Conquer".
You might want to consider just blanket archiving anything more than six months old to another share, and watching for file accesses on that share. Files that are consistently accessed can be moved back to the primary server.
Another option is something like the Google Search Appliance. That way you can let Google's app smartly figure out what people are looking for when they search for things and it will "archive" by putting less-accessed documents further down on the search page.
On our Windows 2003 R2 file server we use the built-in reporting functionality of File Server Resource Manager; it will send you least-recently-used file lists along with other reports.
Perhaps the first step is to get an idea of the size of the problem. How much space is occupied by the file share? How many files are we talking about?
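A quick way to get those numbers without a dedicated tool is a short Python sketch like this; the share path is a placeholder, and this rough version doesn't count loose files sitting directly in the root:

```python
import os
import sys

def summarize(root):
    """Total size and file count per top-level folder, to size up the problem."""
    totals = {}
    for entry in os.scandir(root):
        if not entry.is_dir():
            continue
        size = count = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    size += os.path.getsize(os.path.join(dirpath, name))
                    count += 1
                except OSError:
                    pass  # skip files we can't stat
        totals[entry.name] = (size, count)
    return totals

if __name__ == "__main__":
    results = sorted(summarize(sys.argv[1]).items(),
                     key=lambda kv: kv[1][0], reverse=True)
    for folder, (size, count) in results:
        print(f"{size / 1024 ** 3:8.1f} GB  {count:8d} files  {folder}")
```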
If you're lucky, you'll find that certain portions of the file share follow naming conventions, either on a per-user, per-business process, or per-department basis. This can help you parcel out the task of triaging the files.
In a worst-case scenario, you can take the whole thing offline and wait to see who complains. Then you can find out who they are and what they were using it for. (Evil, but it works.)
I think the best solution is to move to a new drive. If the number of people accessing the share is reasonable, ask them and find out which parts are truly needed. Move those to the new share. Then encourage everyone to use the new share. After some period of time, take down the old share. See who screams and then move that data over to the new share. If no one asks for something for 3-6 months, you can safely delete or archive it.
I move all existing data onto a new read-only shared folder: if the end user needs to update a file, they can copy it into the fresh new shared drive.
This way, all the old stuff stays available, but I can take it out of the backup schedule.
On top of that, once a year I remove folders that haven't been updated or accessed for 3 years (after checking that the archive is healthy).