We had someone steal some files before quitting, and it has now come down to a lawsuit. I've been given a CD of files, and I have to "prove" that they are our files by matching them against the files on our own file server.
I don't know whether this is just for our lawyer, evidence for court, or both. I also realize that I am not an impartial third party.
While thinking about how to "prove" these files came from our servers, we realized I also have to prove we had the files before receiving the CD. My boss took screenshots of Windows Explorer windows showing the names and creation dates of the files in question and emailed them to our lawyer the day before we received the CD. I would have liked to provide MD5 sums as well, but I wasn't involved in that part of the process.
My first thought was to use the Unix diff program and provide the console output. I also thought I could pair it with MD5 sums of both our files and theirs. Both of these can easily be faked, though.
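If you do go the diff-plus-hash route, pairing the files up by relative path first keeps the comparison systematic instead of one file at a time. A minimal sketch, assuming the CD contents and the server copies are each available as a local directory tree; the function names are illustrative, and it uses SHA-256 rather than MD5. On its own it does nothing about the "easily faked" problem; that is what an audit trail is for.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(ours: Path, theirs: Path) -> dict:
    """Pair files by relative path and classify each pair by content hash."""
    ours_files = {p.relative_to(ours): p for p in ours.rglob("*") if p.is_file()}
    theirs_files = {p.relative_to(theirs): p for p in theirs.rglob("*") if p.is_file()}
    report = {"identical": [], "different": [], "only_ours": [], "only_theirs": []}
    for rel in sorted(ours_files.keys() | theirs_files.keys()):
        name = rel.as_posix()  # forward slashes regardless of OS
        if rel not in theirs_files:
            report["only_ours"].append(name)
        elif rel not in ours_files:
            report["only_theirs"].append(name)
        elif sha256_of(ours_files[rel]) == sha256_of(theirs_files[rel]):
            report["identical"].append(name)
        else:
            report["different"].append(name)
    return report
```

Files that end up in `different` are the ones worth running through diff.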
I'm at a loss as to what I should actually provide, and also as to how to provide an auditable trail that reproduces my findings, so that a third party can verify them if it comes to that.
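One way to make findings reproducible is to record a hash manifest in the same plain-text format that GNU coreutils' `sha256sum` prints, so a third party can re-check it with `sha256sum -c manifest.sha256`, run from the top of the directory, without trusting any script of yours. A sketch, assuming file names contain no newlines or backslashes (sha256sum escapes those; this sketch does not); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Write '<sha256>  <relative/path>' lines (the format `sha256sum -c` reads).

    Assumes no newlines or backslashes in file names. Returns the entry count.
    """
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p != manifest:
            lines.append(f"{file_sha256(p)}  {p.relative_to(root).as_posix()}\n")
    # Write bytes so line endings stay '\n' on every OS.
    manifest.write_bytes("".join(lines).encode("utf-8"))
    return len(lines)


def verify_manifest(root: Path, manifest: Path) -> list:
    """Re-hash every listed file; return the paths that are missing or changed."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or file_sha256(target) != digest:
            failures.append(rel)
    return failures
```

Writing the manifest outside the tree being hashed, and sending a copy to the lawyer the day it is made, is what gives it value as a dated record.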
Does anyone have any experience with this?
Facts about the case:
- The files came from a Windows Server 2003 file server.
- The incident happened over a year ago, and the files haven't been modified since before the incident.
The technical issues are pretty straightforward. Using a combination of SHA and MD5 hashes is pretty typical in the forensics industry.
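Computing both digests in one read of each file avoids a second pass over large files. A sketch using only Python's `hashlib`; the answer doesn't say which SHA variant, so SHA-256 is an assumption here, and the function name is illustrative.

```python
import hashlib
from typing import BinaryIO


def md5_and_sha256(stream: BinaryIO, chunk_size: int = 1 << 20) -> dict:
    """Hash a binary stream with MD5 and SHA-256 in a single read pass."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Usage would be along the lines of `with open(path, "rb") as f: print(md5_and_sha256(f))`. MD5 alone has known collision attacks, which is one reason it is usually paired with a SHA-family hash.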
If you're talking about text files that might have been modified (say, source code files), then performing some kind of structured "diff" would be pretty common. I can't cite cases, but there's definitely precedent for arguing that the "stolen" file is a derivative work of the "original".
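Python's standard `difflib` can produce the same unified format as `diff -u`, plus a rough line-level similarity score, which may help when the "stolen" copy was edited rather than copied verbatim. A sketch; the function names and the choice of line-based matching are assumptions.

```python
import difflib


def unified_text_diff(original: str, suspect: str,
                      original_name: str = "original",
                      suspect_name: str = "suspect") -> str:
    """Return a unified diff (the format of `diff -u`) of two text documents."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        suspect.splitlines(keepends=True),
        fromfile=original_name,
        tofile=suspect_name,
    ))


def line_similarity(original: str, suspect: str) -> float:
    """SequenceMatcher ratio over lines: 2 * matching lines / total lines."""
    return difflib.SequenceMatcher(
        None, original.splitlines(), suspect.splitlines()
    ).ratio()
```

Because the ratio just counts matching lines, it says nothing about which lines matter; the diff itself is what someone would actually review.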
Chain-of-custody issues are a LOT more of a worry for you than proving that the files match. I'd talk to your attorney about what they're looking for, and would strongly consider getting in touch with an attorney experienced in this type of litigation, or with a computer forensics professional, to get advice on the best way to proceed so that you don't blow your case.
If you actually received a copy of the files, I hope you did a good job of maintaining a chain of custody. If I were opposing counsel, I'd argue that you received the CD and used it as the source material to produce the "original" files that were "stolen". I'd have kept that CD of "copied" files far, far away from the "originals" and had an independent party perform the "diffs".
Typically your attorney should already have a lot of this under control.
To prove the files are the same, use MD5 hashes. More than that, though, you need to prove chain of custody with auditable trails. If someone else has had the files in their custody, you will have a hard time proving in court that the evidence wasn't 'planted'.
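Part of an auditable trail can be a custody log in which each entry records who did what to which piece of evidence (identified by its hash) and also includes the hash of the previous entry, so editing or reordering an earlier entry breaks the chain. A sketch of that idea with hypothetical field and function names. It cannot detect the newest entries being dropped, or the whole log being rewritten by whoever holds it, so the latest `entry_hash` would need to be sent to someone else, such as counsel, from time to time.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry (minus its own hash)."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_custody_event(log: list, who: str, action: str, evidence_sha256: str,
                         when: Optional[datetime] = None) -> dict:
    """Append an event whose hash also covers the previous entry's hash."""
    entry = {
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "who": who,
        "action": action,
        "evidence_sha256": evidence_sha256,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry["entry_hash"] = _entry_hash(entry)
    log.append(entry)
    return entry


def verify_custody_log(log: list) -> bool:
    """True if every entry still matches its hash and links to the one before it."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["entry_hash"] != _entry_hash(entry):
            return False
        prev = entry["entry_hash"]
    return True
```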
There are electronic evidence and forensics companies that deal specifically with this issue. Depending on how seriously your company takes this case, you may need to hire a lawyer who knows this area and can refer you to a firm that can help you through the process.
An important question is how you log access to your firm's files, and how you manage version control over your firm's files.
As for the files themselves, you want a tool like diff rather than a tool like md5, because you may need to demonstrate that the files are the "same" except that one has one copyright notice at the top and the other has a different one. The hashes of two such files won't match at all, while diff shows that only the header differs.
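A minimal sketch of that comparison, assuming the differing notice is the block of comment or blank lines at the very top of each file (`//`, `/* ... */`, or `#` comments). The regex and function names are illustrative, and headers written any other way would need a different rule.

```python
import difflib
import re

# Assumed header rule: leading lines that are blank or start a //, /*, *, or
# "# " comment. "#include"-style lines are NOT treated as comments.
_HEADER_LINE = re.compile(r"^\s*(//|/\*|\*|#(\s|$)|$)")


def strip_leading_header(text: str) -> list:
    """Return the file's lines with the leading comment/blank block removed."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and _HEADER_LINE.match(lines[i]):
        i += 1
    return lines[i:]


def same_except_header(a: str, b: str) -> bool:
    """True if the files are identical once their leading headers are removed."""
    return strip_leading_header(a) == strip_leading_header(b)


def body_diff(a: str, b: str) -> list:
    """Unified diff of the two bodies after the headers are stripped."""
    return list(difflib.unified_diff(strip_leading_header(a),
                                     strip_leading_header(b),
                                     "ours", "theirs", lineterm=""))
```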
Ideally you can demonstrate exactly where the files in question came from, when they would have been copied from your environment, who had access to them at the time, and who made copies of them.
a) Yes, I have experience with this.
b) The answers above about using hashes answer only the question you asked in the title of this thread, not in the body. To prove you had them before you got the CD-ROM, you will need to provide logs of when they were last touched, something you probably don't have because this kind of information is rarely kept.
c) Having said that, your company probably does keep backups, and those backups have dates on them, and those backups can have files selectively restored from them for matching. If your company has a written backup policy, and the backups you kept match the policy, this will make it much easier to convince someone that you didn't fake the backups. If you don't have a policy but the backups are clearly marked, that might be sufficient (although the lawyer for the other side will question this up the wazoo).
d) If your company didn't keep backups, and all you have is the described screen shots, forget about it. You will have a very hard time convincing anyone that you are in control of your data well enough to "prove" that you had those files first.
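Assuming files from the backups in point c) have been restored and hashed, the matching step can be phrased as: for each file on the CD, find the oldest backup set written before the CD arrived that contains byte-identical content. A sketch with a hypothetical `BackupSet` shape (a label, the write date taken from the backup catalog or tape label, and a path-to-SHA-256 map):

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class BackupSet:
    """Hypothetical record of one restored backup set."""
    label: str
    written_on: date
    hashes: Dict[str, str]  # relative path -> SHA-256 of the restored file


def earliest_matching_backup(digest: str, backups: List[BackupSet]) -> Optional[BackupSet]:
    """Oldest backup set holding a file with exactly this content hash."""
    matches = [b for b in backups if digest in b.hashes.values()]
    return min(matches, key=lambda b: b.written_on, default=None)


def predates_receipt(cd_hashes: Dict[str, str], backups: List[BackupSet],
                     cd_received: date) -> Dict[str, Optional[str]]:
    """For each CD file, the label of the earliest pre-receipt backup with
    identical content, or None if no such backup exists."""
    before = [b for b in backups if b.written_on < cd_received]
    result = {}
    for rel, digest in cd_hashes.items():
        best = earliest_matching_backup(digest, before)
        result[rel] = best.label if best else None
    return result
```

Matching on content rather than path means a file that was renamed on the CD still finds its backup.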
diff is what I'd use; I think you're on the right track.
I was thinking of md5sum and comparing checksums, but any little difference would change the checksum completely.
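That sensitivity is easy to show: change one character and the MD5 digest changes completely, typically flipping around half of its 128 bits. A small illustration with made-up file contents:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the digest bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = b"Quarterly pricing model, revision 7\n"
tweaked = b"Quarterly pricing model, revision 8\n"  # one character changed

a, b = md5_hex(original), md5_hex(tweaked)
print(a)
print(b)
print(differing_bits(a, b), "of 128 bits differ")
```

So a matching hash shows the files are identical, but a non-matching hash says nothing about how similar they are; that is what diff is for.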
You should also have backups on tape or elsewhere to prove you had the files before XYZ time, since anyone could argue that you copied the files from the CD to the server (creation dates can be altered by cleverly changing the clock settings, pictures can be photoshopped, etc.).
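To show how weak file timestamps are as evidence: you don't even need to touch the clock, because whoever controls a file can set its modification time directly. A Python sketch with illustrative function names; `os.utime` changes only the access and modification times, while the Windows creation time shown in Explorer is a separate attribute, settable through the Win32 `SetFileTime` call.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def backdate_mtime(path: Path, when: datetime) -> None:
    """Set a file's access and modification times to `when`."""
    ts = when.timestamp()
    os.utime(path, (ts, ts))


def mtime_of(path: Path) -> datetime:
    """Read the modification time back as an aware UTC datetime."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
```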
You really need to find a way to establish, whether through backups or some other proof, that you had the files first, since for some reason they gave you exactly the files that could have been used to conveniently manufacture your story (why did they do that?).
Find out from your lawyer (ideally one who knows technology) exactly what is needed, and perhaps talk to security people who specialize in digital forensics.
The fact is that unless someone here is a lawyer, all we can tell you is how to compare those files (md5sum), and that your best defense is probably old backup media establishing that you had the files before getting the CD, and ideally before XYZ left with your data. Were some of the files ever emailed, so that you have timestamps from that? Is that mail still in archived data?
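If some of the files were emailed before XYZ left, an archived message gives you a dated copy whose attachments can be hashed and compared to the CD. A sketch using Python's standard `email` package, assuming the message is available as raw RFC 5322 bytes (e.g. an exported `.eml`); the function name is hypothetical. Keep in mind that the Date header is written by the sender's mail client, so it is only as trustworthy as the archive it came from.

```python
import hashlib
from datetime import datetime
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple


def attachments_with_date(raw: bytes) -> Tuple[Optional[datetime], List[Tuple[str, str]]]:
    """Return the message's Date header and (filename, SHA-256) per attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    date_header = msg["Date"]
    sent = parsedate_to_datetime(str(date_header)) if date_header else None
    found = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""  # undo base64/QP encoding
        found.append((part.get_filename(), hashlib.sha256(payload).hexdigest()))
    return sent, found
```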