A client sent us an external hard drive on which at least half the files are corrupt. They are a broad mix of file types (images, documents, etc.), and there is no discernible pattern as to which ones are corrupt. They still show their original size, but when I open them in a hex editor they contain nothing but nulls: the data has been completely replaced with 00s.
What could cause this? The files were most likely copied onto the drive from another machine. Could this result from problems during the transfer, or is it more likely that the files were already corrupt at the origin?
It seems the metadata was written correctly, so the files appear in the directory tree and have names, access modes, etc., but the data itself is corrupt (it never reached the media).
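Because the damaged files keep their size but contain only 0x00 bytes, they are easy to find mechanically. Here is a minimal Python sketch (the function names are hypothetical) that walks a directory tree, such as the mounted client drive, and lists every non-empty file consisting entirely of null bytes:

```python
import os
import sys

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files don't exhaust memory


def is_zero_filled(path: str) -> bool:
    """Return True if the file is non-empty and every byte is 0x00."""
    if os.path.getsize(path) == 0:
        return False  # empty files are a different symptom; not counted here
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return True  # reached EOF without seeing a non-zero byte
            # bytes.count(0) counts 0x00 bytes; any shortfall means real data
            if chunk.count(0) != len(chunk):
                return False


def find_zero_filled(root: str):
    """Walk a directory tree and yield the paths of zero-filled files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_zero_filled(path):
                    yield path
            except OSError as exc:
                # unreadable files and broken symlinks are reported, not fatal
                print(f"skipped {path}: {exc}", file=sys.stderr)
```

For example, `for p in find_zero_filled("/media/client-drive"): print(p)`, where the mount point is an assumption. Running the same scan on the source machine, if it is still available, would help answer whether the files were already corrupt at the origin.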
How this is possible depends on the file system, mount options, caching modes for the drive and so on.
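On Linux you can check which file system and mount options the external drive is actually using by reading `/proc/mounts`. Each line holds the device, mount point, file system type and a comma-separated option list, followed by two numeric fields, with spaces and similar characters escaped as octal (`\040`). Below is a small sketch, assuming a Linux system. The function names are illustrative. It only reports mount options; the drive's own write-cache setting is not visible here.

```python
import re
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def _unescape(field: str) -> str:
    # /proc/mounts escapes space, tab, newline and backslash as \ooo (octal)
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> List[Mount]:
    """Parse the text of /proc/mounts into Mount records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        dev, mnt, fstype, opts = fields[:4]
        mounts.append(Mount(_unescape(dev), _unescape(mnt), fstype, opts.split(",")))
    return mounts


def read_mounts(path: str = "/proc/mounts") -> List[Mount]:
    with open(path) as f:
        return parse_mounts(f.read())
```

For instance, filtering `read_mounts()` for the drive's mount point shows whether it is ext4, NTFS, exFAT and so on, plus any non-default options such as a `data=` journaling mode.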
Take ext4 as an example; there it is relatively easy to make this happen. With the default mount options the journal covers metadata only, so the file system generally guarantees that its on-disk structures stay consistent no matter what: after a crash, each operation looks either as if it never happened or as if it was applied completely, much like a transaction in an ACID database. The data itself, however, is not journalled by default. So the system call can complete and report success to the application, and all the necessary structures can be created (for now, only in the journal), while the data is still sitting in the cache... and then the power is cut. When you power the system up again and mount the volume, the file system driver replays the journal and the files appear, but their contents are whatever was left in those blocks from previous use. That garbage may well be zeros.

In short, cutting the power during a write is likely to produce zero-filled files. I'd expect the same result from disconnecting the drive too early (for example, pulling out the USB cable), and since you're talking about an external drive, that scenario is quite likely. This is certainly possible with other file systems as well.
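The gap described above is between `write()` returning and the data actually reaching the disk. A program can close that gap for its own files with `fsync`. The following is a minimal Python sketch, assuming Linux or another POSIX system; `durable_write` is a hypothetical helper, not part of any library. Even `fsync` relies on the drive honoring cache-flush requests, and ordinary copy tools generally leave data in the page cache rather than syncing each file, which is why unmounting (or running `sync`) before unplugging the drive matters.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to push it to stable storage.

    A plain write() only guarantees the bytes are in the kernel's page cache;
    os.fsync() blocks until the kernel reports that the file's data and
    metadata have been handed to the storage device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        os.fsync(fd)  # flush the file contents and its inode
    finally:
        os.close(fd)

    # A new file's directory entry lives in the parent directory, so sync that
    # too (opening a directory read-only like this works on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `fsync` after every file makes bulk copies noticeably slower, which is one reason copy tools usually leave flushing to the kernel and to a proper unmount.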