I'm experimenting with deduplication on a Server 2012 R2 storage space. I let it run the first dedupe optimisation last night, and I was pleased to see that it claimed a 340GB reduction.
However, I knew that this was too good to be true. On that drive, 100% of the dedupe savings came from SQL Server backups.
That seems unrealistic, considering that there are database backups 20x that size in the folder. As an example: it reckons that a 13.3GB backup file has been deduped to 0 bytes. And of course, that file didn't actually work when I did a test restore of it.
To add insult to injury, there is another folder on that drive that has almost a TB of data in it that should have deduped a lot, but hasn't.
Does Server 2012 R2 deduplication work?
Deduplication does work.
With deduplication, the Size on disk field becomes meaningless. The files are no longer regular "files" but reparse points; they don't contain the actual data, only metadata for the dedup engine to reconstruct the file. It is my understanding that you cannot get per-file savings, because the dedup chunk store is per volume, so you only get per-volume savings. http://msdn.microsoft.com/en-us/library/hh769303(v=vs.85).aspx
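For example, the real savings are reported at the volume level by the Deduplication PowerShell cmdlets; a quick sketch (E: is just a placeholder for your dedup-enabled volume):

    # Per-volume savings; per-file numbers in Explorer are not reliable
    Get-DedupVolume -Volume E: | Format-List Volume, SavedSpace, SavingsRate
    Get-DedupStatus -Volume E: | Format-List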
Perhaps your dedup job simply hadn't completed yet, which would explain why some other data wasn't deduped. It's not super-fast, it is time-limited by default, and it might be resource-constrained depending on your hardware. Check the dedup schedule from Server Manager.
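You can also inspect the schedule and kick off a manual run from PowerShell; a minimal sketch (the E: volume is again an assumption):

    Get-DedupSchedule                               # when background jobs are allowed to run
    Get-DedupJob                                    # any job currently in progress
    Start-DedupJob -Volume E: -Type Optimization    # force an optimization pass now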
I have deployed dedup on several Windows 2012 R2 systems in different scenarios (SCCM DP, different deployment systems, generic file servers, user home-folder file servers, etc.) for about a year now. Just make sure you're fully patched; I remember several patches to dedup functionality (both Cumulative Updates and hotfixes) since RTM.
However, there are known issues where some systems cannot read data directly from optimized files on the local system (IIS, and SCCM in some scenarios). As suggested by yagmoth555, you should either try Expand-DedupFile to unoptimize the file, or just make a copy of it (the target file will remain unoptimized until the next optimization run) and retry. http://blogs.technet.com/b/configmgrteam/archive/2014/02/18/configuration-manager-distribution-points-and-windows-server-2012-data-deduplication.aspx https://kickthatcomputer.wordpress.com/2013/12/22/no-input-file-specified-windows-server-2012-dedupe-on-iis-with-php/
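Expand-DedupFile is a one-liner; the path below is purely illustrative:

    # Rehydrate a deduplicated file back into a normal, self-contained file
    Expand-DedupFile -Path 'D:\Backups\MyDatabase.bak'

Just make sure the volume has enough free space to hold the fully rehydrated file first.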
If your SQL backup is actually corrupted, I believe that's down to a different issue and not related to the deduplication technology.
It looks like I may have jumped the gun when I said this sort of deduplication wasn't possible. Apparently it is entirely possible, because in addition to these uncompressed SQL Server backups, I also have VMware snapshot-level backups of the host VMs.
As yagmoth555 suggested, I ran an

    Expand-DedupFile

on some of these 0-byte files and got a totally usable file back at the end of it. I then looked at my testing methodology for how I had determined that the files were no good, and I found a flaw in my tests (permissions!).
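For anyone repeating this test, one simple sanity check is to rehydrate the file and compare its hash against a known-good copy; both paths here are hypothetical:

    Expand-DedupFile -Path 'D:\Backups\BigDatabase.bak'
    # Should match the hash of a copy taken before optimization
    (Get-FileHash 'D:\Backups\BigDatabase.bak').Hash -eq (Get-FileHash 'E:\Reference\BigDatabase.bak').Hash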
I also opened a 0-byte deduped backup file in a hex editor, and everything looked OK.
So I adjusted my testing methodology, and everything actually does seem to work. As I left it running, the dedupe savings kept improving, and I've now saved more than 1.5TB of space thanks to dedupe.
I am going to test this more thoroughly before I push it into production, but right now it looks promising.
Yes, but I have only seen the case of a Hyper-V cluster database being dedup'ed: 4TB down to 400GB, and the VM was running. The OS was fully patched.
For your SQL backup file, is it a dump whose contents you can read? I would check the content. I can't say how well dedup handles an ASCII file like that.
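One way to check the backup's integrity directly is SQL Server's own verification; a minimal sketch (the instance name and path are assumptions):

    # Ask SQL Server to validate the backup without actually restoring it
    sqlcmd -S localhost -Q "RESTORE VERIFYONLY FROM DISK = N'D:\Backups\BigDatabase.bak'"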