I mean, I can look up the dictionary definition, but why is everyone suddenly talking about it in reference to virtual tape libraries? What's "new" here so that it's so much in the news lately?
Deduplication means examining the content of a data set, finding the duplicate pieces, storing each piece just once, and replacing every other copy with a pointer back to that single stored copy. It is particularly helpful with backups because so much of the data is the same when you back up things like servers. Imagine, for instance, that you are backing up 1,000 Windows servers: much of the content on those boxes will be identical.
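To make the "store once, point back to it" idea concrete, here is a minimal sketch in Python. It assumes fixed 4 KiB blocks and SHA-256 fingerprints, which are my choices for illustration and not a description of how any particular product works. The names `make_block`, `dedup_write` and `restore` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def make_block(label: str) -> bytes:
    """Build a deterministic 4 KiB block of filler data for the demo."""
    return hashlib.sha256(label.encode()).digest() * (BLOCK_SIZE // 32)


def dedup_write(data: bytes, store: dict) -> list:
    """Split data into blocks, keep one copy of each, return the pointers."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # stored only on first sighting
        pointers.append(fingerprint)          # every copy becomes a pointer
    return pointers


def restore(pointers: list, store: dict) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


# Three hypothetical servers: the same 8-block OS image plus 1 unique block.
os_image = b"".join(make_block(f"os-{i}") for i in range(8))
servers = [os_image + make_block(f"server-{n}") for n in range(3)]

store = {}
backups = [dedup_write(s, store) for s in servers]

logical_blocks = sum(len(p) for p in backups)
stored_blocks = len(store)
print(f"{logical_blocks} blocks backed up, {stored_blocks} blocks stored")
```

In this toy run, 27 blocks are backed up but only 11 are stored, because the shared OS image is kept once. Real products differ in block size, fingerprinting and how they guard against hash collisions.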
Deduplication is so popular today for three reasons:
1. Lately everyone is obsessed with building disaster recovery solutions that utilize off-site servers. To do this, you have to replicate a lot of production data to the remote site, and bandwidth is a huge problem. Any reduction in the amount of data you have to replicate helps a lot.
2. The amount of data companies are retaining is exploding, thanks to cheaper storage and multi-industry requirements for retention of records.
3. The technology only relatively recently hit the sweet spot. We've had things like deduplication for a long time (single-instance storage, etc.), and they have helped, but only in the last year or so has real deduplication that can significantly reduce the amount of storage hit the mainstream.
One of the things we found out at my company in working with NetApp is that deduplication really only works well in a VM environment if your drives are aligned. That is a problem for us, as we have a lot of Windows Server 2003 machines and none of their drives are aligned. With misaligned drives you recover barely a fourth of the space you could recover if they were aligned correctly.
We are being told, though, that once the drives are aligned correctly we should be able to recover 40-60% of our space with dedup.
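A toy model may help show why alignment could matter so much for block-level dedup. This is a sketch under assumptions, not a description of NetApp's implementation: it assumes 4 KiB storage blocks, a guest partition starting at sector 63 (63 x 512 = 32,256 bytes, which is not a multiple of 4,096), and two guests holding the same file blocks at different positions on disk (modelled here simply by reversing the order). All names are hypothetical.

```python
import hashlib

BLOCK = 4096   # assumed storage block size
SECTOR = 512   # assumed sector size


def guest_block(n: int) -> bytes:
    """Deterministic 4 KiB of filler standing in for one guest file block."""
    return hashlib.sha256(str(n).encode()).digest() * (BLOCK // 32)


def disk_image(order, partition_offset: int) -> bytes:
    """Empty space up to the partition start, then the guest's blocks."""
    return bytes(partition_offset) + b"".join(guest_block(n) for n in order)


def block_hashes(disk: bytes) -> set:
    """Fingerprints of the disk as the storage layer would chunk it."""
    return {hashlib.sha256(disk[i:i + BLOCK]).digest()
            for i in range(0, len(disk), BLOCK)}


def shared_fraction(offset: int) -> float:
    """Fraction of guest A's storage blocks that also appear in guest B."""
    a = block_hashes(disk_image(range(64), offset))
    b = block_hashes(disk_image(reversed(range(64)), offset))
    return len(a & b) / len(a)


aligned = shared_fraction(64 * SECTOR)     # 32,768 bytes: multiple of 4 KiB
misaligned = shared_fraction(63 * SECTOR)  # 32,256 bytes: straddles blocks

print(f"aligned:    {aligned:.0%} of blocks shared")
print(f"misaligned: {misaligned:.0%} of blocks shared")
```

With the aligned offset, each guest block lands exactly on one storage block, so identical data matches wherever it sits on disk. With the misaligned offset, every storage block holds the tail of one guest block and the head of the next, so a match needs both neighbours to be identical too, and in this model almost nothing is shared. Real-world savings will fall somewhere in between and depend on the data.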