Now that Windows Server 2012 comes with de-duplication features for NTFS volumes, I am having a hard time finding technical details about them. I can deduce from the TechNet documentation that the de-duplication action itself is an asynchronous process - not unlike how the SIS Groveler used to work - but there is virtually no detail about the implementation (algorithms used, resources needed; even the info on performance considerations is nothing but a bunch of rule-of-thumb-style recommendations).
Insights and pointers are greatly appreciated; a comparison with Solaris' ZFS de-duplication efficiency for a set of scenarios would be wonderful.
As I suspected, it's based on the VSS subsystem (source), which also explains its async nature. The de-dupe chunks are stored in \System Volume Information\Dedup\ChunkStore\*, with settings in \System Volume Information\Dedup\Settings\*.
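The on-disk format of that chunk store is undocumented, but the general shape of content-addressed chunk storage is easy to illustrate. This Python sketch (the names and layout are mine, not Microsoft's) shows the core idea: each unique chunk is persisted once under a hash-derived key, and a deduplicated file is reduced to a list of chunk references (a reparse point into the chunk store, in NTFS's case).

```python
import hashlib
import os

class ChunkStore:
    """Toy content-addressed chunk store. Illustrative only; NOT the
    actual on-disk format used under \\System Volume Information\\Dedup,
    which is undocumented."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, chunk: bytes) -> str:
        # Identical chunks hash to the same key, so each unique chunk
        # is written to disk exactly once.
        key = hashlib.sha256(chunk).hexdigest()
        path = os.path.join(self.root, key)
        if not os.path.exists(path):      # first sighting: persist it
            with open(path, "wb") as f:
                f.write(chunk)
        return key                        # a file becomes a list of keys

    def get(self, key: str) -> bytes:
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()
```

Storing by hash is what makes the store single-instance: calling put() twice with identical data writes one file and returns the same key both times.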
This has significant impact on how your backup software interacts with such volumes, which is explained in the linked article (in brief: without dedupe support your backups will be the same size as they always are; with dedupe support you'll just back up the much smaller dedupe store).

As for the methods used, the best I could find was a research paper put out by a Microsoft researcher in 2011 (source, fulltext) at the USENIX FAST '11 conference. Section 3.3 goes into Deduplication in Primary Storage, and it seems likely that this data was used in the development of the NTFS dedupe feature.
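Chunk-level numbers in studies like this typically rest on content-defined chunking: instead of fixed-size blocks, a rolling hash over a sliding window picks boundaries from the data itself, so an insertion near the start of a file only disturbs the chunks around it rather than shifting every block boundary after it. Here is a naive sketch of the technique (Python; the window, mask, and size limits are illustrative stand-ins, not the parameters Windows Server 2012 actually uses):

```python
def chunk_boundaries(data: bytes, window=48, mask=0xFFF):
    """Split data at content-defined boundaries using a naive rolling sum.

    Production chunkers use Rabin fingerprints; the parameters here are
    toy values, not those of the Windows Server 2012 dedup engine.
    """
    MIN_CHUNK, MAX_CHUNK = 2048, 65536
    chunks, start, rolling = [], 0, 0
    for i in range(len(data)):
        rolling += data[i]
        if i - start >= window:
            rolling -= data[i - window]      # keep the sum over the last `window` bytes
        boundary = (rolling & mask) == mask  # data-dependent cut point
        size = i - start + 1
        if (boundary and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0        # restart the window in the next chunk
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks
```

Feeding each chunk from chunk_boundaries() into the ChunkStore.put() above is, in miniature, the optimization job the 2012 feature runs asynchronously in the background.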
There is a lot of data in the paper to sift through, but the complexity of the toolset they used, combined with the features we know are already in 2012, strongly suggests that the reasoning in the paper was used to develop the features. We can't know for certain without MSDN articles, but this is as close as we're likely to get for the time being.
Performance comparisons with ZFS will have to wait until the benchmarkers get done with it.