I'm just wondering how folks maintain ongoing file-system stability when using Windows Server as a file server, without taking the system offline to run chkdsk /f or chkdsk /r. Obviously you don't want a file server to be unavailable, and file servers now hold so much storage that a chkdsk could take days to run. So how are you protecting your data from corruption?
In my opinion chkdsk is not a tool for performing preventive maintenance. If you're having to run chkdsk on a regular basis to correct problems then you have an underlying problem that needs to be solved.
I maintain file-servers with around 7TB of general user data. That 7TB is made up mostly of office-type files, so we're talking millions of them. I don't have an exact number because it takes so long to get one, but it's somewhere between 7 and 12 million files across the various file-systems on our Server 2008 fail-over cluster.
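As an aside, if you ever want a rough file count without walking the whole tree, the size of the MFT gives a usable estimate on NTFS, since each file record normally occupies 1 KB. It's an approximation, not an exact count (D: below is just a placeholder for your own volume):

    rem Dump NTFS internals for the volume, including MFT size
    fsutil fsinfo ntfsinfo D:

    rem Divide the reported "Mft Valid Data Length" by 1024 (bytes per
    rem file record) for a rough count of file records on the volume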
We never run chkdsk except to fix problems, and we never defrag.
NTFS is now self-healing enough that we run into problems very, very rarely. When we do, it's generally due to a fault somewhere in the storage infrastructure: a spontaneous fibre-channel array controller reboot, an FC switch panic-and-reboot, that kind of thing. Yanking the power out of the back of the server is eminently survivable.
In fact, we recently survived a catastrophic UPS failure. The entire room dropped hard, simultaneously. NTFS recovered with nary a peep, and no need to run chkdsk.
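If you want to confirm that self-healing is actually enabled on a volume (it is by default on Server 2008 and later), fsutil will tell you; again, D: is just a placeholder:

    rem Query the NTFS self-healing state of the volume
    fsutil repair query D:

    rem Turn general self-healing repair back on if it was disabled
    fsutil repair set D: 1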
About defrag... our FC disk array has 48 drives in it, and since it's an HP EVA the stripes are randomly distributed across the spindles. This means that even largely sequential accesses are actually random as far as the drives are concerned, which in turn means a mostly sequential file-system performs only minimally better than a significantly fragmented one. Routine defrags therefore buy very little for a lot of I/O overhead.
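If you'd rather measure than take my word for it, an analysis-only pass reports fragmentation without paying the I/O cost of an actual defrag (flag syntax varies slightly between Windows versions; this is the 2008 R2 form, D: a placeholder):

    rem Analyze only: prints a fragmentation report, moves nothing
    defrag D: /A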
As for preventive maintenance, NTFS is now automated enough to do nearly all of it by itself. Once in a while I'll run chkdsk in read-only mode to see whether running it in full mode is worth it. So far on our cluster it has yet to be needed. Even on our 2TB, 4-million-file LUN, the read-only pass runs in less than a day.
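For reference, chkdsk with no switches is the read-only mode: it reports problems but fixes nothing, so the volume stays online the whole time.

    rem Read-only scan; no /f or /r, so no dismount is needed
    chkdsk D:

    rem Only if the read-only pass reports actual errors:
    rem chkdsk D: /f    (this one does take the volume offline)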
That said, there are architectural decisions you can make that reduce the eventual need for an offline chkdsk and make it go faster if you ever need one, the biggest being to split your data across several smaller volumes rather than one monolithic LUN, so any single chkdsk pass has fewer files to walk and only part of your data has to go offline.
At the previous place where I worked, we used Tripwire. For more information, take a look here: Tripwire File Integrity Manager
You'll also find an overview of the file-integrity-checking solutions on the market here: File integrity checkers
Microsoft has published prescriptive guidance on improving performance and minimizing downtime when running chkdsk:
NTFS Chkdsk Best Practices and Performance
https://www.microsoft.com/downloads/en/details.aspx?FamilyID=35a658cb-5dc7-4c46-b54c-8f3089ac097a
Of particular note:
Volume size by itself has no effect on chkdsk performance; it's the number of files that matters.
For volumes with large numbers of files (hundreds of millions/billions), the performance increase of utilizing more memory for chkdsk is dramatic.
Chkdsk on Windows Server 2008 R2 is between two and five times faster than on Windows Server 2008. Windows 2003 was apparently so bad they were too embarrassed to publish its numbers.
You should proactively check whether your volumes are dirty before a scheduled restart; it spares you an unexpected multi-hour autochk delay at startup (see the commands after these notes).
Not in the document, but highly recommended: using a multi-purpose server to serve hundreds of millions of files increases the probability of a crash that leaves a volume marked dirty, so take measures to keep crashes from happening in the first place. For example, don't use the file server as a print server (printer drivers have a long and notorious history of blue screens), and be wary of "file archiving software". A backup power source with extended runtime is also highly recommended.
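Checking the dirty bit is a one-liner, and if a volume does come up dirty you can keep autochk from stalling the next boot and schedule the repair on your own terms (D: again a placeholder):

    rem Report whether the volume's dirty bit is set
    fsutil dirty query D:

    rem Exclude the volume from the automatic check at next boot,
    rem so you can run the full chkdsk during a window you choose
    chkntfs /x D: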