Question

Wayne Conrad

Asked: 2017-05-20 12:53:10 +0800 CST

Risk of not repairing "Structure needs cleaning" XFS errors

I have an XFS file system with file system errors affecting some non-critical files. I wish to repair it; the business wishes to continue to run with those errors. What are the known risks of not repairing an XFS file system that has "Structure needs cleaning" errors?

The business wishes to avoid the possibly lengthy maintenance window that will be needed. I have always taken it on faith that file system corruption must not be tolerated. The business is going to ask me for reasons to fix it other than my own FUD.

What kind of answers are needed

I already have an opinion; I need more than that.

Answers should be backed by evidence. Anecdotes are OK, but only if they are documented first-hand; we don't need "someone told me" answers. Expert opinions are OK, such as an answer from the XFS FAQ, or from a developer familiar with XFS internals.

No lay opinions, please. I'm looking for evidence, reliable anecdote, and XFS expert opinion.

Negative answers (e.g. "under similar circumstances, I ran for a year and experienced no serious problems") are OK.

File system details.

The file system is 5.4T, with 3.9T (72%) used.

There are 46.6M files.

Error details

There are 55 corrupt directories that cause applications such as ls and find to report "Structure needs cleaning", as mentioned in this XFS FAQ entry:

Q: I see applications returning error 990 or "Structure needs cleaning", what is wrong?

The error 990 stands for EFSCORRUPTED which usually means XFS has detected a filesystem metadata problem and has shut the filesystem down to prevent further damage. Also, since about June 2006, we converted from EFSCORRUPTED/990 over to using EUCLEAN, "Structure needs cleaning." The cause can be pretty much anything, unfortunately - filesystem, virtual memory manager, volume manager, device driver, or hardware. There should be a detailed console message when this initially happens. The messages have important information giving hints to developers as to the earliest point that a problem was detected. It is there to protect your data. You can use xfs_repair to remedy the problem (with the file system unmounted).

XFS errors logged to syslog all look like this:

XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xe0 [xfs], block 0x50
XFS (sdb): Unmount and run xfs_repair
XFS (sdb): First 64 bytes of corrupted metadata buffer:
ffff88073fa79000: 49 4e 41 ff 02 01 00 00 00 00 01 f6 00 00 01 f7  INA.............
ffff88073fa79010: 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ed  ................
ffff88073fa79020: 59 1b af d2 09 62 5c 17 4f e8 f8 73 00 00 00 00  Y....b\.O..s....
ffff88073fa79030: 57 e0 73 b2 27 23 63 cd 00 00 00 00 00 00 00 2f  W.s.'#c......../
XFS (sdb): metadata I/O error: block 0x50 ("xfs_trans_read_buf_map") error 117 numblks 16
XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.

These errors are repeated many times but only for two blocks.
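
One way to confirm that the messages really involve only a couple of blocks is to tally them by device and block number. This is a minimal sketch, assuming the kernel log is readable through dmesg or a syslog file and that the lines have the format shown above; the function name summarize_xfs_corruption is made up for illustration.

```shell
#!/bin/sh
# Count "Metadata corruption detected" messages per device and block.
# Reads kernel log text on stdin (for example from dmesg or a syslog file).
# summarize_xfs_corruption is a hypothetical helper name, not an XFS tool.
summarize_xfs_corruption() {
    # Keep only the detection lines, reduce each one to "device block",
    # then count how often each pair occurs, most frequent first.
    sed -n 's/.*XFS (\([^)]*\)): Metadata corruption detected at .*, block \(0x[0-9a-fA-F]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn
}

# Example use:
#   dmesg | summarize_xfs_corruption
```

Each output line is a count followed by the device name and the block number, so a short list means the damage is reported for only a few metadata blocks.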

2 Answers

shodanshok · Answer 1 · 2017-05-20T14:57:43+08:00

The filesystem really should be taken offline and checked/repaired, for at least two very good reasons:

metadata errors on directories will basically lock them out of your control. You cannot ls them, or create/remove files inside them.
a metadata error can trigger the XFS fail-safe mechanism: a filesystem shutdown. If that happens, your customer will take unscheduled downtime, maybe at the worst possible moment. It is much better to schedule the downtime in quiet hours (i.e. during the night).

Some suggestions:

before running the full-scale xfs_repair, you can dump all filesystem metadata using xfs_metadump, restore the dump to an image file with xfs_mdrestore, and run a "dummy" xfs_repair on that image. This will let you observe what xfs_repair would do to your filesystem
be sure to have valid and recent backups before any repair attempt
if you really, really, really cannot bring the filesystem down and if the files contained in the problematic directories are of little or no importance, you can try to remove the directories themselves. This will effectively "disconnect" the problematic metadata area. Be sure to understand that this is only a (bad) workaround; moreover, if the removal fails, XFS will probably shut down the entire filesystem, forcing you to take the unplanned downtime.
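
The metadata rehearsal from the first suggestion can be sketched roughly as follows. This is only an outline under assumptions: the device /dev/sdb is taken from the log messages in the question, the scratch directory is a placeholder on some other healthy filesystem, and the helper names run and rehearse_repair are made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: rehearse xfs_repair on a metadata-only copy of the filesystem.
# run and rehearse_repair are hypothetical helper names.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rehearse_repair() {
    device=$1    # block device holding the XFS filesystem, e.g. /dev/sdb
    workdir=$2   # scratch directory on a different, healthy filesystem

    # 1. Dump metadata only (no file contents) to a file.
    run xfs_metadump "$device" "$workdir/fs.metadump"
    # 2. Restore the dump into an image file.
    run xfs_mdrestore "$workdir/fs.metadump" "$workdir/fs.img"
    # 3. No-modify pass (-n) on a regular file (-f): report only.
    run xfs_repair -n -f "$workdir/fs.img"
    # 4. Real repair, but only of the image, to see what gets changed.
    run xfs_repair -f "$workdir/fs.img"
}

# Example use:
#   rehearse_repair /dev/sdb /some/scratch/dir
```

Note that the xfs_metadump man page says the source should be unmounted, mounted read-only, or frozen; a dump of a live read-write filesystem may itself be inconsistent. If the image carries a dirty log, step 4 may refuse to run until the log is dealt with, which is harmless to experiment with on the image since the real device is never written.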

ewwhite · Answer 2 · 2017-05-20T14:00:12+08:00

You should repair your filesystem because it could be indicative of an underlying problem with the storage array or hardware.

Make the time for downtime or maintenance... or make the case for better redundancy.

I would be checking into the health of the hardware at this point.

Assuming you're using an enterprise Linux OS (and not Arch Linux), there's a creative solution available. You could use whatever the current release of the Linux HotCopy utility/driver is and take a block-level snapshot of your filesystem. Mount that filesystem with something like:

mount -t xfs -o ro,nouuid,norecovery /dev/hcp1 /some-mountpoint

From there, you can run xfs_repair on the snapshot to get a feel for the severity of the issue, to get a list of issues, and as a timing test.

Unmount and destroy the snapshot once done.
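
As a rough sketch of the snapshot check described above: it assumes the snapshot already exists as /dev/hcp1 (creating and destroying it is left to the snapshot tool and is not shown), and that xfs_repair is pointed at the snapshot device while it is not mounted. The helper names run and check_snapshot are made up for illustration. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: inspect a block-level snapshot, then time a no-modify check of it.
# run and check_snapshot are hypothetical helper names.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_snapshot() {
    snap=$1   # snapshot block device, e.g. /dev/hcp1
    mnt=$2    # empty directory to mount the snapshot on

    # Mount read-only without log replay; nouuid avoids a clash with the
    # original filesystem, which has the same UUID. Browse it, then unmount.
    run mount -t xfs -o ro,nouuid,norecovery "$snap" "$mnt"
    run umount "$mnt"

    # No-modify check of the unmounted snapshot, with a simple timer.
    start=$(date +%s)
    run xfs_repair -n "$snap"
    end=$(date +%s)
    echo "xfs_repair -n took $((end - start)) seconds"
}

# Example use:
#   check_snapshot /dev/hcp1 /some-mountpoint
```

Because the snapshot is taken from a live filesystem, its log has not been replayed, so a no-modify run may report some inconsistencies that would disappear after a clean unmount. The elapsed time is still a useful lower bound for planning the real maintenance window.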
