I'm researching ways to build and run a huge storage server (it must run Linux) where, for every data array, I can run a consistency check and repair while the usual applications using the arrays (reads and writes) keep working as usual.
Say you have many TB of data on a single traditional Linux filesystem (EXT4, XFS) that is used by hundreds of users, and suddenly the system reports a consistency/corruption problem with it, or you know that the machine recently went down in a dirty way and filesystem corruption is very likely.
Taking the filesystem offline and running the filesystem check can easily take many hours/days of downtime, since neither EXT4 nor XFS can run check & repair while in normal operation; the filesystem needs to be taken offline first.
How can I avoid this weakness of EXT4/XFS on Linux? How can I build a large storage server without ever needing to take it offline for hours of maintenance?
I've read a lot about ZFS and its reliability due to its data/metadata checksumming. Is it possible to run a consistency check and repair on a ZFS filesystem without taking it offline? Would some other, newer filesystem, or some other organization of the data on disk, be better?
One other option I'm considering is to divide the data array into a ridiculously large number (hundreds) of partitions, each with its own independent filesystem, and to fix the applications so they know how to use all those partitions. Then, when one of them needs to be checked, only that one has to be taken offline. Not a perfect solution, but better than nothing.
Is there a perfect solution to this problem?
This would be a case for XFS or ZFS. FSCK is not a concept in the ZFS world.
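For a sense of what replaces fsck in the ZFS world, here is a minimal sketch, assuming a pool named tank (the pool name is only an example):

    # "tank" is an example pool name
    # Start an online scrub; the pool stays imported and applications keep running
    zpool scrub tank

    # Watch progress and any checksum errors found/repaired so far
    zpool status -v tank

    # Cancel the scrub if it competes too much with production I/O
    zpool scrub -s tank

The scrub walks every allocated block, verifies its checksum and, where the pool has redundancy (mirror or RAID-Z), repairs bad copies, all while the pool remains in normal use.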
There's a good amount of skill in building something like this in a robust manner. If there's a budget for bringing in an expert or ZFS consultant, your organization should consider doing so.
The crude reality is that legacy filesystems are not really well suited for multi-TB volumes. For example, Red Hat recommends EXT4 filesystems no bigger than 50 TB, with fsck time being one of the limiting factors. XFS is in better shape, both due to the much faster xfs_repair (compared to the old xfs_check) and to the ongoing project to add online scrub.

EXT4, XFS and other filesystems (BTRFS excluded) can be checked online by taking a snapshot of the main volume and running fsck against the snapshot rather than against the main filesystem itself. This will catch any serious error without requiring downtime, but it clearly needs a volume manager (with snapshot capability) in place under the filesystem. As a side note, this is one of the main reasons why Red Hat uses LVM by default.

That said, the best-known and most reliable filesystem with online scrubbing is clearly ZFS: it was designed from the start to efficiently support very large arrays, and its online scrub facility is extremely effective. If anything, it has the opposite problem: it lacks an offline fsck, which would be useful to correct some rare classes of errors.
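As a rough illustration of the snapshot-based check described above, here is a sketch assuming an EXT4 filesystem on an LVM logical volume /dev/vg0/data (volume names and the snapshot size are made up):

    # /dev/vg0/data and the 20G snapshot size are only examples
    # Create a copy-on-write snapshot of the live volume; size it for the
    # expected write activity during the check
    lvcreate --snapshot --name data_check --size 20G /dev/vg0/data

    # Force a full read-only check against the snapshot while the real
    # filesystem stays mounted and in use (-f: force, -n: change nothing)
    e2fsck -f -n /dev/vg0/data_check

    # Drop the snapshot once the result has been recorded
    lvremove -y /dev/vg0/data_check

If the check reports damage you still need a maintenance window to repair the real filesystem, but at least the outage is planned instead of being discovered at mount time.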
Do a business continuity analysis by asking the organization how much downtime for this storage is acceptable. Doing better than a handful of planned outages and a couple of hours of downtime per year usually requires investing in a multi-node solution.
Protect against as many downtime risks as you can think of. For example, a fire in the data center will shut things down for a couple hours, whatever the storage technology. If service must continue, replicate the data to a different system in a different building.
Regarding the file system, pick something you can fix and/or that your vendor can support. EXT4 will strongly encourage you to fsck every so many mounts. XFS's fsck does nothing thanks to the journal, but xfs_check runs offline. ZFS has no fsck; instead it has online scrubs.
Splitting the data into multiple volumes might make sense to some extent; it would isolate failures, perhaps by organizational unit or application. However, hundreds of small volumes just to keep fsck fast increases the administrative workload, and one supposed advantage of centrally managed storage was less administrative work.
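To make the XFS point concrete, a short sketch, assuming the filesystem lives on /dev/sdb1 and is mounted at /srv/data (both names are illustrative):

    # /dev/sdb1 and /srv/data are only examples
    # fsck.xfs is intentionally a no-op; real checking and repair are done by
    # xfs_repair, which requires the filesystem to be unmounted
    umount /srv/data

    # Dry run: report problems without changing anything
    xfs_repair -n /dev/sdb1

    # Actual repair, still offline
    xfs_repair /dev/sdb1

    mount /dev/sdb1 /srv/data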
For multi-node availability and performance, consider adding another layer: a scale-out distributed file system such as Ceph, Lustre or Gluster. That is quite different from one large array. Implementations vary in whether they use a local file system underneath, and in whether they present block or file protocols to users.
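As one hedged example of what that extra layer can look like, a GlusterFS sketch with three assumed nodes (node1, node2, node3) and made-up brick paths:

    # node names and the /bricks/gv0 paths are only examples
    # From node1: form the trusted pool
    gluster peer probe node2
    gluster peer probe node3

    # Create a 3-way replicated volume from one brick directory per node, then start it
    gluster volume create gv0 replica 3 \
        node1:/bricks/gv0 node2:/bricks/gv0 node3:/bricks/gv0
    gluster volume start gv0

    # Clients mount the volume over the network; any node can serve the mount
    mount -t glusterfs node1:/gv0 /mnt/shared

With a layout like this, a single node can be taken down for local filesystem maintenance while clients keep reading and writing through the remaining replicas.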