I am managing a server with two solid state drives configured in mdadm RAID1. The server is running RHEL6 with an ext4 filesystem.
This evening the server went offline shortly after the nightly backups began, and the console reported disk errors.
Upon logging into the console, I found that one of the disks had been marked as failed by mdadm and the filesystem had been remounted read-only.
Is there a way to configure mdadm to fail the drive before the filesystem is remounted read-only? I would much rather run as a single-disk system for a short time (until a replacement disk can be installed) than have the filesystem immediately kicked into read-only mode, which guarantees an outage.
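I assume the read-only flip comes from ext4's own error handling (the errors= policy or a journal abort) rather than from mdadm itself; as a quick sketch, the current policy can be checked like this (/dev/md0 is a placeholder for the actual RAID1 device):

    # Error policy stored in the ext4 superblock, plus the current mount options
    tune2fs -l /dev/md0 | grep -i 'errors behavior'
    mount | grep md0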
It does that by default, but granted, I've had similar issues. MD is not particularly eager to fail disks (or, for that matter, to repair bad sectors by rewriting them, which hardware RAID controllers do). That's why I set up log monitoring to scan for 'ata exception' and e-mail me when it appears. At least with traditional HDDs, this lets you spot failing disks much sooner.
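A minimal sketch of that kind of monitoring (the log path, state file and recipient are assumptions; adapt it to your syslog setup) could look like this, run from cron every few minutes:

    #!/bin/sh
    # Mail an alert when new 'ata exception' lines appear in the kernel log.
    # Assumes RHEL6-style /var/log/messages and working local mail delivery.
    LOG=/var/log/messages
    STATE=/var/tmp/ata_exception.count
    LAST=$(cat "$STATE" 2>/dev/null || echo 0)
    NOW=$(grep -c 'ata.*exception' "$LOG")
    if [ "$NOW" -gt "$LAST" ]; then
        grep 'ata.*exception' "$LOG" | tail -n 20 | \
            mail -s "ATA exception on $(hostname)" root
    fi
    echo "$NOW" > "$STATE"

A dedicated log watcher (logwatch, swatch, or alerting on a central syslog server) does the same job more robustly, since it survives log rotation.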
If the file system was remounted read-only, the error made it further up the stack, which means the MD device itself saw errors too. Are you sure there were no errors on sdb?
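To see what the array and the kernel actually recorded, something like this is a quick first check (assuming the array is /dev/md0 and the members are sda/sdb; adjust the names):

    # Array state as MD sees it
    cat /proc/mdstat
    mdadm --detail /dev/md0
    # Kernel messages mentioning either member disk
    dmesg | grep -iE 'ata|sd[ab]'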
Or, are you sure the drives failed at all? It can happen (it did to me just recently) that the entire PCI bus fails. All devices connected to it started spewing errors (all ATA and Ethernet), and indeed the file systems were marked read-only and the MD arrays as failed. But obviously neither the disks nor MD was the issue.
To check whether the drives actually erred: I don't have much experience with SMART on SSDs, but at least on HDDs the SMART data may show something; there is an error log in it, and you can look at the SMART attributes, perhaps comparing them with the other disk.
If smartmontools is installed, you can do something like the following (device names are examples; use your actual disks):
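    # Dump everything SMART knows about each disk (health, attributes, error log)
    # and compare the two members of the array
    smartctl -a /dev/sda
    smartctl -a /dev/sdb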
You may also be interested in "How do I troubleshoot my RAID array?".
Edit: As for the PCI bus thing, it does look like your issue was localized to one disk or controller.