I am managing a server with two solid state drives configured in mdadm RAID1. The server is running RHEL6 with an ext4 filesystem.
This evening the server went offline shortly after the nightly backups began, and the console reported disk errors.
Upon logging into the console, I found that one of the disks had been marked as failed by mdadm and the filesystem had been remounted read-only.
Is there a way to configure mdadm to fail the drive before the filesystem is remounted read-only? I would much rather run as a single-disk system for a short time (until a replacement disk can be installed) than have the filesystem immediately kicked into read-only mode, which guarantees an outage.
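I assume the read-only flip comes from ext4's own error handling (the errors= policy or a journal abort) rather than from mdadm itself; as a quick sketch, the current policy can be checked like this (/dev/md0 is a placeholder for the actual RAID1 device):

    # Error policy stored in the ext4 superblock, plus the current mount options
    tune2fs -l /dev/md0 | grep -i 'errors behavior'
    mount | grep md0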
It does that by default, but granted, I've had similar issues. MD is not particularly eager to fail disks (or, for that matter, to repair bad sectors by rewriting them, which hardware RAID controllers do). That's why I set up log monitoring to scan for 'ata exception' and e-mail me when it appears. At least with traditional HDDs, this lets you spot failing disks much sooner.
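A minimal sketch of that kind of monitoring (the log path, state file and recipient are assumptions; adapt it to your syslog setup) could look like this, run from cron every few minutes:

    #!/bin/sh
    # Mail an alert when new 'ata exception' lines appear in the kernel log.
    # Assumes RHEL6-style /var/log/messages and working local mail delivery.
    LOG=/var/log/messages
    STATE=/var/tmp/ata_exception.count
    LAST=$(cat "$STATE" 2>/dev/null || echo 0)
    NOW=$(grep -c 'ata.*exception' "$LOG")
    if [ "$NOW" -gt "$LAST" ]; then
        grep 'ata.*exception' "$LOG" | tail -n 20 | \
            mail -s "ATA exception on $(hostname)" root
    fi
    echo "$NOW" > "$STATE"

A dedicated log watcher (logwatch, swatch, or alerting on a central syslog server) does the same job more robustly, since it survives log rotation.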
If the file system was remounted read-only, the error made it further up the stack, which means the MD device itself saw errors too. Are you sure there were no errors on sdb?
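To see what the array and the kernel actually recorded, something like this is a quick first check (assuming the array is /dev/md0 and the members are sda/sdb; adjust the names):

    # Array state as MD sees it
    cat /proc/mdstat
    mdadm --detail /dev/md0
    # Kernel messages mentioning either member disk
    dmesg | grep -iE 'ata|sd[ab]'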
Or, are you sure the drives failed at all? It can happen (it did to me just recently) that the entire PCI bus fails. All devices connected to it started spewing errors (all ATA and Ethernet), and indeed the file systems were marked read-only and the MD arrays as failed. But obviously neither the disks nor MD was the issue.
To check whether the drives actually erred: I don't have much experience with SMART on SSDs, but at least on HDDs the SMART data may show something; there is an error log in it, and you can look at the SMART attributes, perhaps comparing them with the other disk.
If smartmontools is installed, you can do something like the following (device names are examples; use your actual disks):
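    # Dump everything SMART knows about each disk (health, attributes, error log)
    # and compare the two members of the array
    smartctl -a /dev/sda
    smartctl -a /dev/sdb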
You may also be interested in "How do I troubleshoot my RAID array?".
Edit: As for the PCI bus thing, it does look like your issue was localized to one disk or controller.