So, one of my servers has had a hard disk failure. It's running software RAID, the system locked up, and according to /proc/mdstat (and /var/log/messages), it's really down:
Personalities : [raid1]
md2 : active raid1 sdb2[1]
104320 blocks [2/1] [_U]
md5 : active raid1 sdb5[1]
2104448 blocks [2/1] [_U]
md6 : active raid1 sdb6[1]
830134656 blocks [2/1] [_U]
md1 : active raid1 sdb1[1]
143363968 blocks [2/1] [_U]
and
Nov 5 22:04:37 m38501 smartd[4467]: Device: /dev/sda, not capable of SMART self-check
However, when I do smartctl -H /dev/sda, it passes the test. It also passes the test with smartctl --test=short /dev/sda.
So, is smartctl a broken testing tool, or am I doing something completely off?
Maybe an intermittent error with the drive electronics? That's the first thing that comes to mind. Be safe and replace the drive.
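For reference, the [2/1] [_U] fields in the /proc/mdstat output above mean each array expects two members but only one is active, with the first slot missing. A minimal sketch of checking for that state programmatically could look like the following. The function name degraded_arrays and the SAMPLE constant are hypothetical names introduced for illustration, and the parser assumes the simple RAID1 layout shown in the question rather than every format mdstat can produce.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, slots)} for arrays missing members.

    Hypothetical helper: assumes each "mdN : ..." line is followed by a
    line containing "[expected/active] [U_...]", as in the question.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected:
                # "_" marks a missing slot, "U" a member that is up
                degraded[current] = (expected, active, status.group(3))
    return degraded

# The mdstat contents quoted in the question
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sdb2[1]
104320 blocks [2/1] [_U]
md5 : active raid1 sdb5[1]
2104448 blocks [2/1] [_U]
md6 : active raid1 sdb6[1]
830134656 blocks [2/1] [_U]
md1 : active raid1 sdb1[1]
143363968 blocks [2/1] [_U]
"""

for name, (expected, active, slots) in sorted(degraded_arrays(SAMPLE).items()):
    print(f"{name}: {active} of {expected} members active, slots {slots}")
```

On a live system the same function could be fed the contents of /proc/mdstat instead of SAMPLE. It only reports what md believes about the array, so it says nothing about why the sda partitions were dropped.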