On my servers, which have HDDs or SSDs, I have a cron job that periodically runs:
/usr/sbin/smartctl --test=short/long /dev/sd1
(for each disk)
While the test runs, the script polls the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:
[0-9]+% of test remaining.
It then checks whether the test completed without errors:
( 0) The previous self-test routine completed
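The cron job described above could look roughly like the following. This is a minimal sketch, not the original script: the function names, the SMARTCTL variable, the 5 s start-up delay and the 60 s poll interval are assumptions. The text matching lives in separate functions so it can be checked against sample smartctl -c output without touching a disk.

```shell
#!/usr/bin/env bash
# Sketch of the polling loop described above. SMARTCTL, the 5 s start-up
# delay and the 60 s poll interval are assumptions, not smartctl defaults.

SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# True (exit 0) while `smartctl -c` output still reports a test in progress,
# e.g. "90% of test remaining."
selftest_running() {
    grep -Eq '[0-9]+% of test remaining\.' <<<"$1"
}

# True if the "Self-test execution status" code is 0, i.e. the previous
# self-test completed without error (smartctl pads the code, e.g. "(   0)").
selftest_passed() {
    grep -Eq '\([[:space:]]*0\)[[:space:]]+The previous self-test routine completed' <<<"$1"
}

# Start a short or long test on one disk, wait for it, and report the result.
# Returns 0 on pass, 1 on failure, 2 if the test could not be started.
run_selftest() {
    local dev=$1 type=${2:-short} out rc=0
    "$SMARTCTL" --test="$type" "$dev" >/dev/null || rc=$?
    # smartctl's exit status is a bit mask; bits 0-2 mean bad arguments,
    # failure to open the device, or a failed SMART command.
    if (( rc & 7 )); then
        return 2
    fi
    sleep 5   # give the drive a moment to update its self-test status
    while out=$("$SMARTCTL" -c "$dev"); selftest_running "$out"; do
        sleep 60
    done
    selftest_passed "$out" || return 1
}
```

A cron script could source these functions and loop over its disks, for example run_selftest /dev/sda long || logger -t smart "self-test failed on /dev/sda".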
However, it appears that smartctl doesn't yet support self-tests on NVMe drives as of version 7.0, according to https://www.smartmontools.org/wiki/NVMe_Support
It does say that:
The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)
But what actually runs the tests?
I'm also not sure: does the output of -H and -l ever update unless we run short/long tests?
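For reference, the smartd directives quoted above are set per device in /etc/smartd.conf. A minimal sketch, assuming the drive is /dev/nvme0 and that local mail to root is delivered; the temperature limits are placeholder values. These directives only read what the drive reports in its health and error logs; none of them starts a self-test.

```
# /etc/smartd.conf (sketch; device, temperatures and recipient are placeholders)
# -d nvme    : treat the device as NVMe
# -H         : report if the drive's overall health status indicates failure
# -l error   : report when the error log entry count increases
# -W 4,60,70 : report changes of 4 °C or more, log at >= 60 °C, warn at >= 70 °C
# -m root    : mail warnings to root
/dev/nvme0 -d nvme -H -l error -W 4,60,70 -m root
```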
I also read about nvme-cli, but I can't find a way to run health tests on disks with it.
Any ideas?
Using CentOS 7 here.
SMART self-tests were conceived for mechanical disks. SATA SSDs largely mirror the earlier HDD interface-level behavior: they support such self-tests, but don't actually do very much when you run one. NVMe drives dropped these SMART self-test routines entirely.
For flash-based disks, you should really track cell wear, the spare block count and reallocated sectors, rather than relying on old self-test routines that NVMe drives don't support.
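That advice can be sketched as a check on the NVMe SMART/Health log that smartctl prints with -A (or -a). This is a minimal sketch, assuming smartmontools' usual field labels (Critical Warning, Available Spare, Available Spare Threshold, Percentage Used, Media and Data Integrity Errors); the function names, WEAR_LIMIT and its 80% default are hypothetical. The NVMe log has no reallocated-sector counter, so Media and Data Integrity Errors stands in as the closest analogue; that mapping is an assumption, not a direct equivalent.

```shell
#!/usr/bin/env bash
# Sketch: check NVMe wear indicators in `smartctl -A /dev/nvmeX` output.
# WEAR_LIMIT is a hypothetical threshold, not a smartctl setting.

WEAR_LIMIT=${WEAR_LIMIT:-80}   # warn when "Percentage Used" reaches this

# Print the first token of a "Label:   value" line, minus "%" and commas.
nvme_field() {
    awk -F':' -v key="$1" '$1 == key {
        split($2, parts, " ")      # first whitespace-separated token of the value
        v = parts[1]
        gsub(/[%,]/, "", v)        # drop "%" and thousands separators
        print v
        exit
    }' <<<"$2"
}

# Return 0 if the drive looks healthy, 1 if any indicator is bad (each
# problem is printed to stderr), 2 if the output could not be parsed.
check_nvme_wear() {
    local out=$1 rc=0 warn used spare thresh media
    warn=$(nvme_field "Critical Warning" "$out")
    used=$(nvme_field "Percentage Used" "$out")
    spare=$(nvme_field "Available Spare" "$out")
    thresh=$(nvme_field "Available Spare Threshold" "$out")
    media=$(nvme_field "Media and Data Integrity Errors" "$out")

    if [[ -z $used || -z $spare || -z $thresh || -z $media ]]; then
        echo "could not parse NVMe health log" >&2
        return 2
    fi
    if [[ -n $warn && $warn != 0x00 ]]; then
        echo "critical warning flags set: $warn" >&2; rc=1
    fi
    if (( used >= WEAR_LIMIT )); then
        echo "wear: ${used}% of rated endurance used" >&2; rc=1
    fi
    if (( spare <= thresh )); then
        echo "available spare ${spare}% at or below threshold ${thresh}%" >&2; rc=1
    fi
    if (( media > 0 )); then
        echo "media and data integrity errors: $media" >&2; rc=1
    fi
    return $rc
}
```

A cron job could then run, for example, check_nvme_wear "$(/usr/sbin/smartctl -A /dev/nvme0)" || logger -t smart "nvme0 needs attention". Parsing text rather than calling the drive inside the check keeps the logic testable against saved output.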