The smartctl tool allows initiating a long self-test (smartctl -t long /dev/sda). However, there's also badblocks, which I can run on a drive. How are the two related? If badblocks detects bad blocks, does the drive automatically update its SMART values (e.g. by updating its reallocated sectors count)? Can badblocks replace smartctl -t long, or vice versa?

Like I pointed out in my other answer, every modern hard drive has remapping space available, because (especially at today's disk densities) no drive platter will be perfect: there will always be a few defects the drive has to remap around, even on brand-new, never-been-used, straight-off-the-assembly-line-into-my-hands drives.
Because of this, theoretically you should have a SMART failure reported before something like badblocks notices (end-user-visible) bad sectors on a drive. On modern hard disks, any end-user-visible bad sectors (as might be reported by badblocks or automatically detected by the OS) are a final gasp and shudder of a dying disk.

Ultimately SMART and badblocks test two different, but related, things:

SMART is a self-monitoring tool:
The hard drive knows some information about its operating parameters, and has some meta-knowledge as to what is "normal" for some of those parameters and "acceptable" for others.
If the drive senses that certain parameters are "abnormal" or "unacceptable" it will report a pre-failure condition -- in other words the drive is still functional, but might fail soon.
For example: The spindle motor normally draws 0.10 amps, but now it's drawing 0.50 amps -- an abnormally high draw that may indicate the shaft is binding or the permanent lubricant on the bearings is gone. Eventually the motor will be unable to overcome the resistance and the drive will seize.
Another example: The drive has 1000 "remap" blocks to deal with bad sectors. It has used 750 of them, and the engineers that built the drive determined that number of remaps indicates something internally wrong (bad platter, old-age failure, damaged head) - the drive will report a pre-failure condition allowing you time to get your data off before the remap space runs out and bad sectors become visible.
SMART is looking for more than bad sectors - it's a more comprehensive assessment of the drive's health. You could have a SMART pre-failure warning on a drive with no bad sectors and no read/write errors (for example, the spindle motor issue I described above).
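As a concrete illustration (a sketch assuming a Linux box with smartmontools installed; the device name is an example), you can ask the drive for its overall verdict and for the raw attribute table those pre-failure judgments are based on:

sudo smartctl -H /dev/sda    # overall health self-assessment: PASSED or FAILED
sudo smartctl -A /dev/sda    # vendor attribute table, including pre-failure attributes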
badblocks is a tool with a specific (outdated) purpose: find bad sectors.

badblocks comes from a time before SMART and bad-sector remapping. Back then we knew drives had imperfections, but the only way to map them out, and prevent accidentally storing data there, was to stress-test the disk, cause a failure, and then remember never to put data there again.

The reason I say it is outdated is that the electronics on modern drives already do what badblocks does, internally and a few thousand times faster. badblocks basically allows ancient drives that lack sophisticated electronics to remap (or skip over) sectors that have failed, but modern hard drives already detect failed sectors and remap them for you.

Theoretically you could use badblocks data to have the OS remap (visible) failures as if your modern disk were an ancient Winchester disk, but that's ultimately counterproductive. Like I said previously, ANY bad sectors detected with badblocks on a modern drive are cause to discard the entire drive as defective (or about to fail). Visible bad sectors indicate that the drive is out of remapping space, which is relatively rare for modern disks unless they're old (nearing the end of their functional life) or defective (bad platters/heads from the factory).
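If you do want to run it, a read-only pass is the safe variant on a drive that already holds data (a sketch; /dev/sdX is a placeholder):

sudo badblocks -sv /dev/sdX     # read-only scan with progress (-s) and verbose output (-v)
sudo badblocks -nsv /dev/sdX    # non-destructive read-write test (-n): slower, but preserves existing data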
So basically: if running badblocks on a disk before you deploy it in production makes you feel better, go ahead and do it, but if your disk was manufactured in this century and it shows a visible bad sector, you should chuck it in the trash (or call in its warranty). For my money, SMART status and defense in depth are a better use of my time than manually checking disks.

I have to disagree with voretaq7: SMART is not magic. When one of a drive's sectors goes bad, you will no longer be able to read data from it, so it is perfectly possible to have an unreadable file on a modern disk drive. SMART marks such an unreadable sector as "Current Pending" and "Offline Uncorrectable" the first time it is accessed after the failure.
But when that sector is written to again, it gets remapped into the remapping space, the marks are cleared, and the "Reallocated_Sector_Ct" counter increases. The whole drive is then readable again.
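You can watch those counters yourself; a minimal sketch, assuming smartmontools and a drive that reports the usual attribute names:

sudo smartctl -A /dev/sda | grep -Ei 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'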
The smartctl -t long test is useful: it will scan the whole drive for unreadable sectors, and will log and mark as "Current Pending" and "Offline Uncorrectable" the first bad sector encountered in the run. I configure my servers to run this long test once per week on every drive. It does not affect normal drive functions much, as OS requests always have priority over SMART scans. On servers I always run disks in RAID1 mirrors, so when a long test finds a bad sector I can rewrite its contents using data from the other drive in the mirror, forcing reallocation.
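A sketch of that setup, assuming smartd from smartmontools (the device, schedule and LBA are examples, and the dd write is destructive for that sector, so only do it when a mirror or backup holds the real data):

# /etc/smartd.conf: monitor /dev/sda and run a long self-test every Sunday at 03:00
/dev/sda -a -s L/../../7/03

# after a failed test, find the LBA of the first unreadable sector
sudo smartctl -l selftest /dev/sda

# overwrite that sector (here LBA 12345678, 512-byte sectors) to force reallocation,
# then let the RAID1 resync restore the correct contents from the mirror
sudo dd if=/dev/zero of=/dev/sda bs=512 count=1 seek=12345678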
badblocks is also useful sometimes: for example, it will test the whole drive and won't stop at the first error. It can test a single partition or any other part of a drive, and you can use it to quickly check whether a bad block was successfully reallocated.
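badblocks accepts optional last-block and first-block arguments, so a quick re-check of a suspect region might look like this (the block numbers are made up for the example):

sudo badblocks -sv /dev/sdX 1000100 1000000    # re-read only blocks 1000000 through 1000100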
Good answers to this question are https://superuser.com/a/693065 and https://superuser.com/a/693064

Contrary to other answers, I find badblocks not outdated but a very useful tool. Once I upgraded my PC with a new hard drive and it started running unstable; it took me quite a while to realize, thanks to badblocks, that the disk surface had defects. Since then I run a full write-mode (destructive!) badblocks pass on every new hard drive before I start using it, and I have never had that problem again. I highly recommend

time sudo badblocks -swvo sdX.log /dev/sdX

for every new hard drive. It will test every single bit of the disk a few times for writing and reading, and so can avoid a lot of trouble later.
During this test, bad blocks will be mapped out by the drive, so the "Reallocated Sector Count" should be noted before and after the test and compared against the SMART threshold, since it tells you something about the health of the drive.
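A sketch of that before/after comparison (assuming smartmontools is installed; /dev/sdX as above):

sudo smartctl -A /dev/sdX | grep -i reallocated    # note the value
time sudo badblocks -swvo sdX.log /dev/sdX         # destructive write-mode test
sudo smartctl -A /dev/sdX | grep -i reallocated    # compare; a big jump is a bad sign
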
badblocks is a relic from old times and is not strictly useful. It can find a currently unreadable sector, but the right thing to do with a bad sector is to recover the data from backup. If the data wasn't critical to you, you can instead delete the associated file and write anything to that location; this lets the disk reallocate the sector, if it thinks it needs to, and continue working.
The disk self-test will also go over the entire media and test it for various defects. It is supposed to use lower thresholds than it uses in normal operation, to see whether the disk has many weak spots, and based on vendor logic it can decide that the disk is past its useful life and declare the test failed. At that point you should take all your data off (or recover it from backup) and replace the disk.
If a disk action (either by badblocks or normal operation) hits an unrecoverable read error, the disk will automatically update its reallocation-pending counter, and when the reallocation is performed it will update both the reallocation-pending and reallocated counters. A simple dd will get that happening as well.
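For example, a full read pass is enough to surface pending sectors (a sketch; conv=noerror keeps dd going past read errors):

sudo dd if=/dev/sdX of=/dev/null bs=1M conv=noerror status=progress
sudo smartctl -A /dev/sdX | grep -Ei 'pending|reallocated'    # check the counters afterwards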
If you need to choose between the two, use smartctl -t long, as it gives a better analysis of the disk.
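Running it and reading the result is straightforward (the device name is an example):

sudo smartctl -t long /dev/sdX      # starts the test in the background and prints the expected duration
sudo smartctl -l selftest /dev/sdX  # later: shows the result log, including the first failing LBA, if any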
I can also suggest the use of my diskscan utility (https://github.com/baruch/diskscan). It works more like badblocks, but tries to assess whether there are sectors that are going bad, sort of like a hard-of-hearing sector that takes a lot longer to read. This is indicative of a developing media problem, and future versions may also offer an automatic attempt to help the disk fix it.
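A minimal invocation, as shown in the project README, is:

sudo diskscan /dev/sdX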