Question

Jeff Welling

Asked: 2011-08-01 06:38:22 +0800 CST

Does this SMART selftest indicate a failing drive?

772

I'm wondering whether the results of this SMART self-test indicate a failing drive. This is the only drive that reports 'Completed: read failure' in its results.

# smartctl -l selftest /dev/sde
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)   LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8981         976642822
# 2  Extended offline    Aborted by host               90%      8981         -
# 3  Extended offline    Completed: read failure       90%      8981         976642822
# 4  Extended offline    Interrupted (host reset)      90%      8977         -
# 5  Extended offline    Completed without error       00%       410         -

The drive doesn't yet show any signs of failure, aside from the output of that SMART self-test. For comparison, this is the output from a different drive in the same system, which is currently running a SMART self-test:

# smartctl -l selftest /dev/sdc
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 30%     15859         -
# 2  Extended offline    Completed without error       00%      9431         -
# 3  Extended offline    Completed without error       00%      8368         -


SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   176   175   021    Pre-fail  Always       -       4183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       48
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       8982
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       34
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       13
194 Temperature_Celsius     0x0022   111   101   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

6 Answers

Michael Hampton · Answer 1 · 2013-05-24T23:00:41+08:00

Best Answer

Hopefully you've long since replaced the drive, but since no one has yet directly answered the question...

You ran two tests, both of which failed to read the same logical sector of the disk, as indicated by Completed: read failure and the same LBA in both tests. This does indeed indicate the disk has a defect, and you should be able to have it replaced under warranty. Attempting to store data in this sector may or may not cause the drive to notice it's defective during the write process and remap the sector, but if the drive doesn't notice, and can't read the data later on, you've lost it.
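
To check the "same LBA in both tests" observation from the log itself, you can count how many failed read tests report each LBA. This is only a rough sketch: `count_read_failures` is a hypothetical helper name (not part of smartmontools), and it assumes the log layout shown in the question, with the LBA in the last column.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# prints "<LBA> <count>" for every LBA reported by a failed read test.
count_read_failures() {
    awk '/Completed: read failure/ { seen[$NF]++ }
         END { for (lba in seen) print lba, seen[lba] }'
}

# Sample rows copied from the self-test log in the question.
sample_log='# 1  Extended offline    Completed: read failure       90%      8981         976642822
# 2  Extended offline    Aborted by host               90%      8981         -
# 3  Extended offline    Completed: read failure       90%      8981         976642822'

printf '%s\n' "$sample_log" | count_read_failures
# prints: 976642822 2
```

On a live system you would presumably pipe the real log in instead, e.g. `smartctl -l selftest /dev/sde | count_read_failures`. A count above 1 for a single LBA suggests a persistent defect rather than a one-off glitch.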

12

Sgaduuw · Answer 2 · 2011-08-01T10:32:49+08:00

I want to add to the comments in the other answer, but I can't due to lack of rep, go figure.

You don't need to write a cron script: the smartmontools package includes a daemon, smartd, that handles just what you want to do, namely regular checking of SMART status. All you need to do is create a configuration and start the service. The smartmontools package also contains some sample scripts that smartd can call when something starts failing.
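
As a rough idea of what such a configuration might look like, here is a minimal smartd.conf sketch. The device name is taken from the question; the test schedule and the mail recipient are assumptions to adapt, and the file location varies by distribution.

```
# /etc/smartd.conf (path is an assumption; check your distribution)
# -a               enable the default set of checks
# -s L/../../7/03  run a long self-test every Sunday at 03:00
# -m root          mail warnings to root
/dev/sde -a -s L/../../7/03 -m root
```

After editing the file, restart the smartd service so it picks up the change.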

6

Bacon Bits · Answer 3 · 2011-08-01T07:12:12+08:00

Is your data worth risking on a suspect drive?

If it were me, I'd replace the drive and be thankful that SMART saved me a big headache.

5

Alexandr Priymak · Answer 4 · 2013-07-29T15:47:33+08:00

What would I do in your situation?

First of all, I would find out which files are affected. There are instructions on how to do this at https://www.smartmontools.org/wiki/BadBlockHowto. In your case it is harder because you have an array, but it is possible. Then ensure that the file is backed up, and write zeros to the failing sector. One of two things can happen:

- The drive successfully writes zeros to the sector. Current_Pending_Sector and Reallocated_Sector_Ct should be zero afterwards.
- The drive fails to write to the sector. Then it remaps the sector to a "spare" area.

Either way you end up with a fixed drive. You should restore the file from backup (because you overwrote one sector of it). You should also rerun an extended self-test to ensure that there are no more errors.
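
For the "write zeros to the failing sector" step, here is a cautious sketch that only prints the command instead of running it. `zero_sector_cmd` is a hypothetical helper, and the sketch assumes 512-byte logical sectors and that the LBA from the self-test log counts from the start of the whole device (/dev/sde in the question). On a drive with a different sector size the numbers need adjusting, and writing straight to a member of an array bypasses the RAID layer, so double-check before running anything it prints.

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT run) a dd command that would
# overwrite one sector with zeros. Assumes 512-byte logical sectors.
zero_sector_cmd() {
    dev=$1
    lba=$2
    # Refuse anything that is not a plain non-negative integer.
    case $lba in
        ''|*[!0-9]*) echo "invalid LBA: $lba" >&2; return 1 ;;
    esac
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$lba oflag=direct"
}

# LBA taken from the LBA_of_first_error column in the question.
zero_sector_cmd /dev/sde 976642822
```

Printing the command first gives you a chance to compare the device and LBA against the self-test log before destroying the contents of that sector.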

Stay healthy!

P.S. I know this post is rather old, but I found it through Google, and I think it is worth providing another good answer.

2

cstamas · Answer 5 · 2011-08-01T11:59:18+08:00

Back up as soon as you can!
If the drive is still under warranty, then
- run the vendor's check utility (you can usually get a boot CD)
- if it returns an error then bingo: send the drive back and wait for the replacement
- restore from backup
- problem solved - END

If the drive has no warranty then you are screwed
- there is still some hope...
- as this is only a read error, it does not mean you cannot write to the sector
- after making a backup you can try restoring it, since that will overwrite these unreadable sectors with new data which you can actually read back (usually this works; in the background the drive will remap these blocks to spare sectors most of the time)
- the badblocks tool can also be used for this (you already have backups, right?)
  - you do not actually use it to test the disk (that does not make much sense with newer disks anyway), but to write to these sectors multiple times
- you can re-run the SMART tests, and there is a chance that these unreadable sectors "correct themselves"
- problem NOT solved: you have only made the drive last longer. It will probably fail sooner than normal, maybe within a year depending on usage. But hey, disks are cheap; get a new one if your data is important to you - END
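
To see whether the sectors really did "correct themselves" after rewriting, one option is to compare the sector-related counters before and after. The sketch below is an assumption about how you might script that: `sector_counters` is a hypothetical helper that reads `smartctl -A` output on stdin, and it assumes the attribute table layout shown in the question, with the raw value in the last column.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw values of the three sector-health attributes (IDs 5, 197 and 198).
sector_counters() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $NF }'
}

# Sample rows copied from the attribute table in the question.
sample_attrs='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0'

printf '%s\n' "$sample_attrs" | sector_counters
# prints:
# Reallocated_Sector_Ct=0
# Current_Pending_Sector=1
# Offline_Uncorrectable=0
```

On a live system this would be something like `smartctl -A /dev/sde | sector_counters`. A Current_Pending_Sector value that drops back to 0 after the rewrite is the hopeful case; a growing Reallocated_Sector_Ct is a reason to replace the drive sooner.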

Falcon Momot · Answer 6 · 2013-05-24T23:00:42+08:00

The drive was likely on its way out. Being unable to read from part of the drive is most definitely a failure condition, and it is certainly possible for it to happen without other typical signs of disk failure. This type of thing isn't commonly transient; with no other signs it might be a weak head, a very slight alignment issue, or a defective area on a platter (cylinder?).

The other alternative is that there was a SMART bug; you really don't want to be running a drive with buggy firmware.

Anytime you see any error at all from SMART, it is a strong sign that you should get a new drive to avoid data loss. It's intended as an early warning system, in part.

0
