I'm wondering whether the results of this SMART self-test indicate a failing drive. This is the only drive that comes up with 'Completed: read failure' in the results.
# smartctl -l selftest /dev/sde
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure   90%        8981             976642822
# 2  Extended offline  Aborted by host           90%        8981             -
# 3  Extended offline  Completed: read failure   90%        8981             976642822
# 4  Extended offline  Interrupted (host reset)  90%        8977             -
# 5  Extended offline  Completed without error   00%        410              -
The drive doesn't yet show any signs of failure, aside from the output of that SMART self-test. This is the output from a different drive in the same system, which is currently running a SMART self-test:
# smartctl -l selftest /dev/sdc
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Self-test routine in progress  30%        15859            -
# 2  Extended offline  Completed without error        00%        9431             -
# 3  Extended offline  Completed without error        00%        8368             -
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID#  ATTRIBUTE_NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
  1  Raw_Read_Error_Rate      0x002f  200    200    051     Pre-fail  Always   -            1
  3  Spin_Up_Time             0x0027  176    175    021     Pre-fail  Always   -            4183
  4  Start_Stop_Count         0x0032  100    100    000     Old_age   Always   -            48
  5  Reallocated_Sector_Ct    0x0033  200    200    140     Pre-fail  Always   -            0
  7  Seek_Error_Rate          0x002e  100    253    000     Old_age   Always   -            0
  9  Power_On_Hours           0x0032  088    088    000     Old_age   Always   -            8982
 10  Spin_Retry_Count         0x0032  100    253    000     Old_age   Always   -            0
 11  Calibration_Retry_Count  0x0032  100    253    000     Old_age   Always   -            0
 12  Power_Cycle_Count        0x0032  100    100    000     Old_age   Always   -            46
192  Power-Off_Retract_Count  0x0032  200    200    000     Old_age   Always   -            34
193  Load_Cycle_Count         0x0032  200    200    000     Old_age   Always   -            13
194  Temperature_Celsius      0x0022  111    101    000     Old_age   Always   -            36
196  Reallocated_Event_Count  0x0032  200    200    000     Old_age   Always   -            0
197  Current_Pending_Sector   0x0032  200    200    000     Old_age   Always   -            1
198  Offline_Uncorrectable    0x0030  200    200    000     Old_age   Offline  -            0
199  UDMA_CRC_Error_Count     0x0032  200    200    000     Old_age   Always   -            1
200  Multi_Zone_Error_Rate    0x0008  200    200    000     Old_age   Offline  -            2
Hopefully you've long since replaced the drive, but since no one has yet directly answered the question...
You ran two tests, both of which failed to read the same logical sector of the disk, as indicated by 'Completed: read failure' and the same LBA in both tests. This does indeed indicate the disk has a defect, and you should be able to have it replaced under warranty. Attempting to store data in this sector may or may not cause the drive to notice it's defective during the write process and remap the sector, but if the drive doesn't notice and can't read the data later on, you've lost it.
I want to add to the comments in the other answer, but I can't due to lack of rep, go figure.
You don't need to write a cron script: the smartmontools package includes the smartd daemon, which handles just what you want to do, namely regular checking of SMART status. All you need to do is create a configuration and start the service. The smartmontools package also contains some sample scripts that smartd can call when something starts failing.
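As a rough illustration of such a configuration, a single smartd.conf line for the drive in the question might look like the fragment below. The file location, the test schedule and the mail recipient are assumptions for the sake of the example, not values taken from the original system.

```
# /etc/smartd.conf (location varies by distribution)
# -a  enables the default set of checks, including the self-test and error logs
# -s  schedules self-tests: here a short test daily at 02:00 and a long test
#     on Saturdays at 03:00 (the schedule is only an example)
# -m  mails warnings to the given address (root is a placeholder)
/dev/sde -a -s (S/../.././02|L/../../6/03) -m root
```

The `-M exec` directive can additionally point smartd at a script to run when a problem is reported, which is where the sample scripts mentioned above would fit in.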
Is your data worth risking on a suspect drive?
If it were me, I'd replace the drive and be thankful that SMART saved me a big headache.
What would I do in your situation?
First of all, I would find out which files are affected. There are instructions for doing this at https://www.smartmontools.org/wiki/BadBlockHowto. In your case it is harder because you have an array, but it is possible. Then ensure that the affected file is backed up, and then write zeros to the failing sector. Two things can happen.
In any case you end up with a fixed drive. You should restore your file from backup (because you overwrote one sector of it). You should also rerun an extended self-test to ensure that there are no more errors.
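The first step, finding what lives on the failing sector, starts with a small piece of arithmetic described in the BadBlockHowto: converting the disk LBA into a file system block number. The sketch below shows that calculation for a single partition on a plain disk. The function name is made up for this example, and the partition start and block size are placeholders that would have to be read from the real system. For a drive that is a member of an array, the array layout also has to be taken into account, which this sketch does not attempt.

```python
SECTOR_SIZE = 512  # bytes per LBA sector, as assumed by the BadBlockHowto formula


def lba_to_fs_block(lba, partition_start, fs_block_size):
    """Return the file system block that contains the given disk LBA.

    Implements b = (int)((L - S) * 512 / B), where L is the failing LBA,
    S is the first sector of the partition and B is the file system
    block size in bytes.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * SECTOR_SIZE // fs_block_size


if __name__ == "__main__":
    failing_lba = 976642822   # LBA_of_first_error from the self-test log above
    partition_start = 63      # hypothetical: read the real value from `fdisk -lu`
    fs_block_size = 4096      # hypothetical: read the real value from the file system
    print(lba_to_fs_block(failing_lba, partition_start, fs_block_size))
```

On an ext2/ext3 file system, the howto then uses the resulting block number with debugfs to look up the inode and file name that own the block.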
Stay healthy!
P.S. I know that this post is kind of old, but I found it through Google, and I think it is a good idea to provide another good answer.
Backup as soon as you can!
If this drive is still in warranty, then the badblocks tool can also be used for this (you already have backups, right?)
The drive was likely on its way out. Being unable to read from part of the drive is most definitely a failure condition, and it is certainly possible for it to happen without other typical signs of disk failure. This type of thing isn't commonly transient; with no other signs it might be a weak head, a very slight alignment issue, or a defective area on a platter (cylinder?).
The other alternative is that there was a SMART bug; you really don't want to be running a drive with buggy firmware.
Anytime you see any error at all from SMART, it is a strong sign that you should get a new drive to avoid data loss. It's intended as an early warning system, in part.
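The check the answers above rely on, a read failure reported at the same LBA by more than one self-test, can also be done mechanically. The following is only a sketch: it parses self-test log text in the layout shown in the question, and the regular expression, the function names and the embedded sample are assumptions made for illustration. Output from other smartctl versions may be laid out differently.

```python
import re
from collections import Counter

# Sample copied from the self-test log of /dev/sde in the question.
SAMPLE_LOG = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 8981 976642822
# 2 Extended offline Aborted by host 90% 8981 -
# 3 Extended offline Completed: read failure 90% 8981 976642822
# 4 Extended offline Interrupted (host reset) 90% 8977 -
# 5 Extended offline Completed without error 00% 410 -
"""

# "# N <description and status> <remaining>% <hours> <LBA or ->"
ENTRY = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def read_failures(log_text):
    """Return (test number, power-on hours, LBA) for each read-failure entry."""
    failures = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match and "read failure" in match.group(2) and match.group(5) != "-":
            failures.append(
                (int(match.group(1)), int(match.group(4)), int(match.group(5)))
            )
    return failures


def repeated_lbas(failures):
    """Return the LBAs that were reported by more than one failed test."""
    counts = Counter(lba for _, _, lba in failures)
    return sorted(lba for lba, count in counts.items() if count > 1)


if __name__ == "__main__":
    failures = read_failures(SAMPLE_LOG)
    for number, hours, lba in failures:
        print(f"test #{number} at {hours} h: read failure at LBA {lba}")
    print("LBAs failing in more than one test:", repeated_lbas(failures))
```

An LBA that shows up in `repeated_lbas` is the situation described above: the same sector failing in separate tests rather than a one-off glitch.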