After 3 years of 24x7 service, a 1TB Seagate Barracuda ES.2 enterprise drive is showing signs of failure: the S.M.A.R.T. reallocated sector count is high.
The Wikipedia article suggests that such a drive can still be used for less sensitive purposes, like scratch storage outside of an array, as long as the remapped sectors are left unused.
A workaround which preserves drive speed at the expense of capacity is to create a disk partition over the region that contains the remaps and instruct the operating system not to use that partition.
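For illustration only: if the suspect region were known to be, say, the first 10 GiB of the disk, the layout could be set up along these lines (the device name and offsets are made up, not taken from this drive):

parted --script /dev/sdb mklabel gpt
parted --script /dev/sdb mkpart scratch ext4 10GiB 100%   # leave the first 10 GiB unallocated
mkfs.ext4 /dev/sdb1                                       # use only the partition that avoids the suspect region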
In order to create such a partition it is necessary to fetch the list of remapped sectors. However, there are no bad blocks visible to the operating system, i.e. badblocks returns an empty list.
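For reference, the check that comes back empty is a plain read-only scan along these lines (device name illustrative):

badblocks -sv /dev/sda
# prints nothing: the firmware remaps sectors transparently, so the OS never sees them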
Is there a way to recover the list of reallocated sectors?
Edit: This drive is from an array. We get a few of them failing every year, and simply throwing them away seems to be a waste, so I am thinking of giving a second chance to the better parts of the platters.
Here is how the S.M.A.R.T. report looks now.
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda ES.2
Device Model: ST31000340NS
Serial Number: **********
Firmware Version: SN05
...
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 056 054 044 Pre-fail Always - 164293299
3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 14
5 Reallocated_Sector_Ct 0x0033 005 005 036 Pre-fail Always FAILING_NOW 1955
7 Seek_Error_Rate 0x000f 076 060 030 Pre-fail Always - 8677183434
9 Power_On_Hours 0x0032 072 072 000 Old_age Always - 24893
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 14
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 097 097 000 Old_age Always - 3
190 Airflow_Temperature_Cel 0x0022 050 043 045 Old_age Always In_the_past 50 (0 6 50 32)
194 Temperature_Celsius 0x0022 050 057 000 Old_age Always - 50 (0 18 0 0)
195 Hardware_ECC_Recovered 0x001a 021 010 000 Old_age Always - 164293299
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 21
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 21
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
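(The report is ordinary smartctl output and can be regenerated at any time with something like the following; the device path is illustrative:)

smartctl -a /dev/sda        # full report
smartctl -A /dev/sda        # just the attribute table shown above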
You don't.
You go buy another disk to replace it unless you just really like losing data.
I'd like to thank you for the advice and share some of the details that I've got from experiments.
In short, there is no easy way to get the list of reallocated sectors and even statistical methods of mapping the disk are heavily encumbered by the need to play against the logic of the firmware.
To test the drive I ran
badblocks -wv
with the default block size and monitored the reallocated sector count in the process. I made several observations.
First, there was a sharp rise in the number of reallocated sectors while writing to the beginning of the disk; then, from roughly the first 10G up to 700G, there was no change. This can be explained by the fact that certain RAID housekeeping data was stored at the beginning of the disk, so the wear in the low-address area was higher than in the rest of the disk.
Then, after a single error, the drive put itself into a blocked mode: every ATA command, even IDENTIFY DRIVE, returned ABRT, even though the reallocated sector count was still positive. To explain this behaviour, as David Schwartz suggested, I assume that the reserved sectors are somehow distributed over the address space of the drive. This means the drive may still have reserved sectors overall, yet one part of it can run out of sectors to remap; in that situation the firmware simply blocks the drive. The drive comes out of the blocked mode only after a power cycle. Whereas old drives let the software keep track of bad blocks and avoid using them, modern drives do not give you that opportunity: when the firmware decides it cannot cope with the errors, it makes the drive unusable.
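For reference, the monitoring mentioned above needs nothing more elaborate than polling the SMART attribute table while badblocks runs; a rough sketch of such a loop (device path illustrative, not the exact script that was used):

badblocks -wv /dev/sda &                    # destructive write test, as above
while kill -0 $! 2>/dev/null; do            # while badblocks is still running
    smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'
    sleep 60
done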
By driving the normalized value of the reallocated sector count down to 02, I conclude that there are about 2048 reserved sectors on this drive.
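(Assuming the normalized value falls roughly linearly as the reserve pool is consumed, this is consistent with the report above: a raw count of 1955 against a normalized value of 005 gives 1955 / 0.95 ≈ 2058, i.e. a pool on the order of 2048 sectors.)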
So-called low-level formatting, i.e. writing zeros to every accessible sector of the drive so that sectors in the less reliable parts of the disk get reallocated, would not work either: once the drive runs out of reserved sectors it changes the way it handles errors, and becomes much less convenient to use than a traditional drive that does no predictive failure analysis and simply reports an error.
If you have business data that is worth less than the cost of a drive, then use these drives for that; if not, throw them away or give them to people in the department who understand the risks. Contact the manufacturer and see if they offer recycling.
If the drive is still under warranty, you can return it to the manufacturer via their RMA process for a free replacement, after sanitizing it first. (Secure Erase will wipe the entire drive, including reallocated or otherwise inaccessible sectors.) I'm quite surprised nobody suggested this. Otherwise, do what @SpacemanSpiff said and buy a new drive.
Actually, an Enhanced Secure Erase is better, as that covers the reserved blocks as well.
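On Linux both variants can be issued with hdparm; a rough sketch, assuming the drive is not security-frozen and /dev/sda really is the disk to be wiped:

hdparm -I /dev/sda | grep -i frozen                          # must report "not frozen"
hdparm --user-master u --security-set-pass p /dev/sda        # set a temporary password
hdparm --user-master u --security-erase p /dev/sda           # normal Secure Erase
# or, where supported, the enhanced variant mentioned above:
hdparm --user-master u --security-erase-enhanced p /dev/sda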
However: if there are really that many bad sectors, the disk is a paperweight. Ditto if it won't reallocate them or declare them OK. (Pending sectors occur when there is a read issue; most of them are "soft" errors, usually caused by external vibration.)
I've had many drives like that: low-level format them with the manufacturer's tools, change the start position if that's where most of the bad sectors are, and take 5-10% off the drive capacity. If it's a decent controller and software, it will use the unallocated area as spares. I ran a WD 1800 cut down to 160 GB for 5 years without trouble, until the controller was torched by a bad power supply. I am presently using a Samsung the same way for TV captures, with 100 GB removed from a 2 TB drive; a transport stream carries more errors than the drive could ever hope to introduce, so it's not an issue for a while.
Hitachi, Samsung and WD low-level format tools seem to do a good job of remapping; I don't know about Seagate yet, as those drives have either gone into disuse or suffered immediate catastrophic failure.
* Doing these things is a lot easier now with the ultimate boot disk.
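A rough Linux-side alternative to the vendor tools, when trimming capacity off the end of the drive is enough (it cannot move the start position), is a Host Protected Area set with hdparm; the numbers and device below are illustrative:

hdparm -N /dev/sdb               # show current and native max sector count
hdparm -N p1758172651 /dev/sdb   # permanently clip a ~1 TB drive to roughly 90%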
If you really want to risk your data on this disk (I wouldn't), then use dd to write zeros over the entire disk. This will cause the drive to reallocate the pending sectors, and the whole surface of the disk will be usable. For a while ;-)
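A minimal sketch of that zero-fill, assuming /dev/sda really is the disk you mean (this destroys everything on it):

dd if=/dev/zero of=/dev/sda bs=1M status=progress
smartctl -A /dev/sda | grep Current_Pending_Sector   # the pending count should drop back to 0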