Back story: I took my two Seagate 32000641AS drives out of an old DNS-323 disk array and put them in my Dell Precision T5600, since the DNS-323 was old and becoming a pain to manage. I then created two separate XFS filesystems on them and mounted them. The drives these replaced were smaller 500 GB drives; I copied their data onto a 300 GB encrypted USB thumb drive, and after I put in the Seagates I copied the data back. I use one drive for local backup and the other to run a VirtualBox VM, but both drives received the same data from the USB drive.
I noticed this today in the syslog for both drives:
smartd[809]: Device: /dev/sda [SAT], 19 Currently unreadable (pending) sectors
smartd[809]: Device: /dev/sda [SAT], 19 Offline uncorrectable sectors
However, there were no performance issues. Also, when I was copying the data back from the USB drive, one directory would not copy back over; it kept giving me an I/O error. I didn't need it, so I just didn't copy it back.
Is this the reason I'm getting the same exact errors on both drives, or is it a coincidence?
So I ran mkfs.ext4 on /dev/sdb1, and I'm now running:
badblocks -s -v -n -f /dev/sda
Checking for bad blocks in non-destructive read-write mode
From block 0 to 1953514583
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 1.43% done, 36:36 elapsed. (0/0/0 errors)
So far badblocks has reported no errors, but then I got this in the syslog:
smartd[809]: Device: /dev/sda [SAT], 19 Currently unreadable (pending) sectors
smartd[809]: Device: /dev/sda [SAT], 19 Offline uncorrectable sectors
smartd[809]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 116 to 117
smartd[809]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
smartd[809]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
smartd[809]: Device: /dev/sdb [SAT], 35 Currently unreadable (pending) sectors
smartd[809]: Device: /dev/sdb [SAT], 35 Offline uncorrectable sectors
smartd[809]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 116 to 113
smartd[809]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 35 to 49
Yup!
SMART data is produced by the hard drive firmware itself; it isn't really possible for you to get false positives there. If the SMART data says you have bad blocks on both your drives, then you have bad blocks. Some small chunks of the spinning rust have gone bad, and there's no way to fix them. This is a slow process; it happened as the drives were aging in their previous home. The fact that the numbers were identical at first is interesting, but not really shocking; the drives would have come from the same manufacturing lot, and thus have very similar properties. If you'd like to be sure, you can look into firmware updates; it's possible for bad firmware to make a device falsely detect errors. However, the likely explanation is the simple one: bad blocks on both.
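If you want to watch those counters yourself rather than waiting for smartd, attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) are the ones to pull out of `smartctl -A`. A minimal sketch — the here-doc is sample data standing in for real output, so the pipeline runs anywhere; on the actual machine you'd pipe `sudo smartctl -A /dev/sda` into the awk instead:

```shell
# Filter the two attributes smartd is alerting on out of a smartctl-style
# attribute table. The raw count is the last column; the attribute name
# is the second. (Sample lines below mirror the 19-sector counts above.)
pending=$(awk '/Current_Pending_Sector|Offline_Uncorrectable/ {print $2, $NF}' <<'EOF'
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       19
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       19
EOF
)
echo "$pending"
```

If the raw values keep climbing from 19, the drive is actively deteriorating rather than carrying a few old scars.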
Now, it's not the end of the world; you've lost some data (on the drive), and the drives are likely to lose more, or fail outright. But you can keep using them, provided that data is also going to another, likely-good drive. Depending on your RAID setup, it should maintain two copies whenever a block goes bad. Don't RAID the two aging drives together, since when they go, they'll go together. And run
xfs_scrub
(or the equivalent tool for your preferred filesystem) over the RAID'd data on a routine basis, to detect more bad blocks.

First, regarding your syslog data:
That doesn't look like either of your two Seagate 32000641AS drives, but rather your boot drive, sda. In either case, because they're 2 TB Seagate drives, they should be partitioned with GPT, not MBR.
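The reason is arithmetic: an MBR partition entry stores the start LBA and sector count as 32-bit values, so with 512-byte logical sectors it tops out at exactly 2 TiB, and a 2 TB-class drive sits right at that boundary. A quick sketch of the limit:

```shell
# An MBR partition entry holds a 32-bit start LBA and a 32-bit sector
# count; with the classic 512-byte sector that caps the addressable
# range at 2^32 * 512 bytes.
mbr_limit=$(( 4294967296 * 512 ))      # 2^32 sectors * 512 bytes
echo "$mbr_limit"                      # 2199023255552 bytes
echo $(( mbr_limit / 1099511627776 ))  # / 2^40 -> 2 (TiB exactly)
```

GPT uses 64-bit LBAs, so it has no such ceiling.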
Regarding:
If you look at
man badblocks
you'll see that badblocks recommends not being run directly for this. The correct way to bad-block a disk is:
sudo e2fsck -fccky /dev/sdXX
# where sdXX is the drive you want to test

The -k is important, because it saves the previous bad block table and adds any new bad blocks to it. Without -k, you lose all of the prior bad block information.
The -fccky parameter...
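Per e2fsck(8): -f forces a check, -cc runs the non-destructive read-write badblocks scan, -k keeps the existing bad block list, and -y answers yes to all prompts. Here's a sketch of the same bad-block bookkeeping exercised safely on a throwaway file-backed ext4 image instead of a real disk (assumes e2fsprogs is installed; `e2fsck -l` feeds block numbers into the filesystem's bad block table, and `dumpe2fs -b` prints what's recorded):

```shell
# Sketch: demonstrate the ext4 bad-block table on a scratch image file,
# not real hardware. e2fsck -l adds listed blocks to the table the same
# way the -cc scan would for blocks it finds bad itself.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=8 status=none
mkfs.ext4 -q -F "$img"                   # small scratch filesystem

printf '5000\n6000\n' > "$img.bad"       # pretend these two blocks went bad
e2fsck -fy -l "$img.bad" "$img" >/dev/null || true  # exit 1 just means "fs modified"
recorded=$(dumpe2fs -b "$img" 2>/dev/null)
echo "$recorded"                         # the blocks now in the bad block table
rm -f "$img" "$img.bad"
```

Running it again with more block numbers shows the -k behavior described above: the old entries stay in the table and the new ones are appended.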