I have been running a five-disk ZFS raidz1 pool under Ubuntu 12.04 for three years now with no problems at all.
Unfortunately, the day of disk failure has come. I lost a disk in the array: it simply went offline, and after a few days the second one started to throw errors as well. When the system detected checksum errors on that second, failing disk (it has some bad sectors according to SMART), it started to resilver the array. By the time I got to the PC, the resilver was already at 40%, so to avoid a catastrophe I decided to stop the server as soon as possible.
So basically my array looks almost like this, and the output also mentions that some data was lost:
NAME                                      STATE     READ WRITE CKSUM
Misu                                      DEGRADED     0     0     0
  raidz1-0                                ONLINE       0     0     0
    scsi-SATA_ST3000DM001-9YN_Z1F1587B    OFFLINE      0     0     0   (failed hdd)
    scsi-SATA_ST3000DM001-9YN_Z1F14J7V    ONLINE       0     0     0
    scsi-SATA_ST3000DM001-9YN_Z1F14JYL    ONLINE       0     0     0
    scsi-SATA_ST3000DM001-1CH_W1F1G04F    ONLINE       0     0     0
    scsi-SATA_ST3000DM001-1CH_W1F1G1H7    ONLINE     134     5   139   (failing hdd)
Since the resilver process takes quite some time, I'm afraid to replace the first disk outright and just hope that the second one, the one with checksum errors, won't fail during the rebuild. So I have decided to replace the PCB on the first failed disk, since it had PCB problems rather than mechanical ones.
So, if I manage to get the first disk running again, what should I do next? How will ZFS know that the disk was not replaced (I'm not sure, but I believe changing the PCB may change the serial number and related identifiers for that disk) and detect it as the original member?
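For reference, after swapping the PCB this is roughly what I was planning to try. This is only my guess at the procedure, based on the pool and device names from the status output above, and I would appreciate corrections if it is wrong:

```shell
# See how the pool views the disk after the PCB swap
zpool status Misu

# If ZFS still recognizes the on-disk labels, try bringing the
# offlined disk back into the pool
zpool online Misu scsi-SATA_ST3000DM001-9YN_Z1F1587B

# Clear the accumulated error counters and watch the resilver progress
zpool clear Misu
zpool status -v Misu
```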
Any other information that can help me avoid making this worse would be appreciated.