Our backup "solution" includes hooking up a USB drive to the backup server, and running a custom script that rsync
s the data on to the USB drive. However, after a while, the drive becomes read-only. Here's the output of dmesg:
[2502923.708171] sdb: sdb1
[2502923.742767] sd 36:0:0:0: [sdb] Attached SCSI disk
[2502980.368020] kjournald starting. Commit interval 5 seconds
[2502980.482705] EXT3 FS on sdb1, internal journal
[2502980.482705] EXT3-fs: recovery complete.
[2502980.488709] EXT3-fs: mounted filesystem with ordered data mode.
[2590744.432168] usb 1-2: USB disconnect, address 36
[2590744.432655] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.432784] end_request: I/O error, dev sdb, sector 795108447
[2590744.432857] Buffer I/O error on device sdb1, logical block 99388548
[2590744.432925] lost page write due to I/O error on sdb1
[2590744.433002] Buffer I/O error on device sdb1, logical block 99388549
[2590744.433070] lost page write due to I/O error on sdb1
[2590744.433139] Buffer I/O error on device sdb1, logical block 99388550
[2590744.433207] lost page write due to I/O error on sdb1
[2590744.433275] Buffer I/O error on device sdb1, logical block 99388551
[2590744.433343] lost page write due to I/O error on sdb1
[2590744.433410] Buffer I/O error on device sdb1, logical block 99388552
[2590744.433478] lost page write due to I/O error on sdb1
[2590744.433545] Buffer I/O error on device sdb1, logical block 99388553
[2590744.433613] lost page write due to I/O error on sdb1
[2590744.433681] Buffer I/O error on device sdb1, logical block 99388554
[2590744.433749] lost page write due to I/O error on sdb1
[2590744.433817] Buffer I/O error on device sdb1, logical block 99388555
[2590744.433884] lost page write due to I/O error on sdb1
[2590744.433953] Buffer I/O error on device sdb1, logical block 99388556
[2590744.434021] lost page write due to I/O error on sdb1
[2590744.434089] Buffer I/O error on device sdb1, logical block 99388557
[2590744.434157] lost page write due to I/O error on sdb1
[2590744.443942] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.447945] end_request: I/O error, dev sdb, sector 795108687
[2590744.452065] Aborting journal on device sdb1.
[2590744.452065] __journal_remove_journal_head: freeing b_committed_data
[2590744.452410] EXT3-fs error (device sdb1) in ext3_ordered_writepage: IO failure
[2590744.453795] __journal_remove_journal_head: freeing b_committed_data
[2590744.454481] ext3_abort called.
[2590744.454548] EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
[2590744.454697] Remounting filesystem read-only
[2590744.457033] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #11968705 offset 0
[2590776.909451] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #122881 offset 0
[2590777.637030] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #30015490 offset 0
[2590949.026134] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591121.070802] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591211.109072] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591300.269439] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591357.322837] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591418.664452] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591572.792037] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591667.952082] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591669.639597] __ratelimit: 3981 messages suppressed
[2591669.639658] Buffer I/O error on device sdb1, logical block 61014530
[2591669.639698] lost page write due to I/O error on sdb1
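For reference, the script boils down to roughly the following (the paths are placeholders for the real ones):

#!/bin/sh
# Simplified sketch of the backup script; /mnt/usbbackup stands in
# for wherever the USB drive is actually mounted.
rsync -a /srv/data/ /mnt/usbbackup/data/
# No umount here; the drive stays mounted afterwards.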
I'm not unmounting the drive within my script; can anyone suggest what would be causing this, so I can fix it?
When that happens to me with a fixed disk, it means the disk is dying, and most likely that is what is happening here too. If this is a backup drive that is repeatedly connected, disconnected, and transported between locations, it is very possible that a shock or repeated thermal changes have damaged it. Most of these USB drives are not specially protected against drops, shocks, or temperature swings; they are just a standard SATA drive in a plastic USB-to-SATA housing.
My rule of thumb for disks, especially when it comes to backups, is: if there's a doubt, throw it out.
To rule out the USB infrastructure (cable, port, controller), you could exercise the disk extensively on another computer, for example as sketched below; that doesn't actually solve your problem, though, since you still have to back up this server.
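A non-destructive read test from the other machine could look like this (assuming the disk shows up there as /dev/sdb; unmount it first, and expect it to take hours on a large drive):

# Read-only surface scan; reports any unreadable blocks
badblocks -sv /dev/sdb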
Further to David Mackintosh's answer above (which is very good), some more information: the filesystem itself can tell the kernel to remount it read-only when it encounters an error.
This is the errors= mount option described in the mount(8) man page: for ext3, errors=continue just marks the filesystem as having errors and carries on, errors=remount-ro remounts it read-only, and errors=panic halts the system. The default is stored in the filesystem superblock and can be changed with tune2fs(8).
I would warrant that if you're not mounting with errors=remount-ro, then the filesystem has that behaviour set in its superblock; you can check with dumpe2fs.
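For example (using the sdb1 device from your dmesg; the exact output format varies a little between e2fsprogs versions):

# Show the error behaviour recorded in the superblock
dumpe2fs -h /dev/sdb1 | grep -i 'errors behavior'
# Change it if you really want to (continue, remount-ro or panic)
tune2fs -e remount-ro /dev/sdb1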
You might be able to find out what SMART thinks is wrong with the drive by running smartctl against it.
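For example (assuming the disk is still /dev/sdb; many USB-to-SATA bridges need -d sat to pass SMART commands through, and some don't support SMART at all):

# Quick overall health verdict
smartctl -H /dev/sdb
# Full attributes, error log and self-test log
smartctl -a /dev/sdb
# If the USB bridge gets in the way, force SAT pass-through
smartctl -d sat -a /dev/sdb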
I would agree with David: give serious consideration to replacing the drive. There is little worse than needing to restore all your data, only to find that the backup is unreadable.