I'm running Ubuntu Server 20.04 with 4x 4TB drives in an mdadm RAID5 array that provides a non-root ext4 filesystem. I have a few rsync scripts that back up key folders. One of them reported an error, so I reran it and located the offending file: a 3GB MP4 video. I'm looking for guidance on how to either repair this file or repair the filesystem so that the errors stop occurring. I don't mind losing the file, but I want to make sure I'm not masking some other corruption if I simply delete it. I'm reasonably certain the hardware is OK (more on this below).
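One way to check whether anything else on the array is affected is to read every file end to end and list the ones that fail, since the rsync error only surfaced when the file's data was actually read. Below is a minimal sketch under that assumption. `find_unreadable` is a hypothetical helper name, not an existing tool, and the example path assumes the array is mounted at /media/BigData, as the safecopy destination path below suggests.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): read every regular file under a
# directory end to end and print the path of each file whose read fails,
# for example with "Input/output error". Read-only: data goes to /dev/null.
find_unreadable() {
  find "$1" -type f -exec sh -c '
    for f do
      # cat exits non-zero if any read() on the file fails
      cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}

# Example (assumes the array is mounted at /media/BigData):
#   find_unreadable /media/BigData > unreadable.txt
```

With about 9.5TB in use (2317034203 blocks of 4096 bytes, per the fsck output), a full pass will take many hours. Trying it on a single directory first, such as /media/BigData/SSoT, is a quicker way to see how it behaves.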
My troubleshooting notes so far:
- The file itself appears to work: I can open it in VLC and scrub back and forth. It has no sound and the video is fairly boring, so I can't tell whether there is a glitch, but VLC doesn't complain.
- If I try to copy the file anywhere, including onto the same drive, the copy gets to about 1GB and Nautilus pops up the error "Error splicing file: Input/output error". The resulting 1GB fragment won't open in VLC.
- If I try to copy the file with safecopy, it reports problems:

```
$ safecopy DJI_0719.MP4 /media/BigData/SSoT/test.mp4
Low level device calls enabled mode: 1
Reported hw blocksize: 4096
Reported low level blocksize: 4096
File size: 3026885347
Blocksize: 4096
Fault skip blocksize: 65536
Resolution: 4096
Min read attempts: 3
Head moves on read error: 1
Starting block: 0
Source: DJI_0719.MP4
Destination: /media/BigData/SSoT/test.mp4
......................................... [40961]
......................................... [82945]
......................................... [124929]
......................................... [166913]
......................................... [208897]
....................................!![245901](+1007210496){X [245917]
XXXXXXX<<<<}[246016](+471040)
.!![246157](+577536){XXXXXXXX<<<<}[246272](+471040)
.!![246400](+524288){XXXXXXXX<<<<}[246528](+524288)
.!![246656](+524288){XXXXXX<<<<}[246748](+376832)
.!![246912](+671744){XXXXXX<<<<}[247004](+376832)
......................................... [287965]
......................................... [329949]
......................................... [371933]
......................................... [413917]
......................................... [455901]
......................................... [497885]
......................................... [539869]
......................................... [581853]
......................................... [623837]
......................................... [665821]
......................................... [707805]
.............................._ ;-} 100%
Done!
Recovered bad blocks: 0
Unrecoverable bad blocks (bytes): 36 (2220032)
Blocks (bytes) copied: 738985 (3026885347)
```

  The resulting file also fails to open in VLC.
- None of the above generates any entries in dmesg.
- Unmounting the partition and running fsck doesn't seem to indicate any problems:

```
$ sudo fsck -p -f /dev/md0
fsck from util-linux 2.34
BigData: 744159/366272512 files (3.2% non-contiguous), 2317034203/2930164224 blocks
```
- I had md scan the array with:

```
echo check | sudo tee /sys/block/md0/md/sync_action
```

  Afterwards all drives are still available, and `sudo cat /sys/block/md0/md/mismatch_cnt` returns 0.
- smartctl reports that all drives in the array pass their self-checks, with reallocated sector count, current pending sector and offline uncorrectable all at 0. I haven't tried the low-level tests yet, although since I'm encountering errors at the user level, I would have expected them to show up at the drive level as well.
- `sudo mdadm -E /dev/sdX1` (where X = a, b, c and d) reports bad blocks on sdb and sdd, both showing:

```
Bad Block Log : 512 entries available at offset 24 sectors - bad blocks present.
```
What's interesting is that those are the two drives I've already replaced in this array (for example, sdb has about 3,000 hours on it, while sda has 30,000).
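The safecopy output pins the bad region down fairly precisely. Its block size is 4096 bytes, and the first bad block, 245901, sits at byte offset 245901 × 4096 = 1007210496, which is the `+1007210496` safecopy prints. The flagged blocks fall between 245901 and 247004, roughly bytes 1007210496 to 1011728384, which lines up with the copy failing at about 1GB. Below is a sketch for re-reading just that region with dd, so the error can be reproduced in seconds and dmesg checked right afterwards. The helper names `block_to_offset` and `reread_blocks` are hypothetical.

```shell
#!/bin/sh
# Sketch: re-read only the byte range safecopy flagged instead of copying the
# whole 3GB file. Assumes safecopy's 4096-byte block size from its output.
BS=4096

# Byte offset of a safecopy block number (block * block size).
block_to_offset() {
  echo $(( $1 * BS ))
}

# Read COUNT blocks of FILE starting at block START and discard the data.
# dd stops and exits non-zero on the first read error (no conv=noerror).
reread_blocks() {
  dd if="$1" of=/dev/null bs="$BS" skip="$2" count="$3" status=none
}

# Example for the region reported above (blocks 245901 to 247004):
#   block_to_offset 245901                        # 1007210496
#   reread_blocks DJI_0719.MP4 245901 1104 || echo "read failed"
#   sudo dmesg | tail
```

The ext4 block size also appears to be 4096 bytes (2930164224 blocks at 4KiB each is about 12TB, the usable size of a 4x 4TB RAID5). If so, safecopy's block numbers should equal the file's logical block numbers, and `filefrag -v DJI_0719.MP4` can map those to physical filesystem blocks on /dev/md0.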
Is it possible that mdadm's bad block log needs to be flushed, rebuilt or retested, and is there a command to do this? I replaced the two problem drives more than a year apart, if that matters (i.e. whether mdadm would be expected to clear the bad blocks each time the array is repaired).
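Before trying to clear anything, it may help to dump the actual entries in each member's bad block log, both as recorded in the on-disk superblock and as the running kernel sees them, and compare the sdb1 and sdd1 lists. Below is a read-only sketch. It assumes the member partitions are /dev/sda1 to /dev/sdd1, as in the `mdadm -E` command above, and that the array uses 1.x metadata (mdadm documents bad block lists only for 1.x and IMSM metadata). It needs root, and the function names are hypothetical.

```shell
#!/bin/sh
# Read-only sketch: list the entries in each member's md bad block log.
# Assumed member partitions; override by setting MEMBERS in the environment.
MEMBERS="${MEMBERS:-/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1}"

# Bad blocks recorded in each member's superblock (mdadm, 1.x metadata).
show_badblocks() {
  for dev in $MEMBERS; do
    echo "== $dev"
    mdadm --examine-badblocks "$dev"
  done
}

# The running kernel's copy of the same lists via sysfs, one
# "start length" pair per line (both in sectors).
show_sysfs_badblocks() {
  for f in /sys/block/md0/md/dev-*/bad_blocks; do
    [ -e "$f" ] || continue
    echo "== $f"
    cat "$f"
  done
}

# Example:
#   sudo sh -c '. ./badblocks.sh; show_badblocks; show_sysfs_badblocks'
```

Both functions only read; neither touches the log itself.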
Thanks for any suggestions.