At the moment I am trying to migrate data from a software RAID5 (6 disks) over to a software RAID1. The command of choice was:
rsync -avxHACPX /mnt/old/ /mnt/new/
However, after the first few files the machine locks up completely. At first I thought this was because the disks were connected through a USB3 extension card, but the lockup still happens when they are connected directly via SATA.
At the moment I am running a long SMART test using smartctl, but that will take a long time to finish. In the meantime I'd like to find out how to diagnose such an issue.
Several years ago one used to be able to follow the kernel log on one of the terminals, but it seems Ubuntu doesn't provide that anymore. Since the log files don't contain anything useful whatsoever (likely because they don't get written once the error condition occurs) I'm left wondering how to diagnose such an issue?
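One way to get back something like the old kernel console is to follow the kernel log live, ideally from an SSH session on a second machine so the last lines stay on that screen after the lockup. The following is only a minimal sketch: the function name kernel_log_follow_cmd is made up for illustration, and /var/log/kern.log is the usual Ubuntu location with rsyslog but may differ on other setups.

```shell
# Print a command that follows the kernel log live, based on what is installed.
# kernel_log_follow_cmd is a hypothetical helper name, not a standard tool.
kernel_log_follow_cmd() {
    if command -v journalctl >/dev/null 2>&1; then
        # systemd journal: -k limits output to kernel messages, -f follows
        echo "journalctl -k -f"
    elif command -v dmesg >/dev/null 2>&1; then
        # util-linux dmesg: --follow waits for new messages
        echo "dmesg --follow"
    else
        # Fallback: follow the syslog file (path is an assumption)
        echo "tail -F /var/log/kern.log"
    fi
}

# Usage (not run here, may need root):
#   eval "$(kernel_log_follow_cmd)"
```

Started before the copy and left running in a remote session, this shows whatever the kernel managed to log right up to the freeze, even if nothing reaches the log files on disk.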
My question is also a more general one about how to diagnose this kind of issue whenever I come across it. Right now the only option I seem to have is to note down the files that trigger errors and then skip those when copying. But that only solves the task at hand. On a more abstract level I'd like to learn about strategies comparable to what I could do in the past by watching the kernel console.
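The "note down the bad files and skip them" workaround can be sketched as follows. This assumes the noted paths are stored one per line, relative to the source root, in a file; the names make_excludes, bad_files.txt and excludes.txt are hypothetical. The sketch does not escape rsync wildcard characters (*, ?, [) in file names.

```shell
# Turn a list of unreadable files (one path per line, relative to the
# source root) into anchored rsync exclude patterns.
# make_excludes is a hypothetical helper name.
make_excludes() {
    # Drop empty lines, then make every path start with exactly one "/"
    # so rsync anchors the pattern at the root of the transfer.
    sed -e '/^$/d' -e 's|^/*|/|' "$1"
}

# Usage (not run here):
#   make_excludes bad_files.txt > excludes.txt
#   rsync -avxHACPX --exclude-from=excludes.txt /mnt/old/ /mnt/new/
```

Anchoring each pattern with a leading slash keeps rsync from skipping unrelated files that merely share a name with a noted one deeper in the tree.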
NB: At first I prefixed my rsync invocation with ionice -c 3 nice, but removed the prefix after the error occurred anyway. The error occurs independently of this.
Further information: the source volume is XFS, the target disk is ext4. I always remount the source volume read-only (mount -o ro,remount) before starting the copy operation. The source volume is also an LVM2 volume layered on top of the software RAID5 (md).
smartctl -a output (filtered):
# for i in $(blkid |grep '/sd'|cut -b 1-8|sort|uniq); do echo $i; smartctl -a $i|grep -A 1 '^SMART Error Log Version'; done
/dev/sda
SMART Error Log Version: 1
No Errors Logged
/dev/sdb
SMART Error Log Version: 1
ATA Error Count: 1
/dev/sdc
SMART Error Log Version: 1
No Errors Logged
/dev/sdd
SMART Error Log Version: 1
No Errors Logged
/dev/sde
SMART Error Log Version: 1
No Errors Logged
/dev/sdf
SMART Error Log Version: 1
No Errors Logged
/dev/sdg
SMART Error Log Version: 1
No Errors Logged
/dev/sdh
SMART Error Log Version: 1
No Errors Logged
/dev/sdb is one of the physical disks that make up the source volume's physical volume.
The iostat output you asked for:
sdc 0.00 3.50 0.00 2.00 0.00 22.00 22.00 0.45 226.00 0.00 226.00 78.00 15.60
sdd 38.50 0.00 86.00 0.00 6982.00 0.00 162.37 0.27 3.14 3.14 0.00 2.95 25.40
sde 39.50 0.00 88.00 0.00 7064.00 0.00 160.55 0.43 4.95 4.95 0.00 4.30 37.80
md1 0.00 0.00 625.00 0.00 34984.00 0.00 111.95 0.00 0.00 0.00 0.00 0.00 0.00
sdf 40.00 0.00 84.50 0.00 6994.00 0.00 165.54 0.40 4.73 4.73 0.00 4.43 37.40
sdg 0.00 107.00 0.00 250.00 0.00 18018.00 144.14 1.29 5.06 0.00 5.06 0.61 15.20
sdh 0.00 107.00 251.00 6.50 16034.00 2434.00 143.44 2.54 9.69 9.74 7.69 0.60 15.40
md6 0.00 0.00 0.50 113.50 2.00 2434.00 42.74 0.00 0.00 0.00 0.00 0.00 0.00
md5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
BIGDISK 0.00 0.00 625.00 0.00 34984.00 0.00 111.95 2.59 4.19 4.19 0.00 0.92 57.80
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 38.50 0.00 86.50 0.00 6982.00 0.00 161.43 0.40 4.58 4.58 0.00 4.09 35.40
sdb 39.00 0.00 87.00 0.00 6898.00 0.00 158.57 0.38 4.37 4.37 0.00 3.91 34.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 40.50 0.00 86.00 0.00 7028.00 0.00 163.44 0.30 3.51 3.51 0.00 3.16 27.20
sde 37.50 0.00 86.50 0.00 6972.00 0.00 161.20 0.39 4.51 4.51 0.00 4.05 35.00
md1 0.00 0.00 626.50 0.00 34772.00 0.00 111.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 38.50 0.00 86.50 0.00 7002.00 0.00 161.90 0.42 4.86 4.86 0.00 4.23 36.60
sdg 0.00 470.00 0.00 277.50 0.00 80506.00 580.22 68.39 246.57 0.00 246.57 2.54 70.40
sdh 0.00 459.50 128.50 152.00 8224.00 71834.00 570.82 72.01 256.88 6.66 468.42 2.52 70.80
md6 0.00 0.00 0.00 612.50 0.00 71834.00 234.56 0.00 0.00 0.00 0.00 0.00 0.00
md5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
BIGDISK 0.00 0.00 626.50 0.00 34772.00 0.00 111.00 2.69 4.30 4.30 0.00 0.93 58.20
/dev/sdb is the device for which smartctl reports errors.
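For reading extended iostat output like the samples above, a small filter can list only the devices with a high average wait. This is a sketch that assumes the 14-column layout shown in this post, where await is the tenth field; other sysstat versions use different columns. The function name slow_devices and the 100 ms threshold are arbitrary choices for illustration.

```shell
# Print "device await" for every iostat line whose await (ms) exceeds a limit.
# Assumes the 14-column layout from this post: await is field 10.
# slow_devices is a hypothetical helper name.
slow_devices() {
    # $1 = threshold in milliseconds; iostat output is read from stdin
    awk -v limit="$1" '
        NF == 14 && $1 != "Device:" && ($10 + 0) > (limit + 0) { print $1, $10 }
    '
}

# Usage (not run here):
#   iostat -x 2 | slow_devices 100
```

Applied to the second sample above, this would list sdg and sdh, whose await values are around 250 ms, while the disks of the source array stay below 5 ms.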
To answer the question and give further pointers to other people stumbling over this: it turned out that the motherboard of the machine was dying. Some of the capacitors had actually burst.
Lesson learned: don't rule out actual hardware failures in such really awkward cases.
btw: I was able to salvage all data from the RAID5 array.