This is a Mint 21.1 x64 Linux system which has, over the years, had disks added to RAID arrays until we now have one array of 10 x 3TB drives and one array of 5 x 6TB drives. Four HDs dropped out of the arrays, two from each, apparently as a result of one controller failing. We've replaced the controllers, but that has not restored the arrays to function. mdadm --assemble reports that it is unable to start either array because of insufficient disks (with two failed in each, I'm not surprised); mdadm --run reports an I/O error (syslog seems to suggest this is because it can't start all the drives, but there is no indication that it even tried to start the two apparently unhappy ones). Yet I can still run mdadm --examine on the failed disks and they look absolutely normal. Here's the output from a functional drive:
mdadm --examine /dev/sda
/dev/sda:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 829c0c49:033a810b:7f5bb415:913c91ed
Name : DataBackup:back (local to host DataBackup)
Creation Time : Mon Feb 15 13:43:15 2021
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 5860268976 sectors (2.73 TiB 3.00 TB)
Array Size : 26371206144 KiB (24.56 TiB 27.00 TB)
Used Dev Size : 5860268032 sectors (2.73 TiB 3.00 TB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=944 sectors
State : clean
Device UUID : 6e072616:2f7079b0:b336c1a7:f222c711
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Apr 2 04:30:27 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 2faf0b93 - correct
Events : 21397
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAA..AA ('A' == active, '.' == missing, 'R' == replacing)
And here's output from a failed drive:
mdadm --examine /dev/sdk
/dev/sdk:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 829c0c49:033a810b:7f5bb415:913c91ed
Name : DataBackup:back (local to host DataBackup)
Creation Time : Mon Feb 15 13:43:15 2021
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 5860268976 sectors (2.73 TiB 3.00 TB)
Array Size : 26371206144 KiB (24.56 TiB 27.00 TB)
Used Dev Size : 5860268032 sectors (2.73 TiB 3.00 TB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=944 sectors
State : clean
Device UUID : d62b85bc:fb108c56:4710850c:477c0c06
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Apr 2 04:27:31 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : d53202fe - correct
Events : 21392
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
Edit: Here's the --examine report from the second failed drive; as you can see, it failed at the same time the entire array went offline.
# mdadm --examine /dev/sdl
/dev/sdl:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 829c0c49:033a810b:7f5bb415:913c91ed
Name : DataBackup:back (local to host DataBackup)
Creation Time : Mon Feb 15 13:43:15 2021
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 5860268976 sectors (2.73 TiB 3.00 TB)
Array Size : 26371206144 KiB (24.56 TiB 27.00 TB)
Used Dev Size : 5860268032 sectors (2.73 TiB 3.00 TB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=944 sectors
State : clean
Device UUID : 35ebf7d9:55148a4a:e190671d:6db1c2cf
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Apr 2 04:27:31 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : c13b7b79 - correct
Events : 21392
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
The second array, 5 x 6TB, went offline two minutes later when two of its disks quit. The two failed disks on this array and the two on the other array were all connected to a single 4-port SATA controller card, which has of course now been replaced.
The main thing I find interesting about this is that the failed drive seems to report itself as alive, but mdadm doesn't agree with it. journalctl doesn't seem to go back as far as 2 April, so I may not be able to find out what happened. Does anyone have any ideas about what I can do to bring this beast back online?
Answer:
First of all, back up every member drive before running any further mdadm commands. With these backups at hand you can later attempt recovery on a VM outside the box.
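A minimal sketch of such a backup, assuming the drives are still visible as block devices and that /mnt/backup is a hypothetical mount point with enough free space; ddrescue is used here, but plain dd with conv=noerror,sync would also work:

# Image every member drive before touching the arrays any further.
# Device names and the /mnt/backup target are examples only.
for d in sda sdk sdl; do
    ddrescue -f -n "/dev/$d" "/mnt/backup/$d.img" "/mnt/backup/$d.map"
done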
Check the Update Time field for the failed drives in the output of mdadm --examine /dev/sdX to determine the exact sequence of events as the drives were falling out of the array. Sometimes the first drive failure goes unnoticed, and bringing that old drive back online will result in a catastrophic failure while trying to mount a filesystem.
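As a quick sketch, a loop like the one below collects those fields from every member at once (adjust the /dev/sd[a-l] glob to whatever your members actually are):

# Print role, update time, event count and array state for each member.
for d in /dev/sd[a-l]; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Device Role|Update Time|Events|Array State'
done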
In your case both failed drives show the same Update Time (Sun Apr 2 04:27:31) and the same Events count (21392), so they dropped out of the array at the same moment, and it should be safe to force the whole array online with mdadm --assemble --force /dev/mdX or mdadm --assemble --force --scan. If that were not the case, you should force online only the last drive that fell off the array, by specifying the array member drives for mdadm --assemble --force /dev/mdX /dev/sda /dev/sdb missing /dev/sdd; note that the order of drives is important.
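A hedged sketch of that forced assembly for the 10-drive array, assuming it is /dev/md0 and that the member list is replaced with your real devices; mounting read-only first keeps the filesystem untouched until you trust the result:

mdadm --stop /dev/md0                            # clear any half-assembled instance
mdadm --assemble --force /dev/md0 /dev/sd[a-j]   # substitute your real member list
cat /proc/mdstat                                 # confirm the array started (degraded is OK)
mount -o ro /dev/md0 /mnt/recovery               # hypothetical mount point, read-only sanity check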
I believe your array is currently in a degraded state, with that /dev/sdh marked offline. Look into the output of cat /proc/mdstat to determine that, do a backup, troubleshoot your hardware, and rebuild your array completely after that.
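Once the hardware is trusted again, re-adding a replaced drive is roughly this (array and device names are only examples):

mdadm --detail /dev/md0                    # see which slots are missing or faulty
mdadm --manage /dev/md0 --add /dev/sdh     # add the replacement; RAID5 starts rebuilding onto it
watch cat /proc/mdstat                     # follow the resync/rebuild progress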