I have a server with software RAID 1 on two hot-swap SATA disks. One hard drive has started showing errors and I'm thinking about removing and replacing it. The only problem is that I have no idea which physical drive corresponds to which device, and I can't shut the server down to find out.
I have /dev/sda and /dev/sdb, and /dev/sda is the failing one. I thought about doing something along the lines of
# mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
then somehow stopping or suspending the drive with tuning software and listening for which of the two went quiet, but that's not going to work in a noisy server environment. The drive panels have no LEDs.
Thanks for any ideas!
Can you see the serial numbers (S/N) on the disks? Use hdparm -i /dev/sda to read the drive's serial number, then match it against the label on the physical disk.
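To compare both drives quickly, the serial can be pulled out of the hdparm output. This is a minimal sketch: extract_serial is a hypothetical helper name, and it assumes hdparm -i prints the serial as a SerialNo=... field on the same line as Model= and FwRev=.

```shell
# Hypothetical helper: print the SerialNo= value found in `hdparm -i`
# output read from stdin, with surrounding spaces trimmed.
extract_serial() {
    sed -n 's/.*SerialNo=\([^,]*\).*/\1/p' | sed 's/^ *//; s/ *$//'
}

# Usage on the real machine (as root):
#   hdparm -i /dev/sda | extract_serial
#   hdparm -i /dev/sdb | extract_serial
```

The names under /dev/disk/by-id/ usually embed model and serial as well, so ls -l /dev/disk/by-id/ is another way to see which serial points at sda.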
The A and B in sda and sdb should map to channels 1 and 2 (or 0 and 1) for your drives, although the kernel assigns these names in detection order, so the mapping is not guaranteed. If the system is set up so that the ports are labeled, you can tell that way. I don't know how your drives are wired; I've seen ports numbered in small print on the motherboard, so you can tell which port goes to which drive.
I suppose you could also use your idea and feel for vibration from the drives, if there's enough room to touch them. Again, that depends on how they're mounted.
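If the ports are labeled, the port a device sits on can be read from sysfs instead of guessed from the letter. A rough sketch, assuming a libata system where the resolved /sys/block/sdX path contains an ataN component; ata_port_of_path and list_ata_ports are made-up helper names.

```shell
# Hypothetical helper: print the "ataN" component of a sysfs path given as $1.
ata_port_of_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

# Print "device port" for each disk name given, e.g. sda sdb.
list_ata_ports() {
    for dev in "$@"; do
        # Resolve the /sys/block symlink to the full device path.
        path=$(readlink -f "/sys/block/$dev") || continue
        printf '%s %s\n' "$dev" "$(ata_port_of_path "$path")"
    done
}

# Usage: list_ata_ports sda sdb
```

Whether ata1 is the connector printed as port 0 or port 1 on the board still depends on the controller, so treat the result as a hint to check against the labels.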
An easy way to check which drive is which, if you have proper drive LEDs, is to just run
dd if=/dev/sda of=/dev/null
and see which drive's light stays solidly on.
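A plain dd like that reads the whole disk unless you interrupt it. A time-limited variant could look like the sketch below; blink_disk is a made-up name, the 30-second default is arbitrary, and it assumes the coreutils timeout command is available.

```shell
# Hypothetical helper: read from device $1 for at most $2 seconds (default 30)
# so that its activity LED stays lit for that long.
blink_disk() {
    dev=$1
    secs=${2:-30}
    status=0
    timeout "$secs" dd if="$dev" of=/dev/null bs=1M 2>/dev/null || status=$?
    # 124 is timeout's exit code for "time limit reached", which is expected.
    [ "$status" -eq 0 ] || [ "$status" -eq 124 ]
}

# Usage (as root): blink_disk /dev/sda 30
```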
Well, last year I wrote a script which translates that
ataX.YY
stuff to a device name, found here: Linux ATA errors: Translating to a device name?
However, my personal version of this script has gotten major enhancements since then (will now even show the controller which the HDD is connected to, for instance), so it was just a one-minute job for me to cut it down to your special purposes:
NOTE: The float_eval() auxiliary function, while not strictly necessary, can avoid erroneous calculations in billions or trillions of bytes (GB and TB respectively, not to be confused with GiB/TiB). Especially in the TB range, such calculations may deviate more and more from the accurate value when computed from the block size in (long) integer arithmetic. The main reason is that HDD capacities were never written with a decimal point before they hit the 1 TB mark some years ago, so integer calculations may no longer be appropriate in all cases.
Besides, I would be interested in someone improving this script so that it shows serial numbers when there are two drives with identical manufacturer ID. Unfortunately, I haven't been successful in finding this information in
/sys/block/*
so far.
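One possible route to the serial, offered as a sketch rather than a tested patch to the script: ask udev instead of reading /sys/block directly. serial_from_udev is a hypothetical helper, and it assumes udev has set the ID_SERIAL_SHORT property for the disk, which is typical for ATA drives but not guaranteed.

```shell
# Hypothetical helper: print the ID_SERIAL_SHORT value from
# `udevadm info --query=property` output read on stdin.
serial_from_udev() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Usage:
#   udevadm info --query=property --name=/dev/sda | serial_from_udev
```

lsblk -o NAME,MODEL,SERIAL gives a similar overview for all disks at once where the installed lsblk supports the SERIAL column.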