Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized:
[root@fs-2 ~]# tail -18 /var/log/messages
May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake }
May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset
May 5 16:54:45 fs-2 kernel: ata1: soft resetting link
May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:54:55 fs-2 kernel: ata1: soft resetting link
May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:05 fs-2 kernel: ata1: soft resetting link
May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
May 5 16:55:40 fs-2 kernel: ata1: soft resetting link
May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up
May 5 16:55:45 fs-2 kernel: ata1: EH complete
I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work:
[root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan
-bash: echo: write error: Invalid argument
[root@fs-2 ~]#
[root@fs-2 ~]# /root/rescan-scsi-bus.sh -l
[snip]
0 new device(s) found.
0 device(s) removed.
[root@fs-2 ~]#
[root@fs-2 ~]# ls /dev/sda
ls: /dev/sda: No such file or directory
I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting?
The operating system in question is RHEL5.3:
[root@fs-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS.
Here is the lscpi output:
[root@fs-2 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful.
The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output:
[root@www-1 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)
If your SATA controller supports hot swap, it should "just work(tm)."
To force a rescan on a SCSI BUS (each SATA port shows as a SCSI BUS) and find new drives, you will use:
On the above, < n > is the BUS number.
When a drive has failed in some circumstances Linux won't realise you've actually pulled it physically from the array. If you have that problem (as I did this morning) you can do the following:
e.g., in my case, /dev/sda had failed and I didn't want to reboot the server, so I did:
After I did that, the new drive (which had actually been physically added already) was immediately visible.
If it is not visible at this point, you can also do this to force a re-scan:
That "- - -" is wildcards for channel, id & LUN respectively, so you can restrict the scan to some subset if you want by specifying numbers instead.
Before you start, you could also:
Which will show you the path with the right host number to check in /proc/scsi/scsi for disappearence after removal.
I can't believe nobody mentioned AHCI yet... your SATA controller has to be in AHCI mode to enable hot swap. Check this by looking at the driver you are using:
See how it says "ahci" there.
If it doesn't, then just enable it in your BIOS. Also, some BIOSses, especially on servers or UEFI have a "Hot Swap = enabled/disabled" setting per disk which you should also enable if it exists.
How about this (seems to work in Ubuntu):
sudo partprobe
Here's why I needed to reboot the computer...
I just hot-swapped my /dev/sdc. I have used scsiadd -r 3 0 0 to power the old disk off before pulling it out. Then after installing the new disk the new disk didn't appear as /dev/sdc but rather as /dev/sdd. After a reboot, the disk would reappear as /dev/sdc again.
So it seems hotswap works Ok, it may be just that the /dev/sd* isn't the same anymore.
Could this be an answer to your problem?
In some cases hot-swap may need to be enabled on the BIOS of either the motherboard and/or the SATA controller. This completely depends on the make and model of both, but if you have on-board SATA controllers that should support hotswap then it's worth combing through the motherboard BIOS. SATA cards may or may not have their own BIOS settings, many lower-end cards don't, but server-grade cards typically do.
If I recall correctly I've needed to this with a number of Gigabyte motherboards, and perhaps some other makes. I needed it for a hot-swap SATA tray to work; with the feature disabled removing the drive didn't cause issues but a new drive wouldn't register until reboot. Enabling the setting worked as-expected, drives that were placed in the tray were immediately spun up and available to the OS.
For hotplug to work you must have the acpiphp module loaded.
obviously if you want this to work on boot, you will have to configure that to be loaded at boot time - one way is to create / edit /etc/rc.modules (which is called by rc.sysinit) and add the line :
remember if you create this file to chmod +x it, as it's called in that manner.
My DVD on my Fedora 16 machine is connected to a SATA interface. It was locked up and would not open or close. Running partprobe as root got my cdrom/DVD working again. I reckon it will help on anther machine where I have the occasional hot swap problem. Thanks!
The Fusion-MPT SAS controller you have is a low end RAID controller. If you're not using it for RAID, it may still be providing an unhelpful layer of obstruction/abstraction.
You may need to poke at the RAID controller with mpt-status or lsiutil to get it to actually scan the bus.
http://hwraid.le-vert.net/wiki/LSIFusionMPT has a nice amount of documentation, but I can't say I've verified it.