We have an 8-port 3ware 9650SE RAID card for our main disk array. We had to bring the server down for a pending power outage, and when we turned the machine back on, the RAID card never started.
This card has been in service for a couple years without problems, and was working up until the shutdown.
Now, when we turn the machine on, the BIOS option ROM that normally kicks in before the bootloader doesn't show up, none of the drives start, and when the OS tries to access the device, it just times out.
The firmware on it has been upgraded in the past, so it's possible we've hit some sort of firmware bug.
We're using it in a Silicon Mechanics R272 machine with Gentoo as the OS. The OS eventually boots, but alas, without the card.
We've ordered a new one, but I'm worried that if we replace the card it won't recognize the existing array. Has anybody performed a card swap before?
Any help would be greatly appreciated.
Edit: These are the kernel errors we see:
3ware 9000 Storage Controller device driver for Linux v2.26.02.012.
3w-9xxx 0000:09:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
3w-9xxx 0000:09:00.0: setting latency timer to 64
3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing.
3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.
3w-9xxx: scsi0: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx 0000:09:00.0: PCI INT A disabled
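A quick way to pull those driver errors out of the kernel log (the sample line below is copied from the output above; on the live system you'd feed `dmesg` or /var/log/messages into the same grep, and `lspci | grep -i 3ware` is the companion check that the card is enumerated at all):

```shell
# Grep the kernel log for 3w-9xxx driver errors. The sample line is taken
# from the log above; on a live system, pipe `dmesg` into the same grep
# instead of this canned text.
sample='3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.'
errors=$(printf '%s\n' "$sample" | grep -c '3w-9xxx.*ERROR')
echo "3w-9xxx error lines found: $errors"
```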
It's quite painless to swap 3ware cards.
Just make sure it's the same or a newer model and that the firmware versions are the same. If the firmware versions differ, the disks won't import on the new controller. (Been there, done that.)
Does the old card show up in lspci at all? I've had problems where the BIOS settings got scrambled and caused the card not to show up at all. I had to re-enable the PCI slot and also enable MSI for the 3ware cards to appear again.

This is Dan, who posted previously; this time I've created an account :)
Anyway, now that my data was pulled, I decided to screw around with the card, and success!!
Downloaded LiveCD version of Ubuntu 10.04.3 LTS
Booted Live and ensured the card was detected ('tail /var/log/messages | grep 3w-')
Installed tw_cli from the following guy's repo: http://jonas.genannt.name
Downloaded the latest firmware (2.08.00.009) from CodeSet 9.3.0.8 for the 9500S-8 from http://www.3ware.com/support/downloadpageprod.asp?pcode=9&path=Escalade9500SSeries&prodname=3ware%209500S%20Series
Used tw_cli to flash the firmware (the stock tw_cli from 3ware doesn't support this). I did not use the force flag, and it flashed despite already being on the same version.
Rebooted when it told me so.
BIOS now comes up as expected!
RMA my !@#. Perhaps I should share this with 3Ware. Big thanks to everyone for listening.
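As a dry-run sketch of the flash step above (the controller ID /c0 and the image filename are placeholder assumptions, and the tw_cli syntax is from memory, so check `tw_cli help update` before running anything for real):

```shell
# Dry run of the firmware reflash: print the commands rather than run them.
# /c0 and prom0006.img are hypothetical; substitute your controller ID and
# the image extracted from the code set you downloaded.
ctl="/c0"
img="prom0006.img"
cmds="tw_cli ${ctl} show firmware
tw_cli ${ctl} update fw=${img}"
printf '%s\n' "$cmds"   # swap printf for actual execution once verified
```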
Some info on using 3ware 9650 RAID cards in modern, common motherboards:
Avoid full-size 9650 cards, as they don't work with newer motherboards: the BIOS fails to kick in after a soft reset. On older motherboards they work fine (tested on Core 2 era boards).
The low-profile 9650SE cards were made later, and they work fine in modern UEFI motherboards.
They are still working (most of them were made around 2007, perhaps?).
I have not seen a failing battery yet after 8-9 years (using them in ideal conditions, with the batteries always checked and charged).
You can swap cards, but use the same firmware version (or a newer one if the same version is not available). When building arrays, use the lower-numbered ports first; that way you can also switch to a 9650 card with fewer ports, as long as the higher ports were not used on the original card.
Avoid the first x16 PCI Express slot on the motherboard; some motherboards expect a video card there, which causes strange behavior.
Installing 3DM2 and the CLI works out of the box on Ubuntu (tested: 14.04 LTS and 16.04 LTS); just run the shell script from the install package.
It's a pity that 3ware is no more; these are great products.
If you still use them, sadly it's time to switch to something new. I'm afraid there is only LSI (now Broadcom) to consider.
After Broadcom bought Avago they made changes to the Avago website, so drivers and downloads for 3ware are harder to find.
You should be good. I haven't done it with that particular card, but I have with many other hardware RAID cards. The only thing I would suggest is to toss the card into another machine, make sure it works, and confirm it's at the same firmware/BIOS level as your old card; downgrade if you have to.
3ware cards are excellent at array compatibility. Do ensure the firmware is no older than the old card's (as far as you can determine), and you probably want to keep within the same series if possible.
Keep those two things in mind and it just works.
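A quick way to sanity-check the "no older than" rule before swapping (the version strings here are made-up examples, not real 9650SE releases; `sort -V` does the component-wise version comparison):

```shell
# Compare two firmware version strings component-wise with sort -V.
# Both version strings are hypothetical examples.
old_fw="3.08.00.029"   # firmware on the card being replaced
new_fw="4.10.00.027"   # firmware on the replacement card
lowest=$(printf '%s\n%s\n' "$old_fw" "$new_fw" | sort -V | head -n1)
if [ "$lowest" = "$old_fw" ]; then
    echo "replacement firmware is the same or newer: OK to import"
else
    echo "replacement firmware is older: upgrade it first"
fi
```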
I happened to do some repetitive booting on a machine that had a 9500S-8, and it appears to have suffered the same fate. I came across an article from 3ware for the 9650 on how to fix it, and couldn't believe their solution: the only option is to RMA the card.
Anyway, I haven't been successful in applying any of said magic to revive the BIOS. Thankfully, after a couple of reboots in a different machine, the card is detected properly after booting (the BIOS still doesn't come up); it detected the RAID array, and I'm able to mount it and pull my data.
Both Ubuntu and Fedora show all the card info except one item: "BIOS string not found". I'm going to pull my data before I start screwing with firmware updates. In the meantime, antiduh, if you're still around and reading this, do you have any additional info about the Red Hat version, drivers, or other procedure I could try? I'm not convinced a firmware update will solve this.
I have swapped an 8-port card for a 12-port card (edit: thinking about it, it was a 9500, not a 9650) and the other card detected the array, so based on my experience I would fully expect it to work.
We managed to bring the card back to life, magically. We took the card out of the machine and stuck it in a completely different machine running some Red Hat variant with very new drivers. The story goes that the first time it booted, the RAID BIOS did not kick in during boot (like we'd been seeing), but the kernel reported a lot of different errors. Eventually it was able to bring the card up, and on the next reboot the RAID BIOS started working again and it booted cleanly. We put it back in our machine and everything came back to life.
To me, this sounds like a problem with microcode. I've seen drivers for things like sound cards, soft RAID, and video cards download some sort of microcode to the card when turning it on. If things went bad the last time that happened, or if it got corrupted by the power blip when the UPSes kicked in as we lost power (walls down the hall turned into a waterfall), that would certainly explain what happened.
Figured I'd post an update for all future googlers.
Edit 3-Jan-2012: @rakslice made the point that these cards often have battery backups attached. We hadn't tried removing the battery (didn't think of it), but it's a great idea; anybody else having this problem may want to try it. We're still not sure whether we fixed it because the Fedora kernel did some magic handshake to recover the card, or because we happened to leave it unpowered long enough for something to reset.
I've got a stable of 3ware 9650SE cards, and swapping is easy; I tested that before deploying, since I have 4- and 8-port cards. However, my experience with 3ware recently soured badly. It started with a hang on the backup box with 5 x 1.5TB drives. The controller was unstable when heavily loaded (just untarring a large tgz file) and would crash within a day of burn-in testing. A spare controller worked fine. Then a second controller failed, and I've sent the past 4 replacements back. They all fail within 48 hours of burn-in testing, on either the provided firmware or the latest. A RAID 5 array of 5 to 7 drives will at times crash the system so badly that the card is not detected unless the system is powered down. A RAID 5 array of 4 drives will also fail, but it takes a few days instead of hours.

The QA people will not talk to me because I don't use their approved motherboards, but I've got 3 different motherboards (all Asus; 2 AMD, one Intel) that I use for testing, and a failing card fails on all of them. The failures are basically a flurry of parity errors. Typically you see messages about the card being unresponsive and being reset, and then it just hangs outright and corrupts the data being manipulated.

Right now I can't trust the cards. Only a burn-in test lasting a few days reveals whether a card will be stable under load. Sending them in for warranty replacement seems to be a method of swapping one flaky card for a different flaky card!
I've had excellent results with the 3ware 9650SE. I have owned several of them: a few 2-port cards, a pair of 4-port cards, and one 12-port that I got used for a great price. I usually plug them into the PCIe slot meant for a video card, and they just work.
I have found one BIOS setting that causes them to crash, though: the PCI Latency Timer. I use a lot of AMD mainboards, and those that have this BIOS option default to 64. Unless I set it to 32, nothing is stable.
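The same value can be inspected or set at runtime with setpci; here is a dry-run sketch. The slot address is borrowed from the kernel log in the question and will differ on your system, and note that the log there shows the 3w-9xxx driver itself setting the latency timer to 64 at probe time, so the BIOS option is still the reliable place to change it:

```shell
# Dry run: print the setpci commands instead of executing them. setpci takes
# hex values, so 32 decimal is written as 20. The slot address is an example.
slot="09:00.0"
read_cmd="setpci -s $slot latency_timer"
write_cmd="setpci -s $slot latency_timer=20"
echo "$read_cmd"     # reads the current latency timer (prints hex)
echo "$write_cmd"    # sets it to 0x20 = 32 decimal
```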
Anyway, I'm about to upgrade one array to 5 x 2TB drives and I'll have to swap controllers, so your answers have given me hope.
Is the information about the array written to the drives? Is that how a different controller can import the array? (I need to see how that's done.)