After an IT person came to my client's office to upgrade the hard drives on a server (Windows Server 2003 R2) and failed completely, the server is now basically unusable. The machine boots, but after about an hour of uptime something happens (I'm not sure what) and the second CPU's graph goes completely red in Task Manager. Kernel-mode time and CPU usage both sit at 100%.
The System event log fills up with this error, four or five times a minute:
The driver for device \Device\Scsi\viamraid1 detected a port timeout due to prolonged inactivity. All associated busses were reset in an effort to clear the condition.
Today I even left it running for 9 hours after I went home, and more than 2,000 of those messages were logged in that time.
The server is unusable, and my client is completely unable to do business. I'm not an IT guy (I'm a programmer), it's Thanksgiving, and I am completely out of my element.
Anyone have any ideas about that message? Ever seen it before? Ever solved it?
More info: the server has two SCSI drives in a RAID 0 array (I think; it could be RAID 1). The previous IT guy messed things up so badly that he supposedly took the drives out of the RAID array. Now, at startup, the machine has to boot from drive 0 just far enough to read boot.ini, and then we have to choose to boot from drive 1. For some reason we can't boot directly from drive 1.
At first I thought the problem was related to SQL Server, because we had started an intensive query right before the server flipped out. But even after I stopped all SQL Server services and rebooted, it still flipped out on its own after an hour. NOTHING was happening on the server: no one was in the office, and no processes had been started (that I know of). It just flipped out.
That's a VIA SATA chipset, which makes my skin crawl in a "server". It has known compatibility issues with a variety of hardware, everything from power supplies that produce voltages outside VIA's spec to Seagate HDs with particular firmware. If you get to choose between the two drives at boot, then you're not using hardware RAID. You may be using software RAID, but I can't tell without more info.
An off-the-wall guess: the chipset isn't compatible with the new hard drives, and when the drives run a partial self-test, the chipset freaks out and throws a timeout error. An hour of uptime seems like about the right time frame for that.
I'd start by looking for the newest firmware for the motherboard, chipset, and hard drives, then the newest drivers for the chipset and SATA controller.