I have a 1998-era NetWare 3.12 server that runs everything on our campus: general ledger, purchasing, payroll, student information, grades, you name it. The server has an Adaptec RAID controller with two volumes:
- RAID 1: two 17GB SCSI disks (Seagate ST318417W)
- RAID 5: three 4GB SCSI disks (two Seagate ST34573W and one ST34572W)
We are currently in the early stages of a project to replace this system, but you don't just jump into a new system like that overnight, so I need to keep this server running until at least November 2011.
This week we had not one but two hard drives fail. Thankfully they were in different volumes and we're able to keep running for the moment, but given how close together these failures came, I have serious doubts that I can avoid a catastrophic failure before the November target unless I restore the RAID redundancy -- it'll only take one more drive failure anywhere and I'm completely hosed.
We are fortunate enough to have exact-match "spares" lying around for both drives, but the spares are in unknown condition. I tried simply swapping them in, but the RAID controller isn't smart enough to handle that, and it renders the system unbootable.
As for the RAID controller itself, there is a utility I can get into during POST via a Ctrl-A shortcut, but I can't do much useful from there. To actually manage volumes I must first boot into NetWare, at which point I can use CI/O Array Management Software Version 2.0 to look at volume information. I suspect the normal way to manage things is to boot from a special floppy with the controller software on it, but that floppy is long gone.
Going through the options in the RAID software, I think the only supported way to replace a disk in an existing RAID volume is to physically add the disk, boot up, configure it as a "spare" for a volume, force the volume to use the spare to replace an existing down disk (and at this point I'm only guessing) so that the down disk becomes the spare, repair the volume, remove the spare from the volume, and then shut down and remove the disk. Then start all over for the other failed disk. All of this amounts to a lot of downtime, assuming I can even make it work and that my spares are any good.
As for finding reliable spares, I have no clue where to even begin looking for a new 4GB SCSI drive, or even which exact SCSI variant I'm looking for, since the interface has gone through a few different iterations over time.
Another option is to migrate this to a virtual machine (Hyper-V), but all previous attempts we've made in this area have failed to get very far. When this machine was installed I was just graduating from high school, so it requires lower-level knowledge of NetWare and DOS than I ever developed, or, if I did, have since forgotten (and I'm not exactly a DOS neophyte, either).
Part of my problem is that this is a high-use server, and taking it down for a few days to figure things out isn't going to fly very well.
As for the question, I'm looking for anything that might be helpful in this situation: a recommendation on a place to find good spares from this era, personal experience repairing RAID volumes with a similar controller or building a Hyper-V VM from an old NetWare server, a line on a floppy with better software for the RAID controller, a recommendation for a good Novell consultant in Nebraska who would be able to put things right, a whole other option I haven't considered yet, etc.
Update:
For backups, we have good (recently verified via restore) backups of the data only -- nothing for the software that actually runs things.
Update 2:
Just a progress report: I currently have a working NetWare 3.12 install in VMware Server 2.0, thanks largely to the guide I found here:
http://cerbulescubogdan.blogspot.com/2010/11/novell-netware-312-on-vmware.html
The next steps are preparing empty NetWare volumes to match the volumes on my existing server, taking a dump of everything on the C:\ drive and the NetWare volumes on the existing server, figuring out from that information which modules need to be added to NetWare, installing my licenses (we do still have that disk, if it's any good), and moving the data over.
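For the "figuring out which modules" step, the NetWare console itself can report most of what's needed. A minimal sketch of the console commands involved, assuming CONLOG.NLM is present on this install (it logs console output to SYS:ETC\CONSOLE.LOG so the results can be copied off and compared against the VM):

```
# At the NetWare server console (console commands, not a script).
LOAD CONLOG   # start capturing console output to SYS:ETC\CONSOLE.LOG
CONFIG        # board, driver, and network configuration
MODULES       # every NLM currently loaded -- the list to reproduce in the VM
VOLUMES       # mounted volumes and their namespaces
UNLOAD CONLOG # stop capturing
```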
I have approval to bring the server down for a week after the first of the year (sadly not before), so, aside from creating empty volumes, the rest of the work will have to wait until then.
Final Update (Jan 5, 2011):
I was able to get spares working in both RAID arrays without data loss this week. Both are now listed by the controller as "FAULT TOLLERANT" (yay!). I was also able to build on the progress from my last update and now have a functional "spare" server in VMware Server 2.0. The spare can run and use our ERP software, but I can't put it into production because I can't (yet) print from that box (and I have no idea why). Even so, this VM will do in a pinch if I have no other choice, and between it and the repaired RAID arrays I'm comfortable living with the situation until I can junk the machine in November.
Epilogue (Jan 16, 2012):
The project to replace this server with a whole new system went live as planned. Hurray for no more NetWare! All hail SQL Server! The King is dead. Long live the King!
We still plan to keep the old server running for a while longer, until after our post-fiscal year audit completes in August. But if a failure happens between now and then, no one would complain too much.
Get (and continue to get, daily or more frequently) good backups of the shared file data now. If you lose the machine, you probably aren't going to be able to find the necessary diskettes (yep) to restore it. Get a copy of the DOS partition that NetWare boots from, too, if possible.
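If a brief outage is tolerable, one way to grab that copy is to boot a Linux live CD and image the boot partition. A minimal sketch, assuming the DOS partition shows up as /dev/sda1 and a destination is mounted at /mnt/backup (both names are assumptions -- verify with fdisk before copying anything):

```
# Confirm which partition is the DOS boot partition first.
fdisk -l /dev/sda

# Image the (small) DOS partition to a file on the mounted destination.
dd if=/dev/sda1 of=/mnt/backup/netware-dos-boot.img bs=64k
```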
That sounds like an Adaptec AAA-131 RAID card (or something of that era). If I'm right, you're not going to find much better management software, because none exists (see http://www.adaptec.com/en-us/downloads/novell_netware/novell_netware/productid=aaa-131&dn=aaa-131.html for the last available versions). I used a lot of those cards "back in the day" and they worked okay.
If it is an AAA-131, be exceedingly careful when playing with its configuration. There is no way to configure a RAID set without wiping the disks on those cards. That means, for example, that if you take the box down, attach some test disks, and, say, clear the configuration and make a RAID set on them, then when you plug the "production" disks back in there will be no way to use them without the card formatting them first. Yeah. It's that bad.
Novell NetWare will run in the VMware hypervisors. I'd recommend contracting with somebody who has decent Novell NetWare experience (there are people on here -- I'm looking at you, Sysadmin1138 -- who have it) to help you get the contents of the server moved into a virtual environment where, at least, you can keep it going.
If your client computers are modern and have a Microsoft networking client installed, you might find that migrating to a Windows Server-based machine would actually be quick and easy. Bring the Windows Server machine up with the same name as the NetWare server, expose a shared directory structure with the same UNC naming convention as the NetWare machine, copy all the files over, and duplicate the permissions on the destination machine (by hand, since NetWare trustee rights don't map directly onto NTFS ACLs). It might not be all that difficult to do, and you could "stage" the migration in a test lab beforehand and test some clients with it to decide what needs to be changed from a script / user-environment perspective.
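A minimal sketch of the copy-and-re-permission step as a small batch file on the new Windows server, assuming the old NetWare volume is already reachable as a mapped drive (N: here) through the Novell client; the drive letter, paths, and group name are all hypothetical, and the icacls grants have to be rebuilt by hand to match the old trustee assignments:

```
:: Mirror the files from the mapped NetWare volume to the new data volume.
:: /MIR mirrors the whole tree; /R:1 /W:1 keeps retries short on flaky media.
robocopy N:\ D:\DATA /MIR /R:1 /W:1 /LOG:C:\migrate.log

:: Re-create permissions by hand, one grant per group and directory.
:: (OI)(CI) inherits the grant to files and subfolders; M = modify rights.
icacls D:\DATA\ACCOUNTING /grant "CAMPUS\Accounting:(OI)(CI)M"
```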
You can probably get some spare hardware from eBay. Anything you buy of that vintage, though, is going to have reliability problems, too.
If I were you, I'd be getting somebody good with Windows Server in there to help you stage a migration away from that box NOW. The case can probably be made to management to spend some money, given that you could lose the entire contents of the NetWare box at virtually any time. The replacement box wouldn't need massive horsepower (given what you're replacing), so software licensing and backup would be your biggest costs. Client-related migration issues could be minimized by using a consultant who is good with scripting and can plan for the details of changing client-related settings through logon and startup scripts.
I know, because I've done it (hi, Evan), that VMware does have decent NetWare support, even for the really old stuff (what you're running). NetWare of that vintage NO-OPs the CPU when idle instead of HALTing it, so whatever CPU it is given in a VM will be pegged. This is what the VMware Tools are for: they make it not do that. VMware has been around since the '90s (and has even had a booth at BrainShare for several years) and has had to deal with this, which is why they have support. Microsoft's virtualization is new enough that they've never had to virtualize NetWare, so it doesn't work there.
If this server is as critical as you say, springing for some VMware licenses should be an easy sell. At minimum, spring for a VMware Workstation license, which will at least get this server into a virtual environment. VMware Server is free (I believe) if you really have to go that route. Once that job's done, you can consider moving it to something like ESXi until it can be more formally replaced.
There are other options, depending on your Linux skills. Novell has spent quite some time getting Xen (not KVM; Xen, though both use QEMU) to support NetWare. It will probably work with NW 3.12, though you'll need to be sure you use full virtualization mode, not paravirtualization.
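For reference, a minimal sketch of a Xen guest definition of that era (the xm toolstack) forcing full virtualization; the guest name, memory size, image path, and bridge name are all hypothetical:

```
# /etc/xen/nw312 -- hypothetical guest config for the xm toolstack
builder = "hvm"   # full virtualization, not paravirtualization
name    = "nw312"
memory  = 512
disk    = [ "file:/var/lib/xen/images/nw312.img,hda,w" ]
vif     = [ "bridge=br0" ]
boot    = "c"     # boot from the first hard disk
```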
That server is new enough that it should have a CD-ROM drive in it, which will probably be your saving grace. Once you get your backup done, boot it to an ISO Linux of your choice. It won't be able to get at the data, but it should see the hard drives. At that point, do a complete `dd` copy of both volumes to somewhere else on your network. Those drive images can be used directly by qemu as virtual drives. There are ways to convert dd-generated images into VMware VMDKs, but I haven't used them myself. Google them; they're out there.
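A minimal sketch of that copy, assuming the live CD sees the two RAID volumes as /dev/sda and /dev/sdb and that another machine on the network can receive the images over SSH (device names, user, and host are hypothetical -- confirm with fdisk -l before copying):

```
# On the old server, booted from the live CD: stream each volume's raw
# image to a machine with enough free space to hold it.
dd if=/dev/sda bs=64k | ssh admin@backuphost "cat > /images/raid1.img"
dd if=/dev/sdb bs=64k | ssh admin@backuphost "cat > /images/raid5.img"

# Later, on the receiving machine: qemu can boot the raw images as-is,
# or qemu-img can convert them to VMDK for VMware.
qemu-img convert -f raw -O vmdk /images/raid1.img /images/raid1.vmdk
```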
This isn't really helpful in terms of your question (quite frankly you already HAVE spares, and the only useful suggestion I have for sourcing vintage disks would be "Feed the drive model numbers to Google Shopping"), but before you touch anything else you should really MAKE DAMN SURE YOU HAVE A GOOD BACKUP AND CAN SUCCESSFULLY RESTORE IT TO A NEW MACHINE IN A USABLE STATE.
If this machine is as critical as it sounds from your description that should be your zeroth priority right now. If you haven't done a successful restore test on your backups in a while you should assume they're worthless, and you need to ensure that you can actually recover should this machine wheeze its last and die on you.
If another disk drops dead on you and you have no usable backups that's pretty much the ballgame. You'll be moving to your new system immediately, whether you're ready or not.
Just my $3.50.
Others have already addressed backups, etc., so I won't repeat any of that. There are a couple of things you can do to improve your chances of the system continuing to function.
Start by investing in a really good-quality line filter and placing it between the UPS and the server. Those old drives will by now be rather touchy about surges, spikes, and even fairly small supply fluctuations.
I see from your update that you have already installed the spare drives, but this is what I would have recommended: before trying the spare drives in the server, put them in another machine and stress the crap out of them with burn-in software or, if you can't get hold of that, continuous test cycles using regular drive-test software. Keep that up for at least a few days before declaring the drives trustworthy. Old drives that have been in storage are notoriously unreliable and can fail at the drop of a hat.
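For anyone facing the same choice, a minimal burn-in sketch using stock Linux tools, assuming the spare shows up as /dev/sdb on a scratch machine (that device name is an assumption, and badblocks -w is destructive, so only point it at a drive with nothing to lose):

```
# Destructive write-mode test: writes and verifies four bit patterns
# across the whole drive. Re-run it for a few days' worth of passes.
badblocks -wsv /dev/sdb

# If the drive exposes SMART-style self-tests, run a long self-test too,
# then review the output for reallocated sectors or logged errors.
smartctl -t long /dev/sdb
smartctl -a /dev/sdb
```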
Excellent suggestions above. Try this also: on spare modern hardware, try doing a recovery of the whole system from your last full backup. Make sure the spare machine isn't on the network.
What's that, I fear you might say? You don't have backups and/or a restore procedure? Well, now you know what you're working on for the next week.
Answering only to doubly/triply/quadruply recommend making a backup every day until you figure out a solution. If you can't easily replace the dead drives, your only option is to migrate to new drives, whether that means building a new server or slowly moving your existing server onto new disks.
We had two out of three hard drives fail in a single night in a seven-year-old RAID 5 array. Our backups were grossly out of date. Eight days and $17,000 later, a data recovery firm was able to recover our entire Exchange server, but no one was pleased. (Except me, because I was supposed to be making backups every day -- onto hardware that I had requested but no one would buy for me, though that fact was lost on everyone else...)
The one good thing to come from this was that the client immediately approved my six-month-old purchase request for replacement hardware. But, holy crap, it was an extremely stressful eight days. Do yourself a favor: make a backup now, and start working on a contingency "get up and running on whatever hardware you can find in your office" plan now.