I have servers which are still running well but are over 5 years old. They are still doing the job perfectly and there would be no advantage to upgrading them. Should I just let them run forever, or should I schedule maintenance to replace the servers, or parts thereof, with new hardware? I fear that a server failure might cause data loss and more downtime than scheduled maintenance would. These servers are used for on-line point-of-sale, accounting, CRM and management information.
Preventive maintenance, such as replacing fans and vacuuming out dust, is not possible due to the remote location of the servers.
Also keep in mind the "bathtub curve" of failure rate with time. New hardware is more likely to fail than hardware that has been burned in for a while.
How do you tell a client who is very happy with a long-time trouble-free server that he now has to spend money to replace it because it is too old?
Finally, are there any monitoring tools for hardware problems such as voltage, temperature and fan speed that can be run remotely?
Here's a previous question and answers:
Do you continue to use your end-of-life server/network equipment?
And another one:
How often does your company replace all its servers?
At 5 years, for what sounds like mission-critical functions, I'd start looking at replacement even if they're working fine. But since they are working fine, I'd plan out a slow, careful replacement. Make sure you know how to build the OS and apps on the replacement box, how you're going to move the data over, and how you'll switch over from the old machine to the new one.
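For the data-move step, here's a minimal sketch of one way to verify the copy, assuming plain files in a directory (the /srv/data path is just a placeholder): checksum the tree on the old box, run the same script on the new box after the copy, and diff the two outputs. For databases you'd rely on the engine's own dump/restore and verification instead.

```python
#!/usr/bin/env python3
"""Minimal sketch: checksum every file under a data directory so the copy
on the new server can be compared against the old one before cutover.
The default path is a placeholder."""
import hashlib
import os
import sys

def checksum_tree(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            # Print paths relative to the root so output from the old and
            # new servers lines up when diffed
            print(h.hexdigest(), os.path.relpath(path, root))

if __name__ == "__main__":
    checksum_tree(sys.argv[1] if len(sys.argv) > 1 else "/srv/data")
```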
As was stated in one of the answers linked above, I'd tell the client honestly why you need to replace the hardware. The increasing cost of maintenance and support contracts, difficulty in getting replacement parts, and the application vendor's preference for supporting newer hardware are all possible factors; you'd have to make the case based on the level of support your hardware and software vendors offer.
Probably - but with caution and attention to detail.
Things to keep in mind:
Explain it to the client in terms he or she is likely to understand. Explain that servers are designed for a 4-5 year lifespan on average. While some will run longer than that (we've kept a server limping along for 7 years before... not proud of it, but that was in the days before virtualization), as you approach and exceed that age, the server becomes more prone to breaking down.
Put it in terms of a car. After a certain point parts of the car break down or wear out, like the brakes, and need replacing. However, unlike a car, you can't just run down to the local repair shop and get the server fixed. The vendor end-of-lifes the replacement parts, meaning they simply aren't available except from someone who has hoarded them and knows you now must pay a premium for them. And while you're searching for those parts and haggling over the purchase, the server stays down.
Also, most folks look at replacing their car as soon as the loan is paid off. Given that it is easier to repair and maintain a car than it is these servers, especially given their remote location, point out that the customer is taking a risk with their line of business that they wouldn't take in their personal life.
Personally, I'm happy to run old hardware, but only when the risks have been properly considered. As an example, I have one rather old IBM server which is way out of warranty and for which I can no longer obtain parts. However, the software that runs on it can be transferred to another machine in a matter of minutes. Should the machine fail, I can replace it temporarily with a spare PC while I decide on the best long-term solution. All the steps required to do this are well documented, so even if I'm unavailable the task can be completed by someone else.
If the servers are adequate, then let vendor support be the deciding factor. If the vendor won't support the system, let your clients decide based on their preference.
If you're the vendor, well, then at some point you'll probably need to phase them out.
If you decide to run servers into the ground then sooner or later they will run into the ground. It's best to replace them before that point, in other words while they still seem to be running OK.
5 years is a pretty good innings for a server, and you seem to be at a crossover point: you think they may still be OK for a while longer, but you have enough concerns to warrant a replacement.
The worst-case scenario is that a server collapses in the middle of a working day. From the sound of things, I don't think you'd be in a position to do an emergency migration and restore in a reasonable timeframe if that happens. Your client should weigh up the cost of lost business (including salaries for staff who are sitting around doing nothing) against the cost of replacement, and I think replacement will come out cheaper.
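To make that comparison concrete, here's a back-of-the-envelope sketch. Every figure in it is a made-up placeholder, not real pricing, so plug in the client's own numbers.

```python
# Rough downtime-cost comparison; all values below are illustrative placeholders.
STAFF_IDLE = 12                 # employees who can't work during an outage
HOURLY_WAGE = 20.0              # average loaded hourly cost per employee
LOST_SALES_PER_HOUR = 500.0     # POS revenue lost while the system is down
OUTAGE_HOURS = 16               # plausible time to source parts and rebuild

outage_cost = OUTAGE_HOURS * (STAFF_IDLE * HOURLY_WAGE + LOST_SALES_PER_HOUR)
replacement_cost = 4000.0       # planned new server, installed after hours

print(f"Unplanned outage:    ~${outage_cost:,.0f}")
print(f"Planned replacement: ~${replacement_cost:,.0f}")
```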
If the hardware and software are still well supported and understood, it seems silly to change for no reason. Are the servers located in a clean, climate-controlled environment? If so, they should keep on ticking for a while yet.
What level of redundancy do these old servers provide? Do they have redundant power supplies and RAID-protected storage with a tested, offline backup? In my experience, PSUs and drives are the parts most likely to be affected by age. As long as you are well protected, you shouldn't be looking at any kind of catastrophic downtime.
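If you want to keep an eye on the drives specifically, here's a minimal sketch that asks each disk for its SMART self-assessment using smartctl from the smartmontools package. The device names are placeholders, it needs root, and behind a hardware RAID controller the per-drive syntax differs (smartctl's -d option), so treat it as an illustration rather than a drop-in check.

```python
#!/usr/bin/env python3
"""Minimal sketch: report each disk's SMART overall-health self-assessment.
Device names are placeholders; run as root."""
import subprocess

DISKS = ["/dev/sda", "/dev/sdb"]   # placeholder device names

for disk in DISKS:
    result = subprocess.run(
        ["smartctl", "-H", disk],
        capture_output=True, text=True,
    )
    # smartctl -H prints a line containing PASSED when the drive reports healthy
    status = "PASSED" if "PASSED" in result.stdout else "CHECK MANUALLY"
    print(f"{disk}: {status}")
```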
Avoiding preventative maintenance because the servers are out of your way seems like a bad plan. If you can get to the servers to replace them, or to deal with a catastrophic failure, you should be able to get to them for routine maintenance and inspection.
Just don't let them get so old that nobody knows how they work or where to get parts for them.
We always decide when to replace a machine based on what it does and how critical its failure would be. The bulk of our machines are actually moving into a virtualisation cluster, which gives us an easier way to handle failures.
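For what it's worth, once machines are in such a cluster, getting an inventory of every guest and its state is a short query against the hypervisor. Here's a minimal sketch assuming a libvirt/KVM host and the libvirt-python bindings; the connection URI is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal sketch: list every guest on a libvirt host and whether it is
running. Assumes a KVM/libvirt host and the libvirt-python bindings;
the URI is a placeholder."""
import libvirt

conn = libvirt.open("qemu+ssh://admin@virt-host/system")  # placeholder URI
for dom in conn.listAllDomains():
    state, _reason = dom.state()
    running = "running" if state == libvirt.VIR_DOMAIN_RUNNING else "stopped"
    print(f"{dom.name():20} {running}")
conn.close()
```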
To prevent data loss, run backups. Lots of them. Machines of any age fail, and if you are worried about data loss, you aren't doing enough backups.
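And a backup only helps if it's actually being produced. Here's a minimal freshness-check sketch, with a placeholder path and threshold; it complements, rather than replaces, periodic test restores.

```python
#!/usr/bin/env python3
"""Minimal sketch: warn if the newest file in the backup target is older
than a day. Path and threshold are placeholders."""
import os
import time

BACKUP_DIR = "/mnt/backups/pos"   # placeholder backup destination
MAX_AGE_HOURS = 24

newest = 0.0
for dirpath, _dirs, files in os.walk(BACKUP_DIR):
    for name in files:
        newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))

# If the directory is empty, treat the backup age as infinite
age_hours = (time.time() - newest) / 3600 if newest else float("inf")
if age_hours > MAX_AGE_HOURS:
    print(f"WARNING: newest backup is {age_hours:.1f} hours old")
else:
    print(f"OK: newest backup is {age_hours:.1f} hours old")
```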
In practice, though, I have at least one critical machine running which is at least 5 years old. I don't know exactly how old, as it was already running when the company was bought, before my time. It isn't due for replacement anytime soon either :(
I'd also include the following: newer hardware is much more powerful than older hardware, so you could consolidate several servers into one using virtualization. Virtualization, when done correctly, can make backups easier, decrease costs significantly, and make disaster recovery easier. Newer servers also have better support for remote access/monitoring/repair, such as Intel vPro technology, which lets you get access even if the OS hasn't booted yet.
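On the remote monitoring point (which also speaks to the earlier question about voltage, temperature and fan speed): most server-class boards already expose those sensors through a BMC such as HP iLO, Dell DRAC or plain IPMI. Here's a minimal sketch that polls a BMC with ipmitool and flags anything the BMC itself marks as unhealthy; the host, account and password file are placeholders, and it assumes ipmitool is installed wherever the script runs.

```python
#!/usr/bin/env python3
"""Minimal sketch: poll IPMI sensor readings (temperature, voltage, fan
speed) on a remote BMC and flag anything the BMC does not report as 'ok'.
Host, user and password file below are placeholders."""
import subprocess

BMC_HOST = "10.0.0.50"            # placeholder BMC address
BMC_USER = "monitor"              # placeholder read-only BMC account
BMC_PASS_FILE = "/etc/ipmi-pass"  # password file, kept off the command line

def read_sensors():
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
         "-U", BMC_USER, "-f", BMC_PASS_FILE, "sensor"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 4:
            name, reading, unit, status = fields[:4]
            # Flag anything the BMC itself does not consider healthy
            if status not in ("ok", "ns", "na"):
                print(f"ALERT {name}: {reading} {unit} (status={status})")

if __name__ == "__main__":
    read_sensors()
```

Run it from cron on a machine that can reach the BMC network and you have a basic alert, without needing to touch the production OS at all.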
There are also familiarization/standardization/sanitation/upgrade issues. After 5 years, really, who is still familiar with the hardware and the setup? Are the setup and all the things like login scripts, security settings... up to your current standards? Have you done sanitation tasks like removing old user accounts and cleaning up old workarounds? Are the components still available? I have a number of servers at client sites I cringe about, because I know that if they fail, the motherboards, PSUs and other parts are definitely not available.
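As one example of that sanitation pass, here's a minimal sketch that lists local accounts whose recorded last login is more than a year old, as candidates for review. It shells out to the lastlog command from the shadow suite; the 365-day threshold is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal sketch: list accounts whose recorded last login is older than
the threshold, using `lastlog -b DAYS`. Threshold is a placeholder."""
import subprocess

THRESHOLD_DAYS = "365"  # placeholder review threshold

out = subprocess.run(
    ["lastlog", "-b", THRESHOLD_DAYS],
    capture_output=True, text=True,
).stdout

for line in out.splitlines()[1:]:   # first line is the column header
    if line.strip():
        print("review account:", line.split()[0])
```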