Ever since I started at my current job, I've been in an endless struggle with my boss and coworkers over updating systems.
I totally agree, of course, that no update (be it firmware, O.S. or application) should be applied carelessly as soon as it comes out; but I also firmly believe the vendor must have had at least some reason to release it, and the most common reason is fixing some bug... one you may not be experiencing now, but could well be experiencing soon if you don't keep up with updates.
This is especially true for security fixes; as an example, had everyone simply applied a patch that had already been available for months, the infamous SQL Slammer worm would have been harmless.
I'm all for testing and evaluating updates before deploying them; but I strongly disagree with the "if it's not broken, don't touch it" approach to systems management, and it genuinely hurts me when I find production Windows 2003 SP1 or ESX 3.5 Update 2 systems and the only answer I can get is "it's working, we don't want to break it".
What do you think about this?
What is your policy?
And what is your company policy, if it doesn't match your own?
What about firmware updates (BIOS, storage, etc.)?
What about main O.S. updates (service packs)?
What about minor O.S. updates?
What about application updates?
My main interest is of course in updating servers, as client patch management is usually more straightforward, and there are well-known tools and best practices for handling it.
Security and agility have to be balanced against stability and uptime when determining your patching strategy. Your push-back should be along the lines of 'Okay, but you need to know that we're now at risk of these servers being compromised and our data stolen, or of the servers being rendered non-functional', and 'Okay, but you need to know that this impacts our vendor support for this system, and its future ability to interact with new systems'.
Against the longer-term 'not broke, don't update' mentality, you should make it clear that the further a system falls behind, the bigger, riskier and more expensive the eventual catch-up becomes, and that vendor support erodes in the meantime.
Hope this gives you some leverage, and best of luck in convincing the powers that be to take things seriously. As always, establish a paper trail that proves you've apprised management of the risks they're taking.
This is an endless debate and reasonable people will disagree. If you're talking about user PCs, I agree they need to be updated. If you're talking about servers, consider a separate policy for servers that face the internet and those that don't. I don't know about your servers but in my environment, maybe 10% of our servers have ports open to the internet. These internet-facing servers get highest priority when it comes to security patches. Servers that don't face the internet are lower priority.
Security gurus will argue that this approach is problematic, because if a hacker ever does get into your network, the unpatched servers will let exploits spread through the network like wildfire, and that's a reasonable argument. Still, if you keep those internet-facing servers locked down tight and properly configure your firewall to open only those ports that are absolutely needed, I think this approach works, and it can often be used to appease managers who are afraid of patches.
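To make that concrete, here's a minimal sketch of how such a tiered policy could be encoded; the inventory, field names and SLA day counts are all invented for illustration:

    # Hypothetical sketch: assign patch deadlines based on network exposure.
    # The inventory, field names and SLA day counts are illustrative, not real.
    SLA_DAYS = {
        "internet_facing": 7,    # highest priority: patch within a week
        "internal": 30,          # lower priority: patch in the normal cycle
    }

    servers = [
        {"name": "web01", "internet_facing": True},
        {"name": "db01",  "internet_facing": False},
    ]

    for server in servers:
        tier = "internet_facing" if server["internet_facing"] else "internal"
        print(f'{server["name"]}: security patches due within {SLA_DAYS[tier]} days')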
If you're only relying on Windows Update for patches (you didn't mention which OS you're running, but I'm mostly a Windows guy, so this is my reference), take a look at the actual hotfixes that get released every month. I have servers where Windows Update tells me I need 50+ patches, but if I scroll through those patches and research each one, I find that 90% of the items being patched are not security related and only fix bugs in services I don't run on that box. In larger environments where you use a patch management system, it's common to review everything that gets released and only bother with what is absolutely necessary, which usually amounts to about 10% of what Microsoft releases.
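If you can export the month's patch list with its classification metadata, that triage can even be scripted. A rough sketch, with an entirely made-up data format (real patch-management systems expose similar fields in their own schemas):

    # Hypothetical sketch: keep only security-classified patches that touch
    # components actually present on the box. The records below are invented.
    patches = [
        {"kb": "KB111111", "classification": "Security Updates", "component": "IIS"},
        {"kb": "KB222222", "classification": "Updates",          "component": "IIS"},
        {"kb": "KB333333", "classification": "Security Updates", "component": "Hyper-V"},
    ]

    installed_components = {"IIS"}  # services this particular server runs

    needed = [
        p for p in patches
        if p["classification"] == "Security Updates"
        and p["component"] in installed_components
    ]

    for p in needed:
        print(p["kb"])  # prints KB111111 only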
My argument is that this debate about "to patch or not to patch" suggests you have to be on one side or the other when, really, this is a huge grey area.
I can only talk about servers, but we have a 'quarterly update' regime: on four predetermined and announced dates per year we bunch up update requests, apply them to our reference environment, run them for a month to test stability and, if all is well, roll out over the following n days/weeks. On top of this we have an emergency update policy, under which we can deploy into reference, test and roll out urgent updates within a day or two if the severity warrants it - although this has only been used two or three times in the last four years or so.
This two-track approach ensures that our servers are reasonably, but not stupidly, up to date, and that updates are driven by subject-matter experts (i.e. firmware, driver, OS and application staff) rather than by vendors, while still allowing rapid fixes if required. Of course, we're lucky to have very few different hardware models across the whole business (<10 server variants) and sizable, up-to-date reference platforms to test against.
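For what it's worth, the two-track cadence boils down to a very simple scheduling rule. A hypothetical sketch (the window dates and the urgent-update turnaround are placeholders for our real calendar):

    # Hypothetical sketch of the two-track cadence: pick the next of four
    # predetermined quarterly dates, unless the update is flagged as urgent.
    from datetime import date

    QUARTERLY_WINDOWS = [date(2024, m, 15) for m in (1, 4, 7, 10)]  # example dates

    def deployment_date(today: date, urgent: bool) -> date:
        if urgent:
            return today  # emergency lane: reference-test and roll out within days
        for window in QUARTERLY_WINDOWS:
            if window >= today:
                return window
        # past the last window this year: wrap to the first window of next year
        return QUARTERLY_WINDOWS[0].replace(year=today.year + 1)

    print(deployment_date(date(2024, 5, 2), urgent=False))  # 2024-07-15
    print(deployment_date(date(2024, 5, 2), urgent=True))   # 2024-05-02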
My view is that the best course is pretty much smack in the middle of your two extremes. For example: why do you so desperately want to upgrade ESX if there's no demonstrable reason for doing so, possibly breaking working systems in the process? Sure, it might be vulnerable if it were public-facing, but there should be no way it can be directly accessed from outside your network, so where's the risk? Are there any bugs, or missing features, that actually present you with a reason to upgrade?
Upgrading for the sake of it, which is really what you're proposing ("but you could be experiencing soon"), even while claiming you're not, is an absurd and dangerous road to travel. Unless you can present an actual reason, as opposed to some theoretically possible reason, you're never going to convince others if they're opposed to the upgrade.
If you believe there is a real reason to perform an upgrade you should document both the pros and cons, and there are always cons, and present that to those higher up the tree. Properly documented there should be little resistance. If you can't provide a convincing argument then sit back and give that fact some serious thought.
Edit
I thought I should make it clear that I see a vast difference between applying required security and stability patches and performing software or OS upgrades. The former I implement after proper testing; the latter I do only if there is a real benefit.
I've worked at different firms that had policies all over the continuum from "apply patches ASAP, we don't care if they break something we have working -- we'll back them out then" to "nothing gets applied without two weeks of testing." Both extremes (and points in between) are fine as long as the Company understands the tradeoffs.
That's the important point: there is no single right or wrong answer to this question; it's a matter of balancing stability against safety or features in your particular environment. If your management chain understands that delaying patches for testing may leave them more vulnerable to malware, that's fine. Likewise, if they understand that applying patches as soon as they are available may not work, or may even break your particular system configuration, that is also fine. Problems arise when these tradeoffs are not understood.
Security updates get sent to a staging server, then production after they've shown that they don't blow things up. Unless there's a real bleeping emergency (which I have hit a few times :( ), in which case PRODUCTION NOW. Other updates only as required, after spending time in staging.
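In policy terms that's just a bake-time rule. A small sketch; the 14-day soak period is an assumption you'd tune to your own environment:

    # Hypothetical sketch: a patch is promoted to production only after it has
    # soaked in staging without incident, unless it's a genuine emergency.
    from datetime import date, timedelta

    BAKE_DAYS = 14  # assumed soak time in staging; tune to taste

    def ready_for_production(staged_on: date, today: date, emergency: bool) -> bool:
        if emergency:
            return True  # real bleeping emergency: PRODUCTION NOW
        return today - staged_on >= timedelta(days=BAKE_DAYS)

    print(ready_for_production(date(2024, 3, 1), date(2024, 3, 10), False))  # False
    print(ready_for_production(date(2024, 3, 1), date(2024, 3, 20), False))  # True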
I think the first thing to do is to "classify" updates by severity, and have a patch schedule based on that classification. There is no doubt zero-day security fixes must be applied right away, whereas a service pack can wait for careful evaluation.
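That classification can be as simple as a table mapping severity to a patching deadline. A sketch with illustrative category names and day counts (adjust both to your own risk appetite):

    # Hypothetical sketch of severity-driven scheduling: zero-day fixes go out
    # immediately, a service pack waits for a full evaluation cycle.
    PATCH_SCHEDULE_DAYS = {
        "zero-day":     0,    # apply right away
        "critical":     7,
        "important":    30,
        "service-pack": 90,   # after careful evaluation
    }

    def days_until_deadline(severity: str) -> int:
        return PATCH_SCHEDULE_DAYS[severity]

    print(days_until_deadline("zero-day"))      # 0
    print(days_until_deadline("service-pack"))  # 90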