I am running KVM with some Ubuntu VMs as guest machines. The guest machines contain an application that does not need to run most of the time, but once every few months, unexpected, random triggers require it to run immediately (<5 second delay) for just a few hours.
If I keep the VM always running, I waste a lot of CPU resources, because the VM is inactive 99.99% of the year.
If I hibernate the VM state to disk, starting the application would require booting the VM up, which takes too long on my machine (minutes).
I'd like to pause/suspend the VMs into memory, because resuming the VM seems instantaneous. And while the VM is inactive, I can re-use the CPU resources elsewhere (although I understand that I cannot re-use the memory).
Is it recommended to pause guest VMs for long periods of time (months or years)? Will it be reliable to resume? What are best practices to make sure it will resume normally when I need it months later?
I was thinking of buying ECC RAM for the host machine to protect against random bit flips. But is there anything else I should be doing?
No, leave the VM running.
While the VM is paused, you cannot maintain the application or the OS instance. At minimum, it needs security updates every couple of months.
An already running VM will respond faster than one that has to be resumed. A requirement of under 5 seconds does not leave much room for delay.
Speaking of time, the guest's clock will probably be wrong after a resume. It is not obvious how to address this for the resume case; see "How to keep time on resumed KVM guest with libvirt?"
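If you do resume a paused guest anyway, one possible way to correct its clock is sketched below. It assumes the QEMU guest agent is installed and running inside the guest, since `virsh domtime` depends on it. The domain name `appvm` and the helper names are hypothetical. Treat it as a rough sketch, not a tested procedure.

```python
import subprocess


def resume_commands(domain):
    """Build the virsh commands to resume a paused guest and reset its clock."""
    return [
        # Resume the paused domain.
        ["virsh", "resume", domain],
        # Ask the guest agent to set the guest clock to the host's current time.
        # Assumption: the QEMU guest agent is running inside the guest.
        ["virsh", "domtime", domain, "--now"],
    ]


def resume_and_sync(domain, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in resume_commands(domain):
        run(cmd, check=True)
```

Calling `resume_and_sync("appvm")` on the host would issue both commands. The `run` parameter exists only so the command list can be inspected without touching a real hypervisor.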
Pausing and resuming does not save you resources. Storage and RAM are already spent. CPU you can overcommit a little. In other words, assume the idle CPU of this guest, which idles most of the time, is available to other guests on the host.
Consider peak use in your capacity planning: what happens when it runs on top of the typical workload? Buy CPU for your compute hosts when necessary. Sometimes that is the price for maintaining a fast response time.
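The peak-use question comes down to simple arithmetic, sketched here. All numbers are made up for illustration, and the function name is hypothetical. The check is deliberately crude: it only asks whether the typical load plus this guest's burst fits within the host's cores.

```python
def cpu_headroom(host_cores, typical_busy_cores, burst_cores):
    """Cores left over when the burst runs on top of the typical workload.

    A negative result means the guests would contend for CPU during the burst.
    """
    return host_cores - (typical_busy_cores + burst_cores)


# Hypothetical host: 16 cores, other guests typically keep 10 busy,
# and the rarely used guest needs 4 cores when it is triggered.
print(cpu_headroom(16, 10, 4))   # 2 cores to spare
print(cpu_headroom(16, 14, 4))   # -2, so the host is short by 2 cores at peak
```

If the result is negative for your own numbers, that is the case where buying more CPU is the price of the fast response time.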