I have about twenty servers running different webapps. Every 4 hours an rsnapshot
task runs and backs all of them up to a backup server.
Today I accidentally discovered that the backups had been failing for the last 4 days due to an input/output failure in the file system. fsck fixed the issue, but 4 days of backups are lost.
Is there any way to check if backups are ok?
Right now I use the munin monitoring system, if that matters, though it only checks server health (memory, CPU, HDD, etc.) and does no software checks.
I can integrate a script that checks for FATAL ERROR entries in the rsnapshot log, but I'm not sure whether that will be enough.
Maybe there is a system for bootstrapping an environment from a backup to check its integrity. Unfortunately, I haven't found enough information about it.
Make sure you are also monitoring filesystem free space, system logs for critical/severe messages, SMART output for your disks, and the network and backup services (ssh/rsync).
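As a rough illustration of a few of these checks, here is a minimal Python sketch that a cron job or a monitoring plugin could call. It is not an rsnapshot or munin feature. The function names, the 10% free-space threshold, the 6-hour staleness window (chosen because the job runs every 4 hours), and the default marker string are all assumptions to adjust for your setup; the log path is whatever your rsnapshot configuration writes to.

```python
import os
import shutil
import time
from pathlib import Path


def free_space_ratio(path):
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def find_marker_lines(log_path, marker="FATAL ERROR"):
    """Return the log lines that contain `marker` (an assumed error string)."""
    text = Path(log_path).read_text(errors="replace")
    return [line for line in text.splitlines() if marker in line]


def log_is_stale(log_path, max_age_hours, now=None):
    """True if the log was last modified more than `max_age_hours` ago.

    A stale log suggests the backup job is not running at all, which a
    search for error lines alone would never notice.
    """
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path) > max_age_hours * 3600


def check_backup_host(backup_root, log_path, min_free=0.10,
                      max_age_hours=6, marker="FATAL ERROR"):
    """Collect human-readable problems; an empty list means no findings."""
    problems = []
    ratio = free_space_ratio(backup_root)
    if ratio < min_free:
        problems.append(f"low free space on {backup_root}: {ratio:.0%}")
    hits = find_marker_lines(log_path, marker)
    if hits:
        problems.append(f"{len(hits)} log line(s) containing {marker!r}")
    if log_is_stale(log_path, max_age_hours):
        problems.append(f"{log_path} not updated in {max_age_hours} hours")
    return problems
```

A wrapper could print the returned list and exit non-zero when it is non-empty, so that whatever alerting you already have picks it up. The staleness check is the part that would have caught a job that silently stopped producing output.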
As for verifying your backups, you may want to set up a parallel copy of your webapp environment and restore your backups into it periodically. Your backups are only as good as your recovery.
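One way to make such a periodic restore test measurable is to compare the restored files against a reference copy by checksum. The sketch below is only an illustration under assumptions: `compare_trees` and its helpers are hypothetical names, both directories are assumed to be reachable as local paths, and files that legitimately changed on the live server after the backup was taken will show up as differences.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file under `root` (by relative path) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(reference, restored):
    """Return (missing, extra, changed) relative paths between two trees.

    missing: in `reference` but not in `restored`
    extra:   in `restored` but not in `reference`
    changed: present in both, with different content
    """
    ref = tree_digests(reference)
    res = tree_digests(restored)
    missing = sorted(set(ref) - set(res))
    extra = sorted(set(res) - set(ref))
    changed = sorted(name for name in set(ref) & set(res)
                     if ref[name] != res[name])
    return missing, extra, changed
```

Three empty lists mean the restored tree matches the reference byte for byte. This only checks file contents; whether the webapp actually starts from the restored data still needs the parallel environment described above.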