On site we have nearly three dozen servers, a mixture of physical and virtual. Some back up to tape and others to disk; a few important machines do both.
All in all we have about 40 backup jobs. I record the success or failure, and the probable cause, for each job. Now I've been asked to put all this information into a "report" that will be viewed by non-IT people, so it needs lots of graphs.
So, I would appreciate opinions on what would be worthwhile to include.
So far I've got:
- Overall Success Rate
- Off-Site Backup Success Rate
- Monthly Success Rate
- Daily Success Rate - just for IT, handy for spotting failing tapes.
- Common Failure Causes
The report can also include items of interest only to IT.
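For what it's worth, the rates listed above can all be derived from the same per-job log. Here is a minimal sketch in Python, assuming each run is recorded as a tuple of date, job name, off-site flag, success flag and cause. That record layout, the job names and the sample rows are made up for illustration and are not taken from any real backup product.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical record layout: (run date, job name, off-site?, succeeded?, cause)
records = [
    (date(2024, 1, 1), "fileserver-tape", True, True, None),
    (date(2024, 1, 1), "mail-disk", False, False, "disk full"),
    (date(2024, 1, 2), "fileserver-tape", True, False, "tape error"),
    (date(2024, 1, 2), "mail-disk", False, True, None),
    (date(2024, 2, 1), "fileserver-tape", True, True, None),
    (date(2024, 2, 1), "mail-disk", False, True, None),
]

def success_rate(rows):
    """Fraction of runs that succeeded, or None if there were no runs."""
    rows = list(rows)
    if not rows:
        return None
    return sum(1 for r in rows if r[3]) / len(rows)

def rate_by(rows, key):
    """Success rate per group, where key(row) names the group."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    return {k: success_rate(v) for k, v in sorted(groups.items())}

overall = success_rate(records)
offsite = success_rate(r for r in records if r[2])
monthly = rate_by(records, lambda r: (r[0].year, r[0].month))
daily = rate_by(records, lambda r: r[0])
# Count causes for failed runs only; most_common() gives the ranking.
causes = Counter(r[4] for r in records if not r[3])

print(f"Overall: {overall:.0%}, off-site: {offsite:.0%}")
print("Monthly:", monthly)
print("Top failure causes:", causes.most_common(3))
```

The `monthly` and `daily` dictionaries are already in date order, so they can be fed straight into whatever charting tool produces the graphs.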
Successfully completed and tested restores are the most important metric you could provide.
It might be useful to report the time taken for each backup. Backup job duration usually increases over time, so you want to be aware of any jobs that suddenly take longer or that might run into the operational hours of the applications on the host.
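As a rough sketch of how that check could be automated, assuming you keep a list of run times in minutes per job: compare the latest run against the median of the earlier runs, and against the length of the backup window. The function name, the 1.5x threshold, the 360-minute window and the sample figures are all assumptions chosen for illustration; tune them to your own environment.

```python
from statistics import median

def flag_slow_jobs(durations, window_minutes, factor=1.5, min_history=3):
    """Return {job: [reasons]} for jobs whose latest run looks worrying.

    durations: {job name: [minutes per run, oldest first]}
    window_minutes: how long a job may run before it hits working hours
    factor: latest run is 'sudden' if it exceeds factor * median of history
    min_history: earlier runs needed before the 'sudden' check is applied
    """
    flagged = {}
    for job, runs in durations.items():
        if not runs:
            continue
        *history, latest = runs
        reasons = []
        if len(history) >= min_history and latest > factor * median(history):
            reasons.append("sudden increase")
        if latest > window_minutes:
            reasons.append("exceeds window")
        if reasons:
            flagged[job] = reasons
    return flagged

# Hypothetical sample data: minutes per nightly run, oldest first.
durations = {
    "fileserver-tape": [120, 125, 130, 128, 210],
    "mail-disk": [45, 46, 47, 48, 49],
    "db-disk": [300, 310, 320, 330, 370],
}

flagged = flag_slow_jobs(durations, window_minutes=360)
for job, reasons in flagged.items():
    print(job, "->", ", ".join(reasons))
```

The median is used rather than the mean so that one earlier slow night does not shift the baseline much. A job that creeps up gradually, like `db-disk` here, never trips the "sudden" check but is still caught once it outgrows the window.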