How do you handle a large volume of cron mails (from a thousand servers) in a smart way? Main use case: a cron error comes in, but its severity does not warrant immediate action. However, I do not want an email about it every other minute. Obviously I could silence it, but then I will forget about the error.
Sample error: a periodic unattended-upgrades run failed because there was not enough memory available.
Ideally, I would use an Opbeat or Sentry-like service but for cron output. It would allow me to aggregate (on server and command), assign and mute incidents.
But hopefully somebody else has implemented something clever already.
Thanks for your suggestions!
Update: I found a cron-sentry utility at https://pypi.python.org/pypi/cron-sentry which seems to do what I want. For Opbeat this is also possible, but nobody has written a wrapper yet.
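To make the "aggregate on server and command, then mute" idea concrete, here is a minimal sketch of what I have in mind. It is not how Sentry or cron-sentry work internally; the names `fingerprint`, `IncidentStore` and `mute_seconds` are made up for illustration, and the state is kept in memory only (a real setup would need a shared store across a thousand servers).

```python
import hashlib
import time


def fingerprint(host: str, command: str) -> str:
    """Stable key so repeated failures of one command on one host group together."""
    return hashlib.sha256(f"{host}\0{command}".encode()).hexdigest()[:16]


class IncidentStore:
    """Hypothetical in-memory aggregator: one incident per (host, command)."""

    def __init__(self, mute_seconds: float = 24 * 3600):
        self.mute_seconds = mute_seconds
        self.incidents = {}  # fingerprint -> incident dict

    def record(self, host, command, output, now=None) -> bool:
        """Record a failure; return True if a notification should be sent."""
        now = time.time() if now is None else now
        key = fingerprint(host, command)
        incident = self.incidents.setdefault(
            key,
            {"host": host, "command": command, "count": 0,
             "last_notified": None, "last_output": ""},
        )
        # Always count the failure, so a muted incident is not forgotten.
        incident["count"] += 1
        incident["last_output"] = output
        last = incident["last_notified"]
        if last is None or now - last >= self.mute_seconds:
            incident["last_notified"] = now
            return True
        return False  # still inside the mute window: stay quiet
```

With something like this, a failing job that runs every other minute would produce one notification per mute window, while the incident keeps its running count so the error is not forgotten.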