Short overview: Is Alert more severe than Critical?
RFC 5424 defines the syslog severity levels and gives each one a short description. Each level is assigned a numerical code from 0 to 7. It was my understanding that 0 (Emergency) was the most severe and 7 (Debug) the least.
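For reference, the numbering can be sketched in Python as below. The codes and level names are the ones from RFC 5424; the `Severity` enum and the `more_severe` helper are just illustrative names, not part of any syslog library.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity levels with their RFC 5424 numerical codes."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def more_severe(a: Severity, b: Severity) -> bool:
    """Return True if a outranks b, assuming a lower code means more severe."""
    return a < b


# Under the "lower code is more severe" reading, Alert outranks Critical.
print(more_severe(Severity.ALERT, Severity.CRITICAL))  # True
```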
However I'm questioning 1 (Alert) and 2 (Critical). The definitions in RFC 5424 are:
- Alert: action must be taken immediately
- Critical: critical conditions
However on this site they give a longer description (which is obviously personal opinion) but define them as:
- Alert: Should be corrected immediately - notify staff who can fix the problem - example is loss of backup ISP connection
- Critical: Should be corrected immediately, but indicates failure in a primary system - fix CRITICAL problems before ALERT - example is loss of primary ISP connection
This seems backwards, as it implies that Critical is more severe than Alert even though RFC 5424 seems to place Alert as more severe. Is there an official stance on this, or are there any best practices?
Critical indicates that something bad is about to happen. Alert indicates that something bad already happened.
Take a look at Building Scalable Syslog Management Solutions on Cisco.com for a good read about managing syslog.
I think what it means by those examples is that if an Alert status is triggered, then Critical has already happened. In the example, Critical is when the primary ISP goes down, and Alert happens when the backup ISP then goes down as well (so both the primary and backup ISPs are down). The backup ISP going down by itself is probably not an Alert, because the primary ISP would still be up (maybe a Critical). Similarly, the primary ISP going down is only a Critical and not an Alert, because the system would still be functioning, albeit on the backup ISP (still important to fix ASAP).
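To make that reading concrete, here is a minimal sketch in Python. The function name `isp_severity` and its two boolean parameters are made up for illustration, and treating a backup-only outage as Critical just follows my tentative "maybe" above rather than anything in the RFC.

```python
from typing import Optional

# RFC 5424 numerical codes for the two levels in question.
ALERT = 1
CRITICAL = 2


def isp_severity(primary_up: bool, backup_up: bool) -> Optional[int]:
    """Map ISP link state to a severity under the interpretation above."""
    if not primary_up and not backup_up:
        # Both links are down: nothing left to fall back on.
        return ALERT
    if not primary_up or not backup_up:
        # One link is down but the other still carries traffic.
        # (Backup-only outage as Critical is an assumption.)
        return CRITICAL
    # Both links are up: nothing to report.
    return None
```

Under this reading the Alert case always implies a Critical condition already exists, which is consistent with Alert having the lower (more severe) code.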
I think the authors of syslog inadvertently switched critical and alert. Language-wise, alert is akin to 'be advised; pay attention' ('BOLO' in crime shows is a good analogy), 'critical' is akin to 'handle this problem ASAP', and 'emergency' is akin to 'drop what you are doing and fix this NOW'.
The following hypothetical situation might better illustrate the use of Alert and Critical
The drive 0 problems are only critical because its mirror is OK. Drive 1's heat problem is an alert because the only drive in the RAID is having trouble; its bad sector count is an emergency because the drive has two problems and is the only drive left in the array.
Alas, syslog is too entrenched now to change the order of those two labels.