While setting up some hosts in Check_MK for SNMP-only monitoring I've found some hosts where snmpbulkwalk
appears to 'hang' and then timeout while processing a certain OID.
eg:
OMD[prod]:~$ snmpbulkwalk -v 2c -c public compute01.domain.com .1.3.6.1.4.1.2021
UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 88109052 kB
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 88109052 kB
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 131860964 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 94429952 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 182539004 kB
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000 kB
UCD-SNMP-MIB::memShared.0 = INTEGER: 0 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 188772 kB
UCD-SNMP-MIB::memCached.0 = INTEGER: 6685180 kB
UCD-SNMP-MIB::memSwapError.0 = INTEGER: noError(0)
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:
UCD-SNMP-MIB::laIndex.1 = INTEGER: 1
UCD-SNMP-MIB::laIndex.2 = INTEGER: 2
UCD-SNMP-MIB::laIndex.3 = INTEGER: 3
UCD-SNMP-MIB::laNames.1 = STRING: Load-1
UCD-SNMP-MIB::laNames.2 = STRING: Load-5
UCD-SNMP-MIB::laNames.3 = STRING: Load-15
UCD-SNMP-MIB::laLoad.1 = STRING: 3.91
Timeout: No Response from compute01.domain.com
snmpwalk
, on the other hand, works just fine:
OMD[prod]:~$ snmpwalk -v 2c -c public compute01.domain.com .1.3.6.1.4.1.2021
UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 88109052 kB
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 88109052 kB
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 131860964 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 94424732 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 182533784 kB
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000 kB
UCD-SNMP-MIB::memShared.0 = INTEGER: 0 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 188772 kB
UCD-SNMP-MIB::memCached.0 = INTEGER: 6685188 kB
UCD-SNMP-MIB::memSwapError.0 = INTEGER: noError(0)
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:
UCD-SNMP-MIB::laIndex.1 = INTEGER: 1
UCD-SNMP-MIB::laIndex.2 = INTEGER: 2
UCD-SNMP-MIB::laIndex.3 = INTEGER: 3
UCD-SNMP-MIB::laNames.1 = STRING: Load-1
UCD-SNMP-MIB::laNames.2 = STRING: Load-5
UCD-SNMP-MIB::laNames.3 = STRING: Load-15
UCD-SNMP-MIB::laLoad.1 = STRING: 3.97
UCD-SNMP-MIB::laLoad.2 = STRING: 4.51
UCD-SNMP-MIB::laLoad.3 = STRING: 4.35
UCD-SNMP-MIB::laConfig.1 = STRING: 12.00
UCD-SNMP-MIB::laConfig.2 = STRING: 12.00
UCD-SNMP-MIB::laConfig.3 = STRING: 12.00
UCD-SNMP-MIB::laLoadInt.1 = INTEGER: 397
UCD-SNMP-MIB::laLoadInt.2 = INTEGER: 451
UCD-SNMP-MIB::laLoadInt.3 = INTEGER: 434
UCD-SNMP-MIB::laLoadFloat.1 = Opaque: Float: 3.970000
UCD-SNMP-MIB::laLoadFloat.2 = Opaque: Float: 4.510000
UCD-SNMP-MIB::laLoadFloat.3 = Opaque: Float: 4.350000
...
This is happening across 3 different servers with identical configurations, and I can't find anything in the logs or snmpd config that would seem to indicate any issue.
Any ideas what the issue might be, or what more I can look at?
Snmpbulkwalk initiates internal server repetitions to walk through mib tree. Server does not respond untill it retrives "max-repetitions" number of variables or end of mib tree is reached. Retrieving some variables may demand valuable time.
Important note: snmpwalk walks trough a requested subtree exactly but snmpbulkwalk may retreive additional variables (after end of subtree is reached) due to behaviour described above. Thus it can stumble on these additional variables which could never be touched with snmpwalk.
Try to decrease the option corresponding to "max-repetitions" and/or encrease timeout option for snmpbulkwalk.
I am experiencing the same problem with
snmpbulkwalk
when querying columns of tables for some network equipment. Reducing themax-repetitions
helps.