We run a system designed to integrate with Microsoft SQL Server, in a clustered setup. Currently we have both database nodes monitored under Nagios, and everything's fine except the database host check:
check_tcp -H $HOSTADDRESS$ -p 1433
Probably, one can't monitor a failover cluster directly like this. How should I monitor the cluster's database availability instead?
When it comes to SQL Server on a cluster, it must be installed as a virtual node, whether it's a default instance or a named instance. So a TCP check where the hostname is the virtual name with the correct port should work. For instance, if I have a two node cluster with physical nodes:
Don't monitor either of those for SQL Server. When SQL Server is installed, let's say I chose the virtual node name of Virtual1, then it's Virtual1 you'll want to monitor. And if SQL Server is up, then the cluster is up. If you just want to check to see if the cluster is up, look for the Network Name of the cluster group itself.
Why not connect to the same ip/hostname with a mssql-plugin as your application does? To complete monitoring I would add a host for the virtual ip and two host for every cluster node. So if one of your nodes goes down you get a notification but you know that the cluster is still ok, because you didn't get a notification.