Version 10 (modified by jonmills, 8 years ago)

--

Overview

The challenge in monitoring an environment like a Eucalyptus cluster is that it is always changing: virtual machines are created and destroyed all the time. While a virtual machine is running, we want to monitor it. Once it no longer exists, we want to stop monitoring it. Above all, we don't want to edit the monitoring configuration by hand every time a host and its associated checks need to be added or removed. This is where OMD shines, because we can combine Nagios eventhandlers with the ability of Check_MK to (re-)inventory hosts, rebuild the Nagios object configuration, and reload Nagios. The result is a dynamic system that always knows what to monitor and what not to monitor.

'Check_MK inventory' Eventhandler: Used to add new services discovered by Check_MK

  • The first step is to set up an eventhandler that can respond to a situation in which the service check "Check_MK inventory" discovers a new service.
    • ( $USER4$ is a Nagios user macro defined in $OMD_ROOT/etc/nagios/resource.cfg; it corresponds to the value of $OMD_ROOT itself )
    • Nagios also provides many built-in macros that you can use inside your Nagios configuration.
  • Check out our example config file from code.renci.org SVN:
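
The SVN file remains the authoritative example. Purely as an illustration, a main.mk fragment for this eventhandler might look roughly like the sketch below, which follows the same pattern as the del_vm configuration further down this page. The command name, the script path, and the choice of macros passed as arguments are assumptions made for the sketch and are not taken from the RENCI file.

```python
# Hypothetical main.mk fragment; command name, script path and arguments are assumptions.
extra_nagios_conf += r"""
define command {
    command_name    cmk_reinventory
    command_line    $USER4$/local/bin/cmk_reinventory.sh $HOSTNAME$ $SERVICESTATE$ $SERVICESTATETYPE$
}
"""

# Attach the eventhandler to the "Check_MK inventory" service on all hosts ...
extra_service_conf["event_handler"] = [
    ( "cmk_reinventory", ALL_HOSTS, [ "Check_MK inventory" ] ),
]

# ... and enable eventhandlers for that service.
extra_service_conf["event_handler_enabled"] = [
    ( "1", ALL_HOSTS, [ "Check_MK inventory" ] ),
]
```

Unlike the host-level case, service-level settings go into extra_service_conf, where each entry ends with a list of service name patterns.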

cmk_reinventory Eventhandler script

  • A script that re-writes Check_MK's configuration files, then reloads Check_MK, which in turn re-compiles Nagios configuration, and reloads the Nagios daemon.
  • SVN source:
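
Independent of the script kept in SVN, the sequence described above (re-inventory the host, then let Check_MK regenerate the Nagios configuration and reload Nagios) can be sketched in Python. This is a minimal sketch under stated assumptions: the handler is assumed to receive the host name, service state, and state type as arguments; it is assumed that acting only on a HARD non-OK state is the desired policy; and the `cmk -I` and `cmk -O` options are those of the Check_MK 1.x command line. The function names are hypothetical.

```python
import subprocess


def should_reinventory(state, state_type):
    """Act only once the inventory check has settled in a non-OK HARD state."""
    return state_type == "HARD" and state != "OK"


def build_commands(hostname):
    """Return the cmk invocations to run, in order."""
    return [
        ["cmk", "-I", hostname],  # inventory: add newly discovered services
        ["cmk", "-O"],            # regenerate Nagios config and reload Nagios
    ]


def main(argv):
    """argv: [script, HOSTNAME, SERVICESTATE, SERVICESTATETYPE] (assumed order)."""
    if len(argv) != 4:
        print("usage: cmk_reinventory.py HOSTNAME SERVICESTATE SERVICESTATETYPE")
        return 2
    hostname, state, state_type = argv[1:4]
    if not should_reinventory(state, state_type):
        return 0  # nothing new to pick up, or state not yet confirmed
    for cmd in build_commands(hostname):
        subprocess.check_call(cmd)  # raises if cmk exits non-zero
    return 0
```

Keeping the decision and the command list in separate functions makes it possible to check the logic without a running OMD site; only main() actually invokes cmk.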

Removing VMs: Host Check Eventhandler

  • In Nagios, a host check is typically a ping check, and the results of interest here are UP and DOWN, depending on whether the host could be reached.
  • We want to define an eventhandler that is triggered by the DOWN state of a host, but only for hosts with the Check_MK host tag 'vm'.
  • If the host has a 'vm' tag, is in a DOWN state, and is no longer listed as 'running' or 'pending' by euca-describe-instances, then we want to remove it from Check_MK's hosts.mk and ipaddresses.mk files, and reload Check_MK and Nagios.
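# main.mk: define the Nagios command that runs the removal script.
# $USER4$ expands to $OMD_ROOT.
extra_nagios_conf += r"""
define command {
    command_name    del_vm
    command_line    $USER4$/local/bin/del_vm.sh $HOSTNAME$ $HOSTSTATE$
}
"""

# Attach the eventhandler to every host that carries the 'vm' tag ...
extra_host_conf["event_handler"] = [
    ( "del_vm", [ "vm" ], ALL_HOSTS ),
]

# ... and enable eventhandlers for those hosts.
extra_host_conf["event_handler_enabled"] = [
    ( "1", [ "vm" ], ALL_HOSTS ),
]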
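
The removal condition in the bullets above can be sketched in Python, separately from the actual del_vm.sh script. This is a sketch under stated assumptions: the output of euca-describe-instances is assumed to contain whitespace-separated INSTANCE lines in which both the host's address and its state appear as fields; `address` is whatever identifier Nagios uses for the host; and the Check_MK files are assumed to list one quoted host per line. All function names are hypothetical.

```python
import re


def is_still_active(describe_output, address):
    """True if an INSTANCE line lists `address` as running or pending."""
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "INSTANCE":
            continue
        if address in fields and ("running" in fields or "pending" in fields):
            return True
    return False


def should_remove(host_state, describe_output, address):
    """Remove only hosts that are DOWN and no longer known to Eucalyptus."""
    return host_state == "DOWN" and not is_still_active(describe_output, address)


def drop_host(config_text, hostname):
    """Return config_text without the lines that name `hostname` in quotes.

    Matches "host", 'host' and "host|tag..." but not longer names
    that merely start with `hostname`.
    """
    pattern = re.compile(r"""["']%s["'|]""" % re.escape(hostname))
    kept = [l for l in config_text.splitlines(True) if not pattern.search(l)]
    return "".join(kept)
```

The 'vm' tag does not need to be checked in the script, because the main.mk rules above attach the eventhandler only to hosts with that tag. After rewriting hosts.mk and ipaddresses.mk with something like drop_host, the script still has to reload Check_MK and Nagios for the change to take effect.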

'del_vm' eventhandler script