Version 8 (modified by jonmills, 8 years ago)

--

Overview

The challenge in monitoring an environment like a Eucalyptus cluster is that it is always changing. Virtual machines are created and destroyed all the time. When virtual machines are running, we want to monitor them. When they no longer exist, we want to stop monitoring them. And most of all, we don't want to constantly alter the configuration of our monitoring system by hand to add and remove these hosts and their affiliated checks. This is where OMD shines, because we can combine the utility of Nagios eventhandlers with the ability of Check_MK to (re-)inventory hosts, rebuild Nagios object configuration, and reload Nagios. The result is a dynamic system that always knows what to monitor, and what not to monitor.

'Check_MK inventory' Eventhandler: Used to add new services discovered by Check_MK

  • The first step is to set up an eventhandler that can respond to a situation in which the service check "Check_MK inventory" discovers a new service.
    • ($USER4$ is a user-defined Nagios macro set in $OMD_ROOT/etc/nagios/resource.cfg; it holds the value of $OMD_ROOT itself.)
    • Nagios has lots of built-in Macros you can use inside your Nagios configuration.
  • Check out our example config file from code.renci.org SVN:

cmk_reinventory Eventhandler script

  • A script that re-writes Check_MK's configuration files, then reloads Check_MK, which in turn re-compiles Nagios configuration, and reloads the Nagios daemon.
  • SVN source:
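
The original script is not reproduced on this page. As a rough, hypothetical sketch of the behaviour described above (this is not the SVN source), a service event handler could look like the following. The function name reinventory_host, the argument order ($HOSTNAME$ $SERVICESTATE$ $SERVICESTATETYPE$), the decision to act only on HARD states, and the overridable CMK variable are all assumptions made for illustration; only cmk -I (inventory) and cmk -O (rebuild and reload) are taken from Check_MK itself.

```shell
#!/bin/bash
# Hypothetical sketch of a "Check_MK inventory" event handler; not the original script.
# Assumed Nagios command_line arguments: $HOSTNAME$ $SERVICESTATE$ $SERVICESTATETYPE$

# CMK can be overridden, so the logic can be tried without a live OMD site.
CMK="${CMK:-${OMD_ROOT}/bin/cmk}"

reinventory_host() {
    local host=$1 state=$2 statetype=$3

    # Assumption: act only on HARD states so one flapping check does not trigger a reload.
    [ "$statetype" = "HARD" ] || return 0

    case "$state" in
        WARNING|CRITICAL)
            # Inventory new services on this host only, then rebuild and reload Nagios.
            "$CMK" -I "$host" && "$CMK" -O
            ;;
    esac
}

# Run as a handler only when arguments are supplied.
if [ $# -gt 0 ]; then reinventory_host "$@"; fi
```

Limiting the inventory to the affected host keeps the handler cheap; the reload still rebuilds the whole Nagios configuration.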

Removing VMs: Host Check Eventhandler

  • In Nagios, a Host Check is normally a ping check (this is the default under Check_MK), and the resulting state is UP, DOWN or UNREACHABLE depending on whether the host could be reached.
  • We want to define an eventhandler that is triggered by the DOWN state of a host, but only for hosts with the Check_MK tag 'vm'
  • If the host has a 'vm' tag, and is in a DOWN state, and is no longer listed as 'running' or 'pending' by euca-describe-instances, then we want to remove it from Check_MK's hosts.mk & ipaddresses.mk files, and reload Check_MK & Nagios
extra_nagios_conf += r"""
define command {
    command_name    del_vm
    command_line    $USER4$/local/bin/del_vm.sh $HOSTNAME$ $HOSTSTATE$
}
"""
extra_host_conf["event_handler"] = [
	( "del_vm", [ "vm" ], ALL_HOSTS ),
]	
extra_host_conf["event_handler_enabled"] = [
	( "1", [ "vm" ], ALL_HOSTS ),
]
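
The 'vm' tag condition is handled by the Check_MK rules above, so the handler script only has to test the other two conditions. A minimal sketch of that decision, assuming the output of euca-describe-instances has already been captured into a string, might look like this. The function names instance_is_active and should_remove are hypothetical, and matching the host as a literal whole word is a design choice of this sketch rather than something the page prescribes.

```shell
#!/bin/bash
# Sketch only: decide whether a DOWN VM may be removed from monitoring.
# The third argument is the captured output of euca-describe-instances.

instance_is_active() {
    local host=$1 listing=$2
    # Keep running/pending lines, then look for the host as a literal whole word,
    # so that 10.0.0.1 does not match 10.0.0.10.
    printf '%s\n' "$listing" | grep -E '(running|pending)' | grep -Fqw -- "$host"
}

should_remove() {
    local host=$1 state=$2 listing=$3
    [ -n "$host" ] || return 1          # never act on an empty host name
    [ "$state" = "DOWN" ] || return 1   # only DOWN hosts are candidates
    ! instance_is_active "$host" "$listing"
}
```

Capturing the listing first also lets the caller stop early when euca-describe-instances itself fails, instead of treating an empty listing as proof that the instance is gone.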

'del_vm' eventhandler script

#!/bin/bash
#
# Event handler script for removing a VM from Check_MK when its
# host check goes DOWN and euca-describe-instances no longer
# lists it as running or pending.

export PATH="/omd/sites/nagios/lib/perl5/bin:/omd/sites/nagios/local/bin:/omd/sites/nagios/bin:/omd/sites/nagios/local/lib/perl5/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/local/bin:/opt/local/sbin"

# Log file, plus the arguments passed in by Nagios on the command line
LOG=/tmp/del_vm.log
HOSTNAME=$1
HOSTSTATE=$2

case "$HOSTSTATE" in

UP)
	# Do nothing on ok
	;;

DOWN)

	# We need to verify the instance is gone, using euca-describe-instances.
	# The host is matched as a literal whole word, so 10.0.0.1 does not match 10.0.0.10.
	if ! euca-describe-instances | grep -E '(running|pending)' | grep -Fqw -- "$HOSTNAME"; then

	# Logging (the first write starts a fresh log for this run)
	echo "$0" > "$LOG"
	date >> "$LOG"
	echo "HOSTNAME is $HOSTNAME" >> "$LOG"
	echo "HOSTSTATE is $HOSTSTATE" >> "$LOG"
	echo " " >> "$LOG"

	# Clean up cmk
	echo "Running cmk --flush $HOSTNAME" >> $LOG
	${OMD_ROOT}/bin/cmk --flush $HOSTNAME >> $LOG

	# Remove the VM from Check_MK
	# Note: $HOSTNAME is used as a sed regex here, so '.' matches any character.
	echo "Removing $HOSTNAME from hosts.mk" >> $LOG
	/bin/sed -i "/$HOSTNAME/ d" "${OMD_ROOT}/etc/check_mk/conf.d/hosts.mk" >> $LOG 2>&1
	echo "Removing $HOSTNAME from ipaddresses.mk" >> $LOG
	/bin/sed -i "/$HOSTNAME/ d" "${OMD_ROOT}/etc/check_mk/conf.d/ipaddresses.mk" >> $LOG 2>&1

	# Now re-inventory && reload
	echo "Running cmk -IIu" >> $LOG
	${OMD_ROOT}/bin/cmk -IIu >> $LOG
	echo "Running cmk -O" >> $LOG
	${OMD_ROOT}/bin/cmk -O >> $LOG

	fi
        ;;

esac

exit 0
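
The script above deletes lines by using the host name as a sed pattern. As a hedged alternative, the removal step could match the name literally instead; the sketch below assumes one host per line in hosts.mk and ipaddresses.mk, and the function name remove_host_lines is hypothetical.

```shell
#!/bin/bash
# Sketch only: drop every line mentioning HOST from a Check_MK config file,
# matching the name literally and as a whole word rather than as a regex.

remove_host_lines() {
    local host=$1 file=$2 tmp
    [ -n "$host" ] && [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1

    # grep -v exits 1 when no lines are left to print; that is not an error here.
    grep -vFw -- "$host" "$file" > "$tmp"
    [ $? -le 1 ] || { rm -f "$tmp"; return 1; }

    # Rewrite the file in place, keeping its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

It would be called once per file, for example with "$HOSTNAME" and "${OMD_ROOT}/etc/check_mk/conf.d/hosts.mk", before running the re-inventory and reload.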