Nagios Configuration

Once you have the WebReboot® Nagios® Plugin installed and correctly configured, you can begin configuring your Nagios installation to use the plugin. The following subsections indicate how.

  1. General Configuration
  2. Command Definitions
  3. Host Checks
    1. Check Host On
  4. Host Event Handlers
    1. Power-on Host
  5. Service Checks
    1. Check Host Temperature
    2. Check Host Temperature in Fahrenheit
  6. Service Event Handlers
    1. Reboot Host
    2. Power-off Host

General Configuration

The WebReboot Nagios Plugin relies on the host_name value in order to determine which port on the WebReboot a command should be applied to. As such, your host_name values must be unique and must match the name programmed into the WebReboot. If duplicate names are found, the command will fail because it will not be able to tell which WebReboot port it should work with. Likewise, if there is no matching hostname on the WebReboot, the command will not proceed.
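The matching rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the plugin's actual code: the function name `find_port` and the `port_names` list (the names programmed into the WebReboot, one per port) are hypothetical, and exact, case-sensitive matching is an assumption.

```python
def find_port(host_name, port_names):
    """Return the index of the single port whose name equals host_name.

    port_names is a hypothetical list of the names programmed into the
    WebReboot, one entry per port. Exact matching is an assumption.
    """
    matches = [i for i, name in enumerate(port_names) if name == host_name]
    if not matches:
        # No port carries this name, so the command cannot proceed.
        raise LookupError(f"no WebReboot port is named {host_name!r}")
    if len(matches) > 1:
        # Duplicate names make the target port ambiguous.
        raise LookupError(f"{host_name!r} matches several ports: {matches}")
    return matches[0]
```

Both failure cases raise an error rather than guessing, which mirrors the behavior described above: a duplicate name and a missing name each stop the command.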

Command Definitions

The installer placed a command definition file in $NAGIOS_CONFIG_DIR/webreboot.cfg. If your Nagios installation is not configured to load files out of a configuration directory, you will have to modify your configuration to load its contents.
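One way to tell whether your installation already loads the file is to look at the `cfg_file=` and `cfg_dir=` directives in the main Nagios configuration file. The following is a minimal sketch, assuming POSIX-style paths; the function name and any paths you pass in are illustrative, not part of the plugin.

```python
import posixpath


def loads_config_file(nagios_cfg_text, target_path):
    """Return True if a cfg_file= or cfg_dir= directive covers target_path."""
    target = posixpath.normpath(target_path)
    for raw_line in nagios_cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "cfg_file" and posixpath.normpath(value) == target:
            return True
        if key == "cfg_dir":
            # cfg_dir picks up *.cfg files in the directory and below it.
            directory = posixpath.normpath(value).rstrip("/") + "/"
            if target.startswith(directory) and target.endswith(".cfg"):
                return True
    return False
```

If the function returns False for your main configuration file, adding a `cfg_file=` line that points at webreboot.cfg is the usual way to load it.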

All commands defined in the webreboot.cfg file have a command_line value pointing to $NAGIOS_PLUGIN_DIR/nagios.py. If you changed the value of NAGIOS_PLUGIN_DIR from its default, you must manually modify each command_line value, using the appropriate $NAGIOS_PLUGIN_DIR path.
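If there are many command_line values to update, the edit can be scripted. The sketch below is one possible approach, not a tool shipped with the plugin: `repoint_command_lines` is a hypothetical helper, and it assumes the installed file contains the expanded plugin directory followed by `/nagios.py`.

```python
import re


def repoint_command_lines(cfg_text, old_dir, new_dir):
    """Rewrite the nagios.py path on command_line lines only."""
    old_path = old_dir.rstrip("/") + "/nagios.py"
    new_path = new_dir.rstrip("/") + "/nagios.py"
    updated = []
    for line in cfg_text.splitlines():
        # Touch only command_line directives; comments and other
        # directives are left exactly as they were.
        if re.match(r"\s*command_line\s", line):
            line = line.replace(old_path, new_path)
        updated.append(line)
    return "\n".join(updated)
```

Reading webreboot.cfg, passing its text through this function, and reviewing the result before writing it back keeps the change easy to verify.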

Host Checks

check_host_on

Out of the box, the only way Nagios can check the status of a host is by performing some sort of network-based activity -- normally a ping. The problem with this approach is that you never know if the server is on and simply not responding to network requests or if the server is truly off.

The WebReboot Enterprise can monitor your host's actual power state. As such, the WebReboot Nagios Plugin can check whether your server is powered on or off and trigger an event handler if it is off.

The following is an example of how to use the plugin to check the power state of a host named "nagios-testbed". In this example, if the host is down, the power_on_host event handler will be executed to power the host back up. You can use any event handler that you would like, however; it need not be an event handler from the WebReboot Nagios Plugin.

# Sample host check configuration.
# (WebReboot Enterprise only)
define host {
    host_name       nagios-testbed
    alias           Nagios testbed
    address         192.168.50.243
    use             generic-host
    check_command   check_host_on
    event_handler   power_on_host
}

Host Event Handlers

power_on_host

Just as Nagios has no means to check the power state of a server out of the box, it has no means to power on a server. The WebReboot can, however, and using the power_on_host event handler, you can configure Nagios to power on a host whenever it is determined to be off.

The example for check_host_on shows how this interaction may work. In this case, a check command indicates that the host is down and then triggers the event handler to turn the host back on. Note that any host check, not just those from the WebReboot Nagios Plugin, can be used to trigger the event handler. This is important for WebReboot 3.0 users, since the WebReboot 3.0 cannot check a host's power state. In this case, another method can be used to check a host's state, as in the following example:

# Sample host check configuration for WebReboot 3.0 users.
define host {
    host_name       nagios-testbed
    alias           Nagios testbed
    address         192.168.50.243
    use             generic-host
    check_command   check-host-alive
    event_handler   power_on_host
}
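Nagios runs a host event handler on state changes, including SOFT states and recoveries, and can pass the standard $HOSTSTATE$ and $HOSTSTATETYPE$ macros to it. How power_on_host treats these values is not described here, so the following is only a sketch of the kind of decision an event handler typically makes; the function name is hypothetical.

```python
def should_power_on(host_state, state_type):
    """Decide whether a power-on attempt is warranted.

    host_state and state_type correspond to the Nagios $HOSTSTATE$ and
    $HOSTSTATETYPE$ macros. Acting only on a confirmed (HARD) DOWN state
    is an assumption, chosen to avoid reacting to a single failed check.
    """
    return host_state == "DOWN" and state_type == "HARD"
```

Waiting for a HARD state trades a slower reaction for fewer unnecessary power-on attempts; a handler that must react immediately could act on SOFT states as well.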

Service Checks

check_host_temperature

Many servers do not have an easy means of indicating their temperatures. Those that do often report only the CPU temperature, without regard to the ambient temperature of the server. The WebReboot Enterprise can monitor temperature via a WebReboot Advanced Server Card. This card can report the ambient temperature of a server, which is a better indicator of overall server heating issues. For example, if an air conditioning unit stops working, the server's ambient temperature will rise much more quickly than its CPU temperature will.

The following example shows how this service check may be used. In this case, Nagios will put the host in a WARNING state if the temperature exceeds 30.0 °C and will put the host in a CRITICAL state if the temperature exceeds 35.0 °C. The power_off_host event handler from the WebReboot Nagios Plugin is used to turn the power off once the temperature reaches a critical state. This is a sample action taken to prevent physical damage to the hardware. As with the other checks, you can use any event handler that you would like. Likewise, you may choose to use no event handler at all and simply allow the plugin to update the Nagios Web display with temperature values.

# Check the ambient temperature of the host.
# (WebReboot Enterprise only)
define service {
    host_name             nagios-testbed
    service_description   Temperature
    check_command         check_host_temperature!30.0!35.0
    use                   generic-service
    event_handler         power_off_host
}

The plugin's arguments, which are delimited by "!", are as follows:

ARG1: Threshold temperature in Celsius at which the host should be placed into a WARNING state.
ARG2: Threshold temperature in Celsius at which the host should be placed into a CRITICAL state.
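To make the argument handling concrete, here is a minimal Python sketch of how a check_command string splits into its arguments and how the two thresholds map onto Nagios states. The function names are hypothetical and this is not the plugin's own code; the return codes 0, 1, and 2 are the standard Nagios plugin codes, and the strict "greater than" comparison is an assumption based on the word "exceeds" above.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_check_command(check_command):
    """Split 'name!ARG1!ARG2' into the command name and its arguments."""
    name, *args = check_command.split("!")
    return name, args


def temperature_state(reading, warning, critical):
    """Map a temperature reading onto a Nagios state.

    Assumes a state is entered only when the reading exceeds the
    threshold; a reading equal to a threshold stays in the lower state.
    """
    if reading > critical:
        return CRITICAL
    if reading > warning:
        return WARNING
    return OK
```

With the example above, `parse_check_command("check_host_temperature!30.0!35.0")` yields the command name and the two threshold strings, which can then be converted with `float` and passed to `temperature_state`.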

check_host_temperature_in_fahrenheit

This plugin is identical to check_host_temperature, with the notable exception that it deals with Fahrenheit values.

# Check the ambient temperature of the host in Fahrenheit.
# (WebReboot Enterprise only)
define service {
    host_name             nagios-testbed
    service_description   Temperature
    check_command         check_host_temperature_in_fahrenheit!80.0!90.0
    use                   generic-service
    event_handler         power_off_host
}

The plugin's arguments, which are delimited by "!", are as follows:

ARG1: Threshold temperature in Fahrenheit at which the host should be placed into a WARNING state.
ARG2: Threshold temperature in Fahrenheit at which the host should be placed into a CRITICAL state.
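When moving thresholds between the Celsius and Fahrenheit checks, the standard conversion formulas apply. The helper names below are illustrative only and are not part of the plugin.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(degrees_c):
    """Convert a Celsius temperature to Fahrenheit."""
    return degrees_c * 9.0 / 5.0 + 32.0
```

Note that the two sample configurations do not use equivalent thresholds: 30.0 and 35.0 degrees Celsius correspond to 86.0 and 95.0 degrees Fahrenheit, whereas the Fahrenheit example uses 80.0 and 90.0 (roughly 26.7 and 32.2 degrees Celsius).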

Service Event Handlers

reboot_host

Oftentimes, the best corrective action that can be taken is to reboot a host. This can typically be performed via a service or operating system mechanism. If, however, the remote access service is unavailable or the operating system has crashed (e.g., kernel panic or "blue screen of death"), then the only way to reboot the server is via a hardware reset. The WebReboot can provide this hardware reset that would otherwise be unavailable to Nagios.

The following example monitors a host's SSH server. If the server becomes unavailable, there is no means to access this host remotely. As such, we will try to issue a hard reboot to see if that will fix the problem.

# Check that SSH hosts are up.
# (WebReboot Enterprise and WebReboot 3.0)
define service {
    hostgroup_name        ssh-servers
    service_description   SSH
    check_command         check_ssh
    use                   generic-service
    event_handler         reboot_host
}

power_off_host

On occasion, it is necessary to turn a host off. For example, the temperature may be too high and the host must be turned off to prevent damage to the hardware. In these cases, the host can be shut down properly at a software level. There are other cases, however, that require a hardware shutdown. For example, if a host is compromised and a rootkit is installed that prevents remote logins, it may be necessary to shut that machine off. This is a feature that Nagios cannot provide out of the box, but can when used in conjunction with the WebReboot.

The following example monitors a host's load and shuts down the host if the load exceeds a threshold. An unusually high load for a host can mean that it has been compromised. As a precautionary measure, the host can be shut down to prevent a rootkit from causing more damage, which also allows a system administrator to take it offline for forensic investigation.

# Define a service to check the load on the local machine.
# (WebReboot Enterprise and WebReboot 3.0)
define service {
    use                   generic-service
    host_name             localhost
    service_description   Current Load
    check_command         check_load!5.0!4.0!3.0!10.0!6.0!4.0
    event_handler         power_off_host
}