Naemon - Graphing Performance Info With MRTG

Introduction

The naemonstats utility allows you to graph various Naemon performance statistics over time using MRTG. This is important because it can help you:

Ensure Naemon is operating efficiently
Locate problem areas in the monitoring process
Observe the performance impacts of changes in your Naemon configuration

MRTG is a fairly dated tool, so you may prefer to graph this data with PNP or Graphite instead.

Sample MRTG Configuration

Sample MRTG configuration file snippets for graphing various Naemon performance statistics can be found in the mrtg.cfg file located in the sample-config/ subdirectory of the Naemon distribution. You can create graphs of other performance information if you’d like - the samples just provide you with a good starting point.

Once you copy these sample entries into your MRTG config file (/etc/mrtg/mrtg.cfg) you should have some new graphs the next time MRTG runs.
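
Before adding the entries, it can help to check what MRTG would actually receive from naemonstats. The sketch below is an illustration, not part of the Naemon distribution. It assumes that naemonstats accepts --mrtg and --data=... options, that AVGACTSVCLAT and AVGACTSVCEXT are valid variable names (check the utility's help output for your version), and that the output follows the four-line layout MRTG expects from an external command: two values, an uptime string, and a target name. The names MrtgSample, parse_mrtg_output and read_stats are hypothetical helpers.

```python
import subprocess
from typing import NamedTuple


class MrtgSample(NamedTuple):
    first: float   # value MRTG plots as its first series
    second: float  # value MRTG plots as its second series
    uptime: str    # free-form uptime string
    name: str      # free-form target name


def parse_mrtg_output(text: str) -> MrtgSample:
    """Parse the four-line output MRTG expects from an external command."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError(f"expected 4 lines, got {len(lines)}")
    return MrtgSample(
        float(lines[0]),
        float(lines[1]),
        lines[2].strip(),
        lines[3].strip(),
    )


def read_stats(binary: str = "naemonstats",
               variables: str = "AVGACTSVCLAT,AVGACTSVCEXT") -> MrtgSample:
    # Assumption: the flags and variable names below match your
    # naemonstats build; adjust them after checking its help output.
    result = subprocess.run(
        [binary, "--mrtg", f"--data={variables}"],
        capture_output=True, text=True, check=True,
    )
    return parse_mrtg_output(result.stdout)
```

For instance, parse_mrtg_output("12\n345\n1d 2h\nNaemon\n") returns a sample whose first and second fields are 12.0 and 345.0. If read_stats() raises an error on your system, MRTG is likely to fail on the same entry.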

Example Graphs

The entries below describe what a few of the sample MRTG graphs mean and what they can be used for:

Active Host Checks - This graph shows how many active host checks (regularly scheduled and on-demand) have occurred over time. Useful for understanding: Host checks, Predictive host dependency checks, Cached checks
Active Service Checks - This graph shows how many active service checks (regularly scheduled and on-demand) have occurred over time. Useful for understanding: Service checks, Predictive service dependency checks, Cached checks
Cached Host and Service Checks - This graph shows how many cached host and service checks have occurred over time. Useful for understanding: Cached checks, Predictive host and service dependency checks
Passive Host and Service Checks - This graph shows how many passive host and service checks have occurred over time. Useful for understanding: Passive checks
Hosts/Services Actively Checked - This graph shows how many (of the total number of) hosts and services were last checked actively over time. Useful for understanding: Active checks
Hosts/Services Passively Checked - This graph shows how many (of the total number of) hosts and services were last checked passively over time. Useful for understanding: Passive checks
Average Service Check Latency and Execution Time - This graph shows average service check latency and execution times over time. Useful for understanding: Service checks, Performance tuning. Consistently high latencies can indicate that one or more of the following variables need tweaking: max_concurrent_checks, check_result_reaper_frequency, max_check_result_reaper_time
Average Service State Change - This graph shows the average percent state change (a measure of volatility) for services over time, broken down by services that were last checked either actively or passively. Useful for understanding: Flap detection
Average Host Check Latency and Execution Time - This graph shows average host check latency and execution times over time. Useful for understanding: Host checks, Performance tuning. Consistently high latencies can indicate that one or more of the following variables need tweaking: max_concurrent_checks, check_result_reaper_frequency, max_check_result_reaper_time
Average Host State Change - This graph shows the average percent state change (a measure of volatility) for hosts over time, broken down by hosts that were last checked either actively or passively. Useful for understanding: Flap detection
External Commands - This graph shows how many external commands have been processed by the Naemon daemon over time. Unless you're processing a large number of external commands (as is the case with distributed monitoring setups), this graph may appear mostly empty. Monitoring external commands can be useful for understanding the impacts of: Passive checks, Distributed monitoring, Redundant/failover monitoring