Graphios

Oct 15, 2014

New graphios 2.0!

What's new?

Support for multiple backends (graphite, statsd, librato) (and multiples of each backend if you want)
Support for using your service descriptions instead of custom variables
Install options (pip, setup.py, rpms)
Bugfixes
- multiple perfdata in 1 line sometimes did weird things
- quotes in your labels/metrics were sometimes in carbon
- labels with multiple '::' could mess up

Introduction

Graphios is a script to emit nagios perfdata to various upstream metrics processing and time-series (graphing) systems. It's currently compatible with [graphite], [statsd], [Librato] and [InfluxDB], with [Heka] and [RRDTool] support possibly coming soon. Graphios can emit Nagios metrics to any number of supported upstream metrics systems simultaneously.

Requirements

A working nagios / icinga / naemon server
A functional carbon or statsd daemon, and/or Librato credentials
Python 2.6 or later (but not python 3.x) (Is anyone still using 2.4? Likely very little work to make this work under 2.4 again if so. Let me know)

License

Graphios is released under the GPL v2.

Documentation

The goal of graphios is to get nagios perf data into a graphing system like graphite (carbon). Systems like these typically use a dot-delimited metric name to store each metric hierarchically, so it can be easily located later.

Graphios creates these metric names in one of two ways:

1 - By reading a pair of custom variables that you configure for services and hosts called _graphiteprefix and _graphitepostfix. Together, these custom variables enable you to control the metric name that gets sent to whatever back-end metrics system you're using. You don't have to set them both, but things will certainly be less confusing for you if you set at least one or the other.
2 - By using your service description in the format:

_graphiteprefix.hostname.service-description._graphitepostfix.perfdata

so if you didn't feel like setting your graphiteprefix and postfix, it would just use:

hostname.service-description.perfdata

If you are using option 2, that means EVERY service will be sent to graphite. You will also want to make sure your service descriptions are consistent or your backend naming will be really weird.

I think most people will use the first option, so let's work with that for a bit. What gets sent to graphite is this:

graphiteprefix.hostname.graphitepostfix.perfdata

The specific content of the perfdata section depends on each particular Nagios plugin's output.

Simple Example

A simple example is the check_host_alive command (which calls the check_icmp plugin by default). The check_icmp plugin returns the following perfstring:

rta=4.029ms;10.000;30.000;0; pl=0%;5;10;; rtmax=4.996ms;;;; rtmin=3.066ms;;;;

If we configured a host with a custom graphiteprefix variable like this:

<pre>
define host {
    host_name        myhost
    check_command    check_host_alive
    _graphiteprefix  ops.nagios01.pingto
}
</pre>

Graphios will construct and emit the following metric names to the upstream metric system:

ops.nagios01.pingto.myhost.rta 4.029 nagios_timet
ops.nagios01.pingto.myhost.pl 0 nagios_timet
ops.nagios01.pingto.myhost.rtmax 4.996 nagios_timet
ops.nagios01.pingto.myhost.rtmin 3.066 nagios_timet

Where nagios_timet is a Unix epoch timestamp from when the plugin results were received by Nagios core. Your prefix is, of course, entirely up to you. In our example, our prefix refers to the team that created the metric (Ops), because our upstream metrics system is used by many different teams. After the team name, we've identified the specific Nagios host that took this measurement, because we actually have several Nagios boxes, and finally, 'pingto' is the name of this specific metric: the ping time from nagios01 to myhost.
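To make the mapping from perfdata to metric lines concrete, here is a minimal sketch in Python. It is not graphios's own parser: the helper name build_metrics, the regular expression, and the example timestamp are all assumptions made for illustration, and it only handles simple unquoted labels.

```python
import re

# Hypothetical helper, not graphios's real parser: capture the label and the
# leading numeric value of one perfdata token, ignoring units and thresholds.
PERF_RE = re.compile(r"([^\s=]+)=([-+]?\d*\.?\d+)")

def build_metrics(prefix, hostname, postfix, perfdata, timestamp):
    """Return one 'name value timestamp' line per perfdata token."""
    lines = []
    for token in perfdata.split():
        match = PERF_RE.match(token)
        if match is None:
            continue  # token carries no numeric value, so skip it
        label, value = match.groups()
        # Join only the non-empty parts so an unset prefix or postfix
        # does not leave a stray dot in the metric name.
        parts = [p for p in (prefix, hostname, postfix, label) if p]
        lines.append("%s %s %s" % (".".join(parts), value, timestamp))
    return lines

perf = "rta=4.029ms;10.000;30.000;0; pl=0%;5;10;; rtmax=4.996ms;;;; rtmin=3.066ms;;;;"
# 1413331200 is just an example epoch timestamp standing in for nagios_timet.
metrics = build_metrics("ops.nagios01.pingto", "myhost", "", perf, 1413331200)
for line in metrics:
    print(line)
```

Note that the unit suffix (ms, %) is dropped and only the number is kept, which is why consistent units across checks matter.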

Another example

Let's take a look at the check_load plugin, which returns the following perfdata:

load1=8.41;20;22;; load5=6.06;18;20;; load15=5.58;16;18

Our service is defined like this:

<pre>
define service {
    service_description  Load
    host_name            myhost
    _graphiteprefix      datacenter01.webservers
    _graphitepostfix     nrdp.load
}
</pre>

With this configuration, graphios generates the following metric names:

datacenter01.webservers.myhost.nrdp.load.load1 8.41 nagios_timet
datacenter01.webservers.myhost.nrdp.load.load5 6.06 nagios_timet
datacenter01.webservers.myhost.nrdp.load.load15 5.58 nagios_timet

As you can probably guess, our custom prefix in this example identifies the specific data center, and server-type from which these metrics originated, while our postfix refers to the check_nrdp plugin, which is the means by which we collected the data, followed finally by the metric-type.

You should think carefully about how you name your metrics, because later on, these names will enable you to easily combine metrics (like load1) across various sources (like all webservers).

Using metric_base_path to add a universal prefix

In an environment where multiple things are feeding metrics into your backend service, it can be handy to differentiate by source. Normally, you would need to prepend the graphiteprefix to all services and hosts, but in some cases, this isn't possible or feasible.

When you want everything to be prepended with the same string, use the metric_base_path setting:

metric_base_path	= mycorp.nagios

Note that quotes will be preserved, so don't wrap the value in quotes. Also, _graphiteprefix and _graphitepostfix will be applied in addition to this string, so if you are already adding mycorp.nagios to your prefix, you will end up with mycorp.nagios.mycorp.nagios.metricname.
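The doubling effect is easy to see in a small Python sketch. The function full_name is hypothetical and only mimics the ordering described above (base path, then prefix, hostname, postfix and perfdata label); it is not taken from the graphios source.

```python
def full_name(metric_base_path, prefix, hostname, postfix, label):
    """Hypothetical illustration: the base path is prepended to everything."""
    parts = [metric_base_path, prefix, hostname, postfix, label]
    # Empty parts are skipped so unset values do not produce empty segments.
    return ".".join(p for p in parts if p)

# Prefix that does not repeat the base path.
ok = full_name("mycorp.nagios", "pingto", "myhost", "", "rta")
# Prefix that already contains the base path, so it shows up twice.
doubled = full_name("mycorp.nagios", "mycorp.nagios.pingto", "myhost", "", "rta")

print(ok)
print(doubled)
```

If you switch to metric_base_path, it is worth removing the shared part from your existing _graphiteprefix values at the same time.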

A few words on Naming things for Librato

The default configuration that works for Graphite also does what you'd expect for Librato, so if you're just getting started, and you want to check out Librato, don't worry about it, ignore this section and forge ahead.

But if you're a power user, you should be aware that the Librato backend is actually generating a different metric name than the other plugins. Librato is a very metrics-centric platform. Metrics are the first-class entity, and sources (like hosts) are actually a separate dimension in their system. This is very cool when you're monitoring ephemeral things that aren't hosts, like threads or worker processes, but it slightly complicates things here.

So, for example, where the Graphite plugin generates a name like this (from the example above):

datacenter01.webservers.myhost.nrdp.load.load1

The Librato plugin will generate a name that omits the hostname:

datacenter01.webservers.nrdp.load.load1

And then it will automatically send the hostname as the source dimension when it emits the metric to Librato. For 99% of everyone, this is exactly what you want. But if you're a 1%'er, you can influence this behavior by modifying the "namevals" and "sourcevals" lists in the librato section of the graphios.cfg.
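As a rough mental model of the name/source split, consider the Python sketch below. The function librato_measurement and the field keys ("prefix", "hostname", "postfix", "label") are invented for this illustration; the actual entries accepted by namevals and sourcevals are defined in graphios.cfg and may be named differently.

```python
def librato_measurement(fields, namevals, sourcevals):
    """Hypothetical sketch: build the metric name and the source separately."""
    name = ".".join(fields[key] for key in namevals if fields.get(key))
    source = ".".join(fields[key] for key in sourcevals if fields.get(key))
    return {"name": name, "source": source}

# Illustrative field keys only; not the names used in graphios.cfg.
fields = {
    "prefix": "datacenter01.webservers",
    "hostname": "myhost",
    "postfix": "nrdp.load",
    "label": "load1",
}

# Default-style split: hostname is left out of the name and used as the source.
default = librato_measurement(fields, ["prefix", "postfix", "label"], ["hostname"])
print(default["name"])
print(default["source"])
```

Moving an entry from one list to the other changes whether it ends up in the metric name or in the source dimension.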

Automatic names

Version 2.0: Graphios now supports automatic names, because custom variables are hard. :)

This is an all-or-nothing setting, meaning that if you turn this on, every service will be sent to your backend (instead of just the ones with the prefix and postfix set up). This will work fine, so long as you have very consistent service descriptions.

To turn this on, modify the graphios.cfg and change:

use_service_desc = False

to:

use_service_desc = True

You can still use the graphite prefix and postfix variables but you don't have to.
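The effect of the switch can be sketched in Python as follows. The function metric_name is a hypothetical stand-in that follows the two name formats given earlier in this document; it uses the service description as-is and does not attempt to reproduce any cleanup graphios may apply to it.

```python
def metric_name(hostname, service_desc, label,
                prefix="", postfix="", use_service_desc=False):
    """Hypothetical sketch of how use_service_desc changes the metric name."""
    if use_service_desc:
        # Automatic names: every service is sent, description included.
        parts = [prefix, hostname, service_desc, postfix, label]
    elif prefix or postfix:
        # Custom variables only: the description is not part of the name.
        parts = [prefix, hostname, postfix, label]
    else:
        return None  # no prefix/postfix and automatic names off: not sent
    return ".".join(p for p in parts if p)

# Off, and no custom variables set: the service is skipped.
print(metric_name("myhost", "Load", "load1"))
# On, and no custom variables set: hostname.service-description.perfdata
print(metric_name("myhost", "Load", "load1", use_service_desc=True))
# On, with custom variables: they still wrap the name.
print(metric_name("myhost", "Load", "load1",
                  prefix="datacenter01.webservers", postfix="nrdp.load",
                  use_service_desc=True))
```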

Big Fat Warning

Graphios assumes your checks are using the same unit of measurement. Most plugins support this, some do not. check_icmp always reports in ms, for example.

Installation

This is recommended for intermediate+ Nagios administrators. If you are just learning Nagios this might be a difficult pill to swallow depending on your experience level.

Hundreds of people have emailed me their success stories on getting graphios working. I have been using this in production on a medium size nagios installation for a couple years.

There are now a few ways to get graphios installed.

1 - Use pypi

    pip install graphios

NOTE: This will attempt to find your nagios.cfg and add the configuration
steps 1 and 2 for you (Don't worry we back up the file before touching it)

NOTE2: If you get the error:
Could not find a version that satisfies the requirement graphios
This is because graphios is still in the beta category. I will remove
this in a few weeks, so until then you need to:

    pip install --pre graphios

2 - Clone it yourself

    git clone https://github.com/shawn-sterling/graphios.git
    cd graphios

Then do one of the following three things (depending on what you like best):

1 - Python setup

    python setup.py install

2 - Create + Install RPM

    python setup.py bdist_rpm
    yum localinstall bdist/graphios-$version.rpm

3 - Copy the files where you want them to be

    cp graphios*.py /my/dir
    cp graphios.cfg /my/dir

Configuration

Setting this up on the nagios front is very much like pnp4nagios with npcd. (You do not need to have any pnp4nagios experience at all.) If you are already running pnp4nagios, check out my pnp4nagios notes (below).

Steps:

(1) graphios.cfg

The default location for graphios.cfg is /etc/graphios/graphios.cfg. Graphios also checks the directory that graphios.py is in.

Your graphios.cfg can live anywhere you want, but if it's not in one of the above locations you will need to modify your init script.
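The lookup described above can be pictured with a short Python sketch. The function find_config is hypothetical, and the order in which the two locations are tried is an assumption; only the two locations themselves come from this document.

```python
import os

def find_config(script_path, default="/etc/graphios/graphios.cfg"):
    """Hypothetical sketch: return the first graphios.cfg that exists, or None."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    # Assumed order: the default location first, then next to graphios.py.
    candidates = [default, os.path.join(script_dir, "graphios.cfg")]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

For example, find_config("/my/dir/graphios.py") would return /my/dir/graphios.cfg when no file exists at the default location and one exists beside the script.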
