Sensu Presentation from CentOS Dojo Phoenix

I was invited to speak at CentOS Dojo in Phoenix, AZ recently (May 2013) about the Sensu monitoring framework. I wanted to do something a little bit different than past presentations and try to show some use cases that fit what Sensu can do rather than just do a basic introduction to Sensu.

Check out the presentation below. The first half is an overview of Sensu (most of the audience at CentOS Dojo had not heard of Sensu yet) and the second half introduces (to whet the appetite!) some cool uses of Sensu, such as automated cleanup and decommissioning of EC2 nodes, routing checks to different teams in Pagerduty, and embedding “playbook” documentation in checks to help reduce MTTR.

Many thanks to @jeremy_carroll for his feedback and assistance.

https://speakerdeck.com/joemiller/practical-examples-with-sensu-monitoring-framework

Speeding up Vagrant with parallel provisioning

Vagrant is an amazing tool. It's quite substantially changed my workflows in a variety of areas. It's a particularly interesting tool for building packages or running tests across multiple OS's or distributions from a single set of scripts.

A recent example of the usefulness of Vagrant is the new packaging and testing work undertaken in the Sensu project. The project set out to build a new set of native OS packages with the goal of making Sensu easy to deploy on a variety of platforms, without the friction that sometimes accompanies Ruby apps. As part of the packaging effort we needed a simple mechanism to build native packages on the relevant platforms, i.e. .deb's on Debian and .rpm's on Red Hat/CentOS.

We ended up using a combination of Vagrant and some homegrown tools such as Bunchr.

You can see the work in these 2 repos: sensu-build and sensu-tests.

Both codebases contain a para-vagrant.sh script that is used in place of the normal vagrant up to kick off parallel provisioning tasks. sensu-tests is the more interesting example as it runs a set of rspec tests against Sensu across 14 VM's and this will likely grow to encompass other OS's in the future. The tests are executed as Vagrant provisioners (a combo of Chef and shell to call rspec).

The simplest way to use multi-VM's with Vagrant is the typical vagrant up. However, this will boot and run the provisioning tasks sequentially on each VM. With 14 VM's to test, this process can take a long time.

Can we speed this up? Yes. In fact we were able to reduce the time taken to run the sensu-build tasks from about 33 minutes to 12 minutes, and the sensu-tests run from almost 90 minutes to 15!
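
To give a feel for the general idea, here is a minimal sketch in Python. It is not the para-vagrant.sh script from the repos, just one way to get the same effect under a few assumptions: VMs are booted one at a time with `vagrant up NAME --no-provision`, and then `vagrant provision NAME` is run for all of them concurrently. The function names, the `runner` parameter and the worker count are all made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run(cmd):
    """Run one command and return its exit status."""
    return subprocess.call(cmd)


def parallel_provision(vm_names, runner=run, max_workers=4):
    """Boot VMs sequentially, then provision them concurrently.

    Returns a dict mapping each VM name to the exit status of its
    provisioning run. `runner` is injectable so the flow can be tried
    without Vagrant installed.
    """
    # Assumption: booting is kept sequential and only the (slow)
    # provisioning step is fanned out.
    for name in vm_names:
        status = runner(["vagrant", "up", name, "--no-provision"])
        if status != 0:
            raise RuntimeError("failed to boot %s" % name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(runner, ["vagrant", "provision", name])
            for name in vm_names
        }
        # Wait for every provisioning run and collect its exit status.
        return {name: future.result() for name, future in futures.items()}
```

Calling `parallel_provision(["centos6", "ubuntu1204"])` (hypothetical VM names from a Vagrantfile) would return something like a name-to-exit-status mapping, which makes it easy to report which platforms failed.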

Here was the first attempt at a parallelization script:

Continue Reading

Sensu handler sets

In past articles we have covered some of the basics of Sensu handlers. A nice feature we haven't touched on yet is handler "sets". Handler sets were added around v0.9.2 and can save quite a bit of time when you need to change which handlers your checks use.

For example, consider you have a standard set of handlers that you assign to most of your checks — pagerduty, irc, campfire. Now, suppose you want to add the GELF (graylog2) handler to all of your monitors as well. If each of your checks is defined as such:

{
  "checks": {
    "all_disk_check": {
      "notification": "Diskspace Too Low",
      "command": "PATH=$PATH:/usr/lib64/nagios/plugins:/usr/lib/nagios/plugins  check_disk -w 25% -c 15% /",
      "subscribers": [ "all" ],
      "interval": 60,
      "handlers": ["pagerduty", "irc", "campfire"]
    }
  }
}

… you would need to modify every check's "handlers" attribute to include your new "gelf" handler. If you have a lot of checks this can be a little bit of a burden.
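
The idea behind a set can be sketched in a few lines of Python. This is only an illustration of how a set name could resolve to its member handlers, not Sensu's own code. It assumes a set is a handler definition with "type": "set" and a "handlers" list; the "default" name and the `expand_handlers` helper are hypothetical.

```python
# Hypothetical handler definitions: "default" is a set that groups the rest.
HANDLERS = {
    "default": {
        "type": "set",
        "handlers": ["pagerduty", "irc", "campfire", "gelf"],
    },
}

# Each check names only the set, so adding "gelf" above touches one place.
CHECKS = {
    "all_disk_check": {"handlers": ["default"]},
}


def expand_handlers(names, handler_defs):
    """Resolve handler names, replacing each set with its members.

    Only one level of sets is expanded; duplicates are dropped while
    keeping first-seen order.
    """
    resolved = []
    for name in names:
        definition = handler_defs.get(name, {})
        if definition.get("type") == "set":
            members = definition.get("handlers", [])
        else:
            members = [name]
        for member in members:
            if member not in resolved:
                resolved.append(member)
    return resolved
```

With this shape, every check keeps pointing at "default" and the list of real handlers lives in exactly one definition.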

AMQPcat, a netcat-like tool for messaging fun

If you have read @ripienaar's excellent series of articles on common messaging patterns you probably noticed a handy CLI tool for working with STOMP queues called stompcat. I looked around for something similar for AMQP brokers but couldn't find anything quite the same. There is amqp-utils but I had some issues with these and the tools didn't work quite like I was hoping. So I wrote amqpcat with the idea of providing a similar tool to stompcat.

Available on github and rubygems.org: https://github.com/joemiller/amqpcat

Sensu and Graphite

Updated October 16, 2012: Removed "passive": "true" from the graphite amqp handler definition in Sensu. This setting is too brittle: Sensu will fail to start unless graphite has started first and created the exchange on the RabbitMQ server. By matching the "durable": "true" setting that graphite expects, we can start the two services in either order.

Updated October 16, 2012: Updated the graphite amqp handler definition to use the new “mutators” in Sensu 0.9.7. See the Sensu project for details on backwards-incompatible changes as Sensu moves towards a 1.0.0 release.

It's been pretty exciting to see the number of folks getting involved with Sensu lately, judging by the increased activity on the #sensu channel on Freenode. One of the most common questions is how to integrate Sensu and Graphite. In this article I'll cover two approaches for pushing metrics from Sensu to Graphite.

Remember: think of Sensu as the "monitoring router". While we are going to show how to push metrics to Graphite, it is just as easy to push metrics to any other system – Librato, Cube, OpenTSDB, etc. In fact, it would not be difficult at all to push metrics to multiple graphing backends in a fanout manner.
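
To make the "router" idea a bit more concrete, here is a small Python sketch. It is not either of the two approaches from the article; it only shows Graphite's plaintext line format ("path value timestamp", normally accepted by Carbon on TCP port 2003) and a trivial fanout of the same metric lines to several backends. The function names and the notion of a "sender" callable are assumptions made for the example.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_to_graphite(lines, host, port=2003):
    """Send already-formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(payload)
    finally:
        sock.close()


def fanout(lines, senders):
    """Hand the same metric lines to every backend sender in turn."""
    for send in senders:
        send(lines)
```

A sender for Graphite could be as simple as `lambda lines: send_to_graphite(lines, "graphite.example.com")` (hypothetical hostname); senders for other backends would translate the same lines into whatever format those systems expect.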

Re-use Nagios plugins in Sensu for quick profit

In my previous article I mentioned a key strength of Sensu is the ability to re-use existing Nagios plugins. This is a powerful feature of Sensu. Nagios has been around for at least 1000 years according to most recent archaeological discoveries, which means a vast amount of human effort (and capital) has gone into creating Nagios plugins. Being able to leverage this prior effort is a huge win. In this article I’ll demonstrate creating a Sensu check with the check_http Nagios plugin.
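
What makes this re-use possible is the plugin convention itself: a plugin prints a line of output and signals its result through the exit status (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). As a rough Python sketch of that convention, and not the check_http plugin or anything shipped with Sensu, here is a tiny disk check. The function names and the default thresholds are made up for illustration.

```python
import shutil

# Exit statuses used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn=75.0, crit=85.0):
    """Map a disk usage percentage to an exit status (higher is worse)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=75.0, crit=85.0):
    """Return (exit_status, output_line) for disk usage at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents a measurement is reported as UNKNOWN.
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    used_percent = 100.0 * usage.used / usage.total
    status = classify(used_percent, warn, crit)
    return status, "DISK %s - %.1f%% used on %s" % (LABELS[status], used_percent, path)
```

To use it as an actual check script you would print the output line and exit with the status, for example `status, line = check_disk(); print(line); sys.exit(status)`.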

Getting started with the Sensu monitoring framework

(5/15/2012) NOTE: This guide has been superseded by the official 'Install Guide' doc on the Sensu wiki. The new process utilizes the simpler Omnibus-style Sensu packages and covers installation on Debian/Ubuntu platforms as well. Please use that guide instead of the instructions below.

I’m excited about Sensu, a new open source monitoring framework, and I’d like to help others get started with it as well. So, after observing the frequent questions from new visitors to #sensu on Freenode I thought perhaps the best way to do that is to write a blog article to help folks get started. If you still have questions after reading this, feel free to come by #sensu on Freenode.

In this article I will provide a brief overview of Sensu with some background, walk through a client and server install, and then I will show you how to add a check and a handler. This should lay the groundwork for future articles with more examples on how to get the most value out of Sensu in your infrastructure.

Before we start, I owe a huge thanks to @jeremy_carroll for the many hours of work he put into building RPM’s for Sensu. His work on packaging will undoubtedly save many folks quite a bit of time.

What is Sensu?

Sensu is the creation of @portertech and his colleagues at sonian.com. They have graciously open-sourced the project and made it available to all of us searching for a modern monitoring platform (or anyone searching for an alternative to Nagios.)

Sensu is often described as the “monitoring router”. Put another way, Sensu connects the output from “check” scripts run across many nodes with “handler” scripts run on Sensu servers. Messages are passed via RabbitMQ. Checks are used, for example, to determine if Apache is up or down. Checks can also be used to collect metrics such as MySQL statistics. The output of checks is routed to one or more handlers. Handlers determine what to do with the results of checks. Handlers currently exist for sending alerts to Pagerduty, IRC, Twitter, etc. Handlers can also feed metrics into Graphite, Librato, etc. Writing checks and handlers is quite simple and can be done in any language.

Key details:

  • Ruby 1.8.7+ (EventMachine, Sinatra, AMQP), RabbitMQ, Redis
  • Excellent test coverage with continuous integration (travis-ci)
  • Messaging oriented architecture. Messages are JSON objects.
  • Ability to re-use existing Nagios plugins
  • Plugins and handlers (think notifications) can be written in any language
  • Supports sending metrics into various backends (Graphite, Librato, etc)
  • Designed with modern configuration management systems such as Chef or Puppet in mind
  • Designed for cloud environments
  • Lightweight, less than 1200 lines of code
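
Since messages are JSON objects and handlers can be written in any language, a handler can be very small. Here is a minimal Python sketch of one, assuming the event arrives as a JSON document on standard input with "client" and "check" sections; the exact field names used below ("name", "status", "output") are assumptions for the example and the helper names are hypothetical.

```python
import json
import sys

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def summarize(event):
    """Build a one-line, human-readable summary of an event dict."""
    client = event.get("client", {}).get("name", "unknown-client")
    check = event.get("check", {})
    status = STATUS_NAMES.get(check.get("status"), "UNKNOWN")
    output = (check.get("output") or "").strip()
    return "%s %s/%s: %s" % (status, client, check.get("name", "unknown-check"), output)


def handle(stream=sys.stdin):
    """Read one JSON event from `stream` and return its summary line."""
    event = json.load(stream)
    return summarize(event)
```

A real handler would do something with that summary line, such as posting it to IRC or opening an incident, instead of just returning it.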

Correlating Puppet changes to events in your infrastructure using graphite

Sometimes it is pretty obvious when Puppet changes something in your infrastructure and bad things happen in a big dramatic way. Other times it’s not so obvious. It can be invaluable to be able to correlate changes made by Puppet to other events happening in your infrastructure.

For example, in this diagram we have plotted the load average from a group of servers. Blue vertical lines mark points in time when puppet modified a resource on a host in the group. We can see that immediately following a puppet change the load spiked on one of the servers.

Code available on github:

List of statsd server implementations

Statsd is a simple client/server mechanism from the folks at Etsy that allows operations and development teams to easily feed a variety of metrics into a Graphite system. For more info on statsd read the seminal blog article on Statsd “Measure Anything, Measure Everything”.

As would be expected there are statsd clients in many languages. But, there are also many implementations of the statsd server. This is nice because each organization can pick the one that best fits them. For example, a python shop might prefer to deploy a python based statsd instead of Etsy’s original node.js implementation. Also, there are some statsd implementations that diverge from the original design and provide additional features.
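
Part of the reason there are so many clients and servers is that the wire format is tiny: a metric is a short text datagram of the form "name:value|type", usually sent over UDP (Etsy's statsd listens on port 8125 by default). A minimal Python sketch of a client, with function names invented for the example:

```python
import socket


def statsd_packet(name, value, metric_type="c", sample_rate=None):
    """Format one metric as 'name:value|type', with an optional '|@rate'.

    Common types are 'c' (counter) and 'ms' (timer).
    """
    packet = "%s:%s|%s" % (name, value, metric_type)
    if sample_rate is not None and sample_rate < 1:
        packet += "|@%s" % sample_rate
    return packet


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget one packet to a statsd server over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet.encode("ascii"), (host, port))
    finally:
        sock.close()
```

For example, `send(statsd_packet("logins", 1))` increments a counter and `send(statsd_packet("render_time", 320, "ms"))` records a timing. Implementations that diverge from the original protocol may of course expect something different.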

I could not find a single resource that listed all of the different implementations, so I figured I would try to start one here.

  • Etsy’s statsd: node.js. The Original
  • petef-statsd: ruby. Supports AMQP.
  • statsd_rb: ruby.
  • quasor/statsd: ruby. can send data to graphite or mongoDB
  • py-statsd: python (including python client code).
  • zbx-statsd: python, based on py-statsd. Sends data to Zabbix instead of graphite.
  • statsd.scala: scala. Sends data to Ganglia instead of Graphite. Different messaging protocol, uses JSON.
  • txStatsD: python + twisted, from the folks @ Canonical
  • statsd-librato: node.js. Fork of etsy’s statsd for sending data to Librato instead of graphite from the folks @ Engine Yard.
  • estatsd: erlang. From the folks @ Opscode
  • metricsd: scala. Should be drop-in compatible with etsy’s statsd, but with support for additional metric types (eg: meter, gauge, histogram)
  • statsd-c: C. compatible with original etsy statsd
  • statsd (librato): node.js.  Librato’s officially maintained fork of statsd based on the changes from Engine Yard. Supports multiple graphing services including Librato Metrics
  • bucky: python. A unique spin on statsd that supports collecting data from statsd clients, collectd, and metricsd, with output to graphite. The ability to translate collectd plugin names to be more graphite-friendly is very compelling.
  • clj-statsd-svr: Clojure.
  • statsite: C. Statsite is designed to be both highly performant, and very flexible, using libev to be extremely fast.

Deprecated:

  • statsite: python. Replaced by a new implementation in C, see above.

Please leave a comment if you have an implementation that should be listed here. Feedback on any of the above implementations would be helpful too.

Network Link Conditioner in Xcode 4.1, Lion

Previously, I wrote a post about using the ‘dummynet’ functionality in Mac OS X’s ipfw(8) firewall to simulate a variety of network conditions, such as limited bandwidth, packet loss, and latency (delay). This is a great feature for testing software, but it can be a little tough to use unless you’re comfortable at the command line or, even better, have Unix scripting skills, since multiple commands are required to create even simple scenarios.

Then, today I noticed that Apple now includes a new prefPane in Xcode 4.1 and Lion called “Network Link Conditioner” that simplifies all of this, and even includes a few profiles to get you started (eg: “Wifi, Average case”, “3G, Lossy Network”.) Pretty cool feature. Especially useful for iOS developers. Screenshot below.

  • Install: find and run /Developer/Applications/Utilities/Network Link Conditioner/Network Link Conditioner.prefPane