Yes Mum, I'll Behave: Beginning Behaviour Driven Infrastructure


So I like to think I know a bit about enterprise monitoring and configuration management. I’ve done a lot of it over the years across multiple platforms and using a bunch of tools - both proprietary and open source. I’ve even written two books about the open source tools, Nagios and Puppet. But all this time I’ve been doing it wrong. Really badly wrong.

The typical enterprise monitoring and configuration management set-up is generally something like this: central server(s) manage and monitor a number of services on local and/or remote hosts. Digging down, for a web server these checks might be something like:

  • Is the Apache package installed and the appropriate version?
  • Is the Apache service running?
  • Can I connect to the HTTP port and is HTML returned?

Multiply this by a few hundred iterations of hosts and types of services and you’re probably looking at your typical Nagios, Puppet, Cfengine, Hyperic, Tivoli or Patrol set-up. Add logging, alerts, graphs and reporting and this is probably pretty close to the environment that most system administrators manage and monitor every day. All the bases covered, appropriate alerts when things go down, reporting for your management, etc, etc.

So that’s all good right and we don’t need to do anything more? Nope, not quite. All this monitoring misses something critical - we’re not actually monitoring that the service does what it should. Yes, it matters whether Apache is installed, the Apache service is running, and you can connect to HTTP but does this actually prove anything about the availability of the service we’re managing and providing for our customers? No again. You can connect to the port, have the service running and still not be delivering the right content or providing the appropriate functionality to the customer. And ultimately that’s what our jobs are all about - delivering service to the customer. Whether internal (“the business”) or an external customer, they don’t care about the infrastructure. Nor the technology, its configuration or anything else about the widgets that deliver the services they use. They just want “technology” to be:

  1. Available,
  2. Functional, and
  3. Cost-effective.

To deliver (and measure) the first two items on that list we’re going to need more than just a check that says the Apache server is up. We need to demonstrate that the service delivered by that infrastructure was available to our customers AND functioning as intended. If it isn’t functioning as intended, all the availability in the world is meaningless because the customer isn’t getting what they want.

(Needless to say most enterprise monitoring measures of “availability” are bogus. Using an ICMP ping of a host, uptime or checking a process as a measure of availability merely demonstrates that the asset is up. It doesn’t demonstrate that the asset is performing the function it should hence doesn’t actually measure “availability”.)

All is not lost though, we have the technology, we can rebuild your monitoring environment: better, stronger and more relevant. How? By stealing someone else’s idea. You see developers face the same challenge of delivering appropriate functionality. In their case an application may compile and run but produce incorrect output or worse no output at all. Like our monitoring, this leaves our developer short on knowing whether they are delivering functionality to the customer. So to ensure that their applications do what they have promised, developers test them.

There are lots of different kinds of testing: functional tests to confirm things work, performance testing, user acceptance tests to ensure user experience is suitable. But one kind of testing has become increasing important: behavioural testing. Behavioural testing checks that each function, method or procedure not only works but behaves in the intended way. Developers call this methodology Behaviour Driven Development or Test Driven Development (BDD and TDD for short). In a BDD/TDD environment each component of your code is tested to ensure it is behaving correctly. The basic element of this testing is called a unit test. In BDD, unit tests are developed for each function to determine whether it is fit for use. Let’s look a simple example of a function, one that adds numbers, and a unit test to confirm its behaviour is correct. We’ll start by articulating our function (in pseudo code).

def addition(val1,val2)
  print "Total =" val1 + val2

So if our function compiles, executes and returns a result does that mean it works? No, because we can’t guarantee it returns the right result. To overcome this gap in our knowledge, we devise a simple unit test (again in pseudo code).

test addition
  val1 = 4
  val2 = 6
  total = 10

  result = addition(val1,val2)
  if result != total then print "Function addition failed - incorrect total"

In our test we first set the input values and what the resulting output should be (in the object-orientated world these are called “mock” objects and are designed to simulate real objects in a controlled way). We then run the function we’d like to test and check that the returned result matches the mock output. If the result doesn’t match then we return an error message and we know we need to fix the function. This combination of functional tests and behavioural testing means that not only do we ensure our applications runs but that when it does run it is behaving correctly.

So can we apply Behaviour Driven Development to our infrastructure to test that it is behaving correctly? Enter Behaviour Driven Infrastructure. Behavioural Driven Infrastructure or BDI applies the principles of Behavioural Driven Development to the management of infrastructure. And boy is it cool. Let’s jump right in and see what a BDI check might look like. Remember our Apache checks? Let’s design some simple behavioural checks to supplement these, checks that step up from monitoring our infrastructure into monitoring the behaviour of our service:

  • The site contains some static value or content
  • The site contains some dynamic content that can be validated, for example data drawn from a database
  • When I click or follow links I get the right pages returned
  • When I fill in a form the values are validated
  • When I press a button the form is submitted
  • When I select a field or drop-down the right values are populated

Notice the key difference between the checks we defined earlier and these checks? These checks involve the behaviour of the service rather than its binary status. Instead of the site being “up” we’re testing that the site returns the right content or in other words that the site behaves correctly. We can see that determining tests for a website is relatively easy but what about other types of infrastructure and services? You can develop similar tests for a wide variety of infrastructure:

  • SSH - check that a particular user can login, an inappropriate user fails and is logged or alerted
  • SMTP - check that the daemon receives an email and delivers it, check it rejects mail it should reject, check authentication works
  • IMAP - Check you can receive email from a mailbox, check authentication works
  • MySQL/database/LDAP/directories - check you can query a record and that the record returned is correct
  • Load balancer - check connections are switched between hosts
  • DNS - check output of DNS queries is correct
  • Backups - backup and restore a file
  • Nagios/Enterprise Monitoring - Check tests pass, fail, escalate, send notifications
  • Samba/NFS - create, change, delete a file
  • Sudo - check you can run a sudo command and check inappropriate sudo commands fail and log

Notice what we’re trying to do? We’re testing that the service does what the customer expects it to do. This not only proves the service is behaving the way we want it to but also demonstrates that the service is available.

So all of these tests sound easy when written down like this but how do we implement them? We’re going to articulate our BDI tests in plain English using a tool called Cucumber and also introduce you to a spin-off tool called Cucumber-Nagios (which I talked about in a previous post).

Cucumber is a behavioural testing framework used heavily in the Ruby community (and in the Java, .NET, Flex communities too) that is simple and easy to learn

  • even for non-developers. Cucumber-Nagios takes this a little further to combine Cucumber with built-in testing frameworks (web using Webrat and SSH using Net::SSH for example) and outputs test results as Nagios check data. Perfect for immediate integration into your existing enterprise monitoring solution (and easily hack’able to output as other data formats also). The beauty of Cucumber-Nagios is that it comes with pre-built tests components that we can adapt to suit our environment. Cucumber has two components:

    1. Plain text tests called “features” which contain the different scenarios we want to test, and
    2. Supporting code called “steps” to actually test each “feature” and its associated scenarios.

Let’s create a simple behavioural test for our website. We start by installing Cucumber-Nagios via a Gem:

$ sudo gem install cucumber-nagios

On some distributions you may need to install _gemcutter _first:

$ sudo gem install gemcutter

$ sudo gem tumble

The cucumber-nagios gem will install the cucumber-nagios-gen binary which we will use to create a project to hold our tests.

$ cucumber-nagios-gen project test_project

Here we’ve told the cucumber-nagios-gen binary to create a project called test_project. A project is a mini-application that contains the right directory structure and files to run our tests. Change into the resulting directory:

$ cd test_project

We then need to bundle some supporting gems into the project to allow it to be self-contained:

$ gem bundle

Now we have a local version of cucumber-nagios-gen installed in the bin directory of the project and we can use this to create some features to test:

$ bin/cucumber-nagios-gen feature content

This creates a feature in a file called content.feature (each Cucumber file can contain one feature and must have a suffix of .feature). Let’s open this file and examine our feature:

  It should be up

  Scenario: Visiting home page
    When I go to
    Then the request should succeed

Cucumber uses a business-readable domain-specific language called Gherkin to write its features. Let’s deconstruct what each section of our feature means:

 Feature: Some terse yet descriptive text of what is desired
   In order to realize a named business value
   As an explicit system actor
   I want to gain some beneficial outcome which furthers the goal

   Scenario: Some determinable business situation
     Given some precondition
       And some other precondition
     When some action by the actor
       And some other action
       And yet another action
     Then some testable outcome is achieved
       And something else we can check happens too

   Scenario: A different situation

The feature starts with a description, in our case and then some text that describes the business value of the feature, that the website should be up. Each scenario used to validate that business value is then listed, each with it’s own description and a series of steps that they involve. There are three types of steps - Given, When or Then:

  1. Givens - put the system into a known state
  2. Whens - describe the key action that is being performed
  3. Thens - observe the outcomes

Which can be summarised as: Given some condition When I do this action Then I will see this outcome.

Each step in our scenario has to start with one of these types (but you don’t need to use all of them) and you can see in our example feature we’re using a When and a Then:

 When I go to
 Then the request should succeed

Simple and plain English. There is a bit more to Gherkin that we haven’t touched on (but you can read about at that link) but let’s try to run our feature now using the cucumber-nagios binary (obviously you need to be connected to the Internet for the feature to work appropriately):

$ bin/cucumber-nagios features/
Critical: 0, Warning: 0, 2 okay | passed=2, failed=0, nosteps=0, total=2

We can see that we’ve selected and executed our feature and it has returned some Nagios plug-in output (which appears as Critical, Warning, or Okay) and that 2 steps are Okay (or passed).

But wait a second, how have they passed? We haven’t written any code at all and it works? Well as I mentioned Cucumber-Nagios contains a set of pre- defined steps for a variety of common tasks. You can use these steps and not have to write any code. Let’s look at the associated step we’ve just used. This pre-defined step is contained in the features/steps/webrat.steps file in our project:

When /^I go to (.*)$/ do |path|
  visit path

You can see it’s a very simple bit of code that uses a regular expression to check our feature file for some specific language, in this case the words “I go to URL”. The regular expression captures a URL and passes it to Webrat which runs the visit function and returns the result. This is passed to the next step:

Then the request should succeed

Cucumber-Nagios also contains a set of pre-defined steps for handling the results, the Then steps. These steps are contained in the features/steps/result_steps.rb file. In this case we’ve used the following step:

Then /^the (.*) ?request should succeed/ do |_|
   success_code?.should be_true

This step checks the result of the When step and if it registered a success then the step passes and returns an Okay result.

Now, let’s see how we can add another scenario to our feature.

  It should be up
  You should be able to click on the Videos link

  Scenario: Visiting home page
    When I go to
    Then the request should succeed

  Scenario: Clicking on the Videos link
    When I go to
      And I follow "Videos"
    Then I should see "Google Videos"

In our new scenario we’ve tested following a link on the Google site to the Google Videos site. We’ve also used another piece of Cucumber statement, And, which is a cleaner way of writing multiple Given-When-Then steps.

This new When step also uses a pre-defined step from Cucumber-Nagios:

When /^I follow "(.*)"$/ do |link|

And a pre-defined Then step:

Then /^I should see an? (\w+) message$/ do |message_type|
  response.should have_xpath("//*[@class='#{message_type}']")

We can then run our new scenario:

$ bin/cucumber-nagios features/
Critical: 0, Warning: 0, 5 okay | passed=5, failed=0, nosteps=0, total=5

And see that we’ve now got 5 steps that pass, including our 3 new steps. Simple eh? And readily extensible.

So we’ve seen that Cucumber and BDI allows infrastructure testing but we can also extend this concept further. So we’ve seen that Cucumber can establish that the service is behaving correctly but what if we also used it to test for business logic and business rules also? Let’s take a simple example, expressed here as a feature:

  It should be up
  And I should be able to find the current Rate

  Scenario: Checking the current interest rate
    When I visit ""
    Then I should see the Rates section
    Then I should see current Rate
      And the current Rate should equal 5%

You’d need to write an additional step to return the Rate value in Cucumber- Nagios to get this example to work but assuming we had we could then run our new feature:

$ cucumber-nagios features/
Critical: 0, Warning: 0, 4 okay | value=4.000000;;;;

In this feature we’ve checking four steps:

  1. That the Home Loan site is up
  2. That it has a section called “Rates”
  3. That the section contains a Rate, and
  4. That the Rate equals 5%

So in this single scenario we’ve killed a lot of birds with one cucumber. We’ve confirmed the site is up, we’ve confirmed that the right content is being delivered and lastly we’ve confirmed that the information being delivered is correct. This goes beyond our traditional infrastructure testing to confirm that a piece of business data, our home loan rate, is correct. Using this test we could then configure Nagios to alert the appropriate business contact that the website was displaying the wrong rate. And hey presto we’re delivering business value.

This feature is just the tip of the iceberg. We could do a whole collection of other things, for example test transaction limits for a finance application, confirm access controls for applications, check the output of reports, and anything else where business logic or rules are exposed and can be exercised and tested.

Lastly, it’s important to remember we’re not replacing our traditional checks when using Cucumber. We still need to watch the low level components we’re just adding another, more powerful and insightful layer to our monitoring. A layer that more accurately represents the value that you offer to the business than graphs of ICMP ping results.