Posts Tagged ‘puppet’

What DevOps means to me…

February 19th, 2010

Over the last year or so a bunch of presumptuous European sysadmins and developers, joined by some of their American brethren and even a couple of us antipodeans (there are others too!), have been talking about a concept called DevOps.  DevOps is the merger of the realms of development and operations (and, if truth be told, elements of product management, QA, and *winces* even sales should be thrown into the mix too).

The Broken

So … why should we merge or bring together the two realms?  Well there are lots of reasons but first and foremost because what we’re doing now is broken.  Really, really broken.  In many shops the relationship between development (or engineering) and operations is dysfunctional to the point of occasional toxicity.

Here’s an example I think everyone will be at least partially familiar with: the minefield that is project to production software deployment.  Curse along as I explain.

Development builds an application, the new hotness which promises customers all the whizz-bang features and will make the company millions.  It is built using cutting edge technology and a brand new platform and it has got to be delivered right now.  Development cuts code like crazy and gets the product ready for market ahead of schedule.  They throw their masterpiece over the fence to Operations to implement and dash off to the pub for the wrap party.

Operations catches the deployment and is filled with horror.

The Operations team summarises their horror and says one or more of:

  • The wonder application won’t run on our infrastructure because {it’s too old, it doesn’t have capacity, we don’t support that version}
  • The architecture of the application doesn’t match our { storage, network, deployment, security } model
  • We weren’t consulted about the { reporting, security, monitoring, backup, provisioning } and it can’t be “productionised”.

But Operations persevere and install the new hotness – cursing and bitching throughout.  Sadly, after forcing the application onto infrastructure and bending and twisting the architecture to get it running, the performance of the new application can be summed up as “epic fail”.

Operations sighs and starts logging problems and passing issues back to the Development team.  Their responses generally come from the following pool:

  • It’s not our fault – our code is perfect – it’s just been poorly implemented
  • Operations are stupid and don’t understand the new hotness – why can’t they implement the cutting edge technology? Why are they so backward?
  • It runs fine on my machine…

The interactions between teams quickly become a toxic blame storm. The customers (and by extension the shareholders, investors and management) then become the losers.  The loop gets closed with the company losing bucket loads of money and everyone losing their jobs.  EPIC and FAIL.

What’s different about DevOps?

DevOps is all about trying to avoid that epic failure and working smarter and more efficiently at the same time. It is a framework of ideas and principles designed to foster cooperation, learning and coordination between development and operational groups. In a DevOps environment, developers and sysadmins build relationships, processes, and tools that allow them to better interact and ultimately better service the customer.

DevOps is also more than just software deployment – it’s a whole new way of thinking about cooperation and coordination between the people who make the software and the people who run it.  Areas like automation, monitoring, capacity planning & performance, backup & recovery, security, networking and provisioning can all benefit from using a DevOps model to enhance the nature and quality of interactions between development and operations teams.

Everyone in the DevOps community has a slightly different take on “What is DevOps?”  We all bring different experiences and focuses to the problem space.  I personally see DevOps as having four quadrants:

Simplicity

KISS is King and in that vein this section is simple too. Design simple, repeatable, and reusable solutions. Simplicity saves documentation, training, and support time.  Simplicity increases the speed of communication, avoids confusion, and helps reduce the risk of development and operational errors.  Simplicity gets you to the pub faster.

Relationships

Engage early, engage often. Development teams need to embed operations people into their project and development life cycles.  Invite operational people to your scrum or development meetings.  Share ideas and information about product plans and new technologies. Gather operational requirements when gathering functional ones. As a project progresses, test deployment, backup, monitoring, security and configuration management as well as application functionality.  The more issues you fix during the project the fewer issues you expose your customers to when the application is live.  Educate operations people about the application’s architecture and the code base. The more information operations people can feed you about a problem with the code the less trouble-shooting you need to perform and the faster the problem can be fixed.

Operations people need to bring development people into the problem and change management space. Invite developers into your team meetings. Share your roadmaps and upgrade plans.  Understand where future development is heading to better ensure infrastructure deployments match product requirements.  Developers also bring skills, knowledge and tools that can help make your environment easier to manage, more efficient and cleaner. Learn to code or, if you’re a hack-n-slash systems programmer like me, then learn to code better. Concepts like building tools with APIs rather than closed interfaces, distributed version control, test driven development, and methodologies like Agile Development, Kanban and Scrum can revolutionise operational practices in the same way they’ve changed the way code is cut.

Don’t be afraid of ideas and approaches from outside your domain – we can all learn things, even if it’s “let’s never do it that way again…!”, from how others do things and ultimately? Guess what? Yep, we’re all on the SAME team.

Remember that interactions between people rank, in decreasing order of effectiveness (IMHO, but backed by some research):

  1. Face to face
  2. Video conference
  3. Phone
  4. IM & IRC
  5. Email

Process

Don’t underestimate the power of process and automation.  Many shops do process engineering – ranging from hand-written lists to ISO9001. Those processes generally have one key flaw: they focus on the outcome and its inevitability.  A simple process might provision a host – Step 1 install machine, Step 2 cable machine, Step 3 install OS, etc, etc. Assuming all goes to process then at the end of Step x you will have a fully provisioned host. But what happens if it doesn’t go right?  If your process breaks or you receive some anomalous output how does your process deal with it?  Instead think about process as a journey and map out the potential pitfalls and obstacles.  Treat your processes like applications and build error handling into them.  You can’t predict every application or operational pitfall or issue but you can ensure that if you hit one your process isn’t derailed.
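One way to picture “treat your processes like applications” is a provisioning run in which every step can fail, and a failure stops the run with a clear record of where and why. The sketch below is plain Ruby; the `Step` struct, the `run_process` helper and the failure message are all hypothetical, invented purely for illustration.

```ruby
# Hypothetical sketch: a process whose steps can fail, with the failure
# recorded and the run halted instead of assuming every step succeeds.
Step = Struct.new(:name, :action)

def run_process(steps)
  completed = []
  steps.each do |step|
    begin
      step.action.call
      completed << step.name
    rescue StandardError => e
      # Stop here and report where and why the process broke.
      return { :ok => false, :completed => completed,
               :failed => step.name, :error => e.message }
    end
  end
  { :ok => true, :completed => completed, :failed => nil, :error => nil }
end

steps = [
  Step.new("install machine", lambda { true }),
  Step.new("cable machine",   lambda { raise "no free switch port" }),
  Step.new("install OS",      lambda { true })
]

result = run_process(steps)
puts result.inspect
```

The point of the design is that the caller always gets back the last good step and the reason for the failure, so the process can be resumed or escalated rather than silently derailed.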

Link processes together across domains – software deployment, monitoring, capacity planning and other “operational” processes have their start in the development world.  Software deployment is the logical conclusion of the software development life cycle and should be viewed as such rather than a separate operational process. Another example is metrics and monitoring: it is hard to measure anything without understanding the baselines and assumptions made in the development domain.  Joint processes also mean more opportunity for development and operations interaction, understanding and joint accountability. Finally, joint process development means single repositories for documentation and other opportunities for economies of scale.

Automate, automate, automate. Build or make use of simple and extensible tools (make sure they have APIs and machine readable input and output – see James White’s Infrastructure Manifesto).  Use tools like Puppet (or others) to manage your configuration.  Remember to extend your automation umbrella cross-domain and end-to-end in your environment – manage development, testing, staging and production environments with the same tools and processes.  Not only does this have economies of scale benefits in support and management but it means you can test deployment and management alongside functionality as your application and new code roll toward production.

Finally, when building process and automation always keep the KISS principle in mind. Complexity breeds opportunities for error. Build simple processes and tools that are easy to implement, manage and maintain.

Continuous Improvement

Don’t stop innovating and learning.  Technology moves fast.  So do customer requirements. Build continuous improvement and integration into your tools and processes.  This is a good place for operations people to learn from (good) developers about practices like test-driven development.  A good example here is to build tests for your software deployment process and infrastructure.  They are often an application in their own right and should be developed and maintained correctly. Your monitoring could also be extended with behavioural testing to deliver better business value.  Look at using development domain tools, like Hudson for example, to explore and measure the operational domain.

Learn from mistakes and from outages.  Seek root cause aggressively AND cross-domain.  If you have an outage and a post-incident review then bring development and operational teams together to review the incident.  Sometimes some simple code refactoring can save making infrastructure changes.  Work together to fix root causes, treating them with the same process you developed for project to production software deployment, rather than relegating them to incident review reports or batting issues between teams.

Me

Finally, for me DevOps is about people and the nature of the environment you want to work in.  The best thing about the movement for me is that it is trying to foster behaviours and environments where people work together towards joint goals rather than at cross-purposes or at odds.  That’s a world I’d much rather use my skills in.

Puppet ParsedFile types and providers

February 13th, 2010

In a recent post I talked about how easy it is to generate Puppet types and providers. In that post I used the example of a very simple Subversion and Git repository type, called repo. I’d like to show another example of a type and provider, this one used to manage the contents of the /etc/shells file. This type and provider makes use of some built-in Puppet functionality that allows the simple parsing of files and the management of their contents. To do this Puppet has a provider called ParsedFile that can be included into your own providers to provide this functionality.

Let’s start with our type:

Puppet::Type.newtype(:shells) do
    @doc = "Manage the contents of /etc/shells
 
            shells { '/bin/newshell':
                ensure => present,
            }"
 
    ensurable
 
    newparam(:shell) do
        desc "The shell to manage"
 
        isnamevar
 
    end
 
    newproperty(:target) do
        desc "Location of the shells file"
 
        defaultto {
            if @resource.class.defaultprovider.ancestors.include?(Puppet::Provider::ParsedFile)
                @resource.class.defaultprovider.default_target
            else
                nil
            end
        }
    end
end

So – pretty simple. We create a block Puppet::Type.newtype(:shells) do that creates a new type, which we’ve called shells. Inside the block we’ve got a @doc string. This is the documentation for the type. Add whatever level of detail and examples in here that is required.

We’ve also got the ensurable statement. Ensurable provides some “automagic” that creates a basic ensure property. Puppet types use the ensure property to determine the state of a configuration item. In our previous example, ensurable resulted in three methods in the provider: create, destroy, and exists?. In a ParsedFile provider we don’t use these methods at all, as we’ll see shortly; instead we specify how to handle each record in the file.

We’ve defined a new parameter – this one called shell.

newparam(:shell) do
        desc "The shell to manage"
 
        isnamevar
end

The shell parameter is the shell we’re going to manage in the /etc/shells file. We’ve also used another piece of Puppet automagic, isnamevar, to make this parameter the “name” variable for this type. In Puppet-speak, the value of this parameter is used as the name of the resource.

Lastly in our type we’ve specified an optional property, target, that allows us to override the default location of the shells file, usually /etc/shells.

newproperty(:target) do
        desc "Location of the shells file"
 
        defaultto {
            if @resource.class.defaultprovider.ancestors.include?(Puppet::Provider::ParsedFile)
                @resource.class.defaultprovider.default_target
            else
                nil
            end
        }
end

The target property is optional and would only be specified if the shells file wasn’t located in the /etc/ directory. It uses the defaultto structure to specify that the default value is the value of the default_target variable in the provider.

The provider for our type is also very simple:

require 'puppet/provider/parsedfile'
shells = "/etc/shells"
 
Puppet::Type.type(:shells).provide(:parsed, :parent => Puppet::Provider::ParsedFile, :default_target => shells, :filetype => :flat) do
 
    desc "The shells provider that uses the ParsedFile class"
 
    text_line :comment, :match => /^#/;
    text_line :blank, :match => /^\s*$/;
 
    record_line :parsed, :fields => %w{name}
end

The shells provider is stored in a file called parsed.rb in a directory named for the type inside the provider directory, for example:

/usr/lib/ruby/site_ruby/1.8/puppet/type/shells.rb
/usr/lib/ruby/site_ruby/1.8/puppet/provider/shells/parsed.rb

The file needs to be named parsed.rb to allow Puppet to load the ParsedFile support.

We first include the ParsedFile provider code at the top of our provider, require 'puppet/provider/parsedfile' and set a variable called shells to the location of the /etc/shells file. We’re going to use this variable a bit later.

Then we tell Puppet that this is a provider called parsed for the type called shells. We specify a :parent value that tells Puppet that this provider should inherit from the ParsedFile provider and make its functions available. We then set the :default_target value to the shells variable we’ve just created. This tells the provider that, unless it is overridden by the target attribute, the file to act upon is /etc/shells.

Then we use a desc method that allows us to add some documentation to our provider.

The next lines are the core of the provider. They tell Puppet how to manipulate the target file to add or remove the required shell. The first two lines, both text_lines, tell Puppet how to match comments and blank lines respectively.

    text_line :comment, :match => /^#/;
    text_line :blank, :match => /^\s*$/;

We specify these to let Puppet know to ignore these lines as unimportant.

The next line performs the actual parsing of the relevant line in the /etc/shells file:

    record_line :parsed, :fields => %w{name}

The record_line parses each line and divides it into fields, in our case we only have one field: name. The name in this case is the shell we want to manage. So if we specify:

shells { "/bin/newshell":
    ensure => present,
}

Then Puppet would use the provider to add the /bin/newshell by parsing each line of the /etc/shells file and checking if the newshell is present. If it is, then Puppet will do nothing. If not, then Puppet will add newshell to the file. If we changed the ensure attribute to absent then Puppet would go through the file and remove the newshell if it is present.
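To make that behaviour a little more concrete, here is a plain-Ruby illustration of the same idea. It is not Puppet code and not how ParsedFile is implemented internally; the `parse_shells` and `ensure_shell` helpers are hypothetical. It only mirrors the rules from the provider: comments and blank lines pass through untouched, and every other line is a one-field record holding a shell name.

```ruby
# Plain-Ruby illustration (not Puppet code) of the line handling described
# above: comments and blanks are kept untouched, every other line is a
# one-field record holding a shell name.
COMMENT = /^#/
BLANK   = /^\s*$/

def parse_shells(text)
  text.lines.map do |line|
    line = line.chomp
    if line =~ COMMENT
      { :type => :comment, :line => line }
    elsif line =~ BLANK
      { :type => :blank, :line => line }
    else
      { :type => :parsed, :name => line }
    end
  end
end

def ensure_shell(text, shell, state)
  records = parse_shells(text)
  present = records.any? { |r| r[:type] == :parsed && r[:name] == shell }

  if state == :present && !present
    records << { :type => :parsed, :name => shell }
  elsif state == :absent && present
    records.reject! { |r| r[:type] == :parsed && r[:name] == shell }
  end

  # Write the file back out, one record per line.
  records.map { |r| r[:type] == :parsed ? r[:name] : r[:line] }.join("\n") + "\n"
end

original = "# /etc/shells: valid login shells\n/bin/sh\n\n/bin/bash\n"
added    = ensure_shell(original, "/bin/newshell", :present)
removed  = ensure_shell(added, "/bin/newshell", :absent)
puts added
```

Running `ensure_shell` a second time with `:present` leaves the text unchanged, which is the idempotent behaviour you want from any configuration management resource.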

It is important to remember that ParsedFile providers do have some limitations. They aren’t good at managing complex files such as configuration files with multi-line options; they are best for simple files that contain single-line lists of entries, such as cron entries or the /etc/hosts and /etc/shells files.

You can see the complete code for this type and its providers at my Puppet repository on GitHub. Quite a lot of the existing Puppet types and providers use ParsedFile providers (the cron type for example) and you can use these as examples of how to create your own providers. You can also find further documentation (in a lot more detail!) on creating your own types and providers at the Puppet wiki.

Creating Puppet types and providers is easy…

February 1st, 2010

Puppet types are used to manage individual configuration items.  Puppet has a package type, a service type, a user type, etc.  Each type has providers. Each provider handles the management of that configuration on a different platform or tool, for example the package type has aptitude, yum, RPM, and DMG providers (amongst 22 others – what is wrong with people that they need to invent new packaging systems… but I digress).

There are a lot of types, in fact I think Puppet covers a pretty good spectrum of configuration items that need to be managed.  I don’t know of anything in particular that is missing that I can’t live without.  But there are little gaps that are annoying, I’d like network and firewall types for example, but creating both these types in a generic enough way to support multiple platforms would be, IMHO, a non-trivial problem. 

Another gap is VCS/DVCS management. A lot of people use source code in repositories to do things with (including install stuff from you bad people – package things … it’s healthier). Puppet currently relies on creating and removing these repositories with the exec type (which executes scripts or binaries), for example:

exec { "svn co http://core.svn.wordpress.org/trunk/ /var/www/wp":
    creates => "/var/www/wp",
}

This is a bit ugly and it’d be a lot easier to write a Puppet type to manage repositories. But Puppet types and providers are written in Ruby and really, really complex and hard to develop. Right? Right?

No. No, they are not… and I’m going to create a simple type and provider to show you. :)

Here’s a very (very!) simple Puppet type, called repo, for managing repositories. I’ve created providers for SVN and Git as examples also. The first part of the repo type is the type itself – these are usually stored in lib/puppet/type or distributed via modules (see the PluginsInModules page in the Puppet wiki). I’ll create a file called repo.rb.

$ touch repo.rb

And then populate the file:

Puppet::Type.newtype(:repo) do
    @doc = "Manage repos"
 
    ensurable
 
    newparam(:source) do
        desc "The repo source"
 
        validate do |value|
            if value =~ /^git/
                resource[:provider] = :git
            else
                resource[:provider] = :svn
            end
        end
 
        isnamevar
 
    end
 
    newparam(:path) do
        desc "Destination path"
 
        validate do |value|
            unless value =~ /^\/[a-z0-9]+/
                raise ArgumentError , "%s is not a valid file path" % value
            end
        end
    end
end

So – pretty simple. We create a block Puppet::Type.newtype(:repo) do that creates a new type, which we’ve called repo.

Inside the block we’ve got a @doc string. This is the documentation for the type. Add whatever level of detail and examples in here that is required.

We’ve also got the ensurable statement. Ensurable provides some “automagic” that creates a basic ensure property. Puppet types use the ensure property to determine the state of a configuration item.

service { "sshd":
    ensure => running,
}

The ensurable statement tells Puppet to expect three methods: create, destroy and exists? in our provider. These methods allow, respectively:

  • A command to create the resource
  • A command to delete the resource, and
  • A command to check for the existence of the resource

All we then need to do is specify these methods and their contents and Puppet creates the supporting infrastructure around them but more on this when we look at our providers.

Next, we’ve defined a new parameter – this one called source.

    newparam(:source) do
        desc "The repo source"
 
        validate do |value|
            if value =~ /^git/
                resource[:provider] = :git
            else
                resource[:provider] = :svn
            end
        end
 
        isnamevar
    end

The source parameter will tell the repo type where to go to retrieve/clone/checkout our source repository.

In this parameter we’re also using a hook called validate. Normally used to check the value for appropriateness, here we’re using it to take a guess at what provider to use. Our code says: if the source parameter starts with git then use the Git provider, if not default to the Subversion provider. This is obviously fairly crude as a default and we can override this by defining the provider attribute in our resources:

provider => git,
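Stripped of the Puppet plumbing, the guess looks something like the plain-Ruby sketch below. The `guess_provider` helper and the GitHub URLs are hypothetical, used only to show the decision; the sketch assumes an explicitly set provider wins, as described above.

```ruby
# Hypothetical helper mirroring the validate hook's guess: an explicit
# provider wins, otherwise the source's prefix decides.
def guess_provider(source, explicit = nil)
  return explicit if explicit
  source =~ /^git/ ? :git : :svn
end

puts guess_provider("git://github.com/example/repo.git")
puts guess_provider("http://core.svn.wordpress.org/trunk/")
puts guess_provider("http://github.com/example/repo.git", :git)
```

The third call shows why the override matters: a Git repository served over HTTP doesn’t start with git, so without the explicit provider it would be guessed as Subversion.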

We’ve also used another piece of Puppet automagic, isnamevar, to make this parameter the “name” variable for this type. In Puppet-speak, the value of this parameter is used as the name of the resource.

(Types have two kinds of values – properties and parameters. Properties “do things”. They tell us HOW the provider works. We’ve only defined one property, ensure, by using the ensurable statement. Parameters are more like variables: they contain information relevant to configuring the resource the type manages rather than “doing things”.)

Finally, we’ve defined another parameter, path.

    newparam(:path) do
        desc "Destination path"
 
        validate do |value|
            unless value =~ /^\/[a-z0-9]+/
                raise ArgumentError , "%s is not a valid file path" % value
            end
        end
    end

This is a variable value that specifies where the repo type should put the cloned/checked-out repository. In this parameter we’ve again used the validate hook to create a block that checks the value for appropriateness. Here we’re just checking, very crudely, to make sure it looks like the destination path is a valid fully-qualified file path. We could also use this validation for the source parameter to confirm a valid source URL/location was being provided.

(You can also use another hook called munge to adjust the value of the parameter rather than validating it before passing it to the provider.)
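The difference between the two hooks is easier to see side by side. This is a plain-Ruby illustration rather than Puppet code: `validate_path` reuses the regular expression from the type, while `munge_path` and its trailing-slash clean-up are an assumption made up for the example.

```ruby
# Plain-Ruby illustration of the two hooks: validate rejects bad values,
# munge rewrites acceptable ones. Neither function is part of Puppet.
def validate_path(value)
  unless value =~ /^\/[a-z0-9]+/
    raise ArgumentError, "%s is not a valid file path" % value
  end
  value
end

# Assumed normalisation for the example: drop any trailing slashes.
def munge_path(value)
  value.sub(/\/+$/, "")
end

path = munge_path(validate_path("/var/www/wp/"))
puts path

begin
  validate_path("var/www/wp")
rescue ArgumentError => e
  puts e.message
end
```

Note how crude the check really is: as written, the regular expression also rejects a path such as /Users/me/wp because the first character after the slash is uppercase.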

And that is it for the type.

Next, we need to create a provider for our type. Let’s start with a Subversion provider like so:

require 'fileutils'
 
Puppet::Type.type(:repo).provide(:svn) do
    desc "SVN Support"
 
    commands :svncmd => "svn"
    commands :svnadmin => "svnadmin"
 
    def create
        svncmd "checkout", resource[:name], resource[:path]
    end
 
    def destroy
        FileUtils.rm_rf resource[:path]
    end
 
    def exists?
        File.directory? resource[:path]
    end
end

Up front we’ve required the fileutils library, which we’re going to use a method from. Next, we’ve defined the provider as a block:

Puppet::Type.type(:repo).provide(:svn) do

We tell Puppet that this is a provider called svn for the type called repo.

Then we use a desc method that allows us to add some documentation to our provider.

Next, we define the commands that this provider will use, here the svn and svnadmin binaries, to manipulate our resource’s configuration.

    commands :svncmd => "svn"
    commands :svnadmin => "svnadmin"

Puppet uses these commands to determine if the provider is appropriate to use on a client. If Puppet can’t find these commands in the local path then it will disable the provider.
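The check itself amounts to searching the path for an executable. The sketch below is a plain-Ruby stand-in, not Puppet’s actual lookup code; the `command_available?` helper is hypothetical.

```ruby
# Hypothetical stand-in for the suitability check: look for an executable
# with the given name in each directory listed in PATH.
def command_available?(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

%w{svn svnadmin}.each do |cmd|
  puts "#{cmd}: #{command_available?(cmd) ? 'found' : 'missing'}"
end
```

A provider that needs both binaries would only be usable when both checks pass.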

Next, we’ve defined three methods – create, destroy and exists?. Sounds familiar? Yep, these are the methods that the ensurable statement expects to find in the provider:

The create method ensures our resource is created. It uses the svn command to create a repository with a source of resource[:name] (remember the source parameter in our type is also the name variable of the type – we could also specify resource[:source] here too) and a destination of resource[:path] (the value of the path attribute).

The destroy method ensures the deletion of the resource. In this case, it deletes the directory and files specified in the resource[:path] parameter.

Lastly, the exists? method checks to see if the resource exists. Its operation is pretty simple and closely linked with the value of the ensure attribute in the resource:

  • If exists? is false and ensure is present, then the create method will be called.
  • If exists? is true and ensure is set to absent, then the destroy method will be called.

In this case the exists? method checks if there is already a directory at the location specified in the resource[:path] parameter.
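The two rules above can be modelled in a few lines of plain Ruby. This is a sketch of the decision logic only, not Puppet’s implementation; `FakeProvider` and `sync_ensure` are hypothetical names, and the provider just tracks its state in memory.

```ruby
# Plain-Ruby model (not Puppet's implementation) of the decision rules
# above, using a fake provider that tracks state in memory.
class FakeProvider
  attr_reader :calls

  def initialize(exists)
    @exists = exists
    @calls  = []
  end

  def exists?
    @exists
  end

  def create
    @calls << :create
    @exists = true
  end

  def destroy
    @calls << :destroy
    @exists = false
  end
end

def sync_ensure(provider, ensure_value)
  if ensure_value == :present && !provider.exists?
    provider.create
  elsif ensure_value == :absent && provider.exists?
    provider.destroy
  end
  provider
end

missing = sync_ensure(FakeProvider.new(false), :present)
puts missing.calls.inspect  # prints [:create]
```

In the other two combinations (already present, or already absent) nothing is called at all, which is why a second Puppet run against an unchanged system does no work.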

So, let’s put all this together and create a resource with our new type. I’ve assumed you’ve already distributed your type and providers to Puppet. We can then create a resource like:

repo { "wp":
    source => "http://core.svn.wordpress.org/trunk/",
    path => "/var/www/wp",
    ensure => present,
}

Simple eh? We specify a repo resource, the source we wish to check out or clone from, the destination path and the ensure attribute (present or absent) and that’s it.

You can see the complete code for this type and its providers at my Puppet repository on GitHub. It’s obviously very basic but should be easy to extend to provide additional capabilities (and currently has no tests – my bad). You can find further documentation (in a lot more detail!) on creating your own types and providers at the Puppet wiki.

Puppet 0.25.4 released!

January 29th, 2010

You wanted “release early, release often” and the Puppet team has delivered!
The 0.25.4 release is a maintenance release (with one important feature – pre/post transaction hooks – discussed below) in the 0.25.x branch.  The release primarily addresses a regression introduced in 0.25.3 that caused issues with creating cron jobs.

The release is available at:

http://reductivelabs.com/downloads/puppet/puppet-0.25.4.tar.gz

http://reductivelabs.com/downloads/gems/puppet-0.25.4.gem

http://gemcutter.org/gems/puppet

Please note that all final releases of Puppet are signed with the Reductive Labs key – http://reductivelabs.com/trac/puppet/wiki/DownloadingPuppet#verifying…

Please report feedback via the Reductive Labs Redmine site: http://projects.reductivelabs.com

Please select an affected version of 0.25.4

RELEASE NOTES

Pre/Post Transaction hooks

There is a new feature in this release: pre and post transaction hooks.  These hooks allow you to specify commands that should be run pre and post a Puppet configuration transaction.   They are set with the prerun_command and postrun_command settings in the puppet.conf configuration file.

prerun_command = /bin/runbeforetransaction
postrun_command = /bin/runaftertransaction

The command must exit with 0, i.e. succeed, otherwise the transaction will fail.  If the pre command fails, the transaction fails before it is run; if the post command fails, the transaction fails at its end.
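As a rough mental model of those semantics, here is a plain-Ruby sketch. It is not Puppet’s code: `run_with_hooks`, the status symbols and the injected runner are all hypothetical, and the exit codes are faked so the example doesn’t depend on real binaries.

```ruby
# Hypothetical model of the hook semantics described above; the command
# runner is injected so the sketch does not depend on real binaries.
def run_with_hooks(pre, post, runner)
  return :failed_before_transaction unless runner.call(pre)
  yield  # the configuration transaction itself
  return :failed_after_transaction unless runner.call(post)
  :ok
end

# In real use the runner could be `lambda { |cmd| system(cmd) }`, which
# returns true only when the command exits with status 0.
exit_codes  = { "/bin/runbeforetransaction" => 0, "/bin/runaftertransaction" => 1 }
fake_runner = lambda { |cmd| exit_codes[cmd] == 0 }

status = run_with_hooks("/bin/runbeforetransaction",
                        "/bin/runaftertransaction",
                        fake_runner) { true }
puts status  # prints failed_after_transaction
```

Note that when the pre command fails the block is never yielded to, so in this model no configuration changes are attempted.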

CHANGELOG
* Bug #2845: Cron entries using “special” parameter lose their title when changed
* Bug #3001: Can’t manage broken links
* Bug #3039: 0.25.3 gem spec specifies the executables incorrectly
* Bug #3075: sshkey host aliases broken by fix for #2813
* Bug #3088: Puppetd fails to stop after receiving SIGTERM
* Bug #3089: puppetlast gsub! error
* Bug #3093: Blastwave provider broken in 0.25.3
* Bug #3104: Test failed: Puppet::Network::XMLRPCClient when performing the rpc call and an exception is  raised.should log and raise XMLRPCClientError if Timeout::Error is raised
* Bug #3112: Problem with adding and removing crons
* Bug #3122: Uncharacterized failure in fileserving under OS X
* Bug #3125: Dpkg tests failing
* Feature #2914: Transactions should have before and after hooks

The Tortoise and not the Hare 2 – Principles

January 24th, 2010

Kanban

In my first post I introduced you to the Toyota Production System and the Kanban signalling system. At the core of the TPS is the concept of maintaining efficiency and eliminating waste.  To govern this process the TPS has a series of basic principles that articulate how this is achieved:

  1. Create continuous process flow to bring problems to the surface
  2. Use the “pull” system to avoid overproduction
  3. Level out or smooth the workload AKA “Heijunka” – be the tortoise not the hare
  4. Build a culture of stopping to fix problems, to get quality right from the first
  5. Standardized tasks are the foundation for continuous improvement and employee empowerment
  6. Use visual control so no problems are hidden
  7. Use only reliable, thoroughly tested technology that serves your people and processes.

I’ve skipped all the TPS rules around long-term thinking (for example the first principle of the TPS – Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals), corporate harmony, people development and organisational learning.  I haven’t skipped them because they aren’t important but because they aren’t immediately relevant to this discussion.  If you haven’t got the others right too though I suspect your organisation will go pear-shaped in other ways.

In each of the subsequent posts I am going to look at one of the seven principles I’ve articulated above, starting with exploring how you can make continuous process flow work for you.

Puppet, Chef, deterministic ordering and the much maligned DSL

January 14th, 2010

This morning I came across a post entitled Puppet versus Chef: 10 reasons why Puppet wins.  The post attempts to explain the differences between Chef and Puppet and why Puppet is superior.  The post wasn’t great IMHO; personally I thought it was fairly poorly reasoned and made some potentially accurate but thoroughly unsubstantiated claims.

Leaving aside the issues with the post itself though, it did prompt an interesting comment thread, particularly comments between Opscode’s CTO Adam Jacob and Reductive Labs’ Teyo Tyree (links are to the respective comments – Adam’s and Teyo’s reply).

I’m going to quote Teyo’s comment in full because I think it answers a lot of questions that people have had about some of the key differences between Puppet and Chef – dependency modelling and DSL:

There is a misstatement in your assessment of Puppet’s dependency handling. You express Chef’s ordering as deterministic and imply that Puppet is in someway non-deterministic. This is not the first time you have implied this publicly, so I thought I should bring some clarity to your misstatement. The actual differentiation is procedural ordering versus a dependency graph. Puppet provides a graphing model for ordering versus a procedural model. Sure, you get procedural ordering for free with Ruby, it seems easier, and I am aware that this was a design decision for you guys. We also know that you were frustrated by “having” to express dependencies in Puppet in order to ensure consistent ordering. Properly expressed dependencies in Puppet provide ordering where you care to have it. Procedural ordering is implicit even if you don’t care. This is a BIG difference, perhaps the fundamental difference between Puppet and Chef and one that was designed into Puppet because of experiences we had trying to cope with a large code base of procedurally order scripts to manage an enterprise infrastructure. Yeah we were using make, yeah that was crazy, crazy but informative.

Your omission is related to your design decision to avoid dependency graphing, which you yourself have admitted has some major downsides, namely the inability to provide a reasonable dry-run mode, http://bit.ly/4Gcz7G. Frankly, I don’t know how you develop with out a dry-run mode, but hey I am a sysadmin not a developer.

Without a graph of resource dependencies, we would have no way of separating concerns. Consider the use case of implementing security standards. Ideally, you would want any given configuration run to bring your system into complete compliance. That sounds great but would you really want security policies not to be implemented because some earlier procedure was unable to succeed, say because it was pulling in data from a source that was not available.

So here is the difference in a nutshell. Puppet generates a catalog of dependent resources. This catalog is shipped to the clients and acted on by the Resource Abstraction Layer (RAL). On the other hand Chef, ships the required Ruby code for any node’s configuration and orders the execution of that code procedurally. These are the core differences. The DSL issues become moot if you consider Shadow Puppet or the Ruby DSL that we are developing as part of Puppet’s next release. The real difference, and IMHO Puppet’s advantage is our resource model and it’s dependency graphs versus a monolithic procedural chunk of Ruby code delivered to every client.

Here are some derived advantages of our model and a little love for the much maligned declarative external DSL:

1) Graphing base branch independence.

Parts of a catalog can be implemented more often than others. That is to say, we can tag certain resources to be checked and reconfigured more often. Additionally, parts of a configuration can be meaningfully checked but not acted on (See Adam’s discussion of noop http://bit.ly/4Gcz7G). Our customers/community love this and without a graph I don’t see how it is possible.

2) Cross host dependencies.

Our data model passes dependencies into our catalog caching system, so cross host dependencies can be resolved as well. This isn’t currently available but the framework exists and we intend to take advantage of it.

3) Failures are contained.

Critical parts of a configuration run are not excluded because of non-dependent failures.

4) Low barrier to entry for non-rubyist.

Non-rubyist can take advantage of the specification language out of the gate and Ruby developers can take advantage of the current Puppet plugin API and the future Ruby DSL, so everyone gets to use their favorite hammer.

5) Don’t like our DSL, don’t like Ruby?

Because we are only generating a catalog from the configuration language it should be fairly straight forward for anyone to generate a catalog using whatever language they choose and the RAL would be able to act on it. Come on Python people you know you want to generate catalogs with Python.

6) I can’t run Ruby on my switch!

Because we are using a data model for resources, devices that can’t/don’t have access to Ruby could still use the catalog as a basis for configuring themselves. Routers, switches, firewalls, could all be configured using the same specification language independent of how the specification is implemented, but with the resource model intact.

Finally, I think that you were a little hard on John about his comments on Chef being Rails focused. Certainly he misspoke, but the truth is that Chef development has been focused on web-application rollout in fairly homogeneous environments. Sure you can use Chef to manage the initial deployment of a web application, but in environments where you may have lots of teams utilizing compute resources for various application architectures, Puppet’s resource model shines. Security administrators can develop their Puppet manifests and not need to worry that security policies are not going to be applied because the DBA teams manifests failed. Operations teams can run Puppet in noop mode persistently and be notified if their configuration is out of compliance. Developers can make sure that the infrastructure they need for successful application deployment is available without having to worry that the security policy failed to be applied. Everyone gets to be friendlier with one and other and perhaps even get to the pub earlier on occasion.
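To make Teyo's distinction a little more concrete, here is a rough sketch in Ruby.  It is not how Puppet or Chef are actually implemented: the Catalog class, the resource names and the "halt at the first failure" behaviour of the procedural run are all assumptions made for illustration.  It only shows why a dependency graph lets an unrelated failure stay contained.

```ruby
require 'tsort'

# Hypothetical resource catalog: resource name => names it depends on.
class Catalog
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node).each(&block)
  end

  # Apply resources in dependency order. A resource is skipped only when
  # something it depends on failed or was itself skipped.
  def apply(failing: [])
    status = {}
    tsort.each do |name|
      blocked = @dependencies.fetch(name).any? { |dep| status[dep] != :ok }
      status[name] = if blocked
                       :skipped
                     elsif failing.include?(name)
                       :failed
                     else
                       :ok
                     end
    end
    status
  end
end

# Procedural model (assumption): run in listed order, stop at first failure.
def apply_procedurally(order, failing: [])
  status = {}
  halted = false
  order.each do |name|
    if halted
      status[name] = :skipped
    elsif failing.include?(name)
      status[name] = :failed
      halted = true
    else
      status[name] = :ok
    end
  end
  status
end

DEPENDENCIES = {
  'package[mysql]'    => [],
  'service[mysql]'    => ['package[mysql]'],
  'file[sshd_config]' => [],
  'service[sshd]'     => ['file[sshd_config]']
}.freeze

GRAPH_RESULT = Catalog.new(DEPENDENCIES).apply(failing: ['package[mysql]'])
PROCEDURAL_RESULT = apply_procedurally(DEPENDENCIES.keys, failing: ['package[mysql]'])

puts GRAPH_RESULT.inspect
puts PROCEDURAL_RESULT.inspect
```

In the graph version the failed MySQL package only takes the MySQL service down with it, and the SSH resources are still applied.  In the procedural version everything after the failure is skipped, which is the security-policy scenario the comment worries about.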

Disclosure & Disclaimer – I am Puppet’s Release Manager and obviously heavily involved with the Reductive Labs team and the project.  My opinions are my own and not representative of my employer or Reductive Labs.

Puppet 0.25.3 – “Clifford” released!

January 12th, 2010

Puppet 0.25.3 – code-named “Clifford”

The 0.25.3 release is a maintenance release in the 0.25.x branch.  The release addresses a regression introduced in 0.25.2 that caused issues with command execution.

The release is available at:

http://reductivelabs.com/downloads/puppet/puppet-0.25.3.tar.gz

http://reductivelabs.com/downloads/puppet/puppet-0.25.3.gem

http://gemcutter.org/gems/puppet

Please note that all final releases of Puppet are signed with the Reductive Labs key.

http://reductivelabs.com/trac/puppet/wiki/DownloadingPuppet#verifying-puppet-downloads

Please report feedback via the Reductive Labs Redmine site:

http://projects.reductivelabs.com

Please select an affected version of 0.25.3.

CHANGELOG

* Bug #1464: Mount resource complains about missing options field
* Bug #2845: Cron entries using “special” parameter lose their title when changed
* Bug #2887: Service (init) does not seem to work with require properly
* Bug #3013: util.rb:execute broken on Ruby <1.8.3
* Bug #3025: apt and aptitude providers dont work on Debian Lenny puppet 0.25.2 from gems

Puppet 0.25.2 “Zoe” released!

January 5th, 2010


Puppet 0.25.2 – code-named "Zoe"

The 0.25.2 release is a significant maintenance release (123 tickets closed!) in the 0.25.x branch.

Thanks to all who contributed to the release and tested fixes – especially (but not limited to!) Peter Meier (duritong), R.I. Pienaar (Volcane), Mark Plaskin, Dan Bode, Alan Harder, Ricky Zhou, Christian Hofstaedtler, Todd Zullinger, Till Mass, Nigel Kersten, and above all Markus Roberts and Jesse Wolfe, who worked around the clock to get the release out the door.

The release is available at:

http://reductivelabs.com/downloads/puppet/puppet-0.25.2.tar.gz
http://reductivelabs.com/downloads/puppet/puppet-0.25.2.gem

Please note that all future final releases of Puppet will be signed with the Reductive Labs key.  Unfortunately, I am travelling and unable to access the box with the release key on it or its backup.  A signature will be generated for this release early next week when I return to Australia.

http://reductivelabs.com/trac/puppet/wiki/DownloadingPuppet#verifying-puppet-downloads

Please report feedback via the Reductive Labs Redmine site:

http://projects.reductivelabs.com

Please select an affected version of 0.25.2.

RELEASE NOTES

* When setting aliases using the host type now use the host_alias attribute rather than alias.

* Puppet now has the "manage_internal_file_permissions" option which allows you to enable or disable Puppet management of internal files, for example those in /var/lib/puppet.  When "false" Puppet will NOT manage these files.  Default is "true".

* Cron type now supported on AIX

* Maillist type is now working again

* File serving permissions error messages enhanced

* SELinux now supports contexts with upper case titles

* When running the tests you are no longer restricted to RSpec version 1.2.2; version 1.2.2 and anything newer will work.

* The debug format message has been changed and clarified from:

debug: Format s not supported for Puppet::FileServing::Metadata; has not implemented method 'from_s'

to:

debug: file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson

* Puppetdoc now works with Regex node names

* There are now valid and proper OIDs in the LDAP puppet.schema that are unique and registered for Puppet.

* Packagers please note updated man pages including a new page for puppetqd

* Fix for temporary file issues (https://bugzilla.redhat.com/show_bug.cgi?id=502881)

CHANGELOG

Full list of closed tickets.

 

The Tortoise and not the Hare – Part 1

January 2nd, 2010


The production line is one of the marvels of the industrial era. I've always been fascinated with production lines in factories and how a product, like a car, gets constructed from individual components and grows until finally it rolls off the production line as a finished product.  In the last few years I've been thinking more and more about production lines and how they overlap with IT Operations.  So do they have anything in common?  Can you draw parallels between building a car and running an IT operation? Damn right you can!

Production lines construct assets from components and then sell these assets.  IT shops also construct assets, in the form of software, infrastructure and services.  These assets are also constructed from components to make a functioning whole and are then sold to customers.  The perfect example is hosting infrastructure.  A host is constructed from hardware (CPU, storage, networking), software (operating system, applications) and configuration data and then delivered to a customer for use (or "sale"). I'm going to look at the core principles of production lines, specifically some of the methodologies around their management, and see if they offer value in running IT organisations and operations. I'm also going to demonstrate some workflow and how to use some tools (like Puppet and others) to model these principles in your own IT shop.

The production line relies heavily on process and continuous flow to function efficiently.  The asset moves through the line having actions performed on it or components added to it. The objective is an uninterrupted flow from beginning to end with enough of the right components, processes and people being introduced at the right time.  Getting this production life cycle right isn't easy. As a result, the study and practice of production line management has become a science.

One of the most famous methodologies is the Toyota Production System or TPS. If you work anywhere where process is important – and that's pretty much every manufacturing organisation and almost every large corporate (including banks, insurance companies, transport, logistics and a flurry of others) – then you'll probably have heard of TPS and one of its integral components, Kanban. The TPS is a lean/Just In Time (JIT) production practice model.  Lean practices are ones where the focus is on the activities that deliver customer value.  Resources that are expended on other activities are always suspect and targets for elimination.  JIT attempts to improve ROI ("Return on Investment") by streamlining production processes, managing demand and hence reducing the amount of inventory ("parts") carried, so that only the parts needed are stored and then only for the shortest possible time before being consumed for production.

So where does Kanban fit in? Kanban is a demand management and signalling system that uses physical signs to act as triggers between processes.  I've stolen a very simple kanban example from Wikipedia:

"A simple example of the kanban system implementation might be a "three-bin system" for the supplied parts (where there is no in-house manufacturing) — one bin on the factory floor (demand point), one bin in the factory store and one bin at the suppliers' store. The bins usually have a removable card that contains the product details and other relevant information — the kanban card. When the bin on the factory floor becomes empty, i.e, there is demand for parts, the empty bin and kanban cards are returned to the factory store. The factory store then replaces the bin on the factory floor with a full bin, which also contains a kanban card. The factory store then contacts the supplier’s store and returns the now empty bin with its kanban card. The supplier's inbound product bin with its kanban card is then delivered into the factory store completing the final step to the system. Thus the process will never run out of product and could be described as a loop, providing the exact amount required, with only one spare so there will never be an issue of over-supply. This 'spare' bin allows for the uncertainty in supply, use and transport that are inherent in the system."

Simple huh?  You can also readily scale this example by adding multiple bins (which each have their own kanban card).  This allows tight control of stock management (and hence costs!).
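If you think better in code than in bins, the three-bin loop can be sketched in a few lines of Ruby.  This is only a toy model: the ThreeBinKanban class, the part name and the assumption that the supplier refills an empty bin immediately are mine, not part of the Wikipedia example.

```ruby
# A bin carries its kanban card (the part details) along with its contents.
Bin = Struct.new(:card, :quantity)

class ThreeBinKanban
  attr_reader :floor, :store, :supplier, :replenishments

  def initialize(part:, bin_size:)
    @bin_size = bin_size
    @floor    = Bin.new(part, bin_size)
    @store    = Bin.new(part, bin_size)
    @supplier = Bin.new(part, bin_size)
    @replenishments = 0
  end

  # Take one part from the factory floor bin; an empty bin is the signal.
  def consume
    @floor.quantity -= 1
    rotate_bins if @floor.quantity.zero?
  end

  private

  # The empty floor bin travels back through the store to the supplier,
  # and each full bin moves one step closer to the factory floor.
  def rotate_bins
    empty = @floor
    @floor = @store
    @store = @supplier
    empty.quantity = @bin_size # assumption: the supplier refills at once
    @supplier = empty
    @replenishments += 1
  end
end

KANBAN = ThreeBinKanban.new(part: 'bolt M8', bin_size: 5)
12.times { KANBAN.consume }
puts "floor=#{KANBAN.floor.quantity} replenishments=#{KANBAN.replenishments}"
```

The point to notice is that nothing is ordered on a schedule.  Replenishment only happens when an empty bin (and its card) signals demand, and the total stock never exceeds three bins.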

So can we apply these concepts to IT infrastructure?  Hang onto your hats because in Part 2 of this series of posts I'm going to do exactly that.

Yes Mum, I’ll Behave: Beginning Behaviour Driven Infrastructure

December 21st, 2009

So I like to think I know a bit about enterprise monitoring and configuration management.  I've done a lot of it over the years across multiple platforms and using a bunch of tools – both proprietary and open source.  I've even written two books about the open source tools, Nagios and Puppet.  But all this time I've been doing it wrong.  Really badly wrong.

The typical enterprise monitoring and configuration management set-up is generally something like this: central server(s) manage and monitor a number of services on local and/or remote hosts.  Digging down, for a web server these checks might be something like:

  • Is the Apache package installed and the appropriate version?
  • Is the Apache service running?
  • Can I connect to the HTTP port and is HTML returned?

Multiply this by a few hundred iterations of hosts and types of services and you're probably looking at your typical Nagios, Puppet, Cfengine, Hyperic, Tivoli or Patrol set-up. Add logging, alerts, graphs and reporting and this is probably pretty close to the environment that most system administrators manage and monitor every day. All the bases covered, appropriate alerts when things go down, reporting for your management, etc, etc.

So that's all good right and we don't need to do anything more?  Nope, not quite.  All this monitoring misses something critical – we're not actually monitoring that the service does what it should.  Yes, it matters whether Apache is installed, the Apache service is running, and you can connect to HTTP but does this actually prove anything about the availability of the service we're managing and providing for our customers?  No again.  You can connect to the port, have the service running and still not be delivering the right content or providing the appropriate functionality to the customer. And ultimately that's what our jobs are all about – delivering service to the customer.  Whether internal ("the business") or an external customer, they don't care about the infrastructure.  Nor the technology, its configuration or anything else about the widgets that deliver the services they use.  They just want "technology" to be:

  1. Available,
  2. Functional, and
  3. Cost-effective.

To deliver (and measure) the first two items on that list we're going to need more than just a check that says the Apache server is up.  We need to demonstrate that the service delivered by that infrastructure was available to our customers AND functioning as intended.  If it isn't functioning as intended, all the availability in the world is meaningless because the customer isn't getting what they want.

(Needless to say most enterprise monitoring measures of "availability" are bogus.  Using an ICMP ping of a host, uptime or checking a process as a measure of availability merely demonstrates that the asset is up.  It doesn't demonstrate that the asset is performing the function it should, and hence doesn't actually measure "availability".)

All is not lost though, we have the technology, we can rebuild your monitoring environment: better, stronger and more relevant.  How?  By stealing someone else's idea.  You see developers face the same challenge of delivering appropriate functionality.  In their case an application may compile and run but produce incorrect output or worse no output at all.  Like our monitoring, this leaves our developer short on knowing whether they are delivering functionality to the customer. So to ensure that their applications do what they have promised, developers test them. 

There are lots of different kinds of testing: functional tests to confirm things work, performance testing, user acceptance tests to ensure user experience is suitable.  But one kind of testing has become increasingly important: behavioural testing.  Behavioural testing checks that each function, method or procedure not only works but behaves in the intended way. Developers call this methodology Behaviour Driven Development or Test Driven Development (BDD and TDD for short). In a BDD/TDD environment each component of your code is tested to ensure it is behaving correctly.  The basic element of this testing is called a unit test.  In BDD, unit tests are developed for each function to determine whether it is fit for use.  Let's look at a simple example of a function, one that adds numbers, and a unit test to confirm its behaviour is correct.  We'll start by articulating our function (in pseudo code).

def addition(val1, val2)
  # Return the sum so that a caller (or a test) can check it
  val1 + val2
end

So if our function compiles, executes and returns a result does that mean it works?  No, because we can't guarantee it returns the right result.  To overcome this gap in our knowledge, we devise a simple unit test (again in pseudo code).

def test_addition
  # Known inputs and the output we expect from them
  val1 = 4
  val2 = 6
  total = 10

  result = addition(val1, val2)
  puts "Function addition failed - incorrect total" if result != total
end

In our test we first set the input values and what the resulting output should be (fixed, known values like these are often called test fixtures; the related "mock" objects of the object-orientated world are stand-ins designed to simulate real objects in a controlled way).  We then run the function we'd like to test and check that the returned result matches the expected output.  If the result doesn't match then we print an error message and we know we need to fix the function. This combination of functional tests and behavioural testing means that not only do we ensure our application runs but that when it does run it is behaving correctly.

So can we apply Behaviour Driven Development to our infrastructure to test that it is behaving correctly?  Enter Behaviour Driven Infrastructure. Behaviour Driven Infrastructure or BDI applies the principles of Behaviour Driven Development to the management of infrastructure.  And boy is it cool. Let's jump right in and see what a BDI check might look like.  Remember our Apache checks?  Let's design some simple behavioural checks to supplement these, checks that step up from monitoring our infrastructure into monitoring the behaviour of our service:

  • The site contains some static value or content
  • The site contains some dynamic content that can be validated, for example data drawn from a database
  • When I click or follow links I get the right pages returned
  • When I fill in a form the values are validated
  • When I press a button the form is submitted
  • When I select a field or drop-down the right values are populated

Notice the key difference between the checks we defined earlier and these checks?  These checks involve the behaviour of the service rather than its binary status.  Instead of the site being "up" we're testing that the site returns the right content or in other words that the site behaves correctly. We can see that determining tests for a website is relatively easy but what about other types of infrastructure and services?  You can develop similar tests for a wide variety of infrastructure:

  • SSH – check that a particular user can login, an inappropriate user fails and is logged or alerted
  • SMTP – check that the daemon receives an email and delivers it, check it rejects mail it should reject, check authentication works
  • IMAP – check you can retrieve email from a mailbox, check authentication works
  • MySQL/database/LDAP/directories – check you can query a record and that the record returned is correct
  • Load balancer – check connections are switched between hosts
  • DNS – check output of DNS queries is correct
  • Backups – backup and restore a file
  • Nagios/Enterprise Monitoring – check tests pass, fail, escalate, send notifications
  • Samba/NFS – create, change, delete a file
  • Sudo – check you can run a sudo command and check inappropriate sudo commands fail and log

Notice what we're trying to do?  We're testing that the service does what the customer expects it to do.  This not only proves the service is behaving the way we want it to but also demonstrates that the service is available.
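To give a feel for how small one of these checks can be, here's a minimal sketch of the "create, change, delete a file" test from the Samba/NFS item.  It's plain Ruby rather than anything from a real monitoring tool: the check_file_share name and the message wording are made up, and it assumes the share is already mounted at the directory you pass in.  The only borrowed convention is the Nagios plug-in exit codes (0 for OK, 2 for CRITICAL).

```ruby
require 'tmpdir'
require 'securerandom'

# Nagios plug-in exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2

# Create, change, read back and delete a file under +dir+.
# Returns [exit_code, message] in the style of a Nagios check.
def check_file_share(dir)
  path = File.join(dir, "bdi-check-#{SecureRandom.hex(4)}.txt")
  File.write(path, "created\n")                      # create
  File.open(path, 'a') { |f| f.write("changed\n") }  # change
  content = File.read(path)
  File.delete(path)                                  # delete

  unless content == "created\nchanged\n"
    return [NAGIOS_CRITICAL, "CRITICAL: unexpected content in #{path}"]
  end
  if File.exist?(path)
    return [NAGIOS_CRITICAL, "CRITICAL: #{path} still exists after delete"]
  end

  [NAGIOS_OK, "OK: create, change and delete succeeded in #{dir}"]
rescue SystemCallError => e
  # Permission problems, a missing mount and so on all land here.
  [NAGIOS_CRITICAL, "CRITICAL: #{e.message}"]
end

# Try it against a throwaway directory standing in for the mounted share.
Dir.mktmpdir do |dir|
  code, message = check_file_share(dir)
  puts "#{code} #{message}"
end
```

Unlike a "is the daemon running?" check, this fails when the share is mounted read-only, full or has the wrong permissions, which are exactly the failures the customer would notice.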

So all of these tests sound easy when written down like this but how do we implement them? We're going to articulate our BDI tests in plain English using a tool called Cucumber and also introduce you to a spin-off tool called Cucumber-Nagios (which I talked about in a previous post). 

Cucumber is a behavioural testing framework used heavily in the Ruby community (and in the Java, .NET, Flex communities too) that is simple and easy to learn – even for non-developers. Cucumber-Nagios takes this a little further to combine Cucumber with built-in testing frameworks (web using Webrat and SSH using Net::SSH for example) and outputs test results as Nagios check data.  Perfect for immediate integration into your existing enterprise monitoring solution (and easily hackable to output other data formats too).  The beauty of Cucumber-Nagios is that it comes with pre-built test components that we can adapt to suit our environment. Cucumber has two components:

  1. Plain text tests called "features" which contain the different scenarios we want to test, and
  2. Supporting code called "steps" to actually test each "feature" and its associated scenarios.

Let's create a simple behavioural test for our website.  We start by installing Cucumber-Nagios via a Gem:

$ sudo gem install cucumber-nagios

On some distributions you may need to install gemcutter first:

$ sudo gem install gemcutter

$ sudo gem tumble

The cucumber-nagios gem will install the cucumber-nagios-gen binary which we will use to create a project to hold our tests.

$ cucumber-nagios-gen project test_project

Here we've told the cucumber-nagios-gen binary to create a project called test_project. A project is a mini-application that contains the right directory structure and files to run our tests. Change into the resulting directory:

$ cd test_project

We then need to bundle some supporting gems into the project to allow it to be self-contained:

$ gem bundle

Now we have a local version of cucumber-nagios-gen installed in the bin directory of the project and we can use this to create some features to test:

$ bin/cucumber-nagios-gen feature www.google.com content

This creates a feature in a file called content.feature (each Cucumber file can contain one feature and must have a suffix of .feature).  Let's open this file and examine our feature:

Feature: www.google.com
  It should be up

  Scenario: Visiting home page
    When I go to http://www.google.com
    Then the request should succeed

Cucumber uses a business-readable domain-specific language called Gherkin to write its features. Let's deconstruct what each section of our feature means:

 Feature: Some terse yet descriptive text of what is desired
   In order to realize a named business value
   As an explicit system actor
   I want to gain some beneficial outcome which furthers the goal

   Scenario: Some determinable business situation
     Given some precondition
       And some other precondition
     When some action by the actor
       And some other action
       And yet another action
     Then some testable outcome is achieved
       And something else we can check happens too

   Scenario: A different situation
       ...

The feature starts with a description, in our case www.google.com, and then some text that describes the business value of the feature, that the website should be up.  Each scenario used to validate that business value is then listed, each with its own description and a series of steps that it involves.  There are three types of steps – Given, When or Then:

  1. Givens – put the system into a known state
  2. Whens – describe the key action that is being performed
  3. Thens – observe the outcomes

Which can be summarised as: Given some condition When I do this action Then I will see this outcome.

Each step in our scenario has to start with one of these types (but you don't need to use all of them) and you can see in our example feature we're using a When and a Then:

 When I go to http://www.google.com
 Then the request should succeed

Simple and plain English.  There is a bit more to Gherkin that we haven't touched on (you can read about it in the Gherkin documentation) but let's try to run our feature now using the cucumber-nagios binary (obviously you need to be connected to the Internet for the feature to work appropriately):

$ bin/cucumber-nagios features/www.google.com/content.feature
Critical: 0, Warning: 0, 2 okay | passed=2, failed=0, nosteps=0, total=2

We can see that we've selected and executed our feature and it has returned some Nagios plug-in output (which appears as Critical, Warning, or Okay) and that 2 steps are Okay (or passed). 
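If you're wondering how a handful of step results turns into that one line, here's a rough sketch.  The output format is copied from the run above, but the mapping is my assumption rather than something taken from the Cucumber-Nagios source: failed steps are treated as Critical, steps with no matching definition ("nosteps") as Warning, and the exit code follows the standard Nagios plug-in convention (0 OK, 1 WARNING, 2 CRITICAL).  The nagios_summary name is hypothetical.

```ruby
# Build a Nagios-style status line and exit code from step counts.
# Assumption: failed => Critical, nosteps (undefined steps) => Warning.
def nagios_summary(passed:, failed:, nosteps:)
  total = passed + failed + nosteps
  line = "Critical: #{failed}, Warning: #{nosteps}, #{passed} okay | " \
         "passed=#{passed}, failed=#{failed}, nosteps=#{nosteps}, total=#{total}"
  exit_code = if failed > 0
                2 # CRITICAL
              elsif nosteps > 0
                1 # WARNING
              else
                0 # OK
              end
  [exit_code, line]
end

code, line = nagios_summary(passed: 2, failed: 0, nosteps: 0)
puts line
puts "exit code: #{code}"
```

Everything after the pipe is Nagios performance data, which is what lets you graph pass and fail counts over time as well as alert on them.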

But wait a second, how have they passed?  We haven't written any code at all and it works?  Well as I mentioned Cucumber-Nagios contains a set of pre-defined steps for a variety of common tasks.  You can use these steps and not have to write any code.  Let's look at the associated step we've just used.  This pre-defined step is contained in the features/steps/webrat_steps.rb file in our project:

When /^I go to (.*)$/ do |path|
  visit path
end

You can see it's a very simple bit of code that uses a regular expression to check our feature file for some specific language, in this case the words "I go to URL".  The regular expression captures a URL and passes it to Webrat which runs the visit function and returns the result.  This is passed to the next step:

Then the request should succeed

Cucumber-Nagios also contains a set of pre-defined steps for handling the results, the Then steps.  These steps are contained in the features/steps/result_steps.rb file.  In this case we've used the following step:

Then /^the (.*) ?request should succeed/ do |_|
   success_code?.should be_true
end

This step checks the result of the When step and if it registered a success then the step passes and returns an Okay result.
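The matching itself is less magic than it looks.  Here's a toy version of the idea in plain Ruby.  To be clear, this is not Cucumber's real internals: the StepRegistry class and its return values are invented for illustration, and the "visit" is faked by recording the URL rather than fetching it.

```ruby
# Hypothetical miniature step registry, not Cucumber's actual implementation.
class StepRegistry
  def initialize
    @steps = []
  end

  # Register a pattern and the block to run when a step matches it.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Strip the Given/When/Then/And keyword, find the first matching
  # pattern and call its block with the captured groups.
  def run(line)
    text = line.strip.sub(/\A(Given|When|Then|And)\s+/, '')
    @steps.each do |pattern, block|
      match = pattern.match(text)
      return block.call(*match.captures) if match
    end
    :undefined
  end
end

VISITED = []
REGISTRY = StepRegistry.new

# Stand-in for Webrat's visit: just remember which URL was requested.
REGISTRY.define(/^I go to (.*)$/) do |path|
  VISITED << path
  :passed
end

# Stand-in for the success check: pass if something was visited.
REGISTRY.define(/^the (.*) ?request should succeed$/) do |_|
  VISITED.empty? ? :failed : :passed
end

puts REGISTRY.run('When I go to http://www.google.com')
puts REGISTRY.run('Then the request should succeed')
puts REGISTRY.run('Then I should see pigs fly')
```

The last line has no matching pattern, so it comes back as undefined.  That's the same situation you hit when a feature uses wording that none of the pre-defined steps recognise, and it's your cue to write a step of your own.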

Now, let's see how we can add another scenario to our feature.

Feature: www.google.com
  It should be up
  You should be able to click on the Videos link

  Scenario: Visiting home page
    When I go to http://www.google.com
    Then the request should succeed

  Scenario: Clicking on the Videos link
    When I go to http://www.google.com
      And I follow "Videos"
    Then I should see "Google Videos"

In our new scenario we've tested following a link on the Google site to the Google Videos site.  We've also used another Cucumber keyword, And, which is a cleaner way of chaining several Given, When or Then steps of the same type.

This new When step also uses a pre-defined step from Cucumber-Nagios:

When /^I follow "(.*)"$/ do |link|
   click_link(link)
end

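And a pre-defined Then step.  The step our scenario uses matches quoted text; a closely related pre-defined step, which matches a message by its CSS class, shows the pattern: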

Then /^I should see an? (\w+) message$/ do |message_type|
  response.should have_xpath("//*[@class='#{message_type}']")
end

We can then run our new scenario:

$ bin/cucumber-nagios features/www.google.com/content.feature
Critical: 0, Warning: 0, 5 okay | passed=5, failed=0, nosteps=0, total=5

And see that we've now got 5 steps that pass, including our 3 new steps.  Simple eh?  And readily extensible.

So we've seen that Cucumber and BDI allow infrastructure testing and can establish that the service is behaving correctly, but we can extend this concept further.  What if we also used it to test business logic and business rules?  Let's take a simple example, expressed here as a feature:

Feature: homeloansite.com
  It should be up
  And I should be able to find the current Rate

  Scenario: Checking the current interest rate
    When I visit "http://homeloansite.com"
    Then I should see the Rates section
    Then I should see current Rate
      And the current Rate should equal 5%

You'd need to write an additional step to return the Rate value in Cucumber-Nagios to get this example to work but assuming we had we could then run our new feature:

$ cucumber-nagios features/homeloansite.com/homeloan.feature
Critical: 0, Warning: 0, 4 okay | value=4.000000;;;;

In this feature we're checking four things:

  1. That the Home Loan site is up
  2. That it has a section called "Rates"
  3. That the section contains a Rate, and
  4. That the Rate equals 5%
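The additional step mentioned above needs some way to pull the rate out of the page.  Here's one possible sketch of that piece.  The extract_rate helper, the sample HTML and the step wording in the comment are all hypothetical; a real site's markup will need its own pattern, and the commented step assumes Webrat exposes the page through a response object as in the pre-defined steps.

```ruby
# Pull the first percentage that follows the word "Rate" out of page text.
# Returns a Float, or nil when no rate is found.
def extract_rate(page_text)
  match = page_text.match(/Rate\D*?(\d+(?:\.\d+)?)\s*%/i)
  match && match[1].to_f
end

# In a step file this helper could back a step along these lines
# (hypothetical wording, assuming Webrat's response object):
#
#   Then /^the current Rate should equal (\d+(?:\.\d+)?)%$/ do |expected|
#     extract_rate(response.body.to_s).should == expected.to_f
#   end

# Made-up markup standing in for the home loan page.
SAMPLE_PAGE = '<h2>Rates</h2><p>Current Rate: <b>5.00%</b> p.a.</p>'
puts extract_rate(SAMPLE_PAGE)
```

Keeping the parsing in a small helper like this means you can test it against saved copies of the page without going near Nagios or the live site.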

So in this single scenario we've killed a lot of birds with one cucumber.  We've confirmed the site is up, we've confirmed that the right content is being delivered and lastly we've confirmed that the information being delivered is correct.  This goes beyond our traditional infrastructure testing to confirm that a piece of business data, our home loan rate, is correct.  Using this test we could then configure Nagios to alert the appropriate business contact that the website was displaying the wrong rate.  And hey presto we're delivering business value.

This feature is just the tip of the iceberg.  We could do a whole collection of other things, for example test transaction limits for a finance application, confirm access controls for applications, check the output of reports, and anything else where business logic or rules are exposed and can be exercised and tested.

Lastly, it's important to remember we're not replacing our traditional checks when using Cucumber.  We still need to watch the low level components; we're just adding another, more powerful and insightful layer to our monitoring.  A layer that more accurately represents the value that you offer to the business than graphs of ICMP ping results.