The Tortoise and not the Hare – Part 1

January 2nd, 2010 by kartar

The production line is one of the marvels of the industrial era. I've always been fascinated with production lines in factories and how a product, like a car, gets constructed from individual components and grows until finally it rolls off the production line as a finished product.  In the last few years I've been thinking more and more about production lines and how they overlap with IT Operations.  So do they have anything in common?  Can you draw parallels between building a car and running an IT operation? Damn right you can!

Production lines construct assets from components and then sell these assets.  IT shops also construct assets, in the form of software, infrastructure and services.  These assets are also constructed from components to make a functioning whole and are then sold to customers.  The perfect example is hosting infrastructure.  A host is constructed from hardware (CPU, storage, networking), software (operating system, applications) and configuration data and then delivered to a customer for use (or "sale"). I'm going to look at the core principles of production lines, specifically some of the methodologies around their management, and see if they offer value in running IT organisations and operations. I'm also going to demonstrate some workflow and how to use some tools (like Puppet and others) to model these principles in your own IT shop.

The production line relies heavily on process and continuous flow to function efficiently.  The asset moves through the line having actions performed on it or components added to it. The objective is an uninterrupted flow from beginning to end with enough of the right components, processes and people being introduced at the right time.  Getting this production life cycle right isn't easy. As a result, the study and practice of production line management has become a science.

One of the most famous methodologies is the Toyota Production System or TPS. If you work anywhere where process is important – and that's pretty much every manufacturing organisation and almost every large corporate (including banks, insurance companies, transport, logistics and a flurry of others) – then you'll probably have heard of TPS and one of its integral components, Kanban. The TPS is a lean/Just In Time (JIT) production practice model.  Lean practices are ones where the focus is on the activities that deliver customer value.  Resources that are expended on other activities are always suspect and targets for elimination.  JIT attempts to improve ROI ("Return on Investment") by streamlining the production process and managing demand.  This reduces the amount of inventory ("parts") carried: only the parts needed are stored, and then only for the shortest possible time before being consumed for production.

So where does Kanban fit in? Kanban is a demand management and signalling system that uses physical signs to act as triggers between processes.  I've stolen a very simple kanban example from Wikipedia:

"A simple example of the kanban system implementation might be a "three-bin system" for the supplied parts (where there is no in-house manufacturing) — one bin on the factory floor (demand point), one bin in the factory store and one bin at the suppliers' store. The bins usually have a removable card that contains the product details and other relevant information — the kanban card. When the bin on the factory floor becomes empty, i.e, there is demand for parts, the empty bin and kanban cards are returned to the factory store. The factory store then replaces the bin on the factory floor with a full bin, which also contains a kanban card. The factory store then contacts the supplier’s store and returns the now empty bin with its kanban card. The supplier's inbound product bin with its kanban card is then delivered into the factory store completing the final step to the system. Thus the process will never run out of product and could be described as a loop, providing the exact amount required, with only one spare so there will never be an issue of over-supply. This 'spare' bin allows for the uncertainty in supply, use and transport that are inherent in the system."

Simple huh?  You can also readily scale this example by adding multiple bins (which each have their own kanban card).  This allows tight control of stock management (and hence costs!).
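
To make the loop concrete, here is a minimal sketch of the three-bin system in Ruby.  It is a toy model rather than anything Toyota or Wikipedia prescribes: the names (Bin, KanbanLoop, consume) are made up for illustration, and it assumes the supplier refills a returned bin instantly.

```ruby
# A toy model of the three-bin kanban loop. All names are hypothetical.
Bin = Struct.new(:card, :parts)

class KanbanLoop
  attr_reader :floor, :store, :supplier, :reorders

  def initialize(part, bin_size)
    @bin_size = bin_size
    # One full bin at each point in the loop, each carrying its kanban card.
    @floor    = Bin.new("#{part} card 1", bin_size)
    @store    = Bin.new("#{part} card 2", bin_size)
    @supplier = Bin.new("#{part} card 3", bin_size)
    @reorders = 0
  end

  # Take one part from the factory floor bin. An empty bin is the signal
  # (the "demand") that triggers the rest of the loop.
  def consume
    @floor.parts -= 1
    rotate_bins if @floor.parts.zero?
  end

  private

  def rotate_bins
    empty = @floor
    @floor = @store          # the store replaces the floor bin with a full one
    @store = @supplier       # the supplier's full bin is delivered to the store
    empty.parts = @bin_size  # assumption: the supplier refills the bin at once
    @supplier = empty
    @reorders += 1
  end
end

line = KanbanLoop.new("bolt", 5)
12.times { line.consume }
puts "floor=#{line.floor.parts} store=#{line.store.parts} reorders=#{line.reorders}"
```

Each bin keeps its own card as it moves around the loop, so the cards, not a central stock count, drive every reorder.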

So can we apply these concepts to IT infrastructure?  Hang onto your hats because in Part II of this series of posts I'm going to do exactly that.

Puppet 0.25.2 Release Candidate 3 out!

January 1st, 2010 by kartar

We've pounced on a few more bugs and Puppet 0.25.2 release candidate 3 is a go.  Please test hard.  The production release should be in a few days barring any more bugs being found.

Puppet 0.25.2 – release candidate 2 is out!

December 23rd, 2009 by kartar

We've powered our way through 118 tickets to get to 0.25.2 and have a release candidate in the wild – we actually have RC2 out because of a missing commit in RC1. 

Yes Mum, I’ll Behave: Beginning Behaviour Driven Infrastructure

December 21st, 2009 by kartar

So I like to think I know a bit about enterprise monitoring and configuration management.  I've done a lot of it over the years across multiple platforms and using a bunch of tools – both proprietary and open source.  I've even written two books about the open source tools, Nagios and Puppet.  But all this time I've been doing it wrong.  Really badly wrong.

The typical enterprise monitoring and configuration management set-up is generally something like this: central server(s) manage and monitor a number of services on local and/or remote hosts.  Digging down, for a web server these checks might be something like:

  • Is the Apache package installed and at the appropriate version?
  • Is the Apache service running?
  • Can I connect to the HTTP port and is HTML returned?

Multiply this by a few hundred iterations of hosts and types of services and you're probably looking at your typical Nagios, Puppet, Cfengine, Hyperic, Tivoli or Patrol set-up. Add logging, alerts, graphs and reporting and this is probably pretty close to the environment that most system administrators manage and monitor every day. All the bases covered, appropriate alerts when things go down, reporting for your management, etc, etc.

So that's all good right and we don't need to do anything more?  Nope, not quite.  All this monitoring misses something critical – we're not actually monitoring that the service does what it should.  Yes, it matters whether Apache is installed, the Apache service is running, and you can connect to HTTP but does this actually prove anything about the availability of the service we're managing and providing for our customers?  No again.  You can connect to the port, have the service running and still not be delivering the right content or providing the appropriate functionality to the customer. And ultimately that's what our jobs are all about – delivering service to the customer.  Whether internal ("the business") or an external customer, they don't care about the infrastructure.  Nor the technology, its configuration or anything else about the widgets that deliver the services they use.  They just want "technology" to be:

  1. Available,
  2. Functional, and
  3. Cost-effective.

To deliver (and measure) the first two items on that list we're going to need more than just a check that says the Apache server is up.  We need to demonstrate that the service delivered by that infrastructure was available to our customers AND functioning as intended.  If it isn't functioning as intended, all the availability in the world is meaningless because the customer isn't getting what they want.

(Needless to say most enterprise monitoring measures of "availability" are bogus.  Using an ICMP ping of a host, uptime or checking a process as a measure of availability merely demonstrates that the asset is up.  It doesn't demonstrate that the asset is performing the function it should, and hence doesn't actually measure "availability".)

All is not lost though, we have the technology, we can rebuild your monitoring environment: better, stronger and more relevant.  How?  By stealing someone else's idea.  You see developers face the same challenge of delivering appropriate functionality.  In their case an application may compile and run but produce incorrect output or worse no output at all.  Like our monitoring, this leaves our developer short on knowing whether they are delivering functionality to the customer. So to ensure that their applications do what they have promised, developers test them. 

There are lots of different kinds of testing: functional tests to confirm things work, performance testing, user acceptance tests to ensure user experience is suitable.  But one kind of testing has become increasingly important: behavioural testing.  Behavioural testing checks that each function, method or procedure not only works but behaves in the intended way. Developers call this methodology Behaviour Driven Development or Test Driven Development (BDD and TDD for short). In a BDD/TDD environment each component of your code is tested to ensure it is behaving correctly.  The basic element of this testing is called a unit test.  In BDD, unit tests are developed for each function to determine whether it is fit for use.  Let's look at a simple example of a function, one that adds numbers, and a unit test to confirm its behaviour is correct.  We'll start by articulating our function (in pseudo code).

def addition(val1,val2)
  # return the total so the caller (or a test) can check it
  return val1 + val2
end

So if our function compiles, executes and returns a result does that mean it works?  No, because we can't guarantee it returns the right result.  To overcome this gap in our knowledge, we devise a simple unit test (again in pseudo code).

test addition
  val1 = 4
  val2 = 6
  total = 10

  result = addition(val1,val2)
  if result != total then print "Function addition failed - incorrect total"
end

In our test we first set the input values and what the resulting output should be (known inputs and expected outputs like these are often called test fixtures; their object-orientated cousins, "mock" objects, simulate real objects in a controlled way).  We then run the function we'd like to test and check that the returned result matches the expected output.  If the result doesn't match then we return an error message and we know we need to fix the function. This combination of functional tests and behavioural testing means that not only do we ensure our application runs but that when it does run it is behaving correctly.
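
As a runnable counterpart to the pseudo code, here is a minimal sketch in Ruby.  It uses a hand-rolled check rather than a real test framework, and broken_addition and check_addition are names invented purely to show what a failing test looks like.

```ruby
# Correct implementation: returns the sum of its two arguments.
def addition(val1, val2)
  val1 + val2
end

# Deliberately broken implementation, included only to show a failing test.
def broken_addition(val1, val2)
  val1 - val2
end

# A hand-rolled check (not a real test framework): calls the named function
# with known inputs and compares the result with the expected total.
def check_addition(function_name)
  val1 = 4
  val2 = 6
  total = 10

  result = send(function_name, val1, val2)
  if result == total
    "#{function_name}: passed"
  else
    "#{function_name}: failed - expected #{total}, got #{result}"
  end
end

puts check_addition(:addition)
puts check_addition(:broken_addition)
```

Both functions run without error, but only the test tells us that one of them returns the wrong answer.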

So can we apply Behaviour Driven Development to our infrastructure to test that it is behaving correctly?  Enter Behaviour Driven Infrastructure. Behaviour Driven Infrastructure or BDI applies the principles of Behaviour Driven Development to the management of infrastructure.  And boy is it cool. Let's jump right in and see what a BDI check might look like.  Remember our Apache checks?  Let's design some simple behavioural checks to supplement these, checks that step up from monitoring our infrastructure into monitoring the behaviour of our service:

  • The site contains some static value or content
  • The site contains some dynamic content that can be validated, for example data drawn from a database
  • When I click or follow links I get the right pages returned
  • When I fill in a form the values are validated
  • When I press a button the form is submitted
  • When I select a field or drop-down the right values are populated

Notice the key difference between the checks we defined earlier and these checks?  These checks involve the behaviour of the service rather than its binary status.  Instead of the site being "up" we're testing that the site returns the right content or in other words that the site behaves correctly. We can see that determining tests for a website is relatively easy but what about other types of infrastructure and services?  You can develop similar tests for a wide variety of infrastructure:

  • SSH – check that a particular user can login, an inappropriate user fails and is logged or alerted
  • SMTP – check that the daemon receives an email and delivers it, check it rejects mail it should reject, check authentication works
  • IMAP – check you can retrieve email from a mailbox, check authentication works
  • MySQL/database/LDAP/directories – check you can query a record and that the record returned is correct
  • Load balancer – check connections are switched between hosts
  • DNS – check output of DNS queries is correct
  • Backups – backup and restore a file
  • Nagios/Enterprise Monitoring – check tests pass, fail, escalate, send notifications
  • Samba/NFS – create, change, delete a file
  • Sudo – check you can run a sudo command and check inappropriate sudo commands fail and log

Notice what we're trying to do?  We're testing that the service does what the customer expects it to do.  This not only proves the service is behaving the way we want it to but also demonstrates that the service is available.
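
To give a feel for one of these checks, here is a minimal sketch in Ruby of the "create, change, delete a file" test for a file share.  The method name is made up, and share_path is assumed to be a directory where the Samba or NFS share is already mounted; the example runs against a temporary directory so it works without a real share.

```ruby
require "tmpdir"

# Behavioural check for a file share: create, change and delete a file,
# verifying the result of each step. Returns a Nagios-style exit status
# (0 = OK, 2 = CRITICAL) and a message. `share_path` is assumed to be a
# directory where the share is already mounted.
def check_file_share(share_path)
  file = File.join(share_path, "bdi_check_#{Process.pid}.txt")

  File.write(file, "created")
  return [2, "CRITICAL: could not create file"] unless File.read(file) == "created"

  File.write(file, "changed")
  return [2, "CRITICAL: could not change file"] unless File.read(file) == "changed"

  File.delete(file)
  return [2, "CRITICAL: could not delete file"] if File.exist?(file)

  [0, "OK: created, changed and deleted a file"]
rescue SystemCallError => e
  # Permission errors, missing mounts, full disks and so on end up here.
  [2, "CRITICAL: #{e.message}"]
end

# Demonstrated against a temporary directory rather than a real share.
Dir.mktmpdir do |dir|
  status, message = check_file_share(dir)
  puts "#{message} (exit status #{status})"
end
```

The check exercises the share the way a user would, so a mounted but read-only or full share fails it even though a simple "is it mounted?" check would pass.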

So all of these tests sound easy when written down like this but how do we implement them? We're going to articulate our BDI tests in plain English using a tool called Cucumber and also introduce you to a spin-off tool called Cucumber-Nagios (which I talked about in a previous post). 

Cucumber is a behavioural testing framework used heavily in the Ruby community (and in the Java, .NET, Flex communities too) that is simple and easy to learn – even for non-developers. Cucumber-Nagios takes this a little further to combine Cucumber with built-in testing frameworks (web using Webrat and SSH using Net::SSH for example) and outputs test results as Nagios check data.  Perfect for immediate integration into your existing enterprise monitoring solution (and easily hackable to output other data formats too).  The beauty of Cucumber-Nagios is that it comes with pre-built test components that we can adapt to suit our environment. Cucumber has two components:

  1. Plain text tests called "features" which contain the different scenarios we want to test, and
  2. Supporting code called "steps" to actually test each "feature" and its associated scenarios.

Let's create a simple behavioural test for our website.  We start by installing Cucumber-Nagios via a Gem:

$ sudo gem install cucumber-nagios

On some distributions you may need to install gemcutter first:

$ sudo gem install gemcutter

$ sudo gem tumble

The cucumber-nagios gem will install the cucumber-nagios-gen binary which we will use to create a project to hold our tests.

$ cucumber-nagios-gen project test_project

Here we've told the cucumber-nagios-gen binary to create a project called test_project. A project is a mini-application that contains the right directory structure and files to run our tests. Change into the resulting directory:

$ cd test_project

We then need to bundle some supporting gems into the project to allow it to be self-contained:

$ gem bundle

Now we have a local version of cucumber-nagios-gen installed in the bin directory of the project and we can use this to create some features to test:

$ bin/cucumber-nagios-gen feature www.google.com content

This creates a feature in a file called content.feature (each Cucumber file can contain one feature and must have a suffix of .feature).  Let's open this file and examine our feature:

Feature: www.google.com
  It should be up

  Scenario: Visiting home page
    When I go to http://www.google.com
    Then the request should succeed

Cucumber uses a business-readable domain-specific language called Gherkin to write its features. Let's deconstruct what each section of our feature means:

 Feature: Some terse yet descriptive text of what is desired
   In order to realize a named business value
   As an explicit system actor
   I want to gain some beneficial outcome which furthers the goal

   Scenario: Some determinable business situation
     Given some precondition
       And some other precondition
     When some action by the actor
       And some other action
       And yet another action
     Then some testable outcome is achieved
       And something else we can check happens too

   Scenario: A different situation
       ...

The feature starts with a description, in our case www.google.com and then some text that describes the business value of the feature, that the website should be up.  Each scenario used to validate that business value is then listed, each with its own description and a series of steps that they involve.  There are three types of steps – Given, When or Then:

  1. Givens – put the system into a known state
  2. Whens – describe the key action that is being performed
  3. Thens – observe the outcomes

Which can be summarised as: Given some condition When I do this action Then I will see this outcome.

Each step in our scenario has to start with one of these types (but you don't need to use all of them) and you can see in our example feature we're using a When and a Then:

 When I go to http://www.google.com
 Then the request should succeed

Simple and plain English.  There is a bit more to Gherkin that we haven't touched on (you can read about it at that link) but let's try to run our feature now using the cucumber-nagios binary (obviously you need to be connected to the Internet for the feature to work appropriately):

$ bin/cucumber-nagios features/www.google.com/content.feature
Critical: 0, Warning: 0, 2 okay | passed=2, failed=0, nosteps=0, total=2

We can see that we've selected and executed our feature and it has returned some Nagios plug-in output (which appears as Critical, Warning, or Okay) and that 2 steps are Okay (or passed). 
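
For readers who haven't written a Nagios plug-in, here is a minimal sketch in Ruby of how a status line like this and the matching exit code could be built from step counts.  The method name is made up, and mapping failed steps to Critical and undefined steps ("nosteps") to Warning is my assumption about what the output means, not a description of the Cucumber-Nagios source.

```ruby
# Builds a Nagios-style status line and exit code from step counts.
# The method and argument names are made up for this sketch; the line
# format simply mirrors the output shown above.
def nagios_result(passed:, failed:, nosteps: 0)
  total = passed + failed + nosteps
  line = "Critical: #{failed}, Warning: #{nosteps}, #{passed} okay | " \
         "passed=#{passed}, failed=#{failed}, nosteps=#{nosteps}, total=#{total}"

  # Standard Nagios plug-in exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
  # Assumption: a failed step is critical, an undefined step is a warning.
  code = if failed > 0
           2
         elsif nosteps > 0
           1
         else
           0
         end
  [line, code]
end

line, code = nagios_result(passed: 2, failed: 0)
puts line
puts "exit code: #{code}"
```

Nagios reads the text before the pipe as the human-readable status and the text after it as performance data, while the exit code decides whether an alert is raised.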

But wait a second, how have they passed?  We haven't written any code at all and it works?  Well as I mentioned Cucumber-Nagios contains a set of pre-defined steps for a variety of common tasks.  You can use these steps and not have to write any code.  Let's look at the associated step we've just used.  This pre-defined step is contained in the features/steps/webrat_steps.rb file in our project:

When /^I go to (.*)$/ do |path|
  visit path
end

You can see it's a very simple bit of code that uses a regular expression to check our feature file for some specific language, in this case the words "I go to URL".  The regular expression captures a URL and passes it to Webrat which runs the visit function and returns the result.  This is passed to the next step:

Then the request should succeed

Cucumber-Nagios also contains a set of pre-defined steps for handling the results, the Then steps.  These steps are contained in the features/steps/result_steps.rb file.  In this case we've used the following step:

Then /^the (.*) ?request should succeed/ do |_|
   success_code?.should be_true
end

This step checks the result of the When step and if it registered a success then the step passes and returns an Okay result.
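
If the regular expression matching feels like magic, here is a minimal sketch in Ruby of the idea behind it.  This is a toy registry, not Cucumber's real implementation: StepRegistry, define and run are made-up names, and the example step only records the URL instead of fetching it.

```ruby
# A toy step registry, NOT Cucumber's real implementation. It only shows
# the idea: each step definition is a regular expression plus a block, and
# the captured groups become the block's arguments.
class StepRegistry
  def initialize
    @steps = {}
  end

  # Register a block to run for any line that matches the pattern.
  def define(pattern, &block)
    @steps[pattern] = block
  end

  # Find the first pattern that matches the line and call its block.
  def run(text)
    @steps.each do |pattern, block|
      match = pattern.match(text)
      return block.call(*match.captures) if match
    end
    :undefined # no definition matched this line of the feature
  end
end

registry = StepRegistry.new
visited = []

# Hypothetical step that just records the URL instead of fetching it.
registry.define(/^I go to (.*)$/) { |path| visited << path }

registry.run("I go to http://www.google.com")
puts visited.inspect
puts registry.run("I do something unknown").inspect
```

A line with no matching pattern comes back as undefined, which is roughly what the nosteps counter in the output is reporting.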

Now, let's see how we can add another scenario to our feature.

Feature: www.google.com
  It should be up
  You should be able to click on the Videos link

  Scenario: Visiting home page
    When I go to http://www.google.com
    Then the request should succeed

  Scenario: Clicking on the Videos link
    When I go to http://www.google.com
      And I follow "Videos"
    Then I should see "Google Videos"

In our new scenario we've tested following a link on the Google site to the Google Videos site.  We've also used another Cucumber keyword, And, which is a cleaner way of writing multiple Given-When-Then steps.

This new When step also uses a pre-defined step from Cucumber-Nagios:

When /^I follow "(.*)"$/ do |link|
   click_link(link)
end

And a pre-defined Then step:

Then /^I should see an? (\w+) message$/ do |message_type|
  response.should have_xpath("//*[@class='#{message_type}']")
end

We can then run our new scenario:

$ bin/cucumber-nagios features/www.google.com/content.feature
Critical: 0, Warning: 0, 5 okay | passed=5, failed=0, nosteps=0, total=5

And see that we've now got 5 steps that pass, including our 3 new steps.  Simple eh?  And readily extensible.

So we've seen that Cucumber and BDI allow infrastructure testing and can establish that a service is behaving correctly, but we can extend this concept further.  What if we also used it to test business logic and business rules?  Let's take a simple example, expressed here as a feature:

Feature: homeloansite.com
  It should be up
  And I should be able to find the current Rate

  Scenario: Checking the current interest rate
    When I visit "http://homeloansite.com"
    Then I should see the Rates section
    Then I should see current Rate
      And the current Rate should equal 5%

You'd need to write an additional step to return the Rate value in Cucumber-Nagios to get this example to work but assuming we had we could then run our new feature:

$ cucumber-nagios features/homeloansite.com/homeloan.feature
Critical: 0, Warning: 0, 4 okay | value=4.000000;;;;

In this feature we're checking four steps:

  1. That the Home Loan site is up
  2. That it has a section called "Rates"
  3. That the section contains a Rate, and
  4. That the Rate equals 5%

So in this single scenario we've killed a lot of birds with one cucumber.  We've confirmed the site is up, we've confirmed that the right content is being delivered and lastly we've confirmed that the information being delivered is correct.  This goes beyond our traditional infrastructure testing to confirm that a piece of business data, our home loan rate, is correct.  Using this test we could then configure Nagios to alert the appropriate business contact that the website was displaying the wrong rate.  And hey presto we're delivering business value.
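
The additional step mentioned earlier needs some way to pull the Rate out of the page.  Here is a minimal sketch in Ruby of that logic on its own, outside Cucumber.  The page layout, the method names and the regular expression are all assumptions made for illustration; a real step would work on whatever markup the site actually serves.

```ruby
# Pulls the first percentage that follows the word "Rate" out of page text.
# The page layout, the method names and the regular expression are all
# assumptions made for this sketch.
def current_rate(page_text)
  match = page_text.match(/Rate\D*?(\d+(?:\.\d+)?)\s*%/i)
  match && match[1].to_f
end

# True only if a rate was found and it equals the expected value.
def rate_correct?(page_text, expected)
  rate = current_rate(page_text)
  !rate.nil? && (rate - expected).abs < 0.0001
end

page = "<h2>Rates</h2><p>Current Rate: 5%</p>"
puts current_rate(page)
puts rate_correct?(page, 5.0)
puts rate_correct?("<p>Current Rate: 5.25%</p>", 5.0)
```

Wrapped in a Then step, a false result here is what would turn into a Critical in the Nagios output and an alert to the business contact.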

This feature is just the tip of the iceberg.  We could do a whole collection of other things, for example test transaction limits for a finance application, confirm access controls for applications, check the output of reports, and anything else where business logic or rules are exposed and can be exercised and tested.

Lastly, it's important to remember we're not replacing our traditional checks when using Cucumber.  We still need to watch the low level components; we're just adding another, more powerful and insightful layer to our monitoring.  A layer that more accurately represents the value that you offer to the business than graphs of ICMP ping results.

Puppet 0.25.1 Released!

November 22nd, 2009 by kartar

This post is a little late… but.

Puppet 0.25.1 – code name “zoot” – is now available. The 0.25.1 release is a maintenance release in the 0.25.x branch.

The release is available via tarball and gem.

Please report issues and feedback via the Reductive Labs Redmine site:

Please select an affected version of 0.25.1.

RELEASE NOTES

* We’ve clarified that the new ‘require’ function only works for 0.25.x clients. If the function is specified with 0.24.x or earlier clients the class will be included but the inherent dependency will not be created. A warning message will be generated informing you of this.

* Node regular expression matching rules have been clarified.

* The Nagios serviceescalation type now supports the use of the servicegroup_name attribute.

* The Puppet gem now installs all binaries to the ‘bin’ directory because Gems lack support for both a ‘bin’ and ‘sbin’ directory. Facter (version later than 1.5.1) is now also a dependency for the gem.

* The zone type now works with OpenSolaris

* You can now specify null values for environment variables in the cron type

* The Vim syntax highlighting now identifies new regex structures

CHANGELOG

* Bug #1538: Yumrepo sets permissions wrongly on files in /etc/yum.repos.d

* Bug #1719: Puppetd runtime increase dramaticilly after upgrading to 24.6

* Bug #1742: --color accepts parameters other than true, false, ansi, html – but produces “nil” output

* Bug #1900: Parsing of quoted $ in stdin

* Bug #1908: cron environment does not allow empty values

* Bug #2508: misleading error about ActiveRecord versions

* Bug #2534: Parser should raise an error if you specify the same property twice

* Bug #2600: Master under mongrel wrong number of arguments (3 for 2)

* Bug #2601: fqdn_rand raises exception when passed a seed

* Bug #2605: Ruby 1.8.1 compatibility – #1963 fix uses method not in 1.8.1

* Bug #2606: Gems can’t handle binaries in the sbin directory

* Bug #2607: 0.25 gem does not have facter as a dependency

* Bug #2608: install.rb will not run on ruby 1.9.1 due to ftools being deprecated

* Bug #2612: vim syntax highlighting of new regex language features

* Bug #2613: Autorequire fails when a directory’s path has a trailing /

* Bug #2615: YAML sometimes modifies the contents of string data

* Bug #2616: Locking error in tagmail

* Bug #2618: Spurious test falures when testing redhat service providers on debian varients

* Bug #2619: Fresh 0.25.0 client cannot ‘authenticate’ to 0.25.0 puppetmaster.

* Bug #2620: Regex problem in puppetmaster auth.conf

* Bug #2621: possible JSon serialization issue (on debian/lenny/amd64)

* Bug #2622: puppetdoc returns undefined method ‘[]’

* Bug #2626: Unhelpful error message

* Bug #2627: Node regular expressions only work in some cases

* Bug #2632: require doesnt seem to work

* Bug #2634: nagios type serviceescalation should support servicegroup_name

* Bug #2637: SSL socket race condition under webrick

* Bug #2638: inconsistent behaviour when more than one “node /foo/ { }” stanza matches.

* Bug #2639: Fail to store reports in simple default config

* Bug #2640: runit service provider does not create symlinks

* Bug #2642: runit service provider doesn’t have a restart command

* Bug #2648: macauthorization provider spuriously changes values when not needed.

* Bug #2651: Directory permissions on man pages can be incorrect

* Bug #2652: syntax error in lib/puppet/util/selinux.rb according to Fedora 11 ruby 1.8.6

* Bug #2654: Confusing error message when a provider lacks a feature

* Bug #2656: Puppet --parseonly tests hang forever

* Bug #2661: puppetd exits if the master is unreachable.

* Bug #2664: regexp parse error

* Bug #2665: regex problem with package names containing ++

* Bug #2668: Too many facts: request-URI Too Large

* Bug #2672: Cannot have underscores in node name

* Bug #2674: createpackage.sh: problem finding install.rb

* Bug #2676: lib/puppet/agent.rb apparent typo

* Bug #2679: Possible regression

* Bug #2681: “Duplicate generated resource;skipping” for each managed resource

* Bug #2685: Got an uncaught exception of type TypeError

* Bug #2686: ActiveSupport >= 2.3.3 forces use of defective JSON library

* Bug #2688: macauthorization provider now doesn’t deal with booleans correctly.

* Bug #2689: Running puppet as non-root => getting rid of all those ownership warnings

* Bug #2691: “Could not retrieve catalog: HTTP-Error: 500 Internal Server Error” with tagged exported resources

* Bug #2697: provider/portage.rb: update-eix is deprecated

* Bug #2698: provider/portage.rb: format string has changed (again)

* Bug #2699: Configurable port in the included Red Hat init script is broken

* Bug #2702: puppetdoc rdoc mode fails if outputdir not specified

* Bug #2707: ‘config_version’ should behave better on failure

* Bug #2711: Storeconfigs don’t work with puppet command

* Bug #2734: classfile is only 1 byte big

* Bug #2735: External node classes aren’t added to the class list on compile startup

* Bug #2736: Ssh_authorized_key target changed?

* Bug #2737: The zone provider needs to get acquainted with OpenSolaris

* Bug #2739: puppetmasterd 0.25.1rc2 is not logging anywhere

* Bug #2745: fakedata iteration in specs is borked.

* Bug #2750: puppetd: setting the :cacrl to ‘false’ is deprecated

* Bug #2751: Red Hat initscripts kill an independently started puppetd/puppetmasterd

* Bug #2752: require function does not work in ‘puppet’

* Bug #2753: fileserver.conf allow/deny directives not honored for [modules], [plugins]

* Feature #2393: We should maintain a dynamically-built ‘next’ branch

Getting Help for Puppet and Facter

November 7th, 2009 by kartar

I just thought I’d mention some of the places you can get help with Puppet and Facter.

The first is the Puppet users mailing list (and for development related questions – the puppet-dev list too)

Also available is the #puppet IRC channel on Freenode where a lot of helpful people lurk and can answer questions (needless to say a lot of really interesting sysadmin related rant^H^H^H discussion takes place there too).  Feel free to join and jump right in with a question and if you’re pasting in configuration or log output don’t forget to use the pastie bot or link to a pastie of your data.

You can find the documentation at the Wiki (we’re working on a new one – really, we are…).

Some useful links are the Configuration and Type References and the Language Tutorial.

Finally, if you’ve got a bug or an error message or you’re just stuck and can’t find help in some of the places I’ve just mentioned then we’d love it if you would log a ticket at the Reductive Labs Redmine site:

Specifically you can log issues for Puppet or Facter.

Please remember to include the Puppet (or Facter) version you are using (select it from the Affected Version drop-down), your platform and any log or trace output you have.  We recommend running your master and client with the --verbose --trace --debug options to get the most possible data out before logging the ticket.  That’ll help us resolve your issue.

And obviously I’d be remiss if I didn’t mention (disclaimer – I don’t work for Reductive Labs but Luke buys me drinks and I’d like him to be able to continue to do that) that Reductive does sell support for Puppet.

Hope that helps someone!

San Francisco and Puppetcamp

October 5th, 2009 by kartar

So I am at SFO waiting to board my plane back to Australia – it’s been a long, long, long eight days.  I arrived in SFO on Saturday morning and spent a bunch of days sightseeing, working, doing a little writing and then actually putting together the slides and videos for my talk on Thursday.  Over the course of the week I ate loads of Mexican food (and other good foods!), drank some good beer, found a nice dive bar – hi Lillian and other bar flies, visited Borderlands Books, and did touristy stuff like the Castro, Haight, Alcatraz and some shopping. Then came Puppet Camp on the Thursday and Friday.

The Puppet Camp itself was great … actually better than great … excellent.  Not just because of the good technical value (the signal to noise was one of the highest of any conference I’ve been to) but also the good conversations and putting names to faces – hi Brice, Paul x2, Rein, Markus, Beth, Dan, Nigel (“get a dog up ya”), Andrew x2, Paul, Miah, Ben, David, Deepak, Alessandro, Carl, Michael, and probably a dozen others I am forgetting.  Also great to see Luke and Teyo.

The structure of the event was particularly pleasing … structured talks in the morning and unconference sessions in the afternoon – some great topics discussed there too and you’ll see the results of these sessions feed back to the mailing list (some already have been) and reflected in code and feature tickets, for example the Facter “refacter” took a big step forward thanks to Paul and Rein sitting down and nutting out ideas.

The conference was also more than just Puppet – it was a gathering of serious sys admin, ops, development and engineering players who are serious about infrastructure management (and some heavy players in the “infrastructure == code” world) and life cycle.  There were a lot of “war stories” and lessons learnt being shared – I quite frequently overheard “Oh! I never thought of it that way” or “Yeah we have the problem too – this is how we handle it…”.  Even saw some serious “Oh I get it now!” moments – during Brice’s talk on Stored Configuration for example and Paul Nasrat’s discussion on Facter and RSpec testing (“Testing manifests with RSpec? Oh wow…”).

It was also great to see a lot of “dev” and “ops” people talking openly and bridging the (potentially non-existent?) divide between their roles in the life cycle – it is through these sorts of interactions between sometimes divergent views that progress is made and solutions get generated.  The differing perspectives on the nature of the problems themselves also sparked some really spirited and thoughtful discussion – I expect to see some interesting blog posts come out of the conference.

Lastly, we even managed to push some code … stored Configuration using Oracle XE will be in the next major release – “Rowlf”.

I am greatly looking forward to the next one … quick lessons learnt for me would be: somewhere closer to Downtown, skip dinner on the first night, keep the 50/50 structured/unconference|OpenSpaces model, and that SFO is a good place for it.

Thanks to the team at Reductive for organising it and well done!

Metaparameter Reference Added

September 29th, 2009 by kartar

A small change in the way the Puppet reference documents are structured in the wiki.  We’ve split out the Metaparameter reference documentation from the Type Reference documentation and created a new page to hold it.

SysAdmin mini-conf CFP extended

September 29th, 2009 by kartar

I’m helping (a little) organise the SysAdmin mini-conf at linux.conf.au 2010 (Wellington, NZ) and we’re currently in “call for papers” mode.  We’re looking for a range of talks – short, medium, long – on cool SysAdmin related things:

Systems Administration, Backups, Security, Troubleshooting, Buying Decisions, Virtualisation, Enterprise Monitoring and Management, Identity Management, Web and Email management, Wiki, Clustering and High Availability, Log Management, Spam and Virus Filtering, VOIP, Ticketing systems, Bootstrapping and automated installation, Configuration Management and packaging.

If you do something cool in one of these areas I think you definitely should submit!  Submissions close 15/10/2009 so get in quick.

San Francisco – Puppet Camp

September 28th, 2009 by kartar

I am in sunny San Francisco for Puppet Camp.  Looking forward to meeting a whole bunch of Puppeteers in person finally. Woot!

Actually got a lot of work done on the plane – upgraded to business class – on Pro Puppet. Chapters 1 & 2 are pretty much done and I’ve made a start on Chapter 3.  Since I’m not actually technically scheduled to start writing yet that’s pretty damn good. I rock. :)

In the meantime here are two pictures that took my fancy – the cleverly named Urbun Burger – I bet the Urban Burger guys in Melbourne are cursing that they didn’t think of that… :) And a whole store dedicated to Halloween costumes…