A Whole Lot of Heka

As most people know I am a logging fanboi. I’ve been endlessly fascinated with logs and log management. Last year I even took that a step too far and wrote a book about Logstash. Consequently, I am always looking at new logging tools, tricks and techniques. One of those popped up a while ago: Heka.1 When I saw Heka last year I made a note to look at it more closely. But due to … well life … I never got around to it. It was mentioned to me again this week and since I had a few spare cycles I thought I’d install it and try it out.2

Heka is written in Go by the Mozilla Services Team and released as open source under the Mozilla Public License (v. 2.0). Like Logstash, it is built as a core logging engine with plugins for inputting, decoding, filtering, encoding and outputting data. Heka plugins can be written in Go or, in some cases, Lua.

Installing Heka

Heka is available as binary and package installers for Linux and OS X in a variety of flavors (Tarball, DMG, DEB, and RPM) or you can install it from source. I’d recommend sticking with the binaries or packages, as the source installation is complex and prerequisite-heavy.

You can download the binary or package of your choice from here. I’m going to install a DEB package on an Ubuntu host for my local testing.

$ wget https://github.com/mozilla-services/heka/releases/download/v0.7.1/heka_0.7.1_amd64.deb
$ sudo dpkg -i heka_0.7.1_amd64.deb

This will install the Heka daemon, hekad, to /usr/bin and some supporting files and binaries. It doesn’t install any service management or configuration so you’ll need to provide that yourself.

Configuring Heka

Now we’ve got it installed let’s try running the hekad daemon.

$ hekad
-config="/etc/hekad.toml": Config file or directory. If directory is specified then all files in the directory will be loaded.
-version=false: Output version and exit

Okay - looks like we need some configuration to get started. The file extension on the /etc/hekad.toml file is interesting3 as it indicates the project is using the TOML configuration format.

So let’s create a Heka configuration. What we want our example configuration to do is:

  1. Watch /var/log/auth.log.
  2. Grab any incoming events.
  3. Send those events to STDOUT.

To do this let’s create a hekad.toml file in /etc.

$ vi /etc/hekad.toml

And populate our file with configuration:

[LogstreamerInput]
log_directory = "/var/log"
file_match = 'auth\.log'

[PayloadEncoder]
append_newlines = false

[LogOutput]
message_matcher = "TRUE"
encoder = "PayloadEncoder"

The TOML format breaks our configuration into sections marked with [section] blocks. Each section we’ve specified correlates with a Heka plugin. We could specify a [hekad] section for configuring the Heka daemon itself. Let’s step through each section.
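As a rough sketch of that optional [hekad] section, something like the following should work. The maxprocs and base_dir options are how I understand the Heka documentation, and the values are purely illustrative, so treat both as assumptions and check them against the docs for your version.

```toml
# Optional global settings for the hekad daemon itself (illustrative values).
[hekad]
# Assumed option: how many CPU cores hekad may use.
maxprocs = 2
# Assumed option: working directory where Heka keeps state between restarts.
base_dir = "/var/cache/hekad"
```

Our simple example doesn't need any of this, which is why the configuration above leaves the section out.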

The [LogstreamerInput] section defines a LogstreamerInput input plugin. The LogstreamerInput input tails one or more log files, in our case the /var/log/auth.log file, and grabs any incoming events for processing by Heka.

In our case each line will be grabbed and become the payload for a Heka event. Usually we’d also have a decoder plugin specified in the input. A decoder plugin parses the contents of the line and extracts useful data from it. For example, you might use an rsyslog decoder to extract Syslog information from our log file. This adds context from each line, like the process and priority.
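For illustration, here is an untested sketch of what wiring in such a decoder might look like. It assumes the Lua-based rsyslog decoder that ships with Heka (lua_decoders/rsyslog.lua), the decoder option on the input, and a host using rsyslog's traditional file format. The template and tz values are placeholders to adjust for your own host, and script_type may not be needed in every release.

```toml
[LogstreamerInput]
log_directory = "/var/log"
file_match = 'auth\.log'
# Assumed option: hand each line to the decoder defined below.
decoder = "RsyslogDecoder"

# Assumed: the sandboxed Lua rsyslog decoder bundled with Heka.
[RsyslogDecoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders/rsyslog.lua"

[RsyslogDecoder.config]
# Placeholder: must match the rsyslog template your host actually writes with.
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
# Placeholder timezone for timestamps that lack one.
tz = "UTC"
```

The example in the rest of this post skips the decoder, so each event simply carries the raw line as its payload.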

So thus far we’ve defined an input to gather events and processed those events into Heka’s event format. Now we want to do something with the events. Here we could filter them or output them in a variety of formats. In our case we’re going to output events to STDOUT. To do this we define these two sections:

[PayloadEncoder]
append_newlines = false

[LogOutput]
message_matcher = "TRUE"
encoder = "PayloadEncoder"

First we define an encoder plugin called PayloadEncoder. Encoders turn Heka events into other formats, for example to generate alert events. In this case the PayloadEncoder plugin extracts the payload from the Heka event and converts it into a byte stream.

Next, we’ve defined the LogOutput output plugin. This plugin logs messages to STDOUT using Go’s log package. Inside the output plugin we’ve used the message_matcher field to match messages. In our case we’ve used TRUE to grab all messages. You can configure this field in output plugins to make a variety of decisions about what messages to process with the output.
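As a hedged example of a more selective matcher, the sketch below would only pass along events whose payload mentions sshd. The regular expression is made up for illustration, and the matcher syntax (the =~ operator with a /regex/ against the Payload variable) is how I understand the Heka message matcher documentation, so verify it before relying on it.

```toml
[LogOutput]
# Assumed matcher syntax: only events whose payload contains "sshd" reach this output.
message_matcher = "Payload =~ /sshd/"
encoder = "PayloadEncoder"
```

Matchers can also, as far as I can tell, test other message variables such as Type, Logger and Severity, and combine conditions with && and ||.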

The Heka configuration we've built also compares favorably with the equivalent Logstash configuration:

input {
  file { path => "/var/log/auth.log" }
}

output {
  stdout { codec => rubydebug }
}

Running Heka

Now we’ve got a simple configuration let’s run Heka and see what happens. We start Heka by running the hekad binary and specifying the location of our configuration file with -config.

$ hekad -config="/etc/hekad.toml"
2014/09/09 14:24:44 Pre-loading: [LogstreamerInput]
2014/09/09 14:24:44 Pre-loading: [PayloadEncoder]
2014/09/09 14:24:44 Pre-loading: [LogOutput]
2014/09/09 14:24:44 Pre-loading: [ProtobufDecoder]
2014/09/09 14:24:44 Pre-loading: [ProtobufEncoder]
2014/09/09 14:24:44 Loading: [ProtobufDecoder]
2014/09/09 14:24:44 Loading: [PayloadEncoder]
2014/09/09 14:24:44 Loading: [ProtobufEncoder]
2014/09/09 14:24:44 Loading: [LogstreamerInput]
2014/09/09 14:24:44 Loading: [LogOutput]
2014/09/09 14:24:44 Starting hekad...
2014/09/09 14:24:44 Output started:  LogOutput
2014/09/09 14:24:44 MessageRouter started.
2014/09/09 14:24:44 Input started: LogstreamerInput
2014/09/09 14:24:44 Sep  9 14:15:01 docker CRON[24650]: pam_unix(cron:session): session opened for user root by (uid=0)

We can see that Heka has loaded some required plugins and started the daemon. The LogOutput output and the LogstreamerInput input have also both been started. The LogstreamerInput opens the /var/log/auth.log file and reads it from the first entry. If the file has previously been opened by the input then it’ll resume from that point.

We can then see a line from our /var/log/auth.log file outputted via the LogOutput output to STDOUT. So it looks like our very simple configuration worked and we’re getting the right output.

Conclusions

Overall, the whole project is still pretty young; for example, it has nowhere near the corpus of plugins and integrations that Logstash provides. I find the configuration language somewhat cumbersome but that is likely a reaction to my familiarity with Logstash’s configuration format rather than an inherent flaw. Like Logstash, Heka does have a web console but it lacks the power of Kibana, although it is also possible to output Heka logs to Elasticsearch and use Kibana on top. I’ve always thought Logstash’s natural flow into Elasticsearch makes search very intuitive, and being able to replicate it here potentially makes Heka a seamless drop-in for an end user.

The documentation is detailed and solid and certainly has a stylistic edge on Logstash’s somewhat rough visual presentation. The Heka documentation could do with some easier getting started material and more tutorial-oriented material. The project could also do with some better packaging to add service management and ensure it works out of the box.

It does have some fundamentally interesting differences to Logstash. It’s written in Go which is likely to result in pretty solid performance and make concurrency a lot simpler. I find Logstash fast but JRuby in the JVM is a pretty hefty runtime and often a lot of overhead. I also very much like the dynamic plugin model. Having to stop and start Logstash to reconfigure it is one of my pet peeves about Logstash. I’m going to continue to keep an eye on Heka and see how it evolves.

  1. Apparently named for the Hekatonkheires - hundred-armed giants from Greek mythology.

  2. Thanks to all the folks in the #heka channel on Freenode who answered my dumb questions too.

  3. And/or annoying. Please stop inventing configuration file formats folks. You know who you are people.
