Monitoring Survey 2015 - Metrics

2015-08-10

In the previous posts I talked about the tools people used in monitoring, the demographics, and what environments people monitor. In this post I am going to look at the questions around collecting metrics and what respondents use those metrics for.

As I’ve mentioned in previous posts, the survey got 1,116 responses of which 884 were complete.
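For reference, those two figures imply a completion rate of roughly 79%. A quick sketch of the arithmetic (the variable names are mine):

```python
# Response counts quoted in the post.
total_responses = 1116
complete_responses = 884

# Share of responses that were complete, as a percentage.
completion_rate = complete_responses / total_responses * 100

print(f"{completion_rate:.1f}% of responses were complete")
```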

This post will cover the questions:

7. Do you collect metrics on your infrastructure and applications?
8. What tools do you use to collect metrics?
9. What tools do you use to store your metrics?
10. What tools do you use to visualize your metrics?
11. If you collect metrics, what do you use the metrics you track for?

Collecting Metrics

Question 7 asked if the respondents collected metrics. It was a Yes/No question.

We can see that the overwhelming majority, 88% in fact, of respondents collect metrics (slightly down from 90% last year). That continues to be a pretty conclusive indication that metrics matter.

I also broke the responses down by organization size. I was curious to see which sizes of organization were least likely to collect metrics.

We can see that there is a pretty even distribution of people who do not collect metrics across organization sizes.

Metric collection tools

I also asked respondents to tell me about the tools they used to collect metrics. There was a choice of potential tools and an Other option. The choice of tools included:

collectd
Cube
DataDog
Ganglia
Librato
Munin
New Relic
OpenTSDB
StatsD

We can see that both collectd and StatsD are heavily used with New Relic coming in third, in keeping with the data revealed in the tool analysis results.

The results of the Other option were also interesting. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics collection tools - Other	#
In-house	77
Diamond	26
Sensu	23
Zabbix	19
ELK	17
Cacti	16
Nagios	13
Check_MK	13
Centreon	11
pnp4nagios	9
Splunk	9
SolarWinds	8
AppDynamics	7
Prometheus	6
Icinga2	6
NetCrunch	6
Shinken	5
Zenoss	5
jmxtrans	5
DropWizard	4
Observium	4
Dataloop	4
OpenNMS	4
Riemann	3
Coda’s Metrics	3
Cloudwatch	2
OMD	2
Dynatrace	2
Smokeping	2
Graphite	2
Stackdriver	2
Xymon	2
CopperEgg	2
Ganglia	2
LogicMonitor	2
SignalFX	2

The high number of respondents building their own metrics collection tools (77 reported having in-house tooling) is interesting. It potentially suggests that there is still a segment of the market that isn’t happy with the available tooling.

Also interesting was the support for Diamond, a Python-based metrics collection tool originally written by the Brightcove team and now maintained as a separate open source project.
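The "occurred more than once" filter used for the Other answers could be sketched roughly as follows. This is only an illustration of the idea, not the script used for the survey; the function name `tally_other` and the sample answers are hypothetical.

```python
from collections import Counter


def tally_other(responses, min_count=2):
    """Count free-text tool names and keep those seen at least min_count times."""
    # Normalize case and whitespace so "diamond " and "Diamond" count together.
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    kept = [(name, n) for name, n in counts.items() if n >= min_count]
    # Most frequent first, ties broken alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical sample answers, not the survey data.
sample = ["Diamond", "diamond ", "Sensu", "In-house", "in-house", "In-House", "Xymon"]
print(tally_other(sample))
```

In practice free-text answers also need alias handling (for example, deciding whether "homegrown" and "in-house" are the same answer), which this sketch leaves out.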

Metric storage tools

We also asked respondents to name the tools they used to store metrics. The options for the question included:

DataDog
Graphite
Hosted Graphite
InfluxDB
Librato
OpenTSDB
RRDtool

There was also an Other option we’ll report below.

The clear winner here is Graphite. As one of the longer-standing tools in the metrics space, it’s not overly surprising that it is so well represented. Also present in large numbers is RRDtool, an even older tool in the metrics space. The newer generation of tools is represented by InfluxDB.

These are the responses to the Other option. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics storage tools - Other	#
ELK	28
In-house	27
Splunk	14
Zabbix	14
New Relic	9
MySQL	8
Prometheus	8
Cacti	8
SignalFX	7
AppDynamics	6
NetCrunch	6
Dataloop	5
SolarWinds	5
Stackdriver	4
Zenoss	4
Cassandra	4
CopperEgg	3
MSSQL	3
Ganglia	3
postgreSQL	2
Circonus	2
LogicMonitor	2
Check_MK	2
pnp4nagios	2
SPM	2
OpenNMS	2
kairosdb	2
Xymon	2
Redis	2

It’s interesting to note the number of people using the ELK stack and in-house tools to store their metric data. I’ve been seeing a lot of tools and services converting data and metrics into Logstash’s JSON format, using Logstash as a filtering router and Elasticsearch as storage.
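As a rough illustration of what such a conversion might look like, here is a minimal sketch that wraps one metric sample in a Logstash-style JSON event. The `@timestamp` and `@version` fields follow Logstash's event convention; the function name `metric_to_event` and the `host`, `metric` and `value` field names are assumptions of mine, not something reported by respondents.

```python
import json
from datetime import datetime, timezone


def metric_to_event(name, value, host, when=None):
    """Build a JSON document shaped like a Logstash event for one metric sample."""
    # Assumes `when`, if given, is a timezone-aware UTC datetime.
    when = when or datetime.now(timezone.utc)
    event = {
        # @timestamp and @version are the standard Logstash event fields.
        "@timestamp": when.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "@version": "1",
        # The remaining field names are illustrative assumptions.
        "host": host,
        "metric": name,
        "value": value,
    }
    return json.dumps(event)


sample_time = datetime(2015, 8, 10, tzinfo=timezone.utc)
print(metric_to_event("cpu.load.1m", 0.42, "web01", sample_time))
```

An event like this could then be shipped to Logstash for filtering and routing, and indexed in Elasticsearch for storage.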

Metric visualization tools

The last of the tool questions focused on metrics visualization tools.

Respondents had a choice of the following tools:

D3
Grafana
Graphene
Graphite
Highcharts
Rickshaw
Tessera

Respondents could also select an Other option and specify other tools.

Here Grafana is a clear favorite, likely given its ability to sit on top of Graphite, InfluxDB and OpenTSDB. The next largest tool was Graphite itself and then, after a long drop-off, the D3 JavaScript library.

These are the responses to the Other option. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics Visualization tools - Other	#
In-house	54
ELK	35
pnp4nagios	27
DataDog	24
Cacti	22
Zabbix	17
Splunk	13
Munin	13
New Relic	10
Ganglia	8
Observium	7
Librato	7
NetCrunch	7
Centreon	6
AppDynamics	6
SolarWinds	6
Dataloop	5
RRDTool	5
Dashing	5
OpenNMS	5
SignalFX	4
Stackdriver	4
Promdash	4
Check_MK	4
MRTG	3
pnp	3
Nagios	3
Circonus	3
Graphite	3
Tableau	3
CopperEgg	3
Xymon	3
Metrilyx	2
Riemann	2
Zenoss	2
LogicMonitor	2
SPM	2
Nagiosgraph	2
OpenTSDB	2
StatusWolf	2
Visage	2

Again present are a lot of in-house tools and the ELK stack in the form of Kibana. Given the presence of lots of Nagios users it’s also not a surprise to see pnp4nagios represented.

The purpose of metrics collection

I also asked respondents why they collected metrics. As with last year I was curious whether respondents were collecting data for performance analysis or as a fault detection tool. There’s a strong movement in more modern monitoring methodologies to consider metrics a fault detection tool in their own right. I was interested to see if this thinking had grown from last year.

Respondents were able to select one or more choices from the following list:

Performance analysis and trending
Fault and Anomaly detection
Capacity Planning
A/B Testing
We don’t do anything with collected metrics
Other

If respondents had answered “No” to the earlier question, indicating that they did not collect metrics, the survey logic skipped them past this question to the next one.

I’ve produced a summary table of respondents and their selections.

Metrics Purpose	%
Performance analysis and trending	63%
Fault and Anomaly detection	53%
Capacity Planning	45%
A/B Testing	11%
We don’t do anything with collected metrics	3%

We can see that 63% of respondents specified performance analysis and trending as a reason for collecting metrics. Below that, 53% of respondents specified that they used metrics for fault and anomaly detection. This is 10% lower than in last year’s survey. The next largest group, 45%, used metrics for capacity planning.

A very small group, 11%, used metrics for A/B testing.
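Because respondents could pick more than one purpose, each percentage is the share of respondents who selected that option, so the figures do not sum to 100% (the five listed options add up to 175%). A minimal sketch of that calculation, using a hypothetical helper `selection_shares` and made-up answers rather than the survey data:

```python
def selection_shares(responses, options):
    """Percentage of respondents who picked each option in a multi-select question."""
    total = len(responses)
    if total == 0:
        return {option: 0 for option in options}
    return {
        # Count the respondents whose selection set contains this option.
        option: round(100 * sum(option in picked for picked in responses) / total)
        for option in options
    }


options = [
    "Performance analysis and trending",
    "Fault and Anomaly detection",
    "Capacity Planning",
]
# Hypothetical answers from four respondents, not the survey data.
responses = [
    {"Performance analysis and trending", "Capacity Planning"},
    {"Performance analysis and trending", "Fault and Anomaly detection"},
    {"Fault and Anomaly detection"},
    {"Performance analysis and trending", "Fault and Anomaly detection", "Capacity Planning"},
]
shares = selection_shares(responses, options)
print(shares)
print(sum(shares.values()))  # exceeds 100 because selections overlap
```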

I also summarized the Other responses as a table:

Metrics Purpose - Other	#
Reporting	5
Dashboards	4
Alerting	3
Business KPIs	2
Slow call traces	1
Marketing	1
Retrospectives	1
Power management	1
Fault diagnosis	1
Incident response	1
Billing	1

P.S. I am also writing a book about monitoring.

The posts: