opentsdb - Metrics for a distributed world Oliver Hankeln / gutefrage.net @mydalon
Who am I? Senior Engineer - Data and Infrastructure at gutefrage.net GmbH. Did software development before that. DevOps advocate.
Who is Gutefrage.net? Germany's biggest Q&A platform. #1 German site (mobile), about 5M unique users. #3 German site (desktop), about 17M unique users. More than 4 million page impressions per day. Part of the Holtzbrinck group. Running several platforms (Gutefrage.net, Helpster.de, Cosmiq, Comprano, ...)
What you will get Why we chose opentsdb What is opentsdb? How does opentsdb store the data? Our experiences Some advice
Why we chose opentsdb
We were looking at some options

                   Munin   Graphite   opentsdb   Ganglia
Scales well        no      sort of    yes        yes
Keeps all data     no      no         yes        no
Creating metrics   easy    easy       easy       easy
Bingo! We have a winner!

                   Munin   Graphite   opentsdb   Ganglia
Scales well        no      sort of    yes        yes
Keeps all data     no      no         yes        no
Creating metrics   easy    easy       easy       easy
Separation of concerns $ unzip; strip; touch; finger; grep; mount; fsck; more; yes; fsck; fsck; fsck; umount; sleep
The ecosystem: the app feeds metrics in via RabbitMQ. We base Icinga checks on the metrics. We are evaluating Etsy's Skyline for anomaly detection. We deploy sensors via Chef.
opentsdb: written at StumbleUpon, but open source. Uses HBase (which is based on HDFS) as its storage backend. Distributed system (multiple TSDs).
The big picture [architecture diagram]: tcollector feeds data into the TSDs (really a cluster of multiple TSDs), which store it in HBase; the UI and the API talk to the TSDs.
Putting data into opentsdb

$ telnet tsd01.acme.com 4242
put proc.load.avg5min 1382536472 23.2 host=db01.acme.com
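The same thing can of course be scripted instead of typed into telnet. A minimal Python sketch of the text protocol shown above (the `put_line`/`send_put` helpers and the hostnames are illustrative, not part of opentsdb itself):

```python
import socket
import time

def put_line(metric, value, timestamp=None, **tags):
    """Build a line in opentsdb's telnet 'put' format:
    put <metric> <timestamp> <value> <tag>=<value> ..."""
    timestamp = int(timestamp if timestamp is not None else time.time())
    tag_str = " ".join("%s=%s" % kv for kv in sorted(tags.items()))
    return "put %s %d %s %s" % (metric, timestamp, value, tag_str)

def send_put(host, line, port=4242):
    """Open a TCP connection to a TSD and write one data point."""
    with socket.create_connection((host, port)) as s:
        s.sendall((line + "\n").encode("ascii"))

# Reproduces the slide's example line:
line = put_line("proc.load.avg5min", 23.2, 1382536472, host="db01.acme.com")
# → "put proc.load.avg5min 1382536472 23.2 host=db01.acme.com"
```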
It gets even better: tcollector is a Python script that runs your collectors. It handles the network connection, starts your collectors at set intervals, does basic process management, adds the host tag, and deduplicates repeated values.
A simple tcollector script

#!/usr/bin/php
<?php
# cast a die
$die = rand(1, 6);
echo "roll.a.d6 " . time() . " " . $die . "\n";
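A collector can be any executable that prints lines of the form `<metric> <timestamp> <value>` to stdout; tcollector adds the host tag and forwards the line to a TSD. A Python sketch of the same die-rolling example (the `roll_line` helper name is mine):

```python
#!/usr/bin/env python
import random
import time

def roll_line():
    """Produce one data point in tcollector's stdout format:
    <metric> <timestamp> <value>"""
    die = random.randint(1, 6)  # cast a die
    return "roll.a.d6 %d %d" % (int(time.time()), die)

if __name__ == "__main__":
    print(roll_line())
```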
What was that HDFS again? HDFS is a distributed filesystem suitable for petabytes of data on thousands of machines. Runs on commodity hardware. Takes care of redundancy. Used by e.g. Facebook, Spotify, eBay, ...
Okay... and HBase? HBase is a NoSQL database / data store on top of HDFS. Modeled after Google's BigTable. Built for big tables (billions of rows, millions of columns). Automatic sharding by row key.
How opentsdb stores the data
Keys are key! Data is sharded across regions based on the row key. You query data based on the row key. You can query row-key ranges (e.g. A...D). So: think about key design.
Take 1
Row key format: timestamp, metric id

Server A:
1382536472, 5 → 17
1382536472, 6 → 24
1382536472, 8 → 12
1382536473, 5 → 134
1382536473, 6 → 10
1382536473, 8 → 99

Server B:
1382536474, 5 → 12
1382536474, 6 → 42

All new data points share the current timestamp as key prefix, so every write lands on the same region server: a write hotspot.
Solution: swap timestamp and metric id
Row key format: metric id, timestamp

Server A:
5, 1382536472 → 17
5, 1382536473 → 134
5, 1382536474 → 12

Server B:
6, 1382536472 → 24
6, 1382536473 → 10
6, 1382536474 → 42
8, 1382536472 → 12
8, 1382536473 → 99

Now concurrent writes for different metrics land on different servers.
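Since HBase keeps rows sorted by key and assigns contiguous key ranges to region servers, the effect of the swap can be sketched with plain tuple sorting (the helper names are mine; real opentsdb keys are byte strings, not tuples):

```python
# With timestamp-first keys, every data point written "now" has
# nearly the same key prefix, so all writes cluster in one key
# range (a hotspot). Metric-first keys spread them out.

def timestamp_first(metric_id, ts):
    return (ts, metric_id)

def metric_first(metric_id, ts):
    return (metric_id, ts)

now = 1382536474
metrics = [5, 6, 8]

hot = sorted(timestamp_first(m, now) for m in metrics)
# → [(1382536474, 5), (1382536474, 6), (1382536474, 8)]
#   adjacent in the key space: one region server takes every write

spread = sorted(metric_first(m, now) for m in metrics)
# → [(5, 1382536474), (6, 1382536474), (8, 1382536474)]
#   far apart in the key space: writes go to different regions
```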
Take 2: metric ID first, then timestamp. Searching through many rows is slower than searching through fewer rows. (Obviously.) So: put multiple data points into one row.
Take 2 continued

Row key           Cell names (offsets)     Data points
5, 1382608800     +23  +35  +94  +142      17  1  23  42
5, 1382612400     +13  +25  +88  +89       3   44  12  2
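In this layout one HBase row holds an hour of data: the row's timestamp is the hour boundary, and each cell's qualifier is the offset in seconds into that hour. A sketch of the split (the `split_timestamp` helper is mine; 1382608800 and +23 match the example above):

```python
def split_timestamp(ts):
    """Split a Unix timestamp into the hourly row timestamp and
    the per-cell offset within that hour."""
    base = ts - (ts % 3600)   # row timestamp: rounded down to the hour
    offset = ts - base        # cell qualifier: seconds into the hour
    return base, offset

base, offset = split_timestamp(1382608823)
# base = 1382608800, offset = 23 → row key (5, 1382608800), cell "+23"
```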
Where are the tags stored? They are put at the end of the row key. Metric names, tag names, and tag values are all represented by IDs.
The Row Key
3 Bytes - metric ID
4 Bytes - timestamp (rounded down to the hour)
3 Bytes - tag name ID (per tag)
3 Bytes - tag value ID (per tag)
Total: 7 Bytes + 6 Bytes * number of tags
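The byte layout above can be sketched in Python (a rough illustration: the `row_key` helper and the numeric IDs are made up; real opentsdb assigns IDs through its UID table):

```python
import struct

def row_key(metric_id, base_ts, tags):
    """Pack a row key: 3-byte metric ID, 4-byte hourly base
    timestamp, then a 3-byte tag-name ID and 3-byte tag-value ID
    per tag."""
    key = metric_id.to_bytes(3, "big") + struct.pack(">I", base_ts)
    for name_id, value_id in sorted(tags.items()):
        key += name_id.to_bytes(3, "big") + value_id.to_bytes(3, "big")
    return key

key = row_key(5, 1382608800, {1: 42})  # one tag, e.g. host=db01
len(key)  # → 7 + 6 * 1 = 13 bytes
```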
Let's look at some graphs
Our experiences
What works well: we store about 200M data points in several thousand time series with no issues. tcollector decouples measurement from storage. Creating new metrics is really easy.
Challenges: the UI is seriously lacking. No annotation support out of the box. Only 1s time resolution (and only 1 value per second per time series).
Salvation is coming: OpenTSDB 2 is around the corner, with millisecond precision, annotations and metadata, and an improved API.
Friendly advice: pick a naming scheme and stick to it. Use tags wisely (no more than 6 or 7 tags per data point). Use tcollector. Wait for opentsdb 2 ;-)
Questions? Please contact me: oliver.hankeln@gutefrage.net @mydalon. I'll upload the slides and tweet about it.