Hynek Schlawack. Get Instrumented. How Prometheus Can Unify Your Metrics

1 Hynek Schlawack Get Instrumented How Prometheus Can Unify Your Metrics

2 Goals

10 Service Level
    Indicator
    Objective
    (Agreement)

14 Metrics
    avg latency, server load
    (time axis: 12:00, 12:01, 12:02, 12:03, 12:04)

16 Instrument

28 Metric Types
    counter
    gauge
    summary
    histogram
        buckets (1s, 0.5s, 0.25s, …)
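
To make the difference between the metric types concrete, here is a rough, dependency-free sketch. The `Toy*` classes are illustrative stand-ins invented for this example and are not the prometheus_client API; the summary type is left out, and the default bucket bounds are an assumption.

```python
import bisect


class ToyCounter:
    """Only ever goes up (e.g. requests served, errors seen)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class ToyGauge:
    """Can go up and down (e.g. requests currently in progress)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class ToyHistogram:
    """Counts observations into cumulative buckets (upper bounds in seconds)."""

    def __init__(self, bounds=(0.25, 0.5, 1.0)):
        self.bounds = sorted(bounds) + [float("inf")]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0

    def observe(self, value):
        self.total += value
        # Cumulative: every bucket whose upper bound is >= value is incremented.
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.bounds)):
            self.counts[i] += 1
```

Because the buckets are cumulative, the last (infinite) bucket always equals the number of observations.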

36 Averages
    avg(request time)  avg(ux)
    avg({1, 1, 1, 1, 10}) = 2.8
    median({1, 1, 1, 1, 10}) = 1
    median({1, 1, 100_000}) = 1
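
The numbers on this slide can be checked with Python's standard `statistics` module. This is only a small sketch; the variable names are made up for illustration.

```python
from statistics import mean, median

request_times = [1, 1, 1, 1, 10]

average = mean(request_times)    # pulled up by the single slow request
typical = median(request_times)  # what most requests actually saw

# One extreme outlier does not move the median at all.
outlier_median = median([1, 1, 100_000])
```

The mean reports 2.8 although four out of five requests took 1, which is why an average request time says little about the average user experience.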

38 Percentiles
    n-th percentile P of a data set = P ≥ n% of values

41 50th percentile = 1 ms
    50% of requests done by 1 ms

44 Percentiles
    P       {1, 1, 100_000}
    50th    1
    95th    90_000
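
A minimal sketch of how such percentiles can be computed. It assumes linear interpolation between neighbouring values, which is only one of several common definitions; the helper name `percentile` is invented here. Under that assumption the 95th percentile of the slide's data set comes out at roughly 90_000.

```python
def percentile(values, n):
    """Return the n-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted data.
    position = (n / 100) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [1, 1, 100_000]
p50 = percentile(data, 50)
p95 = percentile(data, 95)
```

Unlike the median, a high percentile exposes the one request that took 100_000.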

50 Naming
    backend1_app_http_reqs_msgs_post
    backend1_app_http_reqs_msgs_get

    app_http_reqs_total{meth="post", path="/msgs", backend="1"}
    app_http_reqs_total{meth="get", path="/msgs", backend="1"}
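
One way to see why a single metric name with labels is easier to work with than names with everything baked in: the same samples can be grouped along any label. The sketch below uses plain Python; the sample values, the second backend, and the helper `sum_by` are invented for illustration.

```python
from collections import defaultdict

# (metric name, labels, value); the values are invented for illustration.
samples = [
    ("app_http_reqs_total", {"meth": "post", "path": "/msgs", "backend": "1"}, 12.0),
    ("app_http_reqs_total", {"meth": "get", "path": "/msgs", "backend": "1"}, 30.0),
    ("app_http_reqs_total", {"meth": "get", "path": "/msgs", "backend": "2"}, 8.0),
]


def sum_by(samples, name, label):
    """Sum all samples of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for sample_name, labels, value in samples:
        if sample_name == name:
            totals[labels[label]] += value
    return dict(totals)


by_method = sum_by(samples, "app_http_reqs_total", "meth")
by_backend = sum_by(samples, "app_http_reqs_total", "backend")
```

With the `backend1_..._post` style of naming, each of these groupings would require knowing and parsing every metric name up front.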

54
    1. resolution = scraping interval
    2. missing scrapes = less resolution

57 Pull: Problems
    short lived jobs
    target discovery

61 Configuration
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets:
              - 'localhost:9090'

    {instance="localhost:9090",job="prometheus"}
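
The label set at the end of the slide follows from the configuration: by default, the job name becomes the `job` label and the target address becomes the `instance` label. A rough sketch of that mapping, with the configuration written as a plain Python structure; `target_labels` is a hypothetical helper, and relabeling rules, which can change these defaults, are ignored.

```python
# The scrape configuration from the slide, written as a plain Python structure.
config = {
    "scrape_configs": [
        {
            "job_name": "prometheus",
            "static_configs": [{"targets": ["localhost:9090"]}],
        }
    ]
}


def target_labels(config):
    """Yield the job/instance label set for every statically configured target."""
    for job in config["scrape_configs"]:
        for static in job.get("static_configs", []):
            for target in static["targets"]:
                yield {"instance": target, "job": job["job_name"]}


labels = list(target_labels(config))
```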

63 Pull: Problems
    target discovery
    short lived jobs
    Heroku/NATed systems

68 Pull: Advantages
    multiple Prometheis easy
    outage detection
    predictable, no self-DoS
    easy to instrument 3rd parties

73 Metrics Format
    # HELP req_seconds Time spent processing a request in seconds.
    # TYPE req_seconds histogram
    req_seconds_count
    req_seconds_sum

76 Percentiles
    req_seconds_bucket{le="0.05"} 0.0
    req_seconds_bucket{le="0.25"} 1.0
    req_seconds_bucket{le="0.5"}
    req_seconds_bucket{le="0.75"}
    req_seconds_bucket{le="1.0"}
    req_seconds_bucket{le="2.0"}
    req_seconds_bucket{le="+Inf"} 390.0
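
To show how the bucket, count, and sum lines relate to each other, here is a small sketch that renders a histogram in this text format from raw observations. `render_histogram` is a hypothetical helper, not part of prometheus_client, and the observations and bucket bounds are invented.

```python
def render_histogram(name, help_text, bounds, observations):
    """Render observations as text-format lines for one histogram metric."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for bound in sorted(bounds):
        # Buckets are cumulative: each counts every observation <= its bound.
        count = sum(1 for value in observations if value <= bound)
        lines.append(f'{name}_bucket{{le="{bound}"}} {float(count)}')
    lines.append(f'{name}_bucket{{le="+Inf"}} {float(len(observations))}')
    lines.append(f"{name}_count {float(len(observations))}")
    lines.append(f"{name}_sum {float(sum(observations))}")
    return lines


# Invented observations, in seconds.
text = render_histogram(
    "req_seconds",
    "Time spent processing a request in seconds.",
    bounds=[0.05, 0.25, 0.5],
    observations=[0.125, 0.25, 0.5, 4.0],
)
```

The `+Inf` bucket and `_count` always agree, since every observation is less than or equal to infinity.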

82 Aggregation
    sum(rate(req_seconds_count[1m]))

83 Aggregation
    sum(rate(req_seconds_count{dc="west"}[1m]))

84 Aggregation
    sum(rate(req_seconds_count[1m])) by (dc)
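
What these three queries compute can be sketched in plain Python. `simple_rate` is a deliberately simplified stand-in for `rate()`: it ignores counter resets and extrapolation. The series, their labels, and their values are invented for illustration.

```python
from collections import defaultdict


def simple_rate(points):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplified: the real rate() also handles counter resets and extrapolation.
    """
    (t_first, v_first), (t_last, v_last) = points[0], points[-1]
    return (v_last - v_first) / (t_last - t_first)


# Invented one-minute windows of req_seconds_count, one series per instance.
series = [
    ({"dc": "west", "instance": "a"}, [(0, 100.0), (30, 130.0), (60, 160.0)]),
    ({"dc": "west", "instance": "b"}, [(0, 50.0), (30, 80.0), (60, 110.0)]),
    ({"dc": "east", "instance": "c"}, [(0, 10.0), (30, 40.0), (60, 130.0)]),
]

# sum(rate(req_seconds_count[1m]))
total = sum(simple_rate(points) for _, points in series)

# sum(rate(req_seconds_count{dc="west"}[1m]))
west = sum(simple_rate(points) for labels, points in series if labels["dc"] == "west")

# sum(rate(req_seconds_count[1m])) by (dc)
by_dc = defaultdict(float)
for labels, points in series:
    by_dc[labels["dc"]] += simple_rate(points)
```

Taking the rate per series first and summing afterwards matters: summing raw counters first would hide a reset of any single counter.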

89 Percentiles
    histogram_quantile(
      0.9,
      rate(req_seconds_bucket[10m]))
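
A rough sketch of the idea behind `histogram_quantile`: find the bucket that contains the requested rank and interpolate linearly inside it. This is a simplified illustration, not Prometheus's implementation, which handles more edge cases and operates on bucket rates rather than raw counts. The function below and the bucket counts are assumptions made for the example.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0..1) from cumulative (upper_bound, count) buckets.

    Assumes observations are spread evenly inside each bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank and count > lower_count:
            if upper_bound == float("inf"):
                # Nothing to interpolate against: fall back to the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented cumulative bucket counts.
buckets = [(0.5, 50), (1.0, 90), (2.0, 100), (float("inf"), 100)]
p90 = histogram_quantile(0.9, buckets)
p95 = histogram_quantile(0.95, buckets)
```

The result is an estimate whose accuracy depends on the bucket layout: the quantile can never be more precise than the width of the bucket it falls into.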

95 Internal
    great for ad-hoc
    1 expr per graph
    templating

99 PromDash
    best integration
    former official
    now deprecated
    don't bother

104 Grafana
    pretty & powerful
    many integrations
    mix and match!
    use this!

112 Alerts & Scrying
    ALERT DiskWillFillIn4Hours
      IF predict_linear(
        node_filesystem_free[1h], 4*3600) < 0
      FOR 5m
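
The prediction in this alert is a straight line fitted through the last hour of samples and extended four hours ahead. A minimal sketch of that idea using ordinary least squares; the function is a simplified stand-in that predicts relative to the last sample rather than the evaluation time, the `FOR 5m` clause is not modelled, and the sample data is invented.

```python
def predict_linear(points, seconds_ahead):
    """Least-squares line through (timestamp, value) points, evaluated
    seconds_ahead after the last timestamp."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in points)
    variance = sum((t - mean_t) ** 2 for t, _ in points)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (points[-1][0] + seconds_ahead)


# Invented samples: free bytes shrinking by 1 GB every 15 minutes over one hour.
GB = 10**9
free = [(0, 10 * GB), (900, 9 * GB), (1800, 8 * GB), (2700, 7 * GB), (3600, 6 * GB)]

predicted = predict_linear(free, 4 * 3600)
will_fill = predicted < 0
```

A predicted value below zero means the disk would run out of space within the four-hour horizon if the current trend continues.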

116 Environment

119 HAProxy, MySQL, etcd, Consul, nginx, statsd, graphite, collectd, Django, Kubernetes, redis, PostgreSQL, Varnish, SNMP, CouchDB, InfluxDB, MongoDB, Apache

121 cadvisor, node_exporter

122 System Insight load memory disk procs network I/O

126 mtail
    follow (log) files
    extract metrics using regex
    can be better than direct
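
mtail has its own program format, so the following is only a Python sketch of the underlying idea: match log lines against a regular expression and turn the matches into counters. The log format, the pattern, and the helper name are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical access-log format: '<method> <path> <status> <seconds>'.
LINE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<seconds>[\d.]+)$")


def extract_metrics(lines):
    """Count requests per (method, status) and sum their duration."""
    requests = defaultdict(int)
    seconds_total = 0.0
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that are not access-log entries
        requests[(match["method"], match["status"])] += 1
        seconds_total += float(match["seconds"])
    return requests, seconds_total


log = [
    "POST /analyze 200 0.25",
    "POST /analyze 500 1.5",
    "worker booted",
    "POST /analyze 200 0.25",
]
requests, seconds_total = extract_metrics(log)
```

This approach needs no change to the application itself, which is why it can be preferable to direct instrumentation for software you cannot modify.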

131 Moar
    Edges: web servers/haproxy
    black box
    databases
    network

135 So Far
    system stats
    outside look
    3rd party components

136 Code

140 cat-or.not
    HTTP service
    upload picture
    meow!/nope!

143
    from flask import Flask, g, request
    from cat_or_not import is_cat

    app = Flask(__name__)

    @app.route("/analyze", methods=["post"])
    def analyze():
        g.auth.check(request)
        return ("meow!" if is_cat(request.files["pic"]) else "nope!")

    if __name__ == "__main__":
        app.run()

144 pip install prometheus_client

145
    from prometheus_client import \
        start_http_server

    # ...

    if __name__ == "__main__":
        start_http_server(8000)
        app.run()

151
    process_virtual_memory_bytes
    process_resident_memory_bytes
    process_start_time_seconds
    process_cpu_seconds_total
    process_open_fds 8.0
    process_max_fds

155
    from prometheus_client import \
        Histogram, Gauge

    REQUEST_TIME = Histogram(
        "cat_or_not_request_seconds",
        "Time spent in HTTP requests.")
    ANALYZE_TIME = Histogram(
        "cat_or_not_analyze_seconds",
        "Time spent analyzing pictures.")
    IN_PROGRESS = Gauge(
        "cat_or_not_in_progress_requests",
        "Number of requests in progress.")

157
    def analyze():
        g.auth.check(request)
        with ANALYZE_TIME.time():
            result = is_cat(
                request.files["pic"].stream)
        return "meow!" if result else "nope!"

161
    from prometheus_client import Counter, Histogram

    AUTH_TIME = Histogram("auth_seconds", "Time spent authenticating.")
    AUTH_ERRS = Counter("auth_errors_total", "Errors while authing.")
    AUTH_WRONG_CREDS = Counter("auth_wrong_creds_total", "Wrong credentials.")

    class Auth:
        def auth(self, request):
            # Retry until authentication succeeds or the credentials are rejected.
            while True:
                try:
                    return self._auth(request)
                except WrongCredsError:
                    AUTH_WRONG_CREDS.inc()
                    raise
                except Exception:
                    AUTH_ERRS.inc()

162
    @app.route("/analyze", methods=["post"])
    def analyze():
        g.auth.check(request)
        with ANALYZE_TIME.time():
            result = is_cat(
                request.files["pic"].stream)
        return "meow!" if result else "nope!"

163 pip install prometheus_async

164 Wrapper
    from prometheus_async.aio import …

    async def view(request):
        # ...

168 Goodies
    aiohttp-based metrics export
    also in thread!
    Consul Agent integration

169 Wrap Up

174 vrmd.de


More information

All Events. One Platform.

All Events. One Platform. All Events. One Platform. Industry s first IT ops platform that truly correlates the metric, flow and log events and turns them into actionable insights. Correlate Integrate Analyze www.motadata.com Motadata

More information

Fixing the "It works on my machine!" Problem with Docker

Fixing the It works on my machine! Problem with Docker Fixing the "It works on my machine!" Problem with Docker Jared M. Smith @jaredthecoder About Me Cyber Security Research Scientist at Oak Ridge National Lab BS and MS in Computer Science from the University

More information

EXPERIENCES MOVING FROM DJANGO TO FLASK

EXPERIENCES MOVING FROM DJANGO TO FLASK EXPERIENCES MOVING FROM DJANGO TO FLASK DAN O BRIEN, VP OF ENGINEERING CRAIG LANCASTER, CTO Jana Mobile Inc. www.jana.com WHO WE ARE Jana is a startup company in Boston connecting advertising and marketing

More information

Monasca. Monitoring/Logging-as-a-Service (at-scale)

Monasca. Monitoring/Logging-as-a-Service (at-scale) Monasca Monitoring/Logging-as-a-Service (at-scale) Speaker Roland Hochmuth Hewlett Packard Enterprise Fort Collins, Colorado, USA Agenda Describe how to build a highly scalable monitoring and logging as

More information

Easy PostgreSQL Clustering with Patroni. Ants Aasma

Easy PostgreSQL Clustering with Patroni. Ants Aasma Easy PostgreSQL Clustering with Patroni Introduction About me Support engineer at Cybertec Helping others run PostgreSQL for 5 years. Helping myself run PostgreSQL since 7.4 days. What are we going to

More information

Simplicity and minimalism in software development

Simplicity and minimalism in software development Simplicity and minimalism in software development Introduction My name is Mattias Sundblad, I have been working as a software developer since 2006. I have worked for large corporations, small startups

More information

The InfluxDB-Grafana plugin for Fuel Documentation

The InfluxDB-Grafana plugin for Fuel Documentation The InfluxDB-Grafana plugin for Fuel Documentation Release 0.9-0.9.0-1 Mirantis Inc. April 22, 2016 CONTENTS 1 User documentation 1 1.1 Overview................................................. 1 1.2 Release

More information

pyformance Documentation

pyformance Documentation pyformance Documentation Release 0.3.4 Omer Getrel Oct 04, 2017 Contents 1 Manual 3 1.1 Installation................................................ 3 1.2 Usage...................................................

More information

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data FAST 2017, Santa Clara Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, and Josef Adersberger Florian.Lautenschlager@qaware.de

More information

Go Faster: Containers, Platforms and the Path to Better Software Development (Including Live Demo)

Go Faster: Containers, Platforms and the Path to Better Software Development (Including Live Demo) RED HAT DAYS VANCOUVER Go Faster: Containers, Platforms and the Path to Better Software Development (Including Live Demo) Paul Armstrong Principal Solutions Architect Gerald Nunn Senior Middleware Solutions

More information

django-debreach Documentation

django-debreach Documentation django-debreach Documentation Release 1.4.1 Luke Pomfrey October 16, 2016 Contents 1 Installation 3 2 Configuration 5 2.1 CSRF token masking (for Django < 1.10)................................ 5 2.2 Content

More information

Patrick Cheung. PopVote backend developer

Patrick Cheung. PopVote backend developer Coding PopVote Patrick Cheung PopVote backend developer Why am I here? 47 votes in 1 second highest throughput in any second first voting day (20 June) > 70% votes casted in less then 180 seconds may include

More information

NexentaStor REST API QuickStart Guide

NexentaStor REST API QuickStart Guide NexentaStor 5.1.1 REST API QuickStart Guide Date: January, 2018 Part Number: 3000-nxs-REST-API-5.1.1-000092-A Copyright 2018 Nexenta Systems TM, ALL RIGHTS RESERVED Notice: No part of this publication

More information

LECTURE 15. Web Servers

LECTURE 15. Web Servers LECTURE 15 Web Servers DEPLOYMENT So, we ve created a little web application which can let users search for information about a country they may be visiting. The steps we ve taken so far: 1. Writing the

More information

agenda PAE Docker Docker PAE

agenda PAE Docker Docker PAE Docker 2016.03.26 agenda PAE Docker Docker PAE 2 3 PAE PlCloud APP Engine Docker Docker Caas APP 4 APP APP volume images 5 App 6 APP Show Time 7 8 Docker Public DockerHup Private registry push pull AUFS

More information

Data pipelines with PostgreSQL & Kafka

Data pipelines with PostgreSQL & Kafka Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL

More information

Continuous delivery while migrating to Kubernetes

Continuous delivery while migrating to Kubernetes Continuous delivery while migrating to Kubernetes Audun Fauchald Strand Øyvind Ingebrigtsen Øvergaard @audunstrand @oyvindio FINN Infrastructure History Kubernetes at FINN Agenda Finn Infrastructure As

More information

Monitoring Open Source Databases with Icinga

Monitoring Open Source Databases with Icinga PGConf EU Warsaw 26.10.2017 Monitoring Open Source Databases with Icinga Blerim Sheqa Product Manager Working @netways @bobapple Introduction to Icinga2 Quick Poll Icinga is a scalable and extensible monitoring

More information

ntopng A Web-based Network Traffic Monitoring Application

ntopng A Web-based Network Traffic Monitoring Application ntopng A Web-based Network Traffic Monitoring Application New York City, NY June 14th, 2017 Simone Mainardi linkedin.com/in/simonemainardi Agenda About ntop Network traffic monitoring

More information

FogIoT Orchestrator: an Orchestration System for IoT Applications in Fog Environment

FogIoT Orchestrator: an Orchestration System for IoT Applications in Fog Environment FogIoT Orchestrator: an Orchestration System for IoT Applications in Fog Environment Bruno Donassolo - Orange Labs Ilhem Fajjari - Orange Labs Arnaud Legrand - INRIA - LIG Panayotis Mertikopoulos - INRIA

More information

NGINX: From North/South to East/West

NGINX: From North/South to East/West NGINX: From North/South to East/West Reducing Complexity with API and Microservices Traffic Management and NGINX Plus Speakers: Alan Murphy, Regional Solution Architect, APAC September, 2018 About NGINX,

More information

Note: Currently (December 3, 2017), the new managed Kubernetes service on Azure (AKS) does not yet support Windows agents.

Note: Currently (December 3, 2017), the new managed Kubernetes service on Azure (AKS) does not yet support Windows agents. Create a Hybrid Kubernetes Linux/Windows Cluster in 7 Easy Steps Azure Container Service (ACS) makes it really easy to provision a Kubernetes cluster in Azure. Today, we'll walk through the steps to set

More information

OnCommand Unified Manager

OnCommand Unified Manager OnCommand Unified Manager Operations Manager Administration Guide For Use with Core Package 5.2.1 NetApp, Inc. 495 East Java Drive Sunnyvale, CA 94089 U.S. Telephone: +1 (408) 822-6000 Fax: +1 (408) 822-4501

More information

Wrapp. Powered by AWS EC2 Container Service. Jude D Souza Solutions Wrapp Phone:

Wrapp. Powered by AWS EC2 Container Service. Jude D Souza Solutions Wrapp Phone: Containers @ Wrapp Powered by AWS EC2 Container Service Jude D Souza Solutions Architect @ Wrapp Phone: +46 767085740 Email: jude@wrapp.com About Me Jude D Souza Stockholm, Sweden ß Karachi, Pakistan jude@wrapp.com

More information

web-transmute Documentation

web-transmute Documentation web-transmute Documentation Release 0.1 Yusuke Tsutsumi Dec 19, 2017 Contents 1 Writing transmute-compatible functions 3 1.1 Add function annotations for input type validation / documentation..................

More information

Flask Slither Documentation

Flask Slither Documentation Flask Slither Documentation Release 0.3 Nico Gevers Sep 27, 2017 Contents 1 Getting Started with Slither 3 1.1 Installation................................................ 3 1.2 Creating the App.............................................

More information

Monitoring Docker Containers with Splunk

Monitoring Docker Containers with Splunk Monitoring Docker Containers with Splunk Marc Chéné Product Manager Sept 27, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may make forward-looking statements

More information

Linux Clusters Institute: Monitoring. Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln

Linux Clusters Institute: Monitoring. Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln Linux Clusters Institute: Monitoring Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln Why monitor? 2 Service Level Agreement (SLA) Which services must be provided

More information

Quo vadis, Prometheus?

Quo vadis, Prometheus? Monitoring. At scale. Richard Hartmann, RichiH@{freenode,OFTC,IRCnet}, richih@{fosdem,debian,richih}.org, richard.hartmann@space.net 2018-05-16 whoami Richard RichiH Hartmann Swiss army chainsaw at SpaceNet

More information

IBM Planning Analytics Workspace Local Distributed Soufiane Azizi. IBM Planning Analytics

IBM Planning Analytics Workspace Local Distributed Soufiane Azizi. IBM Planning Analytics IBM Planning Analytics Workspace Local Distributed Soufiane Azizi IBM Planning Analytics IBM Canada - Cognos Ottawa Lab. IBM Planning Analytics Agenda 1. Demo PAW High Availability on a Prebuilt Swarm

More information

A Comparision of Service Mesh Options

A Comparision of Service Mesh Options A Comparision of Service Mesh Options Looking at Istio, Linkerd, Consul-connect Syed Ahmed - CloudOps Inc Introduction About Me Cloud Software Architect @ CloudOps PMC for Apache CloudStack Worked on network

More information

Zumobi Brand Integration(Zbi) Platform Architecture Whitepaper Table of Contents

Zumobi Brand Integration(Zbi) Platform Architecture Whitepaper Table of Contents Zumobi Brand Integration(Zbi) Platform Architecture Whitepaper Table of Contents Introduction... 2 High-Level Platform Architecture Diagram... 3 Zbi Production Environment... 4 Zbi Publishing Engine...

More information

Managing Broadband Access Center

Managing Broadband Access Center CHAPTER 9 This chapter describes the various subcomponents within Cisco Broadband Access Center (BAC) that you can use to manage the system. These include: BAC Process Watchdog, page 9-1 Administrator

More information

AGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS

AGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS Sunil Shah AGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS 1 THE DATACENTER OPERATING SYSTEM (DCOS) 2 DCOS INTRODUCTION The Mesosphere Datacenter Operating System (DCOS) is a distributed operating

More information

Data Ingestion at Scale. Jeffrey Sica

Data Ingestion at Scale. Jeffrey Sica Data Ingestion at Scale Jeffrey Sica ARC-TS @jeefy Overview What is Data Ingestion? Concepts Use Cases GPS collection with mobile devices Collecting WiFi data from WAPs Sensor data from manufacturing machines

More information

BIG-IP Analytics: Implementations. Version 13.1

BIG-IP Analytics: Implementations. Version 13.1 BIG-IP Analytics: Implementations Version 13.1 Table of Contents Table of Contents Setting Up Application Statistics Collection...5 What is Analytics?...5 About HTTP Analytics profiles... 5 Overview:

More information

flask-jwt Documentation

flask-jwt Documentation flask-jwt Documentation Release 0.3.2 Dan Jacob Nov 16, 2017 Contents 1 Links 3 2 Installation 5 3 Quickstart 7 4 Configuration Options 9 5 API 11 6 Changelog 13 6.1 Flask-JWT Changelog..........................................

More information