DC/OS Metrics (formerly known as Project Ambrose)
Application and Resource Metrics in DC/OS Enterprise
Nick Parker at..
Introduction
Nick Parker
- DC/OS Slack: chat.dcos.io
- DC/OS Mailing List: users@dcos.io
- GitHub: nickbp@
Data Agility Team
- Frameworks for Cassandra/DSE, HDFS, Kafka/Confluent, Spark,...
- Service SDK (in progress...)
The Importance of Metrics
How do you know if...
- Things are running fine, or falling over
- Containers have plenty of quota, or are on the edge of OOM
- You're optimizing for what people use, or what nobody sees
- The new release is good, or should be rolled back
Sources of Metrics in DC/OS
Container Metrics
- Measure things like: RAM, Disk, IOPS, CPU, Network, ...
- To determine: Resource utilization/basic health
Application Metrics
- Measure things like: QPS, query latency, number/types of hit exceptions, number of active users, ...
- To determine:
  - Changes in performance/behavior across rollouts
  - Debugging active issues (e.g. on-call pages)
  - Tracing historical behavior...
Solving Metrics on DC/OS
- Easy integration by applications
  - Little effort/thought to emit metrics from any application
  - Support custom metric metadata
- Inject container metadata
  - Container, Framework, Agent,...
- Flexible, configurable output
  - Widely accessible format/schema
  - Send metrics to any storage
  - Easy filtering and routing
- Installed as a containerized application
  - Easy reconfiguration/upgrades/fixes
What DC/OS Metrics Provides
- Easy input
  - Container resource metrics: retrieved automatically
  - Custom application metrics: StatsD endpoint, advertised with env vars
- Automatic source tagging
  - Application, Framework, Host Agent, Container,...
- Flexible outputs
  - Kafka cluster: scale as needed, attach arbitrary consumers
  - Others?
Application Input: StatsD (with tag support)
StatsD Format
- Text records: either one-per-packet or newline separated.
- Optional tagging (Datadog extension) - Consumed by DC/OS Metrics!
    memory.usage_mb:5|g
    frontend.query.latency_ms:46|g|#shard_id:6,section:frontpage
Pseudocode
    if (env["STATSD_UDP_HOST"] and env["STATSD_UDP_PORT"]) {
      // 1. Open UDP socket to the endpoint
      // 2. Send StatsD-formatted metrics
    }
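The pseudocode on this slide could look roughly like the following in Python. This is a minimal sketch, not the DC/OS implementation: it assumes the standard `name:value|type` StatsD record syntax with Datadog-style `|#key:value` tags, and the helper names `format_gauge` and `send_metrics` are hypothetical. Only the `STATSD_UDP_HOST` and `STATSD_UDP_PORT` variable names come from the slide.

```python
import os
import socket


def format_gauge(name, value, tags=None):
    """Build one StatsD gauge record, with optional Datadog-style tags."""
    record = f"{name}:{value}|g"
    if tags:
        # Tags are appended as |#key:value,key:value (Datadog extension).
        record += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return record


def send_metrics(records, env=None):
    """Send newline-separated records if a StatsD endpoint is advertised.

    Returns False (and sends nothing) when the env vars are absent, so the
    application still runs outside an environment that provides them.
    """
    env = os.environ if env is None else env
    host = env.get("STATSD_UDP_HOST")
    port = env.get("STATSD_UDP_PORT")
    if not host or not port:
        return False
    payload = "\n".join(records).encode("utf-8")
    # 1. Open UDP socket to the endpoint
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # 2. Send StatsD-formatted metrics
        sock.sendto(payload, (host, int(port)))
    return True


if __name__ == "__main__":
    send_metrics([
        format_gauge("memory.usage_mb", 5),
        format_gauge("frontend.query.latency_ms", 46,
                     {"shard_id": 6, "section": "frontpage"}),
    ])
```

Because UDP is fire-and-forget, a missing or unreachable endpoint does not block the application; checking the environment first keeps the code a no-op wherever the variables are not set.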
Output Format: Apache Avro
    repeated MetricList {
      repeated Tag {
        string key,
        string value,
      }
      repeated Datapoint {
        string name,
        double value,
        int64 epoch_time_ms,
      }
    }
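One way to picture the structure on this slide is as an Avro schema written out as a Python dict. This is a sketch under assumptions, not the schema shipped with DC/OS Metrics: the slide gives only the record and field names, so the array field names `tags` and `datapoints` and the helper `make_metric_list` are hypothetical. Avro's 64-bit integer type is spelled `long`, which is used here for `epoch_time_ms`.

```python
import json
import time

# Hypothetical Avro schema mirroring the slide's MetricList layout.
METRIC_LIST_SCHEMA = {
    "type": "record",
    "name": "MetricList",
    "fields": [
        {"name": "tags", "type": {"type": "array", "items": {
            "type": "record", "name": "Tag", "fields": [
                {"name": "key", "type": "string"},
                {"name": "value", "type": "string"},
            ]}}},
        {"name": "datapoints", "type": {"type": "array", "items": {
            "type": "record", "name": "Datapoint", "fields": [
                {"name": "name", "type": "string"},
                {"name": "value", "type": "double"},
                {"name": "epoch_time_ms", "type": "long"},
            ]}}},
    ],
}


def make_metric_list(tags, values, now_ms=None):
    """Build one MetricList-shaped dict from tag and metric mappings."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "tags": [{"key": k, "value": str(v)} for k, v in tags.items()],
        "datapoints": [
            {"name": n, "value": float(v), "epoch_time_ms": now_ms}
            for n, v in values.items()
        ],
    }


if __name__ == "__main__":
    record = make_metric_list({"section": "frontpage"},
                              {"frontend.query.latency_ms": 46})
    print(json.dumps(record, indent=2))
```

Grouping datapoints under a shared tag list means the source tags are written once per batch rather than once per value.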
Architecture: Per-Node
Per-host components:
1. Mesos Metrics Module
2. Metrics Collector
Architecture: Per-Cluster
Per-cluster components:
1. Kafka
2. Consumer(s)
Architecture: Overall
Per-host components:
1. Mesos Metrics Module
2. Metrics Collector
Per-cluster components:
3. Kafka
4. Consumer(s)
Demo!!!
- Service config examples
- Consumer examples
- Show and tell
Configuring StatsD in...
- Cassandra
- Kafka
Executor logs from... Kafka Cassandra
Consumers for...
- InfluxDB
- KairosDB
- (Dog)StatsD
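The core of a (Dog)StatsD consumer is a translation from a decoded output record back into tagged StatsD lines. The sketch below shows only that step, under stated assumptions: the input is a dict with hypothetical `tags` and `datapoints` keys following the MetricList layout, every value is emitted as a gauge because the record carries no metric type, and the function name `to_dogstatsd_lines` is made up for illustration. Reading from Kafka and Avro decoding are left out.

```python
def to_dogstatsd_lines(metric_list):
    """Convert one MetricList-shaped dict into tagged StatsD gauge lines.

    The epoch_time_ms field is dropped: a plain StatsD record has no
    timestamp, so the receiver stamps values on arrival.
    """
    tags = ",".join(f"{t['key']}:{t['value']}" for t in metric_list["tags"])
    suffix = f"|#{tags}" if tags else ""
    return [
        f"{dp['name']}:{dp['value']}|g{suffix}"
        for dp in metric_list["datapoints"]
    ]


if __name__ == "__main__":
    example = {
        "tags": [{"key": "shard_id", "value": "6"},
                 {"key": "section", "value": "frontpage"}],
        "datapoints": [{"name": "frontend.query.latency_ms",
                        "value": 46.0, "epoch_time_ms": 1000}],
    }
    for line in to_dogstatsd_lines(example):
        print(line)
```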
Show and Tell
- Cluster Services
- Datadog
- Grafana
Contact/Q&A
Nick Parker
- DC/OS Slack: chat.dcos.io
- DC/OS Mailing List: users@dcos.io
- GitHub: nickbp@
Any Questions?
Appendix: Metrics on DC/OS Enterprise
DC/OS Enterprise 1.7
- Application metrics only
- Tagged with some container IDs
- Sent to metrics.marathon.mesos:8125
- Tied to DC/OS release cycle
DC/OS Enterprise 1.8
- Adds resource usage metrics
- Adds more tags
- Sent to local Collector process
- Collector is detached from DC/OS release cycle
Appendix: Mesos Agent
Appendix: Print Consumer