DC/OS Metrics (formerly known as Project Ambrose): Application and Resource Metrics in DC/OS Enterprise. Nick Parker at..



Introduction
- Nick Parker
- DC/OS Slack: chat.dcos.io
- DC/OS Mailing List: users@dcos.io
- GitHub: nickbp@
- Data Agility Team
  - Frameworks for Cassandra/DSE, HDFS, Kafka/Confluent, Spark,...
  - Service SDK (in progress...)

The Importance of Metrics
How do you know if...
- Things are running fine, or falling over
- Containers have plenty of quota, or are on the edge of OOM
- You're optimizing for what people use, or what nobody sees
- The new release is good, or should be rolled back

Sources of Metrics in DC/OS
Container Metrics
- Measure things like: RAM, Disk, IOPS, CPU, Network, ...
- To determine: Resource utilization/basic health
Application Metrics
- Measure things like: QPS, query latency, number/types of hit exceptions, number of active users, ...
- To determine:
  - Changes in performance/behavior across rollouts
  - Debugging active issues (e.g. on-call pages)
  - Tracing historical behavior
  - ...

Solving Metrics on DC/OS
- Easy integration by applications
  - Little effort/thought to emit metrics from any application
  - Support custom metric metadata
- Inject container metadata
  - Container, Framework, Agent,...
- Flexible, configurable output
  - Widely accessible format/schema
  - Send metrics to any storage
  - Easy filtering and routing
- Installed as a containerized application
  - Easy reconfiguration/upgrades/fixes

What DC/OS Metrics Provides
- Easy input
  - Container resource metrics: retrieved automatically
  - Custom application metrics: StatsD endpoint, advertised with env vars
- Automatic source tagging
  - Application, Framework, Host Agent, Container,...
- Flexible outputs
  - Kafka cluster: scale as needed, attach arbitrary consumers
  - Others?

Application Input: StatsD (with tag support)
StatsD Format
- Text records: either one-per-packet or newline separated.
- Optional tagging (Datadog extension) - Consumed by DC/OS Metrics!
    memory.usage_mb:5|g
    frontend.query.latency_ms:46|g|#shard_id:6,section:frontpage
Pseudocode
    if (env["STATSD_UDP_HOST"] and env["STATSD_UDP_PORT"]) {
      // 1. Open UDP socket to the endpoint
      // 2. Send StatsD-formatted metrics
    }
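The pseudocode on this slide can be fleshed out as a short Python sketch. It assumes only what the slide states: the endpoint is advertised through the STATSD_UDP_HOST and STATSD_UDP_PORT environment variables, and records are StatsD text with optional Datadog-style tags. The helper names format_metric and send_metrics are hypothetical, chosen for illustration.

```python
import os
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Render one StatsD record, with optional Datadog-style tags."""
    record = f"{name}:{value}|{metric_type}"
    if tags:
        tag_text = ",".join(f"{key}:{val}" for key, val in tags.items())
        record += f"|#{tag_text}"
    return record


def send_metrics(records, env=os.environ):
    """Send newline-separated records if the endpoint is advertised.

    Returns True if a packet was sent, False if the variables are absent.
    """
    host = env.get("STATSD_UDP_HOST")
    port = env.get("STATSD_UDP_PORT")
    if not (host and port):
        return False  # not running under DC/OS Metrics: do nothing
    payload = "\n".join(records).encode("utf-8")
    # 1. Open UDP socket to the endpoint
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # 2. Send StatsD-formatted metrics
        sock.sendto(payload, (host, int(port)))
    return True


if __name__ == "__main__":
    send_metrics([
        format_metric("memory.usage_mb", 5),
        format_metric("frontend.query.latency_ms", 46,
                      tags={"shard_id": 6, "section": "frontpage"}),
    ])
```

Checking for the environment variables first means the same application can run unchanged outside a cluster, where it simply emits nothing.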

Output Format: Apache Avro
    repeated MetricList {
      repeated Tag {
        string key,
        string value,
      }
      repeated Datapoint {
        string name,
        double value,
        int64 epoch_time_ms,
      }
    }
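The slide gives the record layout in shorthand rather than as an actual Avro schema. As a rough sketch, the same layout could be written as an Avro schema held in a Python dict. The record and field names Tag, Datapoint, key, value, name and epoch_time_ms come from the slide; the array field names "tags" and "datapoints" are assumptions, and the real schema shipped with DC/OS Metrics may differ. Avro spells a 64-bit integer as "long".

```python
import json

# Tag record: one key/value pair of metadata (names from the slide).
TAG_SCHEMA = {
    "type": "record",
    "name": "Tag",
    "fields": [
        {"name": "key", "type": "string"},
        {"name": "value", "type": "string"},
    ],
}

# Datapoint record: one measurement at one point in time.
DATAPOINT_SCHEMA = {
    "type": "record",
    "name": "Datapoint",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "value", "type": "double"},
        {"name": "epoch_time_ms", "type": "long"},  # int64 on the slide
    ],
}

# MetricList: a batch of datapoints sharing one set of tags.
# The field names "tags" and "datapoints" are assumed, not from the slide.
METRIC_LIST_SCHEMA = {
    "type": "record",
    "name": "MetricList",
    "fields": [
        {"name": "tags", "type": {"type": "array", "items": TAG_SCHEMA}},
        {"name": "datapoints",
         "type": {"type": "array", "items": DATAPOINT_SCHEMA}},
    ],
}


def schema_json():
    """Return the schema as JSON text, the form Avro tooling reads."""
    return json.dumps(METRIC_LIST_SCHEMA, indent=2)


if __name__ == "__main__":
    print(schema_json())
```

Grouping tags at the MetricList level rather than per datapoint keeps a batch compact when every datapoint comes from the same container.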

Architecture: Per-Node
Per-host components:
  1. Mesos Metrics Module
  2. Metrics Collector

Architecture: Per-Cluster
Per-cluster components:
  1. Kafka
  2. Consumer(s)

Architecture: Overall
Per-host components:
  1. Mesos Metrics Module
  2. Metrics Collector
Per-cluster components:
  3. Kafka
  4. Consumer(s)

Demo!!!
- Service config examples
- Consumer examples
- Show and tell

Configuring StatsD in...
- Cassandra
- Kafka

Executor logs from...
- Kafka
- Cassandra

Consumers for...
- InfluxDB
- KairosDB
- (Dog)StatsD
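The consumers themselves are not shown in this transcript. As a loose illustration of what a (Dog)StatsD consumer has to do, the sketch below turns one already-decoded MetricList (the layout from the Output Format slide) back into tagged StatsD lines. Everything here is an assumption: the function name to_dogstatsd_lines is hypothetical, the dict keys "tags" and "datapoints" are guessed, reading from Kafka and Avro decoding are left out, and every datapoint is emitted as a gauge because the record carries no metric type.

```python
def to_dogstatsd_lines(metric_list):
    """Render a decoded MetricList dict as tagged StatsD gauge lines.

    The timestamp (epoch_time_ms) is dropped in this sketch, since the
    plain StatsD text record shown earlier has no field for it.
    """
    tags = metric_list.get("tags", [])
    tag_text = ",".join(f"{tag['key']}:{tag['value']}" for tag in tags)
    suffix = f"|#{tag_text}" if tag_text else ""
    return [
        f"{point['name']}:{point['value']}|g{suffix}"
        for point in metric_list.get("datapoints", [])
    ]


if __name__ == "__main__":
    example = {
        "tags": [{"key": "shard_id", "value": "6"}],
        "datapoints": [
            {"name": "frontend.query.latency_ms", "value": 46.0,
             "epoch_time_ms": 1470000000000},
        ],
    }
    for line in to_dogstatsd_lines(example):
        print(line)
```

Consumers for InfluxDB or KairosDB would follow the same shape, swapping the rendering step for that store's write format.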

Show and Tell
- Cluster
- Services
- Datadog
- Grafana

Contact/Q&A
- Nick Parker
- DC/OS Slack: chat.dcos.io
- DC/OS Mailing List: users@dcos.io
- GitHub: nickbp@
- Any Questions?

Appendix: Metrics on DC/OS Enterprise
DC/OS Enterprise 1.7
- Application metrics only
- Tagged with some container IDs
- Sent to metrics.marathon.mesos:8125
- Tied to DC/OS release cycle
DC/OS Enterprise 1.8
- Adds resource usage metrics
- Adds more tags
- Sent to local Collector process
- Collector is detached from DC/OS release cycle

Appendix: Mesos Agent

Appendix: Print Consumer
