The Art of Container Monitoring. Derek Chen

The Art of Container Monitoring Derek Chen 2016.9.22

About me
DevOps Engineer at Trend Micro
Agile transformation
Microservices and cloud services
Docker integration
Monitoring system development
Automate all the things
Make everything smoother
Find me at derekhound@gmail.com

Why monitoring?
We want to know when things go wrong.
We want to know when things aren't quite right.
We want to know about problems in advance.
Microscope? Magnifier? Telescope?

Blackbox vs Whitebox
Blackbox: Requires no participation of the monitored system. Observes external functionality, what the user sees. End-to-end test.
Whitebox: Collects data provided internally by the target system. Has more granular information about the system. Can provide warning of problems before they occur.

Blackbox tests measure: Can you ping the server? Can you fetch a page from the server? Does it have the correct contents?
Whitebox monitoring collects: network statistics (packets/bytes sent/received), system load (CPU, memory, disk usage), application statistics (connection count, queries per second).
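The blackbox questions above (can the page be fetched, and does it have the correct contents) can be sketched in a few lines of Python. This is a minimal illustration, not something from the talk: the function names, the result dictionary, and the injectable `fetch` parameter are assumptions made so the check can be exercised without a real server.

```python
import urllib.request
from typing import Callable, Dict, Tuple


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (status code, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def blackbox_check(url: str, expected: str,
                   fetch: Callable[[str], Tuple[int, str]] = http_fetch) -> Dict:
    """End-to-end check: can we fetch the page, and is the content right?

    `fetch` is a hypothetical hook so the check can be tried with a fake fetcher.
    """
    try:
        status, body = fetch(url)
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return {"reachable": False, "content_ok": False, "detail": str(exc)}
    return {
        "reachable": status == 200,
        "content_ok": expected in body,
        "detail": f"HTTP {status}",
    }
```

Note that the check needs nothing from the monitored system itself, which is what makes it blackbox; it also cannot say why a failure happened, which is where whitebox data comes in.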

Goal
Be metric-, event-, and log-centric.
Allow us to easily visualize the state of our environment.
Provide contextual and useful notifications.
Focus on whitebox monitoring instead of blackbox monitoring.
[Figure: monitoring framework. Metrics, events, and logs from the business logic, application, and operating system layers pass through a router to destinations that store, visualize, and alert. Image source: https://www.artofmonitoring.com]

Why Open Source Software Monitoring?
The practice of system administration is changing quickly: DevOps, Infrastructure as Code, Visualization, Cloud Platform.
Commercial solutions can't keep up!
[image] http://devops.com/2014/05/05/meet-infrastructure-code

Monolithic: If all-in-one gets the job done, then great. Good for smaller-scale, non-tech-focused companies.
Modular: DevOps requires flexibility and innovation. Good for tech-driven and ops-focused companies.

Metric
We'll rely most heavily on metrics to help us understand what's going on in our environment.

Metric
Metrics provide a dynamic, real-time picture of the state of your infrastructure.
[Figure: metrics from the business logic, application, and operating system layers are collected, stored, and visualized.]

Pull Mode: A central collector periodically requests metrics from each monitored system.
Push Mode: Metrics are periodically sent by each monitored system to a central collector.
[Figure: in pull mode, the monitoring server scrapes metrics from the service on each of Server #1 to #3; in push mode, an agent on each of Server #1 to #3 pushes metrics to the monitoring server.]
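The difference between the two modes is mainly about who initiates the transfer. The sketch below models both in memory; the class and method names (`MonitoringServer`, `Agent`, `scrape_all`, `receive`) are hypothetical and do not correspond to any particular tool.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str, float]  # (source, metric name, value)
Reader = Callable[[], Dict[str, float]]


class MonitoringServer:
    """Hypothetical central collector that supports both modes."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []
        self.targets: Dict[str, Reader] = {}

    # Pull mode: the server knows its targets and asks each one for metrics.
    def register_target(self, name: str, read_metrics: Reader) -> None:
        self.targets[name] = read_metrics

    def scrape_all(self) -> None:
        for name, read_metrics in self.targets.items():
            for metric, value in read_metrics().items():
                self.samples.append((name, metric, value))

    # Push mode: the server only receives; each agent decides when to send.
    def receive(self, source: str, metrics: Dict[str, float]) -> None:
        for metric, value in metrics.items():
            self.samples.append((source, metric, value))


class Agent:
    """Hypothetical agent running next to a service in push mode."""

    def __init__(self, name: str, server: MonitoringServer, read_metrics: Reader) -> None:
        self.name = name
        self.server = server
        self.read_metrics = read_metrics

    def push_once(self) -> None:
        self.server.receive(self.name, self.read_metrics())
```

In pull mode the server has to know about every target before it can scrape it; in push mode a new agent can start reporting without the server being told about it first.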

Collectd
The system statistics collection daemon

Host Machine OS Metric
[Figure: Collectd instances on the Kubernetes cluster hosts (K8S master, K8S minions) and on the Database and Elasticsearch hosts; a Monitor host runs InfluxDB and Grafana.]

cAdvisor
Collects, aggregates, processes, and exports information about running containers.

Container OS Metric
[Figure: cAdvisor instances on the K8S master and K8S minions, with the API server and Heapster in the Kubernetes cluster; Database and Elasticsearch hosts; a Monitor host runs InfluxDB and Grafana.]

http://127.0.0.1:4194
http://127.0.0.1:4194/metrics
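The /metrics endpoint returns plain text in the Prometheus exposition format, one sample per line. The following sketch parses that text into (name, labels, value) tuples. The sample input is made up for illustration, and the parser is deliberately simplified: it skips comment lines, ignores optional timestamps, and leaves escaped characters in label values as they are.

```python
import re
from typing import Dict, Iterator, Tuple

# name, optional {labels}, value, optional trailing timestamp (ignored)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative input only; real output contains many more series and labels.
EXAMPLE_TEXT = """\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/docker/abc",name="nginx"} 12.5
container_memory_usage_bytes{id="/docker/abc",name="nginx"} 1048576
"""


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)
```

Calling `list(parse_metrics(EXAMPLE_TEXT))` gives two samples, one for CPU and one for memory, each carrying the `id` and `name` labels of the container.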

Telegraf
Collects time series data from a variety of sources

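Application Metric: Pull Mode
[Figure: a single Telegraf on the Monitor host, alongside InfluxDB and Grafana, pulls metrics from Postgres, ES, Nginx, and Redis running on the Database host, the Elasticsearch host, and the Kubernetes cluster (API server, K8S master, K8S minions).]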

Application Metric: Push Mode
[Figure: a Telegraf instance runs next to each of Postgres, ES, Nginx, and Redis (inside the pod for Nginx and Redis on the K8S minions) and pushes metrics to the Monitor host, which runs InfluxDB and Grafana.]

Telegraf
An agent written in Go. Easy to contribute to or to develop your own plugins.
Supported input plugins:
System: cpu, mem, net, disk, processes, etc.
Application: docker, nginx, postgresql, redis, etc.
Third-party API: aws cloudwatch
Supported output plugins: influxdb, graphite, cloudwatch, datadog, etc.

Pull Mode
The collector discovers services periodically.
The collector needs to be able to talk to the target services, for example through an overlay network (e.g. Flannel, Weave) or a forward proxy.
[Figure: the monitoring server scrapes metrics from the service on each of Server #1 to #3.]

Push Mode
The agent sends metrics as soon as it starts up.
Suitable for short-lived containers or dynamic environments.
[Figure: the agent on each of Server #1 to #3 pushes metrics to the monitoring server.]

InfluxDB
A time series database that stores all the metrics

Similar scope to Graphite
Written in Go
No external dependencies
Ready for billions of rows
Several client libraries
SQL-style queries

InfluxDB Schema
Measurements (e.g. cpu, mem, disk, net, nginx)
Timestamp (nanosecond)
Tags (e.g. app=atom-apigw namespace=alpha)
Fields (e.g. accepts=30925, active=1, handled=30925)
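These four parts map directly onto InfluxDB's line protocol, the text format used to write points: measurement and tags first, then fields, then the timestamp. The sketch below builds one such line from the slide's example values. The helper names and the timestamp are assumptions for illustration, and the escaping covers only the common cases.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def _escape(value: str, chars: str) -> str:
    """Backslash-escape the given special characters."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value: FieldValue) -> str:
    """Render a field value: integers get an 'i' suffix, strings are quoted."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue], timestamp_ns: int) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable series key
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(val) for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Values from the slide; the timestamp is a made-up nanosecond value.
line = to_line(
    "nginx",
    {"app": "atom-apigw", "namespace": "alpha"},
    {"accepts": 30925, "active": 1, "handled": 30925},
    1474502400000000000,
)
```

Tags are indexed and always strings, so they suit values you filter or group by (app, namespace); fields hold the measured numbers themselves.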

Grafana
Querying and visualizing time series and metrics

Finally! What we can see! Less talk, more demo.

Event
We'll generally use events to let us know about changes and occurrences in our environment.

Icinga2
A monitoring system that checks the availability of your resources and notifies users of outages

Originally forked from Nagios in 2009
Independent version Icinga2 since 2014
Monitors everything: gathers status and collects performance data
Plugins: /usr/lib/nagios/plugins/plugin_name
A plugin is any program that returns 0 (OK), 1 (WARNING), or 2 (CRITICAL) and writes a message to STDOUT
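Because the plugin contract is just an exit code plus a line on STDOUT, a check can be written in any language. Here is a minimal Python sketch of a load check; the thresholds, the message wording, and the function names are assumptions chosen for illustration, and `os.getloadavg` is only available on Unix-like systems.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value: float, warn: float, crit: float, label: str = "load"):
    """Map a measured value onto (exit code, status message)."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - {label} is {value:.2f}"
    if value >= warn:
        return WARNING, f"WARNING - {label} is {value:.2f}"
    return OK, f"OK - {label} is {value:.2f}"


def main() -> int:
    """Check the 1-minute load average against hypothetical thresholds."""
    load1 = os.getloadavg()[0]
    code, message = evaluate(load1, warn=4.0, crit=8.0, label="load1")
    print(message)  # the monitoring system shows this line as the check output
    return code

# In an actual plugin file, finish with:  import sys; sys.exit(main())
```

Keeping the threshold logic in `evaluate` separate from the measurement makes the decision easy to test without putting real load on a machine.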

Event Monitoring
[Figure: Icinga2 on the Monitor host, alongside InfluxDB and Grafana, checks the Kubernetes cluster (API server, K8S master, K8S minions), the Database and Elasticsearch hosts, and the Postgres, ES, Nginx, and Redis services.]

[Figure: panels labeled Pod and Host]

Server Monitoring: About changes and occurrences in our environment. Is the CPU load too high? Is there enough memory? Is the Docker engine still alive?
External Monitoring: End-to-end test from the user's perspective. Can you connect to SSH port 22? Can you browse the web page? Can you request the RESTful API successfully?
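An external check such as "can you connect to SSH port 22?" comes down to attempting a TCP connection from outside the host. A minimal sketch follows; the function name and the injectable `connect` parameter are assumptions, the latter added only so the logic can be tried without a network.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0,
                connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        conn = connect((host, port), timeout)
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
    conn.close()
    return True
```

For example, `can_connect("127.0.0.1", 22)` reports whether something is listening on the local SSH port. It says nothing about whether a login would succeed, which is the usual limit of a check made from the outside.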

Dashboard: Provides a clear view of the current environment.
Alerting: Notifies using email, Slack, pager, etc. Notification escalation.
Others: Distributed monitoring. Reloads config without interrupting checks.

Log
Logs are a subset of events. They're often most useful for fault diagnosis and investigation.

EFK
The EFK stack is a mature logging solution.
Shipping: sends all of your logs to where they should go.
Storage: the main part, storing your data with high availability.
Visualization: powers your data, so you can learn more about its value.

Summary

Monitoring Service Stack
Collect: Collectd, cAdvisor, Telegraf, Fluentd
Store: InfluxDB, Elasticsearch
Visualize: Grafana, Kibana
Alert: Icinga2

Last but Not Least
Data Flow: Blackbox vs Whitebox, Pull Mode vs Push Mode
Data Content: Operating System, Application, Business Logic
Data Type: Metrics, Events, Logs
[Figure: monitoring framework. Metrics, events, and logs from the business logic, application, and operating system layers pass through a router to destinations that store, visualize, and alert. Image source: https://www.artofmonitoring.com]

THANKS! Any Questions?