Containers Infrastructure for Advanced Management
Federico Simoncelli, Associate Manager, Red Hat
October 2016
About Me
Kubernetes
- Decoupling problems to hand out to different teams
- Layer of abstraction for application definition
- Machines don't have an identity or a specific function
- Developers do operations for their application
- Cluster Admins do operations for cluster software
- Kernel and Operating System do operations for nodes
- Hardware operations for clouds
- All... machines are created equal
- Developers do not know about Operators' issues
- Operators do not know about Applications' issues
OpenShift
- 100% based on and compatible with Kubernetes
- Kubernetes influencer for new features
- Projects and Namespaces
- Templates
- Routes and Ingress
- Additional features related to image life-cycle and rolling updates
- Integrated experience in many areas
- Opinionated metrics and logging solutions
- Developer Web Console
Application Components Distribution
[Figure: traditional and Kubernetes distribution of application components]
New Set Of (Old) Problems for Operators
[Figure axes: complexity, scale]
- One developer. How do I containerize?
- Dev team. How can we move faster?
- Dev meets Ops. How do we run at scale?
- DevOps. Can we turn it into a platform?
- Production Ops. How do we manage at scale?
Deployment Requirements
- Standardized and easy to reproduce
- Automatic and composable
- Deploy-and-forget is not enough
- Maintainable
- Definition of desired state and reconciliation
- Allows the infrastructure to be modified reliably
- Pick a platform: Atomic vs Traditional
- Scaling (add and remove nodes)
- Change configurations, etc.
- Somewhat similar to Kubernetes principles
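The "desired state and reconciliation" requirement can be illustrated with a minimal sketch. Everything here is hypothetical (the ClusterState type, the action names, and the in-memory apply_actions function standing in for real provisioning); it is not how openshift-ansible or Kubernetes implement it, only the general shape of the idea: compare what is declared with what exists, and derive the actions that close the gap.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical cluster description: node names plus configuration values."""
    nodes: set = field(default_factory=set)
    config: dict = field(default_factory=dict)


def plan(desired: ClusterState, actual: ClusterState) -> list:
    """Return the ordered actions needed to move `actual` towards `desired`."""
    actions = []
    for node in sorted(desired.nodes - actual.nodes):
        actions.append(("add_node", node))
    for node in sorted(actual.nodes - desired.nodes):
        actions.append(("remove_node", node))
    for key in sorted(desired.config):
        if actual.config.get(key) != desired.config[key]:
            actions.append(("set_config", key, desired.config[key]))
    return actions


def apply_actions(actual: ClusterState, actions: list) -> ClusterState:
    """Apply actions to a copy of the state (a stand-in for real provisioning)."""
    nodes = set(actual.nodes)
    config = dict(actual.config)
    for action in actions:
        if action[0] == "add_node":
            nodes.add(action[1])
        elif action[0] == "remove_node":
            nodes.discard(action[1])
        elif action[0] == "set_config":
            config[action[1]] = action[2]
    return ClusterState(nodes, config)
```

Running plan again on the converged state returns an empty list, which is the property that makes a deployment maintainable rather than deploy-and-forget: the same definition can be re-applied safely after every change.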
Deployment Status
Kubernetes
- kube-up, based on SaltStack (turning into kube-deploy)
- Mostly for GCE (and Vagrant for development)
- Kargo, based on Ansible
- GKE (possible future)
OpenShift
- https://github.com/openshift/openshift-ansible
- Supports AWS, GCE, libvirt, OpenStack, Vagrant
Containers on OpenStack
- Kubernetes and OpenShift Heat templates
- Magnum: container orchestration as first class resources
- https://github.com/redhat-openstack/openshift-on-openstack
OpenShift-Ansible
- Actively maintained and feature-rich
- Based on a healthy Open Source automation project
- Describe your infrastructure as inventory
- Inventory can be versioned and updated
- Simple interactive installation
- Large ecosystem
- Composable with other automations
- atomic-openshift-installer
- Advanced installation supporting many advanced features
- Possibly hard to master
Monitoring Objectives
- Notification of incidents
- Debug new or unknown issues
- Grace period
- Notifications
- Quickly have at hand the overall status of the cluster
- Easy access to metrics and logging
- Metrics and logging at all levels (infrastructure, etc.)
- Analyze trending and proactively avoid future incidents
- Scheduled maintenance
- Datacenter hardware upgrades
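One possible reading of "grace period" for incident notifications is that a threshold breach must persist for some time before anyone is notified, so short spikes do not page an operator. The sketch below assumes that reading; the class name, fields, and the 60 second value in the usage note are illustrative and do not come from any specific monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GracePeriodAlert:
    """Notify only when a value stays above `threshold` for `grace_seconds`."""
    threshold: float
    grace_seconds: float
    breach_started: Optional[float] = None
    notified: bool = False

    def observe(self, value: float, now: float) -> bool:
        """Record a sample; return True the first time a breach outlasts the grace period."""
        if value <= self.threshold:
            # Recovered: reset so that a later breach starts a new grace period.
            self.breach_started = None
            self.notified = False
            return False
        if self.breach_started is None:
            self.breach_started = now
        if not self.notified and now - self.breach_started >= self.grace_seconds:
            self.notified = True
            return True
        return False
```

With GracePeriodAlert(threshold=0.9, grace_seconds=60), a breach that recovers after 30 seconds produces no notification, while one that lasts a full minute produces exactly one.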
Common Monitoring Architecture
Monitor Kubernetes-Based Clusters with Heapster
- Leverage the infrastructure to monitor the same infrastructure
- Heapster
- What if monitoring is failing continuously?
- Enables Container Cluster Monitoring and Performance Analysis
- Different sinks
- Autoscaling
- Collected data are then used to autoscale Pods (when configured)
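To make the autoscaling point concrete, here is a simplified proportional scaling rule in the spirit of the Kubernetes Horizontal Pod Autoscaler: the number of replicas is chosen so that the average utilization would land near a target. This is a sketch under stated assumptions, not the autoscaler's actual code; tolerances, cooldowns, and missing-metric handling are omitted, and the function name and default bounds are made up for illustration.

```python
import math


def desired_replicas(pod_cpu_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Return a replica count that brings average CPU utilization near the target.

    pod_cpu_utilization: per-Pod utilization percentages (e.g. as collected by Heapster).
    target_utilization: desired average utilization percentage per Pod.
    """
    if not pod_cpu_utilization:
        # No data collected: fall back to the configured minimum.
        return min_replicas
    wanted = math.ceil(sum(pod_cpu_utilization) / target_utilization)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, three Pods at 90%, 80% and 100% with a 50% target total 270 percentage points, so six replicas are needed to bring the average back under the target.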
Agile Monitoring
- Running a data center continuously, 24/7, demands more than metrics collection
- Contribution to Heapster and cAdvisor is slow
- Integrate additional solutions and technologies
- Agile addition of new metrics
- Monitoring for known issues
- No development involved
- Nodes can self-heal
- Statistics on most recurring issues
- Identify fragile components or architecture
- Focus development for reliability
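Monitoring for known issues without development work can be approximated with a table of patterns and remedies that operators extend as new problems are understood. The sketch below is illustrative only: the issue names, log patterns, and remedy identifiers are all invented, and a real setup would trigger actual remediation rather than return names.

```python
import re
from collections import Counter

# Hypothetical rule table: (issue name, log pattern, remedy identifier).
KNOWN_ISSUES = [
    ("disk_pressure", re.compile(r"no space left on device", re.I), "clean_images"),
    ("docker_hang", re.compile(r"docker daemon .* not responding", re.I), "restart_docker"),
]


def triage(log_lines):
    """Count occurrences of known issues and list the remedies to run, in first-seen order."""
    counts = Counter()
    remedies = []
    for line in log_lines:
        for name, pattern, remedy in KNOWN_ISSUES:
            if pattern.search(line):
                counts[name] += 1
                if remedy not in remedies:
                    remedies.append(remedy)
    return counts, remedies
```

The counter doubles as the "statistics on most recurring issues": counts.most_common() ranks the issues, which points at the fragile components worth development effort.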
Application and Infrastructure Monitoring
- Roles and duties separation (once again)
- Developers should be interested only in metrics and logs of applications
- Developers must see only data of objects they own
- Operators are mostly interested in metrics and logs of the infrastructure (e.g. nodes)
- Metrics, logging and alerts belong to objects
- Heapster collects metrics per object (node, container, etc.)
- Security considerations
- Applications and infrastructure in the same data store?
- Is tenancy in the data store enough for you?
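The separation of roles can be sketched as a filter over per-object metric samples. The Sample type and both functions below are hypothetical, and they assume that application objects carry a namespace while infrastructure objects such as nodes do not; a real data store would enforce this through its own tenancy mechanism rather than application code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Sample:
    kind: str                 # "node" or "container"
    namespace: Optional[str]  # None for infrastructure objects such as nodes
    name: str
    metric: str
    value: float


def visible_to_developer(samples: Iterable[Sample], owned: Set[str]) -> List[Sample]:
    """Developers see only samples of objects in namespaces they own."""
    return [s for s in samples if s.namespace is not None and s.namespace in owned]


def visible_to_operator(samples: Iterable[Sample]) -> List[Sample]:
    """Operators see infrastructure-level samples (here: nodes only)."""
    return [s for s in samples if s.kind == "node"]
```

Whether filtering of this kind is sufficient, or whether application and infrastructure data should live in separate stores, is exactly the security question the slide raises.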
Monitoring Architecture Considerations
- Reliability and isolation of disruptions
- Scalability of each subsystem
- Data locality
- Reuse of existing solutions
- Security (and isolation of data)
- Monitoring life-cycle (upgrade and rollback)
- Cross correlation of multiple clusters and solutions
- Single technology for Metrics and Logging?
Direct Monitoring
Metrics and Logging Federation
Hawkular and ElasticSearch
- Open Source solutions for metrics and logging
- Hawkular, based on Cassandra
- ElasticSearch, based on Lucene
- Data stores used by many existing projects
- Technologies of choice for OpenShift
- Work out of the box in OpenShift
- Hawkular trigger definitions for Alerts
- Kibana visualization tool for ElasticSearch
Image and Security
Security assessment
- How to trust underlying images?
- How to keep the images safe?
- How to enforce security policies?
Technologies
- Signed images
- OpenSCAP assessment tools
- Atomic Scan and Blackduck
Putting It All Together
- Maintainable deployment solution
- Support cluster re-shaping
- Versionable
- Monitoring unexpected events and alerts
- Planning data center evolution over time
- Ability to monitor and cross-link with the underlying infrastructure
- Out-Of-The-Box experience
- Knowledge gathered from a community of Operators
ManageIQ
- Comprehensive Cloud Management
- Single-Pane of Glass
- Private and Public
- All-Around: VMs, Instances, Containers, Storage, Network
- Management Framework
- Monitoring
- Management
- Infrastructure applications
- Policies and Alerts
- Reports and Chargeback Reports
- Automation
- Capacity Planning
ManageIQ Project and History
- Virtualization Management since 2006
- Acquired by Red Hat in December 2012
- Open-Sourced in June 2014
- 7 Technical Leaders
- 3 Monthly Stable Builds
- ~50 Core Engineers
- Nightly Builds
- ~100 Contributors (and counting)
- 3 Weeks Sprints
- 3 Companies Involved
- 200 Average PR (per Sprint)
Introducing Containers to ManageIQ 2015-2016
- Inventory collection of major objects
- Cross-linking for nodes on known instances
- Dashboard and Topology
- Metrics collection from Hawkular
- Utilization aggregation (Project, Service, etc.)
- Smart-State Analysis
- Nodes, Pods, Services, Replicators, etc.
- Collection of image packages
- OpenSCAP for container images
- Policies for container objects
- Chargeback
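Utilization aggregation and chargeback can be pictured as a roll-up of per-container usage to the project level, followed by pricing. The sketch below is not ManageIQ's chargeback model; the tuple layout, function names, and rates are assumptions chosen only to show the shape of the computation.

```python
from collections import defaultdict


def aggregate_by_project(container_usage):
    """Sum usage per project.

    container_usage: iterable of (project, cpu_core_hours, memory_gb_hours).
    Returns {project: (cpu_core_hours, memory_gb_hours)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for project, cpu, memory in container_usage:
        totals[project][0] += cpu
        totals[project][1] += memory
    return {project: (cpu, memory) for project, (cpu, memory) in totals.items()}


def chargeback(totals, cpu_rate, memory_rate):
    """Price aggregated usage with flat (hypothetical) rates per unit."""
    return {
        project: cpu * cpu_rate + memory * memory_rate
        for project, (cpu, memory) in totals.items()
    }
```

The same roll-up works for any grouping key in the inventory (Service, Node, and so on) by changing the first element of each usage tuple.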
ManageIQ Inventory and Relationships
[Diagram entities: Service, Pod, Cluster, Node, Instance, Container, Image]
Containers Management in ManageIQ in 2017
Current ongoing efforts for 2017
- Alerts dashboard and life-cycle
- Live Metrics and Alerts
- Dynamic Metrics and Alerts
- Custom metrics and alerts on-demand
- Automation
- Metrics served by Hawkular to ManageIQ
- Support native Hawkular triggers for Alerts
- Manage and re-provision ManageIQ using Ansible
- Integration with Logging and ELK stack
Get Involved!
- Community: http://talk.manageiq.org
- Code: https://github.com/manageiq/manageiq (providers/containers)
- Documentation: http://manageiq.org/documentation
- Social: Twitter @manageiq #manageiq
Federico Simoncelli
fsimonce@redhat.com
https://twitter.com/simon3z