Evolution of Kubernetes in One Year From Technical View

Evolution of Kubernetes in One Year From Technical View Harry Zhang

Background

Docker = fan economy: GitHub searches, Stack Overflow questions, DockerCon; the de facto standard. From Docker to Kubernetes.

Diversity: image formats (Docker Image, ACI) and runtimes (RunC, RunV, rkt (AppC), Mesos Containerizer, containerd, ocid, hyperd); the Kubernetes, Mesos, and legacy* control planes reach them through the Container Runtime Interface or the Docker Daemon.

Architecture

Example: the life of a Pod (components: api-server, etcd, scheduler, and on every node a kubelet SyncLoop plus kube-proxy).
1. Pod created (submitted to the api-server, persisted in etcd).
2. Pod object added (watchers are notified).
3.1 The scheduler detects the new pod object. 3.2 The scheduler binds the pod with a node.
4.1 The kubelet detects a pod bound to its node. 4.2 The kubelet starts the containers in the pod.

Architecture: the controller-manager runs control loops (pod, replica, namespace, service endpoint, job, deployment, volume, petset) that reconcile the desired world with the real world; the api-server, etcd, and scheduler sit in the middle; on every node a kubelet SyncLoop and kube-proxy do the same reconciliation locally.

Kubernetes components: de-coupled, extensible, event-loop driven, distributed-data oriented, robust. Robust means not only HA but also avoiding races, e.g. should an undesired pod be considered by the scheduler? No, it should be handled by the controller.

Bootstrapping

Old way:
1. $ systemctl start etcd
2. $ systemctl start kubelet
3. $ systemctl start kube-proxy
4. $ systemctl start kube-apiserver
5. $ systemctl start kube-controller-manager
6. $ systemctl start kube-scheduler
New, self-healing way:
1. $ hyperkube kubelet
2. $ hyperkube kube-proxy
Done!

Self-healing: $ hyperkube kubelet --config=/etc/kubernetes/manifests. The kubelet will load all of the pods found there and bind them to this node. $ ls /etc/kubernetes/manifests: etcd.yaml kube-apiserver.yaml kube-scheduler.yaml kube-controller-manager.yaml
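For reference, a static pod manifest of the kind the kubelet loads from that directory might look roughly like the sketch below; the image tag, flags, and addresses are assumptions for illustration, not taken from the talk.

# /etc/kubernetes/manifests/kube-apiserver.yaml (illustrative sketch)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true                                     # control-plane pods usually share the host network
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/hyperkube:v1.4.0    # assumed image and tag
    command:
    - /hyperkube
    - apiserver
    - --etcd-servers=http://127.0.0.1:2379              # assumed etcd endpoint
    - --service-cluster-ip-range=10.0.0.0/16            # assumed service CIDR
    - --insecure-bind-address=0.0.0.0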

Topology: one control plane (etcd, api-server, scheduler, controller-manager) and, on every node, a kubelet and kube-proxy.

With CNI network: the same topology plus a calico-policy-controller next to the control plane and a calico/node agent on every node.

Production topology: replicated control planes (etcd, api-server, scheduler, controller-manager, calico-policy-controller) behind a load balancer, with kubelet, kube-proxy, and calico/node on every worker node.

Scheduling

Pod: the atomic scheduling unit*; the container (micro-service) design pattern; "the container" in Kubernetes. *EuroSys '15, Large-scale cluster management at Google with Borg, Section 8.1

The atomic scheduling unit: should sample.war be packaged together with Tomcat in one image?

How? New: InitContainers, one or more containers started in sequence before the pod's normal containers are started. They share volumes with the app containers and can perform network operations and computation prior to the app containers.
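As a rough sketch of the sample.war + Tomcat pattern, using the initContainers field that later stabilized (the early releases expressed the same thing through a beta annotation); the names, image tags, and download URL are invented for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: tomcat-with-war                 # hypothetical name
spec:
  volumes:
  - name: webapps
    emptyDir: {}                        # shared between init and app containers
  initContainers:                       # run to completion, in order, before the app containers
  - name: fetch-war
    image: busybox
    command: ["wget", "-O", "/work/sample.war", "http://example.com/sample.war"]   # assumed URL
    volumeMounts:
    - name: webapps
      mountPath: /work
  containers:
  - name: tomcat
    image: tomcat:8                     # assumed tag
    volumeMounts:
    - name: webapps
      mountPath: /usr/local/tomcat/webapps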

Container design pattern*: think of the Spring Framework. Container boundary: self-contained, with an atomic deployment signal (succeeded / failed). sample.war + Tomcat, "serverless" style: do the right things without modifying your container image. *HotCloud '16, Design Patterns for Container-based Distributed Systems

Sidecar* (*Google internal: web + logstash, web + git-sync, etc.)

Pod: super-affinity containers that share network and storage (init container + Tomcat, log + app). Process vs. process group. But why?

Gang scheduling and super affinity: the requirement is to co-schedule app and log. Approaches like Swarm peers or Omega-style optimistic scheduling place containers one by one and run into trouble: with app: 1G, log: 0.5G and Available: Node_A: 1.25G, Node_B: 2G, what happens if app is scheduled to Node_A first? The log container no longer fits beside it.

Pod summary: Pod = container affinity (--link, --volumes-from, --net=xxx, ...). Kubernetes helps newcomers build high-quality systems; think of the Spring Framework. DO NOT modify your image! Does anyone still assume WordPress + MySQL belong in one Pod?

Schedule strategy. Predicates: NoDiskConflict, NoVolumeZoneConflict, PodFitsResources, PodFitsHostPorts, MatchNodeSelector, MaxEBSVolumeCount, MaxGCEPDVolumeCount, CheckNodeMemoryPressure (eviction, QoS tiers), CheckNodeDiskPressure. Priorities: LeastRequestedPriority, BalancedResourceAllocation, SelectorSpreadPriority, CalculateAntiAffinityPriority, ImageLocalityPriority, NodeAffinityPriority.

Resource model: requests are used for scheduling, limits for enforcement (cpu-shares = requests.cpu, cpu-quota = limits.cpu with cpu-period = 100ms, memory = limits.memory). Compressible resource: CPU, throttled when exceeded. Incompressible resource: memory, killed by the kernel when exceeded.
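A minimal sketch of how this is declared per container (the values are arbitrary examples):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:                         # used by the scheduler
        cpu: 250m
        memory: 64Mi
      limits:                           # enforced at runtime (cpu-quota, memory limit)
        cpu: 500m
        memory: 128Mi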

QoS tiers. Guaranteed: limits are set for all resources in all containers and limits == requests (if set); not killed unless they exceed their limits, or the system is under memory pressure and there are no lower-priority containers that can be killed. Burstable: requests are set for one or more resources in one or more containers and limits (if set) != requests; killed once they exceed their requests when the system is under memory pressure and no Best-Effort pods exist. Best-Effort: requests and limits are not set for any of the resources in any container; first to be killed if the system runs out of memory.

Multi-scheduler: a 2nd scheduler can run alongside the default one, selected per pod via an annotation; annotations are for system usage, so do NOT abuse labels for this.
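In the releases of that era a pod opted into the second scheduler with an alpha annotation, roughly as sketched below; the scheduler name is hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-second-scheduler       # hypothetical name
  annotations:
    scheduler.alpha.kubernetes.io/name: my-scheduler    # picked up by "my-scheduler", ignored by the default scheduler
spec:
  containers:
  - name: app
    image: nginx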

Control Plane

DaemonSet: spreads a daemon pod to every node. Pods are created by the DaemonSet controller and don't need the scheduler, so they run even on unschedulable nodes, e.g. for bootstrapping.

Deployment: replicas with control. Bring up a ReplicaSet and Pods; check the status of a Deployment; update the Deployment (e.g. new image, labels); roll back to an earlier Deployment revision; pause and resume a Deployment.

Create: the Deployment creates a ReplicaSet, the next generation of the ReplicationController. --record: records the command in an annotation of nginx-deployment.
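For example (the manifest file name follows the usual nginx-deployment example, not the talk):

$ kubectl create -f nginx-deployment.yaml --record
$ kubectl get deployments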

Check: DESIRED = .spec.replicas; CURRENT = .status.replicas; UP-TO-DATE = replicas that contain the latest pod template; AVAILABLE = replicas whose pod status is ready (running).

Update: kubectl set image changes the container image; kubectl edit opens an editor so you can modify your deployment YAML. Both trigger the RollingUpdateStrategy: 1 max unavailable, 1 max surge (both can also be percentages). The rollout does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed.
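The two update paths look like this; deployment, container, and image names follow the common nginx example and are assumptions:

$ kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
$ kubectl edit deployment/nginx-deployment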

Update process: coordinated by the Deployment controller. Create: the ReplicaSet (nginx-deployment-2035384211) is scaled up to 3 replicas directly. Update: a new ReplicaSet (nginx-deployment-1564180365) is created and scaled up to 1, the old ReplicaSet is scaled down to 2, and the new and old ReplicaSets keep being scaled up and down under the same rolling update strategy. Finally there are 3 available replicas in the new ReplicaSet and the old ReplicaSet is scaled down to 0.

Rolling back: check revisions, then roll back to a revision.
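Concretely, with the deployment name assumed:

$ kubectl rollout history deployment/nginx-deployment                # check revisions
$ kubectl rollout history deployment/nginx-deployment --revision=2   # inspect one revision
$ kubectl rollout undo deployment/nginx-deployment --to-revision=2   # roll back to it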

Pausing & resuming (canary). Tips: a blue-green deployment needs duplicated infrastructure, while a canary release shares the same infrastructure. Rolling back a resumed deployment is WIP. Old way: kubectl rolling-update rc-1 rc-2.

Horizontal Pod Autoscaling. Tips: scale out/in; TriggeredScaleUp (GCE, AWS, more will be added); support for custom metrics.
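For example, autoscaling a deployment on CPU utilization (names and thresholds are arbitrary):

$ kubectl autoscale deployment nginx-deployment --min=2 --max=10 --cpu-percent=80
$ kubectl get hpa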

Custom metrics: endpoint (location to collect metrics from), name of metric, type (Counter, Gauge, ...), Prometheus data type (int, float), units (kbps, seconds, count), polling frequency, and regexps (regular expressions specifying which metrics to collect and how to parse them). The metric definition is added to the pod as a ConfigMap volume (e.g. for Nginx).

ConfigMap: decouples configuration from the image, because configuration is a runtime attribute. Can be consumed by pods through env vars or volumes.
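A sketch of both consumption paths; the ConfigMap name, key, and paths are invented for illustration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                      # hypothetical name
data:
  log.level: debug
---
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
  - name: app
    image: nginx
    env:
    - name: LOG_LEVEL                   # consumed as an env var
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log.level
    volumeMounts:
    - name: config                      # consumed as a volume of files
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config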

ConfigMap volume: no need to use a Persistent Volume for configuration data; think about etcd, where it actually lives.

Secret. Tip: credentials for accessing the k8s API are automatically added to your pods as a Secret.

Downward API: get these inside your pod as ENV or volume: the pod's name, the pod's namespace, the pod's IP, a container's cpu limit, a container's cpu request, a container's memory limit, a container's memory request.
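For example, exposed as environment variables (a sketch; the pod and container names are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: downward-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "env && sleep 3600"]
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: MY_CPU_LIMIT
      valueFrom:
        resourceFieldRef:               # resource fields also need the container name
          containerName: app
          resource: limits.cpu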

Service: the unified portal to replica containers, a portal IP:Port. External load balancers: GCE, AWS, HAProxy, Nginx, OpenStack.

Headless Service: DNS names like *.nginx.default.svc.cluster.local resolve directly to the pods selected by app=nginx.
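A headless Service is simply a Service with clusterIP set to None, e.g. for the app=nginx pods above (a sketch):

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None                       # headless: DNS returns the pod IPs directly
  selector:
    app: nginx
  ports:
  - port: 80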

User-space Service (old): kube-proxy watches etcd for (1) Service changes and (2) Endpoint changes and maintains iptables rules; a user request to the service IP 11.1.1.88:80 is redirected to a local kube-proxy port (43318), whose Proxier load balancer (round-robin, session affinity) forwards it to a Pod_IP:Port on Node-1 or Node-2.
// from container: PREROUTING: REDIRECT tcp -- 0.0.0.0/0 11.1.1.88 tcp dpt:80 redir ports 43318
// from host: OUTPUT: DNAT tcp -- 0.0.0.0/0 11.1.1.88 tcp dpt:80 to:10.10.103.58:43318

iptables Service (new): $ iptables-save | grep my-service
-A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
Tip: the ipvs solution works in NAT mode, which behaves the same as this iptables approach.

Tips: the user-space proxy is slow; please use --proxy-mode=iptables, which requires a newer iptables version, >= 1.4.11 (released 2011-May-26).

Publishing services: use Service.Type=NodePort (<node_ip>:<node_port>); use an External IP (IPs that route to one or more cluster nodes, e.g. a floating IP); use an external LoadBalancer (requires support from the IaaS: GCE, AWS, OpenStack); or deploy a service-loadbalancer (e.g. HAProxy). Official guide: https://github.com/kubernetes/contrib/tree/master/serviceloadbalancer

Ingress: the next-generation external Service load balancer. Deployed as a Pod on a dedicated node (with an external network). Implementations: Nginx, HAProxy, GCE L7. It provides external access and SSL support for services, e.g. http://foo.bar.com/foo is routed to service s1, with http://foo.bar.com resolving to the IP of the Ingress node.
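In the API version of that era, an Ingress for the foo.bar.com example looked roughly like this; the backend service port is an assumption:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foo-bar-ingress                 # hypothetical name
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1               # the service from the slide
          servicePort: 80               # assumed port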

TLS: your Ingress can terminate TLS, but it only supports a single TLS port, 443.

Write your own Ingress controller: poll until the apiserver reports a new Ingress, write the nginx config file based on a Go text/template, and reload nginx.

PetSet: stateless application management has Replicas; stateful application management has Pets.

Clustered applications need: a stable hostname, peer discovery, an ordinal index, startup/teardown ordering, and stable storage linked to the ordinal & hostname. Databases like MySQL or PostgreSQL need a single instance attached to a persistent volume at any time; clustered software like ZooKeeper, Etcd, Elasticsearch, or Cassandra needs stable membership.

PetSet example: $ kubectl patch petset cassandra -p '{"spec":{"replicas":10}}'. Pods cassandra-0, cassandra-1, ... each get a stable DNS name (cassandra-0.cassandra.default.svc.cluster.local, cassandra-1.cassandra.default.svc.cluster.local) and their own volume (volume 0, volume 1, ...).

ScheduledJobs: distributed cron, a cron-format job with a container (alpha). kubectl run pi --image=perl --restart=OnFailure --runat="0 14 21 7 *" -- perl -Mbignum=bpi -wle 'print bpi(2000)' runs on July 21st at 2pm. Fault tolerance: multiple controller processes, ensuring jobs are run at most once; unlike normal pods, there should never be more than one job pod running at the same time; deterministic name.

Network

One Pod, one IP: all containers in a Pod share the same network namespace (/proc/{pid}/ns/net -> net:[4026532483]). Network sharing is important for affiliated containers, and not all containers need an independent network: container A and container B join the infra (pause) container with --net=container:pause. The network implementation for a pod is therefore exactly the same as for a single container.

Network model: container to container, all containers can communicate with all other containers without NAT; node to container, all nodes can communicate with all containers (and vice versa) without NAT; IP addressing, a pod in the cluster can be addressed by its IP.

Network implementation: a cloud provider, or a CNI plugin, e.g. Calico, Flannel, etc. The kubelet CNI flags: --network-plugin=cni --network-plugin-dir=/etc/cni/net.d

Calico Step 1: Run calico-node image as DaemonSet

Calico Step 2: Download and enable calico cni plugin

Calico Step 3: Add calico network controller Done!

Tips host < calico(bgp) < calico(ipip) = flannel(vxlan) = docker(vxlan) < flannel(udp) < weave(udp) Test graph comes from: http://cmgs.me/life/docker-network-cloud

Persistent Volumes: there is only one kind of volume in the container world, -v host_path:container_path. Networked storage is mounted to host_path, and the container volume bind-mounts container_path onto host_path.

Advanced volume model: pods mount PersistentVolumeClaims at a mountpath; claims bind to PersistentVolumes, which point at a path on networked storage mounted on the host. Best practice: persistent volumes should be handled by professionals.

Why PV & PVC? The volume path on the host is always shared by containers from different users, and sharing needs fine-grained control: pod > PVC > PV. System admin: $ kubectl create -f nfs-pv.yaml creates a volume with an access mode, capacity, and recycling mode. Dev: $ kubectl create -f pv-claim.yaml requests a volume with an access mode, resources, and selector, then $ kubectl create -f pod.yaml uses it.
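A sketch of both halves; the NFS server address, capacity, and names are made up and do not reproduce the talk's actual nfs-pv.yaml:

# nfs-pv.yaml (system admin)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi                        # assumed capacity
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 10.0.0.5                    # assumed NFS server
    path: /exports
---
# pv-claim.yaml (dev)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

The pod then references the claim by name under spec.volumes[].persistentVolumeClaim.claimName.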

Officially supported PVs: GCEPersistentDisk, AWSElasticBlockStore, AzureFile, FC (Fibre Channel), NFS, iSCSI, RBD (Ceph Block Device), CephFS, Cinder (OpenStack block storage), Glusterfs, VsphereVolume, HostPath (single-node testing only; local storage is not supported in any way and WILL NOT WORK in a multi-node cluster).

SLO. Kubernetes SLO: API responsiveness, 99% of all API calls return in less than 1s; pod startup time, 99% of pods and their containers (with pre-pulled images) start within 5s.

Summary: Kubernetes is not just a container management, scheduling, and orchestration tool; that's Compose + Swarm. Kubernetes is the Spring Framework of the cloud area, and the Pod is like IoC in the Spring Framework: decoupling, refactor & rescue, best practice. A framework for people to build the right system with containers.

THE END Lei (Harry) Zhang @reouser https://hyper.sh Kubernetes Project Member CNCF Member Docker