Building a Kubernetes on Bare-Metal Cluster to Serve Wikipedia. Alexandros Kosiaris Giuseppe Lavagetto

Similar documents
KUBERNETES IN A GROWN ENVIRONMENT AND INTEGRATION INTO CONTINUOUS DELIVERY

Kubernetes - Networking. Konstantinos Tsakalozos

Kubernetes: Twelve KeyFeatures

Project Calico v3.2. Overview. Architecture and Key Components. Project Calico provides network security for containers and virtual machine workloads.

Continuous delivery while migrating to Kubernetes

More Containers, More Problems

NGINX: From North/South to East/West

Project Calico v3.1. Overview. Architecture and Key Components

Kubernetes introduction. Container orchestration

ENHANCE APPLICATION SCALABILITY AND AVAILABILITY WITH NGINX PLUS AND THE DIAMANTI BARE-METAL KUBERNETES PLATFORM

Implementing SaaS on Kubernetes

CONTAINERS AND MICROSERVICES WITH CONTRAIL

Launching StarlingX. The Journey to Drive Compute to the Edge Pilot Project Supported by the OpenStack

MSB to Support for Carrier Grade ONAP Microservice Architecture. Huabing Zhao, PTL of MSB Project, ZTE

OpenShift 3 Technical Architecture. Clayton Coleman, Dan McPherson Lead Engineers

Building an on premise Kubernetes cluster DANNY TURNER

Building Kubernetes cloud: real world deployment examples, challenges and approaches. Alena Prokharchyk, Rancher Labs

Federated Prometheus Monitoring at Scale

Table of Contents HOL CNA

Kubernetes on Openstack

Wolfram Richter Red Hat. OpenShift Container Netzwerk aus Sicht der Workload

Hacking and Hardening Kubernetes

Kuber-what?! Learn about Kubernetes

Package your Java Application using Docker and Kubernetes. Arun

Developing Kubernetes Services

Kubernetes and the CNI: Where we are and What s Next Casey Callendrello RedHat / CoreOS

WHITE PAPER. RedHat OpenShift Container Platform. Benefits: Abstract. 1.1 Introduction

How to build scalable, reliable and stable Kubernetes cluster atop OpenStack.

Kubernetes 1.8 and Beyond

Kubernetes and the CNI: Where we are and What s Next Casey Callendrello RedHat / CoreOS

Setting up Kubernetes with Day 2 in Mind. Angela Chin, Senior Software Engineer, Pivotal Urvashi Reddy, Senior Software Engineer, Pivotal

10 Kube Commandments

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Kubernetes. An open platform for container orchestration. Johannes M. Scheuermann. Karlsruhe,

Code: Slides:

An Introduction to Kubernetes

Enterprise Kubernetes

OPENSHIFT 3.7 and beyond

VMWARE PIVOTAL CONTAINER SERVICE

Contrail Networking: Evolve your cloud with Containers

Operating Within Normal Parameters: Monitoring Kubernetes

VMware Integrated OpenStack with Kubernetes Getting Started Guide. VMware Integrated OpenStack 4.1

Kubernetes made easy with Docker EE. Patrick van der Bleek Sr. Solutions Engineer NEMEA

Continuous Delivery of Micro Applications with Jenkins, Docker & Kubernetes at Apollo

Securing Microservice Interactions in Openstack and Kubernetes

OpenShift Roadmap Enterprise Kubernetes for Developers. Clayton Coleman, Architect, OpenShift

Two years of on Kubernetes

How to Re-Architect without Breaking Stuff (too much) Owen Garrett March 2018

What s New in K8s 1.3

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Top Nine Kubernetes Settings You Should Check Right Now to Maximize Security

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Accelerate at DevOps Speed With Openshift v3. Alessandro Vozza & Samuel Terburg Red Hat

Kubernetes 101. Doug Davis, STSM September, 2017

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Taming your heterogeneous cloud with Red Hat OpenShift Container Platform.

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

TEN LAYERS OF CONTAINER SECURITY

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Docker All The Things

Kubernetes. Introduction

Table of Contents HOL CNA

Project Kuryr. Here comes advanced services for containers networking. Antoni Segura

Managing your microservices with Kubernetes and Istio. Craig Box

What s New in K8s 1.3

Cloud I - Introduction

The Long Road from Capistrano to Kubernetes

Introduction to Kubernetes

Kuberiter White Paper. Kubernetes. Cloud Provider Comparison Chart. Lawrence Manickam Kuberiter Inc

Docker Enterprise Edition 2.0 Platform Public Beta Install and Exercises Guide

A Comparision of Service Mesh Options

OpenShift Dedicated 3 Release Notes

Multiple Networks and Isolation in Kubernetes. Haibin Michael Xie / Principal Architect Huawei

Getting Started with VMware Integrated OpenStack with Kubernetes. VMware Integrated OpenStack 5.1

Triangle Kubernetes Meet Up #3 (June 9, 2016) From Beginner to Expert

agenda PAE Docker Docker PAE

Cloud Native Networking

Overview of Container Management

StarlingX. StarlingX is aligned with the OpenStack Foundation Edge Working Group and the Linux Foundation Akraino Edge Stack.

Service Mesh and Microservices Networking

OpenStack Magnum Pike and the CERN cloud. Spyros

Secure Kubernetes Container Workloads

Evolution of Kubernetes in One Year From Technical View

Exam : Implementing Microsoft Azure Infrastructure Solutions

UP! TO DOCKER PAAS. Ming

The Art of Container Monitoring. Derek Chen

Virtual Infrastructure: VMs and Containers

Prometheus For Big & Little People Simon Lyall

Kubernetes Container Networking with NSX-T Data Center Deep Dive

Life of a Packet. KubeCon Europe Michael Rubin TL/TLM in GKE/Kubernetes github.com/matchstick. logo. Google Cloud Platform

Kubernetes 1.9 Features and Future

VMWARE ENTERPRISE PKS

DevOps Technologies. for Deployment

A REFERENCE ARCHITECTURE FOR DEPLOYING WSO2 MIDDLEWARE ON KUBERNETES

Taming Distributed Pets with Kubernetes

100% Containers Powered Carpooling

Microservices. Chaos Kontrolle mit Kubernetes. Robert Kubis - Developer Advocate,

Networking Approaches in. a Container World. Flavio Castelli Engineering Manager

Table of Contents. Section 1: Overview 3 NetScaler Summary 3 NetScaler CPX Overview 3

Kubernetes: What s New

Transcription:

Building a Kubernetes on Bare-Metal Cluster to Serve Wikipedia Alexandros Kosiaris Giuseppe Lavagetto

Introduction The Wikimedia Foundation is the organization running the infrastructure supporting Wikipedia and other projects Monthly stats: 17 Billion page views 43 Million edits 323 Thousands new registered users

Introduction (2) 2 Primary DCs (Ashburn, Dallas) 3 Caching DCs (San Francisco, Amsterdam, Singapore) ~1200 hardware machines ~100 VMs Size of engineering: ~160 people SRE team: ~ 20 people (4 dedicated to the application layer) We only write and use FLOSS

2014 Reasons for introducing kubernetes 2018

Also But Elasticity Single-node failure management Containers Power to deployers! More moving parts New paradigm( ) Containers

Why on bare metal? No public cloud User s privacy guarantee Already maintaining our own infrastructure and CDN Costs No private cloud We actually run Kubernetes on OpenStack, for a different project Reducing moving parts No practical advantages for production services

Cluster setup (1) We build our own Debian packages Configure all cluster components via Puppet TLS included Kubernetes 1.7 (on the way to 1.8 upgrade) Calico 2.2.0 Etcd 2.2.1 (scheduled upgrade to 3.x) But not the kubernetes resources! API servers are set up highly available (2 per cluster) Kube-scheduler and kube-controller-manager run on the same hosts as apiserver

Cluster setup (2) 2 production clusters (1 per primary DC), 1 staging Separated Etcd clusters (3 servers, DC-local, on Ganeti VMs) Kube-proxy in iptables mode Docker as the runtime, but interested in rkt We host our own registry (and it s the only allowed registry) No image pulling from Dockerhub! The backend of image storing is Openstack Swift docker pull docker-registry.wikimedia.org/wikimedia-stretch:latest

Cluster setup (3) RBAC enabled since day #1 (we delayed rolling out a bit for that) 1 namespace per application/service Token-auth-file authentication Standard admission controllers for our version enabled Firewalling enabled on all nodes via ferm. This works better than we feared!

Networking diagram

Networking Machines distributed per rack row for redundancy For backwards compatibility reasons we wanted to avoid overlay networking for pods And calico fitted very nicely our networking model Calico BGP speakers on every node and the 2 routers No BGP Full mesh (cause the next hop is the router) But will possibly have row specific full mesh RFC1918 10.x/8 address for the pods, but fully routable in our network. RFC1918 10.x/8 address for the Service IPs too, but: Those are effectively just reservations Are there to avoid surprises Pods have IPv6 address as well. Thank you Calico! net.ipv6.conf.all.forwarding=1 net.ipv6.conf.eth0.accept_ra=2

Network Policies Kubernetes 1.7 does not support egress (but 1.8 does) But Calico does Also does not allow changing a NetworkPolicy resource Alternative: We ve patched calico-k8s-policy controller 0.6.0 (the python one) Added reading a config file containing an enforced standard egress policy Populate the file using a ConfigMap Patch is minimal: 14 LoC in total But also already deprecated. Next version is in Go

Ingress What about Ingress? Evaluated it and decided to hold on it for now. We don t even need the niceties yet. Use in-house python daemon running on load balancers (PyBal) We do NodePort with externalips And PyBal manages LVS DR entries on the load balancers A lot of expertise in house regarding PyBal, reusing it sounded the best approach Open Source: https://github.com/wikimedia/pybal

Metrics collection Infrastructure (Prometheus) The discovery mechanisms rule Applications (Prometheus, yes that too!) Polling all API servers Polling all kubelets Polling kubelet cadvisors as well (hello kubernetes 1.7.3!!!) Applications historically used statsd prometheus_statsd_exporter in a sidecar container, applications talk to localhost :-) Prometheus discovers and polls all pods https://grafana.wikimedia.org/dashboard/db/kubernetes

Alerting Prometheus again! Albeit only partly Still on icinga 1.11.6 Started by using check_prometheus_metric.sh And it did not support floats Add in Puppet and the checks can be a bit intimidating to look at Rewrote the whole thing in python query => "scalar(\ sum(rate(kubelet_runtime_operations_latency_microseconds_sum{\ job=\"k8s-node\", instance=\"${::fqdn}\"}[5m]))/ \ sum(rate(kubelet_runtime_operations_latency_microseconds_count{\ job=\"k8s-node\", instance=\"${::fqdn}\"}[5m])))", Swagger spec based monitoring to instrument checks https://github.com/wikimedia/operations-software-service-checker

Streamlined Service Delivery We re-thought the whole software lifecycle for things that will run on Kubernetes, from development to deployment

Deployment Helm setup: Tiller resides on the namespace of the application and has specific RBAC rights A deploy user is only granted rights to talk to Tiller Unix users with rights to deploy to a namespace get access to the corresponding credentials A simple wrapper around helm ensures the correct credentials are used

Deployment Also: Deploy to both datacenters We (will) support canary deployments 2 helm releases, canary and production Only production declares a Service, which selects all pods from both releases New release of the software goes to canary, once all tests and metrics are green, it gets deployed to production Goal is to rewrite the wrapper as helm plugins Our own (very new!) helm repo: https://releases.wikimedia.org/charts/

THANK YOU

SRE team is hiring! https://jobs.wikimedia.org https://github.com/wikimedia/pybal https://grafana.wikimedia.org/dashboard/db/kubernetes https://github.com/wikimedia/operations-software-service-checker https://releases.wikimedia.org/charts/ https://docker-registry.wikimedia.org

Questions?