Managing OpenStack in a cloud-native way
Alberto García
- Red Hat Cloud Architect
- Over 5 years helping companies to adopt emerging technologies
- Network engineer in a previous life
Marcel Haerry
- Leading the architecture of Swisscom's ElasticStack and PaaS
- Member of CloudFoundry's Technical Advisory Board
- Automate all the things!
- Background in systems engineering and software development
Our motivation
Use Cases
- https://www.mycloud.ch
- https://developer.swisscom.com
Modern IT philosophy at Swisscom
- Rapid release cycles to iterate quickly on new features and bug fixes
- Strong and thorough CI/CD approach: highly automated and tested before promotion through stages
- High availability and scalability as you grow
- Promoting a DevOps culture across the teams
- Fault-tolerant and secure deployments and lifecycle
- Building platforms for the next-generation workload
Is it doable?
OpenStack control plane
- Components are decoupled: load balancer, messaging bus
- State is in the database
- Allows dynamic topologies: can be scaled in/out based on the control plane load caused by workload usage
- Control plane services can be virtualized
- Dedicated OpenStack projects for deployment automation
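As an illustration of scaling the control plane in or out based on load, here is a minimal Python sketch. The function name, the 60% utilisation target and the instance limits are assumptions made for this example; they are not values from the talk or from any OpenStack project.

```python
import math


def desired_instances(current: int, load_per_instance: float,
                      target_load: float = 0.6,
                      min_instances: int = 2, max_instances: int = 10) -> int:
    """Return how many instances of a stateless control plane service to run.

    load_per_instance is the average utilisation (0.0 to 1.0) of the current
    instances; target_load is the utilisation we would like each one to have.
    All defaults are hypothetical values chosen for illustration.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    total_load = current * load_per_instance
    wanted = math.ceil(total_load / target_load)
    # Never go below the HA minimum or above the capacity limit.
    return max(min_instances, min(max_instances, wanted))
```

For example, three API instances running at 90% would be scaled out, while four instances at 10% would be scaled in to the minimum that still keeps the service highly available.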
The Pacemaker HA approach
- All-in-one deployment doesn't scale as it is (RabbitMQ, Galera)
- Big VMs don't fit well in virtual environments
- Life cycle of bare metal is slow
- CI/CD is more complex -> How to iterate on individual components?
- Clustering software is stateful
- Binds the control plane to the infrastructure
HAProxy/Keepalived HA approach
- Based on Javier Peña's architecture: https://github.com/beekhof/osp-ha-deploy/blob/master/ha-keepalived.md
- Pacemaker-free architecture
- Distributed control plane fits well in this model
- Virtualization is feasible thanks to flexibility in the services layout design
- Does not bind the application to the infrastructure
Seems doable, let's design it
Distributed & virtualized control plane
- Pulling the pieces apart towards a distributed architecture
- Horizontally scalable services (wherever possible)
- Virtualized control plane
- Isolate shared state (Galera & RabbitMQ)
(Double) Highly Available Architecture
Component            HA model
Web Services         HAProxy
HAProxy              Keepalived
MySQL                Galera
Mongo                Replica set
RabbitMQ             RabbitMQ native clustering
Redis                Sentinel
Non-API components   Resiliency in the application
Two levels: Application Level, Infrastructure Level
Modeling the components
Control Plane
- Simple networking, one network for everything
- Grouping services per major component
- Including lightweight supporting services in the role
- Small-sized virtual machines
Compute
- Hyperconverged
- High-density hardware
- Network isolation of storage, control & data
- Network HA with bonding
- Part of a layer 3 spine-leaf design
- Local ephemeral storage
Lifecycle
- CI/CD framework
- Multiple stages to gain confidence in changes
- Clear separation between code and configuration
- Puppet & deployment orchestrator for Puppet
- Virtual machines & storage described in code
- Scale-out purely through API calls
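The idea of promoting a change through multiple stages can be sketched in a few lines of Python. The stage names and the `run_tests` callback are hypothetical; the talk does not name the stages or the tooling that runs the tests.

```python
from typing import Callable, List, Sequence

# Hypothetical stage names, ordered from least to most critical.
STAGES = ("dev", "test", "staging", "production")


def promote(change: str, run_tests: Callable[[str, str], bool],
            stages: Sequence[str] = STAGES) -> List[str]:
    """Deploy a change stage by stage and stop at the first failing stage.

    run_tests(change, stage) stands in for deploying the change to that
    stage and running its test suite. Returns the stages the change passed.
    """
    passed: List[str] = []
    for stage in stages:
        if not run_tests(change, stage):
            break  # never promote past a stage that failed
        passed.append(stage)
    return passed
```

A change that fails in staging never reaches production, which is how each stage adds confidence before the next one.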
Storage
- Hyperconverged compute nodes
- Cinder with ScaleIO scales with the number of disks, and therefore servers
- Object store completely external (Atmos)
- Glance using an external S3 backend
- Caching of images in the control plane
Distributed network services for SDN
Big picture
Our journey
Active-Active HA support in OpenStack components
- http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
Bootstrapping clusters
Monitor health
- Automate simple remediations
- NO MAGICAL RECOVERY
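One way to read "automate simple remediations, no magical recovery" is: retry one cheap fix a bounded number of times, then hand over to a human. The sketch below assumes that interpretation; the `Remediator` class, its callbacks and the restart limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Remediator:
    check: Callable[[str], bool]    # returns True when the service is healthy
    restart: Callable[[str], None]  # the one simple remediation we automate
    max_restarts: int = 2           # assumed limit before escalating
    attempts: Dict[str, int] = field(default_factory=dict)
    escalated: List[str] = field(default_factory=list)

    def tick(self, service: str) -> str:
        """Run one monitoring cycle for a service and report what was done."""
        if self.check(service):
            self.attempts[service] = 0
            return "healthy"
        tried = self.attempts.get(service, 0)
        if tried < self.max_restarts:
            self.attempts[service] = tried + 1
            self.restart(service)
            return "restarted"
        # Simple remediation did not help: stop and involve an operator.
        if service not in self.escalated:
            self.escalated.append(service)
        return "escalated"
```

The bounded retry is the point of the design: the automation handles the common, well-understood failure and deliberately refuses to attempt anything cleverer.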
Benefits & drawbacks
Cloud-like architecture
- Control services can be treated as stateless applications
- Operation of the OpenStack control plane is similar to cloud workloads
- Dynamic and agile control plane for OpenStack
- Cost-effective solution (thanks to virtualization)
- OpenStack control plane does not depend on the infrastructure
Cloud-like day 2 operations
- Measurable & scalable per component
- On-boarding new services -> deploy new roles
- Parallel deployment of the control plane for upgrades
- Back up only the stateful services, restage everything else
- Redeployment of nodes in case of failure / problems
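The rule "back up only the stateful services, restage everything else" amounts to a per-role decision. The sketch below assumes a hypothetical set of stateful roles and role names; the actual role layout used at Swisscom is not given in the slides.

```python
from typing import Dict, Iterable

# Assumption for illustration: the roles that hold state in this layout.
STATEFUL_ROLES = {"galera", "rabbitmq", "mongodb", "redis"}


def recovery_plan(roles: Iterable[str]) -> Dict[str, str]:
    """Map each role to how it is recovered after a failure."""
    plan = {}
    for role in roles:
        if role in STATEFUL_ROLES:
            plan[role] = "backup-and-restore"
        else:
            # Stateless roles are rebuilt from code and configuration.
            plan[role] = "redeploy"
    return plan
```

Keeping the stateful set small is what makes redeployment a practical answer to most node failures.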
Drawbacks
- Not fully A/A ready: Cinder-volume & Galera
- RabbitMQ/MariaDB don't scale horizontally
- No magical recovery
- Network partitions & Keepalived
- Horizon needs sticky sessions -> RRDNS does not work
Future work
OpenStack components
- Build services A/A from the beginning
- Built-in health endpoints in services (e.g. to query from HAProxy or monitoring)
Deployment
- Packaging deployment as containers (Kolla?!)
Architecture
- Decoupling storage from compute?
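To make the "built-in health endpoints" idea concrete, here is a minimal WSGI sketch in Python. The `/health` path, the `make_health_app` helper and the shape of the JSON body are assumptions for illustration; no OpenStack service is claimed to expose exactly this interface.

```python
import json
from typing import Callable, Dict


def make_health_app(checks: Dict[str, Callable[[], bool]]):
    """Build a WSGI app that reports the result of each named check.

    Returns 200 when every check passes and 503 otherwise, so a load
    balancer or monitoring system only needs to look at the status code.
    """
    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/health":  # hypothetical path
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        results = {name: bool(check()) for name, check in checks.items()}
        status = "200 OK" if all(results.values()) else "503 Service Unavailable"
        body = json.dumps(results, sort_keys=True).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app
```

Such an app could be served with the standard library's `wsgiref.simple_server.make_server` for experiments, and an HTTP health check in HAProxy could then poll the path and take a backend out of rotation on a 503.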
THANK YOU