The CERN OpenStack Cloud

The CERN OpenStack Cloud
Jan van Eldik, for the CERN Cloud Team

Agenda
- Objective: why a private Cloud at CERN? Bulk compute, server consolidation
- Overview of the OpenStack Cloud service: components & numbers
- Organization: humans in the loop
- Operations: upgrades, support, performance, ...
- Outlook

Why a private Cloud? The compute needs of the organization are growing and changing.
- Bulk compute: compute-intensive, low SLA, "the more, the better"
- Server consolidation: web servers, dev boxes, build nodes, ... Linux and Windows
- Cattle *and* pets

Virtualization helps a lot
- Stop handing out physical boxes: customers {do not care, do not want to care, should not care} about managing hardware
- Hardware is over-dimensioned for most use cases, and we only buy one type
- Good experience with the earlier Virtualization Kiosk: 3000 Linux and Windows VMs on Hyper-V, managed by Microsoft VMM

Enter the Cloud. The cloud promise is attractive for CERN:
- Provide an API to customers
- Scale out resources to CERN size
- Lots of potential beyond our earlier offerings: bursting, federating, block storage, ...
- A single pane of glass
In 2012 we decided to consolidate resource provisioning under a single, OpenStack-based service: pets *and* cattle, KVM *and* Hyper-V.

300000 VMs! (from the IT/CM Group Meeting, Jan 12, 2016)

CERN OpenStack timeline
- CERN OpenStack Project
- July 2013: CERN OpenStack Production
- February 2014: CERN OpenStack Havana
- October 2014: CERN OpenStack Icehouse
- March 2015: CERN OpenStack Juno
- September 2015: CERN OpenStack Kilo
- September 2016: CERN OpenStack Liberty
Upstream releases: Essex (5 April 2012), Folsom (27 September 2012), Grizzly (4 April 2013), Havana (17 October 2013), Icehouse (17 April 2014), Juno (16 October 2014), Kilo (30 April 2015), Liberty (15 October 2015), Mitaka.
Components deployed grew from Nova, Swift, Glance, Horizon and Keystone to also include Quantum/Neutron, Cinder, Ceilometer, Heat, Ironic, Trove, Rally, Manila, Magnum, Barbican and Mistral ((*) = pilot/trial at the time of introduction).

Nova Cells
- Top-level cell: runs the API service and the top-cell scheduler
- Child cells: run the compute nodes, scheduler and conductor
- 40+ cells
- Cells version 2 coming, default for all
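As an illustration of how a cells v1 layout is wired up (generic Nova options, not necessarily CERN's exact configuration; cell names are made up), a minimal sketch:

    # nova.conf on the top-level (API) cell controller -- illustrative values
    [cells]
    enable = true
    cell_type = api
    name = top

    # nova.conf on a child (compute) cell controller
    [cells]
    enable = true
    cell_type = compute
    name = child-cell-01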

Block storage
- Predominantly Ceph (Geneva x 2, Budapest), some NetApp
- Grown from ~200 TB total to ~450 TB of RBD + 50 TB of RGW
- Example: ~25 Puppet masters reading node configurations at up to 40 kHz
- Ceph is the backend for Glance images
- CephFS with Manila is under investigation for cluster filesystems
Figure: OpenStack Glance + Cinder IOPS
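For illustration, this is the standard way Glance and Cinder are pointed at Ceph RBD pools; a sketch with assumed pool and user names, not CERN's actual values:

    # glance-api.conf -- store images in a Ceph RBD pool (illustrative)
    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf

    # cinder.conf -- one RBD-backed volume backend (illustrative)
    [ceph-rbd]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_ceph_conf = /etc/ceph/ceph.conf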

Operations: Containers
Magnum: the OpenStack project to treat Container Orchestration Engines (COEs) as 1st-class resources
Production service since Q4 2016
- Support for Docker Swarm, Kubernetes, Mesos
- Storage drivers for (CERN-specific) EOS, CVMFS
Many users interested, usage ramping up
- GitLab CI, Jupyter/SWAN, FTS, ...
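As a hedged illustration of the user-facing workflow with the OpenStackClient Magnum plugin (template, image and network names are assumptions, not the labels used at CERN):

    # Create a cluster template, then a Kubernetes cluster from it (illustrative names)
    openstack coe cluster template create kubernetes-template \
        --coe kubernetes --image fedora-atomic --external-network ext-net \
        --master-flavor m2.small --flavor m2.medium
    openstack coe cluster create my-cluster \
        --cluster-template kubernetes-template --node-count 3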

Some selected numbers
- 7000 compute nodes, 190K cores, ~70% in compute-intensive VMs
- 40+ cells
- 22000 virtual machines
- 800 shared projects, 2478 user projects
- 100s of VMs created/deleted each hour
- 3000 volumes in Ceph

OpenStack@CERN Status
In production: >190K cores, >7000 hypervisors; ~100,000 additional cores being installed in the next 6 months
90% of CERN's compute resources are now delivered on top of OpenStack

Humans in the loop
Multiple CERN teams
- OpenStack team: service managers and service developers
- Procurement and hardware management teams
- Ceph, DBoD teams
Upstream
- OpenStack superusers: development, large deployments, board representative
- RDO: packaging, testing

Operations
- User support: tickets, documentation, etc.
- Deployment: RDO packages with some customizations/selected patches, Puppetized configuration
- Upgrades: component by component, twice a year
- Operating systems: CentOS 7.2 -> 7.3 ongoing, SLC6 ("CentOS 6") phase-out finished
- Hyper-V -> KVM migrations have started

VM migrations
5000 VMs migrated to newer hardware
- Live when possible, cold when necessary (CPU architecture, attached volumes)
- To retire older hosts
- Upgraded SLC6 hypervisors to CentOS 7, a prerequisite for the upgrade to Nova Liberty
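For illustration, the kind of commands involved (VM and hypervisor names are placeholders; exact flags vary between client releases):

    # Live-migrate a VM to a specific target hypervisor (placeholder names)
    nova live-migration vm-0042 target-hypervisor-01.cern.ch

    # Cold-migrate, then confirm once the VM is up on the new host
    nova migrate vm-0043
    nova resize-confirm vm-0043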

Automate provisioning
Automate routine procedures
- Common place for workflows
- Clean web interface
- Scheduled jobs, cron-style
- Traceability and auditing
- Fine-grained access control
Example workflow: disable compute node
- Disable nova-service
- Switch alarms OFF
- Update Service-Now ticket
- Send e-mail to VM owners
- Save intervention details
- Send calendar invitation
- Add remote AT job
- Post to new message broker
Procedures for
- OpenStack project creation
- OpenStack quota changes
- Notifications of VM owners
- Usage and health reports
- Other tasks

Optimizing CPU performance
70% of cores are used for compute-intensive, low-SLA workloads
- CPU passthrough
- NUMA-aware flavours
- CPU overhead ~3% for 24-core VMs (details in backup slides)
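A minimal sketch of how CPU passthrough is typically enabled for KVM hypervisors; this is the generic Nova option, not necessarily CERN's exact configuration:

    # nova.conf on the compute node -- expose the host CPU model to guests
    [libvirt]
    cpu_mode = host-passthrough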

Outlook
- Operate & grow the service: add new hardware, upgrade components, ...
- Get experience with Magnum
- Nova-network -> Neutron, on a per-cell basis
- Phase out Hyper-V and NetApp
- Investigate possible service additions: Ironic, Manila, Neutron LBaaS, ...

Summary
OpenStack at CERN has been in production for 3.5 years
- We're working closely with the various communities: OpenStack, Ceph, RDO, Puppet, ...
The cloud service continues to grow and mature
- While experimental, good experience with Nova cells for scaling
- Experience gained helps with general resource provisioning
- New features added (containers, identity federation)
- Expansion planned (bare-metal provisioning)
Confronting some major operational challenges
- Transparent retirement of service hosts
- Replacement of the network layer
http://openstack-in-production.blogspot.com (read about our recent 7M req/s Magnum & Kubernetes run!)

LHC: Data & Compute Growth
Figure: data volume (PB/year, per experiment: ALICE, ATLAS, CMS, LHCb) and compute need (10^9 HS06-sec/month, GRID vs. "what we can afford") across Run 1 to Run 4. Collisions produce ~1 PB/s.

OpenStack at CERN
In production: 4 clouds, >200K cores, >8,000 hypervisors
90% of CERN's compute resources are now delivered on top of OpenStack

CERN Cloud Architecture (1)
Deployment spans our two data centers
- 1 region (to have 1 API), ~40 cells
- Cells map use cases: hardware, hypervisor type, location, users, ...
Top cell on physical and virtual nodes in HA
- Clustered RabbitMQ with mirrored queues
- API servers are VMs in various child cells
Child cell controllers are OpenStack VMs
- One controller per cell
- Tradeoff between complexity and failure impact

CERN Cloud in Numbers (2)
- Every 10 s a VM gets created or deleted in our cloud!
- 2700 images/snapshots: Glance on Ceph
- 2300 volumes: Cinder on Ceph (& NetApp) in GVA & Wigner
- Only issue during 2 years in production: Ceph issue 6480

Dev environment
Simulate the full CERN cloud environment on your laptop, even offline
Docker containers in a Kubernetes cluster
- Clone of the central Puppet configuration
- Mock-up of our two Ceph clusters, our secret store, our network DB & DHCP
- Central DB & RabbitMQ instances
- One pod per service
Change & test in seconds; full upgrade testing

Rich Usage Spectrum
- Batch service: physics data analysis
- IT services: sometimes built on top of other virtualised services
- Experiment services: e.g. build machines
- Engineering services: e.g. micro-electronics/chip design
- Infrastructure services: e.g. hostel booking, car rental, ...
- Personal VMs: development
... a rich requirement spectrum!

Use cases (1)
Server consolidation: service nodes, dev boxes, personal VMs, ...
- Performance less important than durability
- Live migration is desirable
- Persistent block storage is required
- Linux VMs on KVM, Windows VMs on Hyper-V; starting to run Windows VMs under KVM
"Pets" use case: 32K cores, 7500 VMs

Use cases (2)
Compute workloads: optimize for compute efficiency
- CPU passthrough, NUMA-aware flavours
Still, very different workloads
- IT Batch: LSF and HTCondor, long-lived 8- and 16-core VMs, full-node flavors
- CMS Tier-0: medium-long, 8-core VMs
- LHCb Vcycle: short-lived, single-core VMs
Low-SLA, "cattle" use case: 150K cores, 12500 VMs on 6000 compute nodes

Wide Hardware Spectrum
The ~6700 hypervisors differ in
- Processor architectures: AMD vs. Intel (available features, NUMA, ...)
- Core-to-RAM ratio (1:2, 1:4, 1:1.5, ...)
- Core-to-disk ratio (going down with SSDs!)
- Disk layout (2 HDDs, 3 HDDs, 2 HDDs + 1 SSD, 2 SSDs, ...)
- Network (1 GbE vs. 10 GbE)
- Critical vs. physics power
- Physical location (Geneva vs. Budapest)
- Network domain (LCG vs. GPN vs. TN)
- Operating system (CERN CentOS 7, RHEL 7, SLC6, Windows)
The variety is reflected/accessible via instance types, cells and projects, but not necessarily visible to users!
- We try to keep things simple and hide some of the complexity
- We can react to (most of the) requests with special needs

Basic Building Blocks: Instance Types

Name        vCPUs  RAM [GB]  Ephemeral [GB]
m2.small    1      1.875     10
m2.medium   2      3.750     20
m2.large    4      7.5       40
m2.xlarge   8      15        80
m2.2xlarge  16     30        160
m2.3xlarge  32     60        320

~80 flavors in total (swap, additional/large ephemerals, core-to-RAM variants, ...)
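A sketch of how such a flavor could be defined with the OpenStack CLI (illustrative only; the CLI takes RAM in MiB, so 7.5 GB becomes 7680):

    # Create the m2.large flavor: 4 vCPUs, 7.5 GB RAM, 40 GB ephemeral disk
    openstack flavor create m2.large --vcpus 4 --ram 7680 --disk 40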

Basic Building Blocks: Volume Types

Name       IOPS  b/w [MB/s]  Feature           Location
standard   100   80          -                 GVA
io1        500   120         fast              GVA
cp1        100   80          critical          GVA
cpio1      500   120         critical/fast     GVA
cp2        n.a.  120         critical/windows  GVA
wig-cp1    100   80          critical          WIG
wig-cpio1  500   120         critical/fast     WIG

The m2.* flavor family plus volumes are the basic building blocks for services.
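For illustration, per-type IOPS/bandwidth limits of this kind are commonly expressed as Cinder QoS specs attached to a volume type; a hedged sketch (names, values and the front-end consumer choice are assumptions):

    # Define QoS limits and attach them to an 'io1' volume type (illustrative)
    openstack volume qos create io1-qos --consumer front-end \
        --property total_iops_sec=500 --property total_bytes_sec=125829120
    openstack volume type create io1
    openstack volume qos associate io1-qos io1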

Operations: Network Migration
We'll need to replace nova-network
- It's going to be deprecated (really really really this time)
- New potential features (tenant networks, LBaaS, floating IPs, ...)
We have a number of patches to adapt to the CERN network constraints
- We patched Nova for each release
- Neutron allows for out-of-tree plugins!

Operations: Federated Clouds
Federation partners (figure): CERN Private Cloud (160K cores), ATLAS Trigger (28K cores), CMS Trigger (13K cores), ALICE Trigger (9K cores), public clouds such as Rackspace or IBM, NecTAR Australia, Brookhaven National Labs, INFN Italy, and many others on their way
Access to EduGAIN users via Horizon
- Allow (limited) access given appropriate membership

IO Performance Issues
Symptoms
- High IOwait
- Erratic response times
- User expectations not met
Ephemeral storage is on HDDs (mirrored spinning drives on the host)
- 100-200 IOPS shared among guests

IO Performance: Scheduling
deadline is the default IO scheduler (on RH's virtual-guest profile)
- Prefers reads; under heavy load it can block writes for up to 5 seconds!
- Example: sssd DB update at login time
- CFQ is better suited
Figure: IO benchmark starts, switch from deadline to CFQ, IO benchmark continues
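A minimal sketch of switching the scheduler at runtime inside a guest (the device name is an assumption; making it permanent would go through a tuned profile or the elevator= boot parameter):

    # Check and change the IO scheduler for the guest's virtual disk (illustrative)
    cat /sys/block/vda/queue/scheduler       # e.g. noop [deadline] cfq
    echo cfq > /sys/block/vda/queue/scheduler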

IO: Cinder Volumes at Work
IOPS QoS changed from 100 to 500 IOPS for an ATLAS TDAQ monitoring application
Figure: Y-axis is the CPU % spent in IOwait; blue: CVI VM (h/w RAID-10 with cache), yellow: OpenStack VM
Going to 1000 IOPS doesn't help: if the application is not designed to keep enough requests in flight, more IOPS do not reduce the time spent in IOwait. We need to reduce the latency as well.

IO: Block-level Caching
A single SSD as a cache in front of the HDDs
- flashcache, bcache, dm-cache, ...
We chose bcache:
- Easy to set up
- Strong error handling
On a 4-VM hypervisor: from ~25-50 IOPS/VM to ~1k-5k IOPS/VM
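For illustration, a bcache device is typically assembled as below (device names are placeholders; udev normally registers the devices automatically after creation):

    # Create a bcache backing device on the HDD array and a cache device on the SSD
    make-bcache -B /dev/sdb            # backing device (HDDs)
    make-bcache -C /dev/sdc            # cache device (SSD)
    # attach the cache set to the backing device, then use /dev/bcache0
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
    mkfs.xfs /dev/bcache0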

IO: bcache and Latency
Using a VM on a bcache hypervisor
Figure: blue: CVI VM (h/w RAID-10 with cache), yellow: OpenStack VM, red: OpenStack VM on a bcache hypervisor
SSD block-level caching is sufficient for the IOPS and latency demands.

IO: ZFS ZIL/L2ARC
Meanwhile: SSD-only hypervisors
- Fast, but (too) small for some use cases
Re-use the cache idea, leveraging ZFS features
- Have partitions on the VM (on the SSDs!) which ZFS can use for the L2ARC cache and the ZFS Intent Log (ZIL)
- Use a fast volume as the backing store
We're testing this setup at the moment
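A hedged sketch of such a layout inside the guest (device names are placeholders: vdb is the attached volume used as backing store, vdc1/vdc2 are local SSD partitions):

    # Pool on the attached volume, with SSD partitions as intent log (ZIL) and L2ARC cache
    zpool create datapool /dev/vdb
    zpool add datapool log /dev/vdc1     # ZIL on local SSD
    zpool add datapool cache /dev/vdc2   # L2ARC on local SSD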

CPU Performance Issues
The benchmark score of full-node VMs was about 20% lower than that of the underlying host
- Smaller VMs did much better
Investigated various tuning options
- KSM*, EPT**, PAE, pinning, ... plus hardware-type dependencies
- Discrepancy down to ~10% between virtual and physical
Comparison with Hyper-V: no general issue
- Loss without tuning ~3% (full-node), <1% for small VMs
- NUMA awareness!
*KSM on/off: beware of memory reclaim! **EPT on/off: beware of expensive page table walks!

CPU: NUMA
NUMA awareness identified as the most efficient setting
- Full-node VMs have ~3% overhead in HS06
EPT-off side effect
- Small number of hosts, but very visible there
- Use 2 MB huge pages: keeps the EPT-off performance gain with EPT on
More details in this talk
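For illustration, NUMA awareness and huge pages are typically requested through Nova flavor extra specs; a sketch using the m2.3xlarge flavor from the earlier slide (the values shown are assumptions):

    # Expose a two-node guest NUMA topology and back the VM with 2 MB huge pages
    openstack flavor set m2.3xlarge \
        --property hw:numa_nodes=2 \
        --property hw:mem_page_size=2048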

Operations: NUMA/THP Roll-out
Rolled out on ~2000 batch hypervisors (~6000 VMs)
- Huge-page allocation as a boot parameter -> reboot (see the sketch below)
- VM NUMA awareness as flavor metadata -> delete/recreate
Cell by cell (~200 hosts):
- Queue reshuffle to minimize resource impact
- Draining & deletion of batch VMs
- Hypervisor reconfiguration (Puppet) & reboot
- Recreation of batch VMs
The whole update took about 8 weeks
- Organized between the batch and cloud teams
- No performance issue observed since
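A minimal sketch of the kernel boot parameters that pre-allocate 2 MB huge pages at boot (the page count is purely illustrative and depends on host RAM and what is reserved for the hypervisor itself):

    # /etc/default/grub on the hypervisor -- reserve huge pages at boot (illustrative count)
    GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=60000"
    # then regenerate the grub config and reboot
    grub2-mkconfig -o /boot/grub2/grub.cfg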

Future Plans Investigate Ironic (Bare metal provisioning) - OpenStack as one interface for compute resource provisioning - Allow for complete accounting - Use physical machines for containers Replace Hyper-V by qemu/kvm - Windows expertise is a scarce resource in our team - Reduce complexity in service setup 46