High Availability for Enterprise Clouds: Oracle Solaris Cluster and OpenStack
Eve Kleinknecht, Principal Product Manager
Thorsten Früauf, Principal Software Engineer
November 18, 2015
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
1. OpenStack on Oracle Solaris
2. Oracle Solaris Cluster for OpenStack
3. HA for OpenStack cloud controller on Oracle Solaris
   - two main topologies to achieve HA
     - fine-grained approach
     - blackbox approach
   - pros/cons of those topologies
4. Discussion - Q / A
OpenStack Overview
What is OpenStack?
- Open source cloud software
- Generic solution for IaaS, PaaS and SaaS
  - Oracle OpenStack optimized for Database as a Service, Java as a Service
- Combines compute, network and storage resources
- Self-service dashboard
- Services exposed through REST APIs
[Figure: single management pane over VMs and virtualized data center resources]
OpenStack Services
Overview of Core Components

Component   Description
---------   -----------
Nova        Compute virtualization
Cinder      Block storage
Neutron     Software defined networking
Keystone    Authentication between cloud services
Horizon     Web based dashboard
Glance      Image management and deployment
Swift       Object storage
Heat        Application and VM orchestration
Murano      Application catalog
Trove       Database as a Service
OpenStack Across Oracle's Portfolio
Built into the Infrastructure
- Horizon: Centralized Cloud Management
- Nova / Ironic: Self-Service Compute and Bare Metal
- Neutron: Software Defined Networking
- Cinder / Swift: Cloud Scale Storage
- Heat / Glance / Murano / Trove: Platform as a Service
- Zones and Kernel Zones
- Elastic Virtual Switch and Open vSwitch
- ZFS File System
- Unified Archives
Benefits of Running OpenStack on Oracle Solaris
OS. Virtualization. SDN. OpenStack. Complete.
- Engineered for security and compliance
  - Minimal privileges for cloud services
  - Lock down infrastructure with immutability
- Assured reliability and scale
  - Automatic service restart and node dependencies
  - Guaranteed data integrity
- Seamless upgrade, instant roll-back
Mission-Critical Cloud Requirements
If you need:
- Mission-critical service level
- Minimal downtime for maintenance
- Business continuity
Oracle Solaris Cluster delivers:
- Local, fast, automatic failover for applications and services
- Managed switchover of applications and resources among servers or sites
- Safe, reliable, orchestrated recovery from site failure
Oracle Solaris Cluster Functions
- Monitor health of all cluster components:
  - Servers, storage, network, OS, virtual machines, applications
- Deliver resiliency to failures through:
  - Hardware redundancy
  - Robust cluster protection algorithms
  - Policy-based recovery procedures for cluster infrastructure and applications
- Enable low-impact maintenance
Oracle Solaris Cluster Services
- Data services: failover, scalable
- Storage services: global file system, failover, scalable
- Network services: logical hostname, load balancing
- Dependencies management
- Monitoring services
Applications High Availability
- Built-in application agents
  - Fine-grained control of the application: specific start, stop and probing procedures
  - Do not require any change in the application
  - Fully tested in physical and virtualized environments
- Build-your-own agent toolkit for easy creation of custom agents
Oracle Solaris Cluster and Virtualization
- Choice of VM-centric or application-centric model
- Choice of technology: Oracle VM Server for SPARC domain or zone
- Built-in asset optimization with load balancing, affinity and dependency management at application or VM level
- Application Failover: fine-grained control of the application inside a zone or domain
- Workload Failover: the zone or domain is a blackbox
[Figure: application failover (app, web, db tiers) versus workload failover (whole VMs)]
Failover Zones: VM HA
- Managed zone switchover with cold, warm or live migration (kernel zone)
- Automatic zone restart or zone failover upon node failure
- No modification of workload
- Dependencies and load management at zone level
- Planned Maintenance: workload migration
- Unplanned Outage: immediate workload restart or failover
[Figure: VMs moving between cluster nodes]
Zone Clusters: Application HA with Virtualization
- Application-specific protection: policy-based management and fault isolation
- Ease of use: configuration and administration across the virtual cluster
- Security isolation: delegated administration and security model extended across the cluster
- Dependencies and load management at application level
[Figure: app, web and db tiers, each in its own zone cluster spanning Solaris 11 nodes]
HA approaches for the OpenStack cloud controller
A) Fine-grained control over OpenStack services by Solaris Cluster
   - best practices as found in other Oracle Optimized Solutions for multi-tiered applications, and the approach taken on Linux (OpenStack HA guide)
   - published white paper describes this approach with a specific example
   - prioritizes fast failure detection and recovery time of individual services
B) Blackbox approach using HA failover kernel zones
   - prioritizes simplicity of administration
   - Solaris Cluster manages the kernel zones to protect against global node failures
Example HA OpenStack node deployment
- Clustered cloud controller nodes with Oracle Solaris Cluster (OSC)
- Clustered Oracle ZFS Storage Appliance (ZFS SA)
  - shared storage for OSC
  - quorum device for OSC
  - Cinder driver for iSCSI targets provided to Nova compute
- Swift storage nodes (optional)
  - configure HA Swift ring
HA for OpenStack cloud controller - fine-grained approach (white paper)
- all OpenStack cloud controller components are under cluster control (start, stop, probe)
- IP addresses and shared file systems used by the services are under cluster control
- the cluster load balancer is used for scalable services
- inter-component dependencies are defined at the level of the specific service
- service start/stop is orchestrated across zones
- fast failure detection and failover times
HA for OpenStack cloud controller - HA SMF proxy (1)
The HA SMF proxy data service is a central component for HA OpenStack in the fine-grained topology:
- implements a dedicated cluster SMF restarter
- enables/disables SMF services on behalf of the cluster
- can specify resource dependencies on other cluster services running in different resource groups, within different zones or nodes, for orchestration
- comes in three flavors: failover, multi-master and scalable
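The orchestration idea behind resource dependencies can be sketched as a topological sort: a resource starts only after everything it depends on, and stops in the reverse order. This is a minimal illustration, not how Solaris Cluster is implemented; the resource names and the dependency edges below are hypothetical and are not taken from the white paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource names; each key maps to the resources it depends on.
DEPENDENCIES = {
    "mysql-rs": set(),
    "rabbitmq-rs": set(),
    "keystone-rs": {"mysql-rs"},
    "glance-api-rs": {"keystone-rs", "mysql-rs"},
    "nova-api-rs": {"keystone-rs", "mysql-rs", "rabbitmq-rs"},
    "horizon-rs": {"keystone-rs"},
}


def start_order(deps):
    """Return the resources so that every dependency precedes its dependents."""
    # TopologicalSorter raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stopping runs in reverse: dependents go down before what they rely on."""
    return list(reversed(start_order(deps)))
```

Calling `start_order(DEPENDENCIES)` yields one valid start sequence; several may exist when resources are independent of each other.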
HA for OpenStack cloud controller - HA SMF proxy (2)
OpenStack components are deeply integrated with SMF on Solaris:
- they are started as dedicated non-root UNIX users
- some have an additional or reduced set of privileges configured
- some make use of a variety of SMF method tokens to expand SMF properties as option variables for the method script
OpenStack components are implemented in Python:
- even the Python method scripts import SMF functions and thus must be started within an SMF context
- SMF also captures the sometimes verbose Python messages and stack traces in the dedicated SMF service log file
HA for OpenStack cloud controller - HA SMF proxy (3)
Generic approach to provide HA for OpenStack SMF services:
- failover services (stateful, active/passive)
  - configure HAStoragePlus/ScalMountPoint resource to store dynamic FS content
  - configure SUNW.LogicalHostname resource for service endpoint
  - configure SUNW.Proxy_SMF_failover resource for SMF service
- scalable services (stateless, active/active)
  - ensure static content is identical across nodes/zones
  - configure failover RG with SUNW.SharedAddress resource for service endpoint
  - configure scalable RG with SUNW.Proxy_SMF_scalable resource for SMF service
- OpenStack service configuration
  - specify corresponding IP address and storage managed by cluster
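The two recipes above can be written down as data to make the difference in resource layout explicit. This is a minimal sketch that only models the plan; it does not call any cluster command. The resource type names come from the slide (with SUNW.HAStoragePlus chosen for the storage resource), while the resource and resource group naming scheme is a hypothetical convention.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    mode: str  # "failover" or "scalable"
    resources: list = field(default_factory=list)  # (resource name, resource type)


def plan_failover(service):
    """Stateful active/passive: storage, logical hostname and SMF proxy in one RG."""
    return [ResourceGroup(f"{service}-rg", "failover", [
        (f"{service}-hasp-rs", "SUNW.HAStoragePlus"),
        (f"{service}-lh-rs", "SUNW.LogicalHostname"),
        (f"{service}-smf-rs", "SUNW.Proxy_SMF_failover"),
    ])]


def plan_scalable(service):
    """Stateless active/active: a failover RG holds the shared address,
    a separate scalable RG holds the SMF proxy resource."""
    return [
        ResourceGroup(f"{service}-sa-rg", "failover",
                      [(f"{service}-sa-rs", "SUNW.SharedAddress")]),
        ResourceGroup(f"{service}-rg", "scalable",
                      [(f"{service}-smf-rs", "SUNW.Proxy_SMF_scalable")]),
    ]
```

A stateful service such as a database would use `plan_failover`, while a stateless API endpoint would use `plan_scalable`; which OpenStack service falls into which class is described in the white paper.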
Fine-grained approach - pros and cons
Pro:
- fast failure detection per service
  - option to improve further by adding OpenStack service-specific probes
- fast takeover time in case of unplanned outages
- the cluster load balancer allows stateless services to be configured in a scalable way out of the box (RabbitMQ, OpenStack API, Horizon, etc.)
- matches the industry-wide approach to providing HA for OpenStack on Linux
Con:
- cluster setup is interleaved with the OpenStack installation, which makes it more involved
  - order of install and some pre-setup and post-setup tasks required for cluster
- small changes in administration
  - svcadm vs. clrs for OpenStack services
  - zone cluster
- strict change management required
  - OpenStack upgrade procedure
  - configuration files to be kept in sync across cluster nodes
- not easy to apply to already existing non-HA OpenStack deployments
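A service-specific probe, as mentioned in the pros, can be as simple as checking that the service endpoint accepts connections. This is a minimal sketch, assuming a probe reports health as 0 and failure as non-zero; the function name is hypothetical, and a production probe would more likely issue a real API request than a bare TCP connect.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Return 0 if a TCP connection to host:port succeeds within timeout, else 1."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return 1
```

The short timeout matters: a probe that hangs on a dead endpoint delays failure detection, which is the property this topology is meant to optimize.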
HA for OpenStack cloud controller - blackbox approach with failover zones
- the cluster manages (start, stop, probe) only the failover kernel zones
  - optional monitoring of the storage URIs (suri) used in the KZ config
- individual OpenStack services and IP addresses are not managed by the cluster
- inter-component dependencies can only be configured at kernel zone granularity
  - though there is an option with sczsmf
- ability to distribute kernel zones across global cluster nodes
Blackbox approach - pros and cons
Pro:
- separation of cluster and OpenStack installation and upgrade
- administration and upgrade of OpenStack services near identical to a non-HA setup
- from S11.3 onwards, live migration can be used for failover kernel zones to reduce planned downtime considerably
Con:
- longer takeover time after node failure (KZ boot in addition)
- individual OpenStack service failure can't trigger failover
  - relies purely on SMF to detect service failures
  - if sczsmf is used, it conflicts with live migration
- scalability of services requires an extra external HA load balancer (hardware or software)
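The takeover-time difference named in the cons can be written as simple arithmetic. This is a minimal sketch, assuming takeover after a node failure decomposes into failure detection, kernel zone boot (blackbox only) and service start; the function names are hypothetical and the model ignores any overlap between phases.

```python
def takeover_fine_grained(detect_s, service_start_s):
    """Fine-grained: services are started directly on the surviving node."""
    return detect_s + service_start_s


def takeover_blackbox(detect_s, kz_boot_s, service_start_s):
    """Blackbox: the kernel zone has to boot before SMF starts the services."""
    return detect_s + kz_boot_s + service_start_s
```

With purely hypothetical inputs of 10 s detection, 60 s kernel zone boot and 20 s service start, the blackbox takeover is longer by exactly the kernel zone boot time. These numbers are illustrative, not measurements.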
Flexibility through mix and match of topologies
- HA approaches are not either-or; they can be combined
  - start out with blackbox HA
  - separation into tiers allows each tier to be adapted as required
  - ability to use e.g. MySQL Cluster within a zone cluster without changing the overall architecture
  - both topologies have security isolation between tiers by design
- scalability can be addressed per component as needed by specific use cases
  - some need to scale Horizon as users bang on the BUI
  - some may not require the BUI and instead focus on the OpenStack CLI or Heat
  - option to use the cluster load balancer, but also to switch to a hardware load balancer
Discussion - Q / A
References
- Oracle OpenStack for Oracle Solaris
  http://www.oracle.com/technetwork/server-storage/solaris11/technologies/openstack-2135773.html
- Oracle Solaris Cluster
  http://www.oracle.com/technetwork/server-storage/solaris-cluster/overview/index.html
- Oracle Solaris Cluster technical resources
  http://www.oracle.com/technetwork/server-storage/solaris-cluster/documentation/cluster-how-to-1389544.html
- White Paper: Providing High Availability to the OpenStack Cloud Controller on Oracle Solaris with Oracle Solaris Cluster
  http://www.oracle.com/technetwork/server-storage/solaris-cluster/documentation/ha-for-openstack-cloud-2537455.pdf