Mission-Critical Databases in the Cloud. Oracle RAC in Microsoft Azure Enabled by FlashGrid Software.


White Paper rev. 2017-10-16

Abstract

Ensuring high availability of backend relational databases is a critical part of a cloud migration strategy. Many IT organizations have been using Oracle RAC as a trusted high-availability solution for running mission-critical databases and would like to continue using it in the cloud. Additionally, many organizations that have not been using Oracle RAC in their on-premises environments see it as the technology that can deliver the required level of database high availability in the cloud. However, Oracle RAC has the following infrastructure requirements that are not directly available in public clouds:

- Shared high-performance storage accessible from all nodes in the cluster
- Multicast-enabled network between all nodes in the cluster
- Separate networks for different types of traffic: client, cluster interconnect, and storage

FlashGrid Storage Fabric and FlashGrid Cloud Area Network technologies address these requirements and enable mission-critical Oracle RAC clusters in public clouds. This paper provides an architectural overview of the solution to help with planning and designing Oracle RAC deployments in the Microsoft Azure public cloud.

Introduction to FlashGrid Software

High-speed shared storage is critical for seamless handling of database infrastructure failures with zero downtime and zero data loss. FlashGrid Storage Fabric software enables high-speed shared storage in a variety of infrastructure environments, including bare-metal servers, virtual machines, and extended-distance clusters, without the use of proprietary storage arrays. FlashGrid Storage Fabric adds the shared access required by Oracle RAC to the standard storage capabilities of the Azure compute cloud. FlashGrid Cloud Area Network software enables migration of mission-critical applications to the Azure cloud by bridging the gap between the standard network capabilities of Azure virtual networks and the networking requirements of Oracle RAC.

Why Oracle RAC in Azure Cloud

Oracle RAC provides an advanced technology for database high availability. Many organizations use Oracle RAC for running their mission-critical applications, including most financial institutions and telecom operators, where high availability and data integrity are of paramount importance. Oracle RAC is an active-active distributed architecture with shared database storage. The shared storage plays a central role in enabling automatic failover, zero data loss, and 100% data consistency, and in preventing application downtime. These HA capabilities minimize outages both from unexpected failures and during planned maintenance. Oracle RAC technology is available for both large-scale and entry-level deployments. Oracle RAC Standard Edition 2 provides a very cost-efficient alternative to open-source databases while ensuring the same level of high availability that Enterprise Edition customers enjoy. FlashGrid software brings the superior economics, flexibility, and agility of Azure to a broad range of Oracle RAC customers. It enables existing enterprise Oracle RAC customers to realize the full benefits of migrating their entire IT infrastructure to Azure, and it lowers entry barriers for new customers starting with small-scale database deployments.

Supported Cluster Configurations

FlashGrid supports 2- and 3-node Oracle RAC configurations in Azure. Configurations with 4+ nodes are planned for the future.

Configurations with two RAC nodes

Configurations with two RAC nodes use 2-way data mirroring based on Normal Redundancy ASM disk groups. An additional VM is required to host quorum disks. Such a cluster can tolerate the loss of any one node without database downtime.

Figure 1. Two RAC nodes

Configurations with three RAC nodes

Configurations with three RAC nodes use 3-way data mirroring based on High Redundancy ASM disk groups; however, a Normal Redundancy ASM disk group is used for the clusterware files (the GRID disk group). Such a cluster can tolerate the loss of any one node without database downtime, but the loss of a second node will result in downtime. The main reasons for using three RAC nodes instead of two are the additional CPU and memory resources and the additional storage read bandwidth. It is possible to configure a 3-node RAC cluster that tolerates the loss of two RAC nodes without database downtime, but such configurations are beyond the scope of this document. To learn more, contact FlashGrid; our contact information is at the end of this document.

Figure 2. Three RAC nodes
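As a quick sanity check after a cluster is deployed, the nodes that actually run database instances can be listed from any instance with a generic RAC query such as the following (standard Oracle dictionary views, not FlashGrid-specific). The quorum VM will not appear in the output, because it hosts quorum disks only and no database instance.

-- List the active RAC database instances; the quorum node hosts none.
SELECT inst_id, instance_name, host_name, status
FROM gv$instance
ORDER BY inst_id;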

How It Works

Architecture Highlights

- FlashGrid Cloud Area Network enables high-speed overlay networks with multicast, virtual IP addresses, and bandwidth allocations
- FlashGrid Storage Fabric turns local drives (e.g. Premium SSDs) into shared drives accessible from all nodes in the cluster
- FlashGrid Read-Local Technology minimizes network overhead by serving reads from local drives
- 2-way or 3-way mirroring of data across separate nodes or Availability Zones
- Oracle ASM and Clusterware provide data protection and availability

Network

FlashGrid Cloud Area Network (CLAN) enables running high-speed clustered applications in public clouds or multi-datacenter environments with the efficiency and control of a Local Area Network. The network connecting Azure VMs is effectively a single IP network with a fixed amount of network bandwidth allocated per VM for all types of network traffic. The Oracle RAC architecture, however, requires separate networks for client connectivity and for the private cluster interconnect between the cluster nodes. There are two main reasons for this: first, the cluster interconnect must have low latency and sufficient bandwidth to ensure adequate performance of inter-node locking and Cache Fusion; second, the cluster interconnect carries raw data and, for security reasons, must be accessible by the database nodes only. Oracle RAC also requires a network with multicast capability, which is not available in Azure. FlashGrid CLAN addresses these limitations by creating a set of high-speed virtual LAN networks and ensuring QoS between them; a quick way to verify the resulting interconnect configuration from SQL is shown at the end of this section.

Figure 3. CLAN virtual subnets

Network capabilities enabled by FlashGrid CLAN for Oracle RAC in Azure:

- Each type of traffic has its own virtual LAN with a separate virtual NIC, e.g. fg-pub, fg-priv, fg-storage
- Negligible performance overhead compared to the raw network
- Minimum guaranteed bandwidth allocation for each traffic type while accommodating traffic bursts
- Low latency of the cluster interconnect in the presence of large volumes of traffic of other types
- Transparent virtual IP failover between nodes
- Multicast support
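Once the cluster is running, you can confirm that each instance registered the CLAN private interconnect rather than the public network. The query below is generic Oracle functionality, not FlashGrid-specific; fg-priv is the default interface name mentioned above.

-- Show the interface each instance uses for the cluster interconnect;
-- expect the fg-priv virtual NIC here, not the public interface.
SELECT inst_id, name, ip_address, is_public, source
FROM gv$cluster_interconnects
ORDER BY inst_id;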

Shared Storage

FlashGrid Storage Fabric turns local drives into shared drives accessible from all nodes in the cluster. The local drives shared with FlashGrid Storage Fabric can be block devices of any type, including Premium SSD disks or LVM volumes. The sharing is done at the block level with concurrent access from all nodes.

Figure 4. FlashGrid Storage Fabric with FlashGrid Read-Local Technology

Each database node has a full copy of user data stored on the Azure Premium SSD disks attached to that node. FlashGrid Read-Local Technology serves all read I/O from the locally attached disks, which increases both read and write I/O performance: read requests avoid the extra network hop, reducing latency and the amount of network traffic, and the network bandwidth freed up as a result becomes available for write I/O traffic. The FlashGrid software maintains persistent disk names and sets the required disk permissions; there is no need to configure ASMLib or UDEV rules.

ASM Disk Group Structure and Data Mirroring

FlashGrid software leverages proven Oracle ASM capabilities for disk group management, data mirroring, and high availability. In Normal Redundancy mode each block of data has two mirrored copies; in High Redundancy mode each block of data has three mirrored copies. Each ASM disk group is divided into failure groups, one failure group per node. Each disk is configured to be part of the failure group that corresponds to the node where the disk is located. ASM stores the mirrored copies of each block in different failure groups.

Figure 5. Example of a Normal Redundancy disk group in a 2-node RAC cluster
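For illustration, a disk group like the one in Figure 5 could be created with standard ASM SQL along the following lines. This is a sketch only: the disk paths are hypothetical examples, and in an actual FlashGrid deployment the provisioning tooling creates the disk groups automatically.

-- Sketch: Normal Redundancy disk group for a 2-node cluster, with one
-- failure group per database node and a quorum disk on the quorum VM.
-- Disk paths are hypothetical; adjust compatible.asm to your GI version.
CREATE DISKGROUP DATA NORMAL REDUNDANCY
  FAILGROUP rac1 DISK '/dev/flashgrid/rac1.lun1'
  FAILGROUP rac2 DISK '/dev/flashgrid/rac2.lun1'
  QUORUM FAILGROUP racq DISK '/dev/flashgrid/racq.lun1'
  ATTRIBUTE 'compatible.asm' = '12.2.0.1.0';

-- Verify the resulting layout: each disk with its failure group.
SELECT g.name AS disk_group, d.failgroup, d.name AS disk_name, d.path
FROM v$asm_disk d
JOIN v$asm_diskgroup g ON d.group_number = g.group_number
ORDER BY g.name, d.failgroup;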

A typical Oracle RAC setup in Azure has three Oracle ASM disk groups: GRID, DATA, and FRA. In a 2-node RAC cluster all disk groups must have Normal Redundancy. The GRID disk group, which contains the voting files, is required to have a quorum disk for storing a third copy of the voting files. The other disk groups also benefit from having quorum disks, as the quorum disks store a third copy of ASM metadata and improve failure handling. In a 3-node cluster all disk groups except the GRID disk group must have High Redundancy in order to enable full Read-Local capability; the GRID disk group would typically have Normal Redundancy. Note that such a 3-node RAC cluster tolerates the loss of no more than one node without downtime. If a 3-node RAC cluster must tolerate the simultaneous failure of two nodes without downtime, then the GRID disk group must have High Redundancy and two additional quorum nodes must be provisioned to accommodate five copies of the voting files. Details of such a configuration are not covered in this paper. To learn more, contact FlashGrid; our contact information is at the end of this document.

Figure 6. Example of a High Redundancy disk group in a 3-node RAC cluster

High Availability Considerations

FlashGrid Storage Fabric and FlashGrid Cloud Area Network have a fully distributed architecture with no single point of failure. The architecture leverages the HA capabilities built into Oracle Clusterware, ASM, and Database.

Fault Domains and Update Domains

To protect a cluster from two VMs going offline simultaneously, it is important to place all VMs of the cluster in an availability set and spread them across separate fault domains and update domains. Placing VMs in separate fault domains ensures that they have separate power sources and network switches; a failure of a power source or a network switch is therefore localized to a single fault domain and does not affect VMs in other fault domains. Placing VMs in separate update domains ensures that they are not rebooted simultaneously during a planned update of the underlying Azure infrastructure. Because all nodes are virtual machines, the failure of a physical host causes only a short outage for the affected node: the VM automatically restarts on another physical host. This significantly reduces the risk of double failures.

Note that not all regions provide three fault domains. It is still possible to deploy 2-node clusters in regions that provide only two fault domains by placing the quorum VM in a different region. Details of such a configuration are beyond the scope of this white paper.

Data Availability

A Premium SSD disk in Azure provides persistent storage that survives a failure of the node VM. After the failed VM restarts on a new physical node, all of its disks are attached with no data loss. Premium SSD disks have built-in redundancy that protects data from failures of the underlying physical media. The mirroring by ASM is done on top of the built-in protection of Premium SSD disks. Together, Premium SSD disks plus ASM mirroring provide durable storage with two layers of data protection, which exceeds the typical level of data protection in on-premises deployments.

Performance Considerations

Supported VM Types and Sizes

Database node VMs must have 2+ CPU cores, 8+ GB of memory, and Premium Storage support. The following VM types are recommended for database nodes when using Enterprise Edition:

- DS11_V2, DS12_V2, DS13_V2, DS14_V2, DS15_V2
- GS1, GS2, GS3, GS4, GS5

The DS1_V2 type (1 core, 3.5 GB memory) is recommended for the quorum node. Note that no Oracle Database software is installed on the quorum node.

Supported Disk Types

Currently only Premium SSD disks are supported. Each disk provides up to 7,500 IOPS and 250 MB/s, depending on its size; the maximum of 7,500 IOPS is available for 2048 GB disks. For databases that require high performance but smaller capacity, multiple 512 GB or 1024 GB disks may be optimal for maximizing the total IOPS and MB/s. Note that the maximum number of IOPS per VM is also capped and depends on the VM size: 64,000 for DS15_V2. For example, sixteen 1024 GB disks at roughly 5,000 IOPS each offer about 80,000 IOPS in aggregate, so on a DS15_V2 the 64,000 IOPS VM cap, rather than the disks, becomes the limiting factor.

Reference Performance Results

The main performance-related concern when moving database workloads to the cloud tends to be storage and network I/O performance. The CPU performance overhead between bare metal and the Azure cloud is very small to zero; therefore, this paper focuses on I/O performance.

Calibrate_IO

The CALIBRATE_IO procedure provides an easy way to measure storage performance, including maximum bandwidth, random IOPS, and latency. The procedure generates I/O through the database stack on actual database files. The test is read-only, and it is safe to run on any existing database. It is also a good tool for directly comparing the performance of two storage systems, because CALIBRATE_IO results do not depend on non-storage factors such as memory size or the number of CPU cores.

Test configuration:

- Two database nodes, DS15_V2
- Sixteen 1024 GB Premium SSD disks per node

Test script:

-- CALIBRATE_IO arguments: 32 = number of physical disks,
-- 10 = maximum tolerated latency in milliseconds.
SET SERVEROUTPUT ON;
DECLARE
  lat  INTEGER;
  iops INTEGER;
  mbps INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (32, 10, iops, mbps, lat);
  DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
  DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
  DBMS_OUTPUT.PUT_LINE ('max_mbps = ' || mbps);
END;
/

Our results:

max_iops = 121597
latency = 0
max_mbps = 1348

SLOB

SLOB is a popular tool for generating I/O-intensive Oracle workloads. SLOB generates database SELECTs and UPDATEs with minimal computational overhead. It complements CALIBRATE_IO by generating a mixed (read+write) I/O load. AWR reports generated during the SLOB test runs provide various performance metrics; for the purposes of this paper we focus on the I/O performance numbers.

Test configuration:

- Database node VM type: DS15_V2
- Sixteen 1024 GB Premium SSD disks per node
- SGA size: 2.6 GB (small size selected to minimize caching effects and maximize physical I/O)
- 8 KB database block size
- Schemas: 45 x 240 MB per node
- UPDATE_PCT = 20

The table below shows our results for tests performed in the same configuration (provided above) with a 2-node RAC cluster and, as a baseline, a single-instance database on a single VM.

                                      2-node RAC              Single-instance
                                      (both nodes combined)
Read+Write Database Requests (IOPS)   53,839                  52,990
Read Database Requests (IOPS)         43,020                  43,159
Write Database Requests (IOPS)        10,819                  9,831

Note that the results are very similar for the 2-node RAC and the single-instance database. While the 2-node RAC incurs extra network overhead for writes and global cache operations, the additional storage read bandwidth of the second node increases the overall performance and keeps it on par with the single-instance configuration. At 53K IOPS the performance is comparable to using a storage array with SSDs.
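If you reproduce such a test, the AWR figures are bracketed by snapshots taken around each run. Snapshots can be created manually with the standard workload repository package, as sketched below; this is generic Oracle functionality shown for illustration (it requires the Diagnostics Pack license) and is not part of the FlashGrid tooling.

-- Take an AWR snapshot before and after the workload run, then build
-- the report between the two snapshot IDs (e.g. with awrrpt.sql).
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
-- ... run the SLOB workload ...
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;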

Software Compatibility

The following software versions are supported by FlashGrid as part of the solution:

- Oracle Database: ver. 12.2.0.1, 12.1.0.2, or 11.2.0.4 with the latest PSU
- Oracle Grid Infrastructure: ver. 12.2.0.1 or 12.1.0.2 with the latest PSU
- Operating System: Oracle Linux 7, Red Hat Enterprise Linux 7

Deployment Process

The FlashGrid Cloud Provisioning tool automates the process of deploying a cluster. The tool provides a flexible web interface for defining the cluster configuration and generating an Azure Resource Manager template for it. The following tasks are performed automatically using the Resource Manager template:

- Creating the cloud infrastructure: VMs, storage, and optionally the network
- Installing and configuring FlashGrid Cloud Area Network
- Installing and configuring FlashGrid Storage Fabric
- Installing, configuring, and patching Oracle Grid Infrastructure
- Installing and patching Oracle Database software
- Creating ASM disk groups

The entire deployment process takes approximately 90 minutes. After the process is complete, the cluster is ready for creating a database.

Conclusion

Running Oracle RAC clusters in the Microsoft Azure cloud out of the box has historically been challenging due to storage and network constraints. FlashGrid Cloud Area Network and FlashGrid Storage Fabric remove those constraints and enable a range of highly available database cluster solutions, from small, cost-efficient, easy-to-deploy Oracle RAC Standard Edition 2 clusters to high-performance, mission-critical Oracle RAC Enterprise Edition clusters with high availability characteristics exceeding those of traditional on-premises deployments.

Contact Information

For more information please contact FlashGrid at info@flashgrid.io.

Copyright 2017 FlashGrid Inc. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. FlashGrid is a registered trademark of FlashGrid Inc. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Red Hat is a registered trademark of Red Hat Inc. Microsoft and Azure are registered trademarks of Microsoft Corporation. Other names may be trademarks of their respective owners.