MetroCluster in Clustered Data ONTAP 8.3 Verification Tests Using Oracle Workloads

Technical Report

MetroCluster in Clustered Data ONTAP 8.3 Verification Tests Using Oracle Workloads
Business Workloads Group, PSE, NetApp
April 2015
TR-4396

Abstract
This document describes the results of functional testing of NetApp MetroCluster software on the NetApp clustered Data ONTAP 8.3 operating system in an Oracle Database 11g R2 environment. Proper operation is verified as well as expected behavior during each of the test cases. Specific equipment, software, and functional failover tests are included along with results.

TABLE OF CONTENTS

1 Introduction
1.1 Best Practices
1.2 Assumptions
2 Executive Summary
3 Product Overview
3.1 NetApp Storage Technology
3.2 Oracle Database and Oracle Real Application Clusters
4 Challenges for Disaster Recovery Planning
4.1 Logical Disasters
4.2 Physical Disasters
5 Value Proposition
6 High-Availability Options
6.1 ASM Mirroring
6.2 Two-Site Storage Mirroring
7 High-Level Topology
8 Test Case Overview and Methodology
9 Test Results
9.1 Loss of Single Oracle Node (TC-01)
9.2 Loss of Oracle Host HBA (TC-02)
9.3 Loss of Individual Disk (TC-03)
9.4 Loss of Disk Shelf (TC-04)
9.5 Loss of NetApp Storage Controller (TC-05)
9.6 Loss of Back-End Fibre Channel Switch (TC-06)
9.7 Loss of Interswitch Link (TC-07)
9.8 Maintenance Requiring Planned Switchover from Site A to Site B (TC-08)
9.9 Disaster Forcing Unplanned Manual Switchover from Site A to Site B (TC-09)
10 Conclusion
Appendix
Detailed Test Cases
Deployment Details
Network
Data Layout
Materials List

LIST OF TABLES

Table 1) Test case summary.
Table 2) Oracle host specifications.
Table 3) Oracle specifications.
Table 4) Kernel parameters.
Table 5) Oracle initialization file parameters.
Table 6) NetApp storage specifications.
Table 7) Server network specifications.
Table 8) Storage network specifications.
Table 9) FC back-end switches.
Table 10) Materials list for testing.

LIST OF FIGURES

Figure 1) MetroCluster overview.
Figure 2) MetroCluster mirroring.
Figure 3) Test environment.
Figure 4) Data layout.
Figure 5) Test phases.
Figure 6) Loss of Oracle node.
Figure 7) Loss of an Oracle server host HBA.
Figure 8) Loss of an individual disk.
Figure 9) Loss of disk shelf.
Figure 10) Loss of NetApp storage controller.
Figure 11) Loss of an FC switch.
Figure 12) Loss of ISL.
Figure 13) Loss of primary site for planned maintenance.
Figure 14) Loss of primary site.
Figure 15) Aggregate and volume layouts and sizes.
Figure 16) Volume and LUN layouts for site A.

1 Introduction

This document describes the results of a series of tests demonstrating that an Oracle Database 11g R2 Real Application Clusters (RAC) database configured in a NetApp MetroCluster solution in clustered Data ONTAP 8.3 operates without problems while under load in a variety of possible failure scenarios.

The tests simulate several different failure scenarios, and this technical report documents their effects on the Oracle Database 11g database environment. The tests were conducted while both the database servers and the NetApp storage controllers were subjected to a heavy transactional workload, meant to increase the amount of stress on the system as well as to better represent a real-world environment. In order to pass a test, the MetroCluster cluster had to remain online and accessible while the database continued to serve I/O in the environment without errors.

1.1 Best Practices

This document should not be interpreted as a best practice guide for deploying Oracle databases on NetApp MetroCluster software in clustered Data ONTAP 8.3. Customer requirements vary, and therefore configurations vary as well. The configuration described in this document reflects the most common two-site customer need encountered by NetApp, but many others exist as well.

In addition, NetApp MetroCluster technology is not required for Oracle RAC. Most Oracle RAC clusters on NetApp storage do not require synchronous remote replication and therefore are used with standard Data ONTAP clusters, not with MetroCluster clusters.

Although Oracle Database 11g R2 was used for these tests, the principles are equally applicable to Oracle Database 12c and later. The 11g R2 version was chosen as the most mature, stable, and commonly used version of Oracle RAC. Although the Fibre Channel (FC) protocol is used in this document, the same overall design and procedures can be used for NFS and iSCSI.

For more information about all other configuration details, including Oracle database and kernel parameters, see the appendix of this document. For general Oracle best practices, including those for Oracle RAC, see TR-3633: Best Practices for Oracle Databases on NetApp Storage.

1.2 Assumptions

Throughout this document, the examples assume two physical sites, site A and site B. Site A represents the main data center on campus. Site B is the campus disaster recovery (DR) location that provides protection during a complete data center outage. All components are named to show clearly where they are physically located. It is also assumed that the reader has a basic familiarity with both NetApp and Oracle products.

2 Executive Summary

MetroCluster in clustered Data ONTAP 8.3 provides native continuous availability for business-critical applications, including Oracle. The testing demonstrated that our Oracle Database 11g R2 RAC cluster operated as expected in the MetroCluster environment under a moderate to heavy transactional workload when subjected to a variety of failure scenarios that resulted in limited, moderate, and complete disruption to the systems in our primary production site. These tests show that NetApp MetroCluster technology and the Oracle RAC database together provide a winning combination for continuous application availability.

3 Product Overview

This section describes the NetApp and Oracle products used in the solution.

3.1 NetApp Storage Technology

This section describes the NetApp hardware and software used in the solution.

FAS8000 Series Storage Systems

NetApp FAS8000 series storage systems combine a unified scale-out architecture with leading data management capabilities. They are designed to adapt quickly to changing business needs while delivering core IT requirements for uptime, scalability, and cost efficiency. These systems offer the following advantages:

- Speed the completion of business operations. Leveraging a new high-performance, multicore architecture and self-managing flash acceleration, FAS8000 unified scale-out systems boost throughput and decrease latency to deliver consistent application performance across a broad range of SAN and NAS workloads.
- Streamline IT operations. Simplified management and proven integration with cloud providers let you deploy the FAS8000 in your data center and in a hybrid cloud with confidence. Nondisruptive operations simplify long-term scaling and improve uptime by facilitating hardware repair, tech refreshes, and other updates without planned downtime.
- Deliver superior total cost of ownership. Proven storage efficiency and a two-fold improvement in price/performance ratio over the previous generation reduce capacity utilization and improve long-term return on investment. NetApp FlexArray storage virtualization software lets you integrate existing arrays with the FAS8000, increasing consolidation and providing even greater value to your business.

Clustered Data ONTAP Operating System

NetApp clustered Data ONTAP 8.3 software delivers a unified storage platform that enables unrestricted, secure data movement across multiple cloud environments and paves the way for software-defined data centers, offering advanced performance, availability, and efficiency. Data ONTAP clustering capabilities help you keep your business running nonstop.

Clustered Data ONTAP is an industry-leading storage operating system. Its single feature-rich platform allows you to scale infrastructure without increasing IT staff. Clustered Data ONTAP provides the following benefits:

- Nondisruptive operations: Perform storage maintenance, hardware lifecycle operations, and software upgrades without interrupting your business. Eliminate planned and unplanned downtime.
- Proven efficiency: Reduce storage costs by using one of the most comprehensive storage efficiency offerings in the industry. Consolidate and share the same infrastructure for workloads or tenants with different performance, capacity, and security requirements.
- Seamless scalability: Scale capacity, performance, and operations without compromise, regardless of application. Scale SAN and NAS from terabytes to tens of petabytes without reconfiguring running applications.

MetroCluster Solution

A self-contained solution, NetApp MetroCluster high-availability (HA) and DR software lets you achieve continuous data availability for mission-critical applications at half the cost and complexity.

MetroCluster software combines array-based clustering with synchronous mirroring to deliver continuous availability and zero data loss. It provides transparent recovery from most failure scenarios so that critical applications continue running uninterrupted. It also eliminates repetitive change-management activities to reduce the risk of human error and administrative overhead.

New MetroCluster enhancements deliver the following improvements:

- Local node failover in addition to site switchover
- End-to-end continuous availability in a virtualized environment with VMware HA and fault tolerance

Whether you have a single data center, a campus, or a metropolis-wide environment, use the cost-effective NetApp MetroCluster solution to achieve continuous data availability for your critical business environment.

Figure 1 shows a high-level view of a MetroCluster environment that spans two data centers separated by a distance of up to 200km. MetroCluster software in clustered Data ONTAP 8.3 provides the following features:

- MetroCluster consists of an independent two-node cluster at each site, up to 200km apart. Each site serves data to local clients or hosts and acts as secondary to the other site.
- The client/host network spans both sites, just as with fabric and stretch MetroCluster.
- Interswitch links (ISLs) and redundant fabrics connect the two clusters and their storage. All storage is fabric attached and visible to all nodes.
- Local HA handles almost all planned and unplanned operations. Switchover and switchback transfer the entire cluster's workload between sites.

Figure 1) MetroCluster overview.

SyncMirror Mirroring

NetApp SyncMirror, an integral part of MetroCluster, combines the disk-mirroring protection of RAID 1 with industry-leading NetApp RAID technology. During an outage, whether from a disk problem, a cable break, or a host bus adapter (HBA) failure, SyncMirror can instantly access the mirrored data without operator intervention or disruption to client applications.

SyncMirror maintains strict physical separation between two copies of your mirrored data. Each copy is called a plex. As Figure 2 shows, each controller's data has its mirror at the other location.

Figure 2) MetroCluster mirroring.

With MetroCluster, all mirroring is performed at an aggregate level so that all volumes are automatically protected with one simple replication relationship. Other protection solutions operate at an individual volume level. This means that to protect all of the volumes (which could be hundreds), some type of replication relationship must be created after each source and destination volume is created.

3.2 Oracle Database and Oracle Real Application Clusters

Oracle Database 11g R2 Enterprise Edition provides industry-leading performance, scalability, security, and reliability on clustered or single servers with a wide range of options to meet the business needs of critical enterprise applications.

Oracle Database with Real Application Clusters (RAC) brings an innovative approach to the challenges of rapidly increasing amounts of data and demand for high performance. In the scale-out model of Oracle RAC, active-active clusters use multiple servers to deliver high performance, scalability, and availability, making Oracle Database 11g the ideal platform for private and public cloud deployments.

The use of Oracle RAC clusters running on extended host clusters provides the highest level of Oracle capability for availability, scalability, and low-cost computing. Oracle RAC also supports popular packaged products such as SAP, PeopleSoft, Siebel, and Oracle E-Business Suite, as well as custom applications.

4 Challenges for Disaster Recovery Planning

Disaster recovery (DR) is defined as the processes, policies, and procedures related to preparing for recovery or continuation of technical infrastructure critical to an organization after a natural disaster (such as a flood, tornado, volcanic eruption, earthquake, or landslide) or a human-induced disaster (such as a threat having an element of human intent, negligence, or error, or involving a failure of a human-made system).

DR planning is a subset of a larger process known as business continuity planning, and it should include planning for the resumption of applications, data, hardware, communications (such as networking), and other IT infrastructure. A business continuity plan (BCP) includes planning for non-IT-related aspects such as key personnel, facilities, crisis communication, and reputation protection, and it should refer to the disaster recovery plan (DRP) for IT-related infrastructure recovery or continuity.

Generically, a disaster can be classified as either logical or physical. Both categories are addressed with HA, recovery processing, and/or DR processes.

4.1 Logical Disasters

Logical disasters include, but are not limited to, data corruption by users or technical infrastructure. Technical infrastructure disasters can result from file system corruption, kernel panics, or even system viruses introduced by end users or system administrators.

4.2 Physical Disasters

Physical disasters include the failure of any storage component on site A or site B that exceeds the resiliency features of an HA pair of NetApp controllers not based on MetroCluster and that would normally result in downtime or data loss. In certain cases, mission-critical applications should not be stopped even in a disaster. By leveraging Oracle RAC extended-distance clusters and NetApp storage technology, it is possible to address those failure scenarios and provide a robust deployment for critical database environments and applications.

5 Value Proposition

Typically, mission-critical applications must be implemented with two requirements:

- RPO = 0 (recovery point objective equal to zero), meaning that data loss from any type of failure is unacceptable
- RTO ~= 0 (recovery time objective as close to zero as possible), meaning that the time to recover from a disaster scenario should be as close to 0 minutes as possible

The combination of Oracle RAC on extended-distance clusters with NetApp MetroCluster technology meets these requirements by addressing the following common failures:

- Any kind of Oracle Database instance crash
- Switch failure
- Multipathing failure
- Storage controller failure
- Storage or rack failure
- Network failure
- Local data center failure
- Complete site failure

6 High-Availability Options

Multiple options exist for spanning sites with an Oracle RAC cluster. The best option depends on the available network connectivity, the number of sites, and customer business needs. NetApp Professional Services can offer assistance with configuration planning and, when necessary, can offer Oracle consulting services as well.

6.1 ASM Mirroring

Automatic Storage Management (ASM) mirroring, also called ASM normal redundancy, is a frequent choice when only a very small number of databases must be replicated. In this configuration, the Oracle RAC nodes span sites and leverage ASM to replicate data.

Storage mirroring is not required, but scalability is limited because as the number of databases increases, the administrative burden to maintain many mirrored ASM disk groups becomes excessive. In these cases, customers generally prefer to mirror data at the storage layer. This approach can be configured with and without a tiebreaker to control the Oracle RAC cluster quorum.

6.2 Two-Site Storage Mirroring

The configuration chosen for these tests was two-site storage mirroring because it reflects the most common use of site-spanning Oracle RAC with MetroCluster. As described in detail in the following section, this option establishes one of the sites as the designated primary site and the other as the designated secondary site. This is done by first selecting one site to host the active storage and then configuring two Oracle Cluster Ready Services (CRS) and voting resources on it. The other site is a synchronous but passive replica. It does not directly serve data, and it contains only one CRS and voting resource.

7 High-Level Topology

Figure 3 shows the architecture of the configuration used for our validation testing. These tests used a two-node Oracle RAC database environment with a RAC node deployed at both site A and site B, with the following specifics:

- The sites were separated by a 20km distance, and fiber spools were used for both the MetroCluster and the RAC nodes.
- The RAC configuration used the FC protocol and ASM to provide access to the database.
- A WAN emulator was used to simulate a 20km distance between the RAC nodes for the private interconnect and to introduce approximately 10ms of latency into the configuration.

Figure 3) Test environment.

The Oracle binaries were installed locally on each server. The configuration included the following specific arrangements:

- The data and logs were mirrored to site B, with the storage controllers at site B acting strictly in a passive DR capacity.
- Using a single front-end fabric spanning both sites, the FC LUNs were presented to both Oracle RAC nodes by the storage controllers at site A.
- There were three OCR disks: two at site A and one at site B.
- There were three voting disks: two at site A and one at site B.

Figure 4 shows the distribution of the Oracle logs and data files across both controllers on site A. For more information about the hardware and software used in this test configuration, see the appendix of this document.

Figure 4) Data layout.

8 Test Case Overview and Methodology

All of the test cases listed in Table 1 were executed by injecting a specific fault into an otherwise nominally performing system under a predefined load driven to the Oracle RAC cluster. The load was generated by a utility called Simple Little Oracle Benchmark (SLOB), and it used a combination of 90% reads and 10% writes with a 100% random access pattern. This load delivered more than 70K IOPS evenly across the FAS8060 controllers at site A, resulting in storage CPU and disk utilization of 30% to 40% during the tests. The goal was not to measure the performance of the overall environment specifically but to subject the environment to a substantial load during testing.
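The 90% read, 10% write, fully random profile described above corresponds directly to SLOB's configuration file. The excerpt below is only an illustrative sketch of how such a profile is typically expressed in a SLOB 2.x slob.conf; the values shown are assumptions for illustration and are not taken from the tested environment.

    # slob.conf excerpt (illustrative sketch; values are assumptions, not the tested settings)
    UPDATE_PCT=10          # 10% of operations are updates, giving a 90/10 read/write mix
    RUN_TIME=3600          # run for 60 minutes, matching the per-test duration used here
    WORK_LOOP=0            # run for RUN_TIME seconds rather than a fixed number of loops
    SCALE=10000            # per-schema working set (SLOB blocks), sized to force physical I/O
    WORK_UNIT=64           # blocks touched per operation, producing a random access pattern
    THREADS_PER_SCHEMA=1   # SLOB sessions per schema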

To increase the load on the test environment, we made sure that the Oracle RAC node installed in site B participated in the load generation by driving IOPS to the FAS8060 storage controllers on site A across the network.

Table 1) Test case summary.

Test Case   Description
TC-01       Loss of a single Oracle node
TC-02       Loss of an Oracle host HBA
TC-03       Loss of an individual disk in an active data aggregate
TC-04       Loss of an entire disk shelf
TC-05       Loss of a NetApp storage controller
TC-06       Loss of a back-end FC switch on the MetroCluster cluster
TC-07       Loss of an ISL
TC-08       Sitewide maintenance requiring a planned switchover from site A to site B
TC-09       Sitewide disaster requiring an unplanned manual switchover from site A to site B

For more information about how we conducted each of these tests, see the appendix of this document. Each test was broken into the following three phases:

1. A baseline stage, indicative of normal operations. A typical duration for this stage was 15 minutes.
2. A fault stage, during which the specific fault under test was injected and allowed to persist for 15 minutes to provide sufficient time to verify correct database behavior.
3. A recovery stage, in which the fault was corrected and database behavior was verified. When applicable, this stage generally included 30 additional minutes of run time after the fault was corrected.

Figure 5 shows the process. Before each stage of a specific test, we used the Automatic Workload Repository (AWR) functionality of the Oracle database to create a snapshot of the current condition of the database. After the test was complete, we captured the data between the snapshots to understand the impact of the specific fault on database performance and behavior. Finally, we monitored the CPU, IOPS, and disk utilization on the storage controllers throughout the tests.

Figure 5) Test phases.
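As a concrete illustration of this methodology, the sketch below shows one way the AWR snapshots and the storage-side counters could be captured from a shell on one of the RAC nodes. It is a minimal example assuming SYSDBA access on the host and SSH access to the cluster management LIF; the names cluster-siteA and siteA-node1 are placeholders, and exact ONTAP statistics options can vary by release.

    # Take an AWR snapshot immediately before (and again after) injecting the fault.
    sqlplus -s / as sysdba <<'EOF'
    EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
    EXIT
    EOF

    # Sample cluster-wide performance counters (IOPS, latency, CPU) once per second.
    ssh admin@cluster-siteA "statistics show-periodic -interval 1 -iterations 60"

    # Per-node CPU and disk utilization through the nodeshell sysstat command.
    ssh admin@cluster-siteA "system node run -node siteA-node1 -command sysstat -x 1"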

9 Test Results

The following sections summarize the tests that were performed and report the results of each.

9.1 Loss of Single Oracle Node (TC-01)

This test case resulted in the loss of the Oracle RAC node on site B. As Figure 6 shows, this loss was accomplished by powering off the Oracle RAC database node while it was under load. For this test, we ran the workload for a total of 60 minutes and allowed the RAC node to be disabled for 15 minutes before restarting it to correct the fault.

Figure 6) Loss of Oracle node.

As expected, we observed no impact to the Oracle RAC functionality during this test. Also as expected, we observed a larger impact to the overall performance driven from the database because the loss of one of the database nodes reduced the amount of I/O driven to the FAS8060 controllers on site A. The database remained operational during the 15 minutes we allowed the test to continue in the failed state.

To correct the failure, we powered on the RAC node located at site B and observed that it was correctly added back into the RAC environment. We then started the workload again on both RAC nodes to verify that they were both operating correctly.

9.2 Loss of Oracle Host HBA (TC-02)

This test resulted in the loss of an HBA on one of the Oracle RAC nodes. As Figure 7 shows, this loss was accomplished by removing the cables from an HBA on the Oracle node at site A. For this test, we ran the workload for a total of 60 minutes and allowed the HBA to be disconnected for 15 minutes before reconnecting it to correct the fault.

Figure 7) Loss of an Oracle server host HBA.

As expected, during this test we observed no impact to the Oracle RAC functionality while the database servers continued to drive load to the FAS8060 controllers on site A. The database remained operational during the 15 minutes we allowed the test to continue in the failed state.

To correct the failure, we reconnected the HBA on the RAC node located at site A and verified that it started participating in the workload again after a brief time. We observed no database errors during this test.

9.3 Loss of Individual Disk (TC-03)

This test resulted in the loss of a disk on one of the storage controllers. As Figure 8 shows, this loss was accomplished by removing one of the disks in an active data aggregate at site A. For this test, we ran the workload for a total of 60 minutes and allowed the disk to be removed for 15 minutes before reinserting it to correct the fault.

Figure 8) Loss of an individual disk.

As expected, during this test we observed no impact to the Oracle RAC functionality and minimal impact to the overall performance driven from the database while both RAC nodes continued to drive load to the FAS8060 controllers on site A. The database remained operational during the 15 minutes we allowed the test to continue in the failed state.

Note: NetApp RAID DP technology can survive the failure of two disks per RAID group, and it automatically reconstructs the data on a spare.

9.4 Loss of Disk Shelf (TC-04)

This test resulted in the loss of an entire shelf of disks on one of the FAS8060 storage controllers. As Figure 9 shows, this loss was accomplished by powering off one of the disk shelves at site A. For this test, we ran the workload for a total of 60 minutes and allowed the disk shelf to be powered off for 15 minutes before reapplying power to correct the fault.

Figure 9) Loss of disk shelf.

As expected, during this test we observed no impact to the Oracle RAC functionality and minimal impact to the overall performance driven from the database while both RAC nodes continued to drive load to the FAS8060 controllers on site A. The database remained operational during the 15 minutes we allowed the test to continue in the failed state.

With the use of SyncMirror in MetroCluster, a shelf failure at either site is transparent. There are two plexes, one at each site. In normal operation, all reads are fulfilled from the local plex, and all writes are synchronously updated on both plexes. If one plex fails, reads continue seamlessly on the remaining plex, and writes are directed to the remaining plex. If the hardware can be powered on for recovery, resynchronization of the recovered plex is automatic. If the failed shelf must be replaced, the new disks are added to the mirrored plex, and resynchronization is again automatic.

9.5 Loss of NetApp Storage Controller (TC-05)

This test resulted in the unplanned loss of an entire storage controller. As Figure 10 shows, this was accomplished by powering off one of the FAS8060 storage controllers at site A. The surviving storage controller automatically took over the workload that was initially shared evenly across both storage controllers.

Note: The storage controller takeover and giveback process used for this test differs from the MetroCluster switchover and switchback process used in test cases TC-08 and TC-09.

For this test, we ran the workload for a total of 60 minutes and allowed the controller to be powered off for 15 minutes before reapplying power to correct the fault and performing a storage controller giveback to bring both FAS8060 controllers back online at site A.

Figure 10) Loss of NetApp storage controller.

As expected, during this test we observed no impact to the Oracle RAC functionality, but a larger impact to the overall performance driven from the database, while both RAC nodes continued to drive load to the surviving FAS8060 storage controller, albeit at a lower rate because of the nature of the failure.

After performing a storage controller giveback to rectify the failure, we allowed the test to continue for an additional 30 minutes and observed that overall performance returned to prefailure levels. We continued to observe no problems with the operation of the database.
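The takeover and giveback referred to above are standard ONTAP HA operations rather than a MetroCluster switchover. The following is a minimal sketch of how such a sequence could be driven and verified from the cluster shell; the names cluster-siteA and siteA-node1 are placeholders, and an actual procedure should follow the ONTAP HA documentation for the release in use.

    # Confirm HA status before the test; both nodes should report a healthy partner.
    ssh admin@cluster-siteA "storage failover show"

    # In this test the fault was injected by powering off a controller, after which the
    # partner takes over automatically. A planned takeover could instead be initiated with:
    ssh admin@cluster-siteA "storage failover takeover -ofnode siteA-node1"

    # After the failed controller is powered back on and is waiting for giveback, return
    # its storage and verify that both nodes are serving data again.
    ssh admin@cluster-siteA "storage failover giveback -ofnode siteA-node1"
    ssh admin@cluster-siteA "storage failover show"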

9.6 Loss of Back-End Fibre Channel Switch (TC-06)

This test resulted in the loss of one of the MetroCluster FC switches. As Figure 11 shows, this loss was accomplished by powering off one of the switches at site A. For this test, we ran the workload for a total of 60 minutes and allowed the switch to be powered off for 15 minutes before reapplying power to correct the fault.

Figure 11) Loss of an FC switch.

As expected, during this test we observed no impact to the Oracle RAC functionality and minimal impact to the overall performance driven from the database. In this case, continuous operation was maintained by automatically moving all of the I/O across the surviving path to the LUNs on the surviving switch.

After rectifying the failure by reapplying power to the switch, we allowed the test to continue for an additional 30 minutes and observed that overall performance was maintained at prefailure levels. We continued to observe no problems with the operation of the database.

9.7 Loss of Interswitch Link (TC-07)

This test resulted in the loss of one of the ISLs on the MetroCluster FC switches. As Figure 12 shows, this loss was accomplished by unplugging the ISL on one of the switches at site A. For this test, we ran the workload for a total of 60 minutes and allowed the ISL to be disconnected for 15 minutes before reconnecting it to correct the fault.

Figure 12) Loss of ISL.

As expected, during this test we observed no impact to the Oracle RAC functionality and minimal impact to the overall performance driven from the database. In this case, continuous operation was maintained by automatically moving all of the I/O across the surviving paths.

After rectifying the failure by reconnecting the ISL to the switch, we allowed the test to continue for an additional 30 minutes and observed that the overall performance was maintained at prefailure levels. We continued to observe no problems with the operation of the database.

9.8 Maintenance Requiring Planned Switchover from Site A to Site B (TC-08)

This test resulted in the planned switchover of the FAS8060 storage controllers on site A in order to conduct a maintenance operation. As Figure 13 shows, this was accomplished by executing a MetroCluster switchover that changed the LUNs serving the RAC database from site A to those that were mirrored at site B.

Figure 13) Loss of primary site for planned maintenance.

For this test, we ran the workload for a total of 60 minutes. After 15 minutes, we initiated the MetroCluster switchover command from the FAS8060 controllers on site B. After the switchover was successfully completed, we observed that the workload was picked up by the FAS8060 controllers at site B and that both Oracle RAC nodes continued to operate normally without interruption.

Note: The switchover was accomplished by using a single command to switch over the entire storage resource from site A to site B while preserving the configuration and identity of the LUNs. The result was that no action, rediscovery, remapping, or reconfiguration was required from the perspective of the Oracle RAC database.

We allowed the test to continue in the switched-over state for another 15 minutes and then initiated the MetroCluster switchback process to restore site A as the primary site for the Oracle RAC database. After successfully completing the MetroCluster switchback process, we observed the FAS8060 controllers in site A resuming the processing of the workload from both RAC nodes, and database operation continued without interruption. During this test, we observed no problems with the operation of the Oracle RAC database.

9.9 Disaster Forcing Unplanned Manual Switchover from Site A to Site B (TC-09)

This test resulted in the unexpected complete loss of site A because of an unspecified disaster. As Figure 14 shows, this loss was accomplished by powering off both of the FAS8060 storage controllers and the Oracle RAC node located at site A. Our expectation was that the second Oracle RAC node running on site B would lose access to the database LUNs hosted on the FAS8060 controllers on site A and shut down. After officially declaring the loss of site A, we manually initiated a MetroCluster switchover from site A to site B and restarted the Oracle RAC database instance on site B.

Figure 14) Loss of primary site.

For this test, we ran the workload for a total of 60 minutes. After 15 minutes, we powered off both of the FAS8060 controllers and the Oracle RAC node on site A. We continued in this state for a total of 15 minutes. As expected, the Oracle RAC database node that was running on site B lost access to the voting LUNs on site A and stopped working after exceeding the defined timeout period. As discussed previously, this interruption occurred because of the lack of a third-site tiebreaker service, which is the most common configuration chosen by customers. If completely seamless DR capability is desired, this can be accomplished through the use of a third site.

We then initiated the MetroCluster switchover command from the FAS8060 controllers on site B. After the switchover was completed, we restarted the Oracle RAC node on site B and observed that it started normally, without additional manual intervention, after the database LUNs were redirected to the copies that had been mirrored to the FAS8060 controllers on site B. To verify that the database was working, we initiated the workload from the surviving RAC node and observed that it successfully drove IOPS to the FAS8060 controllers on site B.

We allowed the test to continue for an additional 15 minutes and then initiated the MetroCluster switchback process to restore site A as the primary site for the Oracle RAC database. After successfully completing the MetroCluster switchback process, we restarted the Oracle RAC server on site A and verified that it was added back into the cluster. We then restarted the workload and verified that the Oracle RAC nodes on site A and site B were again driving load to the FAS8060 controllers on site A.
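For reference, the planned and unplanned scenarios above rely on the same small set of MetroCluster commands. The sequence below is a hedged sketch of that command flow as it typically looks in clustered Data ONTAP 8.3, run from the surviving cluster's shell; cluster-siteB is a placeholder, and the exact options and prompts should be confirmed against the MetroCluster documentation for the release in use.

    # Planned (negotiated) switchover, run from the site B cluster:
    ssh admin@cluster-siteB "metrocluster switchover"

    # Unplanned switchover after declaring a disaster at site A:
    ssh admin@cluster-siteB "metrocluster switchover -forced-on-disaster true"

    # Monitor the operation and the overall MetroCluster state:
    ssh admin@cluster-siteB "metrocluster operation show"
    ssh admin@cluster-siteB "metrocluster show"

    # Once site A is powered back up, heal the aggregates and then switch back:
    ssh admin@cluster-siteB "metrocluster heal -phase aggregates"
    ssh admin@cluster-siteB "metrocluster heal -phase root-aggregates"
    ssh admin@cluster-siteB "metrocluster switchback"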

10 Conclusion

NetApp MetroCluster software in clustered Data ONTAP 8.3 provides native continuous availability for business-critical applications, including Oracle. Our tests demonstrated that, even under heavy transactional workloads, Oracle databases continue to function normally during a wide variety of failure scenarios that could potentially cause downtime and data loss. In addition, clustered Data ONTAP provides the following benefits:

- Nondisruptive operations leading to zero data loss
- Set-it-once simplicity
- Zero change management
- Lower cost and complexity than competitive solutions
- Seamless integration with storage efficiency, SnapMirror, nondisruptive operations, and virtualized storage
- Unified support for both SAN and NAS

Together, these products create a winning combination for continuous data availability.

Appendix

This appendix provides detailed information about the test cases described in this document as well as about deployment, the network, the data layout, and the list of materials used.

Detailed Test Cases

TC-01: Loss of Single Oracle Node

Test case number: TC-01

Test case description: No single point of failure should exist in the solution. Therefore, the loss of one of the Oracle servers in the cluster was tested. This test was accomplished by halting a host in the cluster while running a test workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The loss of an Oracle RAC node causes no interruption of Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at a lower rate because of the loss of one of the RAC nodes.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.

2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Halt one of the Oracle RAC servers and allow the test to continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Bring the halted server back online and verify that it is placed back into the RAC environment.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault.

TC-02: Loss of Oracle Host HBA

Test case number: TC-02

Test case description: No single point of failure should exist in the solution. Therefore, the loss of an HBA on one of the Oracle servers in the cluster was tested. This test was accomplished by disconnecting the cable from an HBA on one of the hosts in the cluster while running a test workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- Removal of the HBA connection from the Oracle RAC node causes no interruption of Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at prefailure levels.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Remove the cable from the FC HBA on the Oracle RAC server on site B and allow the test to continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Reinstall the cable.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
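The HBA test above assumes that host multipathing masks the loss of a single path. As an illustration only, the commands below show how path state could be confirmed on a Linux RAC node before and during the fault; multipath comes from the standard device-mapper-multipath package and sanlun ships with the NetApp Host Utilities, and no host-specific device names from the tested environment are shown.

    # List multipath devices and the state of each path. All paths should be active before
    # the fault; one path group drops to a failed state while the HBA cable is removed.
    multipath -ll

    # With the NetApp Host Utilities installed, show LUN paths as seen from the host.
    sanlun lun show -p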

TC-03: Loss of Individual Disk

Test case number: TC-03

Test case description: No single point of failure should exist in the solution. Therefore, the loss of a single disk was tested. This test was accomplished by removing a disk drive from the shelf hosting the database data files on the FAS8060 running on site A while running an active workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The removal of the disk drive causes no interruption of Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at prefailure levels.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Remove one of the disks in an active aggregate and allow the test to continue for 15 minutes.
5. Reinstall the disk drive.
6. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
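When a disk is pulled as in TC-03, the affected aggregate can be watched from the cluster shell while RAID DP rebuilds onto a spare. The commands below are a sketch assuming SSH access to the cluster management LIF; cluster-siteA and the aggregate name aggr_data_siteA_01 are placeholders rather than names from the tested configuration.

    # Show any disks the system currently considers failed or broken.
    ssh admin@cluster-siteA "storage disk show -container-type broken"

    # Show the RAID layout of the affected aggregate, including reconstruction progress
    # onto the spare that replaced the pulled disk.
    ssh admin@cluster-siteA "storage aggregate show-status -aggregate aggr_data_siteA_01"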

TC-04: Loss of Disk Shelf

Test case number: TC-04

Test case description: No single point of failure should exist in the solution. Therefore, the loss of an entire shelf of disks was tested. This test was accomplished by turning off both power supplies on one of the disk shelves hosting the database data files on the FAS8060 running on site A while running an active workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The loss of a disk shelf causes no interruption of the Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at prefailure levels.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Turn off the power supplies on the designated disk shelf and let the test continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Turn on the power supplies on the affected disk shelf.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
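Because the shelf loss in TC-04 takes one SyncMirror plex offline, the mirror state of the affected aggregate is the main thing to watch during and after the fault. The sketch below again assumes SSH access to the cluster management LIF; the cluster and aggregate names are placeholders, and the plex-level command syntax should be checked against the documentation for the ONTAP release in use.

    # Show the plexes that make up the mirrored aggregate. During the shelf outage one plex
    # reports a failed state; after power is restored it resynchronizes automatically.
    ssh admin@cluster-siteA "storage aggregate plex show -aggregate aggr_data_siteA_01"

    # Confirm that the aggregate itself stays online throughout the test.
    ssh admin@cluster-siteA "storage aggregate show -aggregate aggr_data_siteA_01"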

TC-05: Loss of NetApp Storage Controller

Test case number: TC-05

Test case description: No single point of failure should exist in the solution. Therefore, the loss of one of the FAS8060 controllers serving the database on site A was tested while an active workload was running.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The loss of one controller of an HA pair has no impact on Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at a lower rate while the second storage controller is halted and the surviving storage controller is handling the entire workload.
- After the storage giveback process is completed, performance returns to prefailure levels because both storage controllers are again servicing the workload.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Without warning, halt one of the controllers of the FAS8060 HA pair on site A.
5. Initiate a storage takeover by the surviving node and let the test continue for 15 minutes.
6. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
7. Reboot the halted storage controller.
8. Initiate a storage giveback operation to bring the failed node back into the storage cluster.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.

TC-06: Loss of Back-End Fibre Channel Switch

Test case number: TC-06

Test case description: No single point of failure should exist in the solution. Therefore, the loss of an entire FC switch supporting the MetroCluster cluster was tested. This test was accomplished by simply removing the power cord from one of the Brocade 6510 switches in site A while running an active workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The loss of a single MetroCluster FC switch causes no interruption of the Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at prefailure levels.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Power off one of the MetroCluster Brocade 6510 switches in site A and allow the test to run for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Power on the Brocade 6510 switch.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.

TC-07: Loss of Interswitch Link

Test case number: TC-07

Test case description: No single point of failure should exist in the solution. Therefore, the loss of one of the ISLs was tested. This test was accomplished by removing the FC cable between two Brocade 6510 switches on site A and site B while running an active workload.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been properly installed and configured.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers

Expected results:
- The loss of one of the ISLs between site A and site B causes no interruption of the Oracle RAC operation.
- During the failure period, IOPS continue to the FAS8060 at prefailure levels.
- No database errors are detected.

Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Disconnect one of the MetroCluster ISLs.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Reconnect the affected ISL.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
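During TC-06 and TC-07, the state of the back-end fabric can also be confirmed directly on the Brocade 6510 switches. The commands below are a brief sketch using standard Brocade FOS CLI; the switch address is a placeholder, and output formats depend on the FOS version.

    # On a surviving fabric switch, confirm overall port status:
    ssh admin@brocade6510-siteA-1 "switchshow"

    # Confirm the state of the interswitch links; the disconnected ISL drops out of the
    # list while the cable is unplugged and returns after it is reconnected.
    ssh admin@brocade6510-siteA-1 "islshow"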

TC-08: Maintenance Requiring Planned Switchover from Site A to Site B Test Case

Test number: TC-08

Test case description:
If there is a required maintenance window for the FAS8060 storage controllers at site A, the MetroCluster switchover feature should be capable of moving the production workload to site B and presenting the Oracle RAC database LUNs from the FAS8060 storage controllers at site B, allowing the database to continue operations. To test this premise, we initiated a MetroCluster switchover from site A to site B and, after the maintenance was complete, a switchback to site A.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been installed and configured properly.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from the NetApp FAS8060 controllers at site A and site B

Expected results:
- Moving production operations from site A to site B by using the MetroCluster switchover operation causes no interruption of Oracle RAC operation.
- After the MetroCluster switchover, IOPS are directed to the FAS8060 storage controllers at site B from both RAC nodes.
- After the MetroCluster switchback, IOPS are again directed to the FAS8060 storage controllers at site A.
- No database errors are detected.

Test methodology:
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the switchover is initiated.
4. On site B, initiate a MetroCluster switchover of production operations and let the test continue to run in switchover mode for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the switchover period.
6. Heal the aggregates on site A.
7. Perform a MetroCluster switchback to return to normal operation.
8. Verify successful switchback.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the switchback is complete.
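The planned switchover and switchback in steps 4 through 8 correspond to a small set of MetroCluster commands run from the clustershell of the site B cluster. The following is a minimal sketch under the assumption that both clusters are otherwise healthy when the sequence is issued from site B; cluster names and prompts are omitted.

    # From the site B cluster: negotiated (planned) switchover
    metrocluster switchover
    metrocluster operation show          # wait for the operation to report success

    # After the site A maintenance is complete: heal, then switch back
    metrocluster heal -phase aggregates
    metrocluster heal -phase root-aggregates
    metrocluster switchback
    metrocluster operation show          # confirm that the switchback completed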

TC-09: Disaster Forcing Unplanned Manual Switchover from Site A to Site B Test Case

Test number: TC-09

Test case description:
If an unplanned disaster at site A takes out the FAS8060 storage controllers and the Oracle RAC node at site A, the MetroCluster switchover feature should be capable of moving the production workload to site B and presenting the Oracle RAC database LUNs from the FAS8060 storage controllers at site B, allowing the database to continue operations. To test this premise, we powered off the FAS8060 storage controllers and the Oracle RAC node located at site A to simulate a site failure. We then manually initiated a MetroCluster switchover from site A to site B and, after mitigating the disaster at site A, a switchback to site A.

Test assumptions:
- A completely operational NetApp MetroCluster cluster has been properly installed and configured.
- A completely operational Oracle RAC environment has been installed and configured.
- The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.

Test data or metrics to capture:
- AWR data as described in section 8, "Test Case Overview and Methodology"
- IOPS, CPU, and disk utilization data from the NetApp FAS8060 controllers at site A and site B

Expected results:
- As a result of the disaster, the FAS8060 at site A is lost, which ultimately causes the RAC node at site B to lose access to the database LUNs and stop running.
- Manually moving production operations from site A to site B through the MetroCluster switchover operation allows the Oracle RAC database to be restarted on the surviving database node.
- After the MetroCluster switchover and restart of the database, IOPS are directed to the FAS8060 storage controllers at site B from the surviving RAC node at site B.
- After the disaster is repaired and the MetroCluster switchback is completed, the repaired Oracle RAC node at site A is restarted and added back into the database. IOPS are again directed to the FAS8060 storage controllers at site A from both Oracle RAC nodes.

Test methodology:
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Power off the FAS8060 storage controllers and the Oracle RAC node at site A to simulate a site failure. Then, on site B, manually initiate a MetroCluster switchover of production operations and let the test continue to run in switchover mode for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Heal the aggregates on site A.
7. Perform a MetroCluster switchback to return to normal operation.
8. Verify successful switchback.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
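Because site A is already down when the switchover is declared, this scenario uses a forced switchover rather than the negotiated switchover of TC-08. A hedged sketch of the command sequence from the site B clustershell follows; the recovery steps for the site A hardware (power-on, bridge and switch recovery) are environment-specific and are not shown.

    # Site A controllers are down; declare the disaster and force the switchover
    metrocluster switchover -forced-on-disaster true
    metrocluster operation show

    # After site A is repaired and powered back on: heal, then switch back
    metrocluster heal -phase aggregates
    metrocluster heal -phase root-aggregates
    metrocluster switchback
    metrocluster operation show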

Deployment Details

This section lists the deployment details of the architecture in Table 2 through Table 6.

Table 2) Oracle host specifications.

Server: Two Fujitsu Primergy RX300 S7 servers
Operating system: Red Hat Enterprise Linux 6.5
Memory: 132GB
Network interfaces: eth0: 10000Mb/sec, MTU=9,000; eth1: 10000Mb/sec, MTU=9,000; eth2: 1000Mb/sec, MTU=1,500; eth3: 1000Mb/sec, MTU=1,500
HBA: QLogic QLE2562 PCI-Express dual-channel 8Gb FC HBA
Host attach kit and version: NetApp Linux Host Utilities version 6.2
Multipathing: Yes
SAN switches, models, and firmware: Brocade 6510, v7.0.2c
Local storage used: RHEL 6.5 only

Table 3) Oracle specifications.

Oracle: 11.2.0.4.0
ASM (SAN only): 11.2.0.4.0
Oracle CRS (SAN only): 11.2.0.4.0

For these tests, we set the Oracle RAC parameters misscount and disktimeout to 120 and 300 seconds, respectively. These parameters control how long the RAC nodes wait after losing access to storage and/or network heartbeats before taking themselves out of the cluster to prevent a potential split-brain situation. These values should be changed from their defaults only with a careful understanding of the storage, network, and cluster layout.
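The misscount and disktimeout values described above are Clusterware CSS settings and can be inspected and changed with crsctl. The following is a minimal sketch, assuming the commands are run with appropriate privileges from the Grid Infrastructure home on one of the RAC nodes; the values shown are the ones used in these tests.

    # Inspect the current CSS timeouts
    crsctl get css misscount
    crsctl get css disktimeout

    # Set the values used in these tests (change defaults only after reviewing
    # the storage, network, and cluster layout)
    crsctl set css misscount 120
    crsctl set css disktimeout 300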

Table 4) Kernel parameters (/etc/sysctl.conf file).

kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
net.ipv4.ip_local_port_range = 6815744
net.core.rmem_default = 4194304
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 393216
net.ipv4.tcp_rmem = 4096 524288 16777216
net.ipv4.tcp_wmem = 4096 524288 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 1
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 2500
sunrpc.tcp_slot_table_entries = 128
sunrpc.udp_slot_table_entries = 128
net.ipv4.tcp_mem = 16384 16384 16384
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 0
vm.swappiness = 0
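These settings were persisted in /etc/sysctl.conf on both RAC nodes. As a quick illustration (standard RHEL 6 behavior, not specific to this configuration), they can be loaded into the running kernel and spot-checked as follows.

    # Load the settings from /etc/sysctl.conf into the running kernel
    sysctl -p

    # Spot-check a few of the values after loading
    sysctl net.core.rmem_max
    sysctl vm.swappiness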

Table 5) Oracle initialization file (init.ora) parameters.

MCCDB2.db_cache_size = 3G
MCCDB1.db_cache_size = 3G
MCCDB1.java_pool_size = 67108864
MCCDB1.large_pool_size = 83886080
MCCDB2.oracle_base = '/u01/app/oracle' # oracle_base set from environment
MCCDB1.oracle_base = '/u01/app/oracle' # oracle_base set from environment
MCCDB2.pga_aggregate_target = 300M
MCCDB1.pga_aggregate_target = 419430400
MCCDB2.sga_target = 4G
MCCDB1.sga_target = 4294967296
MCCDB2.shared_io_pool_size = 0
MCCDB1.shared_io_pool_size = 0
MCCDB2.shared_pool_size = 300M
MCCDB1.shared_pool_size = 922746880
MCCDB2.streams_pool_size = 0
MCCDB1.streams_pool_size = 0
*.audit_file_dest = '/u01/app/oracle/admin/mccdb/adump'
*.audit_trail = 'db'
*.cluster_database = TRUE
*.compatible = '11.2.0.4.0'
*.control_files = '+FRA/MCCDB/control01.ctl','+FRA/MCCDB/control02.ctl'
*.db_block_size = 8192
*.db_domain = ''
*.db_name = 'MCCDB'
*.db_writer_processes = 20
*.diagnostic_dest = '/u01/app/oracle'
*.dispatchers = '(PROTOCOL=TCP) (SERVICE=MCCDBXDB)'
MCCDB1.instance_number = 1
MCCDB2.instance_number = 2
*.log_buffer = 102400000

Oracle init.ora parameters (continued):

*.open_cursors = 300
*.pga_aggregate_target = 400M
*.processes = 1500
*.remote_listener = 'rac-mcc:1521'
*.remote_login_passwordfile = 'exclusive'
*.sessions = 1655
*.sga_target = 4294967296
MCCDB2.thread = 2
MCCDB1.thread = 1
MCCDB2.undo_tablespace = 'UNDOTBS2'
MCCDB1.undo_tablespace = 'UNDOTBS1'

Table 6) NetApp storage specifications.

Model: Four FAS8060 storage systems (two two-node clusters)
Number of disks: 192
Size of disks: 838.36GB
Drive type: SAS
Shelf type: DS2246
Number of shelves: 8
Operating system: Data ONTAP 8.3RC1
Flash Cache: 1TB
Network interface card (NIC): Dual 10GbE controller IX1-SFP+
Target HBA: QLogic 8324 (2a, 2b)
Back-end switches (4): Brocade 6510 (Kernel: 2.6.14.2; Fabric OS: v7.0.2c; Made on: Fri Feb 22 21:29:23 2013; Flash: Mon Nov 4 18:39:15 2013; BootProm: 1.0.9)
Software: NFS, CIFS, FCP, FlexClone, OnCommand Balance
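A MetroCluster configuration built from the components in Table 6 can be verified from the ONTAP clustershell before running test cases such as those in this document. The following is a hedged sketch using standard MetroCluster verification commands; the output depends on the specific configuration.

    # Confirm the overall MetroCluster configuration and mode
    metrocluster show

    # Run the built-in configuration checks and review the results
    metrocluster check run
    metrocluster check show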

Network

Table 7, Table 8, and Table 9 list the network details.

Table 7) Server network specifications.

Hostname       Interface  IP Address       Speed   MTU    Purpose
stlrx300s7-85  eth0       172.20.160.100   10Gb/s  9,000  RAC interconnect
               eth0:1     169.254.76.209   10Gb/s  9,000  -
               eth2       10.61.164.204    1Gb/s   1,500  Public
               eth2:1     10.61.164.138    1Gb/s   1,500  Public VIP
               eth2:2     10.61.164.140    1Gb/s   1,500  Mgmt
               eth2:3     10.61.164.142    1Gb/s   1,500  Mgmt
stlrx300s7-87  eth0       172.20.160.102   10Gb/s  9,000  RAC interconnect
               eth0:1     169.254.180.210  10Gb/s  9,000  -
               eth2       10.61.164.206    1Gb/s   1,500  Public
               eth2:1     10.61.164.141    1Gb/s   1,500  Public VIP
               eth2:5     10.61.164.139    1Gb/s   1,500  Public VIP

Table 8) Storage network specifications.

SVM         LIF                  Node and Port      IP Address       Speed  MTU    Role
Cluster     stl-mcc-01-01_clus1  stl-mcc-01-01 e0a  169.254.228.130  10Gb   9,000  Cluster
Cluster     stl-mcc-01-01_clus2  stl-mcc-01-01 e0c  169.254.183.28   10Gb   9,000  Cluster
Cluster     stl-mcc-01-02_clus1  stl-mcc-01-02 e0a  169.254.32.214   10Gb   9,000  Cluster
Cluster     stl-mcc-01-02_clus2  stl-mcc-01-02 e0c  169.254.235.240  10Gb   9,000  Cluster
stl-mcc-01  cluster_mgmt         stl-mcc-01-01 e0i  10.61.164.172    1Gb    1,500  Cluster mgmt
stl-mcc-01  stl-mcc-01-01_icl1   stl-mcc-01-01 e0b  10.61.164.176    10Gb   1,500  Intercluster
stl-mcc-01  stl-mcc-01-01_mgmt1  stl-mcc-01-01 e0i  10.61.164.170    1Gb    1,500  Node mgmt
stl-mcc-01  stl-mcc-01-02_icl1   stl-mcc-01-02 e0b  10.61.164.177    10Gb   1,500  Intercluster
stl-mcc-01  stl-mcc-01-02_mgmt1  stl-mcc-01-02 e0i  10.61.164.171    1Gb    1,500  Node mgmt

Table 9) FC back-end switches.

Hostname      IP Address
FC_switch_A1  10.61.164.166
FC_switch_A2  10.61.164.167
FC_switch_B1  10.61.164.168
FC_switch_B2  10.61.164.169
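The LIFs in Table 8 can be listed from the ONTAP clustershell to confirm their current ports, addresses, and status. A minimal sketch follows; the field names shown are standard clustered Data ONTAP fields, and additional filters can be added as needed.

    # List LIFs with their current node, port, address, and operational status
    network interface show -fields address,curr-node,curr-port,status-oper

    # Confirm that the cluster ports carry the expected MTU
    network port show -fields mtu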

Data Layout

Figure 15 and Figure 16 show the layout of the data.

Figure 15) Aggregate and volume layouts and sizes.

Figure 16) Volume and LUN layouts for site A.

Materials List

Table 10 lists the materials used in the testing.

Table 10) Materials list for testing.

Quantity  Description
2         HA pairs of FAS8060 controllers (four nodes total)
4         Brocade 6510 switches for the back-end MetroCluster SAN
4         FC/SAS bridges
8         DS2246 disk shelves with 900GB SAS drives

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.

Copyright Information

Copyright © 1994-2015 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system) without prior written permission of the copyright owner.

Software derived from copyrighted NetApp material is subject to the following license and disclaimer:

THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp. The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark Information

NetApp, the NetApp logo, Go Further, Faster, ASUP, AutoSupport, Campaign Express, Cloud ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness, Flash Accel, Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare, FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster, MultiStore, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, SANtricity, SecureShare, Simplicity, Simulate ONTAP, Snap Creator, SnapCopy, SnapDrive, SnapIntegrator, SnapLock, SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapValidator, SnapVault, StorageGRID, Tech OnTap, Unbound Cloud, and WAFL are trademarks or registered trademarks of NetApp, Inc., in the United States and/or other countries. A current list of NetApp trademarks is available on the Web at http://www.netapp.com/us/legal/netapptmlist.aspx. Cisco and the Cisco logo are trademarks of Cisco in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.
TR-4396-0415