Session Title: Designing a PowerHA SystemMirror for AIX Disaster Recovery Solution


IBM Power Systems Technical University, October 18-22, 2010, Las Vegas, NV
Session Title: Designing a PowerHA SystemMirror for AIX Disaster Recovery Solution
Session ID: HA18 (AIX)
Speaker Name: Michael Herrera
2010 IBM Corporation

Best Practices for Designing a PowerHA Enterprise Edition Solution on AIX
Michael Herrera (mherrera@us.ibm.com), Advanced Technical Skills (ATS), Certified IT Specialist + Workload-Optimizing Systems

Agenda
- Available Offerings
- Campus Disaster Recovery vs. Extended Distance
- What you get with Enterprise Edition
- Expected Fallover Behaviors
- Summary

Tiers of Disaster Recovery: PowerHA SM Enterprise Edition - HA & DR solutions from IBM for your mission-critical AIX applications. PowerHA Enterprise Edition fits in at Tier 7.
- Tier 7: Highly automated, business-wide, integrated solution (examples: GDPS/PPRC/VTS P2P, AIX PowerHA Enterprise Edition, OS/400 HABP)
- Tier 6: Storage mirroring (examples: XRC, Metro & Global Mirror, VTS Peer-to-Peer)
- Tier 5: Software two-site, two-phase commit (transaction integrity)
- Tier 4: Batch/online database shadowing & journaling, point-in-time disk copy (FlashCopy), TSM-DRM
- Tier 3: Electronic vaulting, TSM**, tape
- Tier 2: PTAM*, hot site, TSM**
- Tier 1: PTAM*
Data recreation ranges from zero or near zero at the top tiers, through minutes to hours, up to 24 hours, and 24 to 48 hours at the bottom; recovery times range from roughly 15 minutes at Tier 7 to days at Tier 1. Applications with low tolerance to outage sit at the top, applications very tolerant to outage at the bottom. Tiers are based on SHARE definitions.
*PTAM = Pickup Truck Access Method with tape; **TSM = Tivoli Storage Manager; GDPS = Geographically Dispersed Parallel Sysplex

HACMP is now PowerHA SystemMirror for AIX! A 20-year track record in high availability for AIX.
Current release: 7.1.0.X, available on AIX 6.1 TL06 & 7.1
Packaging changes: Standard Edition (local availability); Enterprise Edition (local & disaster recovery; version 7.1 will not be released until 2011)
Licensing changes: Small, Medium, Large server class
Product lifecycle:
  Version | Release Date | End of Support Date*
  HACMP 5.4.1 | Nov 6, 2007 | Sept 2011
  PowerHA 5.5.0 | Nov 14, 2008 | N/A
  PowerHA SystemMirror 6.1.0 | Oct 20, 2009 | N/A
  PowerHA SystemMirror 7.1.0 | Sept 10, 2010 | N/A
* These dates are subject to change per Announcement Flash

PowerHA SystemMirror Version 6.1 Editions for AIX (the 7.1 Enterprise Edition is N/A until 2011)
Standard Edition high-level features:
- Centralized management: C-SPOC and SMIT management interfaces
- AIX event/error management
- Integrated heartbeat and integrated disk heartbeat
- PowerHA DLPAR HA management
- Smart Assists
- Cluster resource management and shared storage management
- Cluster verification framework
Enterprise Edition adds:
- Multi-site HA management
- PowerHA GLVM async mode and the GLVM deployment wizard
- IBM Metro Mirror support and IBM Global Mirror support*
- EMC SRDF sync/async
- Hitachi TrueCopy & Global Replicator*
* Hitachi & Global Mirror functionality is only available in 6.1.0.3
Highlights: new editions to optimize software value capture; the Standard Edition is targeted at datacenter HA, the Enterprise Edition at multi-site HA/DR; tiered pricing structure (Small/Med/Large)

High Availability & DR: Drawing the Line. Different perspectives on the protection and replication of data.
Campus-style DR:
- Cross-Site LVM Mirroring: AIX LVM mirrors
- SVC Split I/O VDisk Mirroring: SVC VDisk functionality
- Metro Mirror or SRDF*: disk-based replication (* the Enterprise Edition is required to manage disk-level replication)
Extended distance offerings:
- Metro Mirror & Global Mirror: SVC, DS6K, DS8K, ESS800
- EMC SRDF: DMX3, DMX4, VMAX
- Hitachi TrueCopy & Global Replicator: USP V, USP VM
- GLVM (sync / async): IP-based replication

Local High Availability vs. Disaster Recovery: how far can I stretch a local cluster? How far can my storage be shared? The options are LVM mirroring, disk replication, and VDisk mirroring between storage enclosures (distance limitation ~10 km, or 6 miles).
- Network connectivity: subnetting & potential latency
- Storage infrastructure: can you merge fabrics and present LUNs from either location across the campus?
- Desired resiliency:
  - LVM mirroring across storage subsystems: both copies are accessible
  - Storage-level replication: only the active copy is available
  - VDisk mirroring (SAN Volume Controller): a single logical copy mirrored on the back end

Campus-Style DR: Cross-Site LVM Mirroring leverages AIX logical volume mirroring (synchronous).
Distance limitations:
- Direct SAN links: up to 15 km
- DWDM, CWDM, or other SAN extenders: roughly 120-300 km, with the distance limited by the effect of latency on performance
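The latency effect can be sized from propagation delay alone; a quick sketch (the 5 µs/km figure is the standard fiber rule of thumb, not a PowerHA measurement):

```shell
# Back-of-envelope: light in fiber travels ~200,000 km/s, i.e. ~5 us per km
# one way. A synchronous mirrored write waits for at least one round trip,
# so the added latency per write at a given site separation is:
for km in 15 120 300; do
  awk -v km="$km" \
    'BEGIN { printf "%d km -> %.2f ms added RTT per write\n", km, 2 * km * 5 / 1000 }'
done
# e.g. 120 km -> 1.20 ms per write, which is why extender distances are
# bounded by performance rather than by signal reach alone.
```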

Campus DR: Cross-Site LVM vs. Storage Replication
Considerations:
- Standard Edition vs. Enterprise Edition
- Disk replication: a common replication mechanism across platforms
- Performance differences: host-based LVM mirroring vs. disk replication; see the white paper "Cross Site Mirroring Performance Implications": http://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp101269
Choices: Cross-Site LVM Mirroring; VDisk Mirroring (Split I/O Group); Metro Mirroring

PowerHA & Logical Volume Mirroring: what do the volume groups look like? In a volume group such as datavg, each logical volume has LV copy 1 (primary) on hdisks from the local storage subsystem and LV copy 2 (secondary) on hdisks from the remote storage subsystem.
New in AIX 6.1: Mirror Pools
- Intended for asynchronous GLVM
- Address issues with extending logical volumes and spanning copies
The new DR Redbook, Exploiting PowerHA SystemMirror Enterprise Edition, includes a scenario for Cross-Site LVM with Mirror Pools.

AIX 6.1 & Mirror Pools (SMIT panels & CLI)
Benefits: prevent spanning copies; a requirement for asynchronous GLVM (the reason there is no asynchronous GLVM on AIX 5.3, and why it was not retrofitted)
Other potential uses: Cross-Site LVM configurations; synchronous GLVM
* C-SPOC does not currently allow you to create a logical volume with mirror pools via its menus. The workaround is to create the logical volume using smit mklv and then continue creating the filesystem via C-SPOC.
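As a sketch of how mirror pools line up with cross-site LVM, the following AIX 6.1 CLI steps assign each site's disks to a pool and pin one LV copy to each. Volume group, disk, and pool names are hypothetical; check the AIX command references before using them on a real cluster.

```shell
# Illustrative AIX 6.1 sketch -- names are hypothetical, run on the node
# owning the volume group.
# Assign each site's disks to its own mirror pool:
chpv -p siteA_pool hdisk2 hdisk3
chpv -p siteB_pool hdisk4 hdisk5
# Enforce super-strict mirror pools so no LV copy can span pools:
chvg -M s datavg
# Create a two-copy LV with one copy pinned to each site's pool:
mklv -c 2 -p copy1=siteA_pool -p copy2=siteB_pool -t jfs2 -y datalv datavg 100
# Verify the mirror pool assignments:
lsmp -A datavg
```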

Infrastructure Considerations: Site A and Site B each have a LAN and a SAN, linked by DWDM. Node A at Site A and Node B at Site B share SITEAMETROVG across 50 GB LUNs replicated between the sites. Important: identify & eliminate single points of failure!

Infrastructure Considerations (continued): in addition to the net_ether_0 LAN, the sites are linked by XD_rs232 and XD_IP networks over the WAN. Two disk-heartbeat volume groups back up the IP heartbeating: diskhb_vg1 (a 1 GB LUN, seen as hdisk2 on one node and hdisk3 on the other, PVID 000fe4111f25a1d1) and diskhb_vg2 (a 1 GB LUN, seen as hdisk3 and hdisk4, PVID 000fe4112f998235), both enhanced concurrent mode (ECM) volume groups, alongside the replicated 50 GB SITEAMETROVG LUNs. Important: identify single points of failure & design the solution around them.

XD_rs232 networks and serial over Ethernet:
- Converted rs232 using rs422/rs485: using true serial requires rs422/rs485 converters; distance up to 1.2 km at 19,200 bps, or 5 km (~3.1 miles) at 9,600 bps
- Converted rs232 using fiber optics: fiber-optic modems or multiplexors; distances of 20-100 km (~12-62 miles), but they must conform to the vendor's specifications to avoid signal loss (vendors include Black Box and TC Communications)
- Serial over Ethernet: provides the greatest distance by not defining any hard limits, but is based on TCP/IP, which is one of the components this type of network is designed to isolate; several vendors are available online

PowerHA SystemMirror: Prominent Client Issues
- Cluster subnet requirements: how do clients connect?
- IPAT across sites (site-specific IPs)
- Context switch by external devices (e.g. www.f5.com)
- Static IPs: node-bound service IP (manual reconnect)
- DNS change (consider the TTL, Time to Live)
Example: Resource Group A with Startup: Online on Home Node Only; Fallover: Fallover to Next Node in List; Fallback: Never Fallback; Site Policy: Prefer Primary Site; Nodes: NodeA, NodeB; Service IPs: service_ip1, service_ip2; Volume Group: datavg; Application Server: AppA. In the diagram, the node in Bldg A hosts service_ip1 on a 10.10.10.x subnet and the node in Bldg B hosts service_ip2 on a 13.10.10.x subnet, with XD_rs232 and XD_IP networks, two 1 GB disk-heartbeat LUNs, and shared 30 GB cluster data LUNs between the buildings.

PowerHA Extended DR Solution Progression: building blocks for success. The example cluster spans SVC Site A (Node A1, Node A2) and SVC Site B (Node B1, Node B2), joined by a net_ether_01 network, an xd_ip network, and two disk-heartbeat networks, with PPRC links between the SVC clusters on each SAN. HA first, then DR.

What are customers doing? (Manual vs. Automated) Longer distances require more robust solutions.
- Local clustering with replication under the covers: Metro / Global Mirror, SRDF/A, Hitachi TrueCopy & Global Replicator, Oracle Data Guard, DB2 HADR
- Replicated volumes to an inactive cluster: standalone GLVM IP replication (GLVM is available in the base AIX 5.3 & 6.1 media; the Enterprise Edition is required for automation)
- Fully automated solution: PowerHA SystemMirror Enterprise Edition, with additional offerings in the works!

PowerHA SystemMirror Storage Replication Integration: Enterprise Edition storage replication offerings
Characteristics:
- Distance limitations: synchronous or asynchronous
- Supported replication: Metro & Global Mirror, SRDF, Hitachi TrueCopy & Global Replicator
How it works: the cluster redirects the replication depending on where the resources are being hosted. The Enterprise Edition adds cluster panels to define and store the relationships for the replicated volumes, and a CLI is enabled for each replication offering to communicate directly with the storage enclosures and perform a role reversal in the event of a fallover.
Considerations: DS8700 Global Mirror, EMC SRDF & Hitachi TrueCopy require PowerHA 6.1+
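For a sense of what the storage-level role reversal involves, here is a hedged sketch of the equivalent manual Metro Mirror steps with the DS8000 DS CLI. The device IDs, credentials, and the 0100:0100 volume pair are hypothetical; PowerHA Enterprise Edition issues the equivalent commands for you during a fallover.

```shell
# Hypothetical device and volume IDs -- PowerHA EE automates these steps.
# Make the target volumes at the recovery site usable (suspends the pair):
dscli -user admin -passwd xxx -hmc1 hmc-siteb failoverpprc \
    -dev IBM.2107-75BB001 -remotedev IBM.2107-75AA001 -type mmir 0100:0100
# Once the old primary site is back, reverse the direction of replication:
dscli -user admin -passwd xxx -hmc1 hmc-siteb failbackpprc \
    -dev IBM.2107-75BB001 -remotedev IBM.2107-75AA001 -type mmir 0100:0100
```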

SVC Version 5 Interoperability Matrix: storage-level virtualization for your enterprise needs. The matrix shows supported host platforms (IBM AIX, IBM i 6.1, z/VSE, Microsoft Windows Hyper-V, VMware vSphere 4, Novell NetWare, Sun Solaris, HP-UX 11i, Tru64, OpenVMS, SGI IRIX, Linux on Intel/Power/zLinux with RHEL and SUSE 11, Apple Mac OS, and more, up to 1024 hosts), supported back-end storage (IBM DS3400/DS4000/DS5020/DS3950/DS6000/DS8000, XIV, DCS9550/DCS9900, N series, EMC CLARiiON CX4-960 and Symmetrix, Hitachi Lightning/Thunder/TagmaStore/AMS/WMS/USP, HP MA/EMA/MSA 2000/XP/EVA, Sun, NetApp FAS, NEC iStorage, Fujitsu Eternus, Bull StoreWay, Pillar Axiom, and others), and SVC functions: point-in-time copy (full volume, copy-on-write, 256 targets, incremental, cascaded, reverse, space-efficient, FlashCopy Manager), continuous copy (Metro/Global Mirror, Multiple Cluster Mirror), space-efficient virtual disks, virtual disk mirroring, native iSCSI, SSD support, 8 Gbps SAN fabric, and a new Entry Edition software offering. For the most current, and more detailed, information please visit ibm.com/storage/svc and click on Interoperability.

Enterprise Edition Disk Replication Integration: so what are you paying for?
- Enterprise license: cluster.xd.license
- Direct PPRC management: cluster.es.pprc.rte, cluster.es.pprc.cmds, cluster.msg.en_us.pprc
- DSCLI management: cluster.es.spprc.cmds, cluster.es.spprc.rte (plus cluster.es.pprc.rte, cluster.es.pprc.cmds, cluster.es.msg.en_us.pprc)
- EMC SRDF: cluster.es.sr.cmds, cluster.es.sr.rte, cluster.msg.en_us.sr
- Hitachi TrueCopy & Global Replicator: cluster.es.tc.cmds, cluster.es.tc.rte, cluster.msg.en_us.tc
These are qualified & supported DR configurations, backed by IBM development & AIX software support and by teaming with EMC & Hitachi under a cooperative service agreement. Install all of the filesets or only what you need: note that Enterprise verification takes longer, so don't install what you are not using. The filesets are in addition to the base replication solution requirements.

Geographic Logical Volume Mirroring (GLVM): the Enterprise Edition integrates with this IP replication offering.
How it works: drivers make the remote disks appear as if they were local, over the WAN, allowing LVM mirrors between local and remote disks. Asynchronous replication requires AIO cache logical volumes and Mirror Pools, available only in AIX 6.1 and above.
GLVM code is available in the AIX base media: AIX 5.3 supports synchronous replication; AIX 6.1 supports synchronous & asynchronous replication.
PowerHA SystemMirror Enterprise Edition provides SMIT panels to define and manage all configuration information, and automates the management of the replication in the event of a fallover.
Find more details in the new DR Redbook, SG24-7841-00.
* Storage can be dissimilar subsystems at either location

AIX & Geographic Logical Volume Mirroring
Filesets:
- Enterprise license & integration filesets: cluster.xd.license, cluster.xd.glvm, cluster.doc.en_us.glvm, cluster.msg.en_us.glvm
- Geographic Logical Volume Mirroring (available on the AIX media): glvm.rpv.client, glvm.rpv.server, glvm.rpv.util, glvm.rpv.man.en_us, glvm.rpv.msg.en_us
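A quick way to confirm the pieces are in place on each node, using the fileset patterns above and the GLVM monitoring commands the Enterprise Edition ships (AIX-only commands; output varies by installation):

```shell
# Verify the GLVM and Enterprise Edition filesets are installed (AIX):
lslpp -L "glvm.rpv.*" "cluster.xd.*"
# After configuration, monitor the replication:
rpvstat      # remote physical volume (RPV) client I/O statistics
gmvgstat     # geographically mirrored volume group (GMVG) state
```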

PowerHA SystemMirror & AIX 6.1: Asynchronous GLVM. Vegas conference session: Implementing PowerHA SystemMirror Enterprise Edition for Asynchronous GLVM, a double-session lab on Wednesday with Bill Miller.

New in PowerHA SM 6.1: the GLVM Configuration Wizard assists in the creation of a synchronous GLVM cluster.

GLVM Cluster Configuration Assistant
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                          [Entry Fields]
* Communication Path to Takeover Node     []
* Application Server Name                 []
* Application Server Start Script         []
* Application Server Stop Script          []
  HACMP can keep an IP address highly available: consider specifying
  Service IP labels and Persistent IP labels for your nodes.
  Service IP Label                        []
  Persistent IP for Local Node            []
  Persistent IP for Takeover Node         []

Generates the following HACMP configuration:
- Cluster name: <user supplied application name>_cluster
- 2 HACMP sites: "sitea" and "siteb"
- 2 HACMP nodes, one per site, using the hostname for the node name
- A single XD_data network: IP-alias enabled, including all inter-connected network interfaces, with a persistent IP address for each node (optional for single-interface networks)
- One resource group: inter-site management policy Prefer Primary Site, including all the GMVGs created by the wizard, an application server, and one or more service IPs

Site Management Policies & Dependencies
The Enterprise Edition appends inter-site management policies beyond the resource group node list: Prefer Primary Site, Online on Either Site, and Online on Both Sites. (The Standard Edition allows site definitions for Cross-Site LVM configs.)
RG dependencies: Online on Same Site will group RGs into a set; an rg_move then moves the set, not an individual resource group, and the software will prevent removal of an RG without removing the dependency first.

Failure Detection Rate & Disaster Recovery
- IP-based networks and serial networks: most customers using local HA have these by default
- XD-type networks have slower failure detection rates
* PowerHA SystemMirror 7.1 has a self-tuning FDR with IP multicasting
* There is no Enterprise Edition available for the 7.1 (2010) release

PowerHA SM Enterprise: Fallover Recovery Action
Two policies are available: AUTO (default) or MANUAL.
Expected behavior: MANUAL only prevents a fallover based on the status of the replicated volumes at the time of node failure. Therefore, if the replication consistency groups reflect a consistent state, a fallover will still take place.
* The example shows the SVC menu, but the same option is there for all replication options

Manual Recovery: Special Instructions
In a scenario where the MANUAL recovery action was selected and a fallover did not occur because the storage relationships were inconsistent, the resource groups will go into an ERROR state and special instructions will be printed to the hacmp.out file.

DLPAR & Disaster Recovery Processing Flow: how many licenses do you need?
1. Activate the LPARs
2. Start PowerHA: it reads the requirements from the LPAR profile (for example, Min 1 / Desired 1 / Max 2 CPUs for the LPAR and Min 1 / Desired 2 / Max 2 for the application server) and acquires CPUs from the HMC via DLPAR; in the example, the Oracle DB on System A grows by 1 CPU while the standby DB on System B stays small
3. Release resources on a fallover or rg_move: the CPUs acquired for the application are returned
4. On a site fallover, or movement of resources to the secondary site, the standby Oracle DB on System C acquires the additional CPU the same way
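The CPU moves in this flow are DLPAR operations that PowerHA requests from the HMC. A hedged sketch of the equivalent manual HMC commands follows; the managed system and partition names are hypothetical, and the exact flags should be checked against the HMC command reference for your level.

```shell
# Hypothetical names -- PowerHA's DLPAR integration performs the
# equivalent operations automatically at acquisition/release time.
# Add one dedicated processor to the standby DB partition:
chhwres -m SystemC -r proc -o a -p oracle_standby --procs 1
# Release it again when the resource group moves back:
chhwres -m SystemC -r proc -o r -p oracle_standby --procs 1
```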

Enterprise Edition Command Line Interface: additional commands available in the Enterprise Edition.
DS Metro & Global Mirror relationships:
- /usr/es/sbin/cluster/pprc/spprc/cmds/cllscss
- /usr/es/sbin/cluster/pprc/spprc/cmds/cllsspprc
- /usr/es/sbin/cluster/pprc/spprc/cmds/cllsdss
SAN Volume Controller Metro & Global Mirror relationships:
- /usr/es/sbin/cluster/svcpprc/cllssvc
- /usr/es/sbin/cluster/svcpprc/cllssvcpprc
- /usr/es/sbin/cluster/svcpprc/cllsrelationship
GLVM resources & statistics:
- /usr/sbin/rpvstat
- /usr/sbin/gmvgstat
EMC SRDF relationships:
- /usr/es/sbin/cluster/sr/cmds/cllssr
Hitachi TrueCopy relationships:
- /usr/es/sbin/cluster/tc/cmds/cllstc
Knowing these will help you identify & manage the configuration; various usage examples appear in the new Enterprise Edition Redbook.

Cluster Test Tool & Enterprise Edition
A utility available in the base-level code, offering an automated test tool and custom test plans. The Enterprise Edition appends additional tests that can be included in custom test plans.

Enterprise Edition: Component Failures & Outcomes
Failures may not always occur in an orderly fashion (i.e. a rolling disaster); in an ideal scenario the entire site goes down.
Traditional failures:
- Server / LPAR failure: standard cluster behavior
- Storage subsystem failure (remember AUTO vs. MANUAL): selective fallover behavior on quorum loss will result in movement of the RG
Most risky: the communication links between the sites fail. This was tested in the Redbook by bringing down the XD_IP network interfaces; results will vary based on the storage replication type. Results: the standby site will acquire and redirect the relationship; the primary site loses write access to its disks and commands hang, which might result in a system crash.
* Note: environments in the same network segment could experience duplicate IP ERROR messages
Intermittent failure (even worse): the links come back up and then log GS_DOM_MER_ERR (a halt of the standby site); the entire cluster is now down, since access to the LUNs is unavailable on the primary site.

Reference Diagram for Failure Scenario
* Note that there is only one network passing heartbeats between the sites
* The replication type was not specified, but this can probably be assumed to be an SVC Metro Mirror configuration based on the names of the states
* The arrows should really point in the other direction for the replication after the failure
Avoiding a partitioned cluster: more XD_IP networks; serial over Ethernet; diskhb networks over the SAN
Future considerations: a quorum server

Recovery from a Partitioned Cluster: Recommendations
Things to check:
- State of the cluster nodes (connectivity, HMC, state of interfaces, error report)
- State of the heartbeat communication paths (e.g. lssrc -ls topsvcs)
- Consistency of the replicated volumes (the CLI will vary by replication type)
- Status of the data
What do you do to recover?
- Identify the cause ASAP; beware of intermittent failures
- Consider bringing down all nodes on one site (to avoid a cluster-initiated halt); a hard reset might be the best approach, as a graceful stop might hang while attempting to release individual resources (e.g. unmount or varyoff with no access to the volumes)
- Check the consistency of the data; every application will be different
- Reintegrate nodes into the cluster accordingly; consider a verify & sync before reintegration
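The checks above map onto a few standard AIX/PowerHA commands; a hedged sketch (the paths are the usual PowerHA install locations, and output will differ by release):

```shell
# State of the heartbeat paths (RSCT Topology Services):
lssrc -ls topsvcs
# Recent hardware/software errors on each node:
errpt | head -20
# One-shot cluster state display and resource group locations:
/usr/es/sbin/cluster/clstat -o
/usr/es/sbin/cluster/utilities/clRGinfo
```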

When to Use Each Replication Option
Major factors:
- Distance between sites: campus DR or extended distance
- Infrastructure & available bandwidth
- The type of storage currently being used, and whether it is the same storage type at both locations
- Any requirement to use the CLI for management of the relationships
- SLA requirements: is HA still required after a site fallover?
- The true requirement for automated fallover: Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
See also: Introduction to PowerHA SystemMirror for AIX Enterprise Edition, session HA20 (AIX), Thursday & Friday, Shawn Bodily

Enterprise Edition: General Recommendations
- Clustering, replication, and high-availability solutions are not a replacement for backups (mksysbs, FlashCopies, snapshots)
- Testing DR solutions is the only way to guarantee they will work; testing should be performed at least once or twice a year, and will help identify any other components required outside of the cluster
- The recovery plan should be well documented & reside at both locations
- Leverage cluster functions to ensure success: C-SPOC user functions guarantee that users are propagated to all cluster nodes, and the user/password cluster management functions ensure that changes are also updated on all cluster nodes

PowerHA SM Enterprise Edition: Value Proposition (differences from the Standard Edition)
- Automates the IP- or disk-based replication mechanism. Stretch clusters (campus-style DR) are limited by how far you can extend shared storage; why pay more for campus DR when you can use Cross-Site LVM?
- Automated fallover, with a MANUAL fallover option (based on the state of the disks). An Enterprise cluster will automatically trigger a fallover; to disable this, alter the startup scripts at the DR location.
- Ease of management: one-time configuration; the location of the RG determines the direction of replication.
- Installation, planning, maintenance & expected behaviors are documented in the new DR Redbook, SG24-7841.

Questions? Thank you for your time!

Additional Resources
- New: Disaster Recovery Redbook SG24-7841, Exploiting PowerHA SystemMirror Enterprise Edition for AIX: http://www.redbooks.ibm.com/abstracts/sg247841.html?open
- New: RedGuide, High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multiserver IBM Power Systems Environments: http://www.redbooks.ibm.com/abstracts/redp4669.html?open
- Online documentation: http://www-03.ibm.com/systems/p/library/hacmp_docs.html
- PowerHA SystemMirror marketing page: http://www-03.ibm.com/systems/p/ha/
- PowerHA SystemMirror wiki page: http://www-941.ibm.com/collaboration/wiki/display/wikiptype/high+availability
- PowerHA SystemMirror (HACMP) Redbooks: http://www.redbooks.ibm.com/cgi-bin/searchsite.cgi?query=hacmp