DELL EMC GDDR (GEOGRAPHICALLY DISPERSED DISASTER RESTART) for PowerMax 8000 & VMAX ALL FLASH 950F
ARM your data center for protection against disaster

ABSTRACT
This paper presents an overview of GDDR software technology used to automate, react, and monitor (ARM) large scale mainframe and mixed mainframe-open systems environments, providing continuous operations or automated failover during planned or unplanned events.

May 2018
WHITE PAPER
The information in this publication is provided as is. Dell EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell EMC, the Dell EMC logo are registered trademarks or trademarks of Dell EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. Copyright 2016 Dell EMC Corporation. All rights reserved. Published in the USA. 07/2017 white paper part number h13805.2

Dell EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice. Dell EMC is now part of the Dell group of companies.

Geographically Dispersed Disaster Restart
CONTENTS
INTRODUCTION
KEY GDDR CONCEPTS
  Situational Awareness
  Leadership Arbitration and Control
  Survivor Recognition
SUPPORTED TOPOLOGIES
  SRDF/S with ConGroup
  SRDF/S with AutoSwap
  SRDF/A
  SRDF/STAR
  SRDF/SQAR
CONCLUSION
Introduction
GDDR is a mainframe software product that automates business recovery procedures by reacting to events that its monitoring capability detects in the data center. These three functions, automate, react and monitor (ARM), combine to enable continuous operations across both planned and unplanned outages. GDDR is designed to perform planned data center site switch operations as well as to restart operations following disasters ranging from the loss of compute capacity and/or disk array access, through to total loss of a single data center or a regional disaster, including the loss of dual data centers. GDDR achieves this goal by providing automation to complement the functionality of the Dell EMC hardware and software products required for business restart.

Because GDDR provides system restart following disasters, it does not reside in the same systems that it is seeking to protect. GDDR resides on separate logical partitions (LPARs) from those that run application workloads. For example, in a three data center SRDF/STAR configuration, GDDR is installed on a control LPAR at each site. Each GDDR node is aware of the other two GDDR nodes via network connections between each site. This awareness enables the monitoring that is required to react to disasters, identify survivors, nominate the leader and then automate the necessary actions to resume operations at one of the customer-chosen surviving sites.

To achieve the task of business restart, GDDR automation extends well beyond the disk layer where Dell EMC has traditionally focused and into the host operating system layer. It is at this layer that sufficient controls and access to third party software and hardware products exist to enable Dell EMC to provide automated recovery services.

GDDR is unique in that it uses an expert system application of knowledge engineering to dynamically create an automation script to handle a planned or unplanned event. As a result GDDR is a single product. It can handle the complexity and variability of 13 different customer configurations of sites and software products.
Competitors use versions of their base software to deal with these configuration differences. This makes it difficult or impossible to move from one configuration type to another without completely re-engineering the solution. For GDDR environments, that's simply a matter of re-describing the configuration via parameters and rerunning discovery utility software.
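The idea of re-describing a configuration through parameters, rather than re-engineering the solution, can be illustrated with a small sketch. The Python below is not GDDR's parameter syntax or discovery utility; the `Leg` and `Topology` classes and the `restart_candidates` function are hypothetical names invented for this illustration. It only shows how one generic routine can serve several topologies when the topology itself is data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    """One replication relationship (hypothetical model, not GDDR syntax)."""
    source: str
    target: str
    mode: str  # "SRDF/S" (synchronous) or "SRDF/A" (asynchronous)


@dataclass(frozen=True)
class Topology:
    primary: str
    legs: tuple


def restart_candidates(topology):
    """Return the sites that receive a replica of the primary's data,
    following replication legs outward from the primary site."""
    found = []
    frontier = [topology.primary]
    while frontier:
        site = frontier.pop(0)
        for leg in topology.legs:
            if (leg.source == site
                    and leg.target != topology.primary
                    and leg.target not in found):
                found.append(leg.target)
                frontier.append(leg.target)
    return found


# Moving between configurations is a change of description, not of code.
two_site = Topology("DC1", (Leg("DC1", "DC2", "SRDF/S"),))
concurrent_star = Topology("DC1", (Leg("DC1", "DC2", "SRDF/S"),
                                   Leg("DC1", "DC3", "SRDF/A")))
cascaded_star = Topology("DC1", (Leg("DC1", "DC2", "SRDF/S"),
                                 Leg("DC2", "DC3", "SRDF/A")))

for name, topo in [("two-site", two_site),
                   ("concurrent Star", concurrent_star),
                   ("cascaded Star", cascaded_star)]:
    print(name, restart_candidates(topo))
```

In this toy model, going from a two-site SRDF/S configuration to a three-site Star configuration adds one `Leg` entry and leaves `restart_candidates` untouched.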
Key GDDR Concepts
GDDR brings some important concepts to the deployment and operation of Dell EMC business continuity technology:
- Situational Awareness
- Survivor Recognition
- Leadership Arbitration and Control

Figure 1: GDDR/Star with AutoSwap

Situational Awareness
GDDR brings situational awareness to Dell EMC business continuity technology. For example, GDDR is able to distinguish between network outages (SRDF link drops) and real disasters. This awareness is achieved by periodic exchange of dual-direction heartbeats between the GDDR LPARs. It seems like a simple notion, but a foundation technology such as SRDF/A has no means to determine the difference between a link outage and a real disaster.

Leadership Arbitration and Control
GDDR operates in a Master Owner/Non-Owner role in relation to other GDDR control LPARs. In a three site topology, the GDDR master C-System would normally reside at the DC2 location. However, if the DC2 location is destroyed or the GDDR C-System itself fails, then one of the surviving GDDR C-Systems will assume the role of the GDDR Master. Changes to GDDR configuration information can only be made on the GDDR Master C-System. GDDR propagates these changes to the subordinate GDDR systems using inter-system communications facilities built into GDDR. Restart procedures following disasters are coordinated from the GDDR Master C-System.

Survivor Recognition
Without automation software, replication technologies do not act on disaster situations to achieve recovery. GDDR has built in intelligence to look out for other GDDR systems, constantly checking for disaster situations and constantly ensuring that other GDDR systems are healthy. This constant checking allows GDDR to recognize and act on potential disaster situations, even if only one GDDR system survives.
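The three concepts above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not GDDR's actual decision logic: the `SiteStatus` fields, the four classification labels, and the preference order passed to `elect_master` are all hypothetical. The paper only states that heartbeats let GDDR tell link drops from disasters and that a surviving C-System takes over when the master is lost.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """What the local C-System currently observes about a remote site."""
    name: str
    heartbeat_ok: bool   # did the remote C-System answer recent heartbeats?
    srdf_link_ok: bool   # is the replication link to that site up?


def classify(status):
    """Combine two independent signals (assumed rule set, for illustration)."""
    if status.heartbeat_ok and status.srdf_link_ok:
        return "healthy"
    if status.heartbeat_ok:
        # The remote site still answers, so only the replication link dropped.
        return "link outage"
    if status.srdf_link_ok:
        return "C-System or network failure"
    # Neither signal is present: treat as a potential disaster.
    return "possible site disaster"


def elect_master(preferred_order, statuses, local_site):
    """Pick the first site in preference order whose C-System is alive.
    The local C-System is alive by definition, so a lone survivor
    elects itself."""
    alive = {local_site} | {s.name for s in statuses if s.heartbeat_ok}
    for site in preferred_order:
        if site in alive:
            return site
    return local_site


# View from the DC1 C-System after DC2 stops responding on every path.
observed = [SiteStatus("DC2", heartbeat_ok=False, srdf_link_ok=False),
            SiteStatus("DC3", heartbeat_ok=True, srdf_link_ok=True)]

for s in observed:
    print(s.name, classify(s))
print("master:", elect_master(["DC2", "DC1", "DC3"], observed, "DC1"))
```

The order after DC2 in the preference list is an arbitrary choice for the example. The point is that replication state alone cannot separate the "link outage" case from the "possible site disaster" case; the heartbeat supplies the second signal.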
Supported Topologies
A Dell EMC GDDR complex consists of GDDR control systems (C-Systems), z/OS and open systems hosts, and Dell EMC PowerMax 8000 and/or VMAX storage systems which support an organization's mission-critical workload. GDDR is unique in the ability of a single GDDR complex to support multiple z/OS parallel sysplexes and can manage both CKD and FBA disk, providing an enterprise-wide disaster restart solution.

Dell EMC GDDR is available in the following configurations:

Two site:
- SRDF/S with ConGroup: The 2-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2.
- SRDF/S with AutoSwap: The 2-site SRDF/S with AutoSwap configuration provides for continuous availability through device failover between DC1 and DC2.
- SRDF/A: The 2-site SRDF/A configuration provides disaster restart capabilities at site DC3.

Three site:
- SRDF/Star with ConGroup: The 3-site SRDF/Star configuration provides disaster restart capabilities at either DC2 or DC3. Concurrent and Cascaded SRDF support options further minimize the DC3 recovery time objective.
- 2-site SRDF/Star: A variant of 3-site SRDF/Star with ConGroup, this configuration supports a DC2 site with no host and the PowerMax 8000 or VMAX acting as a data bunker.
- SRDF/Star with AutoSwap: The 3-site SRDF/Star configuration provides both continuous availability between DC1 and DC2 as well as disaster restart capabilities at either DC2 or DC3. Concurrent and Cascaded SRDF support options further minimize the DC3 recovery time objective.

Four site:
- SRDF/SQAR with AutoSwap: The 4-site SRDF/SQAR with AutoSwap configuration provides for continuous availability through device failover between DC1 and DC2 as well as continuous disaster recovery protection through redundant SRDF/A replication out of region to DC3 and DC4.

GDDR can be customized to operate in any of these configurations. GDDR functionality is controlled by a parameter library.
During GDDR implementation, this parameter library is customized to reflect:
- The prerequisite Dell EMC software components
- The desired data center topology (two-site, three-site, four-site, synchronous or asynchronous, concurrent and/or cascaded)

SRDF/S with ConGroup
The 2-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2. Figure 2 illustrates GDDR operation in the SRDF/S with Consistency Group environment.
Figure 2: GDDR SRDF/S with ConGroup

Figure 2 shows the two GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs has Dell EMC Consistency Group (ConGroup) software installed. SRDF/S and ConGroup ensure that at the point that GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved.

In this environment, GDDR can do the following:
- Manage planned site swaps
- Restart processing at the secondary site following unplanned primary site events
- Perform standard operational tasks: IPL, system reset, activate, deactivate
- Trigger stop/start of business workloads
- Actively monitor for unplanned/failure events
  - Sites
  - Systems
  - Loss of SRDF/S
  - ConGroup trip
  - Inter-site communication failure
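Why a ConGroup trip leaves a usable point of consistency at DC2 can be shown with a toy model. The sketch below is not the Dell EMC ConGroup implementation; `ToyConsistencyGroup` and its fields are hypothetical. It assumes the commonly described behavior that, when one device in the group can no longer replicate, replication is suspended for every device in the group, so the remote copy never holds a dependent write without the write it depends on.

```python
class ToyConsistencyGroup:
    """Toy model: all devices in the group stop replicating together."""

    def __init__(self, devices):
        self.local = {d: [] for d in devices}    # primary site images
        self.remote = {d: [] for d in devices}   # secondary site images
        self.tripped = False

    def write(self, device, data, link_ok=True):
        # If this write cannot be replicated, trip the whole group
        # before the write completes.
        if not link_ok:
            self.tripped = True
        self.local[device].append(data)
        # After a trip, nothing in the group is replicated any more.
        if not self.tripped:
            self.remote[device].append(data)


group = ToyConsistencyGroup(["LOG", "DB"])
group.write("LOG", "begin txn 1")
group.write("DB", "row 1")
group.write("LOG", "begin txn 2", link_ok=False)  # replication for LOG fails
group.write("DB", "row 2")                        # dependent write, its own link is fine

print(group.remote)
```

Without the group-wide suspension, "row 2" would reach the secondary site while "begin txn 2" would not, and a restart there would see a database update with no matching log record.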
SRDF/S with AutoSwap
The 2-site SRDF/S with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2. Figure 3 illustrates GDDR operation in the SRDF/S with AutoSwap environment.

Figure 3: GDDR SRDF/S with AutoSwap

As Figure 3 shows, the relationship between the DC1 and DC2 sites is maintained through SRDF/S replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 3 shows the two GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs has AutoSwap and Consistency Group (ConGroup) software installed. AutoSwap and ConGroup ensure that a point of consistency exists whenever Dell EMC GDDR receives notification of an unplanned or failure event.

In this environment, GDDR can do the following:
- Manage planned site swaps
- Manage recovery after unplanned site swaps
- Perform standard operational tasks: IPL, system reset, activate, deactivate
- Trigger stop/start of business workloads
- Actively monitor for unplanned/failure events
  - Sites
  - Systems
  - Loss of SRDF/S
  - ConGroup trip
  - Inter-site communication failure
  - AutoSwap events
- Configure/reconfigure Couple datasets
- Manage coupling facilities policies

SRDF/A
The 2-site SRDF/A configuration provides disaster restart capabilities at site DC3. Figure 4 illustrates GDDR operation in the SRDF/A environment.

Figure 4: GDDR Two Site SRDF/A

As Figure 4 shows, the relationship between the DC1 and DC3 sites is maintained through SRDF/A replication of primary disk images at DC1 to DC3. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 4 shows the two GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. GDDR does not have a requirement to freeze I/O to obtain a point of consistency. Multi-Session Consistency and SRDF/A provide the mechanism. At the point that GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved through these foundation technologies.

In this environment, GDDR can do the following:
- Manage planned site swaps
- Restart processing at the secondary site following unplanned primary site events
- Perform standard operational tasks: IPL, system reset, activate, deactivate
- Trigger stop/start of business workloads
- Actively monitor for unplanned/failure events
  - Sites
  - Systems
  - Loss of SRDF/A
  - Inter-site communication failure

SRDF/STAR
The 3-site SRDF/Star configuration provides disaster restart capabilities at DC2 or DC3. Figure 5 illustrates GDDR operation in a concurrent SRDF/Star environment. GDDR can also be configured for operation in a cascaded SRDF/Star environment.

Figure 5: SRDF/Star with AutoSwap
The relationship between the DC1 and DC2 sites is maintained through SRDF/Synchronous replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. In a concurrent configuration, the asynchronous relationship is between DC1 and DC3, while in a cascaded environment, the asynchronous relationship is between DC2 and DC3. Figure 5 shows the three GDDR C-Systems with their independent heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs has Consistency Group (ConGroup) installed.

In this environment, GDDR can perform the following tasks:
- Manage planned site swaps
- Manage recovery after unplanned site swaps
- Manage reconfiguration of the SRDF/Star environment between concurrent and cascaded topologies
- Manage reconfiguration of the SRDF/Star environment from cascaded to concurrent with a primary processing site move
- Perform standard operational tasks: IPL, system reset, activate, deactivate
- Trigger stop/start of business workloads
- Actively monitor for unplanned/failure events, including:
  - Sites
  - Systems
  - ConGroup trip
  - Loss of SRDF/S
  - Loss of SRDF/A
  - Inter-site communication failure

SRDF/SQAR
The 4-site SRDF/SQAR configuration provides disaster restart capabilities at DC2, DC3 or DC4. Figure 6 illustrates GDDR operation in a SRDF/SQAR environment. This topology features redundant SRDF/A connections for continuous DR protection out of region and provides the ability to resume a two site SRDF/S operation in another region without having to perform a full resynchronization between the arrays.
Figure 6: GDDR SRDF/SQAR

In this environment, GDDR can perform the following tasks:
- Manage planned site swaps
- Manage planned region swaps
- Continue remote SRDF/A replication following inter-site link failure
- Resume SRDF/S with AutoSwap protection in remote region following unplanned site/region outage
- Manage recovery after unplanned site swaps
- Manage recovery after unplanned region swaps
- Perform standard operational tasks: IPL, system reset, activate, deactivate
- Trigger stop/start of business workloads
- Actively monitor for unplanned/failure events, including:
  - Sites
  - Systems
  - ConGroup trip
  - Loss of SRDF/S
  - Loss of SRDF/A
  - Inter-site communication failure

GDDR: Tape Support with Dell EMC Disk Library for Mainframe (DLm)
Since GDDR is able to produce consistency across open systems platforms and z/OS, it was natural to apply this capability to the Dell EMC Disk Library for Mainframe (DLm), as the DLm is simply treated as another open systems host to GDDR. By including a DLm that uses PowerMax 8000 or VMAX storage as its back-end disk in a GDDR managed consistency group, GDDR is able to provide consistency across tape data and the tape file related meta-data stored on DASD, such as the tape catalog, ICF catalog, and DFSMShsm control datasets. This concept is known as Universal Data Consistency and is unique in the marketplace in its ability to ensure data integrity across tape and DASD in local and remote (synchronous and asynchronous) replication environments.

Figure 7: Universal Data Consistency (panel labels: Dell EMC DLm w/VMAX)

GDDR Version 5.0 Enhancements
Version 5.0 of GDDR offers the following enhancements:

1) Support for PowerMax 8000 and VMAX 950F All Flash array with Mainframe Enablers V8.x

2) Exploitation of TimeFinder SnapVX: SnapVX is a new local replication technology available in the PowerMax 8000 and VMAX 950F All Flash arrays that allows up to 256 point in time copies per volume in a very capacity efficient manner using new pointer based technology. While TimeFinder SnapVX is used by GDDR on PowerMax, VMAX3 and VMAX All Flash systems, TimeFinder Clone support has also been added to GDDR for use on older VMAX systems. Expanding support for both TimeFinder features allows GDDR to take consistent point in time copies in environments with multiple generations of VMAX arrays, protecting customers' investments in Dell EMC technology.

3) Support for Data Protector for z Systems (zDP): zDP is a Dell EMC z/OS based application that utilizes SnapVX snapshots to enable rapid recovery from logical data corruption. zDP achieves this by providing multiple, frequent, consistent, point-in-time copies of data in an automated fashion across multiple volumes from which an application level recovery can be conducted. By providing easy access to multiple different point-in-time copies of data (with a granularity of minutes), precise remediation of logical data corruption can be performed using storage or application-based recovery procedures. zDP provides the following benefits:
   a. Faster recovery times as less data must be processed due to the granularity of the available point in time data copies
   b. Cross application data consistency for recovery data
   c. Minimal data loss compared to the previous method of restoring data from daily or weekly backups. This is especially important for non-DBMS data, which does not have the granular recovery options provided by log files and image copies associated with database management systems.

GDDR 5.0 provides support for zDP, interfacing with zDP where required during GDDR automation tasks to ensure planned and unplanned actions execute successfully on zDP managed volumes.

Conclusion
GDDR provides automation to Dell EMC's enterprise class business continuity solutions from the most complex and intricate four-site solutions through to the simpler two-site configurations. GDDR provides automation for both planned and unplanned outage management for both the z/OS layer and PowerMax 8000 and/or VMAX storage and related software. GDDR automation provides the correct steps, the correct commands, and the correct sequencing of orderly business continuity operations.
Users deploying GDDR automation will realize these significant benefits from a GDDR implementation:
- Predictability of outcome
- Improved testability of business continuance plans
- Operational simplicity, allowing lower skilled personnel to perform business continuity operations

Consider shifting responsibility for your business continuity requirements to Dell EMC. GDDR's test hardened resiliency combined with global 7*24 customer service will help to provide certainty when you need it most.