Best Practices Guide for ProtecTIER V. 2.5


IBM System Storage TS7600 Best Practices Guide for ProtecTIER V. 2.5 for the TS7600 Family of Products GA



Note
Before using this information and the product it supports, be sure to read the information in the "Safety and environmental notices" and "Notices" sections of this publication.

Edition notice
This edition applies to the TS7650 Appliance and to all subsequent releases and modifications until otherwise indicated in new editions. This edition replaces GC

Copyright IBM Corporation 2008, 2011.

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Figures
Tables
Safety and environmental notices
    Safety notices
    Danger notices for the TS7650 Appliance and TS7650G (Gateway)
    Caution notices for the TS7650 Appliance and TS7650G (Gateway)
    Labels for the TS7650 Appliance and TS7650G (Gateway)
    Environmental notices
About this document
    Who should read this document
    Terminology
    Related publications

Part 1. ProtecTIER 2.5, Native Replication Best Practices

Chapter 1. Replication operational overview
    HyperFactor, deduplication and bandwidth savings
    Replication features
Chapter 2. Native replication normal operation concepts
    Replication
    Replication data-transfer
    Visibility switch control
    Backup application catalog
    Single-domain and multiple-domain backup application environments
    Consideration for DR site tape cloning in a single domain environment
    Policies
    System characteristics
Chapter 3. OpenStorage (OST) network configuration
    Definitions and Acronyms
    Bonding/Teaming
        Bonding on Linux machines
        Bonding on Unix machines
        Teaming on Microsoft-based machines
    LAG and similar technologies on the LAN switches
    Setting up the network with ProtecTIER
    Load Distribution Methodology
        LAN switch
        Hosts
    Network topologies
        Connectivity on a Single Site
        Connectivity to a Remote Site
Chapter 4. Configuration instructions and examples
Chapter 5. Planning for replication deployment
    Bandwidth sizing and requirements
    Monitoring
    Firewalled environments - port assignments for the Replication Manager and replication operations
    Bandwidth validation tool
    Practical approach for replication performance planning
    Replication rate control and bandwidth throttling
    Choosing the replication mode of operation - scheduled vs. continuous
    Choosing a replication mode of operation - visibility switch vs. basic DR
    Use cases to demonstrate features of replication/DR operation
    Capacity sizing
    Performance and capacity sizing
    Handling the backup application database
    Deploying native replication
Chapter 6. Replication solution optimization
    Performance
    Automation of daily operation
Chapter 7. Recovery management
    PTCLI
    Disaster recovery operations
    Principality
    Repository replacement
    Returning to normal operations
    Taking over a destroyed repository
    Flushing the replication backlog
Chapter 8. Using the Visibility Switch Control feature
    ProtecTIER VTL cartridge handling
Chapter 9. LUN Masking
    Working with LUN masking groups

Part 2. Back End Storage Configuration Best Practices

Chapter 10. General configuration overview
    RAID
    LUNs
    Storage manager
Chapter 11. Specific guidelines for DS
    Fibre Channel cabling
    SAN fabric zoning
    Rules and standard settings
Chapter 12. Specific guidelines for DS
    Cabling guidelines
    Setting up arrays
    Cabling layouts
    SAN fabric zoning
    RAID configuration
    Rules and standard settings
Chapter 13. Specific guidelines for XIV Storage System
    XIV Storage System hardware
    XIV Storage Manager GUI
    XIV System zoning
    XIV System GUI installation
    Configuration of the XIV Storage System
    Installation of the XIV Storage System GUI
    Launching the XIV GUI
    Adding an XIV Storage System to the GUI
    XIV / ProtecTIER performance metrics
    Configuring XIV System storage

Part 3. Environment OS

Chapter 14. Drivers Specification for AIX all platform versions to work with ProtecTIER
Chapter 15. Drivers Specification for Solaris Platform to work with ProtecTIER
Chapter 16. Drivers Specification for Linux Platform to work with ProtecTIER

Part 4. Backup Application

Chapter 17. Backup application servers
    General recommendations
    Best Practices for Specific Backup Applications
Chapter 18. Deploying replication with specific backup applications
    Determining which cartridges at the DR site to restore
    NetBackup (NBU)
    Tivoli Storage Manager (TSM)
    Reclamation considerations
    EMC/Legato NetWorker
    CommVault
    HP Data Protector best practice with ProtecTIER/TS7650x

Part 5. Data Types

Chapter 19. RMAN Oracle Best Practice Tuning with ProtecTIER
Chapter 20. Lotus Domino Tuning with ProtecTIER
    Domino Lotus
    Changing the Domino Server Database
    Recommendations for the backup command
Chapter 21. DB2 Best Practices tuning with ProtecTIER
Chapter 22. Recommendations for specific source data types

Part 6. General Best Practices

Chapter 23. IBM i BRMS and ProtecTIER Replication
    How ProtecTIER replication works
    Backup Recovery and Media Services (BRMS)
    Deploying ProtecTIER with BRMS for disaster recovery
    Examples of BRMS policies and control groups for ProtecTIER replication in a single domain
Chapter 24. V7000 (Storwize) Best Practices with ProtecTIER

Part 7. Appendixes

Accessibility
Notices
    Trademarks
    Electronic emission notices
        Federal Communications Commission statement
        Industry Canada compliance statement
        European Union Electromagnetic Compatibility Directive
        Australia and New Zealand Class A Statement

        Germany Electromagnetic compatibility directive
        People's Republic of China Class A Electronic Emission statement
        Taiwan Class A compliance statement
        Taiwan contact information
        Japan VCCI Class A ITE Electronics Emission statement
        Japan Electronics and Information Technology Industries Association (JEITA) Statement
        Korean Class A Electronic Emission statement
        Russia Electromagnetic Interference (EMI) Class A Statement
Index


Figures

Relationship between Cost and % of Environment Protected
Potential deployment of hub and spoke replication configuration
Replication Grid view of all active repositories
Replication Manager managing a replication grid with two topology groups of repositories
Replication Grid topology example, replication pair (one hub and one spoke) with two-node-cluster systems
Multiple teaming topologies
Single host with single ProtecTIER server on the same VLAN
Multiple switches
High availability configuration
Host with single ProtecTIER server on different VLANs
Connectivity in remote sites
Replication Rate Limits
Potential modification of Eth3 interface limit
Suggested replication rate limits per the above example
Set replication time frame screen, showing same 16 hour replication window for all 7 days
Reserve space for backup entry window
Setting limits, continuous backup at the hub example
Setting limits, throttling by conservative limit example
Limit spoke to create time frame effect
Typical Backup and DR environment using ProtecTIER
ProtecTIER's IP replication functionality as used in a backup and recovery environment
Scenario one, replication complete
Scenario two, replication incomplete
Scenario three, replication incomplete, DB complete
Replication Manager screen
Creating a new Grid screen
Adding a repository to the Grid
Connect spokes to a hub
Shelf cartridges status view at the hub
Moving into ProtecTIER DR mode
Entering ProtecTIER Replication Failback wizard
Leaving ProtecTIER DR mode
Host initiator management
LUN Masking groups
Zoning topology
Stand-alone fibre channel connections
Clustered fibre channel connections
DS5300 cabling guide
Example of DS5100/5300 with eight enclosures
Example of drive selection for a RAID 5 4+P and a RAID
DS5300 direct attach with dual path from 1 node to 1 DS
Single node attached to a single DS5300 through a switch
Single node attached to one DS5300 through dual switches
DS5300 direct attach with single path from two node cluster
DS5300 attached through a switch to a two node cluster
DS5300 attached through dual switches to a two node cluster
Dual DS5300s direct attached to a two node cluster
Dual DS5300s attached through a switch to a two node cluster
Dual DS5300s attached through dual switches to a two node cluster
Storage manager window showing subsystem ready for use
Storage subsystem information window
Host Operating System menu cascade
Change host operating system window
Properties menu cascade
LUN information window
XIV hardware
XIV Storage manager GUI
XIV patch panel and TS7650 rear view showing ports
XIV to ProtecTIER sample zoning
XIV storage management setup wizard
Setup dialog window
Setup completion window
XIV GUI login window
Add system management window
Throughput
Storage pool window
Storage pool window
Add pool window
Storage pool window showing new pool
Selecting Volumes by Pools
Volumes by Pools window
Create volume window
Volumes by Pools window showing Meta Data volume
Create volumes window with User Data input
Volumes by Pools window showing User Data volumes
Hosts and Clusters menu
Hosts and Clusters window
Add Host window
The Hosts and Clusters window showing the ProtecTIER host
Host drop down menu
Add Port window
Hosts and Clusters window showing the ports
Host drop down menu with Modify LUN Mapping
Selecting volumes to be mapped
View LUN Mapping window
Cartridge status report (in Excel)
Typical TSM environment with ProtecTIER (pre-replication)
ProtecTIER replication and TSM
Scenario one, replication complete
Scenario two, replication incomplete
Scenario three, replication incomplete, DB complete
Two-site disaster recovery using IP replication
Typical scenario (CommVault UI)
Selected media moved to New Outside Storage Location (ProtecTIER GUI)
Selected cartridges moved from local library into the shelf (subject to visibility) (ProtecTIER GUI)
On the DR site library, cartridges in the I/E slots
Cartridges moved into DR site library, CV GUI
Cartridges moved into DR site library, PT GUI
Selected cartridge exported successfully
Selected cartridge located inside the DR site library
Robotic barcode reader
Increase block size
Enable Lock Name
Disable compression, encryption and CRC
Multipath Device
Multipath Device B
Load balancing
Use of Mirror
Use media sets
Lotus Domino server command line
Backup Recovery and Media Services
Create Backup Control Group Entries
Start Backup using BRM
Select Recovery Items
Change Backup Control Group Attributes
Establishing replication and failback
Failback to local ProtecTIER
Change Move Policy
Display Media Policy
Change Backup Control Group Attributes
Library Mode
On left, production IBM i system. On right, DR site server
Verify media move on DR site system
Cartridge available in DR site system
Example of redundant SAN fabric fibre channel configuration
Example of a configuration with one FC switch
Host type options
Choose Host Type
Add WWPNs
Select a preset volume type

Tables

Planning tool added fields to support the replication feature
Potential savings for ProtecTIER Replication users vs. Gen.1 VTL users
Average costs for different network bandwidth - Bandwidth, Capacities and Costs
NetBackup user operation flow: NetBackup server/media
NetBackup DR test simulation in two domain backup environment
NetBackup visibility switch option in two domain backup environment
Information about the import/export slots in a VTL library
Stand-alone fibre channel connections
Clustered fibre channel connections
DS4000/DS5000 zoning for 2 node ProtecTIER with single DS5300, single switch or dual switch
DS4000/DS5000 zoning for 2 node ProtecTIER with dual DS5300s, single switch or dual switch
DS4000 settings
DS4000/DS5000 zoning for 2 node ProtecTIER with single DS5300, single switch or dual switch
DS4000/DS5000 zoning for 2 node ProtecTIER with dual DS5300s, single switch or dual switch
DS5000 settings
Interface module state chart
Zoning
Sizing
Performance numbers
Recommended DB2 settings
TSM recommended DB2 settings


Safety and environmental notices

This section contains information about safety notices that are used in this guide and environmental notices for this product.

Safety notices

Observe the safety notices when using this product. These safety notices contain danger and caution notices. These notices are sometimes accompanied by symbols that represent the severity of the safety condition. Most danger or caution notices contain a reference number (Dxxx or Cxxx). Use the reference number to check the translation in the IBM Systems Safety Notices, G manual. The sections that follow define each type of safety notice and give examples. Specific notices for the IBM System Storage TS7650 ProtecTIER De-duplication Appliance and IBM System Storage TS7650G ProtecTIER De-duplication Gateway are also included.

Danger notice

A danger notice calls attention to a situation that is potentially lethal or extremely hazardous to people. A lightning bolt symbol always accompanies a danger notice to represent a dangerous electrical condition. A sample danger notice follows:

DANGER: An electrical outlet that is not correctly wired could place hazardous voltage on metal parts of the system or the devices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to prevent an electrical shock. (D004)

Caution notice

A caution notice calls attention to a situation that is potentially hazardous to people because of some existing condition, or to a potentially dangerous situation that might develop because of some unsafe practice. A caution notice can be accompanied by one of several symbols:

If the symbol is... It means...
A generally hazardous condition not represented by other safety symbols.
This product contains a Class II laser. Do not stare into the beam. (C029)

Laser symbols are always accompanied by the classification of the laser as defined by the U.S. Department of Health and Human Services (for example, Class I, Class II, and so forth).

If the symbol is... It means...
A hazardous condition due to mechanical movement in or around the product.
This part or unit is heavy but has a weight smaller than 18 kg (39.7 lb). Use care when lifting, removing, or installing this part or unit. (C008)

Sample caution notices follow:

Caution: The battery is a lithium ion battery. To avoid possible explosion, do not burn. Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call; have the IBM part number for the battery unit available when you call. (C007)

Caution: The system contains circuit cards, assemblies, or both that contain lead solder. To avoid the release of lead (Pb) into the environment, do not burn. Discard the circuit card as instructed by local regulations. (C014)

Caution: When removing the Modular Refrigeration Unit (MRU), immediately remove any oil residue from the MRU support shelf, floor, and any other area to prevent injuries because of slips or falls. Do not use refrigerant lines or connectors to lift, move, or remove the MRU. Use handholds as instructed by service procedures. (C016)

Caution: Do not connect an IBM control unit directly to a public optical network. The customer must use an additional connectivity device between an IBM control unit optical adapter (that is, fibre, ESCON, FICON) and an external public network. Use a device such as a patch panel, a router, or a switch. You do not need an additional connectivity device for optical fibre connectivity that does not pass through a public network.

Danger notices for the TS7650 Appliance and TS7650G (Gateway)

The following danger notices apply to the TS7650 Appliance and TS7650G (Gateway).

15 D001 DANGER To preent a possible shock from touching two surfaces with different protectie ground (earth), use one hand, when possible, to connect or disconnect signal cables. (D001) D002 DANGER Oerloading a branch circuit is potentially a fire hazard and a shock hazard under certain conditions. To aoid these hazards, ensure that your system electrical requirements do not exceed branch circuit protection requirements. Refer to the information that is proided with your deice or the power rating label for electrical specifications. (D002) D003 DANGER If the receptacle has a metal shell, do not touch the shell until you hae completed the oltage and grounding checks. Improper wiring or grounding could place dangerous oltage on the metal shell. If any of the conditions are not as described, STOP. Ensure the improper oltage or impedance conditions are corrected before proceeding. (D003) D004 DANGER An electrical outlet that is not correctly wired could place hazardous oltage on the metal parts of the system or the deices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to preent an electrical shock. (D004) Safety and enironmental notices xiii

16 D005 DANGER When working on or around the system, obsere the following precautions: Electrical oltage and current from power, telephone, and communication cables are hazardous. To aoid a shock hazard: Connect power to this unit only with the IBM proided power cord. Do not use the IBM proided power cord for any other product. Do not open or serice any power supply assembly. Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm. The product might be equipped with multiple power cords. To remoe all hazardous oltages, disconnect all power cords. Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet supplies proper oltage and phase rotation according to the system rating plate. Connect any equipment that will be attached to this product to properly wired outlets. When possible, use one hand only to connect or disconnect signal cables. Neer turn on any equipment when there is eidence of fire, water, or structural damage. Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the deice coers, unless instructed otherwise in the installation and configuration procedures. Connect and disconnect cables as described in the following procedures when installing, moing, or opening coers on this product or attached deices. To disconnect: 1. Turn off eerything (unless instructed otherwise). 2. Remoe the power cords from the outlets. 3. Remoe the signal cables from the connectors. 4. Remoe all cables from the deices. To connect: 1. Turn off eerything (unless instructed otherwise). 2. Attach all cables to the deices. 3. Attach the signal cables to the connectors. 4. Attach the power cords to the outlets. 5. Turn on the deices. Sharp edges, corners and joints may be present in and around the system. Use care when handling equipment to aoid cuts, scrapes and pinching. (D005) xi IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

17 D006 DANGER Heay equipment personal injury or equipment damage might result if mishandled. (D006) D008 DANGER Professional moers are to be used for all relocation actiities. Serious injury or death may occur if systems are handled and moed incorrectly. (D008) Caution notices for the TS7650 Appliance and TS7650G (Gateway) The following caution notices apply to the TS7650 Appliance and TS7650G (Gateway). C001 CAUTION: Energy hazard present. Shorting might result in system outage and possible physical injury. Remoe all metallic jewelry before sericing. (C001) C002 CAUTION: Only trained serice personnel may replace this battery. The battery contains lithium. To aoid possible explosion, do not burn or charge the battery. Do not: Throw or immerse into water Heat to more than 100 C (212 F) Repair or disassemble Exchange only with the IBM-approed part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call Hae the IBM part number for the battery unit aailable when you call. (C002) C003 CAUTION: The battery contains lithium. To aoid possible explosion, do not burn or charge the battery. Do not: Throw or immerse into water Heat to more than 100 C (212 F) Repair or disassemble Exchange only with the IBM-approed part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call Hae the IBM part number for the battery unit aailable when you call. (C003) Safety and enironmental notices x

C005 CAUTION: The battery is a nickel-cadmium battery. To avoid possible explosion, do not burn. Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call; have the IBM part number for the battery unit available when you call. (C005)

C007 CAUTION: The battery is a lithium ion battery. To avoid possible explosion, do not burn. Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call; have the IBM part number for the battery unit available when you call. (C007)

C009 CAUTION: >18 kg (39.7 lb). The weight of this part or unit is between 18 and 32 kg (39.7 and 70.5 lb). It takes two persons to safely lift this part or unit. (C009)

C013 CAUTION: The doors and covers to the product are to be closed at all times except for service by trained service personnel. All covers must be replaced and doors locked at the conclusion of the service operation. (C013)

C014 CAUTION: The system contains circuit cards, assemblies, or both that contain lead solder. To avoid the release of lead (Pb) into the environment, do not burn. Discard the circuit card as instructed by local regulations. (C014)

C018 CAUTION: This product is equipped with a 3-wire (two conductors and ground) power cable and plug. Use this power cable with a properly grounded electrical outlet to avoid electrical shock. (C018)

19 C021 CAUTION: The power distribution outlets proide 200 to 240 V ac. Use these outlets only for deices that operate within this oltage range. (C021) C022 CAUTION: The product might be equipped with a hard-wired power cable. Ensure that a licensed electrician performs the installation per the national electrical code. (C022) C023 CAUTION: Ensure the building power circuit breakers are turned off BEFORE you connect the power cord or cords to the building power. (C023) C026 CAUTION: This product might contain one or more of the following deices: CD-ROM drie, DVD-ROM drie, DVD-RAM drie, or laser module, which are Class 1 laser products. Note the following information: Do not remoe the coers. Remoing the coers of the laser product could result in exposure to hazardous laser radiation. There are no sericeable parts inside the deice. Use of the controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure. (C026) C027 CAUTION: Data processing enironments can contain equipment transmitting on system links with laser modules that operate at greater than Class 1 power leels. For this reason, neer look into the end of an optical fiber cable or open receptacle. (C027) C028 CAUTION: This product contains a Class 1M laser. Do not iew directly with optical instruments. (C028) C029 CAUTION: This product contains a Class 2 laser. Do not stare into the beam. (C029) Safety and enironmental notices xii

20 C030 CAUTION: Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following information: Laser radiation when open. Do not stare into the beam, do not iew directly with optical instruments, and aoid direct exposure to the beam. (C030) C031 CAUTION: The power-control button on the deice does not turn off the electrical current supplied to the deice. The deice might also hae more than one connection to dc power. To remoe all electrical current from the deice, ensure that all connections to dc power are disconnected at the dc power input terminals. (C031) C032 CAUTION: Sericing of this product or unit is to be performed by trained serice personnel only. (C032) C033 CAUTION: To reduce the risk of electric shock or energy hazards: This equipment must be installed by trained serice personnel in a restricted-access location, as defined by the NEC and IEC 60950, The Standard for Safety of Information Technology Equipment. Connect the equipment to a reliably grounded, safety extra low oltage (SELV) source. An SELV source is a secondary circuit that is designed so that normal and single fault conditions do not cause the oltages to exceed a safe leel (60 V direct current). The branch circuit oercurrent protection must be rated per the following table. Use copper wire conductor only, not exceeding 3 m (9.8 ft.) in length and sized according to the following table. Torque the wiring-terminal screws to the alues in the following table. Incorporate a readily aailable approed and rated disconnect deice in the field wiring. (C033) The following table appears in the product documentation with actual alues substituted for xxx: Circuit breaker rating Wire size Minimum: xxx amps Maximum: xxx amps xxx AWG xxx mm2 xiii IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

21 Wiring-terminal screw torque xxx inch-pounds xxx newton-meters R001 Part 1 of 2 Use the following general safety information for all rack-mounted deices: DANGER Obsere the following precautions when working on or around your IT rack system: Heay equipment personal injury or equipment damage might result if mishandled. Always lower the leeling pads on the rack cabinet. Always install stabilizer brackets on the rack cabinet. To aoid hazardous conditions due to uneen mechanical loading, always install the heaiest deices in the bottom of the rack cabinet. Always install serers and optional deices starting from the bottom of the rack cabinet. Rack-mounted deices are not to be used as sheles or work spaces. Do not place objects on top of rack-mounted deices. Each rack cabinet might hae more than one power cord. Be sure to disconnect all power cords in the rack cabinet when directed to disconnect power during sericing. Connect all deices installed in a rack cabinet to power deices installed in the same rack cabinet. Do not plug a power cord from a deice installed in one rack cabinet into a power deice installed in a different rack cabinet. An electrical outlet that is not correctly wired could place hazardous oltage on the metal parts of the system or the deices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to preent an electrical shock. (R001 part 1 of 2) Safety and enironmental notices xix

22 R001 Part 2 of 2 CAUTION: Do not install a unit in a rack where the internal rack ambient temperatures will exceed the manufacturer's recommended ambient temperature for all your rack-mounted deices. Do not install a unit in a rack where the air flow is compromised. Ensure that air flow is not blocked or reduced on any side, front, or back of a unit used for air flow through the unit. Consideration should be gien to the connection of the equipment to the supply circuit so that oerloading of the circuits does not compromise the supply wiring or oercurrent protection. To proide the correct power connection to a rack, refer to the rating labels located on the equipment in the rack to determine the total power requirement of the supply circuit. (For sliding drawers): Do not pull out or install any drawer or feature if the rack stabilizer brackets are not attached to the rack. Do not pull out more than one drawer at a time. The rack might become unstable if you pull out more than one drawer at a time. (For fixed drawers): This drawer is a fixed drawer and must not be moed for sericing unless specified by the manufacturer. Attempting to moe the drawer partially or completely out of the rack might cause the rack to become unstable or cause the drawer to fall out of the rack. (R001 part 2 of 2) xx IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

23 R002 CAUTION: Remoing components from the upper positions in the rack cabinet improes rack stability during relocation. Follow these general guidelines wheneer you relocate a populated rack cabinet within a room or building: Reduce the weight of the rack cabinet by remoing equipment starting at the top of the rack cabinet. When possible, restore the rack cabinet to the configuration of the rack cabinet as you receied it. If this configuration is not known, you must obsere the following precautions: Remoe all deices in the 32U position and aboe. Ensure that the heaiest deices are installed in the bottom of the rack cabinet. Ensure that there are no empty U-leels between deices installed in the rack cabinet below the 32U leel. If the rack cabinet you are relocating is part of a suite of rack cabinets, detach the rack cabinet from the suite. Inspect the route that you plan to take to eliminate potential hazards. Verify that the route that you choose can support the weight of the loaded rack cabinet. Refer to the documentation that comes with your rack cabinet for the weight of a loaded rack cabinet. Verify that all door openings are at least 760 x 230 mm (30 x 80 in.). Ensure that all deices, sheles, drawers, doors, and cables are secure. Ensure that the four leeling pads are raised to their highest position. Ensure that there is no stabilizer bracket installed on the rack cabinet during moement. Do not use a ramp inclined at more than 10 degrees. When the rack cabinet is in the new location, complete the following steps: Lower the four leeling pads. Install stabilizer brackets on the rack cabinet. If you remoed any deices from the rack cabinet, repopulate the rack cabinet from the lowest position to the highest position. If a long-distance relocation is required, restore the rack cabinet to the configuration of the rack cabinet as you receied it. Pack the rack cabinet in the original packaging material, or equialent. Also lower the leeling pads to raise the casters off of the pallet and bolt the rack cabinet to the pallet. (R002) Labels for the TS7650 Appliance and TS7650G (Gateway) The following safety labels apply to the TS7650 Appliance and TS7650G (Gateway). Safety and enironmental notices xxi

L001 DANGER: Hazardous voltage, current, or energy levels are present inside any component that has this label attached. Do not open any cover or barrier that contains this label. (L001)

L002 DANGER: Rack-mounted devices are not to be used as shelves or work spaces. (L002)

L003 DANGER: Multiple power cords. The product might be equipped with multiple power cords. To remove all hazardous voltages, disconnect all power cords. (L003)

L004 DANGER: Hazardous voltage present. Voltages present constitute a shock hazard, which can cause severe injury or death. (L004)

L005 CAUTION: Hazardous energy present (>240 VA). Voltages with hazardous energy might cause heating when shorted with metal, which might result in splattered metal, burns, or both. (L005)

L009 CAUTION: System or part is heavy. The label is accompanied by a specific weight range. (L009)

L013 DANGER: Heavy equipment: personal injury or equipment damage might result if mishandled. (L013)

Environmental notices

The environmental notices that apply to this product are provided in the Environmental Notices and User Guide, Z xx manual. A copy of this manual is located on the publications CD.

About this document

This document provides best practice guidance for planning, installing, and configuring the replication feature of IBM's TS7600 ProtecTIER family of products. This guide is designed to provide expertise gained from both IBM's ProtecTIER Field Technical Sales Support (FTSS/CSS) group and the Development/Quality Assurance teams. Topics that are covered include:
    Bandwidth requirements
    Network planning and configuration
    Deployment of new systems
    Adding replication to an existing production ProtecTIER system
    Backup application configuration
    Disaster recovery (DR) practices

As an adjunct to the core product documentation, as well as the TS7600 ProtecTIER training sessions and webinars, this document provides additional information and tips on topics outside the box that are required for a successful TS7600 deployment with Native Replication in various real user environments.

Who should read this document

This publication is intended for IBM FTSS/CSS personnel, as well as advanced ProtecTIER users, solution architects, and trained ProtecTIER specialists.

Terminology

TS7650 Appliance terminology

TS7650
When used alone, this term signifies IBM's family of virtualization solutions that operate on the ProtecTIER platform.

TS7650 Appliance or appliance
These are terms for IBM's self-contained virtualization solution from the TS7650 family that includes a disk storage repository. The TS7650 Appliance consists of the following:

Server
The 3958 AP1 server is based on the IBM System x3850 X5 Type 7145-AC1 at the ProtecTIER version 2.5 release. When used as a server in the TS7650 Appliance, its machine type and model are 3958 AP1. Use this machine type and model for service purposes.

System console
The system console is a TS3000 System Console (TSSC). This document uses the terms system console and TSSC interchangeably.

Disk controller
The disk controller for the TS7650 Appliance is IBM Feature Code 3708: 4.8 TB Fibre Channel Disk Controller. Use this feature code for service purposes.

Disk expansion unit
The disk expansion unit for the TS7650 Appliance is IBM Feature Code 3707: 4.8 TB Fibre Channel Disk Expansion Unit. Use this feature code for service purposes.

TS7650 Gateway terminology

TS7650G or Gateway
These are terms for IBM's virtualization solution from the TS7650 family that does not include a disk storage repository, allowing the customer to choose from a variety of storage options. IBM does not support more than one clustered pair of TS7650 Gateway servers in a single frame. The TS7650G consists of the following:

Server
Three types of server have been used in the Gateway:

3958 DD4
This is a newer, higher performance server, available in December. This server is based on the IBM System x3850 X5 Type 7145-AC1. When used as a server in the TS7650G, its machine type and model are 3958 DD4. Use this machine type and model for service purposes.

3958 DD3
This is a higher performance server, available in March. This server is based on the IBM System x3850 M2. When used as a server in the TS7650G, its machine type and model are 3958 DD3. Use this machine type and model for service purposes.

3958 DD1
This is the original server, available in August. This server is based on the IBM System x3850 M2. When used as a server in the TS7650G, its machine type and model are 3958 DD1. Use this machine type and model for service purposes.

System console
The system console is a TS3000 System Console (TSSC). This document uses the terms system console and TSSC interchangeably.

Under IBM best practices, the TS7650G also contains the following:

Disk controller
The customer must choose the disk controller for use with the TS7650G. A list of compatible controllers is located at the IBM Tape Systems Resource Library website: systems/storage/tape/library.html#compatibility, in the TS7650/TS7650G ISV and interoperability matrix document.

Disk expansion unit
The customer must choose the disk expansion unit for use with the TS7650G. A list of compatible expansion units is located at the IBM Tape Systems Resource Library website: systems/storage/tape/library.html#compatibility, in the TS7650/TS7650G ISV and interoperability matrix document.

Native Replication terminology

OpenStorage
OpenStorage allows ProtecTIER to be integrated with NetBackup to provide the means for backup-to-disk without using virtual tape library (VTL) emulation. Using a plug-in that is installed on an OpenStorage-enabled media server, ProtecTIER can implement a communication protocol that supports data transfer and control between the backup server and the ProtecTIER server. Therefore, to support the plug-in, ProtecTIER implements a storage server emulation.

replication
A process that transfers logical objects like cartridges from one ProtecTIER repository to another. The replication function allows ProtecTIER deployment to be distributed across sites. Each site has a single or clustered ProtecTIER environment. Each ProtecTIER environment has at least one ProtecTIER server. A ProtecTIER server that is part of a replication grid has two dedicated replication ports that are used for replication. Replication ports are connected to the customer's WAN and are configured on two subnets by default.

replication grid
A set of repositories that share a common ID and can potentially transmit and receive logical objects through replication. A replication grid defines a set of ProtecTIER repositories and actions between them and is configured using the ProtecTIER Replication Manager. The ProtecTIER Replication Manager is a software component that is installed on a ProtecTIER server or a dedicated host. The ProtecTIER Replication Manager should be able to recognize all the members of the entire network that it handles on both replication subnets. The ProtecTIER Replication Manager is deployed separately from the ProtecTIER Manager on the customer's ProtecTIER server. The ProtecTIER Replication Manager manages the configuration of multiple replication grids in an organization. An agent on every node in each ProtecTIER server interacts with the server and maintains a table of its grid members.

replication grid ID
A number from 0 to 63 that identifies a replication grid within an organization.

replication grid member
A repository that is a member in a replication grid.

replication pairs
Two repositories within a replication grid that replicate from one to another.

replication policy
A policy made up of rules that define a set of objects (for example, VTL cartridges) from a source repository to be replicated to a target repository.

repository unique ID (RID)
A number that uniquely identifies the repository. The RID is created from the replication grid ID and the repository internal ID in the grid.

replication timeframe
A scheduled period of time for replication to take place for all policies.

shelf
A container of VTL cartridges within a ProtecTIER repository.

virtual tape library (VTL)
The ProtecTIER virtual tape library (VTL) service emulates traditional tape libraries. By emulating tape libraries, ProtecTIER VTL enables you to transition to disk backup without having to replace your entire backup environment. Your existing backup application can access virtual robots to move virtual cartridges between virtual slots and drives. The backup application perceives that the data is being stored on cartridges while ProtecTIER actually stores data on a deduplicated disk repository.

visibility switching
The automated process that transfers the visibility of a VTL cartridge from its master to its replica and vice versa. The visibility switching process is triggered by moving a cartridge to the source library Import/Export (I/E) slot. The cartridge then disappears from the I/E slot and appears at the destination library's I/E slot. To move the cartridge back to the source library, the cartridge must be ejected to the shelf from the destination library. The cartridge then disappears from the destination library and reappears at the source I/E slot.

Related publications

The following documents provide information about the TS7650 Appliance and TS7650G (Gateway) components and related hardware.

Publications common to both the TS7650 Appliance and TS7650G
    IBM System Storage ProtecTIER User's Guide for Enterprise Edition and Appliance Edition, IBM form number GC
    IBM System Storage TS7600 with ProtecTIER Problem Determination Guide for the TS7650 Appliance and TS7650G (Gateway), IBM form number GC
    IBM System Storage ProtecTIER Software Upgrade and Replication Enablement Guide, IBM form number GC
    IBM System Storage TS7600 with ProtecTIER Installation Instructions for the RAS Package, BIOS, and Firmware updates following a FRU replacement for models 3958 DD1, 3958 DD3, and 3958 AP1, IBM part number 46X6059
    IBM System Storage TS7600 with ProtecTIER Labeling Instructions for the TS7650/TS7650G (3958 DD1, 3958 DD3, 3958 AP1), IBM part number 46X6059

TS7650 Appliance-only publications
    IBM System Storage TS7600 with ProtecTIER Introduction and Planning Guide for the TS7650 (3958 AP1), IBM form number GC
    IBM System Storage TS7600 with ProtecTIER Installation Roadmap Guide for the TS7650 (3958 AP1), IBM form number GC

TS7650G-only publications
    IBM System Storage TS7600 with ProtecTIER Introduction and Planning Guide for the TS7650G (3958 DD3), IBM form number GC
    IBM System Storage TS7600 with ProtecTIER Installation Roadmap Guide for the TS7650G (3958 DD3), IBM form number GC

Appliance Server (3958 AP1) publications
    IBM System x3850 M2 and System x3950 M2 Type 7141 and 7233 User's Guide

Remote Supervisor Adapter publications
    Remote Supervisor Adapter II Slimline and Remote Supervisor Adapter II Installation Guide
    Remote Supervisor Adapter II Slimline and Remote Supervisor Adapter II User's Guide


Part 1. ProtecTIER 2.5, Native Replication Best Practices


Chapter 1. Replication operational overview

ProtecTIER with replication enables virtual tape cartridges to be replicated from multiple primary sites (spokes) to a central secondary location (hub) for enhanced disaster recovery (DR) and business continuity (BC) capabilities. By eliminating the need to transport physical tape cartridges, data can be recovered faster and more reliably, enabling users to get back online more rapidly in the event of a disaster or major system outage.

The dramatic reduction in the required network bandwidth between the primary and secondary sites enabled by ProtecTIER's deduplication technology radically reduces the costs associated with electronically transmitting data to a DR location. By dramatically reducing costs, ProtecTIER with replication enables IT organizations to easily expand the coverage of replication to all of the applications in their environment, as opposed to deploying replication for only a select few applications and tolerating significant risk of data loss and slow recovery for most other applications.

Figure 1 graphically demonstrates the prohibitive costs of traditional replication, and contrasts this with the increased level of DR protection enabled by ProtecTIER. This figure represents a generic IT environment in which the cost to protect 30% of the data with traditional replication is equal to the cost of covering 100% of the data with ProtecTIER replication.

Figure 1. Relationship between Cost and % of Environment Protected (cost of DR operation plotted against the percentage of the environment protected by replication for DR)

Product functionality

TS7650 ProtecTIER Replication is a logical feature that enables users to replicate cartridges or tapes from up to 12 primary sites (spokes) to a DR location (hub). The model for this multi-site operation is similar in nature to the previous version (V2.3) one-to-one native replication; in other words, it behaves like replication pairs in one replication grid. The user may create policies at the cartridge level to replicate a single barcode or a range of barcodes in a certain time frame and with a set priority. Once replicated to the DR site, users may choose to clone these cartridges to real physical tape using their backup application. In the event of a disaster, the DR site's TS7650 ProtecTIER can become the production site until the failed primary site comes back online. At that point the user may replicate or move the newly created tapes back to the respective main production site. In case of a permanent failure at any of the protected primary sites (spokes), the DR site (hub) can take over and permanently replace that failed repository.

How it works

Replication is an additional feature built into the VTL, so users can pick and choose some or all of their cartridges to be replicated to the DR site. Since ProtecTIER deduplicates data before storing it, only the changes, or unique data, are transferred to the DR site over the replication link. This translates into substantial savings in the bandwidth needed for the replication link. Data transfer is started based on several trigger points, such as policy-based transfer windows or movement of the virtual tape to a VTL export slot (the VTL emulates import and export slots, as well as opening and closing the library door to insert or eject a cartridge). Data verification and validation is done at the DR site to ensure integrity of the transferred data prior to making the virtual cartridge/tape available. Figure 2 shows an example of a deployment for two systems using replication.

Figure 2. Potential deployment of hub and spoke replication configuration (backup servers at the primary site write to a ProtecTIER gateway, which replicates over an IP-based native replication link to a clustered ProtecTIER gateway at the secondary (DR) site; virtual cartridges can be cloned to physical tape by the DR backup server, and in case of a disaster the DR site backup server can become the primary)

HyperFactor, deduplication and bandwidth savings

The cornerstone of ProtecTIER is HyperFactor, IBM's technology that deduplicates data inline as it is received from the backup application. ProtecTIER's bandwidth-efficient replication, inline performance, and scalability directly stem from the technological breakthroughs inherent to HyperFactor. HyperFactor is based on a series of algorithms that identify and filter out the elements of a data stream that have previously been stored by ProtecTIER. Over time, HyperFactor can increase the usable capacity of a given amount of physical storage by up to 25 times or more. With replication, the data reduction value of HyperFactor is extended to bandwidth savings and storage savings for the DR operation. These performance and scalability attributes are critical for the DR operation in addition to the primary site data protection operation.
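To see what this data reduction means for the replication link, a small worked example helps. The sketch below is illustrative only: the daily backup volume, HyperFactor factoring ratio, and replication window are assumed example values rather than measurements from any particular environment, and the bandwidth sizing methodology in Chapter 5 should be used for real planning.

```python
# Illustrative estimate of replication bandwidth with and without deduplication.
# All inputs are assumed example values; use the planning guidance in Chapter 5
# for real sizing.

nominal_backup_tb_per_day = 10.0   # daily backup volume written by the backup application (assumed)
dedup_ratio = 10.0                 # assumed HyperFactor factoring ratio (10:1)
replication_window_hours = 16      # assumed nightly replication time frame

# Only new/unique data crosses the wire when deduplication is in effect.
physical_tb_per_day = nominal_backup_tb_per_day / dedup_ratio

def required_mbps(tb_per_day: float, window_hours: float) -> float:
    """Average link rate (megabits per second) needed to move tb_per_day within window_hours."""
    bits = tb_per_day * 1e12 * 8
    return bits / (window_hours * 3600) / 1e6

print(f"Without deduplication: {required_mbps(nominal_backup_tb_per_day, replication_window_hours):.0f} Mbps")
print(f"With deduplication:    {required_mbps(physical_tb_per_day, replication_window_hours):.0f} Mbps")
```

With these assumed numbers the required average link rate drops by the same factor as the deduplication ratio, which is the effect the figure above illustrates at the cost level.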

Replication features

ProtecTIER's native replication is designed to provide the user with great flexibility, to work seamlessly with any of the leading backup applications, and to fit easily within the overall tape operations paradigm. Replication is policy based, allowing users to define several different policies for replicating cartridges from one system to another. The granularity of the replication policies allows users to set policies for an individual cartridge, a pool of cartridges, or an entire virtual tape library. Replication is performed asynchronously at the logical cartridge level, and replication progress can be tracked and monitored at the cartridge level through the ProtecTIER management GUI. Full data validation is performed at the target (hub) site to ensure enterprise-class integrity of the transferred data prior to making the virtual cartridge available.

A few of the key features and design points that demonstrate the flexibility and synergy with the tape operations paradigm include:

Virtual tape management framework
As a target of the backup application, a ProtecTIER system presents itself as a tape library (or many libraries) to the network. The backup application manages the cartridges within a ProtecTIER system as if they were real cartridges, including read, write, import/export, tracking media with barcodes, and many other operations. Because replication at the ProtecTIER level is transparent to the backup application, ProtecTIER's replication function is designed to allow synchronization with the backup application by way of normal tape management methodologies.

Visibility control
This ProtecTIER feature dictates where cartridges actually exist, since from a backup application standpoint, any specific cartridge can exist in only one location at any given time. ProtecTIER replication utilizes a virtual shelf that is visible only at the ProtecTIER level (once a cartridge is exported by the backup application), and provides more flexibility in managing cartridges and where they are kept, similar to keeping physical tapes on an actual shelf outside of the tape library. To ensure that a given cartridge is visible to the backup application in only one location, despite the fact that a replica exists, ProtecTIER offers the visibility control function that allows the user to determine in which location the cartridge should be accessible to the backup application. This is achieved by utilizing the import/export slots of the virtual libraries and exactly mimics the operation of physical tape management.

Cartridge cloning and visibility control
A common use case for ProtecTIER users is to first replicate data from the primary site to the secondary site, and then to move the data from the disk-based repository onto physical tape cartridges for long-term retention. With the visibility control function, ProtecTIER makes this operation simple. Once the cartridges complete replication to the secondary (DR) site, users may clone these cartridges to physical tape using their backup application tape copy function. This allows the backup application to remain in control of the end-to-end process and maintain its catalog of all cartridges, the associated data, and the location.

Policy management
Replication policies allow for flexibility in controlling replication and enable a high degree of automation. Policies consist of rules that dictate the transfer of data from one repository to another, based on events such as writing new data to an existing cartridge that belongs to a policy. By setting policies for ranges of barcodes, users may implement differing degrees of protection for different applications. Users can also assign various priority levels to replication policies, which determine the order in which the data is transferred across the network.

Replication-network management
ProtecTIER repositories belong to a replication grid, which is a framework for managing and monitoring the replication activities between ProtecTIER systems. Through ProtecTIER Manager, the user can monitor the status of the overall replication network (grid), the relationship between grid members, and the data throughput rate of replication. Further statistics on the cartridges involved in replication policies, as well as the statistics of each repository, are available through the ProtecTIER GUI and through the extensive ptcli command interface to enable ease of management and use.

Recovery management
When the need to fail over to the DR site arises, whether due to a full disaster or a lower level disruption, the ProtecTIER system enables rapid recovery of the data and restoration of the production applications such that business operations can continue with minimal downtime. ProtecTIER is designed to enable rapid recovery of data from cartridges using the media server at the DR site. Once data is restored, the ProtecTIER system can be brought online as the production system and be used for backup and recovery until the failed primary site is restored. In multiple-site configurations, during that time the DR system (hub) continues to be the target for all other spokes and for a local backup operation, if one exists. In configurations with one replication pair (one-to-one), during the time period that the disaster recovery site acts as the primary production site, its data can be protected by replicating it to another secondary site. ProtecTIER Manager should be used to pair the DR site (now acting as primary) with the other secondary site. (See more details in Chapter 8 of the IBM System Storage ProtecTIER User's Guide for Enterprise Edition and Appliance Edition.) In all configurations (one-to-one or many-to-one), when the original primary site is ready and back online, the user performs a failback operation to move production activity back to the primary site. At that time the user may replicate any new data at the secondary site back to the primary and return the primary to its original status. All of these steps are enabled through the user interface.
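Because later chapters build on the visibility rule described above, a small conceptual sketch can make it concrete: a cartridge is visible to the backup application in exactly one library at a time, and ejecting it to the import/export slot switches that visibility to the other site while the hidden copy rests on the virtual shelf. This is a toy model only, assuming a simple two-site pair; the class and method names are invented for illustration and do not represent ProtecTIER code or its API.

```python
# Conceptual model of ProtecTIER visibility switching (illustration only,
# not the product's actual implementation or API).

class Cartridge:
    def __init__(self, barcode: str):
        self.barcode = barcode
        # Once replicated, the cartridge has data at both sites, but it is
        # *visible* to the backup application at only one of them at a time.
        self.locations = {"primary": "library", "dr": "shelf"}

    def visible_site(self) -> str:
        return next(site for site, place in self.locations.items() if place == "library")

    def export_from(self, site: str) -> None:
        """Backup application ejects the cartridge to the I/E slot: visibility moves to the other site."""
        if self.locations[site] != "library":
            raise ValueError(f"{self.barcode} is not visible at {site}")
        other = "dr" if site == "primary" else "primary"
        self.locations[site] = "shelf"      # source copy goes to the virtual shelf
        self.locations[other] = "library"   # replica becomes visible at the destination library

tape = Cartridge("PT0001")
print(tape.visible_site())   # primary
tape.export_from("primary")  # visibility switches to the DR site
print(tape.visible_site())   # dr
```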


Chapter 2. Native replication normal operation concepts

This chapter describes the ProtecTIER operation sequence when replication is invoked.

Replication

Whether deploying replication in a one-to-one configuration or a many-to-one configuration, it is strongly recommended to use the latest 2.5 code in order to take advantage of the many new replication features and enhancements.

When the backup application is writing to a cartridge that is part of a replication policy, ProtecTIER checks when it needs to be replicated and what its priority is, so that it can be placed in the right order in the replication queue. Cartridges are always replicated from the local library at the primary site to the virtual shelf of the repository at the secondary (DR) site. By definition, cartridges that are created in the primary (local) site repository are set to be read-write enabled, so the backup application at the primary site has full control of them and their content. By the same token, cartridges that were replicated to the secondary (DR) site are set to read-only mode. By default, only one cartridge instance is located in a library; the replica is located on the virtual shelf.

Note: At any time, through ProtecTIER Manager, the user can override the default location of any given cartridge and manually move the replica from the virtual shelf to a library in the secondary site repository.

The ProtecTIER system uses a dirty-bit mechanism, and cartridges are marked as in-sync once the data finishes replicating from the primary to the secondary site, so that at the time of sync the local cartridges and their DR site replicas are identical. Up to 128 cartridges can be replicated simultaneously. Before replication, the system ensures that only unique, new data will be transferred over the wire. To achieve that, each side holds sync data for each of its cartridges. This sync data is used by the destination (secondary, DR site) to figure out which data (if any) should be replicated, as only new and unique data is sent over the wire to the DR site. The replication mechanism has two types of data to transfer: metadata, which describes the actual data and carries all the information about it, and user data, which is the actual backed-up data.

Note: Network failures, if and when they occur while the replication operation is being carried out, lead to retries for up to 7 consecutive days to replicate any specific cartridge that did not finish its replication due to the line failure.

Many-to-one replication new and enhanced features

ProtecTIER many-to-one replication supports both single node and clustered ProtecTIER configurations on all platforms, Appliance and Gateway. In addition to the capabilities available in V2.3 one-to-one replication, it provides the following functionality and support:

Note: The following features are described in depth in their respective sections throughout this document.

Setting replication performance limits: This feature allows the user to set system-wide physical and/or nominal limits in order to indicate to the replication engine the maximum replication transfer rate allowed in the network for a specific repository. The performance limits refer to the overall resource consumption of the system, and are reflected in the network transfer rate.

Enhancements to the Replication Rate Control mechanism: This feature is an enhancement to the current mechanism. The current Replication Rate Control (RRC) is used when a user does not provide a time frame and therefore the system replicates continuously. The enhancement is in the rate calculation. The rate calculation uses the performance limits described previously to understand the maximum rate possible at both levels of system usage (IDLE, BUSY) and then normalizes the rate accordingly.

Reserving local-backup-only space for the hub repository: This feature provides the ability to exclusively assign a fragment of a hub repository's capacity for local backups. In large deployments with many spokes replicating, capacity management done by the user might cause a situation where replication tries to occupy all the space in the hub repository. Since the assumption is that backup has precedence over replication, this feature was added to ensure that capacity is reserved only for local backup, so that replication cannot be written to this storage fragment. Error notifications appear in the event that the capacity reserved for the local backup, or the capacity reserved for replication on the hub repository, is running out of space.

Enhanced monitoring of repository space consumption: This enhancement is displayed in the GUI as a nominal data pie, where a user can see at a glance the proportion of the nominal data out of the repository and the internal capacity distribution of replication data vs. local backup data and free space.

Additional replication information in the repository's cartridge view: This feature adds cartridge replication properties, such as last update time, destination, and so on, to a new replication view that adds replication properties to the cartridges that are also displayed in the regular cartridge view. This feature allows the user to see replication properties for a batch of cartridges, each in a single row. Sorting capability for the new replication fields was added, to stay on par with the sorting capabilities for the rest of the cartridge fields.

Disaster Recovery site replaces production site operation (take-over): This feature supports a scenario where a disaster recovery site (the hub) is chosen to replace one of its spokes permanently. The feature allows the user to take over responsibility for the cartridges of the old spoke after running the replace repository wizard from the Replication Manager.

Enhanced time frame support, weekly scheduler: This feature enhances the replication time frame capability to be a weekly scheduler at the repository level. The time frame is a set of 30-minute time intervals across a full week. The user chooses at which time intervals the replication should run during a single week.

Enhanced Command Line Interface (ptcli) for Disaster Recovery: This feature adds a Command Line Interface (CLI) that allows a user to run complex queries on cartridges and provides a cartridge batch move operation during a manual DR procedure. The feature allows the user to query ProtecTIER in order to obtain a list of cartridges that fall under specific criteria.
The query criteria are cartridge replication properties and cartridge location properties.

Replication data-transfer

The replication is started either manually or based on a policy, and carries out data transfers in a variety of ways.

When the replication action is started, either manually or based on a policy, the source (primary) ProtecTIER system carries out the following procedures:
- Initiates the sync-cartridge function between its own (source, primary site) and the destination (DR site) repositories.
- Reads the unique Replication Data Units upon requests from the DR site ProtecTIER system, based on what it is missing.
- Sends the unique data, using the TCP protocol, over the WAN to the DR site.

At the same time, the destination ProtecTIER system performs the following handshake actions in this order:
- Calculates the relevant cartridges' sync point, from which the replication should start.
- Receives many Data Units concurrently as part of the replication action.
- Verifies the CRC for all replicated data before it becomes available as part of the cartridge.
- Once the CRC check proves successful, the system moves each of the verified data elements into the cartridge scope and makes it available for the user.

Once all verified pieces are inserted into the cartridge it becomes available as a complete cartridge; however, as a replica in the destination (DR) repository it is set as read-only and cannot be used for backup purposes. This is an important factor under failover situations, when the DR system may temporarily become the production site and can accept local backups for the failed system. At that time the user should create new tapes to accept this local backed-up data. These new local DR site tapes can be replicated to the primary site once it becomes available and ready for production again. During the failback process, which is when the user moves operations back to the primary site, the newly created cartridges from the DR site can be replicated to the primary site. Under these circumstances the system will grant read and write permissions to these replicated cartridges at the primary site, which becomes the owner of these tapes from that point on, just as if they were created there.

Note: The replication data-transfer process requires that the replicated cartridge reside on the DR site ProtecTIER VTL shelf, and as long as the data transfer is being carried out these cartridges cannot be moved from the shelf.

Visibility switch control

Visibility switch control is an automated process of the ProtecTIER replication feature that transfers the visibility of a cartridge from its master to its replica and vice versa. Just like a real physical tape, a virtual cartridge can only reside in, or be visible to the backup application in, one location at a time. This is carried out by ProtecTIER replication by utilizing the VTL Import/Export slots. The system uses the Export slots and the eject operation as triggers to begin processing a tape move. The backup application can eject the cartridge into one of the Export slots. As soon as that cartridge's replication action is completed, the cartridge will appear at one of the Import slots of the secondary (hub, DR) site and can be imported into a library.

Replication visibility is an attribute defined in a replication policy and can be set by the user. A specifically defined library in the DR site repository must be selected for the visibility transfer to be carried out. Once the cartridge is ejected at the primary (local) site, it moves to the local virtual shelf. Then it is replicated as part of a policy to the DR site virtual shelf.

Once the replication is completed and verified, the replica moves to the respective library's Import slot and can be imported into that library. At this point that cartridge is only visible to the backup application at the DR site. A copy of the cartridge stays at the primary ProtecTIER system for fast recovery; however, it is hidden from the backup application. The way to move the cartridge back and make it visible to the backup application at the primary site is to eject it from the DR site library and import it back into the local one (the same library it came from). Since the system keeps a hidden copy at the primary site, this move back is instantaneous.

Backup application catalog

The backup application catalog provides catalog/DB entries for the cartridges that are used for backup. The backup application catalog/DB has an entry for each cartridge used for backup. These entries include:
- Date when the backup was performed
- List of files associated with the backup
- Retention period
- Other backup application specific information

The backup application supports one catalog/DB per backup server instance. In many cases the primary and DR sites will have two separate backup servers, each with its own DB or catalog. To efficiently read replicated cartridges at the DR site, the DR site backup server needs access to the actual catalog or DB of the primary backup server, or an exact copy of it. There are two basic backup environment topologies:
- A single domain backup environment shares the same catalog across the primary and DR sites; therefore its entries are visible to both servers at all times. This approach will work with backup applications such as Symantec NetBackup, but will not work for IBM Tivoli Storage Manager.
- A multiple domain environment requires the user to recover the DR backup server using a copy of the catalog/DB that matches the replicated repository cartridge set, prior to restoring the replicated data from the ProtecTIER system at the DR site.

Remote Cloning is the process of using a secondary (DR) site in order to clone cartridges. ProtecTIER replication enables users to offload tape cloning to their secondary site. Many users replicate their data from the primary site to the secondary (DR) site, and then move it from the disk-based repository onto physical tape cartridges for long-term retention. One of the advantages of performing this practice at the secondary site is to take the burden of cloning to physical tape off the production environment and move it to the DR site location, which is where tapes will most likely be kept once cloned. The DR site cloning operation uses the cartridge replicas residing on the destination ProtecTIER VTL shelf for cloning. The process imitates the commonly used physical process involving the transportation of physical cartridges from the primary site to a DR site, either after being cloned or in order to clone them there for long-term archival purposes. This feature is effective in single domain backup deployments because in these environments the backup application servers at both sites share the same catalog and can be connected to the ProtecTIER systems concurrently.

Within these environments the replication visibility switch control feature should be utilized, and the cartridges to be cloned should be moved from the primary repository to the secondary repository and then be cloned to physical tapes. Since in a single domain environment the catalog is shared across the two sites, the backup application is fully aware of the whereabouts and status of the cloned cartridges. See a detailed discussion on this topic in Chapter 18, Deploying replication with specific backup applications, on page 187.

Single-domain and multiple-domain backup application environments

A backup environment can be set up in either a single domain or multiple domains.

About this task

Any backup environment can be set up in many topologies. From ProtecTIER's replication standpoint there are two general setup methods for these environments: single domain or multiple domains.

In a single domain environment the same backup application catalog (or database) is shared across the separate primary and secondary sites. In these environments the catalog is always updated in real time on the whereabouts of the cartridges (physical and virtual). For example, this type of environment is more commonly used with Symantec NetBackup (NBU); however, it will not work with most deployments of IBM Tivoli Storage Manager (TSM).

A multiple domain approach is more widely used; it is the concept where the backup application does not share the same catalog between the primary (local) and secondary (DR) sites. This is the most common scenario with TSM environments. In this type of environment each of the backup servers in both the primary and secondary (DR) locations will have its own backup catalog. The following characteristics are typical of a multiple domain environment:
- The backup server at the primary site manages the backup activity into the local ProtecTIER libraries.
- An independent local backup server at the secondary, remote location will be used in case of a disaster (when the primary site is lost) to recover the replicated cartridges and potentially continue, temporarily, the production backup activity.

In this type of environment the general steps to recover from a disaster will include:

Procedure
1. The secondary/DR (hub) site backup server should be recovered first with the catalog and/or DB from the primary site.
2. Once up and running, this backup server should be used to restore data from replicated cartridges. It may also resume regular backup and restore production activity using new virtual cartridges.
3. During the time frame in which the production activity is conducted at the (DR) site, new backups will register with the backup catalog and/or DB.
4. Once the primary site system is recovered and is ready to become the production system again, the failback operation should be invoked. This will replicate back all needed data (including newly created cartridges) to the primary location, allowing for synchronization between the two ProtecTIER repositories. It is important to synchronize the backup application catalog during this failback action to complete the recovery operation.

For more information, see Chapter 18, Deploying replication with specific backup applications, on page 187.

Consideration for DR site tape cloning in a single domain environment

The majority of restore requests use the most recent backup tapes.

About this task

More than 90% of restore requests use the most recent backup tapes. The best practice for users who will use the DR site physical tape cloning option in their operation is to follow this procedure:

Procedure
1. Back up to the local, primary site ProtecTIER system.
2. Replicate to the secondary (DR) site ProtecTIER while leaving the cartridge visibility at the primary site.
3. A few days after replication has completed, change the visibility of the relevant cartridges from the primary (local) site to the secondary (DR) site.
4. Vault/clone the DR ProtecTIER tape copy to physical tape.
5. Change the visibility of these cartridges back to the local/primary ProtecTIER.

This practice improves the probability of a virtual tape being available at the primary site when it is most likely to be used in support of a tactical restore at the primary location.

Policies

A replication policy defines a set of cartridges from a local repository that need to be replicated to a DR site repository, and establishes the rules for when and how that replication will take place. When an event matching a replication policy rule occurs, such as data on a cartridge changing, or the user ejecting a cartridge into an I/E slot, a trigger is created for replication activity to take place.

Replication policies are defined via the Systems Management view of ProtecTIER Manager. A policy can only be created on a repository that is the source in a replication grid, and the policy only applies to the repository on which it is defined. When creating a replication policy, the following parameters are defined:
- User object, such as cartridges.
- Daily/weekly time frame in which the policy will take place (the replication window).
- Replication destination, with visibility change (a specific library) or without visibility change (the virtual shelf) at the DR location.
- The priority: each policy can be set with one of 3 priority levels - low, medium and high. Users can set one policy with low priority and another with medium or high. The system default is low.

By default, the destination of the replication is the virtual shelf at the DR repository. If the visibility control switch is enabled, the destination is defined as a specified target library. This means that if you eject a cartridge that belongs to a policy with the visibility switch enabled, the cartridge is moved to the shelf of the local repository and, at the destination repository, once replicated, the cartridge is placed in the import/export slots of the specified destination library.

The user should create a single policy, or a small number of them, for any single library for ease of management. It may be easier to manage one policy per library rather than many policies, to reduce the maintenance burden and to prevent human errors.

Enabling and disabling a policy

A successfully created policy is enabled by default. This means that all incoming replication events will apply their rules to the policy's definition. The user may choose to disable a policy at any time. When a policy is disabled, all incoming replication events will ignore the policy from the moment it is disabled. This does not affect currently running and pending activities. Any disabled policy can be enabled from within ProtecTIER Manager at any time as well. This does not affect currently running activities.

Running a policy

Policies can be run either manually or automatically (that is, during the time frames set). In the automatic mode, whenever replication events are received, policies run continuously, whether cartridges are being written to by the backup application, ejected from a library (if visibility switching is activated), or unloaded from a drive. These policies start the actual replication activity during the time frame set by the user when created. Manually run policies create replication jobs for all the valid cartridges included in their list, whether or not they need to be replicated. With both modes of operation, running a policy leads to lining up replication jobs in their respective priority queues, where they wait for resources and the replication time frame to start replicating.

Policies should be created and executed in line with the performance capacity of the ProtecTIER system, keeping in mind the bandwidth capacity available between the two sites. This concept is covered at length further in this document. In line with this principle, it is typically not recommended to define a single policy with thousands or even hundreds of cartridges and execute it in manual mode, as this may create a large number of events and replication triggers that can potentially overload the system. Instead, users should use manual cartridge replication with multiple-selection of the needed cartridges. The following example demonstrates the difference between the two approaches: a repository has 500,000 cartridges, with a single replication policy defined that includes all cartridges. In this case, with manual policy execution, the user will replicate all cartridges, including the empty cartridges' metadata. However, if manual cartridge replication with multiple-selection is used, specific cartridges will be selected by the user and only those cartridges will be replicated.

System characteristics

The shelf can be used without enabling the replication capability, and the Replication Manager is a software module that manages the replication grid's configuration in the user's organization.

Shelf

The ProtecTIER replication feature introduces the concept of a virtual shelf. As a general rule, replication always occurs from the source shelf to a destination shelf. As with physical tape shelves, there is a limit to the number of tapes that can be put on a virtual shelf. The limit is the result of subtracting the total number of tapes in all libraries from the total number of tapes supported in a single repository. For example, assuming a repository with three libraries, each containing 50,000 cartridges, the number of cartridges that can be put on the virtual shelf is: 512,000 - 150,000 = 362,000 cartridges. Take care when planning a repository to ensure that its shelf and the sum of tapes in all of its libraries do not exceed this maximum limit per repository: currently 512,000 for the TS7650G (Gateway), and 128,000 for the TS7650 Appliance.

The shelf can be used without enabling the replication capability. In this case every eject or export operation will send the cartridge(s) to the virtual shelf. Again, the total number of cartridges cannot exceed the maximum number defined during the installation. For the calculation of this limit, all cartridges are taken into account, regardless of where they reside, in a library or on the shelf: shelf cartridges plus all library cartridges from all libraries equals the maximum number of cartridges for the entire repository. For example, assuming a repository with three libraries, if each library contains up to 10,000 cartridges, and the user wants to be able to put 30,000 cartridges on the shelf, then they need to plan the entire repository to have 30,000 + 30,000 = 60,000 cartridges. This fact should be used to plan the repository and the volume of backups and replication.
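As a quick sanity check of this rule, the shelf headroom for the first example above can be computed directly. This is only a sketch in shell arithmetic; the 512,000 limit and the three 50,000-cartridge libraries are the figures from the example above, not a sizing recommendation:

   # Maximum cartridges on the virtual shelf =
   #   repository cartridge limit - cartridges already assigned to libraries
   REPO_LIMIT=512000                  # TS7650G (Gateway) repository limit
   LIBRARY_CARTS=$(( 3 * 50000 ))     # three libraries of 50,000 cartridges
   echo $(( REPO_LIMIT - LIBRARY_CARTS ))    # prints 362000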

Replication manager

Figure 3. Replication Grid view of all active repositories

The Replication Manager is a software module that manages the replication grid's configuration in the user's organization. The Replication Manager module will most likely reside on one of the active ProtecTIER servers in the replication grid. ProtecTIER Manager connects to the Replication Manager using the IP address of the Replication Manager node. The Replication Manager module manages the repositories in the replication grid:
- Maintaining the IP addresses of all repositories.
- Updating repositories leaving and joining the grid.
- High-level monitoring and statistics of traffic in the replication grids.

Only a single Replication Manager instance within an organization is supported. Any single Replication Manager can manage up to 8 hubs with up to twelve spokes each. Multiple instances of Replication Manager for multiple standalone networks (their repositories can never be connected to repositories in other networks) may be created, in a single-instance-per-standalone-network fashion. For the Replication Manager module to be able to manage a grid with multiple repositories, it needs to have access to, and communicate with, them all on its network.

Once a single Replication Manager instance or a number of independent Replication Managers are set up, a repository cannot be moved from one Replication Manager instance to another. Once a repository has joined a Replication Grid Manager it will always belong to that Replication Manager, even if it is removed (or deleted) from the grid. Any Replication Manager software module running on a ProtecTIER node is limited to a single grid with up to 16 active repositories (8 replication pairs) on that replication grid. For managing more than one grid, the Replication Manager software needs to be installed on a separate, dedicated server, through an RPQ approval process.

Figure 4. Replication Manager managing a replication grid with two topology groups of repositories

Figure 5 on page 19 shows in more detail a typical replication network deployment topology for a replication pair. In this example the replication pair consists of a two-node cluster in each location. The diagram shows how to utilize the two available network ports that each node has, and connect them to two separate subnets for redundancy.

Figure 5. Replication grid topology example: a replication pair (one hub and one spoke) with two-node-cluster systems

Chapter 3. OpenStorage (OST) network configuration

This chapter describes OpenStorage (OST) network configuration best practices to support different IP configurations. It provides the scope and objectives, along with the basic acronyms that are used in the subsequent sections. The goal is to describe the networking technologies that are used in order to set up the connectivity of the ProtecTIER front end interfaces to the network. This chapter describes the technologies in use, such as teaming or bonding on the network servers and 802.3ad Link Aggregation and similar technologies on the LAN switches, along with network topologies for stand-alone and redundant architectures. It covers the technologies, topologies and setups for achieving the best performance and survivability over the network. In an OpenStorage environment, the ProtecTIER server connects to the network with multiple Ethernet adapters, and supports a total throughput of hundreds of megabytes per second over the network per node.

Scope and objectives

The purpose of this chapter is to establish a connectivity method for connecting the ProtecTIER server front end interfaces to the network, and achieving the optimal performance using it. In an OpenStorage environment, the ProtecTIER server connects to the network with multiple Ethernet adapters, and supports a total throughput of hundreds of megabytes per second over the network per node. The next sections will focus on the following issues:
- Network connectivity and load balancing method of the ProtecTIER server.
- Network connectivity and load balancing method of the hosts connected to the network.
- Network configuration and load balancing method of the LAN switches to which the ProtecTIER server and the hosts are connected.

Definitions and Acronyms

This section describes definitions and acronyms that are relevant to OST network configuration.

Link Aggregation - A method for grouping several interfaces into a single virtual interface, for the purpose of load sharing between the interfaces.
IEEE 802.3ad - The IEEE standard for Link Aggregation for LAN connectivity.
Gigabit Ethernet - Ethernet that runs at a gigabit per second bandwidth.
VLAN - Virtual LAN; a software-defined LAN that groups network elements in the same broadcast domain.
Teaming - A method for grouping several physical adapters into a single virtual adapter for the purpose of load sharing and throughput enhancement. Teaming is the common term on Microsoft operating systems.
Bonding - This term means the same as teaming and is usually used on Linux/Unix operating systems.
Host - A network element connected to the network. In the OpenStorage environment, the NetBackup media servers are referred to as hosts.

Bonding/Teaming

The connectivity is based on a mechanism called teaming (usually on Microsoft platforms) or bonding (usually on Linux/Unix platforms) in servers, and Link Aggregation (802.3ad) or Cisco EtherChannel in LAN switches. The purpose of these mechanisms is to achieve higher bandwidth on the connection, as close as possible to the port bandwidth multiplied by the number of ports, along with redundancy between the ports. Teaming is a mechanism that groups interfaces at layer 2 (the Ethernet layer), while the whole team gets a single IP address. This mechanism can work in several common ways, among which are the following:
- Redundant mode - usually two interfaces: one for carrying the traffic, and one for backup. This is usually called Active-Backup mode.
- Transmit and/or receive load balancing - receive and/or transmit load is distributed independently of the LAN switches to which the interfaces are connected.
- 802.3ad mode - the servers and the switches they are connected to must support the standard, and load distribution is according to this standard.

The first two methods are topologies that are "switch-less", which means that the switch does not have to support any specific standard. In the last method, the switches in the topology must support the 802.3ad standard, or in some cases the Cisco EtherChannel implementation.

Redundant mode

In redundant mode, one of the adapters is active, and one is passive. When the active adapter fails, the passive one takes its place.

Note: This mode does not perform load balancing.

Transmit and receive load balancing

In transmit mode, the server adapters usually work in the following way:
- In transmit load balancing, the traffic is usually sent in a round-robin manner, while the team (or bond) sends out packets with a single MAC address. The load is balanced on the way out from the server, while all traffic comes in on a single interface.
- In receive load balancing, the question is how to tell the receiver of the packets to send the packets back to the same interface they came from. The problem occurs because there are multiple MAC addresses going out with a single IP address. Therefore, when the receiver starts to send data, it will send an ARP request, and then will send all packets back to the interface whose address came back in the ARP response. The method usually used to solve this problem is to send the receiver multiple ARP responses, with all the MAC addresses of the team, so the receiver will "believe" that the IP address is coming from different MACs, and receive load balancing is thereby obtained. Active Load Balancing (ALB) is a receive load balancing technique.

802.3ad topologies

This standard does not mandate any particular distribution algorithm(s). However, any distribution algorithm shall ensure that the algorithm will not cause the following:
- Misordering of frames that are part of any given conversation
- Duplication of frames

The standard suggests, but does not mandate, that the algorithm may assign one or more conversations to the same port; however, it must not allocate some of the frames of a given conversation to one port and the remainder to different ports. The information used to assign conversations to ports could include the following:
- Source MAC address
- Destination MAC address
- Source IP address
- Destination IP address
- The reception port
- The type of destination address (individual or group MAC address)
- Ethernet Length/Type value (that is, protocol identification)
- Higher layer protocol information (for example, addressing and protocol identification information from the LLC sublayer or above)
- Combinations of the above

The policy according to which the bond decides how to distribute the frames across the ports is referred to as the "hash algorithm" or "hash policy". The hash policy decides according to which parameters, or combination of parameters, the frames will be distributed.

For example, when we have a server exchanging information with several hosts on the same subnet, configuring a source/destination MAC hash will usually give a reasonable load distribution. On the other hand, if we want to use load balancing over a router, then a layer-3 hash will not help, since the server sees only one IP address (that of the router), and therefore all traffic will be sent over the same interface. In this case, a layer-4 hash must be used.

Bonding on Linux machines

This section describes bonding types on Linux machines. On Linux machines, the following bonding types exist:
- Mode 0 - sets a round-robin policy for fault tolerance and load balancing.
- Mode 1 - sets an active-backup policy for fault tolerance.
- Mode 2 - sets an XOR (exclusive-or) policy for fault tolerance and load balancing.
- Mode 3 - sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces.
- Mode 4 - sets an IEEE 802.3ad dynamic link aggregation policy.
- Mode 5 - sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing.
- Mode 6 - sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPv4 traffic. Receive load balancing is achieved through ARP negotiation.

Modes 2 and 4 use a default transmit hash policy of layer 2: (source MAC XOR destination MAC) % N (number of slaves). The hash policy can be modified to layer3+4, where both the source and destination IP addresses and ports are taken into consideration.

Bonding on Unix machines

This section describes bonding types on Unix machines. These are the modes supported in IBM AIX:
- Standard or 802.3ad, default hash mode - The traditional AIX behavior. The adapter selection algorithm uses the last byte of the destination IP address (for TCP/IP traffic) or MAC address (for ARP and other non-IP traffic).
- Standard or 802.3ad, src_dst_port hash mode - The outgoing adapter path is selected by an algorithm using the combined source and destination TCP or UDP port values.
- Standard or 802.3ad, src_port - The adapter selection algorithm uses the source TCP or UDP port value.
- Standard or 802.3ad, dst_port - The outgoing adapter path is selected by the algorithm using the destination system port value.
- Round-robin - Outgoing traffic is spread evenly across all of the adapter ports in the EtherChannel.
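As an illustration of the two approaches above, the following sketches show how a bond of this kind might be defined on a Linux host and on an AIX host. These are minimal examples only: the interface names, addresses and file paths are assumptions chosen for illustration and are not values taken from this guide, and the configuration of the ProtecTIER server itself should always be done as described in the ProtecTIER User's Guide rather than by editing files directly.

On a Red Hat-style Linux host, an 802.3ad (mode 4) bond with a layer3+4 hash policy might be defined as:

   # /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative)
   DEVICE=bond0
   IPADDR=192.168.151.10
   NETMASK=255.255.255.0
   ONBOOT=yes
   BOOTPROTO=none
   BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=layer3+4"

   # /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for each member port)
   DEVICE=eth0
   MASTER=bond0
   SLAVE=yes
   ONBOOT=yes
   BOOTPROTO=none

On AIX, the equivalent EtherChannel can be created from the command line (or through smitty etherchannel), for example:

   # Create an 802.3ad link aggregation over ent0 and ent1 with the src_dst_port hash
   mkdev -c adapter -s pseudo -t ibm_ech \
         -a adapter_names=ent0,ent1 -a mode=8023ad -a hash_mode=src_dst_port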

Teaming on Microsoft-based machines

On Microsoft-based machines, the teaming methods are entirely defined by the NIC vendor. This section describes the teaming modes for Broadcom and Intel NICs. The teaming modes described are only those used for load sharing.

Broadcom NICs

Broadcom supports the following balancing modes:
- SLB (Smart Load Balancing) - In this method both transmit and receive load balancing are enabled, based on source and destination L3/L4 IP addresses and TCP/UDP port numbers.
- Generic Trunking - In this switch-assisted teaming mode, the LAN switch that the server is attached to must also be configured for one of the aggregation methods. As is the case for Smart Load Balancing, the IP/TCP/UDP source and destination addresses are used to load balance the transmit traffic from the server.
- Link Aggregation (IEEE 802.3ad LACP) - Link Aggregation is similar to Generic Trunking except that it uses the Link Aggregation Control Protocol to negotiate the ports that will make up the team.

Intel NICs

Intel supports the following balancing modes:
- Adaptive Load Balancing (ALB) - This method allows transmission over 2-8 ports to multiple destination addresses, along with fault tolerance. In this method, transmit is done through 2-8 adapters in load balancing, while the team receives packets only through the main adapter. This method works on an L3/L4 basis.
- Receive Load Balancing (RLB) - This method, which can be configured only together with ALB, adds the receive load balancing feature to it, and is also based on L3/L4. This is a switch-less method.
- Virtual Machine Load Balancing (VMLB) - Provides transmit and receive traffic load balancing across virtual machines bound to the team interface, as well as fault tolerance in the event of switch port, cable, or adapter failure. This teaming type is a switch-less method.
- IEEE 802.3ad - In this method, the standard supports static and dynamic modes. Intel supports both modes, and the team must be configured with a LAN switch that supports the 802.3ad standard, or Cisco EtherChannel technology.

LAG and similar technologies on the LAN switches

For LAN switches, configurations are provided for Cisco switches. Some of the features are present in Cisco IOS Release 12.2(33)SB and higher. 802.3ad is a market-wide standard, supported by all common vendors. In Cisco switches, both L2 and L3, two methods are available:
1. EtherChannel - GEC/FEC (Gigabit/Fast Ethernet ports), and PAgP

2. 802.3ad Link Aggregation and the LACP control protocol

In switches from other vendors (Avaya, Juniper, 3Com, HP and others), only 802.3ad is used. EtherChannel is a Cisco proprietary technology. In Cisco IOS versions up to Release 15.0(1)S, the mechanisms for load balancing Ethernet service instances over member links in a port channel do not account for the service instances' traffic loads, which can lead to unequal distribution of traffic over member links. In IOS Release 15.0(1)S, a new feature was introduced, the 802.3ad Link Aggregation with Weighted Load Balancing feature (802.3ad LAG with WLB), that allows you to assign weights to service instances to efficiently distribute traffic flow across active member links in a port channel. The LAG with WLB feature supports both LACP (active or passive mode) and manual (mode on) EtherChannel bundling. A weighted load balancing configuration does not affect the selection of active member links in the EtherChannel. As member links become active or inactive, a load-balancing algorithm adjusts the distribution of Ethernet service instances to use the currently active member links.

Setting up the network with ProtecTIER

This section describes the setup procedures for the ProtecTIER server.

ProtecTIER server configuration

The backup application creates a backup or restore stream for each backup set copied to or from the IBM ProtecTIER server. In an OpenStorage environment, the ProtecTIER plug-in on the media server opens, by default, up to 8 TCP connections for each stream. The best practice for the ProtecTIER server is to connect it in one of the following configurations:
1. Individual IPs (layer 3) configuration - In this configuration each interface gets its own IP address. This configuration maximizes the performance but does not offer high availability of the interfaces. It also requires multiple IPs and subnets. No teaming or bonding is used in this configuration. This configuration is recommended in case no switch support for the other options is available.
2. Single team/bond (layer 2) configuration - In this configuration, all front end interfaces on a ProtecTIER node are grouped into one team. This configuration maximizes high availability and ease of use, as a single IP and subnet is used for the entire team. Depending on the number of interfaces used, and the switch type and version, this method might not maximize the performance.
3. Dual/three teams (layer 2+3) configuration - In this configuration, 2 or 3 teams are configured. This configuration balances high availability, ease of use and performance. In our testing with a Cisco 6500 switch configured with EtherChannel, teams containing 2 interfaces each maximized the performance similarly to option 1.

Figure 6. Multiple teaming topologies

Load Distribution Methodology

This section describes the load distribution methodology. The bonding methods which are supported by the ProtecTIER server are:
- Mode 0 (round-robin)
- Mode 4 (dynamic IEEE 802.3ad, switch-assisted)

with the following hash options:
- L2
- L2/L3
- L3/L4

Since the ProtecTIER plug-in for OpenStorage, which runs on the media servers, distributes load on a layer-4 basis, the network load balancing should also be performed at this layer in order to achieve the best performance over the bond group.

Therefore, when configuring bonds on the ProtecTIER server, choose mode 4 and L3/L4 as the load balancing method. See the IBM System Storage TS7600 with ProtecTIER User's Guide for detailed instructions on how to configure the ProtecTIER server.

LAN switch

This section describes the LAN switch. In LAN switches, there are various technologies that are supported by various vendors, for example Cisco Fast EtherChannel (FEC) and Gigabit EtherChannel (GEC), Nortel's Multilink Trunking (MLT), Extreme Networks Load Sharing and others. Most of the vendors also support 802.3ad.

Teams configuration

In the LAN switch, configure as follows:
1. When configured with a switch-less topology, no configuration is required.
2. When configured with EtherChannel or 802.3ad, use a layer-3/4 hash mechanism. See Chapter 4, Configuration instructions and examples, on page 33 for instructions on how to configure Cisco EtherChannel for this mode.

Hosts

The hosts can be servers of various types, operating systems, and hardware profiles. This section describes the configurations of the following devices: Broadcom NICs on Windows platforms and the IBM AIX server.

Broadcom NICs with Microsoft platforms

If configuring teams, the Broadcom NICs should be configured with the Broadcom network utility. Choose a team type of Link Aggregation (802.3ad).

Note: See Chapter 4, Configuration instructions and examples, on page 33.

IBM AIX platforms

If configuring a bond, use the AIX configuration tools to create a new link aggregation of type 802.3ad. Set the hash mode to the src_dst_port parameter.

Note: See Chapter 4, Configuration instructions and examples, on page 33.

Network topologies

Connectivity on a single site

When connecting the ProtecTIER on a single site with the hosts, it will be connected either on the same VLAN as the hosts, or on separate VLANs.

Connectivity on a single VLAN, with a single IP subnet

In a single switch topology, the ProtecTIER servers and the hosts are connected to the same VLAN, with the same IP subnet, on the same physical switch.

Figure 7. Single host with a single ProtecTIER server on the same VLAN

Connectivity on different switches

When connecting the hosts and the ProtecTIER servers to multiple LAN switches, the connectivity between the switches must be capable of transferring the data rate required for the backup. Therefore, we recommend using 10Gb Ethernet connectivity between the switches. Another option would be to define another link aggregation between the switches such that they will be capable of transferring the required bandwidth.

Figure 8. Multiple switches

The dual switch configuration can be used for high availability with switch redundancy.

Figure 9. High availability configuration

Connectivity on different VLANs, with different IP subnets

In this topology, the host and the ProtecTIER server are connected on separate VLANs and subnets. The switch has L3 support. Routing is performed between the VLANs.

Figure 10. Host with a single ProtecTIER server on different VLANs

Connectivity to a remote site

When connecting the ProtecTIER server at a site which is remotely connected to the sites where the hosts are located, all backup traffic will be transmitted through a router (or routers) that connects the sites. The router (or routers) can be connected to the network via a single interface, or via multiple interfaces in a teaming configuration.

Figure 11. Connectivity in remote sites


Chapter 4. Configuration instructions and examples

This chapter describes OpenStorage (OST) network configuration instructions and examples.

Configuring IP addresses

Whether or not bonds/teams are being configured, you will have to configure unique IP addresses on the hosts and on the ProtecTIER servers. If you are configuring bonds/teams, each bond/team will have to be assigned a single IP address. Otherwise, each physical interface will have to be assigned a unique IP address. On each system, host or ProtecTIER, each IP address which is configured has to be on a different subnet. Additional hosts and ProtecTIER nodes can share the same subnets. For example, on the first ProtecTIER node you can configure six addresses, one per interface, each on its own /24 subnet; the second ProtecTIER node can then use six different addresses on those same six /24 subnets. In each /24 network, 255 subnet addresses can be defined, so the first node uses one address in a given subnet and the second node can use a different address on the same subnet.

Configuring the network interfaces on the ProtecTIER server

By default, upon installation of the ProtecTIER software, each front end physical interface on a ProtecTIER node is configured with a corresponding virtual interface with a unique default IP address. During the system configuration, it is necessary to provide a new IP address to every virtual interface, each of the IPs on a different subnet. To configure the ProtecTIER server front end interfaces, use the appinterfaces command line option. Please see the IBM System Storage TS7600 with ProtecTIER User's Guide or the IBM System Storage TS7610 ProtecTIER Deduplication Appliance Express User's and Maintenance Guide for more information. This command line interface provides the option to:
- Assign new IP addresses for each virtual interface.
- Reassign physical interfaces into a virtual interface (this is the way to group the interfaces into a bond).

- Define the load balancing method to be used within a bond.

Configuring the network interfaces on the host (media server)

On the hosts, similarly to the ProtecTIER nodes, each physical interface should be assigned its own IP address, or should be grouped within a team/bond that has a single IP address. Each interface or bond/team on the host side will need to have its IP address on a separate subnet. As in the ProtecTIER case, different hosts may share the same subnet.

Routing the IP traffic

Static routes are a simple and effective way of instructing the host IP stack how to route IP traffic destined for specific subnets. This is necessary whenever traffic to any specific subnet is required to be sent through a different gateway, and possibly a different network interface, than the default-gateway definition would otherwise dictate. If required, configure your static routes such that each port on the host can reach one virtual port on each ProtecTIER node to which it is connected. If possible, configure all IP addresses on the media servers on the same subnets that you have defined on the ProtecTIER nodes. (A minimal static-route sketch is shown at the end of this chapter.)

Configuring EtherChannel load balancing on Cisco switches

This example shows how to configure EtherChannel to use source and destination IP addresses and port numbers as parameters for load balancing (L3/L4):
1. Configure the load balancing:
   Router# configure terminal
   Router(config)# port-channel load-balance src-dst-mixed-ip-port
   Router(config)# exit
2. Verify the configuration:
   Router# show etherchannel load-balance
   EtherChannel Load-Balancing Configuration:
           src-dst-mixed-ip-port enhanced
           mpls label-ip

Configuring bonds, load balancing and IP addresses on AIX hosts

1. Create the bond:
   a. Go to Devices > Communication > EtherChannel / IEEE 802.3ad Link Aggregation > Add An EtherChannel / Link Aggregation.
   b. Choose the network interfaces (ports) you would like to bond together and click OK.
   c. Change the parameter "Hash Mode" from default to src_dst_port.
   Note: By doing this, a new en#/et# device is created on the machine, with the ports selected from the list assigned to it.
2. Assign an IP address to the newly created bond:
   a. Go to Communications Applications and Services > TCP/IP > Minimum Configuration & Startup.

   b. Choose the newly created en# device (not the et# device, which is the "Standard Ethernet Network Interface") and update its IP address and network mask as needed.

Configuring teams, load balancing and IP addresses on Windows hosts

In order to configure teams on Windows platforms, use the NIC vendor's network utility. This example uses BACS, which is Broadcom's network utility.
1. Create the team:
   a. In the Explorer View of BACS, under Team Management, right-click Teams at the top of the tree, and choose Create Team. Give a meaningful name to the team (the virtual device created); the Team Type parameter should be Link Aggregation (802.3ad).
   b. Check the "members" of this team (the ports to be assigned to the device), and click Apply / Exit.
2. Configure the IP address of the newly created team:
   a. Open Network Connections, right-click the new device (it appears under what was defined as the "Team Name"), then choose Properties > Internet Protocol (TCP/IP) > Properties.
   b. Choose Use the following IP address, and set the IP address and the subnet mask.
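The static-route approach described under "Routing the IP traffic" can be illustrated with a short sketch. All addresses, subnets and interface names below are assumptions chosen for illustration only, not values from this guide; substitute the subnets actually assigned to the ProtecTIER virtual interfaces.

On a Linux media server, a route to one ProtecTIER front end subnet through a specific gateway and interface might look like:

   # Send traffic for the ProtecTIER subnet 192.168.151.0/24 through gateway
   # 10.10.10.1 using interface eth2 (all values illustrative)
   ip route add 192.168.151.0/24 via 10.10.10.1 dev eth2

   # Verify the route
   ip route show 192.168.151.0/24

On a Windows media server, an equivalent persistent route could be added with:

   route -p ADD 192.168.151.0 MASK 255.255.255.0 10.10.10.1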


Chapter 5. Planning for replication deployment

The planning process for ProtecTIER systems with replication deployed requires more input and consideration beyond the individual capacity and performance planning needed for a system that will only be used as a local VTL.

When a multiple-site, many-to-one topology is deployed, the entire configuration, including all spoke and hub environments, should be planned for and taken into consideration. The methodology by which a many-to-one environment should be planned is similar to planning S x one-to-one replication pairs (where S = the number of spokes), and then adding all replication loads (and potentially local backup) together for the hub.

Before the introduction of the replication capabilities, the ProtecTIER system was merely a target for a backup application data stream. Now, when offering the replication capabilities of the local repository to a secondary, DR site, ProtecTIER becomes a piece of the much larger picture of the organization's disaster recovery paradigm. The methodology behind a DR plan today calls for an intricate combination of policies, procedures, operations, and IT systems. The DR plans employed by IT organizations across different locations, regions and types of business are as varied as the businesses they support. However, there is one fundamental building block upon which all DR plans rest: data accessibility at a secondary, DR site. To achieve preparedness for a potential disaster, IT organizations can utilize the ProtecTIER replication capabilities to ensure that the mission-critical data that the business needs to continue operations is safely housed at a DR site, intact, with enterprise-class data integrity.

The ProtecTIER Planner tool is available for download and should be used at the planning stages before any ProtecTIER deployment. Table 1 summarizes the latest fields added to the tool to support planning for the replication feature, including the replication window, the bandwidth needed, the type of links available, and so on.

Table 1. Planning tool fields added to support the replication feature

Field: Physical/Logical
Purpose: A switch declaring the type of replication. Physical describes the use of disk subsystem replication and Logical describes the use of ProtecTIER server replication.
Comments: If Physical is chosen, the payload automatically defaults to 100%.

Field: Payload %
Purpose: The amount of the described workload that is to be replicated.
Comments: The default setting is 100%, meaning all workload is to be replicated. Any value from 0 to 100 can be entered. The increment/decrement arrows change the value by 10%.

Table 1. Planning tool fields added to support the replication feature (continued)

Field: Time
Purpose: This value is optional. If null or zero, the time allotted to replication is the amount of time in a day NOT consumed by the backup window. If a value is entered, the calculations for required bandwidth confine themselves to the amount of time specified.
Comments: The fact that incremental and full backup output have different windows is taken into account in the calculations.

Field: Bandwidth
Purpose: Declares how much bandwidth is required to perform the defined task.
Comments: None.

Field: Links
Purpose: Given the bandwidth required, this field describes the type and number of links needed to perform the job.
Comments: Poor link quality is not taken into account - the calculations are based on good utilization of the link(s) and good quality lines.

Bandwidth sizing and requirements

The ProtecTIER HyperFactor deduplication engine, which eliminates redundant data that would otherwise need to traverse the network from site to site, is a key enabler of cost-efficient replication for the DR architecture. Combining the deduplication and replication features of ProtecTIER provides dramatic improvements to disaster recovery planning and operations for any size of organization. It enables ProtecTIER to extend the high level of data protection provided by replication to a wider range of data and applications in the data center and beyond. Table 2 demonstrates the potential savings for ProtecTIER replication users over a generation-1 VTL with no deduplication function, focusing only on the bandwidth cost.

Table 2. Potential savings for ProtecTIER replication users vs. Gen. 1 VTL users

Traditional VTL (no dedup):
   Daily backup (TB):              5
   Data transferred (TB):          5
   Replication window (hrs):       16
   Mb/s (megabits) required:       728
   Required bandwidth:             OC12+
   Cost of bandwidth per month:    $30,000

ProtecTIER:
   Daily backup (TB):              5
   Data transferred (TB):          0.5
   Replication window (hrs):       16

Table 2. Potential savings for ProtecTIER replication users vs. Gen. 1 VTL users (continued)

ProtecTIER (continued):
   Mb/s (megabits) required:       72.8
   Required bandwidth:             OC3
   Cost of bandwidth per month:    $11,000

   BW savings per month:           $19,000
   Total 3-year bandwidth savings: $684,000

The effect of ProtecTIER's deduplication capability on the required bandwidth for replication is similar to the effect that the daily deduplication ratio has on the daily backup workload. For example, if the daily change rate is 10%, which means only 10% of the data changes from one day to the next, then not only does the system back up just the changed, unique data, but, more significantly, it requires only 1/10th of the replication network bandwidth that would otherwise be needed without data deduplication.

Table 3 shows some average costs by bandwidth type. Bandwidth costs vary dramatically by geography and distance, so the figures are approximate as an average for the US. Using this table, the potential cost savings of ProtecTIER are easy to see. For example, if ProtecTIER allows the user to deploy an OC12 rather than an OC48, the annual cost savings will be close to $700,000 (the difference of $960,000 and $279,900, as seen in the last column of the chart).

Table 3. Average costs for different network bandwidth - bandwidth, capacities and costs

Link      Lease per month (long haul)   Total MB in 24 hours   Total GB in 24 hrs   Cost/yr
T1/DS1    $2,200                        -                      16                   $26,400
T3/DS3    $5,000                        -                      -                    $60,000
OC3       $11,000                       1,679,400              1,679                $132,000
OC12      $23,325                       6,717,600              6,718                $279,900
OC48      $80,000                       26,870,400             26,870               $960,000
OC192     $254,524                      95,612,400             95,612               $3,054,288
OC768     $1,144,624                    -                      -                    $13,735,488

Network bandwidth sizing tips

ProtecTIER replication replicates only the new or unique deduplicated data. Data that was deduplicated at the primary site will not be physically sent to the DR site (saving bandwidth); however, the DR site (hub) still needs to read the entire amount of data to ensure 100% data integrity.

A simple example demonstrates the impact of deduplication on the replication operation. Assume there are two cartridges at the primary site, cartridge A and cartridge B, that contain the exact same 1 GB of data:
- Replicating cartridge A will transfer 1 GB of physical (which here equals nominal) data, as at this point in time the data is new to the DR site repository.
- Replicating cartridge B will transfer 0 GB of physical data to the DR site, as all of the data already exists at the DR site repository. In effect, 1 GB of nominal data will be represented and indexed as cartridge B following its replication action.

There are two types of replication data-transfer throughput barriers:
1. The physical data-transfer barrier stems from the fact that each ProtecTIER node has two 1 Gb Ethernet replication ports:
   - A single-node system supports up to 190 MB/sec of physical data transfer (2 x 1 Gb replication Ethernet ports).
   - A dual-node clustered system supports up to 380 MB/sec of physical data transfer (as it has 4 x 1 Gb replication Ethernet ports).
2. The nominal data barrier (nominal data is the original amount of backed-up data before applying ProtecTIER's deduplication factor) stems from the maximum processing capability of a given ProtecTIER system:
   - A single-node system supports up to 500 MB/sec of nominal data replication.
   - A dual-node clustered system supports sustainable rates of up to:
     - 1000 MB/sec of nominal data backup, with no replication activity
     - 850 MB/sec of nominal data replication, with no backup activity
     - 920 MB/sec of both nominal data backup and replication running concurrently

Note: The actual maximum replication throughput for any given scenario depends on many factors, such as the data type and change rate, and will be dictated by the barrier type (physical or nominal) that the ProtecTIER system reaches first.

The following formula should be used in order to calculate the replication data transfer. The formula calculates the actual changed data that must be sent across the network, and adds overhead capacity equal to 0.5% of the daily backup workload:

Daily network workload = (daily backup * change rate) + (0.5% * daily backup)

As an example, for a 6 TB daily backup with a change rate of 10%:

(6000 GB * 10%) + (0.5% * 6000 GB) = 630 GB

Thus, in this scenario, 630 GB of physical data is replicated to the second site, rather than the 6 TB that would otherwise be transferred without deduplication.

The following formula should be used to calculate the replication bandwidth required: divide the daily network workload by the available hours in a day for replication.
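These two calculations can also be scripted for quick what-if checks during planning. The sketch below simply restates the formulas above in shell arithmetic, using the same illustrative 6 TB / 10% / 10-hour figures as the examples in this section; it is not an official sizing tool.

   DAILY_BACKUP_GB=6000   # daily backup workload, in GB
   CHANGE_RATE_PCT=10     # daily change rate, in percent
   REPL_WINDOW_H=10       # hours per day available for replication

   # Daily network workload = (daily backup * change rate) + (0.5% * daily backup)
   WORKLOAD_GB=$(( DAILY_BACKUP_GB * CHANGE_RATE_PCT / 100 + DAILY_BACKUP_GB * 5 / 1000 ))
   echo "Daily network workload: ${WORKLOAD_GB} GB"                      # 630 GB

   # Required replication bandwidth = daily workload / replication window
   echo "Required bandwidth: $(( WORKLOAD_GB / REPL_WINDOW_H )) GB/hour"  # 63 GB/hour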

For example:
- Scheduled replication window (the preferred mode of operation) of 10 hours: 630 GB / 10 h = 63 GB/hour of required replication bandwidth
- Continuous replication operation (24-hour period, concurrent with the backup operation, rarely the recommended mode): 630 GB / 24 h = 26 GB/hour of required replication bandwidth

Note: It is recommended to add 10% to the required bandwidth as headroom for network outages or slowdown periods.

Monitoring

Monitor the backlog value (and the ETA) in ProtecTIER Manager to get a rough estimate of what is left to replicate and a rough approximation of the time to complete replication. This helps users evaluate whether their bandwidth planning and replication-window time allocation are correct, and to anticipate possible pitfalls if they are not.

Firewalled environments - port assignments for the Replication Manager and replication operations

Eth3 and eth4 on all ProtecTIER nodes/servers involved in the Native Replication operation, at both the primary and secondary sites, should be configured as detailed in Chapter 2 of the IBM System Storage ProtecTIER User's Guide. In addition, in user environments where firewalls are used, it is important to open the following TCP ports so that IP replication can function:
1. The Replication Manager uses TCP ports including 6202 and 3501.
2. The replication operation between any two repositories requires TCP ports including 6520, 6530, 6540, 6550, and 3501.

Note: ProtecTIER replication does not use any UDP ports.

Bandwidth validation tool

The ProtecTIER Replication Network Performance Validation Utility (pt_net_perf_util) is a tool that tests and verifies the user's replication network performance before replication is deployed on the TS7650, as part of a new ProtecTIER R2.4 software installation or upgrade.

About this task

This tool should be used to ensure that the replication network is capable of delivering the expected performance. The utility will not predict replication performance, but it may uncover performance bottlenecks. The first element to consider is the bandwidth availability across the network, but when assessing the network one should also remember two other major components,

latency and packet loss, which determine the network quality. The latency in any user's WAN depends on many factors along the network span and may vary, but it should never exceed 200 ms; if it does, it may significantly decrease replication throughput. Packet loss across the network should be 0%; any other value indicates a major network problem that must be addressed before replication is deployed.

The pt_net_perf_util network testing utility is included in the ProtecTIER software package, and the installer needs the physical ProtecTIER server nodes at both sites to run this utility concurrently. At that point in the installation process it is not yet necessary to build a repository or configure the ProtecTIER back-end disk. The requirements of this utility are:
- Red Hat Enterprise Linux 5.4
- Standard external utilities expected to be in the current path: ping, netstat, getopt, and echo

How to use the utility

The objective of the pt_net_perf_util utility is to test the maximum replication performance between two future ProtecTIER repositories by emulating the network usage patterns of ProtecTIER's replication component. The utility will not predict replication performance, but it may uncover performance bottlenecks.

The pt_net_perf_util utility and the iperf and nuttcp tools it uses are installed as part of the ProtecTIER software installation. To test the replication performance, use one of the following tools:
- iperf: /usr/local/bin/iperf
- nuttcp: /usr/local/bin/nuttcp

This utility has two modes of operation, client and server. The server must be started before the client. Before running the utility, shut down all other programs on both the client and server ProtecTIER systems. The client is the ProtecTIER system that transmits the test data, and the server is the ProtecTIER system that receives the data (also known as the target server). Based on the data sent by the client and received by the server, the script outputs key network parameter values that indicate certain attributes of the network.

The goal of these tests is to benchmark the throughput of the network. The most important benchmark is the direction in which replication will actually take place; that is, the target should be tested as the server, since the flow of data will be from the client to that server. However, it is also important to test the reverse direction to measure the bandwidth performance during disaster recovery failback. Network bandwidth is not always the same in both directions.

In the following procedure, the goal is to test network performance between two machines on a WAN, server1 and server2. Each test runs for five minutes. Since there are five tests, the process takes a total of 25 minutes.

1. Start the server mode of the utility on server1 by entering the following commands on the command line:

cd /opt/dtc/app/sbin
./pt_net_perf_util -s

Note: The previous command uses the iperf tool. To use the nuttcp tool instead, add -n to the command. Enter one of the following series of commands to use nuttcp:

cd /opt/dtc/app/sbin
./pt_net_perf_util -sn

or

cd /opt/dtc/app/sbin
./pt_net_perf_util -s -n

2. Start the client mode of the utility on server2 by entering the following command on the command line:

./pt_net_perf_util -c server1 -t 300

Note: This step uses the iperf external utility. To use nuttcp instead, add -n to the command:

./pt_net_perf_util -c server1 -t 300 -n

3. The utility automatically performs the tests in sequence. The client output (server2 in the example below) will look similar to the following.

Note: In the sample output below the test ran for only 5 seconds instead of 300.

*** Latency
PING ( ) 56(84) bytes of data
--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.257/0.406/0.484/0.079 ms

*** Throughput - Default TCP
[ 3] sec 56.6 MBytes 94.8 Mbits/sec

*** Throughput - 1 TCP stream(s), 1MB send buffer
[ 3] sec 57.0 MBytes 95.0 Mbits/sec

*** Throughput - 16 TCP stream(s), 1MB send buffer
[SUM] sec 65.0 MBytes 94.3 Mbits/sec

*** Throughput - TCP stream(s), 1MB send buffer
[SUM] sec 127 MBytes 94.1 Mbits/sec

Number of TCP segments sent:
Number of TCP retransmissions detected: 21 (0%)
Done.

See the next section for information about interpreting the results of the tests.

Interpreting the results

The utility performs five foreground tests (Tests 1-5 below) and one background test (Test 6 below). The example outputs below are from the client side, with the script using iperf (not nuttcp) in tests 2-5. Each of the first five tests ran for 300 seconds (-t 300), while the last test monitored TCP performance during that time.

Test 1: Latency

This test checks the nominal network link latency and packet loss. Example result:

*** Latency
PING ( ) 56(84) bytes of data
--- ping statistics ---
120 packets transmitted, 120 received, 0% packet loss
rtt min/avg/max/mdev = /78.491/ /9.872 ms

Interpreting the results: The average round-trip time (rtt) was 78.4 ms and there was 0% packet loss. The latency in WAN topologies may vary, but it should never exceed 200 ms. Contact your network administrator if the reported latency is more than 200 ms, as it may significantly decrease replication throughput; higher latency values cause a major deterioration in replication throughput. Packet loss should be 0%; any other value indicates a major network problem.

Test 2: Throughput - default settings

This test checks the maximal TCP throughput using a single data stream with default TCP settings. Example result:

*** Throughput - Default TCP
[ 3] sec 2.41 GBytes 173 Mbits/sec

Interpreting the results: The test transferred 2.41 GB with an average throughput of 173 Mbits/sec.

Note: 1 MByte = 1,048,576 bytes; 1 Mbit/sec = 1,000,000 bits/sec.

Test 3: Throughput - single stream, 1MB send buffer

This test checks the maximal TCP throughput using a single data stream with a 1 MB send buffer. Example result:

*** Throughput - 1 TCP stream(s), 1MB send buffer
[ 3] sec 2.51 GBytes 180 Mbits/sec

Interpreting the results: The test transferred 2.51 GB with an average throughput of 180 Mbits/sec. There may be an improvement here on high-latency links.

Test 4: Throughput - 16 streams, 1MB send buffer

Example result:

*** Throughput - 16 TCP stream(s), 1MB send buffer
[SUM] sec 5.91 GBytes 418 Mbits/sec

Interpreting the results: The test transferred 5.91 GB with an average throughput of 418 Mbits/sec. The extra streams yielded higher utilization of the connection. The Mbits/sec reported in this test is the maximum replication performance your system will achieve if your backup environment uses up to 2-3 cartridges in parallel.

Test 5: Throughput - streams, 1MB send buffer

Example result:

*** Throughput - TCP stream(s), 1MB send buffer
[SUM] sec 8.08 GBytes 550 Mbits/sec

Interpreting the results: The test transferred 8.08 GB with an average throughput of 550 Mbits/sec. TCP takes a while to reach its maximal throughput, so longer testing times (300 seconds or more) produce more accurate results. The throughput value given by this test is the potential physical replication throughput for this system. It is directly affected by the available bandwidth, latency, packet loss, and retransmission rate. The Mbits/sec reported in this test is the maximum replication performance your system may achieve. If this number is lower than anticipated, contact your network administrator.

Test 6: TCP retransmissions vs. total TCP segments sent

Example result:

Number of TCP segments sent:
Number of TCP retransmissions detected: (12%)

Interpreting the results: In this example, 12% of the TCP segments sent during the five tests were lost and retransmitted. The retransmission rate imposes a direct penalty on throughput, because retransmitted packets take up bandwidth. Retransmission can be caused by the underlying network (such as packet dropping by an overflowing router) or by the TCP layer itself (such as retransmission due to packet reordering). Segment loss can be caused by any of the network layers. A TCP retransmission rate larger than 2% may cause performance degradation and unstable network connectivity. Contact your network administrator to resolve this issue and reduce the rate to approximately 0%.

Recommendation

It is recommended to run these tests again to measure throughput in the reverse direction. To run the tests in reverse, make server1 the client and server2 the server, and repeat the procedure.

Practical approach for replication performance planning

ProtecTIER offers a number of tools for controlling the performance rate and bandwidth that replication activity will use, at both the server and network levels:
- Replication rate control
  - Nominal data limits
  - Physical data rate limits: per server, per port
- Time frame limits: weekly scheduler
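Before applying the rate-control tools listed above, it can help to translate the throughput measured by Test 5 into an achievable physical replication volume per replication window, for comparison with the bandwidth requirement calculated earlier in this chapter. The following is an illustrative conversion only (not part of the utility); the 550 Mbits/sec value is taken from the sample output above and the 10-hour window is an assumed example:

def achievable_gb_per_window(measured_mbits_per_sec, window_hours):
    """Physical data (GB) that the measured link rate can move in one window."""
    mbytes_per_sec = measured_mbits_per_sec / 8            # Mbits -> MBytes
    return mbytes_per_sec * 3600 * window_hours / 1000     # MB -> GB over the window

print(achievable_gb_per_window(550, 10))   # ~2475 GB of physical data in a 10 h window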

Replication rate control and bandwidth throttling

The nominal and/or physical throughput (data flow rate) can be limited by setting the replication rate control. Take the following into consideration:
- Both nominal and physical amounts of data are being processed and transferred.
- The spokes and the hub must be able to send and receive the new/unique data between them.
- The hub must be able to validate each new or updated object (virtual tape cartridge) at the target repository before making it available to the user.
- The spoke, the hub, and the link dictate the rate at which data is transferred; the hub dictates the rate (in nominal terms) at which a new or updated object is validated. Both of these metrics must be considered when planning a deployment.

Setting the replication rate control allows the user to limit the nominal and/or physical throughput (data flow rate) of replication. The feature can be used on spokes as well as on the hub, in both directions (sending and receiving). The values set for the physical and nominal limits have no explicit influence on one another; that is, the values set for physical throughput may, but do not necessarily, affect the values set for nominal throughput, and vice versa. However, when both methods are used, the physical settings override the nominal ones.

Nominal limit - the method for defining how ProtecTIER server system resources are shared between replication and local backup operations, at both a spoke and the hub. The nominal throughput directly affects the replication data flow and thus the load on both the source and destination repositories. When a nominal limit is set on a ProtecTIER server that performs both backup and replication operations, the replication nominal rate does not necessarily compete with the backup operation (depending on whether the system is set up to run replication and backup concurrently or in separate timeframes). However, setting the limit on a source repository guarantees that the backup operation gets the total possible throughput minus the nominal limit set here. For example, given a node with a maximum performance capability of 500 MB/s that performs backup and replication concurrently, the user may choose to limit the system resources dedicated to replication to 300 MB/s when replication runs on its own (no specific time frame), and to only 100 MB/s when it runs concurrently with backup, using the top right quadrant of the settings screen (Nominal Throughput).

Physical limit - the method for limiting ProtecTIER's replication network bandwidth consumption. Use it when the user's network is shared between ProtecTIER and other applications so that they can all run concurrently. Although it can be set at both the spoke and the hub, this is mostly a spoke-side setting, because limiting the hub limits bandwidth for the entire replication operation, resulting in de facto limitations on all spokes. The physical throughput limit restrains the amount of I/O and resources that replication consumes at the local repository. Implicitly, this reduces the total load on the replication networks used by the repository and the amount of resources needed at the peer repository as well.
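A minimal sketch of the nominal-limit guarantee described above, using the example figures from this section (500 MB/s node, 300 MB/s replication-only limit, 100 MB/s concurrent limit); the function name is purely illustrative:

def guaranteed_backup_throughput(node_max_mb_s, replication_nominal_limit_mb_s):
    """Backup is guaranteed at least the node maximum minus the nominal replication limit."""
    return node_max_mb_s - replication_nominal_limit_mb_s

print(guaranteed_backup_throughput(500, 100))   # 400 MB/s guaranteed to backup when replication is capped at 100 MB/s
print(guaranteed_backup_throughput(500, 300))   # 200 MB/s would remain if the 300 MB/s limit applied during backup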

The Replication Rate Limits dialog is divided into separate areas for physical and nominal throughput. These areas are divided further into the various scenarios in which the user can limit the rate of replication:
- Replication rates/throughput during backup or restore operations
- Replication rates when there are no backup or restore operations
- With no replication timeframe defined, and with a replication timeframe defined

The same limit-setting options are available for both nominal and physical throughput.

Figure 12. Replication Rate Limits

Limiting the network interface (port) bandwidth consumption

Bandwidth throttling is a way to control the speed at which replication activities operate, whereby the user specifies a maximum upper limit for physical network usage. The ProtecTIER hardware platform uses two TCP/IP Gigabit Ethernet interfaces per node for sending the actual replicated data traffic across the network. In terms of physical ProtecTIER data transfer, after applying TCP/IP overhead, this translates into a given number of MB/s per single node and per dual-node cluster. By default, there is no configured bandwidth limit; ProtecTIER attempts to use as much bandwidth as it can. If the physical network layer consists of dark fibre or other high-speed network infrastructure, there is typically no reason to limit replication throughput. However, if ProtecTIER is running over a smaller network pipe that is shared by several applications simultaneously, the user may choose to restrict the maximum throughput used by ProtecTIER replication. This parameter is configurable per GigE port on all nodes in the replication grid, but it applies only to outgoing data, so it is only effective when set at the source (sending) system. If the source system is

composed of a dual-node cluster, it is important to set the limit at each node. For example, if you want to hold ProtecTIER replication to no more than 100 MB/s and you are using all four available GigE ports of the source dual-node cluster, you must set each port's limit to 25 MB/s. Likewise, if the replication traffic is split between two networks with different bandwidth capacities, you can set different limits per port to implement a network-specific cap. By default, the setting per port is Unlimited.

Note: If the bandwidth limitation is changed during replication, the change does not take effect immediately. If replication begins after the bandwidth limitation change, the effect is immediate.

Figure 13. Potential modification of Eth3 interface limit

Spoke replication rate

- Assume a deployment where each spoke backs up 4 TB/day with a 16-hour replication window.
- Assuming a 10:1 deduplication ratio, 400 GB of new/unique data must be transferred.
- 400 GB in 16 hours demands 7 MB/sec sustained for the 16-hour period.
- Each spoke will consume 7 MB/sec, equalling 38% of one OC3 link (20 MB/sec).
- The same network infrastructure can be shared with other applications (on the assumption that they have similar control mechanisms to ensure they do not consume ProtecTIER's secured bandwidth).

Network sharing at the spoke

The user should take advantage of the limit-setting tools to enable sharing of common resources such as network bandwidth:
- Assuming another application needs 50% (10 MB/sec) of the OC3 link (20 MB/sec) used by ProtecTIER replication, there is still enough bandwidth to share between the two applications (ProtecTIER uses only 38% of the OC3 link capability).
- But if another application needs 70% (14 MB/sec) of the OC3 link used by ProtecTIER replication, the spoke in question can only replicate at 6 MB/sec. It is then necessary to set the replication physical limit to 6 MB/sec for this spoke:
  - 6 MB/sec for 16 hours equals 346 GB of physical data to transfer

  - At 10:1, the maximum local backup workload for this spoke is 3.5 TB
  - The other spokes can now do a little more work (4.04 TB/night each), or the hub can do less local backup so that the replication window can be extended to approximately 20 hours, or some overlap of the backup and replication time frames can be planned to accommodate the slower spoke (example to follow; the arithmetic behind this spoke sizing example is also sketched below).

Figure 14. Suggested replication rate limits per the above example

Choosing the replication mode of operation - scheduled vs. continuous

ProtecTIER offers two modes of operation for the replication activity: scheduled replication (with a predefined time window) and continuous replication, which runs concurrently with the backup operation. The mode of operation is configured at the source system, and all defined replication policies operate in that mode. In almost all cases, scheduled replication is the recommended approach, as it enables the user to accurately plan for performance and better ensures that SLAs are met.
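The spoke sizing and network-sharing arithmetic from the example above can be expressed as a small sketch. This is illustrative only; the 4 TB/day, 10:1, 16-hour, and 20 MB/s OC3 figures are the example's own assumptions:

daily_backup_gb  = 4000      # 4 TB/day per spoke
dedup_ratio      = 10        # 10:1 deduplication
window_hours     = 16
oc3_mb_s         = 20        # usable OC3 bandwidth assumed in the example

physical_gb      = daily_backup_gb / dedup_ratio              # 400 GB of new/unique data
required_mb_s    = physical_gb * 1000 / (window_hours * 3600)  # ~7 MB/s sustained
share_of_link    = required_mb_s / oc3_mb_s                    # roughly 0.35-0.38 of the OC3 link

# If another application claims 70% of the link, cap this spoke at 6 MB/s:
capped_mb_s      = oc3_mb_s * 0.30                             # 6 MB/s
capped_physical  = capped_mb_s * window_hours * 3600 / 1000    # ~346 GB physical per day
max_local_backup = capped_physical * dedup_ratio / 1000        # ~3.5 TB nominal backup
print(required_mb_s, share_of_link, capped_physical, max_local_backup)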

There are several variables, such as payload, available bandwidth, network utilization, time windows, and required SLAs, that may influence the final decision on which mode to choose. Given this mix, and understanding how important each of these factors is to the end user relative to the others, the choice of approach becomes clearer. The two choices are:

Timeframe window, "No Backup Precedence" in ProtecTIER Manager. This is the recommended first-choice mode:
- Defines the start and end of the replication time window
- Begins replication activity as soon as the time window begins
- Replication is prioritized at the same level as backup/restore and competes for the same resources if backup/restore activity takes place during that time frame
- Stops replication activity as soon as the time window ends
- Each cartridge in transit stops at a consistent point after the window ends
- Does not replicate at all outside the time window

Continuous/simultaneous, "Precedence to Backup" in ProtecTIER Manager:
- Data automatically starts replicating to the DR site repository as soon as it is written to a cartridge
- Replication runs faster (up to 100% of available performance) if the system is considered idle (no backup/restore activity)
- Replication is prioritized lower than backup/restore if they run concurrently
- Both compete for the same CPU and disk resources
- Typically, a significantly larger system may be required to enable concurrent operations

Choosing the method of replication

Global/timeframe window: The first-choice method, which imitates the procedure used with physical tapes that are transported to a DR site after backup is completed. This method allows users to keep complete sets of backup data together with the matching backup catalog/DB for every 24-hour period. The user specifies a window during which replication is to take place. This approach allows the backups to finish without replication impacting performance or backup window times.

Continuous/simultaneous: Available and recommended for certain user scenarios, as explained later in this section, and enables concurrent operation of backup and replication activities. This should be considered when a user has consistently lower bandwidth, when the operation calls for a few backup windows spread throughout the day, or when deploying a multi-site scenario, especially across multiple time zones (examples to follow). Care must be taken when choosing this option, as there will be an impact on backup performance; the read performance of the disk subsystem will be shared between the processes needed to deduplicate the backup stream and to read the data to be replicated.

Global replication time frame

The objective of configuring the replication time frame is to control when replication activities occur. This is a system-wide option, meaning it affects all

policies in the system, and is configured at the source (spoke) repository as well as at the receiving (hub) system. There is no replication window definition for the receiving (target) system.

When selecting the replication mode, the user is given two options:
1. No Backup Precedence: This is the dedicated-window mode. With this option, the user can schedule a time window per day during which replication activities occur. This mode of operation should be considered first; for almost all 1-1 replication deployments (one pair), it performs best.
2. Precedence to Backup: This is the continuous mode of operation, in which replication occurs concurrently with the backup operation. This mode is available and recommended for multi-site use cases, as described later in this section.

Figure 15. Set replication time frame screen, showing the same 16-hour replication window for all 7 days

Critical points to note:
- Avoid time-window conflicts when defining time frames at the hub and at the spokes. There is no synchronization mechanism to detect misalignments, so if the user sets the hub and spokes to non-overlapping time slots, replication will never run.
- The user must make sure the hub has enough timeframe slots to accommodate all of the spokes' time frames combined.

Dedicated replication time frame window

In the dedicated mode, the user defines a time slot during which replication operations execute throughout the day. Once this daily period is exhausted, all replication activities are halted at a consistent state and queued until the following day's window, when they resume.
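The first critical point above (hub and spoke windows must overlap) is easy to check during planning. The following is a minimal sketch under simplifying assumptions: windows are (start_hour, end_hour) on a 24-hour clock and are assumed not to wrap past midnight; the hub's overall capacity for all spokes combined (the second critical point) still needs to be assessed separately.

def overlap_hours(a, b):
    """Hours of overlap between two (start, end) windows."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

hub_window = (18, 24)                                    # example: 18:00-24:00 at the hub
spokes = {"spoke1": (18, 22), "spoke2": (8, 12)}         # example spoke windows

for name, window in spokes.items():
    hours = overlap_hours(hub_window, window)
    status = "OK" if hours > 0 else "NO OVERLAP - replication will never run"
    print(name, hours, status)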

The purpose of dedicated mode is to provide separation between backup/restore and replication operations. Because both backup/restore and replication jobs access the same back-end disk repository, contention between the two prolongs both in an unpredictable manner, which may impact the backup SLA and overall RTO. Thus, it is a fundamental requirement that replication tasks do not have to compete with general backup and restore operations for system resources within the dedicated time frame.

Note: The virtual tape library remains fully available to the backup application throughout the dedicated window; no operations are actually blocked. If a backup or restore operation must be performed during the replication time frame, nothing prevents it.

Since backup/restore operations are not impeded in any way, it is important to be cautious when using the system during the replication window. Even if replication is not running at nearly the maximum throughput that the system was configured for, there is a risk of resource contention if backup/restore jobs are executed that use the physical disk spindles used by replication. Therefore, it is recommended to plan for and configure the system resources accordingly, with enough performance allocated for both activities to finish their tasks within the desired time frames.

Continuous replication

The alternative to the time frame option is to have no specific replication window at all. This mode can stress the system resources and impair performance and bandwidth utilization; it can, however, suit certain user environments in which the user has consistently low network bandwidth available and the operational policy calls for a few backup windows spread throughout the day, or in which multi-site spokes may be spread across multiple time zones and some of them need more time or different replication windows to accomplish their load.

In this mode of operation, no specific window is defined and replication activities may run at any time, continuously throughout the day. However, there is a core difference in how system resources are managed in this mode compared to a dedicated window. Within a dedicated replication window, there is no weighing of which internal service request is executed at higher priority; replication and backup/restore resource requests are equally weighted and processed in a first-in-first-out fashion. Although the overall system throughput (backup/restore plus replication) can reach the maximum configured rate, because no precedence is given to backup/restore jobs, the backup duration may be prolonged, and careful planning is needed to ensure that the overall SLA policy is met.

Conversely, in continuous replication mode, ProtecTIER uses a built-in resource-governing mechanism, Replication Rate Control (RRC), that gives precedence to backup/restore requests and throttles down replication traffic whenever backup/restore activity rises above what the system considers the idle state, until the backup/restore workload drops below that idle threshold again. The rate control calculation uses the performance limits (defaults, or values set by the user) to determine the maximum replication rate allowed for that server in both system states, IDLE and BUSY, under any type of operation. The system defaults are based on the maximum expected performance of the specific ProtecTIER node (repository), calculated/entered during deployment.
The rate control is a dynamic function based on real-time machine I/O activity statistics. During the backup IDLE state, the

replication activity receives 100% of the available machine resources, or up to the limit set by the user (whichever is lower). During the time frames when backup/restore activity picks up (the BUSY state), the replication activity receives limited resources equal to either the system defaults (roughly 15% of the system's aggregate performance capability) or the values set by the user (which can be lower or higher than the 15% default). Both the threshold and the minimum allocated resource percentage are pre-configured settings, based on the expected performance of the specific ProtecTIER node entered during deployment. This behavior prevents race conditions over machine resources when backup/restore activity starts, and allows the system to dynamically control the allocation of machine resources between the activities.

For example, assume a system is set to run in continuous mode and is capable of delivering 500 MB/sec. If backups are currently running at 35 MB/sec, replication is not throttled and can use roughly up to 465 MB/s (500 - 35), which is the maximum capability of the system (or up to the limits set by the user). However, when backup throughput increases to 200 MB/sec, replication is throttled down to a lower performance level until the backup throughput decreases. In this case the minimum throughput expected from replication is roughly 75 MB/sec (15% of 500 MB/sec), and the maximum (with no backup activity at all) is about 500 MB/sec (100% of 500 MB/sec).

The rate-control function is designed so that, when replication and backup operations run concurrently, it responds very quickly to changes in the backup activity pattern, so that the overall system resources are allocated and used in the most efficient way between the two activities.

Considerations for choosing continuous operation or a dedicated window - summary

Deploying ProtecTIER replication with a dedicated window is, for most 1-1 replication use cases, the best option for the user. The sum of backup/restore and replication throughput, measured nominally, cannot exceed the overall throughput that the source system is capable of delivering. Thus, when operating in continuous mode, where replication tasks compete for resources with backup/restore operations, it can be difficult to ensure that all replication jobs complete within the mandated RTO time frame. It is therefore recommended to first explore the dedicated replication time frame mode of operation and, when choosing overlapping time frames, to size the performance rate for both activities accordingly.

A primary benefit of the dedicated window is the ability to strictly define when ProtecTIER uses the network infrastructure, allowing the user to accurately isolate the various applications' usage of the network. This mode of operation is aligned with current backup and DR practice, in which users typically manage a specific backup window and schedule cloning or vaulting jobs that follow the backup window.
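The rate-control behavior described above can be sketched roughly as follows. This is a simplified illustration, not the actual RRC algorithm: the 15% busy-state floor comes from the text, while the idle-threshold value and the function name are assumptions chosen only to reproduce the 500 MB/sec example.

def replication_allowance_mb_s(node_max, backup_mb_s, user_limit=None,
                               idle_threshold=50, busy_floor_pct=0.15):
    """Approximate replication throughput the node allows, given current backup load."""
    if backup_mb_s <= idle_threshold:
        # Idle: replication may use whatever backup is not using, capped by any user limit.
        allowance = node_max - backup_mb_s
    else:
        # Busy: replication is throttled down toward the configured floor.
        allowance = node_max * busy_floor_pct
    return min(allowance, user_limit) if user_limit else allowance

print(replication_allowance_mb_s(500, 35))    # ~465 MB/s (system considered idle)
print(replication_allowance_mb_s(500, 200))   # ~75 MB/s (15% floor while busy)
print(replication_allowance_mb_s(500, 0))     # ~500 MB/s (no backup activity at all)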
In certain user environments, however, where the user has consistently low network bandwidth available and the operational policy calls for a few backup windows spread throughout the day, or when deploying a multi-site configuration

where many spokes replicate into the target hub, and especially when they are spread across multiple time zones, a continuous mode of operation may be unavoidable and required to ensure that all spokes (and potentially the hub's local backup operation) can finish their workloads within the allotted time frames.

Choosing a replication mode of operation - visibility switch vs. basic DR

ProtecTIER replication is a two-phase process, and its second phase can operate in one of two ways: visibility switching, which resembles shipping physical cartridges from the source/primary location to the DR site, and basic DR, in which replica cartridges are kept on the virtual shelf at the DR site.

ProtecTIER replication can be conceptually described as a two-phase process. First, the core replication tasks execute to asynchronously move deduplicated cartridge data across the wire to the target/DR site (hub). This phase is entirely managed by ProtecTIER and has only one mode of operation; that is, there is only one way in which data is physically transmitted to the other site. As soon as a cartridge replication job begins at the source, the respective replica cartridge is created on the target system's virtual shelf (unless it already existed). The replica cartridge grows on the target system's shelf as its incoming data is processed. When the replication jobs complete, phase 2 of the overall process comes into effect. This phase, configured via the replication policy, dictates the logical location of the replica cartridge(s). The options are: either the cartridge is simply copied to the target side, or it is actually ejected from the source virtual library. When the source cartridge is ejected from the source system's virtual library, it automatically moves to that library's Import/Export slot. As soon as replication completes for the ejected volume (source and replica cartridges are in sync), the replica can either (a) stay on the target shelf, if it was just copied, or (b) automatically move to an available Import/Export slot in the target library. From here on, option (a) is referred to as basic DR, and option (b) as visibility switching.

General description

Basic DR can be compared to a disaster recovery strategy with physical tape libraries in which cartridges are kept on a physical shelf at the DR site or at a remote storage facility. When the source site fails, physical cartridges are rapidly transferred to the DR location and imported into the standby library. The same notion exists with ProtecTIER when using basic DR mode: cartridges reside on the target system's virtual shelf, ready to be imported into an existing or new virtual library. Granted, the time consumed moving physical cartridges can be far greater than for virtual ones, so this is not an analogy of effort or convenience but of process flow.

Visibility switching, on the other hand, resembles a warm backup site practice, in which physical cartridges are shipped from the source/primary location to the DR site and stored in the standby library's physical slots. When a disaster is declared, the necessary cartridges are immediately available for recovery at the DR site. Visibility switching emulates this process. Beyond serving as a warm DR site, there is a key functional value-add that visibility switching provides to backup environments that support a single domain between the primary and secondary sites.

Added value of visibility switching

In a virtual world, the difference between the previously described modes is minimal in terms of operational overhead. Importing cartridges from the virtual shelf into a library is very fast, requires little effort, and can be done from anywhere (with network access to the system). Storing cartridges on the virtual shelf does not make the DR system significantly less ready for recovery, so the RTO that a ProtecTIER replication-based DR solution can offer represents a significant improvement over a physical tape-based solution.

Furthermore, the major value-add of visibility switching is more versatile management for backup applications that support cross-site distributed tape library management of their catalog, such as within a single domain. Backup applications that can manage multiple sites via a universal catalog/database can leverage ProtecTIER's automated cartridge movement to easily move cartridges from site to site without using any interface other than the backup application itself. After ProtecTIER's replication policy is initially configured to use visibility switching, the backup application can eject a cartridge from a source library, causing the cartridge (pending completion of replication) to appear at an Import/Export slot of a designated DR site library. Likewise, cartridges can be moved back to the source site library using the same process.

Having control of cartridge movement through the backup application simplifies the process of cloning cartridges to physical tape at the target site, if desired. Cutting physical tape from replicas at the target site requires a few additional steps in ProtecTIER if visibility switching is not used. Because backup applications cannot handle the same barcode at multiple sites, ProtecTIER does not permit a cartridge to be visible in two libraries at the same time, even though the data physically exists in both locations. Thus, to cut a physical copy at the DR site without visibility switching, an operator would have to import the replica into the target library after replication completes, and also import the source cartridge back into the primary library when the clone job completes and the replica is ejected back to the shelf.

Some of the major backup applications known to support a single domain are Symantec NetBackup, Legato NetWorker, and IBM System i BRMS. Applications that do not support a single domain gain no real value from the visibility switching mechanism, because each library is managed by a separate entity with no shared knowledge of the replicated volumes' content and whereabouts. In these environments, the local backup server has to work from a recent backup catalog/database that describes the content of the associated data volumes. Every set of replicated data cartridges imported into a target library needs to be preceded by a recent catalog/database update (typically stored on one of the ProtecTIER replicated volumes, but it may also be replicated via other means). After the local backup server at the target site is updated with a recent catalog, the procedure for cutting physical tapes is the same as in a single domain without visibility switching: cartridges must be moved from the shelf into the library, and exported back to the shelf when finished.

One important note, which applies mainly to multi-domain environments, is that ProtecTIER will not replicate a cartridge if both the source and target instances are in libraries.
As such, if a cartridge is not exported to the shelf following a clone, but rather is left in the target library while the source cartridge is moved into a library,

then when new data is appended to the cartridge, replication will not start and errors will be generated. Consequently, when manually moving cartridges into a library, it is recommended to verify the state of the cartridge at the other site.

Use cases to demonstrate features of replication/DR operation

This section includes three flow-chart use cases that demonstrate the daily operational features available with ProtecTIER replication in different backup environments. The following tables show, step by step and per site (primary and secondary), all the actions and activities needed to perform backups, replication, cloning to physical tape at the DR site, and a DR test.

Single-domain scenario

This use case is for backup applications that allow and support sharing of their catalog/DB across sites and locations (potentially in different geographies), such as Symantec NetBackup, Legato, and BRMS. This allows the backup servers at both the primary and DR locations to share the same catalog/DB, which makes the ProtecTIER visibility switch control very effective and makes the operation of cloning cartridges to physical tape at the DR site very simple, as there is no need to bring the backup application online and recover it every time the user wants to clone.

The following table shows the user operation flow when working in a single-domain environment and cloning to physical tape at the DR site. The assumed environment consists of one master NetBackup server managing both sites, including the libraries in both locations (Lib A, A', B).

Table 4. NetBackup user operation flow
Columns: NetBackup server/media | ProtecTIER 1 (local site), Lib A | ProtecTIER 2 (hub/DR site), Lib A' + Lib B (either virtual or physical)

Local site prerequisites:
- Create repository
- Create library(s)
- Install the Replication Manager software module (on the local or DR ProtecTIER node)

DR site prerequisites:
- Create repository
- Create library(s)
- Create Grid A
- Add the repositories to Grid A
- Pair the local and DR repositories
- Create a replication policy to select the cartridges for replication and the specific DR library for the visibility switch

Utilizing the visibility feature use case

Run regular backup activity to Lib A (to cartridges included in the replication policy).

Table 4. NetBackup user operation flow (continued)

- Create a vault policy related to Lib A to manage the export/import of cartridges from/to this library (vault_policy_local).
- Run the vault policy (vault_policy_local) to eject all required cartridges from Lib A. All ejected cartridges move from Lib A to the repository 1 (local) shelf.
- Create a vault policy related to Lib A' to manage the export/import of cartridges from/to this DR library (vault_policy_hub_dr). Cartridges that were ejected from Lib A automatically move from the repository 2 (DR) shelf to the Lib A' Import/Export slots (pending completion of replication/data sync for each cartridge).
- To enter the cartridges into the Lib A' slots, run a command to import them: vltinject <vault_policy_hub_dr>. This command needs to be run in a loop until all respective cartridges are imported into the library.
- Duplicate cartridges to the physical library (Lib B).
- Once the duplication/clone operation is complete, run the vault policy to eject all required cartridges using vault_policy_hub_dr. All ejected cartridges move from Lib A' to the repository 2 (DR) shelf.
- Cartridges ejected from Lib A' move from the repository 1 (local) shelf to the Lib A Import/Export slots. To move the cartridges into the library, run a command to import them: vltinject <vault_policy_local>. This command will need to run in a loop until all respective cartridges are imported into the library.
- Once all the cartridges are imported, they can be used for new backups.

Recovery at the DR site from duplicated/cloned cartridges:
- Restore cartridges from Lib B to the NBU backup server. The NBU catalog already contains all records about the cartridges in Lib A' and Lib B, and can therefore be used immediately for restore operations.
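Table 4 notes that vltinject must be run in a loop until all cartridges are imported. A minimal wrapper sketch is shown below; the fixed number of passes, the pause between them, and the choice to stop after a set count are assumptions, since the appropriate exit condition (for example, checking that no cartridges remain in the library's import/export slots) depends on the site's NetBackup vault configuration.

# Minimal sketch of the "run vltinject in a loop" step from Table 4 (illustrative only).
import subprocess
import time

def inject_all(vault_profile, passes=10, pause_seconds=60):
    for attempt in range(passes):
        # vltinject is the NetBackup vault command referenced in Table 4
        subprocess.run(["vltinject", vault_profile], check=False)
        time.sleep(pause_seconds)

inject_all("vault_policy_hub_dr")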

DR test operation in a multiple (two)-domain backup environment

This use case describes the option of performing a DR test that simulates a disaster at the primary site and the recovery operation from the replicated cartridges at the hub/DR site. The scenario assumes:
- Backups may still run at the primary/local site (as this is just a DR test).
- The DR site has backup servers that are different from, and separate from, those at the local site.
- Some or all of the cartridges may be part of a visibility-switch enabled policy.
- The user can recover the DR site from the replicated cartridges while backups continue running at the local (primary) site; once DR mode is exited, replication of data automatically resumes between the two sites.

Table 5. NetBackup DR test simulation in a two-domain backup environment
Columns: NBU server/media 1 | NBU server/media 2 | ProtecTIER 1 (local), Lib A | ProtecTIER 2 (hub/DR), Lib A' + Lib B (physical library)

Local site prerequisites:
- Create repository
- Create library(s)
- Install the Replication Manager software module (on the local or DR ProtecTIER node)

DR site prerequisites:
- Create repository
- Create library(s)
- Create Grid A
- Add the repositories to Grid A
- Pair the local and DR repositories
- Create a replication policy to select the cartridges for replication and the specific DR library for the visibility switch

DR test use case:
- Run regular backup activity to Lib A (to cartridges included in the replication policy).
- Enter DR mode using the designated ProtecTIER Manager wizard.
- Recover the backup application from the replicated catalog/database (either from the catalog backup cartridge or from other means).
- Rescan the backup application in order to learn the library dimensions.

Table 5. NetBackup DR test simulation in a two-domain backup environment (continued)

- Backups can continue running to Lib A while the system is in DR mode; however, no data is replicated to the DR site until DR mode is exited. As a result, the new backup data to be replicated accumulates as a replication backlog/queue.
- Move the required cartridges from the repository shelf to the required libraries (through ProtecTIER Manager).
- Run the command line to import cartridges that are located in the Import/Export slots; this imports all available cartridges in these slots into the designated library.
- Remember that all cartridges in the library are in read-only mode.

Table 5. NetBackup DR test simulation in a two-domain backup environment (continued)

Restore any required data from these cartridges. To determine which catalog/DB image contains cartridges that completed replication to the DR site:
1. Find the required backup image for restore.
2. Get a list of the cartridges included in this image using the backup application's reporting capabilities.
3. Get the time this image started and completed (such as the backup start and completion timestamps).
4. a. If there is a small number of cartridges, the user can query the ProtecTIER Manager cartridge view for the last sync time of the cartridges from step 2.
   b. If there is a large number of cartridges, the user should use the PTCLI inventory filter command for a cartridge status report. See Chapter 18, "Deploying replication with specific back up applications," on page 187 for more information.
5. If all cartridges report a sync time later than the image time, all cartridges are in sync with the catalog image and it can be used for restore operations.
6. If some cartridges' sync time is earlier than the image's backup start time, those cartridges had pending data that was not replicated but is required by the catalog image; therefore this image cannot be used for restore, and a previous, complete image should be used instead.
7. Scan the catalog/DB for a previous image of the same data and perform steps 5-6 again.

Before exiting DR mode, in order to return cartridges to their original locations from before DR mode was entered, the user needs to eject all cartridges in the library using the backup application GUI:
1. Cartridges that were on the shelf prior to entering DR test mode return to the DR repository shelf.
2. Cartridges that were in library A' following a visibility switch operation move to the DR repository shelf and appear in library A's Import/Export slots.

Table 5. NetBackup DR test simulation in a two-domain backup environment (continued)

- Exit DR mode.
- Run a command in the backup application to rescan the Import/Export slots, in order to import the cartridges that were moved back to the library as part of the eject operation in Lib A'.
- All pending replication data/cartridges resume replicating to the DR site automatically.

Utilizing the visibility switch, performing clone-to-tape and recovery operations, in a multiple (two)-domain backup environment

This use case describes the option of using the visibility switch from site 1 (primary/local) to site 2 (DR) when each site has its own backup application server, each with its own catalog/database. The user clones the replicated cartridges at the DR site to a physical copy for longer retention/recovery purposes. This allows two options for the recovery operation at the DR site (both are presented):
- From the replicated cartridges.
- From the cloned cartridges (if the longer retention period is required, or if both the local and DR sites are in DR mode and the only available repository copy is the cloned one).

Since each site has its own backup application catalog/DB, every recovery attempt at the DR site must be done with the respective catalog/DB replica that includes the complete set of replicated cartridges.

Table 6. NetBackup visibility switch option in a two-domain backup environment
Columns: NBU server/media 1 | NBU server/media 2 | ProtecTIER 1 (local), Lib A | ProtecTIER 2 (hub/DR), Lib A' + Lib B (physical library)

Local site prerequisites:
- Create repository
- Create library(s)
- Install the Replication Manager software module (on the local or DR ProtecTIER node)

DR site prerequisites:
- Create repository
- Create library(s)
- Create Grid A
- Add the repositories to Grid A
- Pair the local and DR repositories
- Create a replication policy to select the cartridges for replication and the specific DR library for the visibility switch

DR use case (with cloning to physical tape)

Table 6. NetBackup visibility switch option in a two-domain backup environment (continued)

- Run regular backup activity to Lib A (to cartridges included in the replication policy).
- Create a vault policy related to Lib A to manage the export/import of cartridges from/to this library (such as vault_policy_local).
- Run the vault policy (vault_policy_local) to eject all required cartridges from Lib A. All ejected cartridges move from Lib A to the repository 1 (local) shelf.
- Create a vault policy related to Lib A' to manage the export/import of cartridges from/to this DR library (such as vault_policy_hub_dr). Cartridges that were ejected from Lib A automatically move from the repository 2 (DR) shelf to the Lib A' Import/Export slots (pending the completion of data replication/sync for each cartridge).
- To move the cartridges into the Lib A' slots, run a command to import them: vltinject <vault_policy_hub_dr>. This command needs to be run in a loop until all respective cartridges are imported into the library.
- Recover the backup application with the replicated catalog (either located on one of the replicated cartridges, or received at the DR site through other means).

Table 6. NetBackup visibility switch option in a two-domain backup environment (continued)

- Use the Rescan option of the backup application to learn the dimensions of, and scan, both libraries (Lib A' and Lib B), as the backup application catalog was rebuilt as part of the recovery operation.
- Restore any required data from these cartridges. To determine which catalog/DB image contains cartridges that completed replication to the DR site:
  1. Find the required backup image for restore.
  2. Get a list of the cartridges included in this image using the backup application's reporting capabilities.
  3. Get the time this image started and completed (such as the backup start and completion timestamps).
  4. a. If there is a small number of cartridges, the user can query the ProtecTIER Manager cartridge view for the last sync time of the cartridges from step 2.
     b. If there is a large number of cartridges, the user should use the PTCLI inventory filter command for a cartridge status report. See Chapter 18, "Deploying replication with specific back up applications," on page 187 for more information.
  5. If all cartridges report a sync time later than the image time, all cartridges are in sync with the catalog image and it can be used for restore operations.
  6. If some cartridges' sync time is earlier than the image's backup start time, those cartridges had pending data that was not replicated but is required by the catalog image; therefore this image cannot be used for restore, and a previous, complete image should be used instead.
  7. Scan the catalog/DB for a previous image of the same data and perform steps 5-6 again.
- Duplicate cartridges from Lib A' to the physical library (Lib B).
- Once the duplication/clone operation is complete, back up the catalog/database to cartridges located in Lib B (to be used during recovery from these cartridges).

Table 6. NetBackup visibility switch option in a two-domain backup environment (continued)

- Eject all cloned physical cartridges from Lib B (including the catalog backup) and save them in a safe location for recovery purposes. The cloned (virtual) cartridges cannot be left in the library, as the next visibility switch iteration will run over the backup application catalog/database, and the cartridges used for cloning would then be considered scratch.
- Once the duplication/clone operation is complete, run the vault policy to eject all required Lib A' cartridges using vault_policy_remote. All ejected cartridges move from Lib A' to the repository 2 (DR) shelf.
- Cartridges ejected from Lib A' move from the repository 1 (local) shelf to the Lib A Import/Export slots. To move the cartridges into the library, the user needs to issue a command to import them: vltinject <vault_policy_local>. This command will need to run in a loop until all respective cartridges are imported into the library.
- Once the cartridges are imported, they can be used for new backups.

Recovery at the DR site from duplicated/cloned (physical) cartridges:

Every cloned box consists of two types of cartridges:
1. The backup application catalog consistent with this complete set of cartridges.
2. The cloned cartridges.

For every box of cartridges that requires recovery, perform the following:
1. Import all cartridges from the box into library B.
2. Recover the backup application from the catalog/database located on one of the cartridges.
3. Rescan the library dimensions.
4. Restore cartridges from Lib B to the backup server.
5. Once the restore operation completes, eject all required cartridges from Lib B back into the box.
6. Continue with the next box.

Table 6. NetBackup visibility switch option in a two-domain backup environment (continued)

Full/selective recovery at the DR site from replicated cartridges:
- Recover the backup application from the latest complete catalog/database located on one of the virtual cartridges in Lib A'.
- Rescan the library dimensions.
- Restore cartridges from Lib A' to the NBU backup server:
  1. If a selective restore is required, scan the catalog/DB for the cartridge containing the exact file.
  2. If a full recovery is required, restore all required cartridges.

Capacity sizing

The ProtecTIER Planner is essential to accurately size and configure the disk and file system infrastructure.

Introduction to the ProtecTIER Planning Tool

Core to any capacity sizing and subsequent configuration effort is the ProtecTIER Planner. The primary methodologies for accurately sizing the required capacity, as well as configuring the disk and file system infrastructure, depend on the use of this tool. The primary function of the ProtecTIER Planner is to enable the field engineer to perform the key activities in properly sizing and configuring the physical disk systems that support the back end of the ProtecTIER implementation. The process starts at a high level, where a general capacity sizing is performed based on key variables within the user's environment. Another aspect of the sizing process is to understand how many metadata and user data file systems are required, based on disk technologies and RAID configurations, to ensure proper performance; the ProtecTIER Performance Planner aids in this process. Additionally, the ProtecTIER Metadata Planner enables the field engineer to understand how many metadata file systems are required to support a repository of a given size, as well as the size of these file systems. There are other utilities within the ProtecTIER Planner, such as upgrading from a previous version (for example, importing historical user data) and customizing disk performance information based on unique user scenarios when planning performance.

For more detailed information on capacity planning for a TS7650 environment, refer to the IBM System Storage TS7650 and TS7650G with ProtecTIER Redbook.

Specific considerations for replication

The following factors should be considered when sizing all primary site (spoke) and DR site (hub) repositories:

- Use similar tools to those used with previous versions, updated for R2.4.
- Users may replicate their entire repository to their DR site, or may select a set or sets of cartridges to be replicated to the secondary/hub site.
- Evaluate the amount of data to be replicated:
  - If all data in the primary repositories at all spokes is planned to be replicated, the DR repository (hub) should be sized at least to the sum of all spokes' local repositories.
  - If only part of the data is replicated from all or any of the spokes, the DR repository (hub) capacity can be reduced accordingly.
- If the user's DR plan calls for using the DR site hub as the primary site for backup activity at the DR site:
  - Evaluate the amount of data to be locally backed up and its retention period at the DR site during a disaster, while the primary site is unavailable.
  - Add this to the hub's repository size needed for replication alone to get the final capacity needed.

Performance and capacity sizing

Performance and capacity sizing are essential for calculating the amount of data to be backed up and the amount of data to be replicated. When conducting the system performance sizing for a ProtecTIER deployment, there are several key factors to consider between the primary sites and the DR site. As discussed at length in the previous chapter, there are two available modes of operation for the replication activity: scheduled replication (with a predefined time window) and continuous replication, which runs concurrently with the backup operation. In most use cases, the scheduled replication approach should be explored first, as it enables the user to accurately plan for performance and better ensures that SLAs are met. The chosen approach has major consequences on the expected or required performance rates from the system and should be discussed thoroughly with the user. The performance requirements are based on calculating the amount of data to be backed up and ingested during the backup window, and the amount of data to be replicated during the replication window. As mentioned, the preferred and thus recommended strategy is to have separate windows for backup and replication with no overlap. In this scenario the system will perform the backup at its peak available performance and then, during the replication window, deliver the same maximum performance for the replication operation.

The following are the key factors to be considered when sizing the system for all spokes and the hub for required performance:
- Assume that backup and replication operations require the same amount of system resources (CPU, I/Os, and so on).
- Sizing performance should take into account the planned mode of operation for the particular deployment:
  - Backup and replication activities running during separate time windows: there is a planned backup window and then a planned replication window, with no overlap.
  - Continuous operation of replication: in this case, backup and replication activity may run concurrently on the same repository.

Under any mode of operation, a system designed for X MB/sec will deliver this maximum throughput:
- Backup/restore alone will run at X MB/sec.
- Replication alone in a time frame mode will run at X MB/sec; however, because of the rate-control function it will be approximately [X MB/sec minus 5%] while in continuous mode.
- When continuous replication is deployed, concurrent backup/restore and replication will yield a maximum of X MB/sec for all activities combined.

Note: While running in continuous replication mode, the system balances the priorities of backup and replication in real time based on a special dynamic algorithm (rate control) that in most cases gives the backup operation a higher priority than the replication activity.

Deployment planning guidelines using an example

The following needs to be determined:
- How much data a spoke can back up daily, if all spokes do the same volume of work.
- How much data a hub can back up daily.

Approach/configurations:
- Quantify a maximum configuration of hub and spokes to be deployed: 12 spokes and 1 hub (two-node cluster).
- Assume an equal backup workload at all of the spokes.
- Assume all backup data at each spoke (the entire repository) is replicated to the hub.
- Assume the hub system is also performing local backups (at the DR site).
- Assume adequate bandwidth between all spokes and the hub.

Assumptions:
- 8 hour backup windows (hub and spokes)
- 16 hour replication window
- All windows are aligned, meaning the 8 hour backup window is the same actual time at all 13 ProtecTIER systems (hub and spokes)
- Adequate bandwidth between all spokes and the hub
- 10:1 deduplication ratio throughout the system
- The data change rate at the spokes does not saturate the physical reception capabilities of the hub

Maximum workloads assumed:

Hub backup:
- 8 hour backup window
- 3.6 TB/hr (1,000 MB/sec)
- 28.8 TB nominal daily backup

Hub incoming replication:
- 16 hour replication window
- 3 TB/hr (850 MB/sec) replication performance
- 48 TB nominal data replicated from the spokes

Spoke backup:
- 8 hour backup window
- 48 TB for all 12 spokes = 4 TB of daily backup data per spoke
- 4 TB / 8 hrs = 500 GB/hr, or 138 MB/sec sustained for 8 hours
- A spoke could potentially back up 28.8 TB of nominal data, but can only replicate 4 TB due to configuration constraints

Primary/spoke repository sizing

Use the ProtecTIER Planner tool for sizing the repository to enable the required performance. In doing so, keep in mind the maximum throughput the configurations are capable of. The figures below are based on a realistic customer workload, assuming properly configured back-end disk arrays for the repository:
- A single-node server is capable of 500 MB/sec of I/O activity (backup and/or replication).
- A two-node cluster is capable of:
  - 1000 MB/sec of I/O activity when only backup is running
  - 850 MB/sec when only replication activity is running
  - 920 MB/sec when backup and replication run concurrently

The following two examples demonstrate the calculation of the required performance from a ProtecTIER system under the two different scenarios, scheduled mode and continuous mode of replication operation:

Example A (scheduled replication mode of operation)
- Backup activity running for 10 hours a day, at a 500 MB/sec ingest rate
- Replication activity running in a separate time slot of 12 hours, at 500 MB/sec
- The repository should be planned to support a sustained rate of 500 MB/sec

Example B (replication activity runs in continuous mode of operation)
- Backup activity running 24x7 at an ingest rate of 400 MB/sec
- Replication runs concurrently and in parallel to the backup activity at 400 MB/sec
- The repository should be planned to support a sustained rate of 800 MB/sec

Bandwidth considerations per spoke are also the same as in V2.3. For example:
- A spoke backs up up to 4 TB/night of nominal data
- ~400 GB of new/unique data must be replicated to the hub (assuming a 10:1 dedupe ratio)
- Assuming a 16 hour replication window: 400 GB in 16 hours requires 7 MB/sec of physical bandwidth sustained for that time period
  - This spoke will need/consume ~38% of one OC3 link

Capacity planning for the spoke is exactly the same as in previous releases; it must consider the following:
- Nominal data
- Data change rates
- Retention times
- Spare capacity

In this example:

- Each spoke ingests 4 TB of nominal data daily.
- Each spoke will need 400 GB of physical space daily for local backups (4 TB at 10:1).
- Total space for 27 incrementals: 10.8 TB physical (400 GB * 27 = 10.8 TB).
- Plus 2 TB for the first "full" backup (4 TB at 2:1 compression): 10.8 TB + 2 TB = 12.8 TB.
- Total physical repository needed for each spoke: 12.8 TB + 10% spare = ~14.1 TB.

Hub repository sizing

Capacity design must accommodate receipt of replication from all of the hub's spokes, receipt of local backups (if applicable), and some spare capacity:

Total hub capacity = physical capacity for the replication of all spokes + local backup capacity + spare capacity

Critical success factors:
- The change rate of each spoke must be assessed, and the physical data transferred per day derived from it.
- The retention of both local and replicated data at the hub must be assessed.
- Total hub capacity must account for all local backups and all replicated data, plus some spare capacity and room for growth.
- More than 10% spare capacity may be left, depending on the number of spokes and their data types.

In this example:

Capacity for replication:
- 24 TB for the first full from all 12 spokes [4 TB * 12 / 2 (2:1 compression) = 24 TB]
- Each spoke will send 400 GB of new data daily (4 TB nominal at 10:1)
- 4.8 TB of new data is received daily at the hub from all spokes (400 GB * 12 spokes)
- 4.8 TB * 27 incrementals = 129.6 TB
- All together: 24 TB + 129.6 TB = 153.6 TB

Capacity for hub local backup:
- 14.5 TB for the first full backup (29 TB nominal at 2:1 compression)
- 2.9 TB of space daily for local backups (29 TB nominal at 10:1)
- 2.9 TB * 27 incrementals = 78.3 TB
- All together: 14.5 TB + 78.3 TB = 92.8 TB

Total space required for the hub in this example: 153.6 TB replication + 92.8 TB local backup + 10% spare capacity = ~271 TB

For the hub system repository planning, the maximum ProtecTIER performance assumptions are:
- A single-node server is capable of 500 MB/sec of incoming replication activity and/or local backup.
- A two-node cluster is capable of:
  - 850 MB/sec of incoming replication activity alone
  - 920 MB/sec when incoming replication and local backup run concurrently
  - 1000 MB/sec when performing only local backup
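The spoke and hub figures above can be cross-checked with a short calculation. The following is a minimal sketch (in Python) of that arithmetic, using only the values stated in this example; the script and its variable names are illustrative and are not part of any ProtecTIER tool.

# Minimal sketch of the spoke and hub capacity arithmetic used in this example.
# All figures (4 TB nominal per spoke, 12 spokes, 10:1 dedupe, 2:1 compression
# on the first full, 27 incrementals, 29 TB nominal hub backup, 10% spare)
# come from the example above; adjust them for a real deployment.

SPOKES = 12
DEDUPE = 10.0          # 10:1 deduplication ratio
FIRST_FULL_COMP = 2.0  # 2:1 compression on the first full backup
INCREMENTALS = 27
SPARE = 0.10           # 10% spare capacity

# Per-spoke repository
spoke_nominal_daily = 4.0                                  # TB
spoke_physical_daily = spoke_nominal_daily / DEDUPE        # 0.4 TB
spoke_incrementals = spoke_physical_daily * INCREMENTALS   # 10.8 TB
spoke_first_full = spoke_nominal_daily / FIRST_FULL_COMP   # 2 TB
spoke_total = (spoke_incrementals + spoke_first_full) * (1 + SPARE)
print(f"Per-spoke repository: {spoke_total:.1f} TB")       # ~14.1 TB

# Hub repository
repl_first_fulls = SPOKES * spoke_nominal_daily / FIRST_FULL_COMP   # 24 TB
repl_daily = SPOKES * spoke_physical_daily                          # 4.8 TB
repl_capacity = repl_first_fulls + repl_daily * INCREMENTALS        # 153.6 TB

hub_nominal_daily = 29.0                                            # TB
hub_first_full = hub_nominal_daily / FIRST_FULL_COMP                # 14.5 TB
hub_daily = hub_nominal_daily / DEDUPE                               # 2.9 TB
hub_local_capacity = hub_first_full + hub_daily * INCREMENTALS       # 92.8 TB

hub_total = (repl_capacity + hub_local_capacity) * (1 + SPARE)
print(f"Hub repository: {hub_total:.0f} TB")                         # ~271 TB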

Reserving local-backup-only space for the hub repository

This feature provides the ability to exclusively assign a fragment of a hub repository's capacity to local backups. It is recommended to reserve backup-only space at the hub whenever local backup is being planned for, and to set the reserved space to the exact backup capacity needed. Because backup has priority over replication, backup may spill over from the reserved space into the replication space (assuming that space has not all been consumed). Replication does not have permission to write in the backup-only reserved space.

In large deployments with many backup target repositories ("spokes") replicating to the replication and backup target ("hub"), a situation may occur where replication tries to occupy all of the space in the hub repository. Since the assumption is that backup takes precedence over replication, this feature ensures that capacity is reserved for local backup while replication is going on, so that replication cannot be written to this storage fragment. Error notifications will appear in the event that the capacity reserved for local backup, or the capacity available for replication on the hub repository, is running out of space.

If there is no replication, backup data can stream into the repository and fill the entire storage capacity. If replication is going on, backup data can stream into the repository and fill the entire space reserved for backup, while replication data streams into the repository and fills the remainder of the storage capacity. Keep in mind that replication and backup data compete for this remaining space. While planning the hub repository, it is essential to subtract the amount that is reserved for backup-only from the total space in order to determine how much repository space is available for replication (or additionally for backup, if the reserved space is exceeded).

Figure 16. Reserve space for backup entry window
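As a simple illustration of that subtraction, the following minimal sketch (Python, with figures carried over from the earlier example purely for illustration) computes the space left for replication once backup-only space is reserved.

# Minimal sketch, assuming illustrative figures: with backup-only space reserved
# at the hub, the capacity available to replication is the remainder of the
# repository (backup may spill into the replication space, but not vice versa).

hub_repository_tb = 271.0      # total hub repository (from the earlier example)
reserved_backup_tb = 92.8      # space reserved exclusively for local backup

available_for_replication_tb = hub_repository_tb - reserved_backup_tb
print(f"Space available for replication: {available_for_replication_tb:.1f} TB")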

Setting limits - continuous backup at the hub example

Use Case I: using the same example as in the previous section, the user has reorganized their hub site and decided that they need 40 TB of daily local backup instead of 28.8 TB.
- An additional 11.2 TB must now be backed up at the hub (40 - 28.8 = 11.2 TB).
- Assuming the entire replication window can be used to complete the daily backup: 11.2 TB / 16 hours (replication window) = 700 GB/hr.
- That translates into 194 MB/sec of additional nominal ingest rate needed to complete the backup during the 16 hour replication window (700 GB/hr / 3600 = ~194 MB/sec).
- This leaves 726 MB/sec of nominal replication throughput: [~920 MB/sec (maximum throughput for backup + replication at the hub) - 194 = 726 MB/sec].
- So if the hub's incoming replication rate is limited to 726 MB/sec nominally, all of the local backup can be processed.
- The impact is that only 726 MB/sec is now available for replication during the 16 hour window, instead of 850 MB/sec.
- The hub can now receive 41.8 TB of nominal incoming replication data during the 16 hour window (726 MB/sec * 3600 * 16 = 41.8 TB).
- This translates into 3.48 TB daily for each spoke (41.8 TB / 12 spokes).
- Assuming the entire repository is replicated at all spokes, each spoke can back up at an average of 121 MB/sec during its 8 hour backup window (3.48 TB / 8 hrs = 435 GB/hr, or 121 MB/sec).
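The following minimal sketch (Python) reproduces the Use Case I arithmetic with the figures above; it is illustrative only, and the names used are not part of any ProtecTIER interface.

# Minimal sketch of the Use Case I arithmetic (figures from the example above).
# It derives the hub incoming-replication limit once daily local backup grows
# from 28.8 TB to 40 TB nominal.

HUB_MAX_CONCURRENT = 920      # MB/sec, backup + replication combined at the hub
REPL_WINDOW_HOURS = 16

extra_backup_tb = 40.0 - 28.8                                      # 11.2 TB
extra_backup_rate = extra_backup_tb * 1e6 / (REPL_WINDOW_HOURS * 3600)
print(f"Extra backup ingest: {extra_backup_rate:.0f} MB/sec")      # ~194 MB/sec

repl_limit = HUB_MAX_CONCURRENT - extra_backup_rate                # ~726 MB/sec
repl_capacity_tb = repl_limit * REPL_WINDOW_HOURS * 3600 / 1e6     # ~41.8 TB
per_spoke_tb = repl_capacity_tb / 12                               # ~3.48 TB
spoke_backup_rate = per_spoke_tb * 1e6 / (8 * 3600)                # ~121 MB/sec

print(f"Hub replication limit: {repl_limit:.0f} MB/sec")
print(f"Nominal replication per 16 h window: {repl_capacity_tb:.1f} TB")
print(f"Per-spoke daily replication: {per_spoke_tb:.2f} TB; "
      f"spoke backup rate over 8 h: {spoke_backup_rate:.0f} MB/sec")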

Figure 17. Setting limits, continuous backup at the hub example

Setting limits - throttling by a conservative limit example

Use Case II: using the same example as in the previous section, the user needs to lower the bandwidth consumption of all ProtecTIER spokes in order to run another network-consuming service.
- Remove the replication time frames so the system can replicate as much data as possible within a 24 hour period, with as little bandwidth consumption as possible.
- Assuming there is still 28.8 TB of nominal local backup at the hub, backup must be ingested at 333 MB/sec for 24 hours (28.8 TB / 24 h = 333 MB/sec).
- Since 920 MB/sec is the maximum combined processing capability at the hub, 587 MB/sec nominal is left for replication (920 - 333 = 587 MB/sec).
- There is now 50.7 TB of nominal capacity available for receiving replication at the hub (587 MB/sec * 24 hr = 50.7 TB).
- So each spoke can replicate 4.22 TB nominal daily (50.7 TB / 12 spokes = 4.22 TB).
- This means each spoke can be limited to approximately 49 MB/sec nominal data replication rate (4.22 TB / 24 hr = ~49 MB/sec), which translates into ~5 MB/sec physical (with a 10:1 dedupe ratio).
- The physical limit in any system state for each spoke can therefore be set to 5 MB/sec.

Summary:
- No time frames; however, the hub still achieves the 28.8 TB of nominal local backup.
- All spokes can replicate at ~5 MB/sec physical continuously and accomplish up to ~422 GB of physical data daily.
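The throttling arithmetic for Use Case II can be reproduced with the following minimal sketch (Python, illustrative only, using the figures above).

# Minimal sketch of the Use Case II (conservative throttling) arithmetic,
# using the figures from the example above.

HUB_MAX_CONCURRENT = 920           # MB/sec combined backup + replication at the hub
DEDUPE = 10.0

hub_backup_rate = 28.8e6 / (24 * 3600)             # ~333 MB/sec sustained for 24 h
repl_rate = HUB_MAX_CONCURRENT - hub_backup_rate   # ~587 MB/sec left for replication
repl_daily_tb = repl_rate * 24 * 3600 / 1e6        # ~50.7 TB nominal per day
per_spoke_tb = repl_daily_tb / 12                  # ~4.22 TB nominal per spoke
spoke_nominal_limit = per_spoke_tb * 1e6 / (24 * 3600)   # ~49 MB/sec nominal
spoke_physical_limit = spoke_nominal_limit / DEDUPE      # ~5 MB/sec physical

print(f"Hub replication rate: {repl_rate:.0f} MB/sec nominal")
print(f"Per-spoke limit: {spoke_nominal_limit:.0f} MB/sec nominal "
      f"(~{spoke_physical_limit:.0f} MB/sec physical)")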

Figure 18. Setting limits, throttling by conservative limit example

Setting limits - limiting a spoke to create a time frame effect

Use Case III: using the same example as in the previous section, the user decides that they would rather run continuous replication from one of the spokes in order to get a better performance balance at that spoke.
- The user removes that spoke's and the hub's replication time frame settings.
- As illustrated in the previous example, when the user removed the hub's time frame the hub was capable of receiving 50.7 TB of nominal data from the 12 spokes.
- Throttle that one spoke to send an additional 2.7 TB nominal daily (50.7 TB - 48 TB (the previous value for the hub's replication reception capability) = 2.7 TB).
- The spoke now replicates 6.7 TB daily (4 TB + 2.7 TB), which means it needs 77.5 MB/sec nominal (6.7 TB / 24 hr = 77.5 MB/sec) for replication. Assuming a 10:1 deduplication ratio, this translates into 7.8 MB/sec of physically replicated data.
- This spoke could now back up 6.7 TB nominal daily.
- Assuming the spoke is capable of 300 MB/sec maximum, it can now allocate 222.5 MB/sec for backup (300 - 77.5 = 222.5 MB/sec).
- This translates into approximately 8.36 hours of sustained backup alongside replication (6.7 TB / 222.5 MB/sec = ~8.36 hrs).

Summary:
- Each one of the other spokes will do the same as it did before, and the non-time-frame spoke will do 6.7 TB of nominal data daily using a 77 MB/sec nominal limit setting.
- It would now take this spoke ~8.36 hours to back up that amount of data.
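The Use Case III numbers can likewise be checked with a minimal sketch (Python, illustrative only, using the figures above).

# Minimal sketch of the Use Case III arithmetic (one spoke runs without a
# replication time frame), using the figures from the example above.

SPOKE_MAX = 300.0          # MB/sec, maximum throughput of the spoke in this example
DEDUPE = 10.0

extra_hub_capacity_tb = 50.7 - 48.0            # 2.7 TB of unused hub reception capacity
spoke_daily_tb = 4.0 + extra_hub_capacity_tb   # 6.7 TB nominal per day
repl_rate = spoke_daily_tb * 1e6 / (24 * 3600) # ~77.5 MB/sec nominal, continuous
repl_physical = repl_rate / DEDUPE             # ~7.8 MB/sec physical

backup_rate = SPOKE_MAX - repl_rate            # ~222.5 MB/sec left for backup
backup_hours = spoke_daily_tb * 1e6 / backup_rate / 3600   # ~8.36 hours

print(f"Replication: {repl_rate:.1f} MB/sec nominal (~{repl_physical:.1f} MB/sec physical)")
print(f"Backup of {spoke_daily_tb} TB alongside replication: ~{backup_hours:.2f} hours")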

Figure 19. Limit spoke to create time frame effect

Scheduling time frames per the previous example

Avoid time window conflicts when defining time frames at the hub and at the spokes:
- There is no synchronization mechanism to detect misalignments; therefore, if you set the hub and the spokes to different time slots, replication will never run.
- Make sure the hub has enough time frame slots to accommodate all of the spokes' time frames combined.

In our example we need to align the time frames of all spokes and the hub to the same 16 hours to allow all data to be replicated:
- All spokes are set from 6 a.m. to 10 p.m. (16 hours total).
- The hub is also set from 6 a.m. to 10 p.m. (16 hours total).
- The 8 remaining hours on each ProtecTIER system are used for backup.
- The hub is planned to process replication from all 12 spokes within the 16 hour replication window. It can then handle its own local backup, as that is done in a separate time frame (after replication completes).
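A simple way to avoid the misalignment described above is to check, before applying the settings, that every spoke time frame overlaps the hub time frame. The following minimal sketch (Python) illustrates such a check for this example; it is not a ProtecTIER utility, the frame values are the ones from this example, and windows that cross midnight are not handled.

# Minimal sketch of a time-frame sanity check (illustrative only).
# Replication only proceeds while the hub and spoke time frames overlap,
# so misaligned windows mean replication never runs.

def overlap_hours(frame_a, frame_b):
    """Return the number of overlapping hours between two (start, end) frames,
    expressed as hours of the day on a shared clock."""
    start = max(frame_a[0], frame_b[0])
    end = min(frame_a[1], frame_b[1])
    return max(0, end - start)

hub_frame = (6, 22)        # 6 a.m. to 10 p.m. at the hub, as in the example
spoke_frames = {f"spoke{i}": (6, 22) for i in range(1, 13)}

for name, frame in spoke_frames.items():
    hours = overlap_hours(hub_frame, frame)
    if hours == 0:
        print(f"{name}: no overlap with the hub time frame - replication will never run")
    else:
        print(f"{name}: {hours} hours of usable replication window")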

Handling the backup application database

This section describes the use of the ProtecTIER IP replication system in a backup and disaster recovery (DR) environment, and discusses the considerations of backup application database (or catalog) management. An overview of the operation of ProtecTIER IP replication in a backup/DR environment is provided, as well as a description of the various scenarios that could occur. Operational best practices are also offered where applicable.

Technical overview

Figure 20 illustrates a typical backup and DR environment using ProtecTIER. The backup application environment is straightforward. The backup application server(s) are connected to storage devices (disk, real tape, or virtual tape) which are used to store data backed up from the clients they serve. Every action and backup set that the backup server(s) process is recorded in the backup application database or catalog. The catalog is always at the heart of any recovery operation; without a valid copy of the database or catalog, restoration of data is very difficult, if not impossible.

ProtecTIER provides a virtual tape interface to the backup application server(s) and allows the creation of tape storage pools - the ONSITE_VTAPE pool in the example in Figure 20. The customer can also maintain another storage pool to create real tapes to take offsite, called OFFSITE_TAPE in our example. The customer has sized the ProtecTIER system to store about 30 days' worth of client backup files on virtual tape. The ONSITE_VTAPE pool is where most client recoveries/restores will come from. The key advantage of this architecture is that it allows restores to occur much faster, as they come from ProtecTIER disk-based virtual tape rather than real tape.

Figure 20. Typical backup and DR environment using ProtecTIER

ProtecTIER IP replication in the backup and disaster recovery environment

ProtecTIER's IP replication function provides a powerful tool for designing a robust disaster recovery architecture. Customers can now electronically vault backup data with much less network bandwidth, changing the paradigm of how data is taken offsite for safekeeping. ProtecTIER IP replication can eliminate some of the expensive and labor-intensive handling, transporting, and storing of real tapes for DR purposes.

Figure 21 illustrates how ProtecTIER's IP replication functionality can be used in a backup and recovery environment. This customer has chosen to use ProtecTIER IP replication to replicate all of the virtual tapes in the ONSITE_VTAPE pool offsite for DR purposes. The customer also backs up their backup application database or catalog to virtual tapes. These database backup virtual tapes are also replicated to Site B. In the event of a disaster, the customer now has the ability to restore their backup server environment in Site B, which is then connected to a ProtecTIER virtual tape library that contains the backup application database (catalog) on virtual tape, as well as all of the client backup files held on virtual tapes in the ONSITE_VTAPE pool.

Figure 21. ProtecTIER's IP replication functionality as used in a backup and recovery environment

Backup application database replication status

When designing a ProtecTIER IP replication environment, one of the most important questions to consider is: what is the recovery point objective? In other words, how much lag time is acceptable between a backup being written to virtual tape in Site A and its being completely replicated to, and available in, Site B?

The RPO for physical tape based DR is typically 24 hours. For example, in a generic use case, backups begin at 6 p.m. on Monday evening and the tape courier picks up the box of physical tapes at 10 a.m. Tuesday morning for transport to the vault. Therefore, on a typical day, there is a 14 hour delay between the time the first evening's backup begins and when the data is safely offsite. However, if a disaster occurs before the courier arrives - for example, a fire destroys the entire set of Tuesday's backup tapes early Wednesday morning - the customer will recover the applications from Monday's backup workload. Monday's workload is, by default, a day behind, providing a 24 hour RPO.

With ProtecTIER IP replication, it is possible to start getting the backups offsite almost immediately and, with enough bandwidth, to replicate them at the same rate as they were backed up. Because ProtecTIER always works within the backup application paradigm, the RPO typically remains 24 hours. The improvements enabled by ProtecTIER are in recovery time (restoring data rapidly from disk) and in the reliability of the DR operation.

The backup application database status is critical to the recoverability of the environment in a disaster recovery situation. In a backup environment, if a good copy of the database is not available, the time-intensive task of doing a level one and level two import of all tapes is required. This can take hours or even days to complete before recovery can begin. Therefore, the status of the virtual tape(s) holding the backup application database is of the utmost importance.

Scenario One in Figure 22 illustrates a situation where the user strategy is to run ProtecTIER replication in a scheduled time frame once the backup cycle is completed. This allows the data to start replicating to the offsite location immediately following the backup window's completion. Our example uses a TSM environment, but the concepts are applicable to all backup application environments.

In the first scenario, assuming the replication window ended at 6 a.m., the disaster strikes at 6:15 a.m., 15 minutes after the last night's replication cycle completed. The status of all virtual cartridges is 100% replicated when the disaster event occurs. The DR restoration activities in Scenario One are therefore straightforward. The user brings up the DR TSM server in Site B using the TSM database backup that occurred at 5 a.m. Monday on Tape 5. The TSM database has knowledge of all prior tapes, and restores can begin immediately from Tapes 2-4, which hold the last client backups.

* See more detail on system sizing in a previous section of this chapter.

Figure 22. Scenario one, replication complete

Scenario Two, shown in Figure 23, illustrates the second possible disaster recovery situation. This disaster event occurs at 5:15 a.m., shortly (45 minutes) before the nightly replication cycle has completed; thus, some of last night's backups have not yet been 100% replicated to Site B. At 5:15 a.m. Monday morning, tapes 4 and 5 have not been completely replicated when the link goes down due to the disaster event. Attempts to restore the TSM database are unsuccessful, as the database had not yet been replicated in its entirety. The user must therefore restore from the last previous TSM DB backup, which is on tape 1 from Sunday at 5 p.m. Since tapes 2 and 3 were created after Sunday at 5 p.m., they are not in the TSM DB and cannot be used. Clients A, B, and C must therefore be restored from tapes that exist in the TSM DB as of Sunday at 5 p.m.

Figure 23. Scenario two, replication incomplete

Scenario Three in Figure 24 illustrates another possibility, where the most recent TSM database virtual tape gets replicated but not all associated tapes have completed replication when the disaster event occurs. As the figure shows, the disaster strikes at 5:30 a.m. (30 minutes prior to the anticipated replication cycle completion). At this point, tapes 1, 2, 3, and 5 have replicated 100%, and tape 4, due to the size of the backup dataset stored on it, has not completely finished replication when the disaster event occurs. The TSM server is restored using the TSM DB held on Tape 5 (backed up at 5 a.m. Monday morning). However, since Tape 4 had not completed replication when the disaster occurred, the tape must be audited and the TSM database fixed to represent exactly what is on the tape. The TSM command would be:

audit volume 4 fix=yes

The audit/fix would need to be performed for every tape that had not been fully replicated when the disaster struck.
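If several tapes were caught mid-replication, the audit/fix step can be repeated for each of them. The following is a hypothetical sketch only (Python): it assumes the TSM administrative command-line client (dsmadmc) is installed on the DR TSM server, that an administrative ID is available, and that the list of incompletely replicated volumes has already been determined (for example, from the ProtecTIER cartridge replication status). The volume names, credentials, and list shown are placeholders, not values from this document.

# Hypothetical automation sketch: run "audit volume ... fix=yes" for every
# virtual tape that had not fully replicated when the disaster struck.
import subprocess

ADMIN_ID = "admin"            # placeholder TSM administrative ID
ADMIN_PW = "password"         # placeholder password
incomplete_volumes = ["4"]    # volumes to audit, e.g. tape 4 from Scenario Three

for vol in incomplete_volumes:
    cmd = ["dsmadmc", f"-id={ADMIN_ID}", f"-password={ADMIN_PW}",
           f"audit volume {vol} fix=yes"]
    # Each audit can be long-running; check the return code before moving on.
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"audit of volume {vol} failed (rc={result.returncode})")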

Figure 24. Scenario three, replication incomplete, DB complete

Reclamation considerations for TSM environments

Another important consideration when using ProtecTIER IP replication in a TSM environment is the effect that reclamation will have. Reclamation is the TSM process that moves expired data off of tapes and moves any unexpired data to other tapes, thus freeing up space and returning empty tapes to the scratch pool. All reclamation tape movement is recorded in the TSM DB and must be replicated to reflect an accurate tape environment in the event of a disaster. If the disaster strikes while reclamation is running and the DB backup available for restore does not reflect the data movement on the tapes, an audit/fix of the volumes may be required. Therefore, TSM can be configured to include a delay before reclaimed tapes are reused. Setting this parameter to 2 or 3 days ensures that even if reclaimed tapes have been replicated, they can still be accessed for DR purposes. See Chapter 7 of this document for more details on TSM implementation with ProtecTIER replication.

Summary
1. Ensure catalog/database backups are performed to virtual tape and replicated along with the daily workload each day. A database backup should be performed and replicated at the end of each backup cycle.
2. A separate tape pool should be created for database backups.

3. Consider adjusting TSM reclamation processing to ensure its actions stay in sync with the replicated database. This includes setting REUSEDELAY to provide a 2 or 3 day delay in the reuse of tapes that have been reclaimed.
4. Consider deploying additional network bandwidth to ensure a replication backlog does not occur. Database or catalog synchronization becomes a greater issue if the backlog exceeds one backup cycle (typically 24 hours).

Deploying native replication

This section describes the processes and information concerning the deployment of native replication.

Adding replication to an existing production system

This section describes the options for upgrading an existing TS7650 ProtecTIER system to one that uses the replication feature to send data to a DR site ProtecTIER system. The specific user scenario will dictate the right approach to upgrading the primary system and deploying the target system. An important detail to keep in mind in the planning process is that all of the nominal data associated with the initial cartridges will need to be physically replicated (there will be no deduplication benefit on the bandwidth during these initial replication jobs). Also, the full contents of each cartridge associated with a replication policy must be replicated, even if only a fraction of the cartridge was used in the latest backup job. For example, cartridge AB1234 is used during the first backup window following the deployment of replication, and 10 GB is written to it; however, it also holds 190 GB from previous backups. All 200 GB will need to be replicated during the first replication window for that cartridge. For this reason, the strong recommendation for a new user of ProtecTIER replication is to start fresh with new cartridges for all the jobs that are associated with the newly created replication policies, so that no previously backed-up data needs to be physically replicated as new data is appended to an existing cartridge.

The deployment of a second ProtecTIER system at a secondary, DR site has a significant implication for the planning cycle: the first replication jobs will consume much more bandwidth than is required once deduplication takes effect. So when preparing for deployment, the field engineer needs to help the user's team plan and prepare the infrastructure and resources that are temporarily needed to support this. The plan will need to take into account the amount of physical data to be replicated, the amount of dedicated bandwidth, and the extra time needed for these first several replication runs. The user may need to allow the first replication job to complete before the next backup activity begins.

Note: A dedicated ProtecTIER V2.4 Software Upgrade and Replication Enablement Guide is also available.

Before deploying the secondary (DR) system

The first step prior to deploying the target ProtecTIER system is to confirm the needed bandwidth (or a conservative estimate of it) with the user. Work with the user's network team to obtain IP addresses, port connections, and additional network cabling, if needed. Verify that the network connections are clean and provide adequate speed. Use the Network Validation Utility available with the ProtecTIER system to verify the amount of bandwidth and the quality of the available user network (see more details on this tool earlier in this chapter).

The second step is to determine the approach the user will take to syncing the repositories between the local and the new DR site systems. As mentioned, the key to a successful deployment here is a gradual replication load approach. The most efficient way to deploy replication with an existing operational ProtecTIER system requires a load throttle that introduces workload into the replication operation (policies) while staying under the available network bandwidth ceiling assigned between the two sites, until steady state is reached. This is the only feasible and practical method to bring the secondary system to the point where it is fully synced with the primary system while using the available network bandwidth that was planned for the steady state, once the deduplication factor takes effect.

For example, if the first backup set to be replicated is 6 TB (full nominal), and the replication window is 10 hours, then the amount of bandwidth that the user will need to allot is: 6 TB / 10 hours = 600 GB/hour for the replication to complete within the 10 hour window. Later on, once the target and the primary systems are completely in sync, the deduplication factor takes effect and, assuming a 10% data change rate for that same job, the network bandwidth needed will be: [(6000 GB * 10%) + (0.5% * 6000 GB)] / 10 hours = 63 GB/hour. This means that for the initial period of time the user needs to allot enough network bandwidth to account for the full nominal size of the data to be replicated; replication policies should therefore be created with a gradual-increase approach in order to stay within the available network bandwidth boundaries and within the replication window time frame.
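The arithmetic of that example can be expressed as a short calculation. The following is a minimal sketch (Python) of the initial-sync versus steady-state bandwidth estimate; the figures are those of the example above and the script is illustrative only.

# Minimal sketch of the initial-sync bandwidth arithmetic from the example above.
# During the first replication runs the full nominal volume crosses the link;
# once the repositories are in sync, only changed data plus overhead is sent.

first_backup_tb = 6.0          # nominal size of the first backup set to replicate
replication_window_h = 10
change_rate = 0.10             # 10% daily data change rate
overhead = 0.005               # ~0.5% replication overhead

initial_gb_per_hour = first_backup_tb * 1000 / replication_window_h
steady_gb_per_hour = (first_backup_tb * 1000 * (change_rate + overhead)) / replication_window_h

print(f"Initial sync: {initial_gb_per_hour:.0f} GB/hour")     # 600 GB/hour
print(f"Steady state: {steady_gb_per_hour:.0f} GB/hour")      # ~63 GB/hour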

Upgrade the existing system

Note: More details can be found in the IBM System Storage with ProtecTIER User's Guide and in the IBM System Storage with ProtecTIER V2.4 Software Upgrade and Replication Enablement Guide.

Once the sync approach is chosen and planned for, follow this step-by-step instruction set, which describes how to upgrade an existing ProtecTIER system to V2.4 and then begin replicating to a target system:
1. Install the target TS7650, either locally for the initial sync, or at the target location (depending on the chosen approach and available network bandwidth).
2. Upgrade the primary system to ProtecTIER V2.4.
3. Add an additional GigE x2 card, if the existing node(s) is a DD1.
4. The installation wizard will confirm that the primary system has enough metadata space (all appliances are already configured with it); if not, it will add it automatically. After this is done, it should be possible to complete all other activity while the primary system is still in use.
5. Install the Replication Manager on one of the TS7650 systems, either the primary or the DR one. If a Replication Manager is already installed in the user's environment from a previous deployment, it may be used and there is no need to install a new one.
6. Create a new replication grid and add the primary TS7650 and the target TS7650 systems to this grid. If a grid already exists for another system in the environment, the existing grid should be used rather than creating a new one.
7. Pair the primary TS7650 system with the remote TS7650 system.
8. Create the replication policy. This step defines which of the previously created tapes need to be replicated. In most cases, the user will probably begin replicating new tapes, rather than all the tapes that were previously written. The policy includes:
- A definition of when the replication window occurs (daily/weekly)
- A definition of which cartridges or set of barcodes are to be replicated

In the initial installation of the primary system, planning for the use of separate tape pools (or virtual libraries) for replication will be beneficial when it comes time to upgrade the system to support the replication feature - for example, one tape pool for tapes that do not need to be replicated and another tape pool for those that do. This will make the creation of the replication policies easier.

If the visibility control switch function will be used, the user should create scripts to automate the ejection of the virtual tapes identified as those to be replicated. This must be done for both the local and the DR systems. See Chapter 2 of the IBM System Storage ProtecTIER User Guide for Enterprise Edition and Appliance Edition, IBM form number GC, for more details.

Note: During the upgrade process, before the replication operation commences, and if at all feasible (that is, both the primary and the DR system are installed in their locations prior to starting the sync process), run the ProtecTIER Replication Network Performance Validation Utility as explained in a separate chapter of this document.

A step-by-step process to enable replication on an operational node

This procedure is applicable to both the TS7650 Appliance and TS7650G servers. To add replication to a node that is part of a two-node cluster, the following procedure must be run on each node. To add replication on a node:
1. Connect, or verify that, a USB keyboard and display are connected to the server.
2. When the localhost login: prompt appears, log in as root and type the password admin.
3. Set the user to ptadmin by typing the command su - ptadmin (the default password is ptadmin).
4. Change to the /opt/dtc/install directory. From the command line, run: cd /opt/dtc/install
5. Run the following command to change the Ethernet port assignments of the server. You should see eth3 and eth4 identified as the Ethernet ports to be used for replication: ./ptconfig -addreplication <enter>
6. At the prompt, type yes <enter> to continue.
7. When prompted, enter the IP address, netmask, and hostname for the first replication port.
8. When prompted, enter the IP address, netmask, and hostname for the second replication port.
9. The vtfd service is restarted and the procedure ends.

Planning for a new installation

This section outlines the process of installing new ProtecTIER Gateways or Appliances at multiple locations (source spokes and a target hub) and configuring replication between them. Prior to the installation, capacity, performance, and bandwidth planning must be complete. Refer to the Introduction and Planning Guide, as well as the Bandwidth Sizing and Requirements section in Part 1, Chapter 2 of this document, for these respective guidelines.

The most efficient way to deploy multiple new ProtecTIER systems utilizing replication between them requires a load throttle that introduces workload into the replication operation (policies) while staying under the available network bandwidth ceiling assigned by the user between the two sites, until steady state is reached. This is the only feasible and practical method to bring the DR system (hub) to the point where it is fully synched with the primary system while using the available network bandwidth that was planned for the steady state, once the deduplication factor takes effect. The key, as mentioned in the previous section, is to introduce workload through backup policies such that the nominal data stays under the bandwidth ceiling available between the sites until the secondary system is fully primed.

Pre-replication installation sequence, source and target sites

The following installation steps are to be performed at all source and target locations. The order in which the sites are installed is flexible; they can be installed in parallel or sequentially, one after the other. The only requirement is that all steps be completed in the order they are presented, at all sites. It is also mandatory that all general ProtecTIER deployment planning guidelines be met, including but not limited to: appropriate floor space, power, cooling, cable runs, IP addresses, hostname assignments, and a completed back-end disk array setup aligned to the implementation capacity planning recommendations. For a complete list, see the Introduction and Planning Guide.

1. Install the TS7650 hardware.
- See Chapter 4 of the Gateway Installation Roadmap Guide for Gateway hardware installation instructions.
- See Chapters 3-4 of the Appliance Installation Roadmap Guide for Appliance hardware installation instructions.
2. Install the TSSC hardware.
- The TSSC code level used for a ProtecTIER V2.4 installation must be at the required minimum level; if the TSSC version is lower, the TSSC OS must be re-imaged using the procedure described in Chapter 4 of the Software Upgrade and Replication Enablement Guide.
- See Chapter 6 of the Gateway Installation Roadmap Guide.
- See Chapter 5 of the Appliance Installation Roadmap Guide for more information on the Appliance.
3. Configure the RAS package on the ProtecTIER server(s).
- The RAS code level used for all ProtecTIER V2.4 installations must be M E9826.
- See Chapter 7 of the Gateway Installation Roadmap Guide for more information on the Gateway.

- See Chapter 6 of the Appliance Installation Roadmap Guide for more information on the Appliance.
4. Perform the RAS verification to validate that the ProtecTIER node(s) established communication with the TSSC, and test the call-home functionality.
- See Chapter 8 of the Gateway Installation Roadmap Guide for more information on the Gateway.
- See Chapter 7 of the Appliance Installation Roadmap Guide for information on the Appliance.
5. Configure the first/single ProtecTIER node via ptconfig install and create the file systems (Gateway only).
- See Chapter 9 of the Gateway Installation Roadmap Guide.
6. Install ProtecTIER Manager on a network-accessible workstation/PC.
7. Use ProtecTIER Manager to build the ProtecTIER repository (Gateway only).
- See Chapters 11 and 12 of the Installation Roadmap Guide for more information.
8. Upgrade ProtecTIER to a dual-node cluster (if installing two nodes) from the second node using ptconfig install, and validate the cluster installation with ptconfig validate (Gateway only).
- See Chapters 13 and 14 of the Gateway Installation Roadmap Guide.
9. Configure the replication IP addresses on all ProtecTIER nodes using ptconfig updatereplicationip.
- Detailed instructions and examples are listed in Chapter 2 of the ProtecTIER User Guide: Adding Replication on a node.
10. Configure the replication network's static routes on all nodes using ptconfig staticroutes.
- Static routes are a simple and effective way of instructing the ProtecTIER IP stack how to route replication IP traffic destined for remote/undefined subnets. This is necessary when the source and target sites do not share the same network subnet, which is typically the case. Without static routes, the ProtecTIER nodes will not be able to communicate across sites, even if the physical Ethernet endpoint-to-endpoint link exists.
- See Chapter 2 of the ProtecTIER User Guide: Routing replication IP traffic, for detailed guidelines on how to define the static routes.
11. Validate that the static routes are properly configured by verifying a successful ping to each cross-site node from each local-site node.
12. Perform the replication network performance validation on all replication subnets to measure the available throughput (assuming this has not already been done).
- See Appendix E: ProtecTIER Replication Network Performance Validation Utility, of the Introduction and Planning Guide for more details.

This concludes the pre-replication installation instructions. At this point, both sites should have a working repository, with validated replication network link(s) between the source and destination sites.

Creating the replication grid, target (hub) site only

The following installation steps describe the process of installing the Replication Manager and configuring the replication grid. The Replication Manager must be installed on only one node, at one site.

Although the Replication Manager can be installed on a source-site node, the best practice is to install it on a target-site node (hub).

1. At the target site, install and activate the Replication Manager software on the first node of the cluster. For more detail on the Replication Manager installation, see Chapter 10 of the Installation Roadmap Guide.
Note: During the installation, when prompted for the Replication Manager IP addresses, enter the local node's eth3 and eth4 IP addresses.
2. Log in to the Replication Manager (using the eth0 IP address of that node) using ProtecTIER Manager.
Note: The default Replication Manager user name and password is gmadmin.

Figure 25. Replication Manager screen

3. Create a new grid.

Figure 26. Creating a new Grid screen

4. Add the source repository to the grid.
Note: IMPORTANT - for the replication IP address, you must enter one of the replication IP addresses (eth3 or eth4) of any source repository node, NOT customer IP addresses (eth0). Leave the port numbers at their default values and use the ptadmin user account for the login information.
Note: This step only has to be done once per repository; if dealing with a dual-node cluster, pick only one node.

Figure 27. Adding a repository to the Grid

5. Repeat the previous step for the target repository.
6. Once both repositories are added to the grid, connect the spokes to a hub under the Repository menu of the Grid Management view.

Figure 28. Connect spokes to a hub

When this operation completes, the replication pair configuration and the grid configuration are complete.

End-user replication configuration and settings, source sites (spokes) only

The third phase of the installation process pertains to the actual replication settings that affect the end user. All of the steps described here can only be performed at the source repository. You can, of course, physically perform these tasks remotely (for example, from the target site), provided you have access to the source system's repository via ProtecTIER Manager.

1. Set the bandwidth throttle.
If the customer wants to limit ProtecTIER's network consumption on either (or both) of the replication network links, you can throttle outgoing traffic (that is, the replication interfaces, only at the source site) via ptconfig bandwidthlimit. See the Bandwidth Throttling section in Chapter 5, Planning for replication deployment, on page 37 for more information.
2. Define the replication time frame through ProtecTIER Manager - continuous or a dedicated window.

See the "Choosing a replication mode of operation - continuous vs. scheduled" section in Chapter 5, Planning for replication deployment, on page 37 for more information.
3. Configure the replication policies.
Before creating replication policies you will need to have a library with cartridges. Consider creating a test library in order to make effective use of a replication policy for testing. See Chapter 6 of the IBM System Storage with ProtecTIER User Guide for details on replication policies.

Options for syncing the primary and DR repositories

In all cases, following the installation of the DR site, it will be necessary to conduct a sync process from the primary to the secondary system. There are two methods that can be used; both focus on gradually adding workload to the replication policies over time, keeping in mind that the network has only the amount of bandwidth that was planned to support the steady state, which takes effect once the deduplication factor kicks in.

Gradual management of policies over time

This is the preferred method, whether the user is deploying a new system or adding replication to an existing system. In this method, the user adds new replication policies over time, while manually ensuring that the total daily volume of replicated data remains within the bandwidth limit. Given that every new replication policy will send its full nominal volume each day, the user can calculate the total bandwidth consumption and stay within the limit. The following steps provide an overview of this process:
1. Following the completion of backup, manually select the tapes that are used in a backup policy intended for replication, and execute the replication policy.
2. Following the completion of the previous replication policies, perform the following within the backup application:
a. Delete scratch tapes.
b. Create new barcodes.
c. Point backup policies to the newly defined tape pool.
d. Repeat a-c until the secondary system is fully primed.

Priming the DR repository at a common locality with the primary system

The option of priming the DR system at the primary site first and then moving it to its DR location has limited practical value, and should not be considered as a first choice, or at all in a multi-site deployment. If that approach is taken, however, the user will have to manage the synchronization process once again when the systems are placed in their final locations. Given that this procedure might introduce weeks of catch-up, the previously recommended approaches should be adopted as the best practice methods for successfully deploying a secondary/DR system. However, if syncing a full, partial, or even newly started repository, either the user should have enough network bandwidth allotted to allow the primary and secondary systems to sync within the available time frame, or the target system could be installed first at the primary location, with the full duration of the sync process scheduled there while the systems are connected locally, to allow the maximum throughput between the servers.
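A minimal sketch of the kind of check described under "Gradual management of policies over time" is shown below (Python). All figures and names are illustrative placeholders, not ProtecTIER parameters; the point is simply to verify, before onboarding another policy, that its un-deduplicated first pass still fits within the link bandwidth and the replication window.

# Minimal sketch: before adding another replication policy, check that the total
# nominal volume still fits in the replication window at the allotted bandwidth.

link_mb_per_sec = 40.0          # physical bandwidth allotted between the sites
replication_window_h = 12

# Nominal daily volume (TB) of each policy already replicating, plus a candidate.
existing_policies_tb = [2.0, 1.5]
candidate_policy_tb = 3.0

def window_hours_needed(nominal_tb, dedupe_ratio):
    """Hours of replication window needed at the allotted link speed.
    New policies send their full nominal volume (dedupe_ratio = 1) until primed."""
    physical_mb = nominal_tb * 1e6 / dedupe_ratio
    return physical_mb / link_mb_per_sec / 3600

hours = (sum(window_hours_needed(tb, 10.0) for tb in existing_policies_tb)
         + window_hours_needed(candidate_policy_tb, 1.0))
print(f"Estimated window needed: {hours:.1f} h of {replication_window_h} h")
if hours > replication_window_h:
    print("Defer this policy until the earlier policies are primed.")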

Chapter 6. Replication solution optimization

Performance

General replication performance considerations

A single-node ProtecTIER system has 2 x 1 Gig dedicated replication Ethernet ports:
- It can deliver a maximum throughput of 190 MB/sec in physical data-transfer terms.
- It delivers up to 500 MB/sec replication throughput of nominal data (the amount of data before deduplication).

A dual-node clustered system has 4 x 1 Gig dedicated replication Ethernet ports:
- It can deliver a maximum throughput of 380 MB/sec in physical data-transfer terms.
- It delivers up to 850 MB/sec replication throughput of nominal data (the amount of data before deduplication).

Increasing utilization of the ProtecTIER system with the replication feature

The following guidelines are aimed at increasing the utilization of the ProtecTIER system specifically while implementing the replication feature:
- The system will be at maximum replication throughput when it has 128 GB or more of nominal data in the replication queue.
Note: When there is no other I/O operation consuming resources and the operation runs both backup and replication activities concurrently, the peak throughput will be the maximum of the ProtecTIER server's capabilities.
- When creating a library, the user gets a default of 8 Import/Export slots. If the user would like to use the visibility control feature, or implement a DR strategy that requires a larger number of cartridges to be ejected, a low Import/Export slot count will slow the overall process down. To remedy this, the user should allocate more than 8 Import/Export slots for DR purposes (the maximum value is 1022 per library, 4096 per repository).
- When moving replicated cartridges from the virtual shelf to a library, the backup application needs to scan and find the new cartridges in the Import/Export slots. This is a time-consuming action. Alternatively, the user can create additional empty slots (for example, when creating the library choose X cartridges and X+100 slots); the slot locations will already be known to the backup application, which reduces the time needed to search for them.
- While in continuous replication mode, and when backup activity is running concurrently with replication, the backup and the replication operations will share all of the system resources.
- As a default, when running scheduled replication in its own time window, the throughput is slightly better than when running in continuous replication mode (which has Rate Control active) while the system is idle. This is because, in the scheduled replication mode of operation,

the assumption is that there will be no backup activity during that time window. The Rate Control mechanism, which takes effect specifically when using continuous replication mode, allows the replication activity to gracefully throttle itself down when backup activity starts in parallel.

Automation of daily operation

This section outlines the operational concepts of ProtecTIER with Native Replication in a typical user environment and introduces the available options to automate it. The focus of this discussion is to suggest ways to automate the movement of virtual cartridges from the primary site to the secondary (DR) site, in order to clone them to physical tape for archiving and longer term retention.

In any user environment, the backup application manages the injection (import) of cartridges into a library. Cartridges that were moved from the virtual shelf into a VTL Import/Export slot will reside there until moved again by the backup application into a specific library. In a typical ProtecTIER replication environment that is designed to support a basic DR strategy with no cloning to physical tape, cartridges are replicated by the system from the primary site to the DR site virtual shelf and stay there, unseen by the backup application, until they are manually moved by the user - under a DR scenario or a DR test scenario - from the shelf to the Import/Export slots of the DR site library. At that time the DR site backup application server can take over the repository and enable the user to restore data, or continue the backup operation at the secondary site.

Alternatively, if the user's operation calls for creating physical tape copies at the secondary site (outside of the main production environment), the user can utilize ProtecTIER's visibility control switch when creating the replication policies. In this scenario, once replicated, cartridges are automatically moved from the DR site virtual shelf into a designated library's Import/Export slots, and the DR site backup application server is made aware of it. This process can be automated from within the backup application environment, especially when it enables a single-domain environment (a shared catalog across sites). When using this approach, there are some conceptual points to remember:
- The inject command is backup-application dependent and is identical to the way physical cartridges are injected into the library from the Import/Export slots.
- The number of cartridges in a virtual library can be significantly higher than in a physical library. If a large number of cartridges are requested to be imported to a library, it is recommended to script the inject command to ensure cartridges are inserted into the library while freeing the Import/Export slots for other cartridges.

As mentioned, the automation process of moving cartridges between sites and performing the clone-to-physical-tape operation at the secondary site is more suitable in single-domain backup application environments. Some of the major backup applications, such as NetBackup, Legato, and BRMS, allow for this type of environment.

Note: More details and tips on using the visibility control switch are provided in Chapters 2 and 6 of this document.

The following is an example of a possible automation opportunity within a NetBackup backup application environment.

Example of eject/inject commands for Symantec NetBackup:

Using a vault profile:

1. Create a vault profile for ejecting cartridges from a library.
2. Eject cartridges from the library using the vault profile: vltrun <vault profile name>
3. Inject the cartridges at the DR site into the library: vltinject <vault profile name>

Using eject/inject commands:
1. Eject cartridges from the library:
vmchange -res -multi_eject -ml ... <barcodeA:barcodeB:..:barcodeZ>
2. Inject cartridges into a library:
vmchange -res -multi_inject ... -ml <barcodeA:barcodeB:..:barcodeZ>

Using the inventory command to inject cartridges:
1. The following scans the Import/Export slots and injects all available cartridges into the library:
vmupdate -rt <robot type> -m <robot #> -empty_map
2. Scripting the inject commands:
- Vault and inject/eject commands can be scripted to run periodically on the backup application host.
- This triggers automatic cartridge movement from the Import/Export slots to the library whenever the relevant cartridge is located in an Import/Export slot, preventing the system from running out of free Import/Export slots.
- Scripting the inventory command is not recommended, as it scans the robot and may therefore take a long time to complete on libraries with large numbers of cartridges.

Example of a vault script:

#!/bin/csh
while (1)
    vltinject myvault
    sleep <interval>
end


Chapter 7. Recovery management

The dirty bit (in-sync) attribute helps in understanding consistency points during DR.

Dirty bit feature

The dirty bit (in-sync) attribute helps in understanding consistency points during DR. When the dirty bit is off (shown as a check mark) for a cartridge at the hub, that cartridge is fully synchronized with the spoke. During DR this means that the cartridge is fully synched, not only at the consistency point, but also after the last replication took place (the two might be the same). When a cartridge is out of sync at DR time, the time at which the cartridge was last fully synched must be explored in order to determine which consistency point it adheres to. In order to understand all of the above use cases and best utilize the dirty bit (in-sync) attribute, the user can take advantage of some new ptcli commands.

Note: The following applies after an upgrade: all of the known information on a cartridge (available in V2.3) will still be displayed; however, the dirty bit "In-Sync" column will be empty until there is activity following the V2.4 upgrade (backup and/or replication), after which the in-sync value will be populated accordingly (see the following figure).

The following figure shows a system that was just upgraded from V2.3 to V2.4. Some new activity has already occurred following the upgrade, and the status of cartridges with a character under In-Sync (either an X or a check mark) reflects that activity.

Figure 29. Shelf cartridges status view at the hub

PTCLI
This section describes how to best configure and monitor the ProtecTIER server (especially during a DR situation) through the command line interface (CLI) using ptcli. The ptcli is loaded during the installation of the ProtecTIER software and the ProtecTIER Manager software.

Creating a profile
It is necessary to create a profile prior to accessing ptcli.
1. Activate ptcli with -p followed by a file name with full path.
2. Once prompted, enter the desired user name and password.
This step creates the user-specified file with the user name and password needed for login.

Example:
   ptcli -p h:\ptcli\ptuser
   User name: ptuser
   Password:
   <?xml version="1.0" encoding="UTF-8"?>
   <response command="createprofile" status="success" />

Usage
ptcli can be used to do the following:
- Configure ProtecTIER (including configuration of a ProtecTIER repository and configuration of ProtecTIER VT libraries)
- Monitor ProtecTIER (including statistics of the ProtecTIER VT service and statistics about the ProtecTIER repository)
- Snapshot and filter ProtecTIER VT cartridges (mostly used for DR scenarios)

To run ptcli on a ProtecTIER node, change to the following directory: /opt/dtc/ptmanager. To run ptcli on a host running ProtecTIER Manager, change to the ProtecTIER Manager directory (in Windows: C:\Program Files\IBM\ProtecTIER Manager).

Queries
This section describes what a query is and how to use it in the ProtecTIER server command line interface (CLI). A query is a statement in which the user can put any of the following:
- White spaces
- Numbers
- Tokens (and/or/not/is/in/between)
- String literals (within single quotes): 'AB0000'
- Boolean (TRUE/FALSE/true/false)
- Column names as defined by the column names of the query type
- Date: the date format in a query is as follows: datetime('yyyy-mm-dd hh:mm:ss')

Results can be saved in a .csv file using the output command switch. The .csv file can be used as input to a CLI move command. This .csv file can be partially edited by the user by removing lines (each line represents a cartridge).

Users can also create their own barcodes file and use this file as input to a move command.

Cartridge sets and query type
A filter always works on a specified set of cartridges:
- All: All cartridges in the ProtecTIER repository.
- Replica: All cartridges that were replicated into this repository.
- Origin: All cartridges that were replicated from this repository.
The set of cartridges is stated in the CLI command as 'querytype'. So the query type can be: all, replica, or origin.

Inventory command options for a DR scenario
The Inventory command options are used to filter cartridges in a ProtecTIER repository using a variety of criteria. These options can also be used to move cartridges that match certain criteria. Before beginning to filter and/or move cartridges using the CLI, you must first create a snapshot of the cartridges using the InventoryRefresh command. The snapshot includes the most up-to-date properties of the cartridges at the time it is created.

Note: Any filter/move operation is executed using the snapshot's contents. Running such a command without a previous refresh is considered an error. Also, for larger repositories, a refresh operation may take considerable time and reduce ProtecTIER's performance during that time. Moving a cartridge using the CLI may fail if the snapshot is not up to date for that cartridge (for instance, if the cartridge is moved or deleted after the snapshot is taken). Operations on libraries for which the snapshot is not up to date may have undesirable consequences.

Disaster recovery with ptcli
It is recommended to use ptcli in order:
- To determine the consistency point at the hub, when all cartridges from a specific spoke were fully synced (replicated).
- For automatic bulk movement of cartridges from shelf to library and back.
- To identify the new cartridges that were created at the hub during DR mode, and bulk-move them to a shelf so they can be replicated back to the spoke.

Keep in mind when using ptcli commands outside of DR mode:
- The ptcli snapshot may take up to 15 minutes to be created and populated.
- The snapshot is static; it reflects the state of all cartridges only at the point in time when it was taken.

A DR scenario
Assume a disaster occurs at a spoke while replication is running:
- A DR condition for the specific spoke is declared at the hub.
- The user turns to the DR site (the hub) with the goal of determining the last full backup so its data can be recovered.
- The user can use ptcli to sort through the repository and determine which cartridges were in sync at the time of the disaster.

Disaster recovery operations
- The user can use ptcli to determine which cartridges were not in sync at the time the last full backup at the spoke was completed.
- The user decides which cartridges to use at the DR site and uses ptcli to move them (all or some) from the shelf to the library.

The latest ptcli automates numerous operations that the user had to perform manually in V2.3:
- Results can be saved in a .csv file using the output command switch.
- The resulting .csv file can be used as input to a CLI move command.
- The resulting .csv file can be edited; the user can remove lines (each line represents a cartridge).
- Users can also create their own barcodes file and use it as input to a move command.

Useful DR queries
1. Creating a snapshot (the first thing to do before running the other queries):
   ./ptcli InventoryRefresh --ip xxx.xxx.xxx.xxx --loginFile <profile file>
2. Getting all in-sync cartridges:
   ./ptcli InventoryFilter --ip xxx.xxx.xxx.xxx --querytype replica --query "in_sync = true" --loginFile <profile file> --output /tmp/not_dirty_carts
3. Getting all not-in-sync ('dirty') cartridges:
   ./ptcli InventoryFilter --ip xxx.xxx.xxx.xxx --querytype replica --query "in_sync = false" --loginFile <profile file> --output /tmp/dirty_carts
4. Getting all cartridges synced with the destination within a certain time range on the source:
   ./ptcli InventoryFilter --ip xxx.xxx.xxx.xxx --querytype replica --query "source_time_for_last_sync_point > datetime('yyyy-mm-dd hh:mm:ss')" --loginFile <profile file>
5. Getting all cartridges replicated to repository 18 in grid 1:
   ./ptcli InventoryFilter --ip xxx.xxx.xxx.xxx --querytype origin --query "destination_repository_id = 18 and destination_grid_id = 1" --loginFile <profile file>
6. Getting all cartridges in the barcode range BR0300 to BR0330:
   ./ptcli InventoryFilter --ip xxx.xxx.xxx.xxx --querytype all --query "barcode > 'BR0300' and barcode < 'BR0330'" --loginFile <profile file> --output barcodes_file
7. Moving all in-sync cartridges to the shelf:
   ./ptcli InventoryMoveFilter --ip xxx.xxx.xxx.xxx --querytype replica --query "in_sync = true" --destination shelf --loginFile <profile file>

Disaster recovery is the process of recovering production site data at a DR location, which was the target of the replication operation from the primary site prior to the disaster. In case of a disaster, or a situation where the production (primary) site has gone offline, the hub (DR site) can take the place of the production site until the primary site comes back online. When the primary site comes back online, previously replicated as well as newly created tapes can be moved to the main production site using the failback process, so that it can once again become the production/primary site.
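For DR drills, the queries above can be chained into a small wrapper. The following is a minimal sketch, assuming a bash shell in the ptcli directory on the hub; the IP address, profile path, and the decision to move all in-sync cartridges are placeholders/assumptions, and option spellings (for example --loginFile) should be checked against the ptcli help of your installed version.

   #!/bin/bash
   # Hypothetical DR helper around ptcli; all values below are placeholders.
   HUB_IP=10.0.0.1              # replication hub (DR repository) IP address
   LOGIN=/opt/ptcli/ptuser      # profile file created earlier with ptcli -p

   # 1. Refresh the cartridge snapshot; all filter/move commands use this snapshot.
   ./ptcli InventoryRefresh --ip "$HUB_IP" --loginFile "$LOGIN"

   # 2. Save the in-sync and out-of-sync replica cartridges to csv files for review.
   ./ptcli InventoryFilter --ip "$HUB_IP" --querytype replica \
           --query "in_sync = true"  --loginFile "$LOGIN" --output /tmp/not_dirty_carts
   ./ptcli InventoryFilter --ip "$HUB_IP" --querytype replica \
           --query "in_sync = false" --loginFile "$LOGIN" --output /tmp/dirty_carts

   # 3. Optionally move every fully synchronized replica cartridge to the shelf
   #    (query 7 above); comment this out if only the reports are wanted.
   ./ptcli InventoryMoveFilter --ip "$HUB_IP" --querytype replica \
           --query "in_sync = true" --destination shelf --loginFile "$LOGIN"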

Once the user enters ProtecTIER Disaster Recovery (DR) mode, all incoming replication and visibility-switching activities from the failed production site to the DR site (hub) are blocked. When that primary site is rebuilt or replaced, the user can return the data to that primary site and continue the backup and replication operations. If the user must stop using the primary repository, for example because the repository is down or has been destroyed, and wants to use the DR repository as the backup target, the user must enter DR mode at the DR repository to start working with the DR site as the primary site.

This chapter summarizes the different stages in handling a disaster scenario:

1. Moving to ProtecTIER DR mode
The main consideration is to keep the move to DR mode very simple, for both the user and the operation, in order to avoid potential complications and human errors when handling an emergency situation. The user should open the special DR wizard in ProtecTIER Manager at the DR site and initiate the DR mode procedure, which blocks all replication activities (including visibility switch) at the DR site from the particular failed primary site (and at that primary site as well, if it is still up, in the case of a DR test). Note that the key processes of recovering the backup application catalog and recovering data from tapes are covered in detail in the subsequent chapters.

Figure 30. Moving into ProtecTIER DR mode

2. Working at the DR site

Replicated cartridges at the DR (hub) repository that were created at a primary site (spoke) are in read-only mode. If the user needs to run local backups at the DR site, they have to use new, locally created cartridges.

3. Failback for all data at the DR site
The purpose is to allow the user to return to the original working mode at the primary site. The DR site needs to be able to use a one-time policy that holds all cartridges to be replicated to the primary site. This special policy can also transfer the principality attribute of the relevant cartridges that were created at the DR site while the primary site was down, to the primary repository or to its substitute in case the original primary repository is destroyed. Note that the failback procedure can only be initiated in DR mode.

Figure 31. Entering the ProtecTIER Replication Failback wizard

Principality
Principality is the privilege to write to a cartridge (set it to R/W mode). The principality of each cartridge belongs to only one repository in the grid. By default the principality belongs to the repository where the cartridge was created. The cartridge information file includes a principality Repository ID field. Principality can be transferred from one repository to another during the failback process. The principality of a cartridge can be transferred in only one of three cases:
1. Principality belongs to the DR repository.

2. Principality belongs to the original primary repository and this site is the destination for the failback.
3. Principality belongs to the original primary repository, but:
   a. The original primary repository is out of the replication grid, and
   b. The target for the failback is a repository that was defined as a replacement repository through the ProtecTIER repository replacement procedure (see below).

Repository replacement
A repository replacement is used when you want to fail back to a different or rebuilt repository. If the user wants to fail back to a different or rebuilt primary repository, the following procedure is needed:
1. The user cancels the pairing of the original repositories in the Replication Manager.
2. The user takes the original primary repository out of the replication grid.
3. If a new repository replaces the original one, the new repository has to be installed and joined to the replication grid.
4. If it is an existing repository, it has to be out of a replication pair.
5. The user runs the ProtecTIER repository replacement wizard and specifies which repository is replacing which.

Returning to normal operations
Once a disaster situation ends, the primary site is either back online or rebuilt and now has an empty repository. Failback is the procedure for replicating updated cartridges, new or old, from the DR site back to the original (or restored) production site, to bring it up to date in case it was down, or lost and rebuilt. If the primary repository was down and has been restored, you can return to normal operation with the production site as the primary site and use the DR site as the DR (secondary) site again. Define a failback policy on the DR repository and select all the cartridges that were used for backup during the time the primary repository was down. The failback policy procedure transfers the principality of all the cartridges that belonged to the temporary primary repository at the DR site to the restored primary repository at the production site.

The following is the procedure to initiate the failback process using the ProtecTIER Failback wizard:
1. The user should define a policy with all the cartridges that need to be transferred.
2. ProtecTIER Manager/vtfd creates a policy in failback mode with the transfer-principality option. This policy can be executed only manually, and the system log ignores run-time events for this policy.
3. The user should approve the execution of the policy; this is translated to manual execution of the policy in the VTL.
4. Cartridges will be replicated only if they follow the principality rules described earlier.

5. Before replication is initiated, cartridges are ejected out of the library to the shelf at the DR site.
6. At this point the user can close the failback wizard.
   Note: The user cannot perform any editing action on a failback policy through the normal view.
7. The system supplies and monitors information about the number of pending, running, and completed objects.
8. ProtecTIER Manager presents this information to the user, so the user can conclude when the failback process is complete.
9. The user is expected to delete the policy once the failback objects are replicated.

For more details, see Chapter 13 and Appendix A of the IBM System Storage ProtecTIER User Guide for Enterprise Edition and Appliance Edition, IBM form number GC...

Figure 32. Leaving ProtecTIER DR mode

Taking over a destroyed repository
Cartridge ownership, or principality, is the privilege to write to a cartridge (that is, to set it to R/W mode). The principality of each cartridge belongs to only one repository in the grid, and, by default, principality belongs to the repository in which the cartridge was created. Cartridge ownership takeover is used to allow the local repository, or hub, to take control of cartridges belonging to a destroyed repository. The repository can only take ownership of a cartridge if the repository is defined in the Replication Manager as the replacement of the destroyed repository. Taking ownership of the cartridges of a destroyed repository allows the user to write to the cartridges previously belonging to the replaced (destroyed) repository.

Flushing the replication backlog
In case of an unscheduled, long network or DR-site outage, the replication backlog may become too large for the system to catch up on. A persistent replication backlog may be an indication that not enough bandwidth is allocated for the replication operation. Remember that for the system to meet the organization's SLAs, enough bandwidth must be planned and allotted for replication during the time frame in which it runs (the replication window), so that all the policies are executed in time.

1. There are several ways to delete replication backlog:
   - Abort all replication tasks associated with a specific policy. In ProtecTIER Manager, in the replication policies view, select the policy and press the Abort activities button.
   - Abort specific running replication activities. In ProtecTIER Manager, in the replication activities view, select the specific activity and press the Abort activities button.
2. Aborting replication tasks removes them from the pending/running tasks.
3. These tasks will rerun automatically whenever the specific cartridge is:
   - Appended
   - Ejected from the library
   - Selected for manual execution
4. To prevent these replication tasks from rerunning, mark those cartridges as read-only either on ProtecTIER or in the backup application. These cartridges will not be used for further backups and therefore will not replicate the backlog. New/scratch sets of cartridges will be used for subsequent backups; they will not contain any backlog data that is not required to be replicated.

Note: To resume I/O activity, you should use different barcodes, because the earlier data on the older cartridges would have to be replicated first and the backlog would become huge again. Using a different set of barcodes allows the new data to be replicated without the need to replicate the data from the old cartridges.

Chapter 8. Using the Visibility Switch Control feature

Visibility control is the means to imitate a move (as opposed to a copy) of a tape/cartridge from one location to another. Cartridges can only be present and available at one location at a time. ProtecTIER replication software can emulate a move using the VTL Import/Export slots the same way a physical tape library does. This method of moving cartridges from one location to another is very useful in single-domain backup environments. For example, backup applications such as NBU, Legato, and BRMS can be deployed in a single-domain environment (check the specific backup application literature for more details). In these environments there is one catalog across multiple sites. If a user uses the VTL Export/Import slots to move tapes from one site to the other, the catalog is always updated and aware of the whereabouts of all cartridges.

If you are planning on using the VTL Export/Import slots, you may want to increase the number of available slots to the maximum of 1022 (the default when creating a library is 8). The ProtecTIER Manager Imports/Exports tab displays detailed information about the import/export slots in the selected library.

Table 7. Information about the import/export slots in a VTL library
Column - Definition
Imp/Exp No. - The number of the import/export slot.
Address - The import/export slot's address number.
Barcode - If the import/export slot contains a cartridge, this column displays the cartridge's barcode.
Capacity - If the import/export slot contains a cartridge, this column displays the cartridge's estimated data capacity in megabytes.
Data Size - If the import/export slot contains a cartridge, this column displays, in megabytes, the amount of nominal data currently stored on the cartridge.

ProtecTIER VTL cartridge handling
This section outlines the manner in which ProtecTIER manages cartridges, library slots, and Import/Export slots of the virtual library. When utilizing the ProtecTIER cartridge Visibility Switch Control feature, the user should consider the following:

Cartridge slots inside the ProtecTIER VTL: A cartridge imported into a VTL library is placed in an empty slot. The operation of moving a cartridge from an Import/Export slot into a specific library slot will fail if there are no free slots in the library at that time. Therefore, the DR site libraries should be created with enough free slots to host replicated cartridges before the user attempts to move them into a library following a DR scenario, when the DR site becomes active.

Import/Export (I/E) slots: These VTL slots are used to inject or eject cartridges to or from a specific virtual library inside the repository. The default number of Import/Export slots configured during the Create library operation is 8. It is recommended to increase this number to the maximum number of Import/Export slots supported by the backup application (up to 1022 per ProtecTIER virtual library), because this increases the capability of the backup application to import or export cartridges to or from a ProtecTIER VTL library.

Visibility Control Switch: Although this ProtecTIER feature can be used within any backup application environment, this mode is more suitable and recommended when the backup application master server controls both the primary and DR libraries, such as in a single-domain backup environment, where multiple sites share the same catalog/DB. In this scenario, the backup application master server knows the content of the ejected cartridges at the primary site as well as the injected cartridges at the secondary (DR) site, since the shared catalog/DB already contains entries for them. Furthermore, the catalog is always updated with the whereabouts of all cartridges at both sites.

When to export cartridges: Cartridges should be exported (ejected from the library) immediately after the backup is completed, to prevent the backup application from using them for subsequent backups. The recommended approach is to use the same strategy for exporting (ejecting) cartridges as is practiced by the user when handling physical tapes.

When is a cartridge available at the DR site when utilizing the Visibility Switch Control feature: Once the exported/ejected cartridges from the primary site are fully synced (have finished their replication) at the DR site, they are immediately moved to the DR library import/export slots. Cartridges that are not in sync between the primary and remote repositories have to transfer (replicate) their data in full, and only then are they moved to the DR library import/export slots. The user can check the DR library for cartridges that completed this stage and were moved into the library Import/Export slots. The user can monitor the replication tasks view at the primary location's ProtecTIER Manager to learn the status of the cartridges that are still transferring data to the secondary site. Alternatively, the user can create a cartridge report at the DR site to learn which cartridges have already finished replication and were moved into the library Import/Export slots.

Chapter 9. LUN Masking

LUN Masking is used to control device visibility by allowing specific devices (such as tape drives or robots) to be seen only by a select group of host initiators. This feature allows users to assign specific drives to a specific host running backup application modules. It enables multiple initiators to share the same target FC port without having conflicts on the devices being emulated. The LUN Masking setup can be monitored and modified at all times during system operation.

LUN Masking in ProtecTIER influences the visibility of the devices by the host systems. Keep in mind that every modification of the LUN Masking in ProtecTIER may affect the host configuration and may require re-scanning by the hosts.

By default, LUN masking is disabled. Without LUN Masking, each backup host is either limited to one front-end port or exposed to all other hosts' virtual devices. When LUN masking is enabled, no LUNs are assigned to any host; the user must create LUN groups and associate them with backup host(s). When defining backup host aliases, use a practical naming scheme as opposed to just WWNs (example: hostname-fe0).

With more than two backup hosts, it is recommended to use LUN Masking to load-balance VTL performance across multiple front-end ports. LUN Masking is also recommended for establishing two or more front-end paths to a backup server for redundancy. For example:
- In environments with few backup servers (one or two), the user may choose to dedicate front-end ports per server rather than use the LUN Masking feature.
- In environments where front-end ports are shared and the user wants to prevent backup hosts from sharing virtual devices, LUN Masking should be used to isolate each backup host.

Working with LUN masking groups
Enable LUN Masking through the VT main menu; LUN masking is disabled by default. While disabled, the system behaves as in V2.3 or before, where devices are accessible by all hosts zoned to the respective FE port(s). When LUN masking is enabled for the first time, all devices are masked/hidden from all hosts. The user can then create LUN groups, associating host initiators with specific VTL devices, in order to open paths between hosts and devices.

Define host initiators through the Host Initiator Management screen. You can scan for available WWNs or input WWNs manually (see Figure 33):
- A maximum of 1024 host initiators can be defined on a system.
- Use a practical and consistent naming scheme, for example hostname-fe# (e.g. myhost-fe1). This scheme helps you track which FE port the host is physically attached to.

Create a LUN Masking group (Figure 34), associating host(s) with virtual device(s).

- A maximum of 512 groups can be configured per system.
- A device can belong to multiple LUN Masking groups, but a host initiator can belong to only one LUN Masking group.
- Regardless of LUN masking, virtual drives are still physically assigned to a single FE port, so backup hosts have to be attached to that port. For load-balancing purposes, it is recommended to distribute drives across multiple FE ports, ideally all four.

Figure 33. Host initiator management

Figure 34. LUN Masking groups


Part 2. Back End Storage Configuration Best Practices


Chapter 10. General configuration overview

Note: For detailed information on setting up the configuration scripts for your own repository, refer to the IBM support documentation (Command Line Interface and Scripts Command Programming Guides) available from the IBM support site (docdisplay?lndocid=migr...&brandind=...).

A critical component of the TS7650G environment is the storage array that connects to the TS7650G Gateway. The purpose of this document is to list the key factors and common configuration requirements on the back-end storage array that must be set appropriately in order to establish a proper environment for ProtecTIER.

During the pre-sales process, you will receive a spreadsheet containing information to determine the optimal disk repository configuration. It is the customer's responsibility to use this information, along with these guidelines, to configure the disk repository and verify that it is configured correctly before the Service Support Representative (SSR) comes on site for installation. A pre-install conference call with the trained ProtecTIER Specialist install team is required before the disk repository is configured. All aspects of the disk repository, including configuration, monitoring, code updates, and repair actions, are the responsibility of the customer in conjunction with their disk supplier.

RAID
It is critical to use RAID for data protection and performance. It is recommended that RAID 5 with at least five disk members (4+1) per group be used for Fibre Channel User Data LUNs (4+P or 8+P for SATA disks) and RAID 10 groups for Meta Data LUNs (with the layout per planning requirements). Even if SATA drives are used for User Data LUNs, it is highly recommended that Fibre Channel disks be used for Meta Data LUNs.

Data RAID groups:
- Fibre Channel User Data Logical Unit Numbers (LUNs): It is recommended that RAID 5 with at least five disk members (4+1 per group) be used.
- SATA User Data LUNs: It is recommended to use RAID 5 with at least five disk members (4+P or 8+P per group) or RAID 6 (enabling dual parity to sustain a dual disk failure).
- Meta Data LUNs: It is required that RAID 10 groups be used (with the layout per planning requirements). It is highly recommended to use Fibre Channel disks for Meta Data LUNs even if SATA disks are used for User Data LUNs.

Important: Use of SATA disks for Meta Data will affect performance.

User Data RAID groups: It is recommended that at least 24 User Data RAID groups be created for optimal performance.

Meta Data RAID groups: The recommended number of Meta Data RAID groups is determined by the capacity planning tool during the pre-sales process. This number can range from 2 to 10 or more RAID groups (based on repository size, factoring ratio, and performance needs).

Only create one LUN per RAID group; that is, one LUN that spans the entire RAID group. The only exception is the single 1 GB Meta Data LUN, which can be created on any of the Meta Data RAID groups.

The ProtecTIER server host type (host connectivity settings) must be tuned for a Linux device-mapper-multipath client; this may be generally denoted as a Linux host.

A repository requires a minimum of 4 LUNs: one for the Cluster database, one for User Data, and two for Meta Data. At least 27 LUNs (1 Cluster, 2 Meta Data, and 24 User Data) give the best performance.

The size of the required Meta Data LUNs/file systems is a function of the nominal capacity of the repository (physical space and expected factoring ratio) and should be determined prior to the system installation by FTSS/CSS.

The size of User Data RAID groups/LUNs should be consistent. For example, do not mix 7+1 SATA User Data LUNs with 3+1 SATA LUNs. Smaller disk groups will hold back the performance of the larger groups and will degrade the overall system throughput. The same policy applies to Meta Data.

When expanding the repository, it is important to use the same tier of RAID groups (spindle type and quantity) for Meta Data or User Data as the existing ones, respectively. For example, if the original two Meta Data LUNs were built on RAID 4+4 groups, new Meta Data RAID groups added must be at least 4+4 to maintain the same level of performance. Using storage from 2+2 or 4+1 RAID groups for the expansion could result in performance degradation due to an IOPS bottleneck.

LUNs
- Only create one LUN per RAID group. (The only exception is the single 1 GB Meta Data LUN, which can be created on any of the Meta Data RAID groups.)
- A repository requires a minimum of four LUNs: one for the Cluster database, one for User Data, and two for Meta Data. (Using at least 27 LUNs optimizes performance: one for the Cluster database, two for Meta Data, and 24 for User Data.)
- The size of the required Meta Data LUNs is a function of the nominal capacity of the repository (physical space and expected factoring ratio) and should be determined prior to the system installation by FTSS/CSS.
- The size of RAID groups and LUNs should be consistent, because smaller disk groups negatively affect the performance of the larger groups and degrade the overall system throughput. This applies to both User Data and Meta Data RAID groups.
- Do not share RAID groups or LUNs with other applications.

- In storage arrays with active-active controller support (that is, a LUN can be accessed from both controllers simultaneously), LUNs should be mapped to both controllers for best load balancing and redundancy. In arrays with only active-passive support (that is, LUNs can only be accessed by one controller at a time), LUN mapping should be interleaved between controllers in order to establish a similarly effective degree of load balancing.
- Build the arrays and assign the LUNs such that LUN 0 is from an array built using even and odd drives from the odd-numbered enclosures in the cabling diagram, and LUN 1 is from an array built using even and odd drives from the even-numbered enclosures in the cabling diagram. This is why, for best performance, we are very specific in how we build arrays and assign the preferred path to a controller. Always assign all LUNs in an array the same preferred-path controller.
- Each ProtecTIER node should have two FC links/paths to each disk array controller. With low SAN switching speeds (for example 1 Gb or 2 Gb), two paths per controller are strongly recommended. Not all storage supports 1 Gb SAN switching. When using only two back-end FC links, use separate HBA ports on the TS7650G to protect against HBA hardware failure.
- If using a SAN point-to-point topology to connect the TS7650G to the disk array, create dedicated zones (one zone per initiator) for the ProtecTIER back-end ports. Do NOT mix the back-end ports (QLogic) with the front-end ProtecTIER ports (Emulex) or any other SAN devices in the same zone.

Figure 35. Zoning topology

Note: While Zone D in this figure is shown as a single zone, the actual zoning of the back-end ports is more complex. There is one zone per initiator, and a two-node cluster has eight initiators, so there will be eight zones. Each zone includes the WWPN of the HBA port and the two ports on both controller A and controller B that the HBA port will use.
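The one-zone-per-initiator layout described above can be scripted on the SAN switch. The following is a minimal sketch for a Brocade-style fabric (assuming Brocade FOS zoning commands are available on your switches); every alias name, zone name, configuration name, and WWPN below is a placeholder, and the actual memberships should follow the zoning tables in the chapters that follow.

   # Hypothetical Brocade FOS zoning session; names and WWPNs are placeholders.
   # Alias for one ProtecTIER back-end (QLogic) initiator port:
   alicreate "pt_node1_s6p1", "21:00:00:24:ff:aa:bb:01"
   # Aliases for the four storage host ports this initiator will use
   # (two on controller A, two on controller B):
   alicreate "ds_ctrlA_ch3", "20:34:00:a0:b8:aa:bb:02"
   alicreate "ds_ctrlA_ch4", "20:44:00:a0:b8:aa:bb:03"
   alicreate "ds_ctrlB_ch1", "20:15:00:a0:b8:aa:bb:04"
   alicreate "ds_ctrlB_ch2", "20:25:00:a0:b8:aa:bb:05"

   # One zone per initiator: the HBA port plus its controller A and B ports.
   zonecreate "z_pt_node1_s6p1", "pt_node1_s6p1; ds_ctrlA_ch3; ds_ctrlA_ch4; ds_ctrlB_ch1; ds_ctrlB_ch2"

   # Add the zone to a configuration and activate it.
   cfgcreate "pt_backend_cfg", "z_pt_node1_s6p1"
   cfgenable "pt_backend_cfg"

In a two-node cluster, the same pattern would be repeated for each of the eight back-end initiators, each in its own zone, added to the same configuration with cfgadd before re-enabling it.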

Storage manager
- If possible, dedicate the storage array to the TS7650G. If that is not possible, use zoning and LUN masking to isolate the TS7650G from other applications. The TS7650G should never share RAID groups or LUNs with other applications.
- ProtecTIER is a random-read-oriented application: on a typical TS7650G environment, the large majority of I/O consists of random reads at a 60 KB block size. Therefore, suitable performance optimizations/tuning recommended by the disk vendor for this I/O profile should be implemented.
- It is recommended that the array firmware level be equal to or greater than the firmware version listed in the ProtecTIER Interoperability Matrix.
- Ensure that ISL links between the SAN switches connected to the TS7650G ports and the storage arrays are not oversubscribed. ISL links are not recommended for DS4000 and DS5000.
- Use only one type (FC or SATA), speed, and size of drive in any one array, and it is recommended that all the arrays in one system be the same as well.

Storage Manager is used to configure and manage the DS4000 and DS5000 IBM storage solutions supported by ProtecTIER. You should always have the most current Storage Manager client software loaded on your management station. The XIV Storage Manager is used for the XIV storage solution.

Chapter 11. Specific guidelines for DS4000

Fibre Channel cabling
This chapter outlines configuration guidelines for the DS4000. Use the following figure and table as a reference for cabling the Fibre Channel connections in a stand-alone installation.

150 Figure 36. Stand-alone fibre channel connections Table 8. Stand-alone fibre channel connections From On Deice To On Deice/Location Port 1, slot 6 Serer Port H1 Disk Array, Disk Controller A Port 1, slot 7 Serer Port H1 Disk Array, Disk Controller B Port 1, slot 1 Serer Designated deice Customer's host network Port 2, slot 1 Serer Designated deice Customer's host network 118 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

151 Table 8. Stand-alone fibre channel connections (continued) From On Deice To On Deice/Location Port 1, slot 2 Serer Designated deice Customer's host network Port 2, slot 2 Serer Designated deice Customer's host network Use this figure and table as reference for cabling a clustered installation. Figure 37. Clustered fibre channel connections Table 9. Clustered fibre channel connections From On deice To Header Port 1, slot 6 Serer A Port H1 Disk Array #1, Disk Controller A Port 2, slot 6 Serer A Port H2 Disk Array #2, Disk Controller A Port 1, slot 7 Serer A Port H1 Disk Array #1, Disk Controller B Port 2, slot 7 Serer A Port H2 Disk Array #2, Disk Controller B Port 1, slot 6 Serer B Port H1 Disk Array #2, Disk Controller A Port 2, slot 6 Serer B Port H2 Disk Array #1, Disk Controller A Chapter 11. Specific guidelines for DS

152 Table 9. Clustered fibre channel connections (continued) From On deice To Header Port 1, slot 7 Serer B Port H1 Disk Array #2, Disk Controller B Port 2, slot 7 Serer B Port H2 Disk Array #1, Disk Controller B Port 1, slot 1 Serer A Designated Customer's host network deice Port 2, slot 1 Serer A Designated Customer's host network deice Port 1, slot 2 Serer A Designated Customer's host network deice Port 2, slot 2 Serer A Designated Customer's host network deice Port 1, slot 1 Serer B Designated Customer's host network deice Port 2, slot 1 Serer B Designated Customer's host network deice Port 1, slot 2 Serer B Designated Customer's host network deice Port 2, slot 2 Serer B Designated deice Customer's host network SAN fabric zoning Use World Wide Port Name (WWPN) zoning. A zone can hae multiple targets (Storage Ports), but we do not want multiple initiators (HBAs) in the same zone. Each Node HBA port will hae a zone that includes its WWPN and the WWPN for the ports on Controller "A" and Controller "B" that it will use. Keep the total number of Storage WWPNs that a HBA port will see to eight (8) or less: four (4) WWPNs on the preferred Controller and four (4) ports on the non preferred Controller. Table 10. DS4000/DS5000 Zoning for 2 Node ProtecTIER with single DS5300, Single Switch or Dual Switch Defined configuration: cfg: bas_config zone_1; zone_2; zone_3; zone_4 cfg: zone_config zone_1; zone_2; zone_3; zone_4 zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) 120 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

153 Table 10. DS4000/DS5000 Zoning for 2 Node ProtecTIER with single DS5300, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) Effectie configuration: cfg: bas_config zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) Chapter 11. Specific guidelines for DS

154 Table 11. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch Defined configuration: cfg: bas_config zone_1; zone_2; zone_3; zone_4 cfg: zone_config zone_1; zone_2; zone_3; zone_4 zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1)) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) 122 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

155 Table 11. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) Effectie configuration: cfg: bas_config zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) Chapter 11. Specific guidelines for DS

156 Table 11. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) Storage Manager Rules and standard settings Storage Manager is used to configure the storage array. The spreadsheet proided by the pre-sales team for this purpose will hae the detailed information for your specific installation. If you do not hae this information, contact your pre-sales team. See the documentation proided with your storage system for specific information on procedures for creating arrays and LUNs. RAID10 must be used for metadata RAID5 and preferably 4+P should be used for user data Use the DS4000 Settings table and your spreadsheet to check the appropriate settings. Table 12. DS4000 Settings Option Default setting ProtecTIER setting Host Type LNXCLVMWARE AVT Disabled User label Media scan frequency (in 30 days) Default logical drie Unnamed Hot spare settings Cache block size (in Kb) 32 Start cache flushing at (in 80% 50% percentage) Stop cache flushing at (in 80% 50% percentage) Host Topology Option Default setting ProtecTIER setting 124 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

157 Table 12. DS4000 Settings (continued) Option Default setting ProtecTIER setting Flush write cache after (in seconds) Write cache without batteries Disabled Write cache with mirroring Enabled Read cache Enabled Write cache Enabled Enable background media Disabled scan Media scan with redundancy Enabled Modification priority High Pre-read redundancy check Disabled Chapter 11. Specific guidelines for DS


Chapter 12. Specific guidelines for DS5000

Cabling guidelines
ProtecTIER: These best practices apply only from the minimum supported ProtecTIER version onward.

Device Mapper Multipath: There is no need to manually configure the multipath-specific settings for the storage (multipath.conf), as this is done automatically by ProtecTIER's installation script (autorun).

Fibre Channel connection topology: The DS5100 and DS5300 support redundant connections to the clustered ProtecTIER nodes. To ensure full protection against the loss of any one Fibre Channel path from the node servers to the DS5100 and DS5300, always use redundant host connections by connecting each host to the appropriate host channels on both RAID controllers A and B. A single LUN path is not supported, because it does not allow redundancy within the same controller. These configurations have host and drive path failover protection and are best practices for high availability. Every ProtecTIER node must be configured to obtain two paths to the primary controller and two paths to the secondary controller of each configured LUN.

To achieve the best performance with your DS5100/DS5300, always balance the enclosures connected to the controllers. By adding the expansion enclosures in the correct sequence, you will always be balanced across the controllers. The following figure shows the attachment order for cabling enclosures one (1) through eight (8); for enclosures nine (9) through twenty-eight, keep repeating the sequence.

Figure 38. DS5300 cabling guide

Figure 39. Example of a DS5100/5300 with eight enclosures

Setting up arrays
After cabling the DS5000, the next step is setting up the arrays. When setting up arrays, always pick the drives for each array manually.

Loop pair numbering is based on the port numbers used on Controller B (1, 3, 5, and 7) and should be used to build arrays. The LUNs from these arrays will be mapped with a preferred path to Controller A. The enclosures on the even-numbered ports will have their LUNs' preferred path set to Controller B. When selecting the drives for an array, use a zigzag or barber-pole pattern to make sure an equal number of drives are selected from even and odd drive slots. This layout is critical for both balanced loads and performance.

In the figure above, the enclosures have been renumbered using the DS5000 Storage Manager to simplify identifying where the enclosures are connected, and to ensure that no two enclosures on the same drive loop pair have the same ones digit (for example 11 and 21). Start all odd-numbered enclosures with the Channel B port number as the tens digit, start the ones digit at 1, and increase the ones digit as the enclosures are daisy-chained away from the Controller B port. Start all even-numbered enclosures with the Channel B port number as the tens digit, start the ones digit at 5, and increase the ones digit as the enclosures are daisy-chained away from the Controller B port.

Expansion enclosures off the odd-numbered ports (1, 3, 5, and 7) are used to build odd-numbered arrays; the LUNs in these arrays have their preferred path set to Controller A. Expansion enclosures off the even-numbered ports (2, 4, 6, and 8) are used to build even-numbered arrays; the LUNs in these arrays have their preferred path set to Controller B. This provides balance across both controllers.

As an example, on Controller B port 1 the first enclosure will be 11, the second 12, and the third 13; since ports 1 and 2 are in the same loop pair, start Controller B port 2 with 25 for the first enclosure, and the second will be 26. This eliminates any problem when the controller sets up the arbitrated loop addresses on the loop pair. Use the same pattern for creating arrays and LUNs on the other loop pairs.

Figure 40. Example of drive selection for a RAID 5 4+P array and a RAID 10 array

Cabling layouts
Single node attached to a single DS5300

Figure 41. DS5300 Direct attach with dual path from 1 Node to 1 DS5300

Figure 42. Single Node attached to a single DS5300 through a switch

Figure 43. Single node attached to one DS5300 through dual switches

Dual nodes attached to a single DS5300

Figure 44. DS5300 Direct attach with single path from two node cluster

Figure 45. DS5300 attached through a switch to a two node cluster

Figure 46. DS5300 attached through dual switches to a two node cluster

Dual nodes attached to dual DS5300s

Figure 47. Dual DS5300s Direct attached to a two node cluster

Figure 48. Dual DS5300s attached through a switch to a two node cluster

171 Figure 49. Dual DS5300s attached through dual switches to a two node cluster SAN fabric zoning Use World Wide Port Name (WWPN) zoning. A zone can hae multiple targets (Storage Ports), but we do not want multiple initiators (HBAs) in the same zone. Each Node HBA port will hae a zone that includes its WWPN and the WWPN for the ports on Controller "A" and Controller "B" that it will use. Keep the total number of Storage WWPNs that a HBA port will see to eight (8) or less: four (4) WWPNs on the preferred Controller and four (4) ports on the non preferred Controller. Table 13. DS4000/DS5000 Zoning for 2 Node ProtecTIER with single DS5300, Single Switch or Dual Switch Defined configuration: cfg: bas_config zone_1; zone_2; zone_3; zone_4 cfg: zone_config zone_1; zone_2; zone_3; zone_4 Chapter 12. Specific guidelines for DS

172 Table 13. DS4000/DS5000 Zoning for 2 Node ProtecTIER with single DS5300, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) Effectie configuration: cfg: bas_config zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) 140 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

173 Table 13. DS4000/DS5000 Zoning for 2 Node ProtecTIER with single DS5300, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) Table 14. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch Defined configuration: cfg: bas_config zone_1; zone_2; zone_3; zone_4 cfg: zone_config zone_1; zone_2; zone_3; zone_4 zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1)) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) Chapter 12. Specific guidelines for DS

174 Table 14. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch (continued) Defined configuration: zone: zone_3 WWPN of ProtecTIER Node 2 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_4 WWPN of ProtecTIER Node 2 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) Effectie configuration: cfg: bas_config zone: zone_1 WWPN of ProtecTIER Node 1 HBA 6 Port 1 WWPN (S6P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) zone: zone_2 WWPN of ProtecTIER Node 1 HBA 7 Port 1 WWPN (S7P1) WWPN of DS5300 #1 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #1 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #1 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #1 Controller "B" Channel 2 (A CH2) WWPN of DS5300 #2 Controller "A" Channel 4 (A CH4) WWPN of DS5300 #2 Controller "A" Channel 3 (A CH3) WWPN of DS5300 #2 Controller "B" Channel 1 (B CH1) WWPN of DS5300 #2 Controller "B" Channel 2 (A CH2) 142 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

Table 14. DS4000/DS5000 Zoning for 2 Node ProtecTIER with dual DS5300s, Single Switch or Dual Switch (continued)
Effective configuration:
zone: zone_3
WWPN of ProtecTIER Node 2 HBA 6 Port 1 (S6P1)
WWPNs of DS5300 #1 Controller "A" Channel 4 (A CH4), Controller "A" Channel 3 (A CH3), Controller "B" Channel 1 (B CH1), Controller "B" Channel 2 (B CH2)
WWPNs of DS5300 #2 Controller "A" Channel 4 (A CH4), Controller "A" Channel 3 (A CH3), Controller "B" Channel 1 (B CH1), Controller "B" Channel 2 (B CH2)
zone: zone_4
WWPN of ProtecTIER Node 2 HBA 7 Port 1 (S7P1)
WWPNs of DS5300 #1 Controller "A" Channel 4 (A CH4), Controller "A" Channel 3 (A CH3), Controller "B" Channel 1 (B CH1), Controller "B" Channel 2 (B CH2)
WWPNs of DS5300 #2 Controller "A" Channel 4 (A CH4), Controller "A" Channel 3 (A CH3), Controller "B" Channel 1 (B CH1), Controller "B" Channel 2 (B CH2)
Storage Subsystem Firmware Level
The array firmware level should be equal to or greater than the firmware version listed in the ProtecTIER Interoperability Matrix.
Important: It is critical to stop all I/O to DS5000 series storage when DDM (drive) firmware is upgraded. Therefore you must stop the vtfd service on the ProtecTIER node(s) to guarantee that I/O is stopped.
Storage Subsystem Host Type
Create a dedicated storage host group for mapping rather than using the default one. LNXCLVMWARE must be used as the host type for the storage subsystem. Follow these instructions to change and verify the host type:
Shut down both nodes. Important: Do not start the process if the node(s) are still active.
Open the DS Storage Manager client. Wait until the storage subsystem is ready for use; the indication is the green button near the storage subsystem name.

Figure 50. Storage manager window showing subsystem ready for use
Double-click the storage subsystem name to open a new window. The new window contains information about the storage subsystem:
Figure 51. Storage subsystem information window
Go to the Mappings tab. Select the storage subsystem name (verify that it contains all the LUNs), right-click, choose Change, and then choose "Host Operating System".

Figure 52. Host Operating System menu cascade
A new window opens:
Figure 53. Change host operating system window
Change the parameter under "Select operating system" to "LNXCLVMWARE" and press OK. Once this has been done, select the storage subsystem name, right-click, and choose "Properties":

Figure 54. Properties menu cascade
A new window opens with information about every LUN belonging to this storage subsystem:

Figure 55. LUN information window
Verify that the field "Host type" changed to "LNXCLVMWARE" for every LUN.
Reboot both controllers:
- Select the "Logical/Physical" tab, select one of the controllers, go to the "Advanced" tab, then Recovery, then Reset Controller.
- Repeat the step above for the other controller.
Power on node A. Once VTFD is running, power on node B.
RAID configuration
Segment size
The recommended segment size is 128 KB (the default). It is critical to use RAID for data protection and performance. Create only one LUN per RAID group, i.e. one LUN that spans the entire RAID group. The only exception is the single 1 GB Meta Data LUN, which can be created on any of the Meta Data RAID groups.

A repository requires a minimum of 4 LUNs: one for the Cluster database, one for User Data and two for Meta Data. At least 27 LUNs (1 Cluster, 2 Meta Data and 24 User Data) give the best performance. The one-LUN-per-RAID-group policy also applies to Meta Data.
Metadata
It is recommended to use RAID 10 groups for Meta Data LUNs (with layout per planning requirements) with at least 4+4 members. Even if SATA drives are used for User Data LUNs, it is highly recommended to use Fibre Channel disks for Meta Data LUNs; use of SATA disks for Meta Data will affect performance. The recommended number of Meta Data RAID groups is determined by the capacity planning tool during the pre-sales process. This number can range from 2 to 10 or more RAID groups (based on repository size, factoring ratio, and performance needs). The 1 GB MD LUN may be created on any of the Meta Data RAID groups. The size of the required Meta Data LUNs/file systems is a function of the nominal capacity of the repository (physical space and expected factoring ratio) and should be determined prior to the system installation by FTSS/CSS.
When expanding the repository, it is important to use the same tier of RAID groups (spindle type and quantity) for MD and UD as the existing ones, respectively. For example, if the original two Meta Data LUNs were built on RAID 4+4 groups, any new Meta Data RAID groups added must be at least 4+4 to maintain the same level of performance.
User data
Fibre Channel User Data: it is recommended that RAID 5 with at least five disk members (4+1 per group) be used.
SATA User Data: it is recommended to use RAID 5 with 5+1 or 7+1 disk members per group, or RAID 6 (enabling dual parity to sustain a dual disk failure) with 6+2 disk members.
It is recommended that at least 24 User Data RAID groups are created for optimal performance. The size of User Data RAID groups/LUNs should be consistent. For example, do not mix 7+1 SATA User Data LUNs with 3+1 SATA LUNs; smaller disk groups will hold back the performance of the larger groups and will degrade the overall system throughput. Using storage from 2+2 or 4+1 RAID groups, for example, for an expansion could result in performance degradation due to IOPS bottlenecks.
General notes:
Due to ProtecTIER's nature, the storage resources are heavily used and should therefore be dedicated solely to ProtecTIER's activity.
Disk-based replication is not supported due to ProtecTIER's design limitations; for such purposes we recommend using ProtecTIER's native replication, available from version 2.3 onwards.
If using a SAN P2P topology to connect the TS7650G to the disk array, create a dedicated zone for the ProtecTIER back-end ports. Do NOT mix the back-end ports (QLogic) with the front-end PT ports (Emulex) or any other SAN devices in the same zone.

If possible, dedicate the storage array to the TS7650G. If that is not possible, use zoning and LUN masking to isolate the TS7650G from other applications. The TS7650G should never share RAID groups or LUNs with other applications.
ProtecTIER is a random-read-oriented application: the majority of I/O in a typical TS7650G environment is random reads at a 60 KB block size. Therefore, any performance optimizations and tuning recommended by the disk vendor for this I/O profile should be implemented.
LUN mapping should be interleaved between controllers (i.e. LUN 0 on controller A, LUN 1 on controller B, LUN 2 on A, and so on) in order to establish effective load balancing.
Storage Manager
Storage Manager is used to configure the storage array. The spreadsheet provided by the pre-sales team for this purpose will have the detailed information for your specific installation. If you do not have this information, contact your pre-sales team. See the documentation provided with your storage system for specific information on procedures for creating arrays and LUNs.
Rules and standard settings
RAID 10 must be used for metadata.
RAID 5, preferably 4+P, should be used for user data.
Use the DS5000 Settings table and your spreadsheet to check the appropriate settings.
Table 15. DS5000 Settings (Option / Default setting / ProtecTIER setting)
Host Type: LNXCLVMWARE (ProtecTIER setting)
AVT: Disabled (ProtecTIER setting)
User label
Media scan frequency (in days): 30 (default)
Default logical drive: Unnamed (default)
Hot spare settings
Cache block size (in KB): 32 (ProtecTIER setting)
Start cache flushing at (percentage): 80% (default), 50% (ProtecTIER setting)
Stop cache flushing at (percentage): 80% (default), 50% (ProtecTIER setting)
Host Topology

Table 15. DS5000 Settings (continued)
Flush write cache after (in seconds)
Write cache without batteries: Disabled
Write cache with mirroring: Enabled
Read cache: Enabled
Write cache: Enabled
Enable background media scan: Disabled
Media scan with redundancy: Enabled
Modification priority: High
Pre-read redundancy check: Disabled

Chapter 13. Specific guidelines for XIV Storage System
XIV Storage system hardware
Use the XIV Storage Manager to configure the XIV storage system. The spreadsheet provided by the pre-sales team for this purpose will have the detailed information for your specific installation. If you do not have this information, contact your pre-sales team.
XIV supports configurations of 6, 9, 10, 11, 12, 13, 14, and 15 modules. Modules 1-3 and 10-15 have disks only and are known as Data Modules. Modules 4-9 have disks and host interfaces and are known as Interface Modules. Table 16 displays how many Interface Modules and FC ports are available, by configuration. A 2TB system provides additional capacity while throughput remains the same as with a 1TB system.

Figure 56. XIV hardware
Table 16. Interface module state chart, by total number of modules in the configuration
Interface Module 9 state: Empty, Disabled, Disabled, Enabled, Enabled, Enabled
Interface Module 8 state: Empty, Enabled, Enabled, Enabled, Enabled, Enabled
Interface Module 7 state: Empty, Enabled, Enabled, Enabled, Enabled, Enabled
Interface Module 6 state: Disabled, Disabled, Disabled, Disabled, Disabled, Enabled
Interface Module 5 state: Enabled, Enabled, Enabled, Enabled, Enabled, Enabled
Interface Module 4 state: Enabled, Enabled, Enabled, Enabled, Enabled, Enabled
FC ports
Net capacity (decimal), 1TB drives: 27TB, 43TB, 50TB, 54TB, 61TB, 66TB
Net capacity (decimal), 2TB drives: 55TB, 87TB, 102TB, 111TB, 125TB, 134TB

Note: In the table above, Disabled means the FC ports are not enabled for host access.
XIV Storage manager GUI
Figure 57. XIV Storage manager GUI
XIV System zoning
Fibre Channel connectivity to XIV:
- Direct host to XIV connectivity is not permitted. Implementations must use a SAN fabric (single or dual SAN switches/directors); dual fabric configurations are recommended.
- Fibre connectivity to XIV is via the Patch Panel.
- Interface Modules 4-9 provide FC ports for host connectivity. Each Interface Module has two 2-port FC cards, so 8-24 FC ports are available.
- Ports 1 and 3 are recommended for ProtecTIER to ensure redundancy (ports 1/2 are on one HBA and ports 3/4 on the other).
- Speed: 4Gb. Supports 1Gb, 2Gb, 4Gb and is predefined to auto-negotiate speed; speed may be configured manually if required.

Figure 58. XIV Patch panel and TS7650 rear view showing ports
XIV to TS7650G Zoning
For each TS7650G node:
- For each TS7650G disk attachment port, multiple XIV ports get zoned.
- All P1 ports from all XIV Interface Modules get zoned to the P1 port on both the Slot 6 and Slot 7 HBAs.
- All P3 ports from all XIV Interface Modules get zoned to the P2 port on both the Slot 6 and Slot 7 HBAs in each TS7650G node.
- Each HBA in the XIV makes a connection to both TS7650G HBAs.
- Always use single-target to single-initiator zoning.
- Each LUN will be seen by each TS7650G node through a maximum of 24 paths.
Table 17. Zoning
Zone #  XIV Port  ProtecTIER TS7650 Port    Zone #  XIV Port  ProtecTIER TS7650 Port
1       M4P1      S6P1                      13      M4P1      S7P1
2       M5P1      S6P1                      14      M5P1      S7P1
3       M6P1      S6P1                      15      M6P1      S7P1
4       M7P1      S6P1                      16      M7P1      S7P1
5       M8P1      S6P1                      17      M8P1      S7P1
6       M9P1      S6P1                      18      M9P1      S7P1
7       M4P3      S6P2                      19      M4P3      S7P2
8       M5P3      S6P2                      20      M5P3      S7P2
9       M6P3      S6P2                      21      M6P3      S7P2

Table 17. Zoning (continued)
Zone #  XIV Port  ProtecTIER TS7650 Port    Zone #  XIV Port  ProtecTIER TS7650 Port
10      M7P3      S6P2                      22      M7P3      S7P2
11      M8P3      S6P2                      23      M8P3      S7P2
12      M9P3      S6P2                      24      M9P3      S7P2
Best practice is to create twelve 1:1 zones to connect a single ProtecTIER node to a full XIV with six Interface Modules (IM); an illustrative switch CLI sketch follows the notes below:
Zone 01: XIV IM4P1 <> PT BE0
Zone 02: XIV IM6P1 <> PT BE0
Zone 03: XIV IM8P1 <> PT BE0
Zone 04: XIV IM5P1 <> PT BE1
Zone 05: XIV IM7P1 <> PT BE1
Zone 06: XIV IM9P1 <> PT BE1
Zone 07: XIV IM4P3 <> PT BE2
Zone 08: XIV IM6P3 <> PT BE2
Zone 09: XIV IM8P3 <> PT BE2
Zone 10: XIV IM5P3 <> PT BE3
Zone 11: XIV IM7P3 <> PT BE3
Zone 12: XIV IM9P3 <> PT BE3
One initiator and one target per zone. Each ProtecTIER backend port sees 3 XIV ports, and each XIV Interface Module is connected redundantly to 2 different ProtecTIER backend ports. This gives 12 paths (4 x 3) to a LUN from a single ProtecTIER node.
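If the fabric is built on Brocade switches, the twelve zones above can be created from the switch CLI rather than the GUI. The commands below are a hedged sketch only: the WWPNs are placeholders, the alias, zone and configuration names are hypothetical, and the command set should be verified against the firmware level of your switches. Only the first zone is spelled out; the remaining eleven follow the same pattern using the port pairs listed above.

  alicreate "XIV_IM4P1", "50:01:73:80:xx:xx:xx:xx"
  alicreate "PT_Node1_BE0", "21:00:00:1b:xx:xx:xx:xx"
  zonecreate "Z01_XIV_IM4P1_PT_BE0", "XIV_IM4P1; PT_Node1_BE0"
  (repeat the alicreate/zonecreate pair for zones 02 to 12)
  cfgcreate "PT_XIV_CFG", "Z01_XIV_IM4P1_PT_BE0"
  cfgadd "PT_XIV_CFG", "Z02_XIV_IM6P1_PT_BE0; Z03_XIV_IM8P1_PT_BE0"
  cfgsave
  cfgenable "PT_XIV_CFG"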

Figure 59. XIV to ProtecTIER sample zoning
XIV System GUI installation
Configuration of the XIV Storage System
Once the SSR has completed the physical installation of the XIV Storage System, the following tasks must be performed by the storage administrator to configure the XIV Storage System using the XIV Management Software. The XIV Storage Manager can be installed on a Windows, Linux, AIX, HPUX, or Solaris workstation that will then act as the management console for the XIV Storage System. The Storage Manager GUI software is provided at the time of installation, or is optionally downloadable from the following website: ftp.software.ibm.com/storage/xiv/gui/
Installation of the XIV Storage System GUI
Perform the following steps to install the XIV Storage Management software:
1. Locate the XIV Storage Manager installation file (either on the installation CD or a copy you downloaded from the Internet). Running the installation file first shows the welcome window displayed in Figure 60. Click Next.

Figure 60. XIV storage management setup wizard
2. A Setup dialog window is displayed (Figure 61) where you can specify the installation directory. Keep the default installation folder or change it according to your needs. When done, click Next.
Figure 61. Setup dialog window
3. Perform a FULL installation.
4. Specify the Start Menu Folder.
5. Create Desktop icons.
6. Installation is complete when the window shown in Figure 62 appears; click Finish.

Figure 62. Setup completion window
Launching the XIV GUI
Upon launching the XIV GUI application, a login window prompts you for a user name and its corresponding password before granting access to the XIV Storage System. The default user is admin and the default corresponding password is adminadmin.
Figure 63. XIV GUI login window
Adding an XIV Storage System to the GUI
1. To connect to an XIV Storage System, you must initially add the system to make it visible in the GUI by specifying its IP/hostname addresses. IP addresses were defined as part of the SSR physical installation and setup.
2. To add the system:
a. Make sure that the management workstation is set up to have access to the LAN subnet where the XIV Storage System resides.

1) Verify the connection by pinging the IP address of the XIV Storage System.
b. If this is the first time you start the GUI on this management workstation and no XIV Storage System has been previously defined to the GUI, the Add System Management dialog window is automatically displayed:
1) Enter the IP Address fields.
2) Click Add to add the system to the GUI.
Figure 64. Add system management window
XIV / ProtecTIER performance metrics
Current XIV configurations include 6, 9, 10, 11, 12, 13, 14, and 15 module configurations. The 6, 9, 12, and 15 module configurations were tested according to the ProtecTIER steady state benchmark, and performance benchmarks were derived for planning and sizing. A dual-node ProtecTIER cluster is required to achieve the claimed figures. Performance metrics are valid for both 1TB and 2TB systems; a 2TB system provides additional capacity while throughput remains the same as with a 1TB system.
Table 18. Sizing XIV modules (TB)

Figure 65. Throughput
XIV / ProtecTIER performance numbers
The XIV and Diligent teams tested performance of the solution using two media servers, one and two ProtecTIER servers, and one XIV rack (6 to 15 modules). Use the ProtecTIER Capacity Planning Tool to size the XIV repository (Meta Data and User Data).
Table 19. Performance numbers
XIV Modules in Rack  /  ProtecTIER nodes used  /  Test performance achieved  /  Factoring repository growth
...  /  ...  /  ... MBps  /  375 MBps
...  /  ...  /  ... MBps  /  475 MBps
...  /  ...  /  ... MBps  /  625 MBps
...  /  ...  /  ... MBps  /  675 MBps
Planning the file systems
- For Meta Data, use a drive capacity of 600 GB.
- For User Data, use 32 user data file systems for best performance.
- Divide the remaining XIV capacity planned for the ProtecTIER repository by 32 to calculate the LUN size for the user data file systems.
- ProtecTIER uses multiple file systems in parallel to enhance performance and bypass file system performance limitations.
User Data file system example
- Repository size: 20 TB
- 20 TB / 32 = 0.625 TB
- Add 10% to the TB capacity, as XIV creates LUNs in multiples of 17 GB (decimal)
- Create 32 user data LUNs on XIV, each of approximately 0.69 TB (decimal)
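The per-LUN arithmetic in the example above generalizes to any repository size. The short sketch below is illustrative only: the function name is invented, the 10% allowance and the 17 GB allocation unit are taken from the example above, and rounding up to whole 17 GB units is why the result comes out slightly larger than the plain 10% addition.

import math

def xiv_user_data_lun_size_tb(repository_tb, num_luns=32, headroom=0.10, granularity_gb=17):
    # Split the planned repository capacity across num_luns user data LUNs,
    # add the suggested headroom, and round up to XIV's 17 GB allocation unit.
    raw_gb = repository_tb * 1000.0 / num_luns
    padded_gb = raw_gb * (1.0 + headroom)
    rounded_gb = math.ceil(padded_gb / granularity_gb) * granularity_gb
    return rounded_gb / 1000.0

# Example from the text: a 20 TB repository gives 32 LUNs of roughly 0.7 TB each
print(xiv_user_data_lun_size_tb(20))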

Configuring XIV System storage
Procedure
1. Create a Storage Pool
2. Create the volumes within the Storage Pool
3. Create the host
4. Map the volumes to the host
Create storage pool
A Storage Pool in XIV is a logical entity used to manage groups of volumes. All Meta Data and User Data is spread across all physical drives within the XIV.
About this task
To create a storage pool, do the following:
Procedure
1. Scroll the mouse over the Pool Function icon and it will expand to show the storage pool window.
2. Select Storage Pools.
Figure 66. Storage pool window
3. Select Add Pool.

Figure 67. Storage pool window
4. Select Type: Regular Pool.
5. Use the total system capacity as the pool size.
6. Name the pool with a meaningful name.
7. Select Add.
Figure 68. Add pool window
8. The created pool will appear on the storage pool window.

Figure 69. Storage pool window showing new pool
Create volumes within the pool
About this task
Note: Create the Meta Data volume(s) before the User Data volumes.
Procedure
1. Select Volumes by Pools.
Figure 70. Selecting Volumes by Pools
2. Select the ProtecTIER pool.
3. Select Add Volumes from the Volumes by Pools window.
Figure 71. Volumes by Pools window
4. Create the Meta Data volume:
a. Ensure the Storage Pool is the ProtecTIER pool.
b. The Meta Data volume is 600 GB. XIV will round that up to 601 GB.
c. Name the volume.
d. Select Create.

Figure 72. Create volume window
5. The new volume will appear in the storage management window under the ProtecTIER pool.
Figure 73. Volumes by Pools window showing Meta Data volume
6. Repeat steps 3-5 to create additional Meta Data volumes as needed.
7. Create the User Data volumes:
a. Use 32 volumes for the file systems.
b. XIV will determine the maximum volume size that will fit in the Pool.
c. Type in a large value and XIV will resize it to the largest possible value when defining 32 volumes.

Figure 74. Create volumes window with User Data input
d. Select Create.
8. The created User Data volumes will appear in the storage manager window below the Meta Data volume(s).
Figure 75. Volumes by Pools window showing User Data volumes
Create the ProtecTIER host
Procedure
1. Roll the mouse over the function icons to Hosts and Clusters.
2. Select Hosts and Clusters from the expanded Hosts and Clusters window.

Figure 76. Hosts and Clusters menu
3. Select Add Host.
Figure 77. Hosts and Clusters window
The Add Host window will appear.
4. Fill in the name of the host. Leave the Type field at the default.
5. Select Add.
Figure 78. Add Host window
6. The new host will appear in the Hosts and Clusters window.

Figure 79. The Hosts and Clusters window showing the ProtecTIER host
7. Add ports to the host:
a. Right-click the host and select Add Port. The Add Port window will appear.
Figure 80. Host drop down menu
b. If the zoning has already been set up, you can verify from the Hosts and Clusters > Host Connectivity screen that all the ports are available:
i. Select the correct Port Name from the Add Port drop down.
ii. Click Add.
c. If the zoning has not been set up:
i. Type in the 16-digit WWPN as displayed from the host.
ii. Click Add.
d. Repeat this process for all the host ports attached to the XIV.
Figure 81. Add Port window

The added ports will appear in the Hosts and Clusters window.
Figure 82. Hosts and Clusters window showing the ports
Mapping the volumes to the host
Procedure
1. Right-click on the ProtecTIER host and select Modify LUN Mapping.
Figure 83. Host drop down menu with Modify LUN Mapping
2. Select the volumes to be mapped. Press the SHIFT key to select all the volumes at once.
3. When all the volumes are selected, press Map.

Figure 84. Selecting volumes to be mapped
4. Verify the volumes are mapped:
a. From the Hosts and Clusters screen, right-click on the ProtecTIER host and select View LUN Mapping.
b. The View LUN Mapping window will appear.
Figure 85. View LUN Mapping window
c. Verify all LUNs are in the list.
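The pool, volume, host, and mapping steps above can also be scripted from the XIV command line (XCLI) instead of the GUI. The commands below are a hedged sketch only: the object names, sizes and WWPN are hypothetical placeholders, and parameter names should be checked against the XCLI Reference Guide listed at the end of this chapter.

  pool_create pool=ProtecTIER_Pool size=22000 snapshot_size=0
  vol_create pool=ProtecTIER_Pool vol=PT_MD_001 size=601
  vol_create pool=ProtecTIER_Pool vol=PT_UD_001 size=697
  host_define host=ProtecTIER_Node1
  host_add_port host=ProtecTIER_Node1 fcaddress=2100001B32XXXXXX
  map_vol host=ProtecTIER_Node1 vol=PT_MD_001 lun=1
  map_vol host=ProtecTIER_Node1 vol=PT_UD_001 lun=2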

XIV Storage System useful links
The IBM XIV Storage System Information Center is publicly available and contains all product documentation, including:
Introduction and Planning Guide
Pre-installation Network Planning Guide
Host Attachment Guides
Theory of Operation Guide
XCLI Reference Guide
The IBM XIV Software Download Site is publicly available at: ftp://ftp.software.ibm.com/storage/xiv and contains:
Public XIV-related software
Most host attachment packages
GUI/CLI

Part 3. Environment OS


Chapter 14. Drivers Specification for AIX (all platform versions) to work with ProtecTIER
This chapter describes the robot and tape driver installation on AIX platforms in order to work with ProtecTIER's VTL.
Relevant OS level platforms:
AIX version 5.2
AIX version 5.3
AIX version 6.1
Drivers Location and Specifications to work with ProtecTIER's VTL
The Tivoli Storage Manager (TSM) backup application on all AIX OS versions requires the IBM Tape device drivers, both for the TS3500 library and for the LTO2/LTO3 tape drives. The Legato (NetWorker) backup application on all AIX OS versions requires the IBM Tape device drivers for the LTO2/LTO3 tape drives. For all other backup applications on AIX platforms, use the native SCSI pass-through driver for all existing VTL emulations, both for TS3500 libraries and the LTO2/LTO3 tape drives.
For Control Path Failover (CPF)/Data Path Failover (DPF) feature implementation (possible only with the TSM backup application) on all AIX platforms, download the IBM device driver from: ftp://ftp.software.ibm.com/storage/devdrvr/aix/archive/
CPF is a redundancy mechanism in the IBM driver: in the event of a path failure on the tape library (robot) control path, failover occurs to the alternate path. CPF is only supported on Microsoft Windows, Solaris and AIX. DPF is the same redundancy mechanism for the tape drives; DPF is only supported on AIX.
In the above ftp directory, choose the corresponding platform directory name and download the latest Atape*.bin file.
Installation Highlights
A detailed installation user guide can be downloaded from: ftp://ftp.software.ibm.com/storage/devdrvr/doc/archive/ibm_tape_driver_iug.pdf
Chapter 3 of this installation guide contains detailed steps to install, upgrade, and uninstall the driver for all of the above AIX platforms (an illustrative command sequence appears at the end of this chapter). For additional detailed notes on the IBM Tape driver per backup application, refer to the ProtecTIER Support Matrix official release at: ftp://index.storsys.ibm.com/tape/ts7650_support_matrix.pdf
CPF/DPF features can only be implemented with the TSM backup application, and on all of the above AIX platforms. The CA backup application has a limitation of only one medium changer that can be presented to it, both for virtual and physical libraries.
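As a hedged illustration of the flow above (the file name and version are placeholders, and the installation guide referenced above remains the authoritative procedure), a typical Atape installation and verification on AIX looks like this:

  installp -acXd /tmp/Atape.x.x.x.x.bin Atape.driver   (install or update the Atape driver)
  cfgmgr                                               (rediscover devices after the VTL is zoned to the host)
  lsdev -Cc tape                                       (virtual drives appear as rmtN devices, robots as smcN)
  lscfg -vl rmt0                                       (confirm a drive is configured with the Atape driver)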


Chapter 15. Drivers Specification for Solaris Platform to work with ProtecTIER
This chapter describes the robot and tape driver installation on Sun Solaris platforms in order to work with ProtecTIER's VTL.
Software code and/or hardware environment relevancy:
Solaris 8
Solaris 8 SPARC
Solaris 9
Solaris 9 SPARC
Solaris 10
Solaris 10 SPARC & x64
Solaris 10 x86
Solaris 10 x86_64
Drivers location and specifications to work with ProtecTIER's VTL
The Tivoli Storage Manager (TSM) application on all Solaris platforms requires the IBM Tape device drivers. The HP Data Protector backup application requires the Solaris sst driver for the TS3500 (medium changer). All other backup applications on Solaris use either the native driver or the IBM Tape device drivers for all existing VTL emulations.
For CPF/DPF feature implementation with the TSM backup application on all Solaris platforms, download the IBM device driver from: ftp://ftp.software.ibm.com/storage/devdrvr/solaris/archive/
In the aforementioned ftp directory, download the latest IBMtape*.bin file.
Control Path Failover (CPF) is a redundancy mechanism in the IBM driver: in the event of a path failure on the tape library (robot) control path, failover occurs to the alternate path. Data Path Failover (DPF) is the same redundancy mechanism for the tape drives.
Installation highlights
A detailed installation user guide can be downloaded from: ftp://ftp.software.ibm.com/storage/devdrvr/doc/archive/ibm_tape_driver_iug.pdf
Chapter 5 of this installation guide contains detailed steps to install or upgrade the driver for all of the above Solaris platforms (an illustrative command sequence appears at the end of this chapter). For additional detailed notes on the IBM Tape driver for each backup application, see the ProtecTIER Support Matrix official release at: ftp://index.storsys.ibm.com/tape/ts7650_support_matrix.pdf
CPF/DPF features can only be implemented with the TSM backup application, and on all of the above Solaris platforms. The CA backup application has a limitation of only one medium changer that can be presented, for both virtual and physical libraries.
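On Solaris the IBMtape driver is delivered as a standard package and is added with the usual packaging tools. The two lines below are a hedged sketch only: the file name and version are placeholders, and the installation guide referenced above remains the authoritative procedure.

  pkgadd -d /tmp/IBMtape.x.x.x.x.bin IBMtape           (install the driver package from the downloaded file)
  ls -l /dev/rmt/*st                                   (tape drive special files created by the IBMtape driver)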


Chapter 16. Drivers Specification for Linux Platform to work with ProtecTIER
This chapter describes the robot and tape driver installation on RedHat Linux platforms in order to work with ProtecTIER's VTL.
Relevant OS level platforms:
RedHat 4 Update 7
SLES10
RedHat ES4.0
RedHat R5
Drivers Location and Specifications to work with ProtecTIER's VTL
The Tivoli Storage Manager (TSM) application on all Linux platforms requires the IBM Tape device drivers. For all other backup applications on Linux platforms, use the native SCSI pass-through driver for all existing VTL emulations.
For CPF/DPF feature implementation with the TSM backup application on all Linux platforms, download the IBM device driver from: ftp://ftp.software.ibm.com/storage/devdrvr/linux/archive/
Control Path Failover (CPF) is a redundancy mechanism in the IBM driver: in the event of a path failure on the tape library (robot) control path, failover occurs to the alternate path. Data Path Failover (DPF) is the same redundancy mechanism for the tape drives.
In the aforementioned ftp directory, do the following:
1. Select the corresponding platform directory name.
2. Download the latest lin_tape*.bin file.
Installation highlights
A detailed installation user guide can be downloaded from: ftp://ftp.software.ibm.com/storage/devdrvr/doc/archive/ibm_tape_driver_iug.pdf
Chapter 5 of this installation guide contains detailed steps to install or upgrade the driver for all of the above Linux platforms (an illustrative command sequence appears at the end of this chapter). For additional detailed notes on the IBM Tape driver for each backup application, see the ProtecTIER Support Matrix official release at: ftp://index.storsys.ibm.com/tape/ts7650_support_matrix.pdf
CPF/DPF features can only be implemented with the TSM backup application, and on all of the above Linux platforms. The CA backup application has a limitation of only one medium changer that can be presented, for both virtual and physical libraries.
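The lin_tape package itself installs through the standard RPM tooling. The sequence below is a hedged sketch only: the version numbers and build paths are placeholders, and Chapter 5 of the installation guide referenced above remains the authoritative procedure.

  rpmbuild --rebuild lin_tape-1.x.x.x-1.src.rpm        (build a binary RPM against the running kernel)
  rpm -ivh /usr/src/redhat/RPMS/x86_64/lin_tape-1.x.x.x-1.x86_64.rpm
  rpm -ivh lin_taped-1.x.x.x.x86_64.rpm                (optional lin_taped daemon package)
  cat /proc/scsi/IBMtape                               (virtual tape drives claimed by the driver)
  cat /proc/scsi/IBMchanger                            (virtual robots claimed by the driver)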


Part 4. Backup Application


Chapter 17. Backup application servers
General recommendations
Many backup servers have features and settings that can be used to optimize performance when writing data to real tape cartridges. Since ProtecTIER presents to the backup server a virtual tape library with virtual drives and cartridges, some of the settings that are optimized for real tape are no longer required and may even have a detrimental effect on the ProtecTIER factoring ratio and performance.
The following recommendations are fairly generic and are common to all backup servers. You should check the current settings of your backup server and apply those that can be implemented. Some of the backup-server-specific sections of this chapter have more detailed recommendations than these general topics; when this occurs, the specific recommendation should take precedence and be followed closely.
Interoperability
Check the IBM Interoperability Matrix to ensure the version of backup server and the operating system you are running are supported for ProtecTIER. You can view the matrix at:
Software compatibility
Make sure your backup server version, platform, and operating system version are on the supported hardware and software list for ProtecTIER. You can view the list at:
Software currency
Ensure the backup server is at the latest patch or maintenance level. This can impact overall factoring performance. Ensure that the operating system of the platform the backup server is running on is at the latest patch or maintenance level. This can impact overall HyperFactor performance.
Tape library zoning
The backup server should have a dedicated HBA port or ports for the ProtecTIER virtual tape library. This port or ports can be shared with a physical tape library; however, the physical tape library must not be in the same SAN zone as the virtual tape library.

Compression
Compression will effectively scramble the data sent to ProtecTIER, making pattern matching difficult. As can be expected, this will have an effect on data matching rates, even if the same data is sent each time. ProtecTIER will compress the data it sends to the back-end physical disk once it has been received by the virtual tape drives and deduplicated. We recommend that you disable any compression features for the ProtecTIER storage pool as defined in the backup server.
Encryption
Encryption will make each piece of data sent to ProtecTIER unique, including duplicate data. As can be expected, this will have an effect on data matching rates and factoring performance, because even if the same data is sent each time, it will appear differently to the deduplication engine. It is strongly recommended that you disable any encryption features for the ProtecTIER storage pool in the backup server.
Multiplexing
Do not use the multiplexing feature of any backup application with the ProtecTIER storage pool. Although ProtecTIER will work with these features, the benefits (disk savings) of the HyperFactor algorithm and compression will be greatly reduced. We strongly recommend that you disable any multiplexing features in the backup server for the ProtecTIER storage pool.
Tape block sizes
In order to optimize the backup server, set the block size for data sent to the (virtual) tape drives to at least 256 KB.
Other factors
Another factor that affects performance in a ProtecTIER environment is the type of data being targeted for backup. Some data is well suited to data deduplication and other data is not. For example, small files (less than 32 KB in size) commonly found in operating systems do not factor very well, although the built-in compression may reduce their stored size. It may therefore be necessary to re-evaluate your current backup workloads and decide which backups might not be good candidates for ProtecTIER deduplication. Known configuration changes suited to specific data types are discussed later in this chapter.
Best Practices for Specific Backup Applications
IBM Tivoli Storage Manager
The following IBM Tivoli Storage Manager server and client options should be checked and, if necessary, changed to enable optimum performance of the ProtecTIER software (a command-line sketch of several of these settings appears at the end of this chapter):
1. Use 256 KB I/O for the virtual tape drives; this provides the best factoring ratio.

2. Client compression should be disabled.
3. Ensure the server option MOVEBATCHSIZE is set at 1000 (the default value).
4. Ensure the server option MOVESIZETHRESHOLD is set at 2048 (the default value).
5. When using Windows-based TSM servers, the Tivoli TSM tape and library device drivers for Windows must be used. Native Windows drivers for the emulated P3000 and DLT7000 drives will not function.
6. Given that ProtecTIER acts as a virtual tape library as well as a data deduplication device, the advantages associated with disk backup over tape backup apply here too.
The following points should also be considered when using ProtecTIER with IBM Tivoli Storage Manager:
ITSM disk pools: For some large environments with several IBM Tivoli Storage Manager servers in place, you do not need to assign dedicated ITSM disk storage pool(s) to each server. With ProtecTIER, you can either share a virtual library or you can create virtual libraries for every server.
LAN-free backups are easier: As ProtecTIER is a virtual tape library, it has the major advantage of presenting greatly increased tape resources to the backup server. This then positions you to be able to perform LAN-free backups to ProtecTIER without much regard to the limitations normally applied to these backups, such as tape drive availability. If you have many LAN-free clients already, then it is possible your LAN-free backup windows were dictated not entirely by business needs but also by hardware availability. With ProtecTIER and its maximum of 256 virtual tape drives per ProtecTIER node, you can almost completely eliminate any hardware restrictions you may have faced previously, and schedule your backups as and when they are required by your business needs.
Data streams: You may be able to reduce your current backup window by taking full advantage of ProtecTIER's throughput performance capabilities. If tape drive availability has been a limiting factor on concurrent backup operations on your ITSM server, you can define a greater number of virtual drives and reschedule backups to run at the same time to maximize the number of parallel tape operations possible on ProtecTIER servers.
Note: If you choose to implement this strategy, you may need to increase the value of the MAXSESSIONS option on your ITSM server.
Reclamation: You should continue to reclaim virtual storage pools that are resident on ProtecTIER. The thresholds for reclamation may need some adjustment for a period until the system reaches steady state (see "Steady state" on page 29 for an explanation of this term). When this point is reached, the fluctuating size of the virtual cartridges should stabilize and you can make a decision on what the fixed reclaim limit ought to be.
Number of cartridges: This is a decision with several factors to be considered. In ProtecTIER, the capacity of your repository is spread across all your defined virtual cartridges. If you define only a small number of virtual cartridges in ProtecTIER Manager, you may end up with cartridges that hold a large amount of nominal data each. While this may reduce cartridge complexity, it could also affect restore operations, in that a cartridge required for a restore may be in use by a backup or housekeeping task. Preemption can resolve this issue, but it may instead be better to define extra cartridges so that your data is spread over more cartridges and drives to make the best use of your virtual tape environment.

Reuse delay period for storage pool cartridges: When deciding how many virtual cartridges to define, remember to consider the current storage pool REUSEDELAY value. This is usually equal to the number of days your ITSM database backups are retained before expiring. The same delay period should apply to your storage pools that store data on ProtecTIER virtual cartridges, and you may need to increase the number defined to ensure that you always have scratch cartridges available for backup.
Collocation: When using a virtual library, you should consider implementing collocation for your primary storage pools. If you begin a restore while another task (for example, a backup or cartridge reclamation) is using the virtual cartridge, you may not be able to access the data on it immediately. Using collocation will mean all your data is contained on the same set of virtual cartridges. Because you do not have any of the restrictions of physical cartridges normally associated with this feature (such as media and slot consumption), you can enable the option quite safely.
Consider these points when determining how many virtual cartridges are to be created. Remember that you can always create additional virtual cartridges at any time.
Physical tape: Depending on your data protection requirements, it may still be necessary to copy the deduplicated data to physical tape. This can be achieved by using standard ITSM copy storage pools that have device classes directing data to physical libraries and drives.
Symantec/Veritas NetBackup
The following configuration options in Symantec NetBackup should be checked and, if necessary, changed to assist with the optimum performance of ProtecTIER:
1. Ensure Multiplexing is disabled.
2. The NUMBER_DATA_BUFFERS value should be at least 32 and the SIZE_DATA_BUFFERS value should be at least 262144 (256 KB, matching the block size recommendation above). On an AIX system, these buffers can be configured by creating the following files on the NetBackup media server:
/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
/usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
3. Client compression should be disabled.
4. Tape Encryption (for images going to the TS7650G) should be disabled.
5. When creating the virtual TS7650G library, select either the DTC emulation (for creating a P3000 library) or V-TS3500 (identical to the TS3500 emulation). This is a requirement from Symantec for NetBackup support.
6. When using Windows Master or Media servers, consider using NetBackup device drivers over native Windows device drivers, per Veritas recommendations.
Legato NetWorker
EMC NetWorker, formerly Legato NetWorker, is a centralized, automated backup and recovery product for heterogeneous enterprise data. The NetWorker Server runs on all major operating systems, such as AIX, Linux, Windows, SUN Solaris, and HP-UX. The NetWorker Storage Node, which is a kind of LAN-free client with proxy node capability, runs on all major operating systems. The proxy node capability of the

Storage Node can receive data from other NetWorker clients over the LAN and store the data directly to the storage device. Only the metadata will be handled by the NetWorker Server. The NetWorker Client sends the backup data either to the NetWorker Server or to a NetWorker Storage Node. There are different clients available for the integration of special applications, like NetWorker for IBM DB2.
A NetWorker Domain consists of one NetWorker Server; several NetWorker Clients and several NetWorker Storage Nodes can exist in one NetWorker Domain. There is no data exchange or storage resource sharing outside one NetWorker Domain. In addition, if tape drives must be shared between one or more Storage Nodes and the NetWorker Server, then additional licenses are required. Therefore, ProtecTIER might be a great solution for sharing physical tape resources.
The following configuration options in EMC NetWorker should be checked and, if necessary, changed to assist with optimum performance of ProtecTIER:
1. Use 512 KB I/O for the virtual tape drives; this provides the best factoring ratio.
2. If possible, use NetWorker 7.3, which allows multiplexing to be completely disabled.
3. Disable CDI on all virtual tape drives.
4. Disable client compression.
5. Set parallelism to 1 on all virtual tape drives.
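Several of the Tivoli Storage Manager settings listed earlier in this chapter can be checked or applied from the TSM administrative command line and the client option file. The lines below are a hedged sketch only: the storage pool name PT_VTL_POOL and the numeric values are hypothetical examples, and your own policy, sizing and steady-state behavior should drive the actual values.

In dsm.sys (UNIX) or dsm.opt (Windows) on each client that backs up to the ProtecTIER library:
  COMPRESSION NO

From a TSM administrative session:
  QUERY OPTION                                            (verify that MOVEBATCHSIZE is 1000 and MOVESIZETHRESHOLD is 2048)
  SETOPT MAXSESSIONS 128                                  (only if additional parallel virtual-drive sessions are scheduled)
  UPDATE STGPOOL PT_VTL_POOL COLLOCATE=NODE REUSEDELAY=7
  UPDATE STGPOOL PT_VTL_POOL RECLAIM=90                   (revisit this threshold once the repository reaches steady state)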


Chapter 18. Deploying replication with specific backup applications
As a general rule, the preferred method of operation is to imitate the procedure used with physical cartridges. Implement the time-frame mode of operation such that for every 24 hour cycle there is a backup window and then a replication window. The user should make sure there is enough bandwidth and time allotted so that there will be no overlap and no replication backlog. A typical operational flow is described below:
Perform regular daily backups to the ProtecTIER system during the defined backup window. The system should be set up such that replication will then start and should be able to finish before the next backup cycle starts. At this point the user should have a complete and easily recoverable set of their latest daily backup, including the backup-application catalog image. In case of a disaster the user can revert back to that last completed set of backups, so the RPO is within the 24 hour window that is typical for the SLA.
Determining which cartridges at the DR site to restore
Once the DR site backup application server is recovered, the user will need to review the status of the replicated cartridges to ensure their replication consistency with the backup catalog or database.
Assessing cartridge status and syncing with the catalog
Once the DR site backup application server is recovered, the user will need to review the status of the replicated cartridges to ensure their replication consistency with the backup catalog or database. The following explains the process for assessing cartridge status on the DR site and synchronizing the backup application catalog with the cartridges.
Before running a restore for disaster recovery, you must verify that the list of associated cartridges is marked as "In-Sync" with the primary site; otherwise an earlier full backup image must be used for recovery. The easiest way to determine the time of the last full backup is if you have a specific time each day where your replication backlog is zero (i.e. there is no pending data to replicate and backups are not running). If this is not the case, you can assess the cartridges by recovering the backup application catalog and scanning it to find the last full backup whose associated cartridges completed replication.
Recovering the backup application catalog
There are several ways to obtain a copy of the catalog at the DR site:
From a catalog backup on a virtual cartridge that is replicated to the DR site.
From disk-based replication, or by other means.

If the catalog is backed up to a virtual cartridge, check on the DR site that this cartridge appears as In-Sync with the primary site. If the cartridge is not In-Sync, you will need to compare the cartridge's last sync time with the time of the last full backup.
To recover the backup application catalog from a backup on a virtual cartridge, you must work with the replicated cartridges on the hub to get an updated copy of the catalog back to the DR site. From the Systems Management window, select the Replica properties view on the Cartridges tab and use the following guidelines for each cartridge before running the procedure for recovering the catalog:
Note: The procedure for recovering the selected catalog backup depends on the backup application and is documented in the backup application's official documentation.
If the cartridge is replicated, either a red 'X' or a green checkmark will appear in the In-Sync column. If the In-Sync property has a green checkmark, then nothing further needs to be verified and this cartridge is valid for recovery.
If the cartridge is not marked In-Sync, refer to the Last sync time column. This column displays the last time each cartridge's data was fully replicated to the DR site. The cartridge marked with the most recent Last sync time should be used to recover the backup application catalog.
Note: The sync time is updated during replication, and not only when replication for this cartridge is finished.
Recovering the data
Once the catalog is recovered, scan it and search for the full backup image you want to recover:
Get the start and end backup time of the full backup image.
View the list of cartridges associated with this full backup.
Use the PTCLI inventory filter command to filter the cartridges according to the following properties: In-Sync, Last update time, Last sync time.
All the cartridges marked as In-Sync are valid for recovery. For those cartridges not marked as In-Sync, compare the last update time, which represents the last time the replica was updated, with the last sync point destination time. If the last update time is less than or equal to the last sync point destination time, the replica cartridge has a consistent point in time; otherwise, the cartridge is incomplete, or in transit. If the cartridge has a consistent point in time, ensure this time stamp is later than the full backup image end time; this indicates that the cartridge contains all the required data for this recovery operation. Otherwise, the user will have to use a previous full backup image for recovery. (This decision rule is sketched immediately below.)
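The cartridge-by-cartridge decision described above can be expressed as a small piece of logic. The sketch below is a formalization only: the function and parameter names are invented for readability and do not correspond to PTCLI column names, but the rule is the one stated above — an In-Sync cartridge is always usable, and any other replica is usable only if it is at a consistent point in time that also covers the end of the full backup image.

def cartridge_usable_for_restore(in_sync, last_update_time, last_sync_time, full_backup_end_time):
    # In-Sync replicas are fully replicated and need no further checks.
    if in_sync:
        return True
    # Consistent point in time: the replica has caught up with the last update.
    if last_update_time > last_sync_time:
        return False  # incomplete or still in transit
    # The consistent point must also cover the end of the full backup image.
    return last_update_time >= full_backup_end_time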

Figure 86. Cartridge status report (in Excel)
The user may have a case where the cartridge sync point is after the backup start time, but before the end of the backup. This may happen in cases where replication is working in parallel to the backup. If the backup has many cartridges, the first cartridges may finish replicating before the backup ends, and they get a sync point earlier than the backup end time. As such, if the last sync time flag on one (or more) of the cartridges indicates a time later than the backup start time, but earlier than the backup complete time, those cartridges need further inspection. Scan the backup application catalog for each of those cartridges and get the backup start time and the backup complete time. If the last sync time flag on all the cartridges indicates a time later than the backup complete time, your backup image was fully replicated.
Remember: When processing the cartridge list to find a complete set of DR tapes, you must keep track of date/time discrepancies. Compare the date/time values of the source master backup server and the source ProtecTIER system. The destination environment may be in a different time zone or may be set to the incorrect date/time and, as such, be unreliable. Thus, use the source date/time, rather than the destination sync time, when comparing cartridge states to the backup catalog/database. The destination sync time should only be used to determine which cartridges are whole. In addition, there could be a time difference between the source backup server and the source ProtecTIER server. Your administrator should be aware of the discrepancy, measure it regularly, and communicate the delta to the DR administrator or operator(s). For instance, if the backup server is 2 hours behind, a cartridge may have a sync time that precedes its backup complete time, i.e. it will appear as a previous, old backup. If there is uncertainty regarding the time differences, compare the nominal size of the cartridge to the catalog/DB value as an additional (not a substitute) layer of verification.
NetBackup (NBU)
This section describes the utilization of the IBM ProtecTIER (TS7650) IP replication system in a NetBackup environment and discusses the ramifications of possible scenarios as they relate to Disaster Recovery (DR). The chapter provides some background on setting up the NetBackup application for Backup and Restore, the different issues involved in setting up NetBackup for DR, and how IBM ProtecTIER can be deployed optimally with NetBackup for DR purposes.

An overview of the operations of ProtecTIER IP replication in a NBU environment is provided, as well as a description of the various scenarios that could occur. Operational best practices are given where applicable.
NetBackup Background
NetBackup is an Open Systems enterprise backup software solution. NBU's architecture has three main building blocks:
Clients: the machines with the data that needs backing up
Media Servers: the machines connected to backup devices
Master Server: the machine controlling the backups
Collectively, Master, Media and Clients are known as a NBU Domain; typically one Master will control multiple Media Servers (typically 4-30) and back up many Clients (typically 10-1,000+). As the Master server is the critical box in the domain, it is usual for this to be clustered, usually deploying other software available from the vendor (host-based Volume Manager for disk mirroring and cluster control via Veritas Cluster Server).
Setting up NetBackup for Backup and Restore
NBU deployments typically use a schema of weekly full backups and daily incrementals. There are two types of incremental:
Cumulative: backs up everything since the last full
Differential: backs up everything since the last backup
Most backup and restore deployments now use Differential incremental backups, since they are smaller and faster; however, Cumulative backups are now becoming more common when people are thinking of DR.
From an operational standpoint, significantly more data is backed up than restored. Typically in an operational backup environment a Backup Policy will back up the entire server daily; however, a typical restore may be for a single file. The product was engineered with this in mind. There are configurations within the product to significantly improve backup performance (Multiplexing, Multiple Data Streams, Fragment Size, and Buffer Size) which have the consequence of making full restores slower. For example, if Multiplexing is set to 5, then the backups from 5 clients will end up on the same tape. If the user is only restoring single files for a client, then the product will know which tape fragment to go to, and modern high-speed tape drives with Fast Block Locate make the process as fast as possible. In contrast, restoring an entire client will involve multiple reads and tape skips, such that restoring all 5 clients will prove time consuming, as restores are sequential.
Setting up NetBackup for disaster recovery
When thinking of NBU for disaster recovery planning, the user should consider a number of key issues:
1. NBU architecture: does the NBU domain span across the primary and DR sites, or are they two separate domains? This is a key step to understand and has strong implications on DR.

2. Classification of Clients (RTO): when a company plans for DR, each server will be given a Recovery Time Objective (RTO) depending on the importance of its application and the associated data to the business. Servers with very short RTOs (typically less than 24 hours) will not use backup systems for DR; these servers will typically use clustering, volume mirroring, or some form of data replication to maintain business continuity. Servers with RTOs typically greater than 24 hours will use tape for DR. Servers will then be prioritized into RTO bands of typically 24, 36, 48, or 72 hours depending on business requirements.
Note: Usually only production servers are thought of for DR; Test and Development are usually out of scope for DR, although ProtecTIER makes DR protection affordable for all applications in any given environment.
3. Classification of Clients (RPO): running alongside RTO is the Recovery Point Objective (RPO). This is the point in time to which the server must be recovered. For the majority of servers using a tape DR position, the RPO will be the point of the last complete backup before the disaster. For example: if a disaster strikes at 9:00am, the RPO will be the previous night's backup.
To cater to these disaster recovery requirements, the following architecture has become common:
1. Architect NBU with a single domain spanning both sites (NBU Clustered). The Master uses host-based replication to mirror the NBU databases and a clustering product to manage host failover. This is important as it means that in the event of a DR event the Master operations can seamlessly fail over to the DR site. As the NBU databases have been replicated, all of the backup information is known at the DR site and so restores can begin immediately.
2. Cross-site backups, with two main options:
Connect Clients from one site via IP to Media servers on the DR site. Backups then reside in the DR site library ready for restore. The primary downside is that large IP pipes are required and backups are limited to the speed of the cross-site network, since the whole data stream is transmitted.
Stretched tape SAN: the local Client backs up to a local Media server, which then sends the data across the SAN to the DR site. Backups then reside in the DR site library ready for restore. The downside is that large SAN pipes are required and backups are limited to the speed of the cross-site SAN, since the whole data stream is transmitted.
A downside of both options: as normal backups are now resident in the DR library, any regular restores will be significantly slower, since data has to come from a DR library.
3. Multiplexing turned off: to achieve the best restore performance (to hit RTOs), NetBackup needs to be configured without multiplexing.
4. Dedicated volume pools for RTO tiers or even clients: to achieve optimum restore times (and given sufficient media in libraries), having individual volume pools per client will achieve optimum restore performance. In this way there is no contention between media when doing restores. In the physical tape world, where tape drives are limited, this is often impractical (however, it is worth noting for the virtual world). It should be stressed this is not just a theoretical concept; systems in current production have implemented cross-site backups with client backups going to dedicated volume pools, although this was limited to 30 clients with low RTOs, since the implication of separate volume pools is that you need separate Backup Policies per client.
If the NBU configuration at the DR site is not in the same domain as the primary site, then a different strategy is required.
Since the DR site has no knowledge of

the backups, tapes, etc., that have been used by the primary site, the first operation is to get a copy of the NBU catalog from the primary site and load it into the Master on the DR site. This can either be done via disk replication or tape backup.
Note: NBU catalog backups are different from regular backups and need special handling to restore. Not having the catalog available at the DR site means that every tape would have to be imported to build the catalog, which is impractical and is not considered a viable option.
With the catalog in place at the DR site, the tapes can be loaded into the library, the library inventoried, and restores can commence in a very short time frame.
Optimal ProtecTIER deployments with NBU for disaster recovery
The following are key concepts that need to be discussed with the NBU architects and senior administrators within the user's organization:
1. In normal operation, back up to the local VTL; this provides quick backups and quick restores.
2. As ProtecTIER's VTL replication is at the cartridge level, and only the deduplicated data is being transferred over the wire, it will significantly reduce the bandwidth needed compared with traditional cross-site replication/backups.
3. Have servers for DR (usually production servers) split into their RTO classifications and plan for separate volume pools and backup policies.
4. For servers with low RTO requirements, consider individual volume pools and backup policies.
5. Turn multiplexing off for ALL backups requiring DR; since MPX is done at either the Storage Unit level or the Backup Policy level, this is easy enough. Note that it is best practice to disable MPX for all backups going to the ProtecTIER VTL.
6. Use large fragment sizes: also configured at the Storage Unit level, this will improve restore performance of whole file systems.
7. Disable Storage Checkpoints: storage checkpoints are a mechanism where pointers are added to the backup stream so that if the backup failed, a rerun of the backup would start from the last storage checkpoint, as opposed to from the start of the backup. Storage Checkpoints will have an adverse effect on the deduplication ratios.
8. Disable software compression (if used), as this may reduce the efficiency of the ProtecTIER deduplication and affect its factoring ratio.
Once the user's architects and administrators have understood the basic concepts, they need to apply them to their architecture, deciding whether to have one domain spanning both sites, or two separate domains.
If a single domain is used, the same catalog is shared across sites and is always updated with the whereabouts of all cartridges:
1. ProtecTIER replicates cartridges per the policies set by the user; cartridges are copied onto a virtual shelf at the DR site ProtecTIER.
2. Cartridges can also be moved using the replication policy AND utilizing the visibility control switch, so they will reside at, and be visible to the NBU application at, the DR site (although the actual data will be available to ProtecTIER on both sites):

225 Eject (export) cartridge from primary library Inject (import) to inentory at the DR site library This operation can be set / done using NBU's Vault or manually. Either way it can be automated from within the NBU enironment 3. If disaster occurs the user either needs to inject the cartridges from the DR site shelf into the DR site library and inentory (see #1) or if the isibility switch control method was used, the user is ready to begin restoring and or performing local backups at the DR site. 4. Once the disaster situation has cleared and the primary site is back on-line, the user should use the failback procedure (as explained in detail in the IBM System Storage with ProtecTIER User Guide) to moe their main operation back to the primary site, including potential newly created cartridges from the DR site that will be replicated to the Primary site. If separate (multiple) domains approach is used: 1. ProtecTIER replicates cartridges per the policies set by the user cartridges are copied to the irtual shelf at the DR site. 2. User should perform catalog backups to irtual tape at the end of its backup window and replicate it at the end of each replication cycle to the DR site. This approach will ensure that at the end of eery day (assuming 24 hour backup/replication cycle); the DR site will hold a full set of replicated cartridges with a matching NBU catalog allowing for an RPO of one day. 3. When disaster strikes the user's FIRST step is to get NBU's catalog back on the DR site NBU serer by restoring the cartridge(s) containing the catalog. 4. The second step is to inject the cartridges from the DR shelf at the DR site ProtecTIER into the library and perform an inentory. Once the NBU serer is up and running with the DR repository, restores and local backup operations can resume at the DR site. 5. Once the disaster situation has cleared and the primary site is back on-line the user should use the failback procedure (as explained in details in the IBM System Storage with ProtecTIER User Guide) to moe their main operation back to the primary site, including potential newly created cartridges from the DR site that will be replicated to the Primary site. Disaster recoery scenarios ProtecTIER replication significantly reduces cross site backup traffic since it only replicates deduplicated data, improes ease of operation (by enabling simple inject and inentory actions), and makes recoery in the eent of a disaster or DR test easy to plan and implement. Deploying ProtecTIER into a NBU enironment will make the business more secure and remoe significant headaches from NBU architects and administrators. The following section proides a number of scenarios, detailing the necessary disaster recoery steps: Single Domain Enironment 1. Master Clustered: All backups are completed, and all of the replication operation is completed as well. Disaster strikes - as Master Clustered, then the NBU catalog database at the DR site is up-to-date no NBU recoery action is necessary. Within ProtecTIER, the user should moe tapes from the irtual shelf to import slots. Within NBU, the library needs to be inentoried, remembering to select the Chapter 18. Deploying replication with specific back up applications 193

226 option to import tapes. Once the inentory operation is complete, restores can begin as well as local backups at the DR site. 2. Master Clustered: All backups are completed, howeer some or all of the replication operation is incomplete. Disaster strikes - as Master Clustered, then the NBU catalog database at the DR site is up-to-date howeer, since replication was not complete, the user should roll back to the preious night catalog and cartridges set (RPO of one day). Once Inentory operation is complete, restores can begin as well as local backups at the DR site. Multiple Domain Enironment 3. Master not Clustered: All backups are completed, and all of the replication operation is completed as well. Disaster strikes - as Master not Clustered then the catalog database at the DR site is NOT up-to-date, which means that NBU catalog recoery action is necessary. The first operation is to identify the latest backup catalog tape and load (import) it into the ProtecTIER library at the DR site. Once the library is inentoried, a standard NetBackup Catalog Recoery operation can begin. When recoery is completed then restores can begin as well as local backups at the DR site. 4. Master not Clustered: All backups are completed, howeer, some or all of the replications operation is incomplete. Disaster strikes - as Master not Clustered then the catalog database at the DR site is NOT up-to-date, which means that NBU catalog recoery action is necessary The first operation is to identify the preious nights NBU backup catalog tape and load (import) it into the ProtecTIER library at the DR site. Once the library is inentoried, a standard NetBackup Catalog Recoery operation of that preious night's catalog can begin. Once recoery is completed, restores can begin (RPO of one day) as well as local backups at the DR site. Note: When working in a single-domain NBU enironment (NBU Master Clustered) AND utilizing the isibility control switch option within ProtecTIER to moe cartridges from the primary site directly into a DR site library - then the catalog is ALWAYS up-to-date with the whereabouts of all cartridges in both the primary and DR repositories. How to determine what is aailable for restore at the disaster recoery site This section suggests ways for the users to determine what catalog and data sets are complete or not, matched, and readily aailable to restore at the secondary/dr site. Which database copy at the DR site is alid Before running a restore for disaster recoery, the user must erify that the list of associated cartridges is completely replicated to the DR site; otherwise an earlier full backup image must be used for recoery (usually the preious night's). The easiest way to determine the time of the last full backup is if the user has a specific time each day where the replication backlog is zero (there is no pending data to replicate). 194 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

If this is not the case, the user can assess the cartridges by recovering the backup application catalog and scanning it to find the last full backup whose associated cartridges completed replication. The best practice for ensuring that a copy of the catalog is available at the DR site is to use the native replication function of ProtecTIER. Each day the catalog should be backed up on a virtual cartridge following the daily backup workload, so that it is replicated to the DR site at the end of each replication cycle. If the catalog is backed up to a virtual cartridge, use the cartridge view of the library in ProtecTIER Manager to query each of the cartridges used for catalog backup and find the most recent sync dates marked on the cartridges. Assuming there are multiple backup copies, you need to find the latest backup that finished replication. To recover the backup application catalog from a backup on a virtual cartridge, you must work with the replicated cartridges to get an updated copy of the catalog to the DR site: 1. Each cartridge has a Last Sync Time that displays the last time the cartridge's data was fully replicated to the DR site. (The sync time is updated during replication, not only when replication for the cartridge has finished.) 2. The cartridge marked with the most recent Last Sync Time should be used to recover the backup application catalog. Determine which cartridges at the DR site are valid for restore Once the DR site NetBackup server is recovered, review the status of the replicated cartridges to ensure their replication consistency with the NetBackup catalog. To do so, use the ProtecTIER Replicated Cartridges Status Report, as explained in detail in the introduction section of this chapter. Eject and Inject commands from the backup application Although the process can be manually scripted to enable automation, the easiest way of using NBU CLI commands to automate this process is by utilizing the Vault service within the NBU software. How to eject a cartridge from a library The eject can be initiated by using the wizard command from the NBU GUI or by using the vault option. If you are using the vault command, first run your vault policy: >/usr/openv/netbackup/bin/vltrun <vault policy name> At the end of the backup, eject the cartridge using the command: >/usr/openv/netbackup/bin/vltinject <vault policy name> How to insert a cartridge to a library Injecting cartridges from a CAP is performed by a simple Robot Inventory, selecting the Import from Load port option. For automation of this process you can use CLI commands. To update the Media Manager volume: >/usr/openv/volmgr/bin/vmupdate -rt dlt -rn <robot number>
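If the Vault service is used, the eject at the primary site can be wrapped in a small script and run from the scheduler once the nightly backups complete. The following is a minimal sketch only, using the vltrun and vltinject commands named above; the vault policy name PT_DR_VAULT is illustrative and must match the policy defined in your NetBackup Vault configuration.

    #!/bin/sh
    # Run the vault session for the ProtecTIER DR cartridges once backups finish,
    # then complete the eject/inject handling for that session.
    NBU_BIN=/usr/openv/netbackup/bin
    VAULT_POLICY="PT_DR_VAULT"        # illustrative name

    $NBU_BIN/vltrun "$VAULT_POLICY" || { echo "vltrun failed for $VAULT_POLICY" >&2; exit 1; }
    $NBU_BIN/vltinject "$VAULT_POLICY"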

228 Tioli Storage Manager (TSM) Link for NBU procedure to recoer a master serer from existing DB copy You hae two options for re-creating the NBU backup catalog: online and offline. The Hot Online Catalog Backup procedure can be found under "Online, hot catalog backup method" section from NBU help. The Offline, cold catalog backup procedure can be found under "Offline, cold catalog backup method" section from NBU help. Consult the official NetBackup application documentation for more details. This section describes the utilization of the ProtecTIER IP replication system in a TSM enironment and discusses the ramifications of possible scenarios as it relates to disaster recoery situation. An oeriew of the operations of ProtecTIER IP replication in a TSM enironment is proided as well as a description of the arious scenarios that could occur. Operational best practices, where applicable, are also offered. Technical oeriew Figure 24 illustrates a typical TSM enironment using ProtecTIER. The TSM enironment is straightforward. The TSM serer(s) are connected to storage deices (disk, real tape or irtual tape) which are used to store data backed up from the clients it is sering. Eery action and backup set that TSM processes is recorded in the TSM database. Without a copy of the TSM database, a TSM serer cannot restore any of the data that is contained on the storage deices. ProtecTIER proides a irtual tape interface to the TSM serer(s) and allows the creation of two storage pools the ACTIVE TSM pool and the ONSITE TAPE Pool (called PT_TAPE_POOL) in the example in Figure 24 below. The user can also maintain another storage pool to create real physical tapes to take offsite (called OFFSITE_TAPE) in our example. The user has sized the ProtecTIER system to store all actie and about 30 days inactie client files on irtual tape. The customer also created an ACTIVE TSM Pool which is also hosted on the ProtecTIER system, which contains the most recent (actie) file backed up from all client serers. The ACTIVE Pool is where client restores will come from. The adantages of this architecture is that it has eliminated the use of physical tape in the data center and allows restores to occur much faster as they are coming from the ProtecTIER disk based irtual tape s. real tape. 196 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

229 Figure 87. Typical TSM Enironment with ProtecTIER (Pre Replication) ProtecTIER IP replication in the TSM enironment ProtecTIER's IP replication functionality proides a powerful tool enabling users to design robust disaster recoery architecture. Thanks to the deduplication of data users can now electronically ault backup data with much less bandwidth, thus changing the paradigm of how data is taken offsite for safe keeping. ProtecTIER IP replication can eliminate the expensie and labor extensie handling, transporting and storing of real physical tapes for DR purposes. Figure 25 illustrates how ProtecTIER's IP replication functionality can be used in a TSM enironment. This user has chosen to use ProtecTIER to replicate all of the irtual tapes in the PT_TAPE_POOL offsite for DR purposes. The user also backs up their TSM DB to irtual tapes. These DB backup irtual tapes are also replicated to Site B. In the eent of a disaster, the user now has the ability to restore their TSM serer in Site B, which is then connected to a ProtecTIER irtual tape library which contains the TSM Database on irtual tape as well as all of the client ACTIVE files on irtual tapes. Offsite physical tape is now only used for long term archiing. The physical tapes are also contained in the TSM database that is protected on irtual tape in Site B. Chapter 18. Deploying replication with specific back up applications 197

230 Figure 88. ProtecTIER replication and TSM TSM database replication status When designing a ProtecTIER replication enironment, one of the most important questions to consider is what is the recoery point objectie (RPO)? In other words, how much lag time is acceptable for a backup, written to irtual tape in Site A, to be completely replicated to Site B? The RPO for tape-based DR is typically 24 hours. For example, in a generic user case, backups begin at 6 p.m. on Monday eening and the tape courier picks up the box of physical tapes at 10am Tuesday morning for transport to the ault. Therefore, on a typical day, there can be a 14 hour delay between the time the first eening's backup begins and when the data is safely offsite. Howeer, if a disaster occurs before the courier arries - for example, a fire destroys the entire set of Monday's backup tapes early Tuesday morning - the customer will recoer the applications from Sunday's replication workload. Sunday's workload is by default, a day behind, proiding a 24 hour RPO. With ProtecTIER replication, it is possible to get the backups offsite almost immediately with enough bandwidth. Because ProtecTIER is always working within the backup application paradigm, the RPO typically remains 24 hours. The tremendous improements enabled by ProtecTIER are in the recoery time (restoring data rapidly from disk) and reliability of the DR operation. With TSM, nothing can be restored without a good copy of the TSM database. Therefore, the status of the irtual tape(s) holding the TSM database is of utmost importance. 198 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5
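Because nothing can be restored without a usable database copy, the TSM database backup itself should be written to the ProtecTIER-backed library at the end of every backup cycle so that it replicates together with the client data. The administrative commands below are a minimal sketch, assuming a library named PT_LIBRARY has already been defined with its paths and drives; the device class name PT_VTL_CLASS is illustrative.

    define devclass PT_VTL_CLASS devtype=lto format=drive library=PT_LIBRARY
    backup db devclass=PT_VTL_CLASS type=full wait=yes

The same device class can then be referenced from an administrative schedule so that the database backup always follows the nightly client backups.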

231 Scenario One below (Figure 26) illustrates a situation where the user strategy is to utilize ProtecTIER replication in a scheduled time frame mode once the backup cycle is completed. This allows the data to begin replicating to the offsite location immediately following the backup window's completion. In this first scenario, assuming the replication window ended at 6am, the disaster strikes at 6:15 am, 15 minutes after the last night's replication cycle was completed. The status of all irtual cartridges is 100% completely replicated when the disaster eent occurs. The DR restoration actiities in Scenario One are therefore straightforward. The user brings up the TSM serer using TSM DB backup that occurred at 5am Monday on Tape 5. The TSM DB has knowledge of all prior tapes, and restores can begin immediately from Tapes 2-4, which hold the last client backups. Figure 89. Scenario one, replication complete Scenario Two below (Figure 27) illustrates the second possible scenario for disaster recoery. This disaster eent occurs at 5:15am, shortly (45 min) before the nightly replication cycle has completed, which means that some of last night's backups hae not yet been 100% replicated to Site B. At 5:15 am, Mon morning, tapes 4 and 5 hae not been completely replicated when the link goes down due the disaster eent. Attempts to restore the TSM DB are unsuccessful as the DB was not entirely replicated yet. The user must therefore restore from the last preious TSM DB backup, which is on tape 1 from Sunday at 5 p.m. Since tapes 2 and 3 were created after Sunday at 5 p.m., they are not in the TSM DB and cannot be used. Clients A, B and C must therefore be restored from tapes that exist in the TSM DB as of Sunday at 5 p.m. Chapter 18. Deploying replication with specific back up applications 199

Figure 90. Scenario two, replication incomplete
Scenario Three (Figure 28) illustrates another possibility, where the most recent TSM DB virtual tape is replicated but not all associated tapes have completed replication when the disaster event occurs. As the drawing shows, the disaster strikes at 5:30 am (30 minutes before the anticipated replication cycle completion). At this point, tapes 1, 2, 3 and 5 have replicated 100%, while tape 4, due to the size of the backup dataset stored on it, has not completely finished replication when the disaster event occurs. The TSM server is restored using the TSM DB held on Tape 5 (backed up at 5 am Monday morning). However, since Tape 4 had not completed replication when the disaster occurred, the tape must be audited and the TSM DB fixed to represent exactly what is on the tape. The TSM command would be: # audit volume 4 fix=yes. The audit/fix would need to be performed for every tape that had not been fully replicated when the disaster struck.
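Where several cartridges were caught mid-replication, the audit can be repeated for each of them from a small wrapper. This is a minimal sketch, assuming the affected volume names have been collected into a text file and that an administrative ID is available; the file name and credentials are illustrative.

    #!/bin/sh
    # Audit and fix every volume whose replication had not completed at the
    # time of the disaster (one volume name per line in the input file).
    while read VOL; do
        dsmadmc -id=admin -password=secret "audit volume $VOL fix=yes"
    done < /tmp/incomplete_volumes.txt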

233 Figure 91. Scenario three, replication incomplete, DB complete Reclamation considerations Another important consideration when using ProtecTIER IP replication in a TSM enironment is the effect that Reclamation will hae. Reclamation is the TSM process that moes expired data off of tapes and moes any unexpired data to other tapes, thus freeing up space and returning empty tapes to the scratch pool. All reclamation tape moement is recorded in the TSM DB and it must be replicated to reflect an accurate tape enironment in the eent of a disaster. If the disaster strikes while reclamation is running and the DB backup aailable for restore doesn't reflect the data on tape moements, an audit/fix of the olumes may be required. The effect of reclamation can be delayed by the REUSEDELAY. When the user defines or updates a sequential access storage pool, the user can use the REUSEDELAY parameter. This parameter specifies the number of days that must elapse before a olume can be reused or returned to scratch status after all files hae been expired, deleted, or moed from the olume. When the user delays reuse of such olumes and they no longer contain any files, they enter the pending state. Volumes remain in the pending state for as long as specified with the REUSEDELAY parameter for the storage pool to which the olume belongs. Delaying reuse of olumes can be helpful under certain conditions for disaster recoery. When files are expired, deleted, or moed from a olume, they are not Chapter 18. Deploying replication with specific back up applications 201

234 actually erased from the olumes. The database references to these files are remoed. Thus the file data may still exist on sequential olumes if the olumes are not immediately reused. This preents a situation where reclamation has run after the last TSM database backup to irtual tape, and reclaimed tapes hae been replicated. If the disaster occurs at a point after the reclamation ran and these tapes fully replicated, but this isn't reflected in the TSM database, the user could get a mismatch. A disaster may force the user to restore the database using a database backup that is not the most recent backup. In this case, some files may not be recoerable because the serer cannot find them on current olumes. Howeer, the files may exist on olumes that are in pending state. The user may be able to use the olumes in pending state to recoer data by doing the following: 1. Restore the database to a point-in-time prior to file expiration. 2. Use a primary or copy storage pool olume that has not been rewritten and contains the expired file at the time of database backup. If the user backs up their primary storage pools, set the REUSEDELAY parameter for the primary storage pools to 0 to efficiently reuse primary scratch olumes. For their copy storage pools, the user should delay reuse of olumes for as long as they keep their oldest database backup. Users can also disable the reclamation by changing the reclaim parameter of the update stg command: update stgpool<stg POOL NAME> reclaim=100 Consider disabling reclamation altogether if the user has deployed ProtecTIER with small irtual tape cartridge sizes (for example 50GB or less). This is because there is mathematically less to reclaim on a smaller olume than a larger one, and eerything on a smaller olume is likely to expire together, just requiring the tape to go immediately to the scratch pool. Summary Assure catalog/database backups are performed to irtual tape and replicated along with the daily workload each day. A database backup should be performed and replicated at the end of each backup cycle. A separate tape pool should be created for database backups Consider adjusting TSM reclamation processing to assure actions are in sync with the replicated database. This includes setting the REUSEDELAY to proide a 2 or 3 day delay in the reuse of tapes that hae been reclaimed. Consider deploying additional network bandwidth to assure a back-log does not occur. Database synchronization becomes a greater issue if the back-log exceeds one backup cycle (typically hours). How to determine what is aailable for restore at the DR site This section suggests ways for the users to determine what catalog and data sets are complete (or not), matched, and readily aailable to restore at the secondary/dr site. 202 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

Which database copy at the DR site is valid Before running a restore for disaster recovery, you must verify that the list of associated cartridges is completely replicated to the DR site; otherwise an earlier full backup image must be used for recovery. The easiest way to determine the time of the last full backup is if you have a specific time each day where your replication backlog is zero (there is no pending data to replicate). If this is not the case, you can assess the cartridges by recovering the backup application catalog and scanning it to find the last full backup whose associated cartridges completed replication. There are several ways to obtain a copy of the catalog at the DR site: from a catalog backup on a virtual cartridge that is replicated to the DR site, or from disk-based replication or other means. If the catalog is backed up to a virtual cartridge, use the cartridge view of the library in ProtecTIER Manager to query each of the cartridges used for catalog backup and find the most recent sync dates marked on the cartridges. Assuming there are multiple backup copies, you need to find the latest backup that finished replication. To recover the backup application catalog from a backup on a virtual cartridge, you must work with the replicated cartridges to get an updated copy of the catalog to the DR site: 1. Each cartridge has a last sync time that displays the last time the cartridge's data was fully replicated to the DR site. (The sync time is updated during replication, not only when replication for the cartridge has finished.) 2. The cartridge marked with the most recent last sync time should be used to recover the backup application catalog. Determine which cartridges at the DR site are valid for restore Once the DR site TSM server is recovered, review the status of the replicated cartridges to ensure their replication consistency with the TSM DB. To do so, use the ProtecTIER Replicated Cartridges Status Report, as explained in the introduction section of this chapter. Eject and Inject commands from the backup application 1. How to eject a cartridge from a library Use the checkout libvolume command. It can be invoked through the Web GUI or from the CLI. For example: TSM:PUMA_SERVER1>CHECKOUT LIBVOL <name of library> REMOVE=BULK FORCE=yes CHECKLABEL=YES VOLLIST=<volume barcode> 2. How to inject/import (insert) a cartridge to a library: Use the Add cartridge command from the Web GUI or use the checkin libvolume command from the CLI. For example: TSM:PUMA_SERVER1>CHECKIN libvol <name of library> search=bulk checklabel=barcode status=scratch WAITTIME=60 Then run the reply command with the process ID to finish the import operation:

TSM:PUMA_SERVER1>reply 2 Note: The full TSM DB recovery procedure can be found under Chapter 19, "Disaster Recovery Manager", in the IBM Tivoli Storage Manager Implementation Guide. Reclamation considerations Another important consideration when using ProtecTIER IP replication in a TSM environment is the effect that reclamation will have. Reclamation is the TSM process that moves expired data off of tapes and moves any unexpired data to other tapes, thus freeing up space and returning empty tapes to the scratch pool. All reclamation tape movement is recorded in the TSM DB, and it must be replicated to reflect an accurate tape environment in the event of a disaster. If the disaster strikes while reclamation is running and the DB backup available for restore does not reflect the tape movements, an audit/fix of the volumes may be required. The effect of reclamation can be delayed by the REUSEDELAY parameter. When the user defines or updates a sequential access storage pool, the user can use the REUSEDELAY parameter. This parameter specifies the number of days that must elapse before a volume can be reused or returned to scratch status after all files have been expired, deleted, or moved from the volume. When the user delays reuse of such volumes and they no longer contain any files, they enter the pending state. Volumes remain in the pending state for as long as specified with the REUSEDELAY parameter for the storage pool to which the volume belongs. Delaying reuse of volumes can be helpful under certain conditions for disaster recovery. When files are expired, deleted, or moved from a volume, they are not actually erased from the volumes; only the database references to these files are removed. Thus the file data may still exist on sequential volumes if the volumes are not immediately reused. This prevents a situation where reclamation has run after the last TSM database backup to virtual tape and the reclaimed tapes have been replicated. If the disaster occurs at a point after the reclamation ran and these tapes fully replicated, but this is not reflected in the TSM database, the user could get a mismatch. A disaster may force the user to restore the database using a database backup that is not the most recent backup. In this case, some files may not be recoverable because the server cannot find them on current volumes. However, the files may exist on volumes that are in the pending state. The user may be able to use the volumes in the pending state to recover data by doing the following: 1. Restore the database to a point in time prior to file expiration. 2. Use a primary or copy storage pool volume that has not been rewritten and contains the expired file at the time of database backup. If the user backs up their primary storage pools, set the REUSEDELAY parameter for the primary storage pools to 0 to efficiently reuse primary scratch volumes. For their copy storage pools, the user should delay reuse of volumes for as long as they keep their oldest database backup. Users can also disable reclamation by changing the reclaim parameter of the update stgpool command: update stgpool <stg pool name> reclaim=100
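Both adjustments can be made from the administrative command line. The commands below are a minimal sketch, using the PT_TAPE_POOL pool name from the example earlier in this section; a REUSEDELAY of 3 days matches the summary guidance, and RECLAIM=100 effectively disables reclamation for the pool.

    update stgpool PT_TAPE_POOL reusedelay=3
    update stgpool PT_TAPE_POOL reclaim=100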

237 Consider disabling reclamation altogether if the user has deployed ProtecTIER with small irtual tape cartridge sizes (for example 50GB or less). This is because there is mathematically less to reclaim on a smaller olume than a larger one, and eerything on a smaller olume is likely to expire together, just requiring the tape to go immediately to the scratch pool. Summary Assure catalog/database backups are performed to irtual tape and replicated along with the daily workload each day. A database backup should be performed and replicated at the end of each backup cycle. A separate tape pool should be created for database backups. Consider adjusting TSM reclamation processing to assure actions are in sync with the replicated database. This includes setting the REUSEDELAY to proide a 2 or 3 day delay in the reuse of tapes that hae been reclaimed. Consider deploying additional network bandwidth to assure a back-log does not occur. Database synchronization becomes a greater issue if the back-log exceeds one backup cycle (typically hours). How to determine what is aailable for restore at the DR site This section suggests ways for the users to determine what catalog and data sets are complete (or not), matched, and readily aailable to restore at the secondary/dr site. Which database copy at the remote site is alid Before running a restore for disaster recoery, you must erify that the list of associated cartridges are completely replicated to the remote site, otherwise an earlier full backup image must be used for recoery. The easiest way to determine the time of the last full backup is if you hae a specific time each day where your replication backlog is zero (there is no pending data to replicate). If this is not the case, you can assess the cartridges by recoering the backup application catalog and scanning it to find the last full backup where its associated cartridges completed replication. There are seeral ways to obtain a copy of the catalog at the remote site: From a catalog backup on a irtual cartridge that will be replicated to the remote site From disk-based replication, or by other means If the catalog is backed up to a irtual cartridge, through the cartridge iew of the library in ProtecTIER Manager, query each of the cartridges used for catalog backup to find the most recent sync dates marked on the cartridges. Assuming there are multiple backup copies, you need to find the latest backup that finished replication. To recoer the backup application catalog from a backup on a irtual cartridge, you must work with the replicated cartridges to get an updated copy of the catalog to the remote site: Chapter 18. Deploying replication with specific back up applications 205

238 EMC/Legato NetWorker 1. Each cartridge has a last sync time that displays the last time the cartridge's data was fully replicated to the remote site. (The sync time will be updated during the replication, not only when the replication for this cartridge is finished). 2. The cartridge marked with the most recent last sync time date should be used to recoer the backup application catalog. Determine which cartridges at the remote site are alid for restore Once the DR site TSM serer is recoered, you will need to reiew the status of the replicated cartridges to ensure their replication consistency with TSM DB. To achiee this please use the aailable ProtecTIER system Replicated Cartridges Status Report as explained in the Introduction section of this chapter. Eject and Inject commands from the backup application 1. How to eject cartridge from a library Use command checkout libolume. It can be inoked through WEB GUI or from CLI. For example: TSM:PUMA_SERVER1>CHECKOUT LIBVOL <name of library> REMOVE=BULK FORCE=yes CHECKLABEL=YES VOLLIST=<olume barcode> 2. How to Inject/Import (Insert) cartridge to a library: Use Add cartridge command from WEB GUI or Use checkin libolume command from CLI. For example: TSM:PUMA_SERVER1>CHECKIN libol <name of library> search=bulk checklabel=barcode status=scratch WAITTIME=60 And run command reply with ID process for finishing the import operation: TSM:PUMA_SERVER1>reply 2 Note: Full TSM DB recoery procedure can be found under Chapter 19, "Disaster Recoery Manager", in the IBM Tioli Storage Manager Implementation Guide. Legato NetWorker, from EMC, is a leading enterprise backup and recoery application. ProtecTIER enables an elegant disaster recoery solution within a NetWorker enironment. EMC Legato NetWorker and ProtecTIER Replication in a LAN/WAN Enironment Figure 29 below shows a two-site disaster recoery configuration which also supports high aailability. One of the ProtecTIER single or dual-node clusters is located at a production site and one ProtecTIER single or dual-node cluster is located at a DR site. The ProtecTIER systems are connected through a WAN. Backup workloads are written to the ProtecTIER system at the production site and then are replicated to the ProtecTIER system at the disaster recoery site. 206 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

239 Asynchronous data replication is supported between production and disaster recoery site. Once synchronized, only unique data such as changes to index, metadata and new data from Legato Backup Serer at the production site are replicated to the disaster recoery site. As a result, LAN or WAN bandwidth requirements are significantly reduced. Figure 92. Two site disaster recoery using IP Replication Replicating NetWorker Database (Bootstrap) Backups EMC Legato NetWorker bootstrap contains the media database, the resource database and NetWorker's client file index and is ery critical to the recoery of the NetWorker serer. In most cases the bootstrap backups are usually written or saed to a local-attached physical tape drie connected to the NetWorker serer and best practice from EMC is to hae the bootstrap backups from NetWorker done at least 2-3 times on a daily basis, with one backup send to offsite for disaster recoery and the rest copies kept onsite. With the integration of the ProtecTIER replication, bootstrap backups can be sent to the ProtecTIER system. When bootstrap backup is done to the ProtecTIER system, a second copy is done immediately ia the IP replication link to the disaster recoery site. This practice eases the system administration management of handling physical tapes for bootstrap backups. Legato disaster recoer procedures at DR Site Should the production site become unaailable, user operations can be restarted by using the replica data from the production site and using the following manual procedures. Chapter 18. Deploying replication with specific back up applications 207

During disaster recovery, the NetWorker database and configuration files must be made available at the disaster recovery site. Legato DR procedures used with ProtecTIER during the disaster recovery stages: 1. The Legato standby server at the DR site must have the exact same OS and patch level as the Legato server at the production site. 2. The Legato standby server OS detects the ProtecTIER virtual tape library and tape drives. 3. The original directory location to which the NetWorker server was installed. 4. The NetWorker server installation media; install the same version of the NetWorker server software into its original location. 5. The backup or clone volumes that contain the NetWorker server bootstrap and indexes. See "Determine which cartridges at the remote site are valid for restore." 6. Names of any links to NetWorker directories, such as /nsr to /var/nsr. 7. Reinstall any NetWorker patches that were installed prior to the disaster. 8. Configure the new NetWorker server to see the ProtecTIER virtual tape library and tape drives by using the jbconfig command. 9. From the ProtecTIER Replication Manager, move the cartridges from the virtual shelf to the I/O slots of the selected VTL library. For example: To insert the cartridge with the bootstrap data, use the following command: nsrjb -d <volume name>. To eject a cartridge from the library, use the following command: nsrjb -w <volume name>. Full information about these commands can be found in the EMC Legato NetWorker Commands Reference document. 10. Inventory the selected virtual tape library by using the nsrjb -I command. This helps to determine whether the volumes required to recover the bootstrap are located inside the ProtecTIER virtual tape library. 11. If the user has replicated the bootstrap volumes from the production to the disaster recovery site, use the following command*: nsrjb -lnv -S slot -f device_name where slot = the slot where the first volume is located and device_name = the pathname for the first drive. You can obtain the device_name by using the inquire command. * Please refer to the latest official EMC NetWorker Disaster Recovery documentation or contact your local EMC support for the exact bootstrap recovery steps. 12. Use the scanner -B command to determine the save set ID of the most recent bootstrap on the media. For example, on Linux: scanner -B /dev/nst0 Note: If you do not locate the save set ID of the most recent bootstrap on the most recent media, run the scanner -B command on preceding media to locate the save set ID of the most recent bootstrap. 13. Record both the bootstrap save set ID and the volume label from the output. 14. Use the mmrecov command to recover the NetWorker server's bootstrap (media database and resource database).
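Steps 9 through 14 can be summarized as the following command sequence on the standby NetWorker server. This is a sketch only, with an illustrative slot number, device path and volume name; confirm the exact bootstrap recovery procedure against the EMC NetWorker Disaster Recovery documentation before relying on it.

    #!/bin/sh
    # Deposit the replicated bootstrap cartridge and inventory the virtual library
    nsrjb -d BOOT001                 # volume name is illustrative
    nsrjb -I
    # Load the bootstrap volume, then identify the most recent bootstrap save set
    nsrjb -lnv -S 1 -f /dev/nst0
    scanner -B /dev/nst0             # record the save set ID and volume label
    # Recover the media and resource databases, then restart the NetWorker services
    mmrecov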

241 15. Restart the NetWorker serices on Legato serer. 16. Restore the Indexes of the clients (only if needed). 17. Perform test backup and recoery with the standby ProtecTIER irtual tape library. 18. Begin normal operations at DR site. Note: Before proceeding with DR procedures, please refer to the latest official EMC NetWorker Disaster Recoery documentation or contact your local EMC support. Determining which cartridges at the DR site are alid for restore Once the DR site Legato serer is recoered with the required catalog, the user will need to reiew the status of the replicated cartridges to ensure their replication consistency with the Legato backup application catalog. To achiee this please use the aailable ProtecTIER system Replicated Cartridges Status Report as explained in the Determining which cartridges at the DR site to restore on page 187 CommVault This section introduces the best practice for managing operations on CommVault's Backup application in order to replicate cartridges from primary/spoke site to DR/hub site. Also, it discusses reading the replicated data by restore operation on the DR site. Prerequisites Both primary/spoke and DR/hub PTs are connected and well identified by the CommVault backup application host. Replication actiities are running and finishing successfully. PT Manager Repositories View Replication's Actiities actie jobs Visibility is defined on all replication's policies. For instance, the replication destination is the DR library. PT Manager Repositories View Replication's Policies Policy's name Target: Chapter 18. Deploying replication with specific back up applications 209

242 local site ProtecTIER serer1 Replication wan DR site ProtecTIER serer2 Commault serer Figure 93. Typical Scenario References CV DR strategy and settings: CommVault_BestPracticesforDR.pdf CV recoer, restore and retriee using Data Agents: documentation.commault.com/commault/release_8_0_0/books_online_1/ english_us/getting_started/getting_started.htm CV Command Line Interface configuration and command usage: books_online_1/english_us/features/cli/cli.htm Instructions Eject media from a library using the UI 1. Choose on CV UI: [Storage Resources] [Libraries] [Local's site library] [Media By Groups] [Assigned] 2. Select the required cartridges and choose [Export]. 3. Define the [New outside Storage Location] and choose "OK". Track the change: [CV UI] - The selected Media moed to the New Outside Storage Location. 210 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

243 Figure 94. (CommVault UI) Selected Media moed to New Outside Storage Location [PT GUI] -The selected cartridges moed from the local library into the shelf (subject to the isibility). Figure 95. (ProtecTIER GUI) Selected Cartridges moed from local library into the shelf (subject to isibility) PT GUI] On the DR site library cartridges are in the I/E slots. Chapter 18. Deploying replication with specific back up applications 211

244 Figure 96. (ProtecTIER GUI) On the DR site library cartridges in the I/E slots Import (Insert) a media into a library using the UI: 1. Choose on CV UI : [Storage Resources] [Libraries] [Remote's site library] choose [Import Media] [Continue] Track the changes: [CV UI] - The Media will moe into the DR site Library and Appears on : [Storage Resources] [Libraries] [Remote's site library] [Media by groups] [Assigned] 212 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

245 Figure 97. Cartridges moed into DR site Library, CV GUI [PT GUI] the cartridges moe into DR site Library Chapter 18. Deploying replication with specific back up applications 213

Figure 98. Cartridges moved into DR site Library, PT GUI
Eject (export) and import cartridges using CommVault's CLI: 1. From a Windows environment, enter: <CV installation path>\Simpana\Base. Log in to the CommServe with qlogin.exe. Note: The user and password are the same as the login to the CommVault Console. Run the EXPORT command: qmedia.exe export -b <barcode(s)> -el <exportlocation> Example: qmedia.exe export -b XZXZX1L3 -el Comm_Shelf [CV GUI] - The selected cartridge is exported successfully.

[PT GUI] - The selected cartridge is moved from the local library to the shelf.
Figure 99. Selected cartridge exported successfully
Run the IMPORT command: qlibrary.exe import -l <library> Example: qlibrary.exe import -l Lib_Remote [CV GUI] - The selected cartridge is located in the "Assigned Media" of the DR site library. [PT GUI] - The selected cartridge is located inside the DR site library.
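For scheduled use, the whole export/import exchange can be driven from one command session on the CommServe. This is a sketch built only from the commands shown above; the barcode, library and outside-location names are illustrative.

    cd <CV installation path>\Simpana\Base
    qlogin.exe
    rem eject the replicated cartridge from the local library to the outside location
    qmedia.exe export -b XZXZX1L3 -el Comm_Shelf
    rem import the media now waiting in the I/E slots of the DR site library
    qlibrary.exe import -l Lib_Remote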

248 Figure 37 - selected cartridge located inside the DR site Library Figure 100. Selected cartridge located inside the DR site Library HP Data Protector best practice with ProtecTIER/TS7650x This section describes settings and parameters that should be modified in HP Data Protector Enironments to enable the maximum performance and optimum factoring for ProtecTIER. ProtecTIER is a irtual tape product that presents itself to backup serers as a standard Tape Library with a robot, cartridges and tape dries. As data is written to this irtual library; it is examined looking for identical blocks of information which has been already added to the repository. This identical data is not written again but it is referenced as duplicate hence significantly reducing the amount of disk space required. This process terminology is known as Hyper Factoring or deduplication. ProtecTIER and deduplication (Hyperfactor engine) For the deduplication process to work effectiely the data being sent to the irtual tape library must not be manipulated, in other words it must not be modified as it passes from the disk drie on the client to the irtual tape library. Any change made to it will affect the Hyperfactor engine's ability to recognize subsequent ersions of the same data. Certain features in some backup enironments alter the data in ways that impact the deduplication. These features are: 216 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

249 Compression When a data stream is compressed, a small change in the stream will make the resulting output data stream completely different and unable to be recognized as similar or identical data. Multiplexing Multiplexing is a common method of improing performance to a limited number of physical tape dries. Data from seeral clients is combined together as it passes through the backup application for writing to tape. When the same clients are backed up a second time een if there is no change to the data itself the combined or multiplexed output will not be the same as the first backup, therefore reducing the possibility for deduplication. Encryption Encrypted data presents the same problem as compressed data for deduplication purposes. Since the data is modified from ersion to ersion, the deduplication is affected. For additional information and better understanding the concepts of data deduplication, natie replication and reducing storage hardware requirements see IBM System Storage TS7650 and TS7650G with ProtecTIER located at: HP DataProtector best practice with ProtecTIER The follow recommendations might be implemented on HP DP backup application side as Best Practices. The implementation of the recommendation listed in this documentation can improe performance, factoring (deduplication) and cause cost-effectie usage of the storage. 1. Robotic Barcode Reader / SCSI resere/release - Enable robotic Barcode Reader Support and use barcode as medium label Enable SCSI resere / release robotic control to resere the robotic control only for Data Protector operations. Chapter 18. Deploying replication with specific back up applications 217

250 Figure 101. Robotic Barcode Reader 2. Increase the block size written to the tape deice - It's recommended to increase the block size of the tape deice from default alue 64kb to minimum of 256 kb and up to 1MB as optimal. ProtecTIER support all alues up to 1 MB block size. Increasing the block size will reducing the amount of oerhead of headers and might increase factoring and performance. 218 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

251 Figure 102. Increase block size The increase of block size on a Windows Media Agent client, should be modified in windows registry. Please refer to HP DP User Guide for the releant information ( search for "MaximumSGList" ). In order to increase the block size aboe 256kb for HPUX, AIX, Linux, Solaris on the HP DP add the following ariable to the file /opt/omni/.omnirc OB2LIMITBLKSIZE=0. To allow block size of 1024kb, the parameter of st_large_recs needs to be enabled (set to a non-zero alue). 3. Enable Lock Name - To preent a collision when Data Protector may try to use the same physical deice in seeral backup sessions at the same time, enable use LOCK NAME in the deice configurations. Chapter 18. Deploying replication with specific back up applications 219

252 Figure 103. Enable Lock Name 4. Disable Compression, Encryption and CRC chksum - Disable the use of multiplexing, compression & encryption for tape deices. Disable the option to perform CRC erification. 220 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

253 Figure 104. Disable compression, encryption and CRC 5. MultiPath deices configuration: robotic and deices - includes the ability to use Multipath deice configuration for robotic and dries. This option can be used with CPF/DPF option on ProtecTIER. Chapter 18. Deploying replication with specific back up applications 221

254 Figure 105. Multipath Deice Figure 106. Multipath Deice B 222 IBM System Storage TS7600: Best Practices Guide for ProtecTIER V. 2.5

255 6. Load Balancing - The Load Balancing feature will use the optimum resources usage required for the tape deices workload. Figure 107. Load Balancing 7. Use of Mirror - The use of Data Protector Object mirror functionality enables writing the same data to seeral media simultaneously during a backup session. On some cases this feature can replace the aulting or migrating the data between libraries and can decrease the usage of resources and increase performance during backup jobs. Figure 108. Use of Mirror Chapter 18. Deploying replication with specific back up applications 223

8. Use media sets/pools to manage cartridges - Try keeping different data types and sources separated into different media pools/sets. This may help in global planning and effective analysis and tuning per data type used.
Figure 109. Use media sets
9. Troubleshooting logs - The following information might assist in preliminary troubleshooting analysis: a. CLI output commands from the Data Protector host server: # devbra -dev and # sanconf -list_drivers b. Data Protector log files are located in the following directories: For Windows systems: <Data_Protector_home>\log; For HP-UX: /var/opt/omni/log and /var/opt/omni/server/log; For other UNIX systems: /usr/omni/log
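When raising a support case, it can help to capture these outputs in a single pass. A minimal sketch for a UNIX Data Protector host, using only the commands and log paths listed above; the collection directory name is illustrative.

    #!/bin/sh
    OUT=/tmp/dp_support.$(date +%Y%m%d)
    mkdir -p "$OUT"
    devbra -dev            > "$OUT/devbra_dev.out" 2>&1
    sanconf -list_drivers  > "$OUT/sanconf_drivers.out" 2>&1
    cp -r /var/opt/omni/log /var/opt/omni/server/log "$OUT/" 2>/dev/null
    tar -cf "$OUT.tar" "$OUT"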

Part 5. Data Types
Certain features in some backup environments alter the data in ways that impact deduplication. These features are:
Compression - When a data stream is compressed, a small change in the stream makes the resulting output data stream completely different and unable to be recognized as similar or identical data.
Multiplexing - A common method of improving performance with a limited number of physical tape drives. Data from several clients is combined as it passes through the backup application for writing to tape. When the same clients are backed up a second time - even if there is no change to the data itself - the combined or multiplexed output will not be the same as the first backup. This reduces the potential for deduplication.
Encryption - Encrypted data presents the same problem as compressed data for deduplication purposes. Since the data is modified from version to version, deduplication is affected.


Chapter 19. RMAN Oracle Best Practice Tuning with ProtecTIER
This chapter describes settings and parameters that should be modified in RMAN environments to enable maximum performance and optimum factoring for ProtecTIER.
Deduplication background
ProtecTIER is a virtual tape product that presents itself to backup servers as a standard tape library with a robot, cartridges and tape drives. As data is written to this virtual library, the data is examined for identical blocks of information that have already been added to the repository. This identical data is not stored again but is referenced as duplicate, significantly reducing the amount of disk space required. This process is known as HyperFactor or deduplication.
ProtecTIER and deduplication best practice
For the deduplication process to work effectively, the data being sent to the virtual tape library must not be manipulated; in other words, it must not be modified as it passes from the disk drive on the client to the virtual tape library. Any change made to this data will reduce or eliminate the HyperFactor engine's ability to recognize subsequent versions of it.
RMAN settings tuning
RMAN has parameters that make use of multiplexing. IBM has determined that the following RMAN settings eliminate multiplexing and provide the best opportunity for factoring to occur while providing the best performance. MAX_OPEN_FILES and FILESPERSET control the multiplexing level:
MAX_OPEN_FILES=1 - Controls the number of data files read from the database concurrently through a single backup channel; it needs to be set to 1 to prevent multiplexing.
FILESPERSET=4 - Defines the number of data files to be included in a backup set.
PARALLELISM=32 (up to 64) - Controls the number of tape drives used for backup. Use more parallel backup streams to achieve higher performance, and make sure the number of virtual tape drives available within the ProtecTIER library matches the number of parallel streams configured within RMAN.
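In RMAN syntax these settings correspond to MAXOPENFILES on the channel, FILESPERSET on the BACKUP command, and PARALLELISM on the device type. The following is a minimal sketch only, with the SBT media-manager PARMS omitted and a stream count of 32 as an example; adjust the parallelism to the number of virtual drives presented by ProtecTIER.

    # one stream per virtual tape drive, no multiplexing within a stream
    CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 32 BACKUP TYPE TO BACKUPSET;
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' MAXOPENFILES 1;
    RUN {
      BACKUP DATABASE FILESPERSET 4 TAG 'PT_FULL';
    }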


Chapter 20. Lotus Domino Tuning with ProtecTIER
Domino Lotus
This chapter describes settings and parameters that should be modified in Lotus Domino environments to enable optimal factoring for ProtecTIER.
Factoring and deduplication background: ProtecTIER is a virtual tape product that presents itself to backup servers as a standard tape library with a robot (media changer), cartridges (volumes) and tape drives. As data is written to this virtual library, the data is examined by the HyperFactor algorithm. New data that is found to be identical to old data is not stored again, thereby significantly reducing the amount of disk space required. This process is known as HyperFactor or deduplication.
ProtecTIER and deduplication best practices: For this process to work effectively, the data being sent to the virtual tape library must be original data. In other words, it must not be modified as it passes from the disk drive on the client to the virtual tape library. Any change made to this data will reduce or eliminate HyperFactor's ability to recognize subsequent versions of it.
Lotus Notes is a common server application. Customers usually run backup policies from the Domino server, which stores all the data on tape, or in our case, virtual tape. Running the legacy backup commands with the generally recommended method will cause low factoring of the data even if the rate of change is low; in other words, it reduces the benefit of the ProtecTIER solution and its disk space utilization. The root cause is the inherent compaction of the database, which reshuffles NSF files inside the database. Domino compaction optimizes disk space and allows better utilization of the database space. As a result, the impact of the Domino delete operation can significantly degrade deduplication efficiency.
The solution is based on a feature of the Domino environment called DAOS (Domino Attachment and Object Service), available from Domino 8.5 and above. The concept is to divide the database into .nsf files (database items) and .nlo files (attachment items). The .nlo files hold a single instance of each attachment, which the .nsf files re-use. The enhancement implemented in Domino 8.5 therefore reduces the delete impact, since the DAOS layout does not hold each attachment multiple times; the compaction effect is mitigated and the .nlo change is marginal.
Considerations for applying the DAOS solution for integrating Domino and ProtecTIER
Users must upgrade Domino servers/clients to version 8.5. Use the new flag in the backup command: -daos ON. For example: ncompact.exe <Full path of the NSF DIR> <compact type> -n -v -i -ZU -daos ON.

It is recommended to run the DAOSest utility to assess the benefit of using DAOS. Note: Before implementing DAOS, please consult your Domino representative. Changing the Domino Server Database This task should only be performed once. Procedure 1. Ensure the Domino server is running: Windows: Choose Start > Programs > Lotus Applications > Lotus Domino Server. Unix: /opt/lotus/bin/server 2. Open nlnotes.exe. 3. Go to the Workspace window. 4. Open the file names.nsf by browsing. The location is usually E:\Lotus\Domino\Data. 5. Choose the following: Configuration > Servers > All Server Documents. Then open the document related to your Domino server. 6. Double-click the page to change it to edit mode. Then go to the Transactional Logging tab. 7. Make sure the following parameters are set: Log path: logdir; Logging style: Circular; Maximum log space: 512 MB. 8. Save and close. 9. Shut down the Domino server: exit <password>. 10. While it is down, add the following line to the notes.ini file: CREATE_R85_DATABASES=1 11. Start the Domino server again using the password. It may take several minutes for the startup sequence, this time only. 12. Return to steps 5 through 7 to edit the server document again. Open the DAOS tab. You may need to scroll to the right to see the tab. 13. Update the parameters: Store Attachments in DAOS: Enabled; Minimum size: 4096; DAOS base path: daos; Defer deletion: 30 days. 14. Restart the Domino server: restart server [password]. Results In the next compaction, you will see the DAOS directory created in the Domino data directory. Within that, you will see: 0001/<really_long_name>.nlo Recommendations for the backup command This section offers best practice recommendations for the backup command. Use the flag -daos ON while running the NSF backup command. Instead of running a single backup command, run two commands: one for the NSF backup and one for the NLO backup.

1. The Lotus Domino server must be running. 2. Open a command line window on the Domino server machine: Start > Run > cmd 3. cd c:\lotus\domino 4. Type: ncompact.exe <Full path of the NSF Dir> <compact type> -n -v -i -ZU -daos ON Note: The flag -i is not supported in compaction B.
Figure 110. Lotus Domino Server command line
Backing up NSF files To back up the NSF files, choose one of the following options: From the previously installed TDP (Tivoli Data Protection): 1. Through the TDP GUI (graphical user interface), choose the location of your NSF files. 2. Select the files you would like to back up and click the Backup button. From the command line: 1. cd c:\Program Files\Tivoli\TSM\domino 2. Start cmd /c domdsmc selective <NSF file name> /subdir=yes /logfile=domsscham.log >> domselan.log Note: The command line searches for the defined NSF file under E:\Lotus\Domino\data\mail\ Backing up NLO files (from the DAOS directory) To back up the NLO files, use the TSM backup-archive client command dsmc: 1. cd C:\Program Files\Tivoli\TSM\baclient 2. dsmc incremental E:\Lotus\Domino\data\DAOS\0001\ password&
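The NSF and NLO steps above can be combined into a single scheduled command file. This is a sketch only, assuming the default install paths shown above; the mail directory, wildcard and log file name are illustrative.

    rem Back up the NSF databases through Data Protection for Domino
    cd /d "C:\Program Files\Tivoli\TSM\domino"
    domdsmc selective E:\Lotus\Domino\data\mail\* /subdir=yes /logfile=domsel.log

    rem Back up the NLO attachment objects from the DAOS directory
    cd /d "C:\Program Files\Tivoli\TSM\baclient"
    dsmc incremental E:\Lotus\Domino\data\DAOS\ -subdir=yes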


Chapter 21. DB2 Best Practices tuning with ProtecTIER

This chapter describes settings and parameters that should be modified in DB2 environments to enable maximum performance and optimum factoring for ProtecTIER.

ProtecTIER is a virtual tape product that presents itself to backup servers as a standard tape library with a robot, cartridges and tape drives. As data is written to this virtual library, it is examined for identical blocks of information that have previously been added to the repository. This identical data is not written again but is referenced as duplicate, hence significantly reducing the amount of disk space required. This process is known as HyperFactoring, or deduplication.

For the deduplication process to work effectively, the data being sent to the virtual tape library must not be manipulated. In other words, it must not be modified as it passes from the disk drive on the client to the virtual tape library. Any change made to it will affect the HyperFactor engine's ability to recognize subsequent versions of the same data.

DB2 settings and tuning

The most recommended method for improving deduplication is to upgrade the DB2 database in order to enable a special embedded feature that makes the backup data "deduplication friendly". For V9.7, install FixPack 3. For V9.5, consult DB2 support for a special patch for the specific build you are running that includes the DEDUP_DEVICE setting.

Example of a command using DEDUP_DEVICE: db2 backup db mydb use tsm open 10 sessions dedup_device

When using the DEDUP_DEVICE setting, each tablespace is backed up to a different tape drive. Therefore, if the database contains tablespaces significantly larger than the others (above 30% of the entire database size), this might prolong the completion of the backup. If this affects the backup window, consult DB2 support for assistance in splitting the larger tablespaces into smaller ones.

If you are unable to implement the aforementioned tasks, IBM has determined the following alternative DB2 settings to provide the best opportunity for deduplication to occur while providing the best performance:

Table 20. Recommended DB2 settings
- buffer: This value requires 1.2 GB of memory. If this is too elevated, use 4097 instead.
- parallelism: minimum * - Change the value accordingly to allow reading the data at the required backup rate.

Table 20. Recommended DB2 settings (continued)
- sessions: minimum * - Change the value accordingly to allow reading the data at the required backup rate.
- buffers: parallelism + sessions + 2

* The "minimum" value should be selected to allow an acceptable backup window time. A value of 1 is the best for deduplication, but it might increase backup times in large multi-tablespace databases.

Run with these parameters all the time. Do not modify them unless necessary.

Example of a command: db2 backup db <databasename> use tsm open 8 sessions with 18 buffers buffer <buffer size> parallelism 8

DB2 settings and tuning - TSM

TSM provides additional features to enhance deduplication when backing up DB2 databases. Send the DB2 data to a TSM disk pool and then back up each TSM disk pool image using a single session (to prevent multiplexing). On DB2, perform the same backup but redirect it to one or more files (the TSM disk pool images).

Example of a command: db2 backup db <databasename> ... to file_path1, file_path2, ... file_path8 with 18 buffers buffer <buffer size> parallelism 8

Table 21. TSM recommended DB2 settings
- file_path: Path - Path to the TSM disk pool location.
- buffers: n files + n parallelism + 2
- buffer: As described in Table 20.
- parallelism: n - the number of files used for the backup.
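As a rough illustration of the disk pool approach above, the following sketch first backs up a DB2 database to several file targets and then backs those image files up to TSM in a single pass. The database name, paths and sizes are placeholder assumptions, not values from this guide; the buffers count follows the Table 21 formula (4 files + parallelism 4 + 2 = 10).

# Back up the DB2 database to multiple file targets, one per parallel stream,
# so each stream reaches ProtecTIER without multiplexing (values are examples)
db2 backup db MYDB to /tsm_pool/path1, /tsm_pool/path2, /tsm_pool/path3, /tsm_pool/path4 with 10 buffers buffer 4097 parallelism 4

# Then back up the resulting image files to TSM with the backup-archive client,
# keeping the data stream unchanged on its way to the ProtecTIER repository
dsmc incremental /tsm_pool/path1/ /tsm_pool/path2/ /tsm_pool/path3/ /tsm_pool/path4/ -subdir=yes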

Chapter 22. Recommendations for specific source data types

Another factor that affects performance in a ProtecTIER environment is the data that is being targeted for backup. Some data, like databases and MS-Exchange data, is highly compressible and also factors quite well. Other data, like video or seismic data, cannot be compressed or factored very well. The following recommendations are for the data type or application that sends backup data to your backup server.

Oracle database data

When using database backup tools such as Oracle Recovery Manager (RMAN), make sure that the multiplexing-like option is disabled. For example, prior to ProtecTIER being installed, RMAN by default sends a total of nine files requiring backup to three channels. This equates to three file streams, each containing three files (3 x 3 = 9), to three physical tape drives. After ProtecTIER is implemented, RMAN needs to be altered to send nine file streams, each containing one file (9 x 1 = 9), to nine virtual tape drives in the virtual tape library. This is achieved by changing your RMAN backup commands to send only one file per stream/backup set using this option: for each backup command, set the value FILESPERSET=1.

SQL Litespeed data

SQL Litespeed backups may not factor very well. This is because Litespeed compresses and encrypts the SQL database before backing it up. This is for your information only. There are no actions that can be taken to improve this process.
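To make the RMAN guidance above concrete, here is a minimal sketch of a backup run block with one file per backup set. The channel count, device type and channel names are assumptions for illustration only; only the FILESPERSET 1 setting comes from this guide.

RUN {
  # One channel per virtual tape drive; add channels to match your drive count
  ALLOCATE CHANNEL ch1 DEVICE TYPE sbt;
  ALLOCATE CHANNEL ch2 DEVICE TYPE sbt;
  ALLOCATE CHANNEL ch3 DEVICE TYPE sbt;
  # FILESPERSET 1 sends one datafile per backup set, so each stream reaches
  # ProtecTIER unmodified and HyperFactor can match subsequent versions
  BACKUP DATABASE FILESPERSET 1;
  RELEASE CHANNEL ch1;
  RELEASE CHANNEL ch2;
  RELEASE CHANNEL ch3;
}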


Part 6. General Best Practices


Chapter 23. IBM i BRMS and ProtecTIER Replication

The utilization of ProtecTIER in IBM i environments opens new possibilities for System i users to implement and deploy their backup solutions. With ProtecTIER, tape devices and media are virtual and reside on the disk space controlled by the system; the space needed for backups is minimized by ProtecTIER's deduplication. ProtecTIER provides flexibility and easy management to users with complex backup environments, since they can define as many tape devices and cartridges as suits them, and they can share the ProtecTIER system among System i and open servers. IT centers that want to keep copies of cartridges at a DR site can employ the ProtecTIER replication function, which is discussed in detail in this section.

System i specific features

This section describes some of the System i specific features that are important to know in order to understand the way IBM i backups are performed.

Integrated database

DB2 for IBM i is the relational database manager that is fully integrated into the System i product. Because it is integrated with the system, DB2 for IBM i is easy to use and manage. DB2 is used by a variety of applications, from traditional host-based applications to client/server solutions to business intelligence applications.

Object-based architecture

One of the differences between the IBM i operating system and other operating systems is the concept of objects. Anything that one can change in the operating system is a type of object. For example, database files, programs, libraries, queues, user profiles, and device descriptions are all types of objects. By treating everything as an object, the operating system can provide all of these items with an interface that defines what actions users can perform, and how the operating system needs to treat the encapsulated data. Additionally, this interface allows for standardized commands across different system elements; the commands for working with user profiles and data files are similar.

Libraries

On the IBM i operating system, objects are grouped in special objects called libraries. Libraries are essentially containers, or organizational structures, for other objects. Libraries can contain many objects, and can be associated with a specific user profile or application. The only library that can contain other libraries is called QSYS. It contains all other libraries on the system.

Integrated file system

The Integrated File System (IFS) is a part of IBM i that supports stream input/output and storage management similar to personal computer and UNIX operating systems, while providing an integrating structure over all information stored on your system. The IFS enhances the already extensive data management

capabilities of IBM i with additional capabilities to better support client/server, open systems, multimedia, and similar environments.

Backups in an IBM i environment

A backup and recovery strategy for an IBM i data center is based on the following decisions:
- Determine what to save and how often to save it.
- Determine your save window. This is the amount of time:
  - Objects being saved can be unavailable for use.
  - The entire system can be unavailable for IBM i save system functions.
- Consider recovery time and choose availability options.
- Test the developed backup and recovery strategy.

After determining what to save and how to save it, decide on an approach similar to the following:
- Daily, save the libraries and objects which regularly change, such as application libraries, user profiles, configuration objects, parts of the IFS and so on.
- Save the entire system every week. Alternatively, save all user libraries (*ALLUSR) every week.

Typically, the objects that regularly change have to be restored more often and in a shorter period of time, compared to objects that do not change frequently. IBM i supports parallel and concurrent saves, which enable saving to multiple tapes in parallel and thus reduce the save window.

Parallel saves and restores: The ability to save or restore a single object or library/directory across multiple backup devices from the same job.

Concurrent saves and restores: The ability to save or restore different objects from a single library/directory to multiple backup devices, or different libraries/directories to multiple backup devices, at the same time from different jobs.

How ProtecTIER replication works

At the setup of ProtecTIER replication, the user first determines the time frame for replication: it can be anytime with preference to backups, either in its own scheduled time window or concurrently with the backup operation. All replication then adheres to this global time frame setting. Next, set a replication policy: the policy is named and assigned a priority (low, medium, high) with a destination and a barcode range. For the destination there are two choices, hence two behaviors: a Shelf-type policy and a Library-type policy. The cartridges replicate to the DR/destination system's shelf until they are ejected from the source VTL. With the Shelf-type policy, an eject at the source library moves the cartridge to the source shelf. The replica cartridge at the DR site does not move. So, after this action, the cartridge is visible on both local shelves (no library). With the Library-type policy, an eject at the source library moves the cartridge to the source shelf and also automatically moves the replica cartridge from the

DR site shelf to the defined destination library's I/O station. After this action, the cartridge is on the local system shelf (hidden from the library/BRMS host), but visible at the DR site library. The automatic visibility switch occurs after the cartridge has been completely replicated to the DR site shelf. If you eject a cartridge from the source library before replication is complete, the replica stays on the DR site shelf until replication is complete, and then moves to the DR site library I/O station. If, for any reason (disaster, failure of the local ProtecTIER), you cannot eject the local cartridge after successful replication, you can manually move the DR site replica from the shelf to the DR site VTL.

Backup Recovery and Media Services (BRMS)

You can plan, control and automate the backup, recovery, and media management services for your IBM i systems with Backup Recovery and Media Services (BRMS).

BRMS concepts and terms

We briefly describe some of the common BRMS terms:
- Media: A tape cartridge (volume) that will hold the saved data.
- Media identifier: A name given to a physical piece of media.
- Media class: A logical grouping of media with similar physical or logical characteristics, or both (for example, density).
- Control group: A group of items (for example, libraries or stream files) to back up, as well as the attributes associated with how to back them up.
- Policies: A set of defaults that are commonly used (for example, device or media class). Generally used defaults are in the BRMS system policy. Backup-related defaults are in the BRMS backup policy.

The following picture shows the concepts of BRMS:

The IBM i libraries QBRM and QUSRBRM contain BRMS-related objects and management information.

Figure 111. Backup Recovery and Media Services (the diagram shows how policies, control groups and the job scheduler drive the BRMS backup, producing archived and backup copies that are tracked, and eventually deleted, through the media inventory)

Examples of BRMS with ProtecTIER

Below we describe some examples of using ProtecTIER for System i backups.

1. Saving a System i library to ProtecTIER

In this example we save a System i library to ProtecTIER. The library can contain application database files, programs, logical files, and so on. Once the virtual tape library (VTL) is created on ProtecTIER and assigned to the ports connected to System i, System i automatically recognizes the VTL (robot, tape drives and cartridges), provided that auto-configuration of devices is enabled in IBM i. To bind what IBM i knows about the new tape library with what BRMS also needs to know, we run the Initialize BRMS command INZBRM *DEVICE, which not only picks up new devices, but also removes those that are no longer reported. We adjust the BRMS location corresponding to the VTL, enroll the virtual media in BRMS, and update the system and backup policies and the control group for the System i library we want to save. The next step is to initialize the virtual media, and we are ready to save to the VTL. Backup is initiated by the BRMS command Start Backup for a particular control group.
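The workflow just described can be summarized in a short CL sketch. The tape library device, media class and control group names below (TAPMLB01, PTVTL, PROTECTIER) are assumptions for illustration only and must be replaced with the names used in your environment.

/* Pick up the newly presented VTL devices and remove devices that are  */
/* no longer reported                                                   */
INZBRM OPTION(*DEVICE)

/* Enroll the virtual cartridges that the VTL reports in INSERT status  */
/* into BRMS (device and media class names are examples)                */
ADDMLMBRM MLB(TAPMLB01) VOL(*INSERT)

/* Start the backup for the control group defined for the library       */
STRBKUBRM CTLGRP(PROTECTIER) SBMJOB(*YES)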

275 Following pictures show and example of Control group for a System i library, and starting the backup of that library ia the Control group:? Create Backup Control Group Entries I5PFE Group...:PROTECTIER Defaultactiity...*BKUPCY Text...*NONE Type information, press Enter. Weekly Retain Sae SWA Backup List ASP Actiity Object While Message Sync Seq Items Type Deice MTWTFSS Detail Actie Queue ID 10 QDEXDATA01 *SYSBAS FFFFFFF *NONE F3=Exit F5=Refresh F10=Change item F11=Display exits F12=Cancel F14=Display client omit status F24=More keys Bottom Figure 112. Create Backup Control Group Entries Start Backup using BRM (STRBKUBRM) Type choices, press Enter. Control group > PROTECTIER *BKUGRP, *SYSGRP, *SYSTEM... Scheduletime... *IMMED hhmm,*immed Submit to batch *YES *YES, *CONSOLE, *CTLSBS, *NO Starting sequence: Number... *FIRST ,*FIRST Library... *FIRST Name,*FIRST Append to media *CTLGRPATR *CTLGRPATR, *BKUPCY, *NO... Jobdescription... *USRPRF Name,*USRPRF Library... Name,*LIBL,*CURLIB Jobqueue... *JOBD Name,*JOBD Library... Name,*LIBL, *CURLIB Actiity... *CTLGRPATR *CTLGRPATR,*FULL,*INCR Retention: Retention type *CTLGRPATR *CTLGRPATR, *DAYS, *PERM Retain media > F3=Exit F4=Prompt F5=Refresh F12=Cancel F13=How to use this display F24=More keys More... Figure 113. Start Backup using BRM 1. Restoring an object from ProtecTIER Chapter 23. IBM i BRMS and ProtecTIER Replication 243

On many occasions it is not necessary to restore the entire saved system, nor even a whole System i library, but only an object. In this example, however, we restore the System i library. Restoring an object requires very similar steps and uses the same BRMS menus. We restore the System i library by using the BRMS Recovery menu, where we specify which library we want to restore. BRMS then looks for the cartridges that contain the saved library and lists them. We can now decide from which cartridge to restore the library. The decision is based on the date and time of the save, the type of save, and so on.

Figure 114. Select Recovery Items (BRMS lists the saved item QDEXDATA01, saved 6/03/09 18:38:21 as a *FULL save, together with its volume serial, file sequence, and expiration date)

As seen in Figure 114, BRMS presents a list of cartridges from which the required object can be restored.

Note: In this example there is only one cartridge containing the requested library QDEXDATA01, but with the usual backup policy in a data center we expect such a list to contain many cartridges. When restoring from incremental backups, BRMS lists all the tapes needed for the restore: the cartridge with the full backup as well as the cartridges containing the incremental saves.

Save to multiple virtual cartridges in parallel

With parallel saves to ProtecTIER, the performance of the save profits from better utilization of the attached storage system and of the links to the host server. Thus the save speed increases significantly and the save window is accordingly reduced. Therefore we definitely recommend using BRMS parallel saves to multiple virtual tape drives. We achieve a parallel save by specifying the number of parallel device resources in the BRMS control group, as is shown in the example below:

Figure 115. Change Backup Control Group Attributes (for control group PROTECTIER, the minimum and maximum parallel device resources are set instead of *NONE and *AVAIL)

Setting up BRMS for disaster recovery

When planning a disaster recovery solution that restores BRMS saves at a DR site, we recommend taking the following points into consideration:
- The Recovery Time Objective (RTO): the duration of time and the service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. Typically, the RTO with tape solutions is somewhat longer than the RTO with a software mirroring or replication solution.
- The Recovery Point Objective (RPO): the point in time to which you must recover data, as defined by the organization. Typically, the RPO with tape solutions is somewhat longer than the RPO with a software mirroring or replication solution.
- Order of saving IBM i libraries: when saving *ALLUSR, consider saving the libraries QGPL, QUSRSYS, and QSYS2 ahead of other libraries. It may be a good idea to save QUSRBRM after the other libraries so that it reflects the latest BRMS status.
- Restore of the entire system should be started from an alternate installation device (tape or optical media). After the IPL is initiated from the alternate installation device, continue with restoring data from the virtual cartridges in ProtecTIER.
- To achieve the desired performance, consider parallel or concurrent saves to multiple virtual tape drives. See the description in the section "Backups in an IBM i environment".
- Type of incremental saves:
  - Cumulative: when cumulative backup is specified, all objects changed since the last full backup are included in the saved objects.
  - Incremental: all objects changed since the last incremental backup are included in the save.

- Append to media: BRMS normally chooses an expired volume for output operations, unless you specify APPEND(*YES) on the backup policy or control group attributes. In this case BRMS tries to select an active volume to which it appends the new data. When selecting an active volume for APPEND(*YES), BRMS tries to choose a volume with the same media class, same expiration date, same system name, and same move policy. If no volume is available that matches these criteria, BRMS looks for a volume with the earliest expiration date. This selection method ensures that the oldest volumes are chosen for appending.
- Compression and compaction: we do not recommend using compression or compaction when saving to virtual cartridges on ProtecTIER.

Deploying ProtecTIER with BRMS for disaster recovery

Here we describe two ways to deploy ProtecTIER replication for a disaster recovery solution with IBM i.

Single domain - synchronized BRMS information between production and DR site

At the local site there are a production System i server and a ProtecTIER connected to it. At the DR site there are a DR IBM i system and the DR ProtecTIER connected to the DR system. Replication with a Library-type policy is set up between the primary and DR ProtecTIER. BRMS is installed on both the production and the DR server; both BRMS instances are in the BRMS network, so the information about media, devices and so on is synchronized between them.

In the production BRMS, a control group is set up for saving to the virtual cartridges that are being replicated. Depending on the backup policy, the control group can be set up for full saves or incremental saves. While data is being written and appended to the replicated cartridges, the replicas reside on the shelf of the DR ProtecTIER. Once the local virtual cartridge is ejected by the BRMS command RMVTAPCTG, the DR site replica is moved from the DR site shelf to the I/O station of the DR site VTL; at the same time the local cartridge is moved to the local shelf. At the same time we change the BRMS location of the cartridge with the command MOVMEDBRM (see the CL sketch following the disaster recovery steps below). As soon as the replicated cartridge is moved to the VTL on the DR site ProtecTIER, it is seen by both BRMS instances in the BRMS location to which it was moved.

In case of a disaster at the local site, perform the following steps:

Note: It is assumed that the backup strategy described in the section "Backups in an IBM i environment" is used.
1. From the ProtecTIER .csv report file, determine which is the latest set of full and incremental backups.
2. Observe the ProtecTIER .csv report of replicated cartridges to determine whether the cartridges you want to restore have consistent data. If they do not contain consistent data, you may want to restore from an earlier set of backups, as is described in the section "Determine which replicated cartridges are valid for restore".
3. Restore the IBM i system from the consistent replicas of the cartridges in the DR ProtecTIER.
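The eject and BRMS location change described above for this scenario might be scripted as in the following CL sketch. The tape library device, cartridge ID and location name are illustrative assumptions (they mirror the TS7650SRC1/TS7650TGT1 naming used later in this chapter), and the MOVMEDBRM parameters shown should be verified against your BRMS move policy before use.

/* Eject the replicated cartridge from the source VTL. ProtecTIER then     */
/* moves the DR replica from the DR shelf into the DR library I/O station. */
RMVTAPCTG DEV(TAPMLB01) CTG(BRM002)

/* Update the BRMS location of the volume so both BRMS instances see it    */
/* at the DR site library (parameters are examples only).                  */
MOVMEDBRM MEDCLS(*ALL) LOC(TS7650SRC1)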

During the outage at the production site, the DR system performs the daily saves to the DR site ProtecTIER.

Failback: once the production system and the connected ProtecTIER are up and running again, establish replication from the DR site to the primary ProtecTIER. Depending on the status of the local ProtecTIER, replicate all the cartridges or just the ones that were written during the outage. Once all the needed virtual cartridges are replicated to the production ProtecTIER, start the daily saves from the production system, terminate the replication from the DR to the primary ProtecTIER, and re-establish the replication from the local ProtecTIER to the DR site ProtecTIER. This scenario is shown in the picture below:

Figure 116. Establishing replication and failback (a BRMS control group saves backup items to the local TS7650, which is set up to replicate to the remote TS7650 over TCP/IP; only one copy of each cartridge is visible)

Separate domains - no BRMS information about the replicated cartridges at the DR site

In this scenario the production IBM i system performs backups with BRMS to the locally attached ProtecTIER. A ProtecTIER is located at the DR site, replication is established from the local to the DR ProtecTIER, and the Library-type replication policy is used. Optionally, an FC link from the local system to the DR ProtecTIER is established for restoring from DR site tapes if needed.

Note: A connection via such an FC link is officially supported only when it does not exceed 100 km.

In BRMS, a control group is set up for saving to the virtual cartridges that are being replicated. Depending on the backup policy, the control group can be set up for full saves or incremental saves. While data is being written and appended to the replicated cartridges, the replicas reside on the shelf of the DR site ProtecTIER. Once the local virtual cartridge is ejected by the BRMS command RMVTAPCTG, the DR site replica is moved from the DR site shelf to the I/O station of the DR site VTL; at the same time the local cartridge is moved to the local shelf. At this time we change the BRMS location of the cartridge by using the command MOVMEDBRM.

In case of a disaster, restore the entire system to a server at the DR site by performing these steps:
1. From the ProtecTIER .csv report file, determine which is the latest set of full and incremental backups.
2. Once the latest available consistent set of cartridges is determined, we recommend following the BRMS recovery report to restore the system.

Following the BRMS recovery report, it may be necessary to perform the following steps:
1. Restore the System i Licensed Internal Code. Since an alternate IPL is currently not supported from the virtual cartridges in ProtecTIER, you need to start the restore from media in an alternate installation device, such as a CD or DVD reader.
2. Restore the operating system.
3. Restore the BRMS product and the associated libraries QBRM and QUSRBRM.
4. Restore the BRMS-related media information.
5. Restore user profiles.
6. Restore the system libraries QGPL, QUSRSYS, QSYS2.
7. Restore configuration data.
8. Restore IBM product libraries.
9. Restore user libraries.
10. Restore the document library.
11. Restore Integrated File System (IFS) objects and directories.
12. Restore spooled files, journal changes, and authorization information.

Alternatively, if the FC connection from the production system to the DR site ProtecTIER exists, and depending on the type of failure at the local site, you may restore the needed objects from the DR site ProtecTIER to the production system.

During the outage at the production site, the DR system performs the daily saves to the DR site ProtecTIER. Failback to the local ProtecTIER can be done the same way as described in the previous case, "Single domain". The picture below illustrates this scenario:

Figure 117. Failback to the local ProtecTIER (a BRMS control group saves backup items to the local TS7650 on the cartridges set up for replication; the TS7650 replicates to the remote site over TCP/IP with only one copy visible; BRMS movement with RMVTAPCTG triggers the insert into the remote VTL, and MOVMEDBRM is used to change the cartridges' location)

Examples of BRMS policies and control groups for ProtecTIER replication in a single domain

The BRMS instances on the local IBM i and on the DR site IBM i are in the BRMS network. On the local ProtecTIER we create the source VTL (TS7650SRC1 in our example) and on the DR site ProtecTIER we create the target VTL (TS7650TGT1). Both VTLs are known to the BRMS network.
1. After the virtual cartridges are added to BRMS and initialized, we set up the BRMS move policy to move the cartridges from location TS7650SRC1 to location TS7650TGT1. The cartridge stays in location TS7650TGT1 until it expires. The BRMS move policy is shown in the screenshot below:

Figure 118. Change Move Policy

2. We create a BRMS media policy that uses this move policy:

Figure 119. Display Media Policy

3. We create a BRMS backup control group that uses this media policy and the VTL on the local ProtecTIER:

Figure 120. Change Backup Control Group Attributes

4. On ProtecTIER we set up the replication policy in Library mode, as is shown in the screenshot below:

Figure 121. Library Mode

In the following picture, the left screenshot represents the production IBM i system, and the right screenshot is taken from the DR site system. The moved cartridge BRM002 is in Inserted status on the DR site system:

Figure 122. On the left, the production IBM i system. On the right, the DR site server

5. Verify the media move on the DR site system, which will make the cartridge available. The verify operation is shown in the following screenshot:

Figure 123. Verify media move on the DR site system

6. The cartridge is now available in the DR site system, as can be seen in the picture below:

Figure 124. Cartridge available in the DR site system


Chapter 24. V7000 (Storwize) Best Practices with ProtecTIER

This section describes the V7000 (Storwize) best practices with ProtecTIER version 2.5.

Device Mapper Multipath

There is no need to manually configure the multipath settings for this storage (multipath.conf); this is done automatically by ProtecTIER's installation script (autorun).

Fibre Channel connection topology

The V7000 (Storwize) supports redundant connections to the clustered ProtecTIER nodes. To ensure full protection against the loss of any one Fibre Channel path from the node servers to the V7000, always use redundant host connections by connecting each host to the appropriate single-ported host channels on both controllers. These configurations have host and drive path failover protection and are best practices for redundancy and stability. Every ProtecTIER node must be configured to obtain two paths to each controller. The V7000 does not support direct attachment to the ProtecTIER server; a Fibre Channel switch is required for attaching the V7000 to ProtecTIER.

Figure 125. Example of redundant SAN fabric Fibre Channel configuration

Figure 126. Example of a configuration with one FC switch

If zoning is done correctly, in both cases the multipath -ll output will look similar to this:

mpath88 ( ) dm-118 IBM,2145
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][active]
 \_ 4:0:0:88 sdhl 133:176 [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 3:0:0:88 sdcm 69:160 [active][ready]
mpath73 ( ) dm-88 IBM,2145
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][active]
 \_ 3:0:0:73 sdbw 68:160 [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 4:0:0:73 sdgw 132:192 [active][ready]
mpath90 ( ) dm-125 IBM,2145
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][active]
 \_ 4:0:0:90 sdhn 133:208 [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 3:0:0:90 sdcq 69:224 [active][ready]

Storage Subsystem Firmware Level

It is recommended that the array firmware level be equal to or greater than the firmware version listed in the ProtecTIER Interoperability Matrix.

Host Type

To set up a host from the V7000 GUI, do the following:
1. Select All Hosts under Hosts, and then select New Host.

Figure 127. Host type options

2. Choose Fibre Channel Host, then Create Host.

Figure 128. Choose Host Type

3. Use Generic as the host type.
4. Add your WWPNs (World Wide Port Names).
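The same host definition can also be created from the Storwize CLI. The sketch below is only an illustration: the host name and WWPNs are placeholders that must be replaced with the actual WWPNs of the ProtecTIER back-end ports, and the exact WWPN parameter name can vary slightly by code level.

# Create a generic FC host object for one ProtecTIER node (example name and WWPNs)
svctask mkhost -name protectier_node1 -type generic -hbawwpn 10000000C9609BA2:10000000C9609BA3

# Verify the host definition and the state of its ports
svcinfo lshost protectier_node1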

Figure 129. Add WWPNs

RAID configuration

It is critical to use RAID for data protection and performance. On the V7000 you will create two pools, one for metadata and one for user data.

Metadata
- It is recommended to use balanced RAID 10 groups for the metadata MDisks (with the layout per the planning requirements), with at least 4+4 members.
- The recommended number of metadata RAID groups (MDisks) is determined by the capacity planning tool during the pre-sales process. This number can range from 2 to 10 or more RAID groups (MDisks), based on repository size, factoring ratio, and performance needs.
- The 1 GB MD volume may be created on any of the metadata MDisks.
- The size of the required metadata LUNs/file systems is a function of the nominal capacity of the repository (physical space and expected factoring ratio) and should be determined prior to the system installation by FTSS/CSS.
- When expanding the repository, it is important to use the same tier of RAID groups (spindle type and quantity) for MD or UD as the existing ones, respectively. For example, if the original two metadata LUNs were built on RAID 4+4 groups, new metadata RAID groups added must be at least 4+4 to maintain the same level of performance.

User Data

- It is recommended to use RAID 5 with at least five disk members (4+1) per MDisk.
- It is recommended that at least 7 user data volumes are created for optimal performance.
- The size of the user data volumes should be consistent.
- Use the Generic volume type when creating volumes for both MD and UD.

Figure 130. Select a preset volume type

The total number of volumes for both MD and UD should not exceed 130.

General notes
- Due to ProtecTIER's nature, the storage resources are heavily used; therefore, the storage should be dedicated solely to ProtecTIER's activity.
- Disk-based replication is not supported due to ProtecTIER's design limitations; for such purposes it is recommended to use ProtecTIER's native replication, available with version 2.5.
- When using a SAN point-to-point topology to connect the TS7650G to the disk array, create a dedicated zone for the ProtecTIER back-end ports. Do NOT mix the back-end ports (QLogic) with the front-end ProtecTIER ports (Emulex) or any other SAN devices in the same zone.
- If possible, dedicate the storage array to the TS7650G. If not possible, use zoning and LUN masking to isolate the TS7650G from other applications. The TS7650G should never share pools/MDisks/volumes with other applications.
- ProtecTIER is a random-read-oriented application; the majority of I/O in a typical TS7650G environment is random reads at a 60 KB block size. Therefore, suitable performance optimizations and tuning recommended by the disk vendor for this I/O profile should be implemented.
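As a rough illustration of the pool and volume layout described above, the following Storwize CLI sketch creates generic (fully allocated) volumes in separate metadata and user data pools. The pool names, volume counts and sizes are placeholder assumptions; the real values come from the capacity planning tool.

# Metadata volumes in the RAID 10 based pool (names and sizes are examples)
svctask mkvdisk -mdiskgrp PT_META_POOL -size 600 -unit gb -vtype striped -name pt_md_01
svctask mkvdisk -mdiskgrp PT_META_POOL -size 600 -unit gb -vtype striped -name pt_md_02

# User data volumes in the RAID 5 based pool; keep the sizes consistent and
# create at least 7 of them (only two are shown here)
svctask mkvdisk -mdiskgrp PT_USER_POOL -size 2048 -unit gb -vtype striped -name pt_ud_01
svctask mkvdisk -mdiskgrp PT_USER_POOL -size 2048 -unit gb -vtype striped -name pt_ud_02

# Map the volumes to the ProtecTIER host objects and review the result;
# the total MD + UD volume count should stay at or below 130
svctask mkvdiskhostmap -host protectier_node1 pt_ud_01
svcinfo lsvdisk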


Part 7. Appendixes


Accessibility

The publications for this product are in Adobe Portable Document Format (PDF) and should be compliant with accessibility standards. If you experience difficulties when you use the PDF files and want to request a Web-based format for a publication, send your request to the following address:

International Business Machines Corporation
Information Development
Department GZW
9000 South Rita Road
Tucson, Arizona
U.S.A.

In the request, be sure to include the publication number and title.

When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been

estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information is for planning purposes only. The information herein is subject to change before the products described become available.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX
DS4000
Enterprise Storage Server
ESCON
FICON
i5/OS
iSeries
IBM
ProtecTIER
pSeries
S/390
ServeRAID
System x
System Storage
TotalStorage
Wake on LAN
z/OS
zSeries

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol ((R) or (TM)), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or

common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information".

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

Electronic emission notices

This section contains the electronic emission notices or statements for the United States and other countries.

Federal Communications Commission statement

This explains the Federal Communications Commission's (FCC) statement.

This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instruction manual, might cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at his own expense.

Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not responsible for any radio or television interference caused by using other than recommended cables and connectors, or by unauthorized changes or modifications to this equipment. Unauthorized changes or modifications could void the user's authority to operate the equipment.

This device complies with Part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) this device might not cause harmful interference, and (2) this device must accept any interference received, including interference that might cause undesired operation.

Industry Canada compliance statement

This Class A digital apparatus complies with Canadian ICES-003. Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada.

European Union Electromagnetic Compatibility Directive

This product is in conformity with the protection requirements of European Union (EU) Council Directive 2004/108/EC on the approximation of the laws of the Member States relating to electromagnetic compatibility. IBM cannot accept responsibility for any failure to satisfy the protection requirements resulting from a non-recommended modification of the product, including the fitting of non-IBM option cards.

Attention: This is an EN Class A product. In a domestic environment this product might cause radio interference, in which case the user might be required to take adequate measures.

Responsible Manufacturer:
International Business Machines Corp.
New Orchard Road
Armonk, New York

European community contact:
IBM Technical Regulations, Department M456
IBM-Allee 1, Ehningen, Germany
e-mail: tjahn@de.ibm.com

Australia and New Zealand Class A Statement

Attention: This is a Class A product. In a domestic environment this product might cause radio interference, in which case the user might be required to take adequate measures.

Germany Electromagnetic compatibility directive

Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie zur Elektromagnetischen Verträglichkeit

Dieses Produkt entspricht den Schutzanforderungen der EU-Richtlinie 2004/108/EG zur Angleichung der Rechtsvorschriften über die elektromagnetische Verträglichkeit in den EU-Mitgliedsstaaten und hält die Grenzwerte der EN Klasse A ein. Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu installieren und zu betreiben. Des Weiteren dürfen auch nur von der IBM empfohlene Kabel angeschlossen werden. IBM übernimmt keine Verantwortung für

die Einhaltung der Schutzanforderungen, wenn das Produkt ohne Zustimmung der IBM verändert bzw. wenn Erweiterungskomponenten von Fremdherstellern ohne Empfehlung der IBM gesteckt/eingebaut werden.

EN Klasse A Geräte müssen mit folgendem Warnhinweis versehen werden: "Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im Wohnbereich Funk-Störungen verursachen; in diesem Fall kann vom Betreiber verlangt werden, angemessene Maßnahmen zu ergreifen und dafür aufzukommen."

Deutschland: Einhaltung des Gesetzes über die elektromagnetische Verträglichkeit von Geräten

Dieses Produkt entspricht dem "Gesetz über die elektromagnetische Verträglichkeit von Geräten (EMVG)". Dies ist die Umsetzung der EU-Richtlinie 2004/108/EG in der Bundesrepublik Deutschland.

Zulassungsbescheinigung laut dem Deutschen Gesetz über die elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der EMC EG Richtlinie 2004/108/EG) für Geräte der Klasse A

Dieses Gerät ist berechtigt, in Übereinstimmung mit dem Deutschen EMVG das EG-Konformitätszeichen - CE - zu führen.

Verantwortlich für die Einhaltung der EMV Vorschriften ist der Hersteller:
International Business Machines Corp.
New Orchard Road
Armonk, New York

Der verantwortliche Ansprechpartner des Herstellers in der EU ist:
IBM Deutschland
Technical Regulations, Department M456
IBM-Allee 1, Ehningen, Germany
e-mail: tjahn@de.ibm.com

Generelle Informationen: Das Gerät erfüllt die Schutzanforderungen nach EN und EN Klasse A.

People's Republic of China Class A Electronic Emission statement

Taiwan Class A compliance statement

Taiwan contact information

This topic contains the product service contact information for Taiwan.

IBM Taiwan Product Service Contact Information:
IBM Taiwan Corporation
3F, No 7, Song Ren Rd., Taipei, Taiwan

Japan VCCI Class A ITE Electronics Emission statement

Japan Electronics and Information Technology Industries Association (JEITA) Statement

Korean Class A Electronic Emission statement

Russia Electromagnetic Interference (EMI) Class A Statement


More information

Troubleshooting, Recovery, and Maintenance Guide

Troubleshooting, Recovery, and Maintenance Guide IBM Storwize V7000 Version 6.4.0 Troubleshooting, Recoery, and Maintenance Guide GC27-2291-03 Note Before using this information and the product it supports, read the general information in Notices on

More information

Setup, Operator, and Service Guide

Setup, Operator, and Service Guide Ultrium Internal Tape Drie Models T200 and T200F Setup, Operator, and Serice Guide GA32-0435-02 Ultrium Internal Tape Drie Models T200 and T200F Setup, Operator, and Serice Guide GA32-0435-02 Note Before

More information

Installation, Service, and User Guide

Installation, Service, and User Guide IBM System Storage SAN24B-5 Installation, Serice, and User Guide Serice information: 2498-X24 Read Before Using This product contains software that is licensed under written license agreements. Your use

More information

Software Upgrade Guide

Software Upgrade Guide IBM System Storage TS7600 with ProtecTIER Software Upgrade Guide for the TS7650 and TS7650G SC27-3643-01 IBM System Storage TS7600 with ProtecTIER Software Upgrade Guide for the TS7650 and TS7650G SC27-3643-01

More information

Setup and Operator Guide

Setup and Operator Guide 3583 Ultrium Scalable Tape Library Setup and Operator Guide GA32-0411-03 3583 Ultrium Scalable Tape Library Setup and Operator Guide GA32-0411-03 Note! Before using this information and the product it

More information

ThinkCentre. Hardware Removal and Replacement Guide Types 8143, 8144, 8146 Types 8422, 8423, 8427

ThinkCentre. Hardware Removal and Replacement Guide Types 8143, 8144, 8146 Types 8422, 8423, 8427 ThinkCentre Hardware Remoal and Replacement Guide Types 8143, 8144, 8146 Types 8422, 8423, 8427 ThinkCentre Hardware Remoal and Replacement Guide Types 8143, 8144, 8146 Types 8422, 8423, 8427 First Edition

More information

Installation, Service, and User Guide

Installation, Service, and User Guide IBM System Storage SAN24B-4 Express Installation, Serice, and User Guide Sericeinformation:2498-B24,24E Read Before Using This product contains software that is licensed under written license agreements.

More information

IBM Spectrum Protect for AIX Version Installation Guide IBM

IBM Spectrum Protect for AIX Version Installation Guide IBM IBM Spectrum Protect for AIX Version 8.1.0 Installation Guide IBM IBM Spectrum Protect for AIX Version 8.1.0 Installation Guide IBM Note: Before you use this information and the product it supports, read

More information

HP UPS R/T3000 G2. Overview. Precautions. Kit contents. Installation Instructions

HP UPS R/T3000 G2. Overview. Precautions. Kit contents. Installation Instructions HP UPS R/T3000 G2 Installation Instructions Overview The HP UPS R/T3000 G2 features a 2U rack-mount with convertible tower design and offers power protection for loads up to a maximum of 3300 VA/3000 W

More information

IBM FAStT Storage Manager Version 8.2 IBM. Installation and Support Guide for Novell NetWare

IBM FAStT Storage Manager Version 8.2 IBM. Installation and Support Guide for Novell NetWare IBM FAStT Storage Manager Version 8.2 IBM Installation and Support Guide for Noell NetWare IBM FAStT Storage Manager Version 8.2 Installation and Support Guide for Noell NetWare Note Before using this

More information

HP R/T2200 UPS. Overview. Precautions. Installation Instructions. The HP UPS R/T2200 features power protection for loads up to 2200 VA/1600 W.

HP R/T2200 UPS. Overview. Precautions. Installation Instructions. The HP UPS R/T2200 features power protection for loads up to 2200 VA/1600 W. HP R/T2200 UPS Installation Instructions Overview The HP UPS R/T2200 features power protection for loads up to 2200 VA/1600 W. For more information about any of the topics covered in this document, see

More information

IBM Tivoli Storage Manager for AIX Version Installation Guide IBM

IBM Tivoli Storage Manager for AIX Version Installation Guide IBM IBM Tioli Storage Manager for AIX Version 7.1.7 Installation Guide IBM IBM Tioli Storage Manager for AIX Version 7.1.7 Installation Guide IBM Note: Before you use this information and the product it supports,

More information

Installation, Service, and User Guide

Installation, Service, and User Guide IBM System Storage SAN32B-E4 Installation, Serice, and User Guide Serice information: 2498-E32 Read Before Using This product contains software that is licensed under written license agreements. Your use

More information

Installation, User s, and Maintenance Guide

Installation, User s, and Maintenance Guide IBM System Storage DS5020 EXP520 Storage Expansion Enclosure Installation, User s, and Maintenance Guide GA32-0957-04 Note: Before using this information and the product it supports, be sure to read the

More information

ERserver. User s Guide. pseries 650 SA

ERserver. User s Guide. pseries 650 SA ERserer pseries 650 User s Guide SA38-0611-00 First Edition (December 2002) Before using this information and the product it supports, read the information in Safety Notices on page ix, Appendix A, Enironmental

More information

IBM Storwize V5000 Gen2. Quick Installation Guide IBM GC

IBM Storwize V5000 Gen2. Quick Installation Guide IBM GC IBM Storwize V5000 Gen2 Quick Installation Guide IBM GC27-8581-02 Note Before using this information and the product it supports, read the following information: v The general information in Notices on

More information

IBM Spectrum Protect for Linux Version Installation Guide IBM

IBM Spectrum Protect for Linux Version Installation Guide IBM IBM Spectrum Protect for Linux Version 8.1.2 Installation Guide IBM IBM Spectrum Protect for Linux Version 8.1.2 Installation Guide IBM Note: Before you use this information and the product it supports,

More information

Setup, Operator, and Service Guide

Setup, Operator, and Service Guide IBM System Storage TS2260 Setup, Operator, and Serice Guide Machine Type 3580 Model H6S GA32-2226-00 IBM System Storage TS2260 Setup, Operator, and Serice Guide Machine Type 3580 Model H6S GA32-2226-00

More information

IBM Tivoli Monitoring for Business Integration. User s Guide. Version SC

IBM Tivoli Monitoring for Business Integration. User s Guide. Version SC IBM Tioli Monitoring for Business Integration User s Guide Version 5.1.1 SC32-1403-00 IBM Tioli Monitoring for Business Integration User s Guide Version 5.1.1 SC32-1403-00 Note Before using this information

More information

ERserver. Installation Guide. pseries 670 SA

ERserver. Installation Guide. pseries 670 SA ERserer pseries 670 Installation Guide SA38-0613-04 ERserer pseries 670 Installation Guide SA38-0613-04 Fifth Edition (December 2003) Before using this information and the product it supports, read the

More information

Obtaining Documentation and Submitting a Service Request, page xvii Safety Warnings, page xvii Safety Guidelines, page xx

Obtaining Documentation and Submitting a Service Request, page xvii Safety Warnings, page xvii Safety Guidelines, page xx Preface Obtaining Documentation and Submitting a Service Request, page xvii Safety s, page xvii Safety Guidelines, page xx Obtaining Documentation and Submitting a Service Request For information on obtaining

More information

IBM. Systems management Logical partitions. System i. Version 6 Release 1

IBM. Systems management Logical partitions. System i. Version 6 Release 1 IBM System i Systems management Logical partitions Version 6 Release 1 IBM System i Systems management Logical partitions Version 6 Release 1 Note Before using this information and the product it supports,

More information

Installation, User s, and Maintenance Guide

Installation, User s, and Maintenance Guide IBM System Storage DS3950 EXP395 Storage Expansion Enclosure Installation, User s, and Maintenance Guide GA32-0956-04 Note: Before using this information and the product it supports, be sure to read the

More information

WebSphere. DataPower Cast Iron Appliance XH40 Version 6.0. Installation Guide

WebSphere. DataPower Cast Iron Appliance XH40 Version 6.0. Installation Guide WebSphere DataPower Cast Iron Appliance XH40 Version 6.0 Installation Guide WebSphere DataPower Cast Iron Appliance XH40 Version 6.0 Installation Guide Note Before using this information and the product

More information

Installation and User s Guide

Installation and User s Guide IBM Tape Deice Driers Installation and User s Guide GC27-2130-09 IBM Tape Deice Driers Installation and User s Guide GC27-2130-09 Note! Before using this information and the product that it supports,

More information

IBM Storage Appliance 2421 Model AP1 Version 1 Release 1. Planning, Installation, and Maintenance Guide IBM SC

IBM Storage Appliance 2421 Model AP1 Version 1 Release 1. Planning, Installation, and Maintenance Guide IBM SC IBM Storage Appliance 2421 Model AP1 Version 1 Release 1 Planning, Installation, and Maintenance Guide IBM SC27-8520-02 Note Before using this information and the product it supports, read the information

More information

IBM Storwize V7000. Troubleshooting, Recovery, and Maintenance Guide IBM GC

IBM Storwize V7000. Troubleshooting, Recovery, and Maintenance Guide IBM GC IBM Storwize V7000 Troubleshooting, Recovery, and Maintenance Guide IBM GC27-2291-11 Note Before using this information and the product it supports, read the following information: v The general information

More information

System Storage DS3950 Quick Start Guide

System Storage DS3950 Quick Start Guide System Storage DS3950 Quick Start Guide This Quick Start Guide describes the basic procedure for installing, cabling, and configuring the IBM System Storage DS3950 storage subsystem (Machine Types 1814-94H

More information

Troubleshooting, Recovery, and Maintenance Guide

Troubleshooting, Recovery, and Maintenance Guide IBM Flex System V7000 Storage Node Version 7.2.0 Troubleshooting, Recoery, and Maintenance Guide GC27-4205-02 Note Before using this information and the product it supports, read the general information

More information

User s Guide 2105 Models E10, E20, F10, and F20

User s Guide 2105 Models E10, E20, F10, and F20 IBM Enterprise Storage Serer User s Guide 2105 Models E10, E20, F10, and F20 SC26-7295-02 IBM Enterprise Storage Serer User s Guide 2105 Models E10, E20, F10, and F20 SC26-7295-02 Note: Before using this

More information

HP UPS R/T3000 ERM. Overview. Precautions. Installation Instructions

HP UPS R/T3000 ERM. Overview. Precautions. Installation Instructions HP UPS R/T3000 ERM Installation Instructions Overview The ERM consists of two battery packs in a 2U chassis. The ERM connects directly to a UPS R/T3000 or to another ERM. Up to two ERM units can be connected.

More information

Web Interface User s Guide for the ESS Specialist and ESSCopyServices

Web Interface User s Guide for the ESS Specialist and ESSCopyServices IBM Enterprise Storage Serer Web Interface User s Guide for the ESS Specialist and ESSCopySerices SC26-7346-02 IBM Enterprise Storage Serer Web Interface User s Guide for the ESS Specialist and ESSCopySerices

More information

IBM Tivoli Storage Manager for Virtual Environments Version Data Protection for VMware Installation Guide IBM

IBM Tivoli Storage Manager for Virtual Environments Version Data Protection for VMware Installation Guide IBM IBM Tioli Storage Manager for Virtual Enironments Version 7.1.6 Data Protection for VMware Installation Guide IBM IBM Tioli Storage Manager for Virtual Enironments Version 7.1.6 Data Protection for VMware

More information

Data Protection for Microsoft SQL Server Installation and User's Guide

Data Protection for Microsoft SQL Server Installation and User's Guide IBM Tioli Storage Manager for Databases Version 6.4 Data Protection for Microsoft SQL Serer Installation and User's Guide GC27-4010-01 IBM Tioli Storage Manager for Databases Version 6.4 Data Protection

More information

IBM Operational Decision Manager Version 8 Release 5. Installation Guide

IBM Operational Decision Manager Version 8 Release 5. Installation Guide IBM Operational Decision Manager Version 8 Release 5 Installation Guide Note Before using this information and the product it supports, read the information in Notices on page 51. This edition applies

More information

DataPower Cast Iron Appliance XH40 Version 6.4. Installation Guide

DataPower Cast Iron Appliance XH40 Version 6.4. Installation Guide DataPower Cast Iron Appliance XH40 Version 6.4 Installation Guide DataPower Cast Iron Appliance XH40 Version 6.4 Installation Guide ii DataPower Cast Iron Appliance XH40 Version 6.4: Installation Guide

More information

Voltage regulators for the E ESCALA Power7 REFERENCE 86 A1 17FF 07

Voltage regulators for the E ESCALA Power7 REFERENCE 86 A1 17FF 07 Voltage regulators for the E5-700 ESCALA Power7 REFERENCE 86 A1 17FF 07 ESCALA Models Reference The ESCALA Power7 publications concern the following models: Bull Escala E1-700 / E3-700 Bull Escala E1-705

More information

IBM Director Virtual Machine Manager 1.0 Installation and User s Guide

IBM Director Virtual Machine Manager 1.0 Installation and User s Guide IBM Director 4.20 Virtual Machine Manager 1.0 Installation and User s Guide Note Before using this information and the product it supports, read the general information in Appendix D, Notices, on page

More information

IBM Tivoli Storage Manager Version Single-Site Disk Solution Guide IBM

IBM Tivoli Storage Manager Version Single-Site Disk Solution Guide IBM IBM Tioli Storage Manager Version 7.1.6 Single-Site Disk Solution Guide IBM IBM Tioli Storage Manager Version 7.1.6 Single-Site Disk Solution Guide IBM Note: Before you use this information and the product

More information

IBM i Version 7.2. Security Service Tools IBM

IBM i Version 7.2. Security Service Tools IBM IBM i Version 7.2 Security Serice Tools IBM IBM i Version 7.2 Security Serice Tools IBM Note Before using this information and the product it supports, read the information in Notices on page 37. This

More information

xseries Systems Management IBM Diagnostic Data Capture 1.0 Installation and User s Guide

xseries Systems Management IBM Diagnostic Data Capture 1.0 Installation and User s Guide xseries Systems Management IBM Diagnostic Data Capture 1.0 Installation and User s Guide Note Before using this information and the product it supports, read the general information in Appendix C, Notices,

More information

Setup, Operator, and Service Guide

Setup, Operator, and Service Guide 3581 Ultrium Tape Autoloader Setup, Operator, and Serice Guide GA32-0412-01 3581 Ultrium Tape Autoloader Setup, Operator, and Serice Guide GA32-0412-01 Note! Before using this information and the product

More information

InnoMedia Business VoIP ATA Models

InnoMedia Business VoIP ATA Models InnoMedia Business VoIP ATA Models MTA8328-4, MTA8328-8, MTA8328-24 Quick Installation Guide Important Safety Instructions Protective Earthing Protective earthing is used as a safeguard. This equipment

More information

IBM Tivoli Storage Manager for Windows Version Tivoli Monitoring for Tivoli Storage Manager

IBM Tivoli Storage Manager for Windows Version Tivoli Monitoring for Tivoli Storage Manager IBM Tioli Storage Manager for Windows Version 7.1.0 Tioli Monitoring for Tioli Storage Manager IBM Tioli Storage Manager for Windows Version 7.1.0 Tioli Monitoring for Tioli Storage Manager Note: Before

More information

System i and System p. Capacity on Demand

System i and System p. Capacity on Demand System i and System p Capacity on Demand System i and System p Capacity on Demand Note Before using this information and the product it supports, read the information in Notices on page 65 and the IBM

More information

GB of cache memory per controller to DS4800 controllers with 8 GB of cache memory per controller.

GB of cache memory per controller to DS4800 controllers with 8 GB of cache memory per controller. IBM System Storage DS4800 Controller Cache Upgrade Kit Instructions Attention: IBM has renamed some FAStT family products. FAStT EXP100 has been renamed DS4000 EXP100, FAStT EXP700 has been renamed DS4000

More information

IBM BladeCenter S Type 7779/8886. Planning Guide

IBM BladeCenter S Type 7779/8886. Planning Guide IBM BladeCenter S Type 7779/8886 Planning Guide IBM BladeCenter S Type 7779/8886 Planning Guide Note Note: Before using this information and the product it supports, read the general information in Notices;

More information

Preparing to Install the VG248

Preparing to Install the VG248 CHAPTER 2 To ensure normal system operation, plan your site configuration and prepare your site before installation. Before installing the VG248, review these sections: Preparing the Installation Site,

More information

IBM Storwize V5000. Quick Installation Guide IBM GI

IBM Storwize V5000. Quick Installation Guide IBM GI IBM Storwize V5000 Quick Installation Guide IBM GI13-2861-05 Note Before using this information and the product it supports, read the following information: v The general information in Notices on page

More information

Installation and Support Guide for AIX, HP-UX and Solaris

Installation and Support Guide for AIX, HP-UX and Solaris IBM TotalStorage FAStT Storage Manager Version 8.3 Installation and Support Guide for AIX, HP-UX and Solaris Read Before Using The IBM Agreement for Licensed Internal Code is included in this book. Carefully

More information

IBM Security Access Manager for Web Version 7.0. Upgrade Guide SC

IBM Security Access Manager for Web Version 7.0. Upgrade Guide SC IBM Security Access Manager for Web Version 7.0 Upgrade Guide SC23-6503-02 IBM Security Access Manager for Web Version 7.0 Upgrade Guide SC23-6503-02 Note Before using this information and the product

More information

IBM TS7620 Deduplication Appliance Express for ProtecTIER V Maintenance Guide IBM GA

IBM TS7620 Deduplication Appliance Express for ProtecTIER V Maintenance Guide IBM GA IBM TS7620 Deduplication Appliance Express for ProtecTIER V3.4.1 Maintenance Guide IBM GA32-2232-06 Edition notice This edition applies to version 3.4.1 of the IBM TS7620 ProtecTIER Deduplication Appliance

More information

Introduction and Planning Guide

Introduction and Planning Guide IBM TotalStorage Enterprise Storage Serer Introduction and Planning Guide GC26-7444-06 IBM TotalStorage Enterprise Storage Serer Introduction and Planning Guide GC26-7444-06 Note: Before using this information

More information

IBM TS7650G 3958 DD6 ProtecTIER Deduplication Gateway Version 3 Release 4. Installation Roadmap Guide IBM SC

IBM TS7650G 3958 DD6 ProtecTIER Deduplication Gateway Version 3 Release 4. Installation Roadmap Guide IBM SC IBM TS7650G 3958 DD6 ProtecTIER Deduplication Gateway Version 3 Release 4 Installation Roadmap Guide IBM SC27-8901-01 Note: Before you use this information and the product it supports, read the information

More information

IBM Tivoli Monitoring for Messaging and Collaboration: Lotus Domino. User s Guide. Version SC

IBM Tivoli Monitoring for Messaging and Collaboration: Lotus Domino. User s Guide. Version SC IBM Tioli Monitoring for Messaging and Collaboration: Lotus Domino User s Guide Version 5.1.0 SC32-0841-00 IBM Tioli Monitoring for Messaging and Collaboration: Lotus Domino User s Guide Version 5.1.0

More information

42U 1200 mm Deep Dynamic Rack 42U 1200 mm Deep Dynamic Expansion Rack. Installation Guide

42U 1200 mm Deep Dynamic Rack 42U 1200 mm Deep Dynamic Expansion Rack. Installation Guide 42U 1200 mm Deep Dynamic Rack 42U 1200 mm Deep Dynamic Expansion Rack Installation Guide 42U 1200 mm Deep Dynamic Rack 42U 1200 mm Deep Dynamic Expansion Rack Installation Guide Note: Before using this

More information

High Availability Guide for Distributed Systems

High Availability Guide for Distributed Systems IBM Tioli Monitoring Version 6.2.3 Fix Pack 1 High Aailability Guide for Distributed Systems SC23-9768-03 IBM Tioli Monitoring Version 6.2.3 Fix Pack 1 High Aailability Guide for Distributed Systems SC23-9768-03

More information

Road Map for the Typical Installation Option of IBM Tivoli Monitoring Products, Version 5.1.0

Road Map for the Typical Installation Option of IBM Tivoli Monitoring Products, Version 5.1.0 Road Map for the Typical Installation Option of IBM Tioli Monitoring Products, Version 5.1.0 Objectie Who should use the Typical installation method? To use the Typical installation option to deploy an

More information

IBM Sterling Gentran:Server for Windows. Installation Guide. Version 5.3.1

IBM Sterling Gentran:Server for Windows. Installation Guide. Version 5.3.1 IBM Sterling Gentran:Serer for Windows Installation Guide Version 5.3.1 IBM Sterling Gentran:Serer for Windows Installation Guide Version 5.3.1 Note Before using this information and the product it supports,

More information