TSM Node Replication Deep Dive and Best Practices Matt Anglin TSM Server Development
Abstract This session will provide a detailed look at the node replication feature of TSM. It will provide an overview of configuring replication, and explore various models in which replication may be deployed. Best practices and performance optimization will also be discussed. 2
Agenda Overview Preparing For Replication Performing A Replication Checking The Results Altering Replication 3
What Is TSM Replication? 3 2 1 4 Source Server Target Server 1. Initial replication all objects are copied to target Backup, Archive, and Space Management Objects 2. Deleted objects are deleted from target 3. Modified objects are updated on the target 4. Newly stored objects are copied during next replication 4
Branch Office Model All branches replicate to central location 5
Disaster Recovery Model N4 N5 N6 N1 N2 N3 N1 N2 Nn Primary R/W Node N3 N4 Nn Replicated R/O Node N5 N6 6
Advantages Of TSM Replication Stgpool 1 DB DB Stgpool 1 Stgpool 2 AIX Windows 1. Different Operating Systems on source and target 2. Storage Pool layout can be different on each server 3. Meta-data and data always kept in sync 4. Data can be replicated from any storage pool 5. Objects without data are also replicated 7
Characteristics Highly parallel Can run multiple replications at once Examination of objects in a file space overlaps with batches being sent to target server Highly resilient Multiple levels of retry to achieve success Will attempt to get data from any storage pool Deduplication aware 8
Replication and Deduplication If the data on the source server is... Identified/ Deduplicated And the destination storage pool on the target server is... Enabled for deduplication The result is... Only extents that are not stored in the destination storage pool are transferred. Identified/ Deduplicated Not enabled for deduplication Files are reassembled by the source server and replicated in their entirety. Not Identified/ Deduplicated Enabled for deduplication Files are sent by the source server and replicated in their entirety. Server-side deduplication will occur on the target. Not Identified/ Deduplicated Not enabled for deduplication Files are replicated in their entirety. 9
Agenda Overview Preparing For Replication Performing A Replication Checking The Results Altering Replication 10
Hardware Requirements CPU/RAM minimum recommendations With deduplication, 8 CPU cores, 64GB RAM Best practice: 8 CPU cores, 128GB RAM Without deduplication, 4 CPU cores, 32GB RAM Best practice: 4 CPU cores, 64GB RAM Best practices assume complete server replication Requirements less if replicating less 11
Database And Log Requirements Active and Archive Log At least 64GB Active Log, preferably 128GB A file space with 15,000,000 objects can use 8GB log Archive log sized to 2 times daily log usage Database A 300GB DB on a source server will require an additional 300GB of DB on the target server In addition to current size of target DB Plan DB size and growth appropriately 12
Tasks Create/Verify the server definitions Set the replication target for the server Set server-level RULE defaults (optional) Determine which nodes, file spaces, and data types are to be replicated Assign appropriate rules, or use defaults Enable replication for the nodes Replicate! 13
Define Servers Replication uses SERVER-SERVER sessions, but they can also look like normal client sessions The servers must be cross-defined Similar to Library Client/Library Manager Session authentication performed at server level Can use SSL, if desired SET REPLSERVER to specify target server 14
Replication Rules ALL_DATA_HIGH_PRIORITY High Priority Backup, Archive, and Space Management ACTIVE_DATA_HIGH_PRIORITY High Priority Backup Only ALL_DATA Normal Priority Backup, Archive, and Space Management (Default) ACTIVE_DATA Normal Priority Backup Only Default None 15
Setting Default Rules Examples of setting rules for the server SET BKREPLRuledefault ACTIVE_DATA SET ARREPLRuledefault ALL_DATA_HIGH_PRIORITY SET SPREPLRuledefault ALL_DATA DEFAULT cannot be used for the server rules Examples of setting rules for a node UPDATE NODE XX BKREPLRULE=ALL_DATA UPDATE NODE YY ARREPLRULE=DEFAULT 16
Replication Rule Hierarchy 17
Populating Target Server Two basic methods to populate target server Method 1 Replicate from scratch Best if source and target are in close proximity All eligible data is sent Method 2 Synchronize and Replicate Best for large distances or if bandwidth is limited Use media-based Export/Import to populate target Replication with SYNC links the source and target objects 18
Sync Example SYNCSEND SEND SYNCRECEIVE S1 Linked Pair S1 T1 S2 Linked Pair S2 T2 S3 Linked Pair S3 S4 Linked Pair S4 S5 Linked Pair S5 Source Server Target Server 1. Export/Import used to transfer data to target server 2. REPLMODE set appropriately on each server 3. Replication will linked the common objects and change the REPLMODE 4. Subsequent replications will copy new objects 19
Replication Mode Determines role for a node (source/target) Normal modes SEND the node is the source of replication RECEIVE the node is the target of replication Cannot be set directly SYNC modes SYNCSEND the node is a synced source SYNCRECEIVE the node is a synced target 20
Replication State Indicates whether replication is enabled Used to temporarily stop replicating Applies to: Client Nodes Replication Rules Source Server Client Nodes 21
TSM Symposium 2013: Tivoli Storage Manager: Future Expectations Vendor Talks Managing The Policy Make sure target server has same policy constructs as the source for all nodes Arch CG MC_1 Arch CG MC_1 Bkup CG Domain_A Domain_B Domain_X Source Server Bkup CG Domain_A Domain_B Domain_Y Target Server Replication does not replicate the policy 22
Managing The Policy Replication does not replicate the policy Use use EXPORT/IMPORT or Enterprise Configuration If a policy construct is missing on the target server, the default construct is used 23
Version Control And Retention Managed by the source server only Expiration of replicated nodes disabled on target Source server instructs target to delete objects during replication If you don t replicate, objects don t get deleted Might be useful for taking snapshots 24
TSM Symposium 2013: Tivoli Storage Manager: Future Expectations Vendor Talks Target Management Class Fields Attributes Honored Attributes Ignored Management Class Attributes Management Class Attributes Default Mgmt Class Migration Destination Space Management Technique Auto-Migrate on Non-Use Migration Requires Backup Copygroup Attributes Copy Destination TOC Destination Copygroup Attributes Versions Data Exists Versions Data Deleted Retain Extra Versions Retail Only Version Copy Mode Copy Serialization Copy Frequency 25
Restrictions on Target Server A replicated node is Read-Only Cannot store new data from a client or application Expiration does not delete objects Cannot rename the node Data can be deleted from target with: DELETE VOL DISCARDD=YES AUDIT VOL FIX=YES DELETE FILESPACE **Data deleted from target will be sent during next replication 26
Planning Considerations Need to plan for the daily change rate The use of ACTIVE_ONLY doesn t alter this Are your RAM, CPU, and disks sufficient? How much data needs to be initially replicated to get to the steady-state? Do you have the time and bandwidth to replicate it from scratch? Would it be better to use Export/Import? 27
Agenda Overview Preparing For Replication Performing A Replication Checking The Results Altering Replication 28
Replicate Node Command The REPLICATE NODE command accepts: Multiple nodes and/or node groups Specific file spaces belonging to a node The data type to replicate The priorities to include in the replication REPLICATE NODE starts a single process Process ends when ALL nodes and file spaces are complete Can be scheduled as part of daily maintenance 29
Replicate Node Processing Each node and file space specified is examined List of node/file space/data-type is created Source and target exchange information Authentication validated Replication GUIDs exchanged Once set, can never be changed Can t clone a DB and start replication as new server 30
Replicate Node Processing For each node being processed Target node is registered, if necessary Replication State and Mode are verified Each node is enabled for replication Source in SEND mode, target in RECEIVE mode Target node is synchronized to source Attributes, including passwords, are replicated If you move a node to a different domain on the target, replication will move it back 31
Replicate Node Processing As file spaces are examined, batches created Options to control batch sizes: REPLSIZETHRESH option controls size of replication batches in terms of megabytes REPLBATCHSIZE option controls maximum number of objects in each replication batch Batches are processed even while file spaces are being examined Each batch uses 1 session MAXSESSIONS keyword controls number of batches 32
Cancelling Replication Replication can be cancelled only from the source server CANCEL PROCESS Will cancel a particular replication process CANCEL REPLICATION Will cancel all replications currently running 33
Agenda Overview Preparing For Replication Performing A Replication Checking The Results Altering Replication 34
Status of Current Replications As batches complete, statistics are updated Q PROCESS shows statistics as of last batch 6 Replicate Node Replicating node(s) IRONMAN3. File spaces complete: 8. File spaces identifying and replicating: 0. File spaces replicating: 1. File spaces not started: 0. Files current: 0. Files replicated: 0 of 5. Files updated: 0 of 0. Files deleted: 0 of 0. Amount replicated: 0 KB of 41,720 KB. Amount transferred: 0 KB. Elapsed time: 0 Day(s), 0 Hour(s), 1 Minute(s). 35
Other Statistics Completion messages: ANR0327I Replication of node IRONMAN3 completed. Files current: 0. Files replicated: 5 of 5. Files updated: 0 of 0. Files deleted: 0 of 0. Amount replicated: 41,720 KB of 41,720 KB. Amount transferred: 20,860 KB. Elapsed time: 0 Day(s), 0 Hour(s), 1 Minute(s). ANR0986I Process 6 for Replicate Node running in the BACKGROUND processed 5 items for a total of 42,721,280 bytes with a completion state of SUCCESS at 14:04:03. 36
Q PROCESS Output Details Replicating Node(s): list of nodes being processed in this replication File Spaces complete: file spaces that have completed replication File spaces examining file spaces that are determining what files to process and replicating: and are also replicating data File spaces replicating: file spaces that are replicating data only File spaces not started: file spaces that have not yet begun Files current: files that do not need replicated, updated or deleted Files replicated: files that have been replicated out of a total number to replicate Files updated: files that have been updated out of a total number to update Files deleted: files that have been deleted out of a total number to delete Bytes Replicated: bytes replicated out of a total number of bytes to replicate Bytes Transferred: bytes sent to the target server (shows dedup savings ) 37
QUERY REPLICATION Shows history of replications ANR2017I Administrator SERVER_CONSOLE issued command: QUERY REPLICATION ironman3 Node Name Filespace Name FSID Start Time End Time Status Phase ------------- -------------- ---- ------------------- ------------------- --------- ------- IRONMAN3 /space 5 05/11/12 13:35:12 05/11/12 13:35:31 Ended None IRONMAN3 /space 5 05/11/12 13:42:59 05/11/12 13:43:10 Ended None IRONMAN3 /space 5 05/11/12 14:03:50 05/11/12 14:04:03 Ended None IRONMAN3 /hsmfs 6 05/11/12 13:35:12 05/11/12 13:35:31 Failed None IRONMAN3 /hsmfs 6 05/11/12 13:42:59 05/11/12 13:43:10 Ended None IRONMAN3 /hsmfs 6 05/11/12 14:03:50 05/11/12 14:04:03 Ended None IRONMAN3 /lspace 7 05/11/12 13:35:12 05/11/12 13:35:31 Failed None IRONMAN3 /lspace 7 05/11/12 13:42:59 05/11/12 13:43:10 Ended None IRONMAN3 /lspace 7 05/11/12 14:03:50 05/11/12 14:04:03 Ended None 38
Q REPLICATION F=D Node Name: IRONMAN Filespace Name: /space FSID: 2 Start Time: 02/08/11 21:44:19 End Time: 02/08/11 21:48:14 Status: Completed Process Number: 4 Command: replicate node ironman Phase: Completed Process Running Time: 0 Day(s), 0 Hour(s), 4 Minute(s) Completion State: Complete Reason for Incompletion: None Backup Last Update Date/Time: Backup Target Server: Backup Files Needing No Action: 0 Backup Files To Replicate: 0 Backup Files Replicated: 0 Backup Files Not Replicated Due To Errors: 0 Backup Files Not Yet Replicated: 0 Backup Files To Delete: 0 Backup Files Deleted: 0 Backup Files Not Deleted Due To Errors: 0 Backup Files To Update: 0 Backup Files Updated: 0 Backup Files Not Updated Due To Errors: 0 39
Q REPLICATION F=D Backup Bytes To Replicate (MB): 0 Backup Bytes Replicated (MB): 0 Backup Bytes Transferred (MB): 0 Backup Bytes Not Replicated Due To Errors (MB): 0 Backup Bytes Not Yet Replicated (MB): 0 Archive Last Update Date/Time: 02/08/11 21:48:14 Archive Target Server: NIGLINA Archive Files Needing No Action: 0 Archive Files To Replicate: 39,416 Archive Files Replicated: 39,206 Archive Files Not Replicated Due To Errors: 210 Archive Files Not Yet Replicated: 0 Archive Files To Delete: 0 Archive Files Deleted: 0 Archive Files Not Deleted Due To Errors: 0 Archive Files To Update: 0 Archive Files Updated: 0 Archive Files Not Updated Due To Errors: 0 Archive Bytes To Replicate (MB): 4,335 Archive Bytes Replicated (MB): 4,335 Archive Bytes Transferred (MB): 0 Archive Bytes Not Replicated Due To Errors (MB): 0 Archive Bytes Not Yet Replicated (MB): 0 40
Q REPLICATION F=D Total Files Needing No Action: 314,400 Total Files To Replicate: 39,416 Total Files Replicated: 39,206 Total Files Not Replicated Due To Errors: 210 Total Files Not Yet Replicated: 0 Total Files To Delete: 0 Total Files Deleted: 0 Total Files Not Deleted Due To Errors: 0 Total Files To Update: 0 Total Files Updated: 0 Total Files Not Updated Due To Errors: 0 Total Bytes To Replicate (MB): 4335 Total Bytes Replicated (MB): 4330 Total Bytes Transferred (MB): 1298 Total Bytes Not Replicated Due To Errors (MB): 5 Total Bytes Not Yet Replicated (MB): 0 Estimated Percentage Complete: 100 Estimated Time Remaining: Estimated Time Of Completion: 41
Agenda Overview Preparing For Replication Performing A Replication Checking The Results Altering Replication 42
Disabling Replication DISABLE REPLICATION Disables OUTBOUND replication (not inbound) ENABLE REPLICATION to resume Set REPLSTATE=DISABLED UPDATE NODE X REPLSTATE=DISABLED UPDATE FILESPACE X Y REPLSTATE=DISABLED UPDATE REPLRULE ALL_DATA STATE=DISABLED All file spaces that are disabled will be skipped when REPLICATE NODE is issued 43
Removing Replication REMOVE REPLNODE <nodename> Deletes all replication information from the DB Can be run on source, target, or both Sets the REPLSTATE and REPLMODE to NONE Does not delete any data 44
Removing Replication UPDATE FILESPACE X Y REPLST=PURGEDATA Puts a file space into a special purge state During next replication All objects in the file space are deleted from the target The file space REPLMODE set to NONE The file space REPLSTATE set to DISABLED If PURGEDATA not used, target data not deleted 45
Disabling vs Removing Replication Disabling is temporary All replication data is retained on both servers Nodes on target server are still Read-Only When you re-enable, it picks up where it left off Removing is permanent All replication information is deleted from both servers Nodes on target server become Read-Write Must do a SYNC operation to re-establish pair Or delete all data from the target server 46
Controlling Server Sessions Can enable or disable server sessions Useful on source and target to control replication DISABLE SESSIONS SERVER XX DIRECTION=OUT ENABLE SESSIONS SERVER XX DIRE=BOTH >>-DISAble SESSions-----------------------------------------> >--SERVer--+---------------------------------------------+-->.--DIRection--=--Both------. '--server_name--+--------------------------+--' +--DIRection--=--Both------+ +--DIRection--=--INbound---+ '--DIRection--=--OUTbound--' 47
Reversing Direction On current source/new target REMOVE REPLNODE ZZZ UPDATE NODE ZZZ REPLMODE=SYNCRECEIVE REPLSTATE=ENABLED On current target/new source REMOVE REPLNODE ZZZ UPDATE NODE ZZZ REPLMODE=SYNCSEND REPLSTATE=ENABLED SET REPLSERVER=<new target>, if necessary REPLICATE NODE ZZZ 48
The DB Restore Problem S1 S2 S3 Linked Pair Linked Pair Linked Pair S1 S2 S3 Source Server Target Server 1. Start with 2 files on source and target 2. Back up database at time T1 3. Store another file at time T2 4. Replicate the file 5. Restore database to time T1 6. Replication will delete S3 on the target 49
Database Restore Two choices Accept the current situation ENABLE REPLICATION Copy the most recent data to the source server Follow the Reversing Direction instructions ENABLE REPLICATION REPLICATE NODE Follow the Reversing Direction instructions again To re-establish original source/target roles 50
Tuning For Performance Be sure to test replication throughput Adjust MAXSESSIONS to determine point of diminishing returns Network, CPU, and RAM will impact throughput Make sure sufficient mount points are available for replication For FILE, set mount limit to at least the product of NUMOPENVOLSALLOW and MAXSESSIONS 51
Best Practices If possible, allow sufficient time for IDENTIFY to process all data before replicating Allows replication to benefit from deduplication Run storage pool reclamation after replication Run storage pool migration after replication Faster to replicate from disk than tape Replication and storage pool backup can be run simultaneously 52
Best Practices Don t run all nodes in a single replication Replicate large nodes by themselves With a smaller value for MAXSESSIONS (1-3) Set up and use NODE GROUPS as way to manage replications Apply maintenance as it becomes available 53
Thank You 54