Real-time Replication in the Real World
  Richard E. Baum
  C. Thomas Tyler
Agenda
  Provide an overview of replication solutions
  Discuss relevant new 2009.2 features
  Review some real-world solutions
Terminology
  High Availability (HA)
    Typical Goal: Keep Perforce online 24x7
  Disaster Recovery (DR)
    Business continuity
    Murphy's Law Insurance
  Recovery Point Objective (RPO)
    Targeted max data loss in various failure scenarios
  Recovery Time Objective (RTO)
    Targeted max time to recover from a failure
Terminology
  Archive Files
    Contain all versioned and shelved files
  Metadata
    All data in db.* files under P4ROOT
  Read-Only Replica
    Copy of live Perforce DBs for read-only operations
Terminology
  Offline Checkpoint
    Checkpoint created from replicated db.* files.
  Perforce SDP (Server Deployment Package)
    Server management scripts from Perforce Consulting
  DRBD (Distributed Replicated Block Device)
  Keep your eyes open for emerging technologies!
High Availability Thinking
  We're willing to invest in a more sophisticated deployment architecture to reduce unplanned downtime.
  We will not accept data loss for any Single Point of Failure (SPOF).
  Downtime is extremely expensive for us. We are willing to spend a lot to reduce the likelihood of downtime, and to minimize it when it is unavoidable.
High Availability Technologies
  Metadata:
    Journal Truncation (p4d -jj)
    p4 replicate
    DAS/RAID or fast SAN for metadata
  Archive Files:
    SAN
    p4 export for metadata-driven archive updates
To Cluster, or Not To Cluster?
  Perforce is not a cluster-aware application
  Clustering adds complexity and cost
  Can reduce downtime
  Simplifies automation of some failover tasks
    DNS Switchover
    Automatically mounting SAN Volumes
  Perforce SDP designed to simplify cluster failover
Sample HA Deployment (w/SAN)
Sample HA Deployment (w/DAS)
Disaster Recovery Thinking
  We're willing to invest in a more sophisticated deployment architecture to ensure business continuity in the event of a disaster.
  We need to ensure accessibility of our intellectual property, even in the event of a sudden and total loss of one of our data centers.
Disaster Recovery Technologies
  Metadata:
    Journal Truncation (p4d -jj)
    p4 replicate
  Archive Files:
    Rsync/Robocopy
    Block-level WAN replication solutions
    p4 export for metadata-driven archive updates
Sample DR Deployment
Read-Only Replica Thinking
  We have automation that interacts with Perforce, such as continuous integration build systems or reports, that impacts performance on our primary server.
  We're willing to invest in a more sophisticated deployment architecture to improve performance and increase our scalability.
Read-Only Replica Technologies
  Metadata:
    p4 replicate with filtering wrappers
    Optional p4broker for a transparent solution
      Users always point to same P4PORT
  Archive Files:
    Shared storage with primary server
Sample RO Replica (One Server)
Sample RO Replica (2 Servers + Broker)
Tools for Metadata Replication
  Classic journal truncation (p4d -jj)
  p4jrep (deprecated)
  p4 replicate (New in 2009.2)
  p4 export (New in 2009.2)
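Replication Example #1 - to Journal
    #!/bin/bash
    # Poll the master and append its journal records to a local journal file.
    P4MASTERPORT=perforce.myco.com:1742
    CHECKPOINT_PREFIX=/p4servers/master/checkpoints/myco
    P4ROOT_REPLICA=/p4servers/replica/root
    REPSTATE=/p4servers/replica/root/rep.state

    # -s: state file recording the last journal position replicated
    # -J: journal file prefix used on the master
    # -o: file that receives the replicated journal records
    p4 -p "$P4MASTERPORT" replicate \
        -s "$REPSTATE" \
        -J "$CHECKPOINT_PREFIX" \
        -o /p4servers/replica/logs/journal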
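Replication Example #2 - to DBs
    #!/bin/bash
    # Poll the master and replay its journal records straight into the replica's db.* files.
    P4MASTERPORT=perforce.myco.com:1742
    CHECKPOINT_PREFIX=/p4servers/master/checkpoints/myco
    P4ROOT_REPLICA=/p4servers/replica/root
    REPSTATE=/p4servers/replica/root/rep.state

    # -k: keep the pipe to the p4d subprocess open between polls
    # p4d -jrc -: replay journal records read from standard input
    # -b 1: apply records one at a time instead of in batches
    p4 -p "$P4MASTERPORT" replicate \
        -s "$REPSTATE" \
        -J "$CHECKPOINT_PREFIX" -k \
        p4d -r "$P4ROOT_REPLICA" -f -b 1 -jrc -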
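Replication Example #3 - Filtering
    #!/bin/bash
    # Same as Example #2, but drop db.have records before they reach the replica.
    P4MASTERPORT=perforce.myco.com:1742
    CHECKPOINT_PREFIX=/p4servers/master/checkpoints/myco
    P4ROOT_REPLICA=/p4servers/replica/root
    REPSTATE=/p4servers/replica/root/rep.state

    # p4 replicate feeds journal records to grep; grep's output is piped to p4d.
    # --line-buffered makes grep pass each record on as soon as it arrives.
    p4 -p "$P4MASTERPORT" replicate \
        -s "$REPSTATE" \
        -J "$CHECKPOINT_PREFIX" -k \
        grep --line-buffered -v '@db\.have@' |
        p4d -r "$P4ROOT_REPLICA" -f -b 1 -jrc -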
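
The grep filter can be tried on its own before it is wired into replication. This is a minimal sketch: the function names are hypothetical, and the sample records are simplified stand-ins for journal lines, not the exact field layout of a real journal.

```bash
#!/bin/bash
# Drop single-line db.have journal records; pass everything else through.
# --line-buffered makes grep flush each record as soon as it is read.
filter_db_have() {
    grep --line-buffered -v '@db\.have@'
}

# Illustrative records only (assumption): real journal lines carry more fields.
sample_journal() {
    printf '%s\n' \
        '@pv@ 0 @db.have@ @//ws/foo.c@ @//depot/foo.c@ 3 0' \
        '@pv@ 0 @db.rev@ @//depot/foo.c@ 3 0 1 12345' \
        '@pv@ 0 @db.have@ @//ws/bar.c@ @//depot/bar.c@ 1 0' \
        '@pv@ 0 @db.change@ 12345 12345 @ws@ @user@ 0 1'
}

sample_journal | filter_db_have
```

Only the db.rev and db.change records come out the other side. A pattern match on whole lines like this is only safe for records that fit on one line.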
Archive File Replication Solutions
  File-level: Rsync/Robocopy
  Filesystem or block-level (DRBD, etc.)
  Commercial WAN replication solutions
  Metadata-driven using p4 export
Replication Race: Metadata vs. Archive Files
  Which data gets there first?
  Perfect Consistency
    Could mean a higher recovery point objective (RPO).
    Recovery state is clean for all recovered data.
  Minimum Data Loss
    More metadata is preserved.
    p4 verify errors point to lost archive files.
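
In the Minimum Data Loss case, the list of lost archive files can be pulled out of saved p4 verify output. This is a sketch under assumptions: the function names are hypothetical, and the sample lines only imitate the general shape of verify error lines (revision, description, then a MISSING! or BAD! flag).

```bash
#!/bin/bash
# Print "path#rev" for every verify line flagged MISSING!.
# Reads the named files, or standard input when no file is given.
# Assumes the depot path itself does not contain " - ".
missing_revisions() {
    awk '/ MISSING!$/ { sub(/ - .*/, ""); print }' "$@"
}

# Illustrative verify output (assumption), not captured from a real server.
sample_verify_log() {
    printf '%s\n' \
        '//depot/main/foo.c#3 - edit change 1234 (text) 5D41402ABC4B2A76B9719D911017C592 MISSING!' \
        '//depot/main/bar.c#1 - add change 1200 (text) 7D793037A0760186574B0282F2F435E7 BAD!' \
        '//depot/main/baz.h#2 - edit change 1233 (text) 9E107D9D372BB6826BD81D3542A419D6 MISSING!'
}

sample_verify_log | missing_revisions
```

On the recovered server this would typically be fed from a saved log, for example `p4 verify -q //... > verify.log 2>&1` followed by `missing_revisions verify.log`.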
Example 1: Classic DR
  Pre-2009.2 Servers
  Classic Journal Truncation
  Commercial WAN replication technology
  Relaxed 8 hour recovery point objective (RPO)
Example 1: Classic DR
Example 1: Classic DR
  Core approach was very straightforward:
  On the primary server
    Run p4d -jj every 8 hours
    Deposit journal files on same volume as archive files (gaining the benefit of free file transfer)
  On the DR server
    Replay outstanding journals using p4d -jr
    Perforce instance on spare always up
      Its daily job is running p4 verify
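
The DR-side replay step can be sketched as follows. Assumptions: rotated journals are named PREFIX.jnl.N, the number of the last journal already replayed is tracked by the caller, and the function names, the DRY_RUN switch, and the paths in the usage note are all hypothetical.

```bash
#!/bin/bash
# List rotated journals numbered above LAST, in ascending numeric order.
# Files whose suffix is not a plain number (for example compressed
# journals ending in .gz) are skipped.
pending_journals() {
    local dir=$1 prefix=$2 last=$3 f n
    for f in "$dir/$prefix".jnl.*; do
        [ -e "$f" ] || continue
        n=${f##*.}
        case $n in ''|*[!0-9]*) continue ;; esac
        if [ "$n" -gt "$last" ]; then
            printf '%s\t%s\n' "$n" "$f"
        fi
    done | sort -n | cut -f2-
}

# Replay each pending journal in order, stopping at the first failure.
# DRY_RUN defaults to 1, which only prints the commands.
replay_pending() {
    local root=$1 dir=$2 prefix=$3 last=$4 j
    pending_journals "$dir" "$prefix" "$last" | while IFS= read -r j; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "p4d -r $root -jr $j"
        else
            p4d -r "$root" -jr "$j" || exit 1
        fi
    done
}
```

For example, `replay_pending /p4servers/dr/root /p4servers/dr/checkpoints myco 41` prints the replay commands for journal 42 onward. The numeric sort matters: a plain glob would put myco.jnl.10 ahead of myco.jnl.9.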
Example 2: Real-Time Replication
  Suitable for HA or DR
  Using p4 replicate
    Wraps the p4 replicate utility
    Replication engine runs continuously
    Leave changes in journal for later replay, or
    Replay changes directly to replica P4ROOT
  Recovery Point Objective (RPO):
    As low as 2 seconds for metadata.
    WAN replication for archive files
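
One way to keep the replication engine running continuously is a small supervisor loop around the p4 replicate command. This is a sketch only, not the wrapper used in the deployment described here; keep_running, MAX_RUNS, and RESTART_DELAY are hypothetical names.

```bash
#!/bin/bash
# Re-run a command every time it exits, pausing between attempts.
# MAX_RUNS=0 (the default) loops forever; a positive value bounds the
# loop, which is useful for testing. RESTART_DELAY is in seconds.
keep_running() {
    local max=${MAX_RUNS:-0} delay=${RESTART_DELAY:-5} runs=0 status
    while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
        runs=$((runs + 1))
        "$@"
        status=$?
        echo "run $runs of '$1' exited with status $status" >&2
        sleep "$delay"
    done
}
```

Usage would look like `keep_running p4 -p "$P4MASTERPORT" replicate -s "$REPSTATE" -J "$CHECKPOINT_PREFIX" -k p4d -r "$P4ROOT_REPLICA" -f -b 1 -jrc -`. Because p4 replicate records its position in the state file given with -s, a restarted run picks up from that recorded position.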
Example 2: Real-Time Replication
Failover Automation
  Only automate tasks behind FAILOVER button
  Allow only a trained Perforce administrator to push the button.
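
The "button" can be as simple as a script that does nothing until an administrator confirms. The sketch below is hypothetical throughout: the function names are invented, and the three steps are placeholders to be replaced with a site's own tested actions.

```bash
#!/bin/bash
# Succeed only if the operator types the exact word FAILOVER.
confirm_failover() {
    local answer
    printf 'Type FAILOVER to proceed: ' >&2
    IFS= read -r answer || return 1
    [ "$answer" = "FAILOVER" ]
}

# Run the automated steps only after explicit confirmation.
run_failover() {
    if ! confirm_failover; then
        echo "Aborted; nothing changed." >&2
        return 1
    fi
    # Placeholder steps (assumption): each echo stands in for a real action.
    echo "step 1: stop replication"
    echo "step 2: switch DNS or service address"
    echo "step 3: start p4d on the standby"
}
```

Nothing here decides that a failover is needed. That judgment stays with the administrator, and the script only carries out the steps once it is made.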
Failover Automation
Failover Automation
  Perforce is not a cluster-aware application
  Clustering adds some value
    Simplifies automation of DNS switchover, SAN mount transfers, etc.
  Offline checkpoints can be beneficial
    After failover, db.* files may be in an unknown state
Just A Bit More About Failover
  It's Complicated!
  Simulation of hardware failures is non-trivial
    There is a limit to how much confidence you should gain from testing.
  No substitute for a trained administrator
    Can analyze failures
    Determine the best course of action
Example 3: Read-only Replica
  Use Filtered Replication
    Basic grep (with line buffering)
      For filtering one-liner journal entries like db.have
    More sophisticated filtering
      Needed for journal entries that span multiple lines
      Perforce Public Depot has a good example:
      //guest/michael_shields/src/p4jrep/awkfilter.sh
Example 3: Read-only Replica
  For Continuous Integration/Build Farms
  Define how users will connect to the Replica
    Simple (for administrators):
      Modify build scripts to use appropriate P4PORT values
      Point users at appropriate P4PORT depending on task
    Simple (for end users):
      All users use p4broker P4PORT
      p4broker routes requests to appropriate server instance
        Either the live server or the read-only replica
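
For the "simple for administrators" option, a build script can choose its P4PORT per command. This is a sketch under assumptions: replica.myco.com is a hypothetical host, p4port_for is a hypothetical helper, and the list of commands treated as read-only is illustrative and should be checked against the server version in use.

```bash
#!/bin/bash
P4PORT_MASTER=perforce.myco.com:1742
P4PORT_REPLICA=replica.myco.com:1742   # hypothetical replica address

# Print the P4PORT a build script should use for a given p4 command.
# Anything not explicitly listed as read-only goes to the master,
# including sync, which updates db.have.
p4port_for() {
    case $1 in
        changes|describe|files|filelog|fstat|dirs|print|annotate)
            echo "$P4PORT_REPLICA" ;;
        *)
            echo "$P4PORT_MASTER" ;;
    esac
}
```

A build script would then call, for example, `p4 -p "$(p4port_for changes)" changes -m 10 //depot/...`. Defaulting unknown commands to the master is the safer choice, since a missed read-only command costs only some load while a write sent to the replica would fail or diverge.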
Example 3: Read-only Replica
  Make Archive Files Available on Replica
    Multiple Server Machines, Master & Replica
      Use a SAN or other shared storage solution
      Files mounted read-only on the replica
    Run Replica instance on Primary server
      Works if hardware is powerful enough
      Run replica under different login
        Cannot write to the archived files
Review of RO Replica
Summary
  Advanced replication solutions
    Easier with p4 replicate and p4 export
  Typical Uses:
    High Availability
    Disaster Recovery
    Read-only Replicas
  Perforce Technical Support can help!
  Perforce Consulting can help, too!
Demo
Q & A