EMC Business Continuity for Microsoft Office SharePoint Server 2007

EMC Business Continuity for Microsoft Office SharePoint Server 2007
Enabled by EMC CLARiiON CX4, EMC RecoverPoint/Cluster Enabler, and Microsoft Hyper-V

Proven Solution Guide

Copyright 2010 EMC Corporation. All rights reserved. Published October 2010.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

Benchmark results are highly dependent upon workload, specific application requirements, and system design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, this workload should not be used as a substitute for a specific customer application benchmark when critical capacity planning and/or product evaluation decisions are contemplated. All performance data contained in this report was obtained in a rigorously controlled environment. Results obtained in other operating environments may vary significantly. EMC Corporation does not warrant or represent that a user can or will achieve similar performance expressed in transactions per minute. No warranty of system performance or price/performance is expressed or implied in this document.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Part number: H742

Table of Contents

Chapter 1: About this Document
  Overview
  Audience and purpose
  Scope
  Protecting a SharePoint farm using RecoverPoint/CE
  Objectives and findings
  Reference Architecture
  Validated environment profile
  Hardware and software resources
  SharePoint farm content types
  Prerequisites and supporting documentation
  Terminology

Chapter 2: Storage Design
  Overview
  Considerations
  Layout
  Storage-system spindle design
  Space requirements
  Cluster details

Chapter 3: File System

Chapter 4: Application Design
  Overview
  Considerations
  Application design layout
  RecoverPoint policy settings
  Best practices and recommendations

Chapter 5: Network Design
  Overview
  Considerations

  RecoverPoint network and Fibre Channel configuration

Chapter 6: Application Configuration
  Overview
  Application configuration

Chapter 7: Testing and Validation
  Overview
  Test methodology
  SharePoint user profiles
  Tested components
  Observations
  Section A: Test results summary (Overview; Testing objectives and results)
  Section B: CLARiiON RecoverPoint splitter testing
  Section C: Baseline RecoverPoint testing
  Section D: Replication testing (Synchronous test results; Asynchronous test results)
  Section E: Virtual machine migration testing (Synchronous testing; Asynchronous test results)
  Section F: Unplanned failover testing (Synchronous test results; Asynchronous test results)
  Section G: Planned failback testing
  Section H: Query scaling testing (Synchronous test results; Asynchronous test results)

Chapter 8: Conclusion

Chapter 1: About this Document

Overview

Introduction
This Proven Solution Guide summarizes a series of best practices that EMC discovered, validated, or otherwise encountered during the validation of a solution for the business continuity of Microsoft Office SharePoint Server 2007 using EMC CLARiiON CX4, EMC RecoverPoint with Cluster Enabler, and Microsoft Windows Server 2008 R2 Hyper-V.

EMC's commitment to consistently maintain and improve quality is led by the Total Customer Experience (TCE) program, which is driven by Six Sigma methodologies. As a result, EMC has built Customer Integration Labs in its Global Solutions Centers to reflect real-world deployments in which TCE use cases are developed and executed. These use cases provide EMC with insight into the challenges currently facing its customers.

Use case definition
A use case reflects a defined set of tests that validates the reference architecture for a customer environment. This validated architecture can then be used as a reference point for a Proven Solution.

Contents
This chapter covers the following topics: Audience and purpose; Scope; Protecting a SharePoint farm using RecoverPoint/CE; Objectives and findings; Reference Architecture; Validated environment profile; Hardware and software resources; SharePoint farm content types; Prerequisites and supporting documentation; Terminology.

Audience and purpose

Audience
The intended audience for this Proven Solution Guide is: internal EMC personnel, EMC partners, and customers.

Purpose
Microsoft SharePoint Server is a business-critical platform and, as such, should have the highest levels of availability under all circumstances, including a site disaster. The purpose of this Proven Solution is to demonstrate EMC's replication and automation technology, delivered by the EMC RecoverPoint product suite with RecoverPoint Cluster Enabler (RecoverPoint/CE), providing fully automated disaster recovery for enterprise-class virtualized SharePoint farm environments. Using this technology, existing Microsoft failover clusters can be stretched to provide geographically separated high availability and disaster recovery.

Failover times vary depending on bandwidth, data change rate, and RecoverPoint configuration, but both the failover and failback times demonstrated by this solution were under 10 minutes, significantly less than what can be achieved with native tools. With RecoverPoint's data reduction and compression features, SharePoint environments can be replicated across longer and bandwidth-constrained distances, and customers can expect automated disaster recovery within minutes.

Scope

Scope
This Proven Solution describes a virtualized Microsoft Office SharePoint Server 2007 enterprise farm that is protected by RecoverPoint and can serve a particular user count (based on a common user profile). The environment consists of one geographically dispersed six-node Microsoft Windows Server 2008 R2 Failover Cluster with Hyper-V. The Proven Solution consists of a SharePoint 2007 Publishing portal that is document-centric. User load can simulate these user actions: Browse, Search, and Modify.

Business challenge
Disaster recovery (DR) for federated applications and environments becomes increasingly difficult to achieve with consistency as the applications and environments grow. As a federated application, SharePoint requires that all server roles, configurations, and data are consistent across the farm. The challenges and solutions are listed in Table 1.

Table 1. Business challenges and solutions

Challenge: Adjusting to dynamic workloads as data volume grows and user workloads change.
Solution: Server virtualization allows for simplified configuration and provisioning and rapid modification of the SharePoint farm when and where necessary.

Challenge: Maintaining consistency for SharePoint farms to avoid long re-indexing times and severely degraded search capabilities.
Solution: RecoverPoint/CE's support of full farm failover allows the SharePoint farm to resume with crash consistency and minimizes lengthy re-indexing during failover or failback processes.

Challenge: Performing disaster recovery (DR) for enterprise SharePoint environments, which is both complex and difficult. Enterprise SharePoint environments can stretch to tens of servers with differing roles, such as Index servers, SQL servers, and application servers. Writing DR plans to meet failover and recovery service level agreements (SLAs) becomes an arduous task, and failovers become unreliable or fail. Typically, with legacy solutions, critical working components of a SharePoint farm, such as a valid search content index, are lost or broken on failover.
Solution: The solution enables fully automated failover. Once configured, failover becomes automatic, and planned failback is quick, simple, and causes minimal disruption.

Technology solution
This solution describes a virtualized Microsoft Office SharePoint Server 2007 enterprise farm environment, protected with remote disaster recovery and fully automated failover, enabled by EMC technology. The environment consists of a six-node Hyper-V Windows failover cluster, with three active nodes (production site) and three passive nodes (DR site). The cluster contains the entire host infrastructure required to operate an Office SharePoint Server 2007 farm, for example, domain controllers, application servers, web front ends (WFEs), and SQL servers. SharePoint Server 2007 uses Microsoft SQL Server 2008 as its data store.

Microsoft Windows Server 2008 R2 Enterprise with Hyper-V provides the virtualization platform to maximize hardware utilization and improve SharePoint performance and availability. Hyper-V enables virtual machine high availability through Microsoft Windows Failover Clustering (WFC) and provides both Live Migration and Quick Migration features. In synchronous replication mode, the solution supported Live Migration of virtual machines between sites with minimal disruption to the availability of the virtual machine.

EMC CLARiiON CX4-240 arrays provide consolidated, managed, and highly available storage for both the production and DR sites. In addition, they provide an in-built mechanism (the CLARiiON splitter) for RecoverPoint to provide continuous remote replication (CRR) of production data to the DR site. RecoverPoint CRR can operate in two modes:

Synchronous replication over an FC inter-site link: Remote data is kept in sync with the production site, enabling a zero recovery point objective (RPO) and minute-based recovery time objectives (RTOs). Distances up to 200 km or 4 ms latencies are supported.

Asynchronous replication over an IP WAN link: Remote data is asynchronously sent to the remote site. RecoverPoint's bandwidth reduction feature allows for replication over longer distances, slower WAN links, or both. Metropolitan to intercontinental distances can be achieved.

Fully integrating with Microsoft Windows failover methodologies, RecoverPoint/CE ensures that a complete site failover can happen with minimal downtime and zero user intervention. Planned failover of a virtual machine from site to site is possible from Microsoft Windows Failover Cluster Manager; the operator does not need to be trained in RecoverPoint or CLARiiON technologies to achieve this.

Integrating this solution into an existing CLARiiON CX4-240 SharePoint environment requires minimal downtime (minutes to convert cluster groups) and a minimal footprint per site:
Two server racks for RecoverPoint appliances (RPAs)
Eight 4/8 Gb FC ports
Four 1 GigE network ports
RecoverPoint/CE software installed on all cluster nodes

Protecting a SharePoint farm using RecoverPoint/CE

Protecting a SharePoint farm
When a failure occurs on a SharePoint farm, it is very important to keep certain farm components consistent, especially the SQL databases and the Search function. If the Index server is out of sync with the Search database, the Index server starts a full crawl that can take many days to complete. During the crawl, new documents are not searchable, which may breach your Search SLAs, and farm performance is hindered. Some companies with large data sets, for example 35 TB, quote up to 39 days to re-crawl their entire farm.

RecoverPoint provides crash consistency across the entire SharePoint farm. Therefore, if a failure occurs, the farm restarts on the local or DR side (depending on the failure) without inconsistency issues. EMC RecoverPoint with RecoverPoint/CE automatically provides crash recovery with minimal service disruption and a fully functional SharePoint farm within minutes of a failure.

Objectives and findings

Objectives and findings
The objectives and findings of this Proven Solution are listed in Table 2.

Table 2. List of objectives

Objective: Baseline performance testing
Details: Conduct performance testing to determine the performance baseline for the SharePoint 2007 farm. Measure Microsoft SQL Server load, Microsoft Internet Information Services load, and passed tests/second.
Findings: 248,280 users at 10% concurrency; CPU load highest on the WFEs at close to 100%.

Objective: Baseline RecoverPoint testing
Details: Conduct testing to observe and document the performance impact that RecoverPoint replication has on the SharePoint farm.
Findings: 249,360 users at 10% concurrency; 10% increase in CLARiiON SP utilization.

Objective: Synchronous distance testing
Details: Determine the optimum distances and latencies for RecoverPoint replication over synchronous links. Understand what, if any, overhead or infrastructure impact RecoverPoint replication incurs.
Findings: Maximum distance of 100 km round trip over a 1 Gb/s link.

Objective: Asynchronous distance testing
Details: Determine the optimum distances and latencies for RecoverPoint replication over asynchronous links. Understand what, if any, overhead or infrastructure impact RecoverPoint replication incurs.
Findings: Maximum distance of 1,600 km round trip over a 90 Mb/s link.

Objective: RecoverPoint migration testing
Details: Conduct migration tests to determine the impact of virtual machine migration within the production site (lateral move). Conduct migration tests to determine the impact of virtual machines migrating from production to the DR site using RecoverPoint (peer move). Determine the time taken to migrate various farm virtual machines.
Findings: Lateral moves were successful, with downtime impact only from the primary SQL Server migration. Peer moves to the DR site and back were successful, with downtime impact only from the primary SQL Server migration.

Objective: RecoverPoint failover testing
Details: Determine the failover capabilities of RecoverPoint.
Findings: Unplanned synchronous failover of the entire farm to DR resulted in 7 minutes of downtime, with a fully operational farm failed over automatically.

Findings (continued): Unplanned asynchronous failover of the entire farm to DR resulted in 7 minutes of downtime, with a fully operational farm failed over automatically.

Objective: Query scaling testing
Details: Determine the distances achievable by using fewer Query servers, which reduces bandwidth usage over synchronous and asynchronous links.
Findings: Query scaling in a synchronous environment increased the distance to 300 km round trip with a 1,000 Mb/s link. Query scaling in an asynchronous environment increased the distance to 2,500 km round trip with a 90 Mb/s link.

Reference Architecture

Corresponding Reference Architecture
This Proven Solution Guide has a corresponding Reference Architecture document that is available on Powerlink, EMC.com, and KB.Wiki. Refer to EMC Business Continuity for Microsoft Office SharePoint Server 2007 Enabled by EMC CLARiiON CX4, EMC RecoverPoint/Cluster Enabler, and Microsoft Hyper-V - Reference Architecture for details. If you do not have access to this content, contact your EMC representative.

Reference Architecture diagram
The solution's overall physical architecture is shown in Figure 1.

Figure 1. Reference Architecture diagram

Validated environment profile

Profile characteristics
EMC validated the solution with the environment profile shown in Table 3.

Table 3. Profile characteristics

Profile characteristic | Value
SharePoint farm user data | 1.5 TB
Concurrency | 10%
Site collections | 15
Sites per site collection | 1
SQL Server 2008 virtual machines | 2 instances (active/active)
Hyper-V cluster (physical) | 6 nodes (3 production / 3 DR)
Web front ends (virtual machines) | 6 (also running the Query role)
Excel Services virtual machine (hosting Central Administration) | 1
Index server (virtual machine) | 1
Application server (virtual machine) | 1
SCVMM (physical) | 1

Hardware and software resources

Hardware
The hardware used to validate the solution is listed in Table 4.

Table 4. Hardware

Equipment | Quantity | Configuration
Storage array (production site) | 1 | CLARiiON CX4-240 with 8 FC ports per SP, FLARE 29, 44 x 300 GB 15k FC disks
Storage array (DR site) | 1 | CLARiiON CX4-240 with 8 FC ports per SP, FLARE 29, 52 x 300 GB 15k FC disks
Distance emulation device | 1 | Anue H-series GEM + FC dual-blade Network Emulator, software v3.3.5
Network switch | 4 | 48-port Cisco trunkable network switches (2 production and 2 disaster recovery)

FC switch | 3 | 48-port 4 Gb FC switches (2 production and 1 DR)
Hyper-V server | 6 | 16-core, 48 GB RAM
Infrastructure server | 2 | 8-core, 16 GB RAM
RecoverPoint appliance | 4 | GEN3

Software
The software used to validate the solution is listed in Table 5.

Table 5. Software

Software | Version
Windows Server 2008 R2 Enterprise Edition | RTM
Microsoft Hyper-V | 2008 R2 64-bit
Microsoft SQL Server 2008 | Enterprise Edition SP1
Microsoft Office SharePoint Server 2007 | SP2
Microsoft SCVMM 2008 R2 | RTM
PowerPath (with VE capabilities) | 5.3 SP1
Visual Studio Test Suite 2008 | SP1
KnowledgeLake DocLoader | 1.1
EMC RecoverPoint | 3.2 SP2 Patch 2
EMC RecoverPoint/Cluster Enabler | 4.0.1

SharePoint farm content types

Farm content types
The SharePoint farm content types are detailed in Table 6.

Table 6. Farm content types

Type | Size (KB)
DOC | 251
DOCX | 12
XLSX | 2
MPP | 235
PPTX | 189
JPG | 93
GIF | 75
VSD | 471

The average document size (including document weighting) was 187 KB, which is indicative of real-world average file sizes.

Prerequisites and supporting documentation

Technology
It is assumed that the reader has a general knowledge of:
EMC CLARiiON CX4-240
EMC RecoverPoint
EMC RecoverPoint/CE
Microsoft SharePoint
Microsoft Hyper-V
Microsoft Cluster services

Supporting documents
The following documents, located on Powerlink.com, provide additional, relevant information. Access to these documents is based on your login credentials. If you do not have access to the following content, contact your EMC representative.
EMC RecoverPoint Administration Guide
EMC RecoverPoint/CE Administration Guide
Disaster Recovery for Windows Using EMC RecoverPoint/Cluster Enabler - Applied Technology
Disaster Recovery in a Geographically Dispersed Cross-Site Virtual Environment - Enabled by the EMC CLARiiON CX4 Platform, EMC RecoverPoint, and Microsoft Hyper-V on Windows Server 2008
CLARiiON best practices

Third-party documents
The following documents are available on the Microsoft website:
Hyper-V Deployment and Best Practices (online presentation)
Running SQL Server 2008 in a Hyper-V Environment: Best Practices and Performance Considerations (SQL Server technical article)

Terminology

Introduction
This section defines the terms used in this document.

Synchronous replication: Ensures that data replicated at a secondary site is an identical copy of the primary site, with no data lag between the primary and secondary sites. The secondary site has to acknowledge receipt of each write before the next write can occur.

Asynchronous replication: Ensures that data replicated at a secondary site is an identical copy of the primary site, with some data lag. Writes on the primary site can continue while the acknowledgment from the secondary site is delayed.

Consistency group: A set of mirrors that are managed as a single entity and whose secondary images always remain in a consistent and restartable state with respect to their primary image and each other.

Recovery point objective (RPO): The point in time to which systems and data must be recovered after an outage. This defines the amount of data loss a business can endure.

Recovery time objective (RTO): The period of time within which the systems, applications, or functions must be recovered after an outage. This defines the amount of downtime that a business can endure and survive.

Microsoft Visual Studio Team System (VSTS): An Application Lifecycle Management solution that has four role-specific editions, all based on Microsoft Team Foundation Server (TFS) as the underlying platform. VSTS was used during test validation to generate and emulate user load.

Live migration: A Microsoft Hyper-V feature that moves a running virtual machine from one physical host to another without any disruption of service or perceived downtime.

Quick migration: A Microsoft Hyper-V feature that rapidly moves a running virtual machine from one physical host to another with minimal downtime.

Virtual hard disk (VHD): The common virtualization file format that captures the entire virtual machine operating system and application stack in a single file stored on a file system in the parent partition.

Chapter 2: Storage Design

Overview

Introduction
EMC has a number of documents that identify recommendations and guidelines associated with the operation of:
EMC CLARiiON and Microsoft SQL Server 2005
EMC storage design for Microsoft SharePoint Server 2007
EMC CLARiiON storage systems

For more information, refer to Chapter 1: About this Document > Prerequisites and supporting documentation > Supporting documents. These best practices form the basis of this Proven Solution.

To determine the optimum storage design, follow the guidelines listed below (a sizing sketch follows this list):
Determine the required number of IOPS that the storage system must support, including a factor for utilization growth (and future local/remote replication spindle overhead).
Determine the maximum user count, user profiles, and user concurrency.
Define the common and uncommon operations: Search, Modify, and Browse.
Define customer response time SLAs for common and uncommon operations.
Determine the size of the database and log LUNs.
Determine the size of the Index and Query LUNs.
Determine the required size of the virtual machine OS drives for the various farm roles.
Determine the required size of the RecoverPoint repository and journal LUNs.

Scope
The storage design layout instructions presented in this chapter apply to the specific components used during the development of this solution.

Contents
This chapter covers the following topics: Considerations; Layout; Storage-system spindle design; Space requirements; Cluster details.
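As a rough illustration of the first guideline, the sketch below converts a host IOPS target and read/write mix into a spindle count using standard RAID write penalties. The per-disk IOPS figure, growth factor, and example workload are illustrative assumptions, not values taken from this solution.

```python
import math

# Assumed nominal values for illustration; not measurements from this solution.
RAID_WRITE_PENALTY = {"raid5": 4, "raid10": 2}
IOPS_PER_15K_FC_DISK = 180

def required_spindles(host_iops, read_ratio, raid="raid5", growth_factor=1.2):
    """Estimate the disk count needed to service a host IOPS target.

    Back-end IOPS = reads + writes * RAID write penalty; a growth factor
    is applied before dividing by the assumed per-spindle capability.
    """
    reads = host_iops * read_ratio
    writes = host_iops * (1 - read_ratio)
    backend_iops = reads + writes * RAID_WRITE_PENALTY[raid]
    return math.ceil(backend_iops * growth_factor / IOPS_PER_15K_FC_DISK)

# Hypothetical example: a 2,000-IOPS, 70%-read content database workload.
print(required_spindles(2000, 0.7, "raid5"))    # RAID 5 data LUNs
print(required_spindles(2000, 0.7, "raid10"))   # RAID 1/0 alternative
```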

Considerations

Storage design considerations
The optimal disk layout was determined from preliminary tests in the environment in order to profile the storage IOPS requirements per virtual machine and role. From this, the storage design (RAID group configurations, number of LUNs per RAID group, and so on) could be determined based on tested IOPS calculations.

LUN response times depend on the role of the LUN in the SharePoint environment and so can vary greatly. For example, the SQL Server TempDBs and Search DBs require a quicker response time than the application server LUN. During the design phase, it was ensured that the LUN response times from the CLARiiON were not a limiting factor for SharePoint farm and RecoverPoint performance. For RecoverPoint, the response times of the repository and journal LUNs were very important.

Layout

Introduction
Storage on the CLARiiON CX4-240 was allocated to ensure optimal performance of the SharePoint farm and RecoverPoint components. It was determined during testing, based on IOPS, that the TempDB and Search DBs required RAID 1/0 LUNs. All other SharePoint LUNs were RAID 5. One hot spare was allocated on each CLARiiON DAE bus. The RecoverPoint repository and journal LUNs were all RAID 1/0 to ensure the best performance for RecoverPoint, especially the repository LUN.

Goal
To optimize storage usage and the performance of all components in the solution.

SharePoint and RecoverPoint production storage layout
The layout of the RAID groups for the production site is shown in Figure 2.

Figure 2. SharePoint and RecoverPoint production storage layout

SharePoint and RecoverPoint DR storage layout
The layout of the RAID groups for the DR site is shown in Figure 3. Additional RAID 1/0 groups were used for the larger journal LUNs to accommodate RecoverPoint on the DR site.

Figure 3. SharePoint and RecoverPoint DR storage layout

Production SharePoint farm disk layout
The CLARiiON disk layout legend for Figure 2 is shown in Table 7.

Table 7. CLARiiON disk layout legend for production

DAE | RAID type | Allocation
0_0 | RAID 5, 4+1 | OS LUNs, WFE OS and Query LUNs
1_0 | RAID 5, 4+1; RAID 1/0, 2+2; RAID 1/0, 2+2; HS | ContentDB 1-5 data LUNs; SearchDB data and log LUNs; TempDB data and log LUNs; RecoverPoint journals; hot spare

0_1 | RAID 5, 4+1; RAID 1/0, 2+2; RAID 1/0, 2+2; HS | ContentDB 11-15 data LUNs; ContentDB log LUNs; SQL internal DB LUNs; SearchDB data and log LUNs; TempDB data and log LUNs; RecoverPoint journals; hot spare
1_1 | RAID 5, 4+1 | ContentDB 6-10 data LUNs

DR SharePoint farm disk layout
The CLARiiON disk layout legend for Figure 3 is shown in Table 8.

Table 8. CLARiiON disk layout legend for DR

DAE | RAID type | Allocation
0_0 | RAID 5, 4+1 | OS LUNs, WFE OS and Query LUNs; RecoverPoint repository
1_0 | RAID 5, 4+1; RAID 1/0, 2+2; RAID 1/0, 2+2; HS | ContentDB 1-5 data LUNs; SearchDB data and log LUNs; TempDB data and log LUNs; RecoverPoint journals; RecoverPoint repository; hot spare
0_1 | RAID 5, 4+1; RAID 1/0, 2+2; RAID 1/0, 2+2; HS | ContentDB 11-15 data LUNs; ContentDB log LUNs; SQL internal DB LUNs; SearchDB data and log LUNs; TempDB data and log LUNs; RecoverPoint journals; hot spare
1_1 | RAID 5, 4+1 | ContentDB 6-10 data LUNs
0_2 | RAID 1/0, 2+2 | RecoverPoint journals
1_2 | RAID 1/0, 2+2 | RecoverPoint journals

OS and application LUN usage
The SharePoint farm LUN usage, with respective LUN sizes and total LUN space, is shown in Table 9.

Table 9. SharePoint farm usage - LUN size and space

CX4-240 array LUN usage | LUN sizes | Total LUN space
WFEs | 6 x 40 GB OS LUNs, 6 x 70 GB Query LUNs | 660 GB
Domain controllers | 2 x 25 GB OS LUNs | 50 GB
Index server | 50 GB OS LUN, 200 GB Index LUN | 250 GB

Excel and application servers | 2 x 50 GB OS LUNs | 100 GB
Total | | 1.06 TB

RecoverPoint LUN usage - production array
The RecoverPoint LUN usage, with respective LUN sizes and total LUN space on the production array, is shown in Table 10.

Table 10. RecoverPoint array usage - LUN size and space, production array

CX4-240 array LUN usage | LUN sizes | Total LUN space
RecoverPoint repository | 1 x 5 GB | 5 GB
WFE and application server journals | 10 x 5 GB | 50 GB
Index journal | 1 x 30 GB | 30 GB
SQL journals | 2 x 150 GB | 300 GB
Total | | 385 GB

RecoverPoint LUN usage - DR array
The RecoverPoint LUN usage, with respective LUN sizes and total LUN space on the DR array, is shown in Table 11.

Table 11. RecoverPoint array usage - LUN size and space, DR array

CX4-240 array LUN usage | LUN sizes | Total LUN space
RecoverPoint repository | 1 x 5 GB | 5 GB
WFE and application server journals | 10 x 10 GB | 100 GB
Index journal | 1 x 30 GB | 30 GB
SQL journals | 4 x 150 GB | 600 GB
Total | | 735 GB

Storage design for SQL databases

Introduction
The first step in determining the storage design for a database is to assess the performance requirements of the applications that run on the database and the type of load they place on the database and storage system. The main questions are:
How many IOPS will the SharePoint farm generate on the storage system?
What is the maximum acceptable LUN response time in ms?
In a production environment, the best way to obtain this information is to analyze the current application and database performance.

Options for assessing performance requirements
The optimal disk layout was determined based on preliminary tests in the environment. From this, the disks could be laid out based on real IOPS calculations. LUN response times depend on the role of the LUN in SQL Server and so can vary greatly. During the design phase, it was ensured that the LUN response times from the CLARiiON were not a limiting factor for SharePoint farm performance.

Database storage layout
Based on observed IOPS:
Database data LUNs were evenly spread over 3 x 5-disk RAID 5 groups.
Database log LUNs resided on 1 x 5-disk RAID 5 group.
Search and Temp databases and log files resided on 2 x 4-disk RAID 1/0 groups.

Storage design for RecoverPoint

Introduction
The first step in determining the storage design for RecoverPoint is to assess the performance requirements of the applications and the type of load they place on the database and storage system. The main questions are:
How many write IOPS will the SharePoint farm generate on the storage system?
What is the maximum acceptable LUN response time in ms?
Is this a synchronous or an asynchronous replication environment? How does this affect the IOPS to the RecoverPoint repository LUN and journal LUNs?
In a production environment, the best way to obtain this information is to analyze the current application and database performance.

Options for assessing performance requirements
The optimal disk layout was determined based on preliminary tests in the environment. From this, the disks could be laid out based on real IOPS calculations. LUN response times are very important for RecoverPoint because slow response times can lead to high-load states or application delays. During the design phase, it was ensured that LUN response times from the CLARiiON were not a limiting factor for RecoverPoint performance. The production repository LUN had the highest IOPS of all the RecoverPoint LUNs.

RecoverPoint storage layout
Based on observed IOPS:
Production RecoverPoint repository and journal LUNs were evenly spread over 1 x 4-disk RAID 1/0 group.
DR RecoverPoint journal LUNs were evenly spread over 3 x 4-disk RAID 1/0 groups.
The DR repository LUN resided on a 1 x 5-disk RAID 5 shared group.
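Journal capacity is driven by the write change rate entering each consistency group and by how far back in time you want to be able to roll back. The sketch below captures that relationship; the 20 percent internal reserve is a commonly cited rule of thumb and is an assumption here, not a figure taken from this solution. Size real journals against the RecoverPoint release notes and sizing tools for your environment.

```python
def journal_size_gb(change_rate_mbps, protection_window_hours,
                    journal_reserve=0.20):
    """Estimate RecoverPoint journal capacity for one consistency group.

    change_rate_mbps: average write rate entering the group, in megabits/s (assumed).
    protection_window_hours: how far back in time you want to roll back.
    journal_reserve: fraction assumed reserved for internal use (rule of thumb).
    """
    window_s = protection_window_hours * 3600
    data_gb = change_rate_mbps / 8 / 1024 * window_s   # Mb/s -> GB over the window
    return data_gb / (1 - journal_reserve)

# Hypothetical example: 40 Mb/s of writes with a 24-hour rollback window.
print(round(journal_size_gb(40, 24), 1), "GB")
```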

Storage-system spindle design

Spindle count used
The spindle count used for production is:
30 x 300 GB FC, RAID 5, 15k rpm disks
8 x 300 GB FC, RAID 1/0, 15k rpm disks
4 x 300 GB FC, RAID 1/0, 15k rpm disks (RecoverPoint)
2 x 300 GB FC, 15k rpm disks (hot spares)

The spindle count used for DR is:
30 x 300 GB FC, RAID 5, 15k rpm disks
8 x 300 GB FC, RAID 1/0, 15k rpm disks
12 x 300 GB FC, RAID 1/0, 15k rpm disks (RecoverPoint)
2 x 300 GB FC, 15k rpm disks (hot spares)

Space requirements

Space requirement validation
After the spindle requirements for performance have been calculated, the space requirements need to be validated. A quick cross-check of these spindle counts against the array configurations in Table 4 follows below.

SharePoint space requirements
SharePoint space requirements are shown in Table 12.

Table 12. SharePoint space requirements

Storage | Item | Configuration
Array disk | WFE OS LUNs, 240 GB | RAID 5, 5 x 300 GB FC 15k rpm (shared)
Array disk | WFE Query LUNs, 420 GB | RAID 5, 5 x 300 GB FC 15k rpm (shared)
Array disk | AD and application server OS LUNs, 150 GB | RAID 5, 5 x 300 GB FC 15k rpm (shared)
Array disk | Index server OS and Index LUNs, 250 GB | RAID 5, 5 x 300 GB FC 15k rpm (shared)
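As a sanity check, the spindle counts above can be totalled and compared with the disk counts of the two arrays listed in Table 4 (44 disks at the production site, 52 at the DR site). A minimal sketch:

```python
# Spindle budget per site, taken from the counts listed above.
production = {"raid5": 30, "raid10_sql": 8, "raid10_recoverpoint": 4, "hot_spares": 2}
dr         = {"raid5": 30, "raid10_sql": 8, "raid10_recoverpoint": 12, "hot_spares": 2}

# The totals should match the array configurations in Table 4.
assert sum(production.values()) == 44   # 44 x 300 GB disks in the production CX4-240
assert sum(dr.values()) == 52           # 52 x 300 GB disks in the DR CX4-240
print("spindle budgets match the array configurations")
```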

SQL files
The space requirements for the SQL databases were determined using best practice guidelines in conjunction with calculations specific to the environment, as shown in Table 13.

Table 13. Space requirements for SQL database

Quantity of LUNs | Item | LUN size | Configuration (formatted capacity)
15 | Content DB data | 200 GB | RAID 5, 15 x 300 GB FC 15k rpm (3 x 5-disk groups)
10 | Content DB log | 25 GB | 250 GB, RAID 5, 5 x 300 GB FC 15k rpm (shared)
3 | Search DB data | 100 GB | 300 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)
1 | Search DB log | 100 GB | 100 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)
3 | Temp DB data | 100 GB | 300 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)
1 | Temp DB log | 100 GB | 100 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)

RecoverPoint
The space requirements for the RecoverPoint repository and journal volumes were determined using best practice guidelines in conjunction with calculations specific to the environment. Space requirements for production and DR are shown in Table 14 and Table 15, respectively.

Table 14. RecoverPoint space requirements for production

Item | LUN size | Configuration (formatted capacity)
Repository LUN | 5 GB | 5 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)
Journal LUNs | 380 GB | 380 GB, RAID 1/0, 4 x 300 GB FC 15k rpm (shared)

Table 15. RecoverPoint space requirements for DR

Item | LUN size | Configuration (formatted capacity)
Repository LUN | 5 GB | 5 GB, RAID 5, 5 x 300 GB FC 15k rpm (shared)
Journal LUNs | 730 GB | 730 GB, RAID 1/0, 4 x 300 GB FC 15k rpm

Cluster details

Hyper-V cluster design and configuration
In Windows Server 2008 R2, Hyper-V supports live migration. The cluster consisted of six nodes, three at the production site and three at the DR site. In this way, EMC could test a lateral move as well as a full site failure.

Chapter 3: File System

Introduction
This chapter details the file system layout for SharePoint and RecoverPoint.

Storage enumeration
All Hyper-V virtual machine boot LUNs were configured as virtual hard disks (VHDs). All application LUNs were configured as pass-through disks.

SharePoint
The solution used the Microsoft maximum recommended sizing of 100 GB for each SQL Server SharePoint 2007 content database. A total of 1.5 TB of data was split into 15 x 100 GB content databases, each on a 200 GB LUN (file system); a sketch of this layout calculation follows this chapter. All virtual machines used the CX4-240 for their file systems. The SharePoint file system layout is the same as that listed in the EMC Virtual Architecture for Microsoft SharePoint Server 2007 Enabled by EMC CLARiiON CX3-40, Microsoft Windows 2008, and Hyper-V Integration Guide. This integration guide is available on EMC Powerlink. If you do not have access to the document, contact your EMC representative.

RecoverPoint
Thirteen RecoverPoint journal LUNs were allocated, one for each consistency group (CG). One RecoverPoint repository LUN was allocated as well. The same number of LUNs for RecoverPoint was allocated on the DR array. The LUNs on the DR array were bigger because, in normal operation, with the SharePoint farm running on the production side, more data needs to be stored there. The data stored on a journal LUN consists of periodic bookmarks for a consistency group, which allow you to restore to a specific point in time if necessary. The larger the journal LUN, the more bookmarks can be stored, and therefore the further back in time you can go for recovery.
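A minimal sketch of the content database layout described above: it splits the corpus into databases capped at the 100 GB recommendation and doubles that figure for the LUN size. The 2x LUN headroom factor is inferred from the 100 GB database / 200 GB LUN pairing above and is otherwise an assumption.

```python
import math

def content_db_layout(corpus_gb, max_db_gb=100, lun_headroom=2.0):
    """Return (number of content databases, LUN size in GB) for a corpus.

    max_db_gb reflects the Microsoft-recommended 100 GB cap used in this
    solution; lun_headroom mirrors the 200 GB LUN per 100 GB database above.
    """
    db_count = math.ceil(corpus_gb / max_db_gb)
    lun_gb = max_db_gb * lun_headroom
    return db_count, lun_gb

# 1.5 TB of user data, as validated in this solution.
print(content_db_layout(1500))   # -> (15, 200.0)
```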

Chapter 4: Application Design

Overview

Introduction
This chapter details the application design for the Proven Solution. SharePoint was configured to ensure continued stability and performance while RecoverPoint replication occurred. RecoverPoint was configured to optimize replication performance rates and meet the objectives of the Proven Solution. This chapter deals with three main elements:
SharePoint farm design
RecoverPoint design
RecoverPoint/CE design

Contents
This chapter covers the following topics: Considerations; Application design layout; RecoverPoint policy settings; Best practices and recommendations.

Considerations

Design considerations
The RecoverPoint application planning took into account the impact of RecoverPoint on the storage, the SharePoint farm, SQL Server, and the network infrastructure.

Application design layout

SharePoint configuration
The SharePoint farm was designed as a publishing portal. Fifteen SharePoint sites (document centers) were populated with random user data and configured as one site collection. As the WFEs are CPU-intensive, each WFE virtual machine was allocated four virtual CPUs. The Excel calculation server was configured as the Central Administration server. As crawling is a CPU-intensive activity, the Index server was configured with four CPUs. To improve the crawl speed, the Index server also ran a dedicated WFE role, so all index requests were serviced within one machine. The incremental crawl schedule was set to 15 minutes. In addition, every WFE server was a Query server for search purposes.

RecoverPoint/CE configuration
Each virtual machine was configured as a Microsoft cluster group and a RecoverPoint consistency group. Each cluster group, and therefore each consistency group, consisted of the virtual machine and all associated disks. When a Microsoft cluster group is migrated to a DR site node, RecoverPoint/CE notifies RecoverPoint to migrate the consistency group to the DR array so that the DR node has access to all of the cluster group's disks. RecoverPoint then reverses the direction of replication so that the production array holds a copy of the consistency group data now active on the DR array. To ensure transportability between sites at the individual virtual machine level, it is important to understand that each virtual machine requires dedicated LUNs, which cannot be shared.

RecoverPoint policy settings

The policy settings for each consistency group during synchronous replication were:
Best compression
RPO set to system-optimized lag

The policy settings for each consistency group during asynchronous replication were:
Best compression
RPO set to a maximum lag of 30 seconds

Best practices and recommendations

Introduction
When designing a SharePoint farm for use with RecoverPoint, it is important to find a balance between the performance of the farm, the storage performance, and the impact of RecoverPoint replication on the farm.

Best practices

Failover Clustering and RecoverPoint: If the cluster service is restarted on a cluster node, or a cluster node is rebooted, image access in RecoverPoint must first be enabled for the node to access the LUNs. Once the node can access the LUNs, image access must be disabled for that consistency group, or failover to that cluster node will fail. This is a cluster service limitation.

Consistency groups: Each consistency group in RecoverPoint should correspond to a cluster group, which in turn should correspond to a virtual machine. For this Proven Solution, EMC used the convention that the cluster group names were also used as the names of the RecoverPoint consistency groups.

Sufficient bandwidth: Sufficient bandwidth should be allocated to ensure that no consistency group goes into a high-load state, because this causes an initialization, which is a full sync of that consistency group. If this occurs, the consistency group is out of sync and protection is compromised. Initializations also cause more traffic to be sent across the WAN link. (A rough bandwidth sanity check is sketched below.)

Networking: The public and private networks should be bridged between the production site and the DR site so that a failover does not affect connectivity for the users accessing the SharePoint farm.

Query server scaling: If bandwidth is very limited, consider reducing the number of Query servers to reduce network traffic between the sites. Most of the network traffic in the SharePoint 2007 farm is due to index propagations.

Preferred owners: Set the Preferred Owners order in Cluster Management to ensure balanced failover of all the cluster groups to the correct DR servers.
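As a rough sanity check on the "sufficient bandwidth" recommendation, the sketch below compares a farm's sustained write rate, after RecoverPoint compression, with the capacity of the replication link. The 3.42:1 ratio is the average observed in this solution's baseline test; the example write rate and the 80 percent utilization cap are illustrative assumptions.

```python
def link_is_sufficient(write_rate_mbps, link_mbps,
                       compression_ratio=3.42, utilization_cap=0.8):
    """Return (fits, wire_rate_mbps) for an asynchronous replication link.

    write_rate_mbps: sustained write change rate entering RecoverPoint (assumed).
    compression_ratio: 3.42:1 average measured in this solution's baseline test.
    utilization_cap: assumed headroom so bursts do not push groups into high load.
    """
    wire_rate = write_rate_mbps / compression_ratio
    return wire_rate <= link_mbps * utilization_cap, round(wire_rate, 1)

# Hypothetical example: 240 Mb/s of farm writes over the 90 Mb/s WAN scenario.
print(link_is_sufficient(240, 90))   # -> (True, 70.2)
```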

Chapter 5: Network Design

Overview

Introduction
This chapter details the RecoverPoint network and Fibre Channel connectivity.

Contents
This chapter covers the following topics: Considerations; RecoverPoint network and Fibre Channel configuration.

Considerations

Considerations
To ensure the performance of the RecoverPoint components, 1 Gb Ethernet connections were used for all network connections. The servers were configured with multiple networks, comprising a cluster heartbeat network, public and private networks, iSCSI 1 and 2 for production, iSCSI 1 and 2 for DR, and a Live Migration network, as listed in Table 16.

Table 16. Networks

Network | Description
Cluster heartbeat | Dedicated cluster heartbeat network
Public | External network for client access, for example, WFEs
Private | Internal network for SharePoint, for example, SQL servers
Live Migration | Dedicated network for Microsoft Live Migration

RecoverPoint network and Fibre Channel configuration

The RecoverPoint appliances (RPAs) at the production and DR sites were connected by a trunked network between two switches so that they could communicate with each other, as shown in Figure 4. Each RPA was connected to the network switches by 2 x 1 Gb connections. The network switches were connected to the distance emulator using the WAN VLAN. The other networks, for example the public network, were trunked between the switches for IP connectivity between the sites.

In synchronous replication mode, the RPAs were connected by Fibre Channel to the distance emulator, and the distance emulator blades were connected to each other by a Fibre Channel connection. The distance emulator could then be set to emulate distance and latency for the Fibre Channel connection. In asynchronous replication mode, the distance emulator blades were connected by a 1 Gb Ethernet connection, and the distance emulator could then be set to emulate distance and latency for the Ethernet connection.

There were 4 x 4 Gb Fibre Channel connections from the RPAs to the CLARiiON CX4-240, with two to each CX4-240 storage processor (SP) at each site.

Figure 4. RecoverPoint network configuration

Chapter 6: Application Configuration

Overview

Introduction
This chapter shows examples of the configuration that EMC used in this Proven Solution.

Application configuration

Hyper-V virtual machine settings
The Hyper-V virtual machines had their boot drives configured as VHDs and all secondary drives as pass-through disks. The drive configuration, as well as other basic settings for one of the SQL Server virtual machines, is shown in Figure 5. In Hyper-V, the boot drive must be configured as an IDE device. All other pass-through disks were configured as SCSI disks. Each SCSI controller could control a maximum of 15 disks. The maximum number of CPUs per virtual machine in Hyper-V is four.

Figure 5. Hyper-V virtual machine configuration

Cluster resource configuration
The disks for each virtual machine are added as cluster resources in the virtual machine groups once the virtual machine is configured for high availability in Failover Cluster Manager, as shown in Figure 6. There are three production nodes (tce-sp-r9x) and three DR nodes (tce-spdr9x) in the cluster.

Figure 6. Cluster resource configuration

Consistency group disk resources
In RecoverPoint, you can see the disk resources for the virtual machine in a consistency group, as shown in Figure 7. Each replication set consists of the production LUN and the DR replica LUN; the LUNs have to be the same size. Each consistency group has a journal LUN on both the production and DR arrays. The journal LUNs on the DR array are bigger because, in normal operation, the farm is active on the production array. The bigger the journal LUN, the more bookmarks can be kept by RecoverPoint, so that the consistency group can be rolled back further if required.

Figure 7. Consistency group disks

Setting the preferred owners for server failover
In Cluster Manager, RecoverPoint/CE automatically sets the preferred owners in a specific order so that, during a server failure event, the virtual machines first fail over to other servers in the same production site. This is called a lateral move. If all the cluster nodes in the production site fail, the virtual machines fail over to specific cluster nodes at the DR site. This is called a peer move. The cluster group property Auto Start must be enabled for a virtual machine to resume after a failover, as shown in Figure 8.

Figure 8. Preferred Owners settings

Consistency group traffic flow
The normal flow of traffic for one of the consistency groups is shown in Figure 9. The green lines show that data is flowing from the production host to the local storage. At the same time, it is being replicated to the remote journal volume and then to the remote storage. Because the remote RPA sends an acknowledgment as soon as the data is received at the remote site, there is very little delay compared to waiting for the data to be committed to the remote storage before acknowledgment. The system traffic window shows how much data is being committed locally as well as how much data is being sent over the remote link.

Figure 9. Traffic flow for a consistency group

Chapter 7: Testing and Validation

Overview

Introduction
This chapter details the performance testing under user load using Microsoft Visual Studio Team System (Microsoft VSTS).

Contents
This chapter covers the following topics: Test methodology; SharePoint user profiles; Tested components; Observations.

Test methodology

Data population
The data population tool used a set of sample documents. Altering the document title and document metadata before insertion made each document unique to SharePoint. There was one load agent host for each WFE, allowing data to be loaded in parallel until the 1.5 TB (approximately 6.8 million documents) data set goal was reached. The data was spread evenly across the 15 site collections (each a unique content database).

Testing tools
Microsoft VSTS 2008 was used to simulate load on the SharePoint farm using custom test code from an independent third party, KnowledgeLake, a Microsoft Gold Partner.

Load generation
To generate and emulate client load, Microsoft VSTS was used in conjunction with KnowledgeLake code to simulate real-world SharePoint user activity. The VSTS team test rig in this Proven Solution consisted of seven virtual machines: one controller and six VSTS team agent hosts. The controller evenly distributed the client load across the agent hosts.

Passed tests per second
Passed tests per second is the number of user actions (Browse, Search, or Modify) per second that the SharePoint farm can service.
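For planning a load run like this, the document count needed to hit a corpus target follows directly from the average document size. A minimal sketch, assuming the 187 KB average from Table 6; note that the roughly 6.8 million documents reported above imply an effective per-document footprint closer to 220 KB once metadata and database overhead are included.

```python
def documents_for_corpus(target_gb, avg_doc_kb=187):
    """Approximate number of documents needed to reach a corpus size target."""
    return int(target_gb * 1e6 / avg_doc_kb)

print(documents_for_corpus(1500))   # ~8.0 million at the raw 187 KB average
print(int(1500 * 1e6 / 220))        # ~6.8 million at an effective ~220 KB per document
```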

SharePoint user profiles

Profile mix
A common mix of user profiles was used to emulate different types of business organizations. For example, some organizations are browse-intensive, while others are search-intensive, modify-intensive, or both. The common user profile consisted of a mix of Browse, Search, and Modify operations in the following ratio:
80% Browse
10% Search
10% Modify

All tests were run from a load controller host that spread the load evenly across each of the six load agent hosts. The load controller host also collected performance metrics from all the load agents and hosts in the farm for analysis.

Browse
In a Browse test, the code simulates a user browsing a site until the user reaches an end document listing that contains no sub-pages.

Search
In a Search test, the code simulates a user running a stored procedure in the SQL database to find a unique number, in this case a Social Security Number (SSN). The code then performs a web request to search for that unique number.

Modify
In a Modify test, the code simulates a user retrieving a document. The document name is extracted from the database prior to each test run. The code then modifies the metadata for that document before saving it back to the farm in its modified form.

User capacity calculation
All users were run against a Microsoft heavy user profile, that is, 60 requests per hour. Zero percent think time was applied to all tests. 0% think time is the elimination of typical user decision time when browsing, searching, or modifying in Microsoft Office SharePoint Server; a single complete user request runs from start to finish without user pause, therefore creating a continuous workload on the system.

The maximum user capacity is derived from the following formula (a code version of this calculation follows below):

Maximum users = seconds per hour / requests per hour / concurrency % x passed tests per second (RPS)

Example: 3600 / 60 / 10% x 540 = 324,000
Example: 3600 / 60 / 100% x 540 = 32,400
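A minimal code rendering of the capacity formula above; the 540 passed tests/second used in the worked examples is the document's illustrative figure, not a measured result from this farm.

```python
def max_users(passed_tests_per_sec, requests_per_hour=60, concurrency=0.10):
    """Maximum supported user population for a given sustained test rate.

    requests_per_hour=60 is the Microsoft "heavy" user profile used here;
    concurrency is the fraction of the user population active at any time.
    """
    return 3600 / requests_per_hour / concurrency * passed_tests_per_sec

print(max_users(540))                    # 324,000 users at 10% concurrency
print(max_users(540, concurrency=1.0))   # 32,400 users at 100% concurrency
```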

Tested components

Tested components
The testing for this Proven Solution relates to the RecoverPoint product protecting data on a SharePoint farm. The tested components were:
RecoverPoint functionality and performance
Impact of RecoverPoint replication on the SharePoint farm
RecoverPoint/CE integration with Microsoft Failover Clustering
Distance testing between the production and DR sites
Query scaling
Live and quick migrations
Full farm unplanned failover
Full farm planned failback

Observations

Index server propagation spikes
The Index server consistency group statistics are shown in Figure 10. The spikes in the charts represent propagations and updates of data being written to the Index server. These large writes cause time lags, as the data has to be replicated to the remote site. In this Proven Solution, the Index server performed incremental crawls every 15 minutes, and each incremental crawl then started a propagation to the Query servers.

Figure 10. Index server consistency group

Section A: Test results summary

Overview

Introduction
This section provides a short summary of all the Proven Solution test results. Details of the testing results are provided in the subsequent sections.

Graphical content
Charts and images shown in the subsequent sections are from tests that were run on the SharePoint farm with RecoverPoint/CE.

Testing objectives and results

Introduction
RecoverPoint reliably and consistently replicated data in the SharePoint farm. A number of tests were performed, as listed below.

Objective 1: Baseline performance testing
Baseline performance testing consisted of maximum user load on the production SharePoint farm.

Results
The results of the baseline performance testing show:
248,280 users

Objective 2: Baseline RecoverPoint testing
Baseline RecoverPoint testing consisted of maximum user load on the production SharePoint farm with the RecoverPoint splitter installed on the CLARiiON array and replicating data to the DR site.

Results
The results of the baseline RecoverPoint testing show:
249,360 users
10% more CLARiiON SP utilization
RecoverPoint achieved an average compression ratio of 3.42:1

Objective 3: Synchronous distance testing
Synchronous distance testing consisted of maximum user load testing on the farm with synchronous replication to the DR site over various distances.

Results
The results of synchronous distance testing are shown in Table 17 and indicate that it is possible to support a bandwidth of 1 Gb/s within a 0 to 100 km round-trip distance.

Table 17. Synchronous distance testing results
Distance scenario: Synchronous replication (FC) | Round-trip distance (km) | Latency (ms) | Bandwidth (Mb/s)
Baseline (same site) | 0 | 0 | 1,000
Metro CWDM/DWDM | 100 | 1 | 1,000

Objective 4: Asynchronous distance testing
Asynchronous distance testing consisted of maximum user load testing on the farm with asynchronous replication to the DR site over various distances.

Results
The results of asynchronous distance testing are shown in Table 18. The RPO was a maximum of 30 seconds between the production and DR sites.

Table 18. Asynchronous distance testing results
Distance scenario: Asynchronous replication (IP) | Round-trip distance (km) | Latency (ms) | Bandwidth (Mb/s)
Baseline (same site) | 0 | 0 | 300
City-to-city, shorter distance | 400 | 4 | 500
State-to-state/inter-country (Europe) | 1,600 | 16 | 900

Objective 5: RecoverPoint migration testing
RecoverPoint migration testing consisted of using Microsoft live migration and quick migration to migrate various virtual machines to the local site and the DR site. The synchronous testing was done over a 1 Gb/s link with 1 ms latency, which equates to a 100 km round trip. The asynchronous testing was done over a 300 Mb/s link with no latency.
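The latency and distance columns above follow a simple planning ratio of roughly 1 ms of latency for every 100 km of round-trip distance. The helper below encodes that ratio as it is used in this document's tables; it is a rule of thumb for WAN planning, not a precise physical constant.

    KM_PER_MS_ROUND_TRIP = 100  # ratio used throughout the distance tables in this document

    def latency_ms_for_round_trip(distance_km):
        """Latency implied by a round-trip distance, using the 100 km/ms planning ratio."""
        return distance_km / KM_PER_MS_ROUND_TRIP

    def round_trip_km_for_latency(latency_ms):
        """Round-trip distance implied by a latency figure."""
        return latency_ms * KM_PER_MS_ROUND_TRIP

    print(latency_ms_for_round_trip(1600))  # 16.0 ms, the state-to-state scenario
    print(round_trip_km_for_latency(1))     # 100 km, the metro CWDM/DWDM scenario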

Results
The results of RecoverPoint migration testing are shown in Table 19.

Table 19. Migration testing results
Virtual machine role | Failover type | Synchronous: virtual machine downtime (seconds)* | Synchronous: cluster migration time (seconds)** | Asynchronous: virtual machine downtime (seconds)* | Asynchronous: cluster migration time (seconds)**

Live migration
Domain controller | Local to remote | 45 | 35 | 15 | 63
Web front end | Local to remote | 21 | 38 | 21 | 6
SQL Database | Local to remote | 15 | 27 | 21 | 52
Web front end | Local to local | 3 | 1 | 3 | 1

Quick migration
Domain controller | Remote to local | 15 | 26 | 75 | 26
Web front end | Remote to local | 21 | 26 | 15 | 26
SQL Database | Remote to local | 75 | 27 | 57 | 55
Web front end | Local to local | 135 | 1 | 9 | 1

* Virtual machine downtime: The amount of time that the virtual machine is unresponsive to application requests.
** Cluster migration time: The amount of time required by the cluster to fail over the cluster group (a subset of the overall virtual machine downtime). This is the time from an offline request to a virtual machine cluster group on the source node to the time that the virtual machine online action completes on the target node.

Note:
Live migration - Virtual machines with large memory configurations take longer to migrate than virtual machines with smaller memory configurations, because active memory is copied over the network to the receiving cluster node prior to migration. Live migration requires high-bandwidth, low-latency networks in order to function.
Quick migration - This migration type is essentially a suspend-to-disk and then resume operation. Virtual machine memory must be committed to disk before migration. This causes a significant number of burst writes that need to be replicated quickly, and therefore requires high-bandwidth WAN links between sites.
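As a rough illustration of why quick migration depends on high-bandwidth WAN links, the sketch below estimates how long it takes to replicate a suspended virtual machine's memory image across the WAN. The memory size and link speed are illustrative assumptions rather than measured values from this solution; only the 3.42:1 compression ratio is taken from the baseline results.

    def quick_migration_replication_minutes(vm_memory_gb, wan_mbps, compression_ratio=3.42):
        """Estimate the time to replicate a suspended VM's committed memory image.

        Assumes the whole memory image is written to disk and must cross the WAN,
        reduced by the average RecoverPoint compression ratio."""
        data_megabits = vm_memory_gb * 1024 * 8            # GB -> megabits
        wire_megabits = data_megabits / compression_ratio  # after compression
        seconds = wire_megabits / wan_mbps
        return seconds / 60

    # Illustrative only: a hypothetical 16 GB SQL Server virtual machine over a 300 Mb/s link
    print(round(quick_migration_replication_minutes(16, 300), 1))  # about 2.1 minutes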

Objective 6: RecoverPoint failover testing
RecoverPoint failover testing consisted of the unplanned failover of the entire farm from the production site to the DR site during synchronous and asynchronous replication.

Results
The results of RecoverPoint failover testing show:
- Synchronous failover in 7 minutes; total downtime: 7 minutes
- Asynchronous failover in 7 minutes; total downtime: 7 minutes

Objective 7: RecoverPoint planned failback testing
RecoverPoint planned failback testing consisted of planned failback of the entire farm from the DR site to the production site during asynchronous replication.

Results
The results of RecoverPoint planned failback testing show:
- Planned failback with 2 minutes downtime

Objective 8: Query scaling testing
Query scaling testing consisted of reducing the number of Query servers, which reduced SharePoint propagation traffic. With less propagation traffic, significantly less data needed to be replicated across the WAN, so longer distances between sites could be achieved.

Results
The results of Query scaling testing are shown in Table 20.

Table 20. Query scaling testing results
Number of Query servers | Round-trip distance (km) | Latency (ms) | Bandwidth (Mb/s)

Asynchronous Query server scaling
6 | 1,600 | 16 | 900
4 | 1,800 | 18 | 900
2 | 2,500 | 25 | 900

Synchronous Query server scaling
6 | 100 | 1 | 1,000
4 | 200 | 2 | 1,000
2 | 300 | 3 | 1,000

Section B: CLARiiON RecoverPoint splitter testing

SharePoint farm with and without RecoverPoint splitter
The passed tests per second for the SharePoint farm with the RecoverPoint splitter enabled versus not enabled are shown in Figure 11.
- Passed tests/sec baseline with no splitter: 41.38, which equates to 248,280 users at 1% concurrency
- Passed tests/sec baseline with splitter enabled: 41.56, which equates to 249,360 users at 1% concurrency
These results show that the RecoverPoint splitter has minimal impact on farm performance.

Figure 11. Passed tests/sec results, with and without the RecoverPoint splitter

CLARiiON SP utilization, without RecoverPoint splitter
The CLARiiON SP utilization on the SharePoint farm without the RecoverPoint splitter is shown in Figure 12. The average SP CPU utilization was 12.85%.

Figure 12. CLARiiON SP utilization, without the RecoverPoint splitter

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 13. Browse averaged 1.23 seconds, Search averaged 1.1 seconds, and Modify averaged 1.15 seconds.

Figure 13. Average test time results for Browse, Search, and Modify

Section C: Baseline RecoverPoint testing

RecoverPoint replication enabled
The passed tests per second with synchronous RecoverPoint replication enabled, over 1 Gb/s bandwidth with no latency, are shown in Figure 14. Passed tests/sec with the splitter enabled averaged 41.56, which equates to 249,360 users at 1% concurrency.

Figure 14. Passed tests/sec results, RecoverPoint replication enabled

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 15. Browse averaged 1.22 seconds, Search averaged 1.15 seconds, and Modify averaged 1.16 seconds.

Figure 15. Average test time results for Browse, Search, and Modify

CLARiiON SP utilization, RecoverPoint splitter enabled
The CLARiiON SP utilization on the SharePoint farm with the RecoverPoint splitter enabled is shown in Figure 16. The average of 13.81% is about 1% higher than without the RecoverPoint splitter, which is still very low SP CPU utilization overall.

Figure 16. CLARiiON SP utilization, RecoverPoint splitter enabled

RecoverPoint Repository disk utilization
The disk utilization for the RecoverPoint Repository LUN is shown in Figure 17. Disk utilization averaged 37.79%, with some peaks close to 100%. The Repository LUN is very busy, with each disk servicing an average of 136 IOPS.

Figure 17. RecoverPoint Repository disk utilization results

Lag in time between production and DR sites
The maximum lag in time between the production and DR consistency groups was 2.76 seconds, while the average was 0.86 seconds, as shown in Figure 18. This lag is the time difference between the production site and the DR site, that is, the amount of recent data that would not yet be at the DR site if the production site went down and the DR site had to recover.

Figure 18. Results for lag in time between replicas
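A simple way to reason about this lag: when incoming writes briefly exceed what the WAN can carry, a backlog accumulates, and expressing that backlog in seconds of production writes approximates the lag in time between replicas. The sketch below models that relationship; the write rate, WAN rate, and burst duration are purely illustrative, not measured values from this solution.

    def rpo_lag_seconds(write_mbps, wan_mbps, burst_seconds, compression_ratio=3.42):
        """Rough estimate of the replication lag built up during a write burst.

        While compressed writes arrive faster than the WAN can carry them, a backlog
        accumulates; that backlog, expressed in seconds of production writes, gives an
        approximate lag in time between the production and DR replicas."""
        compressed_write = write_mbps / compression_ratio               # Mb/s on the wire
        backlog = max(compressed_write - wan_mbps, 0) * burst_seconds   # megabits queued
        return backlog / compressed_write if compressed_write else 0.0

    # Illustrative only: a 60-second burst of 400 Mb/s of raw writes replicated over
    # a hypothetical 90 Mb/s of available WAN throughput
    print(round(rpo_lag_seconds(400, 90, 60), 1))  # roughly 14 seconds of lag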

Transfer rate
Figure 19 shows the transfer rate, that is, the average amount of data per consistency group that needs to be replicated from the production site to the DR site. The average transfer rate was 1.99 MB/s, and the total average amount of data being transferred was 51.74 MB/s.

Figure 19. Transfer rate results

WAN throughput
The average WAN throughput of 90.59 Mb/s, as shown in Figure 20, shows how much data has to cross the link from the production site to the DR site. This equates to 11.32 MB/s on the wire, compared with the 51.74 MB/s of data being written on the production farm, which illustrates the advantage of RecoverPoint data compression.

Figure 20. RPA and site WAN throughput results
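The comparison above is simple unit arithmetic; a minimal check using the averages reported in this section:

    MEGABITS_PER_MEGABYTE = 8

    wan_mbps = 90.59         # average WAN throughput in megabits per second
    farm_write_MBps = 51.74  # average data written across all consistency groups, in MB/s

    wan_MBps = wan_mbps / MEGABITS_PER_MEGABYTE
    print(round(wan_MBps, 2))                    # ~11.32 MB/s actually sent over the WAN
    print(round(farm_write_MBps / wan_MBps, 2))  # ~4.57x effective reduction on the wire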

Compression ratio
The average compression ratio of 3.42:1, with peaks of over 8:1, as shown in Figure 21, shows that the RecoverPoint appliances achieve reasonable compression of the data before sending it over the WAN. This reduces the bandwidth required and reduces the lag in time between replicas.

Figure 21. Compression ratio results

RPA CPU utilization due to compression
The average RPA CPU utilization due to compression was 17.26%, as shown in Figure 22. There were some peaks over 50%, but all were within acceptable CPU usage parameters.

Figure 22. RPA CPU utilization due to compression results

Section D: Replication testing

Synchronous test results

Synchronous replication testing
The passed tests per second with RecoverPoint synchronous replication enabled, over 1 Gb/s bandwidth with 1 ms latency, are shown in Figure 23. The passed tests/sec averaged 39.73, which equates to 238,380 users at 1% concurrency. Adding 1 ms of latency, which equates to a 100 km round trip, caused a lower user count as well as a few delays, which can be seen as the downward spikes in the chart.

Figure 23. Passed tests/sec results for synchronous replication
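The drop in throughput follows from how synchronous replication works: every write must be acknowledged by the DR site before it completes, so each write carries at least one WAN round trip on top of the local service time. A minimal sketch of that relationship; the local write time used here is illustrative, not a measurement from this solution.

    def sync_write_latency_ms(local_write_ms, rtt_ms):
        """Approximate write response time under synchronous replication: the local
        write time plus at least one WAN round trip for the remote acknowledgement."""
        return local_write_ms + rtt_ms

    # Illustrative only: a hypothetical 5 ms local write over the 1 ms (about 100 km) link
    print(sync_write_latency_ms(5, 1))  # 6 ms per write instead of 5 ms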

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 24. The average Search time has a few peaks that occur just after incremental crawls; incremental crawls cause a lot of write activity on the production farm, which then saturates the WAN link for a short period. All test times were higher because of the increased latency. The average Browse time was 1.24 seconds, the average Search time was 1.24 seconds, and the average Modify time was 1.48 seconds.

Figure 24. Average test time results for Browse, Search, and Modify

DR Journal LUNs disk utilization
The disk utilization of the disaster recovery Journal LUNs is shown in Figure 25. The disk utilization averaged 5.5%, and the total average disk IOPS for the Journal LUNs was 135.73.

Figure 25. DR Journal LUNs disk utilization results

Asynchronous test results

Asynchronous replication testing (900 Mb/s)
Figure 26 shows the passed tests per second with RecoverPoint replication enabled over 900 Mb/s bandwidth with 16 ms latency. Because RecoverPoint was set up as asynchronous, it did not have to wait for an acknowledgement from the DR RPAs to complete a transaction. Passed tests/sec averaged 41.24, which equates to 247,440 users at 1% concurrency. Sixteen ms of latency equates to a 1,600 km round-trip distance.

Figure 26. Passed tests/sec results (900 Mb/s)

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 27. The average Browse time was 1.22 seconds, the average Search time was 1.18 seconds, and the average Modify time was 1.23 seconds.

Figure 27. Average test time results for Browse, Search, and Modify

Asynchronous replication testing (500 Mb/s)
The passed tests per second with RecoverPoint replication enabled over 500 Mb/s bandwidth with 4 ms latency are shown in Figure 28. Four ms of latency is equivalent to a 400 km round-trip distance. Passed tests per second of 41.2 equates to 247,200 users at 1% concurrency. The reason so much bandwidth is needed is that each 15-minute incremental crawl triggers index propagation and master merges on the Query servers, which causes write bursts to the Query LUNs that then need to be replicated.

Figure 28. Passed tests/sec results (500 Mb/s)

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 29. The average Browse time was 1.22 seconds, the average Search time was 1.1 seconds, and the average Modify time was 1.15 seconds.

Figure 29. Average test time results for Browse, Search, and Modify

Asynchronous replication testing (300 Mb/s)
The passed tests per second with RecoverPoint replication enabled over 300 Mb/s bandwidth with no latency are shown in Figure 30. Passed tests per second of 40.95 equates to 245,700 users at 1% concurrency.

Figure 30. Passed tests/sec results (300 Mb/s)

Browse, Search, and Modify test times
The test times for Browse, Search, and Modify are shown in Figure 31. The average Browse time was 1.23 seconds, the average Search time was 1.14 seconds, and the average Modify time was 1.25 seconds.

Figure 31. Average test time results for Browse, Search, and Modify

Section E: Virtual machine migration testing

Synchronous testing

Migration testing (1 Gb/s)
The passed tests per second with RecoverPoint replication enabled, over 1 Gb/s bandwidth with 1 ms latency, are shown in Figure 32. These tests were run at a slightly lower user load of 16,62 users at 1% concurrency to simulate normal SharePoint farm activity rather than an extremely busy farm. The large reductions in passed tests per second visible in the chart occurred during the migrations of the SQL server; during this time, the farm was inaccessible to users. The first downward spike is a live migration; the interruption occurs because the SQL server memory has to be copied across to the DR site before the server can be migrated. The quick migration back to the production site has a longer downtime because all of the SQL server data has to be committed to disk and replicated before the server can be migrated.

Figure 32. Passed tests/sec results (1 Gb/s)

Browse, Search, and Modify test times
Figure 33 shows how the test times were affected during the failover. Browse peaked at just under 35 seconds.

Figure 33. Average test time results for Browse, Search, and Modify

Migration of a Domain Controller
The live migration of one of the two Domain Controllers to the DR site, followed by a quick migration back to the production site, is shown in Figure 34. The live migration took 1 minute and 3 seconds to complete, while the quick migration took 2 minutes. Since there were two Domain Controllers, there was very little impact on the performance of the farm during these migrations.

Figure 34. Percentage processor time during migrations of a Domain Controller

Failover of a Domain Controller
Figure 35 shows the traffic as DC1 is failed over from the production site to the DR site. The migration is initiated in Failover Cluster Manager. The Total Traffic chart shows that local consistency group traffic is being replaced by site-to-site traffic as the virtual machine is migrated. The spikes in System Traffic, Application Traffic, and Incoming Writes are a result of the migration of the virtual machine. This view is available in the RecoverPoint Management Application.

Figure 35. Traffic during failover of a Domain Controller

Migration of the SQL server
The live migration of the SQL server from the production site to the DR site, and the quick migration from the DR site back to the production site, are shown in Figure 36. The live migration took 2 minutes, while the quick migration took 12 minutes. During these migrations, downtime occurred on the farm because the SQL server was unreachable.

Figure 36. Percentage processor time during migrations of a SQL server

Migration of a WFE
The live migration and quick migration of one of the WFEs from the production site to the DR site and back are shown in Figure 37. Since there were six WFEs on the farm, the impact of one WFE being migrated was small. The live migration took 4 minutes, while the quick migration took 3 minutes and 3 seconds.

Figure 37. Percentage processor time during migrations of a WFE

Migration of a WFE
You can view the migration of WFE4 in the RecoverPoint Management Application as well as in Failover Cluster Manager, as shown in Figure 38.

Figure 38. Migration of a WFE