
EMC Business Continuity for Microsoft SQL Server 2008
Enabled by EMC Symmetrix V-Max with SRDF/CE, EMC Replication Manager, and Enterprise Flash Drives

Proven Solution Guide

Copyright 2010 EMC Corporation. All rights reserved. Published January, 2010

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

Benchmark results are highly dependent upon workload, specific application requirements, and system design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, this workload should not be used as a substitute for a specific customer application benchmark when critical capacity planning and/or product evaluation decisions are contemplated.

All performance data contained in this report was obtained in a rigorously controlled environment. Results obtained in other operating environments may vary significantly. EMC Corporation does not warrant or represent that a user can or will achieve similar performance expressed in transactions per minute. No warranty of system performance or price/performance is expressed or implied in this document.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Part number: H6574

Table of Contents

Chapter 1: About this Document
    Overview
    Audience and purpose
    Scope
    Business challenge
    Technology solution
    Objectives
    Reference architecture
    Validated environment profile
    Hardware and software resources
    Prerequisites and supporting documentation
    Terminology

Chapter 2: SQL Server 2008 on the Symmetrix V-Max Design Overview
    Overview
    Server architecture
    Key elements of the storage design layout
    Best practices for storage design
    Storage design layout

Chapter 3: Disaster Recovery Design
    Overview
    Deploying Windows 2008 failover clustering and SRDF/CE in synchronous mode
    Production site protection
    DR site protection

Chapter 4: Replication Management and Design
    Overview
    Replication Manager design
    TimeFinder/Snap and TimeFinder/Clone design

Chapter 5: Storage Optimization
    Overview
    EFDs
    Storage tiering

Chapter 6: Test and Validation
    Overview
    Section A: Testing methodology
        Overview
        Generating the workload for testing
        Database components and test configuration
    Section B: Baseline performance summary
        Overview
        Baseline performance test profile
        Baseline performance test results
    Section C: OLTP application migration test results summary and recommendations
        Overview
        Summary of test results
        Recommendations
    Section D: Storage tiering test results summary and recommendations
        Overview
        Summary of test results
        Recommendations
    Section E: Replication Manager test results summary and recommendations
        Summary of test results
        Recommendations
    Section F: Failover clustering with the Symmetrix V-Max SRDF/CE integrated software test results summary and recommendations
        Overview
        Summary of test results
        Recommendations

Chapter 7: Conclusion
    Overview

Chapter 1: About this Document

Overview

Introduction

This Proven Solution Guide summarizes a series of best practices that were discovered, validated, or otherwise encountered during the validation of a solution using:

- EMC Symmetrix V-Max with EMC SRDF/CE
- EMC Replication Manager
- Enterprise Flash Drives (EFDs)

This Proven Solution Guide will help turn plans for a highly available Microsoft SQL Server 2008 online transaction processing (OLTP) environment into reality by using EMC Replication Manager for simplified data protection, tiered storage for optimizing resources, and Symmetrix Remote Data Facility/Cluster Enabler (SRDF/CE) for planned failovers and unplanned site outages.

EMC's commitment to consistently maintain and improve quality is led by the Total Customer Experience (TCE) program, which is driven by Six Sigma methodologies. As a result, EMC has built Customer Integration Labs (CIL) in its Global Solutions Centers to reflect real-world deployments in which TCE use cases are developed and executed. These use cases give EMC insight into the challenges currently facing its customers.

Use case definition

A use case reflects a defined set of tests that validates the reference architecture for a customer environment. This validated architecture can then be used as a reference point for a Proven Solution.

Contents

This chapter includes the following topics:

- Audience and purpose
- Scope
- Business challenge
- Technology solution
- Objectives
- Reference architecture
- Validated environment profile
- Hardware and software resources
- Prerequisites and supporting documentation
- Terminology

Audience and purpose

Audience

The intended audience for this Proven Solution Guide is:

- Internal EMC personnel
- EMC partners
- Customers

Purpose

The key purpose of this solution is to validate an efficient, remote storage replication process for disaster recovery (DR) and business continuity in a high-volume SQL Server OLTP environment. This is accomplished using synchronous replication (SRDF/CE in synchronous mode) for automated site failover with Microsoft failover clusters across the SRDF link.

In this solution, the Symmetrix V-Max array is used for storage consolidation while SQL Server works as the relational database management system supporting a multifaceted OLTP environment. This solution takes advantage of EMC TimeFinder replication technology within the Symmetrix V-Max to protect data by creating consistent snapshots at various points throughout the day. Neither TimeFinder/Snap nor TimeFinder/Clone requires downtime to perform backups and protect data, which is a significant advantage. Also, in comparison to host-based replication, this solution poses a negligible performance impact on the host.

Additionally, this solution employs a tiered storage infrastructure, which utilizes EFDs to accelerate access to critical data and low-cost, high-capacity Serial Advanced Technology Attachment (SATA) drives to store historical information.

The purpose of this solution is to:

- Demonstrate an effective DR solution for geographically dispersed failover clusters enabled by SRDF/CE software.
- Demonstrate simplified application protection using Replication Manager to rapidly protect SQL Server databases in a very active OLTP environment.
- Validate the benefit of EFD performance for SQL Server OLTP workloads in comparison to traditional Fibre Channel (FC) drive performance.
- Demonstrate how to migrate SQL Server table partitions between storage tiers including EFDs, FC, and SATA drives.
- Show the benefits of storage tiering for Microsoft SQL Server OLTP-type applications.

Scope

Scope

This document's focus is the design, configuration, and validation of a data protection solution for SQL Server databases hosted on the Symmetrix V-Max platform. The simulated testing environment reflects a high-volume, real-world SQL Server workload.

To meet both the performance and cost-efficiency demands placed on critical SQL Server databases, this proven solution combines the benefits of tiered storage with the benefits of advanced storage protection by incorporating:

- Symmetrix V-Max for highly available, shared storage
- EMC Replication Manager to create and mount Symmetrix V-Max TimeFinder clones to a mount host
- Storage tiering using:
  - EFDs
  - FC disk drives
  - SATA drives

Additionally, tiered storage reduces costs significantly as compared to provisioning large amounts of any one particular tier of disk for the entire environment. While mileage will vary with each specific customer environment, EFDs have shown performance improvements of up to 30 times in typical OLTP workloads. EFDs represent a crucial element in this solution by helping to eliminate potential performance bottlenecks.

Not in scope

Basic Microsoft SQL Server application functionality and best practices are outside the scope of this testing.

Business challenge

Overview

SQL Server often forms the foundation for today's most demanding, enterprise-level, transaction-based companies, with its rich feature set and ability to store data from structured, semi-structured, and unstructured documents.

OLTP systems running on a SQL Server platform represent one of the most common data processing systems in today's enterprises. The availability requirements of OLTP systems are very demanding. Downtime can represent failure for critical business processes, effectively halting business operations. It is vital that OLTP systems remain online during backups so that customers can continue to access the system. SQL Server administrators need to ensure that a plan is in place that does not introduce major performance degradation to the environment. A business whose very existence relies on 24x7 availability can succeed or fail depending on the database recovery infrastructure in place.

SQL Server database administrators (DBAs) want to design and deploy a SQL-based OLTP infrastructure that:

- Reduces the cost of storing vast amounts of data
- Provides redundancy and high availability throughout the entire system
- Reduces I/O and locking contention for better application performance
- Ensures 24x7 access to critical business data
- Achieves enterprise-level performance for transactional latency and user concurrency (the key success criteria for OLTP database systems)
- Provides nondisruptive storage tiering to enable cost-effective information lifecycle management (ILM)

Technology solution

Overview

To meet both the performance and cost-efficiency demands placed on critical SQL Server OLTP databases, this proven solution combines the benefits of tiered storage with the benefits of advanced storage protection.

Tiered storage

This solution utilizes the three types of storage media available on the Symmetrix V-Max platform:

- EFDs
- FC disk drives
- SATA drives

The environment also utilizes both RAID 1 mirroring and RAID 5 striping. This ensures that the most active areas (tables) receive the most suitable tier of storage to meet the performance requirements of the database.

EFDs

EFDs can dramatically increase performance for demanding Microsoft SQL OLTP database applications because they can deliver single-millisecond application response times and significantly higher IOPS than traditional FC disk drives.

Energy consumption can be significantly reduced using EFDs. The high-performance characteristics of EFDs minimize the need for organizations to purchase large numbers of traditional hard disk drives while using only a small portion of their capacity to satisfy IOPS and latency requirements.

Advanced storage protection

In addition to using the latest available technologies for database storage, this solution also utilizes advanced array-based replication technology for both local and remote protection. Array-based replication technology has the advantage of being host-agnostic; any concerns over existing host-based SQL Server protection technologies will not affect the array-based replication of SQL databases.

Local data protection is provided by Replication Manager using array-based cloning technology. Replication Manager's integration with Microsoft's Virtual Device Interface (VDI) is a significant enhancement for most OLTP-based environments, as it:

- Creates application-consistent copies of production data in minutes
- Produces zero production host overhead (in-array clone processing)
- Enables off-host backup, data mining, repurposing, and data validity checking

Remote protection in this solution is provided by EMC's SRDF replication technology in conjunction with Microsoft Failover Clustering extended by EMC's CE software. SRDF/CE is a Microsoft Windows failover cluster extension utility that stretches a typical active/passive cluster across geographically dispersed sites. The combination of SRDF and CE (SRDF/CE) makes it possible not only to handle unplanned site outages with quick, automated failover, but also to manage planned site or host-level outages. SRDF/CE ensures that DR failover is repeatable and predictable, while significantly reducing DR failover management.

Objectives

Objectives

The solution focuses on the following objectives.

Objective: Describe the baseline performance results generated using a TPC-E-like testing tool.
Details: This solution first establishes a performance baseline using simulated loads, database maintenance, and Replication Manager jobs. The initial configuration places all of the database files on FC drives.

Objective: Validate the benefit of EFD performance for SQL Server OLTP workloads in comparison to traditional FC drive performance.
Details: During testing, the database files and log files are moved to EFDs to demonstrate the performance benefits.

Objective: Demonstrate Replication Manager functionality using local clones and snapshots.
Details: Replication Manager is used to create database replicas using snapshots to provide point-in-time recovery and clones for daily backup. The impact on daily activity is monitored and documented.

Objective: Demonstrate Replication Manager server DR capabilities.
Details: Outline the steps to provide DR capabilities for Replication Manager, and provide guidelines and considerations.

Objective: Perform VLUN migrations to appropriate tiers under load and document the impact for both EFD- and FC-hosted databases.
Details: Depending on user activity, the database files are moved to EFDs, FC drives, or SATA drives.

Objective: Validate the SQL Server application's availability and recovery time with both planned and unplanned failure scenarios under simulated load with SRDF/Synchronous (SRDF/S) automated by SRDF/CE.
Details: Failover cluster functionality is tested and recovery time is measured. The impact of a geographically dispersed node enabled by SRDF/CE is tested in both planned failovers and unexpected site failures.

Reference architecture

Corresponding Reference Architecture

The corresponding Reference Architecture document for this use case is available on Powerlink and EMC.com. Refer to EMC Business Continuity for Microsoft SQL Server 2008 Enabled by EMC Symmetrix V-Max with SRDF/CE, EMC Replication Manager, and Enterprise Flash Drives Reference Architecture for details. If you do not have access to this content, contact your EMC representative.

Reference Architecture diagram

The following diagram depicts the overall physical architecture of the use case.

Validated environment profile

Profile characteristics

The use case was validated with the following environment profile.

Profile characteristic / Value:
- OLTP database: Supporting 75,000 users with 1 percent concurrency rate
- OLTP database size: 1.7 TB
- SQL storage type (high frequency data): RAID 5 (7+1), 400 GB EFDs
- SQL storage type (medium frequency data): RAID 1, 450 GB, 15k rpm FC drives
- SQL storage type (low frequency, historical data): RAID 5 (3+1), 1,000 GB, 7.2k rpm SATA drives
- SQL TimeFinder storage: RAID 5 (3+1), 400 GB, 10k rpm FC drives

Site link characteristics

The solution was validated using the following site link configuration.

Site link characteristic / Configuration:
- Link type: OC-3 (155 Mb/s) 1 Gigabit Ethernet (stretched VLAN)
- Distances tested for synchronous replication: 10 km and 200 km
- Data transmission mechanism: FCIP

Hardware and software resources

Production site hardware

The Production site hardware used to validate the solution is listed below.

Note: Testing performed during this use case used one FC switch only. However, most Production environments require two FC switches for redundancy.

Equipment at the Production site (quantity / configuration):
- Storage array (1): EMC Symmetrix V-Max with 4 V-Max Engines; 9 x 400 GB EFDs; 213 x 450 GB, 15k rpm FC disks; 18 x 1 TB, 7.2k rpm SATA drives
- Fibre Channel switch (1): 4 Gb/s enterprise-class Fibre Channel switch (requires a minimum of 48 ports)
- Ethernet network switch (1): Gigabit Ethernet network switch (requires a minimum of 32 ports)
- SQL Server active node (1): 4 quad-core 2.93 GHz Intel X7350 processors with 64 GB of 667 MHz FBD-DIMM memory
- SQL Server local passive node (1): 4 quad-core 2.93 GHz Intel X7350 processors with 64 GB of 667 MHz FBD-DIMM memory
- Replication Manager server (1): 2 quad-core CPUs, 4 GB RAM
- EMC SMC server (1): 2 quad-core CPUs, 4 GB RAM

DR site hardware

The DR site hardware used to validate the solution is listed below.

Equipment at the DR site (quantity / configuration):
- Storage array (1): EMC Symmetrix V-Max with 4 V-Max Engines; 221 x 450 GB, 15k rpm FC disks; 18 x 1 TB, 7.2k rpm SATA drives
- Fibre Channel switch (1): 4 Gb/s enterprise-class Fibre Channel switch (requires a minimum of 48 ports)
- Ethernet network switch (1): Gigabit Ethernet network switch (requires a minimum of 32 ports)
- SQL Server remote passive node (1): 4 quad-core CPUs, 64 GB RAM
- Replication Manager server (1): 2 quad-core CPUs, 32 GB RAM

Software

The software used to validate the solution is listed below.

Software / Version:
- Windows Server 2008, x64: Enterprise Edition, SP2
- Microsoft SQL Server 2008, x64: Enterprise Edition, SP1
- EMC Enginuity: 5874.157.129
- EMC Solutions Enabler: 7.0
- EMC SRDF/CE: 3.1
- EMC Replication Manager: 5.2, SP1
- EMC Symmetrix Management Console (SMC): 7.0.0.5

Prerequisites and supporting documentation

Technology

It is assumed the reader has a general knowledge of:

- Microsoft SQL Server 2008 Enterprise Edition
- EMC Symmetrix V-Max
- EMC Replication Manager
- EMC SRDF/CE
- EMC Solutions Enabler
- EMC Symmetrix Management Console (SMC)

Supporting documents

The following documents, located on Powerlink.com, provide additional, relevant information. Access to these documents is based on your login credentials. If you do not have access to the following content, contact your EMC representative.

- EMC Business Continuity for Microsoft SQL Server 2008 Enabled by EMC Symmetrix V-Max with SRDF/CE, EMC Replication Manager, and Enterprise Flash Drives Reference Architecture (companion document to this Proven Solution Guide)
- EMC Replication Manager 5.2 Administrator's Guide
- EMC SRDF/Cluster Enabler Version 3.1 Product Guide
- EMC Symmetrix DMX-4 Enterprise Flash Drives with Microsoft SQL Server Databases Applied Technology white paper

Terminology

Terms and definitions

This section defines terms used in this document.

- Cluster: Multiple physical servers that act as a single, logical server.
- Metavolume: A series of smaller disk devices combined to form a LUN.
- Meta members: One of several disk devices that make up a metavolume.
- Node majority mode: One of the quorum models available for failover clusters. In this model, each node in the cluster communicates with a vote, and a majority of votes must be present to provide cluster services. Node majority mode is commonly used for clusters with an odd number of nodes.
- Planned failover: Cluster services (SQL Server) are moved from a node on the Production site to a node on the DR site (or remote site) in a controlled manner.
- Preferred owner: A list of server nodes for a cluster service. The preferred owner list is accessed through the Properties of a cluster service. The cluster reviews the list of nodes in the order presented on the property sheet for the first available node to host the service.
- Quorum mode: This setting ensures that when a cluster is running, enough members of the distributed system are operational and communicating, and that at least one replica of the current state is guaranteed or accessible. The quorum mode is set using the Failover Cluster Manager GUI, provided by Microsoft.
- R1: Represents the local copy of the data at the Production site.
- R2: Represents the local copy of the data at the remote site.

Chapter 2: SQL Server 2008 on the Symmetrix V-Max Design Overview

Overview

Introduction

The following sections detail the key server architecture and storage design elements for the test OLTP application. The main questions that need to be answered in determining an appropriate storage design layout for this environment include:

- How many IOPS will the SQL Server databases generate on the storage system?
- What is the maximum acceptable LUN response time (latency) in milliseconds (ms)?

Contents

This chapter contains the following topics:

- Server architecture
- Key elements of the storage design layout
- Best practices for storage design
- Storage design layout

Server architecture

Key elements

The following sections detail the key elements of the server architecture used for the test OLTP application.

Microsoft SQL Server failover cluster

The SQL Server application is hosted on a Microsoft SQL Server failover cluster. This cluster consists of one active and two passive nodes. The Production site contains one active node and one passive node dedicated for local failover. The DR site contains one passive node for site failover.

Physical servers

In addition to the servers that comprise the failover cluster, the test environment utilizes the following physical servers:

- One server deployed at each site (Production and DR) to manage Replication Manager and to act as a mount host, enabling tasks such as database consistency checks against the replica volumes
- One server at the Production site to host the SMC application

Physical server connections

The design implements the following physical connections:

- The Production site and the DR site each contain a 4 Gb/s FC SAN switch and a Gigabit Ethernet LAN switch.
- The sites are connected by an OC-3 155 Mb/s link between the LAN switches. This link carries FC traffic as well as LAN traffic.
- The SAN switches connect to each other through an interswitch link (ISL) that communicates across the OC-3 link using the FCIP protocol.
- The servers in the failover cluster each have four HBA ports connected to the SAN switch.
- The Replication Manager servers each have two HBA ports connected to the SAN switch.
- The SMC server has two HBA ports connected to the SAN switch.
- Each Symmetrix V-Max is connected to the SAN through 20 of its 4 Gb/s front-end ports.
- Each server has one LAN connection to a Gigabit Ethernet switch.
- The SQL Server failover cluster nodes have one cluster private network connection to the Gigabit Ethernet switch, providing the heartbeat for the cluster nodes.
- The server VLAN and cluster private network are stretched across the OC-3 connection to the DR site.

Key elements of the storage design layout

OLTP application storage requirements

Capacity, I/O throughput, and latency requirements are the primary drivers in determining the appropriate storage design for OLTP databases. Capacity refers to how much data will need to be stored, that is, how large the database is. I/O throughput refers to the number of IOPS required to deliver the expected response times.

Verify that the storage design:

- Distributes storage resources efficiently to prevent bottlenecks.
- Accounts for additional storage to support potential database growth.
- Supports the application I/O throughput and latency requirements.
- Considers the impact of the overhead of different RAID protection levels on IOPS.

Formula for calculating the number of disks required

It is critical that data centers supporting high-volume SQL Server databases identify the correct number of disks and IOPS required to deliver target response times. Use the following formula to calculate the adjusted IOPS requirement:

IOPS = ((Total_IO * Read_IO%) - Read_hit) + ((Total_IO * Write_IO%) * RAID_Factor)

Where:

- Total_IO = Anticipated database workload
- Read_IO% = Percentage of Total_IO that is read requests
- Read_hit = Amount of read workload that is serviced from the array cache
- Write_IO% = Percentage of Total_IO that is write requests
- RAID_Factor = RAID protection write overhead (for example, RAID 1 is indicated by 2, RAID 5 by 4, and RAID 6 by 6)
- IOPS = The adjusted IOPS requirement

Target disks calculated for this use case

The following example shows how to identify the number of disks required to support this solution's high-capacity OLTP test environment. See "Formula for calculating the number of disks required" above for more information on the formula.

Example: IOPS = ((20,000 * 0.80) - 0) + ((20,000 * 0.20) * 2) = 16,000 + 8,000 = 24,000

In testing, the target drives (450 GB, 15k rpm drives) produced 200 IOPS each. Dividing the adjusted IOPS by the expected IOPS per disk (24,000 / 200) shows that 120 disks are required to support this workload. The number of disks was then rounded up to 128.

Recommendation

The number of disks is rounded up from 120 to 128 to balance the workload across the Symmetrix V-Max back-end I/O modules. Because the back end of the storage array contains 32 physical disk controllers, it is good practice to use a number of disks that is divisible by 32. This maximizes the use of the Symmetrix V-Max Engines and back-end controllers.
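To make the arithmetic easy to reuse, the sizing rule above can be expressed as a short script. This is an illustrative sketch only; the function names and the rounding-to-32 step reflect the recommendation above, not an EMC tool.

```python
import math

# Write penalty per RAID type, as used in the formula above.
RAID_FACTOR = {"RAID1": 2, "RAID5": 4, "RAID6": 6}

def adjusted_iops(total_io, read_pct, read_hit, raid_type):
    """Front-end workload converted to back-end disk IOPS."""
    write_pct = 1.0 - read_pct
    return (total_io * read_pct - read_hit) + (total_io * write_pct * RAID_FACTOR[raid_type])

def disks_required(total_io, read_pct, read_hit, raid_type,
                   iops_per_disk, round_to=32):
    """Number of physical disks, rounded up to a multiple of the
    back-end controller count (32 on the Symmetrix V-Max)."""
    disks = math.ceil(adjusted_iops(total_io, read_pct, read_hit, raid_type) / iops_per_disk)
    return math.ceil(disks / round_to) * round_to

# Worked example from this chapter: 20,000 IOPS, 80% reads, no cache hits,
# RAID 1 protection, 200 IOPS per 450 GB 15k rpm FC drive.
print(adjusted_iops(20000, 0.80, 0, "RAID1"))          # 24000.0
print(disks_required(20000, 0.80, 0, "RAID1", 200))    # 128
```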

Best practices for storage design

Microsoft SQL Server

The OLTP application is hosted on a Microsoft SQL Server 2008 failover cluster. Review the storage-related best practices presented below. A small validation sketch follows this list.

Disk latency
- Should not exceed 10 ms for best performance: 1 to 5 ms for log files, and 4 to 20 ms for database files on OLTP systems (ideally, 10 ms or less).

Log files
- Place on RAID 1 or RAID 10 storage (as they are write-intensive).

TempDB files
- The number of files should equal the number of physical CPU sockets. This practice increases parallel access to the database.
- Pre-allocate files to avoid the overhead of autogrowth.
- Place TempDB files on their own LUNs using RAID 1 or RAID 10 protection.

Database files
- Configure database files across multiple LUNs to take advantage of parallel access and to minimize I/O contention (where necessary).
- Keep database files and log files on separate LUNs. Log files represent a sequential write workload, whereas database files supporting OLTP applications represent random read/write activity. Combining heterogeneous workloads can have a negative effect on overall database performance.
- Make sure that all database files in the same filegroup are equal in size. SQL Server uses a proportional fill algorithm that favors allocation to files with more free space; keeping the files the same size distributes data more evenly.
- Set the Enable Autogrowth parameter in the Microsoft SQL Server Management Studio graphical user interface (GUI) to expand the database file in case of unanticipated growth. For detailed information on this setting, refer to the appropriate Microsoft SQL Server Management Studio documentation.
- Pre-allocate the data files.
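As an illustration of how some of these rules can be checked programmatically, the following sketch validates a hypothetical file layout against three of the practices above: equal file sizes within a filegroup, log and data files on separate LUNs, and a TempDB file count matching the number of CPU sockets. The layout dictionary and field names are invented for this example and are not part of the validated environment.

```python
# Hypothetical layout description; values are illustrative only.
layout = {
    "cpu_sockets": 4,
    "tempdb_files": 4,
    "data_files": [  # files in one filegroup
        {"name": "sqldata1", "size_mb": 204800, "lun": "LUN01"},
        {"name": "sqldata2", "size_mb": 204800, "lun": "LUN02"},
    ],
    "log_files": [{"name": "sqllog1", "size_mb": 51200, "lun": "LUN09"}],
}

def check_layout(layout):
    issues = []
    # Equal file sizes within the filegroup (proportional fill).
    sizes = {f["size_mb"] for f in layout["data_files"]}
    if len(sizes) > 1:
        issues.append("Data files in the filegroup are not equal in size.")
    # Data and log files should not share LUNs.
    data_luns = {f["lun"] for f in layout["data_files"]}
    log_luns = {f["lun"] for f in layout["log_files"]}
    if data_luns & log_luns:
        issues.append("Log files share a LUN with data files.")
    # TempDB file count should match physical CPU sockets.
    if layout["tempdb_files"] != layout["cpu_sockets"]:
        issues.append("TempDB file count does not match CPU socket count.")
    return issues

print(check_layout(layout) or "Layout follows the best practices checked.")
```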

Storage design layout

Introduction

Review the following sections to learn how the storage layout was designed to support the high-capacity OLTP workloads in this use case.

Physical disk distribution at the Production site

The following image represents the physical disk distribution for the Symmetrix V-Max at the Production site.

Physical disk distribution at the DR site

The following image represents the physical disk distribution for the Symmetrix V-Max at the DR site.

LUN distribution in the test environment

LUNs were distributed in the test environment as follows:

- A total of 32 LUNs are used to support the OLTP application.
- The LUNs are in a disk group of 128 disk drives (450 GB, 15k rpm) that are evenly distributed across all of the available back-end I/O modules.
- Most of the LUNs are metavolumes. A metavolume is a series of smaller disk devices combined to form a LUN; the disk devices are called meta members.

The following table lists the number of members for each LUN size.

LUN size        Number of members
22.5 GB LUNs    2
45 GB LUNs      4
90 GB LUNs      8
135 GB LUNs     8

Chapter 3: Disaster Recovery Design

Overview

Using Windows 2008 failover clustering extended by EMC SRDF/CE

It is imperative that SQL Server DBAs running high-capacity OLTP workloads build the following three key elements into the environment's DR design:

- Repeatability
- Predictability
- Reduction in failover management

This solution combines all three of these aspects by leveraging the integrated remote data protection features of the Symmetrix V-Max array. SRDF/CE combines Microsoft failover clusters with SRDF/S to automate the failover. If the Production site fails, SRDF/CE is automatically triggered to move services either laterally within the Production site, or to the remote DR site in case of a full site failure.

Contents

This chapter contains the following topics:

- Deploying Windows 2008 failover clustering and SRDF/CE in synchronous mode
- Production site protection
- DR site protection

Deploying Windows 2008 failover clustering and SRDF/CE in synchronous mode

Distances targeted during failover clustering testing

This solution is built around geographically dispersed failover clustering targeting the multi-site data center. A geographically dispersed cluster allows organizations to position the individual nodes in separate data centers miles away from one another. In order to make the test environment as realistic as possible, this use case established a baseline using two simulated distances:

- 10 km represents a campus-like environment or metro area.
- 200 km represents the longest recommended distance between geographically dispersed sites.

SRDF/S and SRDF/CE functionality

The Symmetrix V-Max storage array is the main component of this solution, integrating the most powerful suite of remote storage replication technologies for superior failover/failback performance: EMC SRDF software. More specifically:

- SRDF/S maintains real-time synchronous remote data replication from the Production site to the DR site, providing a recovery point objective (RPO) of zero data loss.
- SRDF/CE works with Microsoft failover clusters to leverage the SRDF/S link for accessing the remote DR site. In addition, SRDF/CE enables consistent replication that virtually eliminates the need to perform full resynchronizations of the environment.

See Chapter 6 > Section F > Failover clustering with the Symmetrix V-Max SRDF/CE integrated software test results summary and recommendations for findings.

SRDF/S configuration details

Review the following prior to configuring SRDF/S. (A back-of-the-envelope latency sketch follows this list.)

- SRDF is configured between two Symmetrix V-Max arrays.
- Synchronous replication is performed at distances of 10 km and 200 km between the Production site and the DR site.
- The data transmission mechanism is FCIP.
- Each array dedicates eight front-end I/O modules (two per engine) for SRDF communication.
- The link between the Production site and the DR site is an OC-3 (155 Mb/s) 1 Gigabit Ethernet (stretched VLAN) connection.
- The SQL cluster is connected to four of the eight Symmetrix V-Max front-end I/O modules. The remaining four front-end modules can be used for additional connectivity capacity.
- All of the SQL application devices (LUNs) are in a single RDF1-type group.

See Chapter 6 > Section F > Failover clustering with the Symmetrix V-Max SRDF/CE integrated software test results summary and recommendations for findings.
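The recommended distance limit for synchronous replication is driven largely by speed-of-light propagation delay, since every write must be acknowledged by the remote array before it completes. The figures below are a rough, illustrative estimate only: they assume roughly 200,000 km/s propagation in fiber and one network round trip per write, and they ignore switch, FCIP, and array service times. They are not measurements from this solution.

```python
def sync_write_penalty_ms(distance_km, km_per_ms=200.0, round_trips=1):
    """Added latency per write from link propagation alone.

    km_per_ms: light travels roughly 200 km per millisecond in fiber
    (about two-thirds of c). Each synchronous write needs at least one
    round trip (out and back) before it can be acknowledged to the host.
    """
    one_way_ms = distance_km / km_per_ms
    return 2 * one_way_ms * round_trips

for distance in (10, 200):
    print(f"{distance:>3} km: ~{sync_write_penalty_ms(distance):.2f} ms per write")
# 10 km: ~0.10 ms per write
# 200 km: ~2.00 ms per write
```

At 200 km, propagation alone adds roughly 2 ms to every write, which helps explain why that figure is described above as the longest recommended distance for this kind of geographically dispersed cluster.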

SRDF/CE prerequisites

Complete the following tasks prior to configuring SRDF/CE. See the EMC SRDF/Cluster Enabler Version 3.1 Product Guide for detailed procedures.

- Licenses required: Install these Solutions Enabler licenses: Base kit, SRDF, and Cluster Enabler.
- Failover cluster configuration: Configure the failover cluster prior to installing SRDF/CE.
- Zoning: Cluster nodes are zoned only to their local storage.
- Microsoft Cluster Validate: Microsoft Cluster Validate must pass all tests except for storage. This test procedure is part of the Microsoft Failover Cluster installation.
- Mapping R1 and R2 disk devices: Map the R1 disk devices to all nodes at the Production site and the R2 disk devices to peer nodes at the DR site.
- Write enable devices in a cluster group: Ensure that all of the devices in a cluster group are write-enabled on the node that owns the group in the cluster.

SRDF/CE configuration details

Review the following prior to configuring SRDF/CE. See the EMC SRDF/Cluster Enabler Version 3.1 Product Guide for detailed procedures.

Note: The failover cluster must be configured prior to installing SRDF/CE.

Use the Configure CE Cluster wizard in the EMC Cluster Enabler Manager GUI to:

- Detect the current cluster.
- Validate the appropriate software versions.
- Perform a storage discovery for each cluster node.

When the wizard completes successfully, Cluster Enabler displays the components of the current cluster in the navigation tree.

See Chapter 6 > Section F > Failover clustering with the V-Max SRDF/CE integrated software test results summary and recommendations for findings.

Cluster Enabler Manager GUI

The Cluster Enabler Manager GUI is the user interface that manages SRDF/CE activity. This GUI allows you to configure disk-based resources to automatically move between geographically dispersed sites. The managed cluster objects are represented by folders, as detailed in the following table. In the example, the resources for the Production site (site 1) are shown.

- Groups: Displays the Services and Applications from the Failover Cluster Manager.
- Storage: Displays the storage systems (two Symmetrix V-Max arrays).
- Sites: Displays the geographically dispersed locations.
- Nodes: Displays the cluster nodes.

Production site protection

Production site protection details

Review the following prior to implementing geographically dispersed failover clusters in the SQL environment:

- The failover cluster is managed primarily through the Failover Cluster Management GUI provided by Microsoft.
- The failover cluster is comprised of three nodes: two at the Production site and one at the DR site.
- A planned failover is managed by moving services to the second node at the Production site.
- All cluster services' preferred owners are configured to run on the first node at the Production site. The second preferred owner is the second node at the Production site, and the third preferred owner is the node at the DR site.

Note: The preferred owner list is managed by SRDF/CE. It is not recommended to manually change the preferred owner.

Executing a planned failover

Use the Failover Cluster Management GUI to initiate a service failover, as follows:

1. Right-click the service to move.
2. Select Move this service or application to another node.
3. Select the node to which the service is to be moved.

The service is then brought online on the other node.

DR site protection

DR site protection details

Review the following details prior to implementing geographically dispersed failover clusters in the SQL environment:

- The failover cluster uses one node at the DR site. This node's function is to run the SQL Server service in the case of a site failure at the Production site.
- A planned failover is managed by moving services to the node at the DR site. This can be useful for testing DR procedures, which is sometimes required by regulatory agencies.

Preferred owner list order in the test environment

SRDF/CE honors the predefined preferred owner list and manages the storage resources accordingly. The following preferred owner list is implemented in this use case:

- The primary owner is the first node at the Production site.
- The secondary owner is the lateral (second) node at the Production site.
- The third owner is the single node at the DR site.

SRDF/CE also implements a delay failback function. Delay failback automatically modifies the preferred owner list so that a failover to a lateral node (a cluster node connected to the same storage array) is a higher priority than a failover to a peer node (a cluster node connected to a different storage array).

Node majority selected as the quorum mode

The quorum mode used for this failover cluster is node majority. In order to keep costs low at the DR site, only one peer node is installed. To maintain a majority of nodes in the event of a site failure at the Production site, two more votes (through a file share witness or additional peer nodes) would need to be established. In node majority mode, each node in the cluster contributes a vote, and a majority of votes (two in this configuration) must be present to provide cluster services. A simple sketch of this vote count follows. See http://technet.microsoft.com/en-us/library/cc770830(ws.10).aspx for more information on quorum modes.
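To make the vote arithmetic concrete, the following illustrative snippet (not an EMC or Microsoft tool) counts votes for the three-node cluster used in this solution and shows why a Production site failure leaves the surviving DR node without quorum:

```python
def has_quorum(total_votes, votes_present):
    """Node majority: more than half of all votes must be present."""
    return votes_present > total_votes // 2

total_votes = 3  # two Production nodes + one DR node, one vote each

# Normal operation: all three nodes are up.
print(has_quorum(total_votes, 3))  # True

# Local failover: one Production node down, two votes remain.
print(has_quorum(total_votes, 2))  # True

# Production site failure: only the DR node (one vote) survives,
# so the cluster loses quorum and must be force started (next section).
print(has_quorum(total_votes, 1))  # False
```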

Force starting the cluster is required

A minimal number of cluster nodes was used in this configuration, as detailed in Chapter 3 > Disaster Recovery Design > Node majority selected as the quorum mode. Since only one peer node would remain at the DR site in the event of a Production site failure, a node majority no longer exists. In this case the cluster must be force started to provide services.

Before force starting the cluster, verify that the storage components have failed over to the DR site. Use Cluster Enabler Manager to verify that the storage has failed over, as follows:

1. Click the appropriate SQL Server group.
2. Verify the Owner Storage ID and the Owning Node.

Next, force start the cluster. Open a command window and start the Cluster service with the force quorum option (on Windows Server 2008, for example, net start clussvc /forcequorum).

Chapter 4: Replication Management and Design

Overview

Introduction to the backup infrastructure design

Applying this solution's design principles and product recommendations will help to establish a reliable, highly efficient replication process. The replication model uses Replication Manager and EMC TimeFinder technology to create snapshots and clones of the database LUNs at regular intervals within the Symmetrix V-Max. The validated design presented here demonstrates that:

- There was negligible impact to the user experience on the simulated SQL Server OLTP environment during replication with Replication Manager and TimeFinder software.
- There is little performance impact on the host (as compared to host-based replication).

Contents

This chapter contains the following topics:

- Replication Manager design
- TimeFinder/Snap and TimeFinder/Clone design

Replication Manager design

How Replication Manager was used in this solution

Replication Manager was used to coordinate the data protection and recovery of the Production SQL Server databases. This solution's Replication Manager-based testing targeted:

- 75,000 users with a 1.7 TB OLTP database
- An 8-hour performance test window to represent a typical production day
- A Replication Manager server configured as a primary server at the Production site, and a Replication Manager server configured as a secondary server at the DR site

VDI snapshot replication

Replication Manager utilizes the VDI framework to obtain application-consistent snapshots of active databases for both Snap and Clone jobs. Generally, a VDI snapshot backup has minimal impact on database performance during its execution. Most importantly, user connections to the SQL Server are not broken during this process. Read access is unaffected, while database write operations occur in the transaction logs during the VDI backup window and while the VSS snapshot is performed on the underlying file systems. The writes are temporarily held in memory for a maximum of 10 seconds. Transactions that execute a commit operation while the VDI backup is being processed may be suspended because of their write requirement. Most VDI backup operations execute within a matter of seconds, although the VDI implementation itself does not impose a timeout value.

See http://technet.microsoft.com/en-us/library/ms175536.aspx for more detailed information on using VDI in a SQL Server context.

VSS implementation

The Volume Shadow Copy Service (VSS) implementation used with SQL Server 2008 provides a structured framework for executing SQL Server disk-based backup operations. The VSS framework has requirements similar to those of VDI. The threshold value for VSS operations is 10 seconds. If a disk mirror-based backup exceeds this timing, the backup is aborted and I/O operations proceed as normal.

Replication Manager configuration highlights

The test environment uses Replication Manager to create replicas of the SQL Server application database for rapid recovery. It is important to note that:

- Each Replication Manager server uses four 6-cylinder gatekeeper devices assigned from its respective Symmetrix V-Max array.
- In order to effectively replicate the database, 24 LUNs are copied.
- A Replication Manager storage pool is established (for replica creation) at each site.
- A storage pool with 192 virtual devices (24 LUNs x 8 sessions) is created to support daily snapshots.
- A storage pool of 72 standard devices (24 LUNs x 3 sets) is created to support clone operations. This allows for three sets of clones.

A short sizing sketch for these pools follows the design considerations below.

See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

Replication Manager design considerations

Consider the following before implementing Replication Manager in the SQL Server environment. See the EMC Replication Manager 5.2 Administrator's Guide for detailed configuration information, and Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

- A method to resolve server names to IP addresses must be available. In the test environment, name resolution is provided through the domain name server (DNS).
- Install the Replication Manager server on the secondary server before installing on the primary server. During the primary server installation, you will be prompted for the name of the secondary server.
- If a Replication Manager server already exists, install the secondary server and stop the Replication Manager Server service. Upgrade the Replication Manager server on the primary node, then start the Replication Manager service on the secondary node. For more information, see the EMC Replication Manager 5.2 Administrator's Guide.
- Provide an IP communications port for the servers to be able to synchronize the Replication Manager database. The default port is 1964.
- A standalone Replication Manager server can be converted to a DR server configuration. See the EMC Replication Manager Administrator's Guide for details.
- Each site should have a Replication Manager storage pool to create replicas.
- Site-specific replication tasks should be created. When a secondary server becomes a primary server, the name of the server changes; pre-configuring tasks for each site will make DR easier.
- When a failover occurs between sites, the replicas at the now-secondary site are no longer valid for restore. These replicas must be manually expired and deleted.
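The storage pool figures above follow directly from the number of replicated LUNs, the number of snapshot sessions retained, and the number of clone sets kept. The helper below is an illustrative sketch for reproducing that arithmetic; the function and parameter names are invented for this example.

```python
def replica_pool_sizes(lun_count, snap_sessions, clone_sets):
    """Devices needed in the Replication Manager storage pools.

    Each snapshot session needs one virtual device per replicated LUN;
    each clone set needs one standard device per replicated LUN.
    """
    return {
        "virtual_devices": lun_count * snap_sessions,
        "standard_devices": lun_count * clone_sets,
    }

# Values used in this solution: 24 LUNs, 8 snapshot sessions, 3 clone sets.
print(replica_pool_sizes(24, 8, 3))
# {'virtual_devices': 192, 'standard_devices': 72}
```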

Replication Manager and SQL Server observations

Replication Manager is very effective for local data protection in a high-volume OLTP environment because:

- Replication Manager integrates a scheduling feature to automate replica creation. DBAs can schedule consistent SQL replication to occur at regular intervals and manage the lifecycle of those replicas.
- Replicas can be automatically mounted to alternate hosts for SQL consistency checks or to be transferred to offline storage media such as disk libraries or tape.
- Replicas can be automatically mounted so that copies of the files can be used for data mining activities, offloading these functions from the Production host.
- Replication Manager rotates through the sets of storage in the storage pool, automatically expiring the oldest set.

Replication Manager and SQL Server best practices

Consider the following prior to introducing Replication Manager into the SQL Server environment. See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

- Because this is a DR configuration, configure storage pools at each site. Each site requires a set of application sets and jobs specific to the array at that site.
- In this use case the application set is specified to perform database replication. This setting copies all data and related transaction logs. Choosing filegroup replication will not replicate active transaction logs as part of a filegroup replication.
- Replication Manager does not truncate transaction logs for SQL Server. A database maintenance plan is needed to back up and truncate the transaction logs. In this environment, log backup and truncation is scheduled after the full clone replication of the database.
- When setting up the SQL Server replication jobs, select the "Full, Online with advanced recovery (using VDI)" setting as the consistency method. This setting enables log replay upon recovery.

Replication Manager server DR functionality

Replication Manager auto-discovers the SQL Server from an application perspective, then identifies the Production host's defined storage, enabling DBAs to quickly devise a backup strategy. See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

The replication model is described next:

- Replication Manager server DR is implemented in a primary server/secondary server model.
- The primary server controls replication management. All configuration or schedule edits are performed on the primary server.
- The secondary server is a read-only configuration that is kept synchronized with the primary server.
- Should the primary server become unavailable, the secondary server can be designated as the primary server to take over and manage replications.

Replication Manager server failover

Use the Replication Manager command line interface (CLI) to initiate the failover as follows.

Note: The steps outlined here are high-level in nature and should be read in conjunction with the EMC Replication Manager 5.2 Administrator's Guide.

1. Start the Replication Manager CLI at the command prompt. The system responds with a prompt.
2. Log in to the system.
3. Designate the secondary server as the primary server.
4. Restart the Replication Manager server service for the change to take effect.

Note: The replicas at the Production site are no longer valid for recovery.

Replication Manager server failback

Use the Replication Manager command line interface (CLI) to initiate the failback as follows.

Note: The steps outlined here are high-level in nature and should be read in conjunction with the EMC Replication Manager 5.2 Administrator's Guide. See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

1. Start the Replication Manager server service on the original primary server at the Production site. This server now becomes the secondary server.
2. Stop the Replication Manager server service on the original secondary server at the DR site.
3. Restart the Replication Manager server service on the secondary server at the DR site.

Note: The replicas at the DR site are no longer valid for recovery.

TimeFinder/Snap and TimeFinder/Clone design

Introduction

In the test environment, both TimeFinder technologies are leveraged through Replication Manager. TimeFinder/Snap sessions taken at several points during the day provide a point-in-time rollback should corruption occur during peak usage times. If corruption occurs, the database can be rapidly rolled back to the last point-in-time copy. SQL log files can then be applied to return the data to the point of failure, providing rapid recovery.

TimeFinder/Snap in the test environment

Replication Manager integrates the following TimeFinder/Snap capabilities in the test environment. See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings. A conceptual sketch of the copy-on-first-write behavior appears at the end of this section.

- Database snapshots are taken hourly to provide point-in-time copies for recovery should corruption occur.
- TimeFinder/Snap allows customers to make multiple pointer-based copies of source data simultaneously on multiple target devices from a single source device. This results in point-in-time copies that can be accessed immediately.
- TimeFinder/Snap does not create a full copy of the data, and therefore does not consume as much space as a TimeFinder/Clone. It is an asynchronous copy-on-first-write (A/COFW) technology that copies blocks as they are changed. Blocks that do not change are read from the source volume.
- The asynchronous copy-on-first-write feature improves host performance by eliminating the need to intercept I/O and copy data before the I/O can complete. With A/COFW, the cache slot for the source disk track is marked as versioned and the host write is allowed to continue. The data is copied to the snap or clone device after the write completes.
- Snapshot sessions are created immediately and are maintained until they are stopped.
- Creation of a snapshot causes a small spike in latency at creation time, but has no impact on overall performance.

TimeFinder/Clone in the test environment

Replication Manager integrates the following TimeFinder/Clone capabilities in the test environment. See Chapter 6 > Section E > Replication Manager test results summary and recommendations for findings.

- TimeFinder clones of the database LUNs are taken at the end of the day for a daily point-in-time recovery and for copying to other media.
- TimeFinder/Clone produces a full image copy of the data. Because it is a full copy, there is no dependence on the source volume as there is with TimeFinder/Snap.
- It produces a SQL-consistent full image copy of the database at the end of the day to provide an independent copy for full database recovery.
- Subsequent TimeFinder clone copies are incremental syncs, and take less time to complete.
- It provides the ability to repurpose the data for SQL consistency checks, offloading the function from the Production host.
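The copy-on-first-write behavior described above can be illustrated with a small, purely conceptual simulation. This is not EMC's implementation and it ignores cache slots, tracks, and the asynchronous copy step; it only shows why a snapshot stays consistent while consuming space solely for blocks that change after the session starts.

```python
class CofwSnapshot:
    """Toy copy-on-first-write snapshot of a block device (a dict of blocks)."""

    def __init__(self, source):
        self.source = source   # live volume, keeps changing
        self.saved = {}        # original contents of blocks changed since the snap

    def write(self, block, data):
        """Host write to the source: preserve the old block on first write only."""
        if block not in self.saved:
            self.saved[block] = self.source[block]
        self.source[block] = data

    def read_snapshot(self, block):
        """Snapshot read: saved copy if the block changed, else the live block."""
        return self.saved.get(block, self.source[block])


volume = {0: "SQL page A", 1: "SQL page B", 2: "SQL page C"}
snap = CofwSnapshot(volume)

snap.write(1, "SQL page B v2")    # first change after the snapshot
snap.write(1, "SQL page B v3")    # later changes copy nothing further

print(volume[1])                  # SQL page B v3  (live data moves on)
print(snap.read_snapshot(1))      # SQL page B     (point-in-time view preserved)
print(len(snap.saved))            # 1  (space used only for changed blocks)
```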

Chapter 5: Storage Optimization

Overview

Optimizing storage resources
High-transaction OLTP environments require a built-in plan for storage optimization. Read this section to learn how the Symmetrix V-Max storage system works in the test environment to achieve a high level of storage efficiency through:
- Moving the OLTP application in its entirety to high-performing EFDs
- Distributing database files across storage types within the Symmetrix V-Max array (EFDs, FC, and SATA drives)

Contents
Topic               See Page
EFDs                38
Storage tiering     39

37

Chapter 5: Storage Optimization

EFDs

Increased performance levels with EFDs
This solution demonstrates how leveraging the Enterprise Flash Drive (EFD) technology available in the Symmetrix V-Max storage array achieves increased performance levels and energy efficiency. Because EFDs contain no moving parts, much of the storage latency associated with traditional magnetic disk drives no longer exists.

A Symmetrix V-Max with integrated EFDs can deliver single-millisecond application response times and up to 30 times more IOPS than traditional FC disk drives. Additionally, because there are no mechanical components, EFDs consume significantly less energy than hard disk drives. Energy consumption can be reduced by up to 98 percent for a given IOPS workload by replacing disk drives with fewer EFDs. For example, in some workload scenarios, it would take 30 or more 15k rpm FC disk drives to deliver the same performance as a single EFD.

See Chapter 6 > Section C > OLTP application migration test results summary and recommendations for findings.

38
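The drive-count and energy claims above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope sizing illustration only: the per-drive IOPS and wattage figures are assumptions chosen to be consistent with the "up to 30 times" ratio cited above, not measured values from this solution.

```python
import math

# Illustrative planning figures (assumptions, not measurements from this solution).
FC_15K_IOPS_PER_DRIVE = 180
EFD_IOPS_PER_DRIVE = 30 * FC_15K_IOPS_PER_DRIVE   # "up to 30x" ratio from the text
FC_WATTS_PER_DRIVE = 18
EFD_WATTS_PER_DRIVE = 9

def drives_needed(target_iops: float, iops_per_drive: float) -> int:
    """Minimum whole drives required to service a target IOPS workload."""
    return math.ceil(target_iops / iops_per_drive)

target_iops = 15_000   # roughly the baseline workload measured later in this solution

fc_drives = drives_needed(target_iops, FC_15K_IOPS_PER_DRIVE)
efds = drives_needed(target_iops, EFD_IOPS_PER_DRIVE)

print(f"15k rpm FC drives needed: {fc_drives:3d}  (~{fc_drives * FC_WATTS_PER_DRIVE} W)")
print(f"EFDs needed:              {efds:3d}  (~{efds * EFD_WATTS_PER_DRIVE} W)")
```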

Chapter 5: Storage Optimization

Storage tiering

Three types of storage tiering provided by the Symmetrix V-Max
This solution utilizes the three types of storage media available on the Symmetrix V-Max platform:
- EFDs
- FC disk drives
- SATA drives

The environment also utilizes both RAID 1 mirroring and RAID 5 striping to ensure that the most active areas (tables) receive the most suitable tier of storage to meet performance requirements. Tiering storage has proven to reduce costs significantly compared to provisioning large amounts of any one particular storage type for the entire environment. While results will vary with each specific customer environment, EFDs have shown performance improvements of up to 30 times in typical OLTP workloads.

See Chapter 6 > Section D > Storage tiering test results summary and recommendations for findings.

Virtual LUN (VLUN) migration
The Virtual LUN migration feature introduced with Symmetrix V-Max offers SQL Server storage DBAs the ability to transparently migrate database volumes between different storage types, as well as between different tiers of protection. Database volumes can be migrated to either unallocated space (also referred to as unconfigured space) or to configured space, which is defined as existing Symmetrix volumes that are not currently assigned to a host within the same subsystem. The data on the original source volumes is cleared using instant volume table of contents (VTOC) once the migration has completed. The migration does not require swap or driver (DRV) space, and is nondisruptive to the attached SQL application systems and to other internal Symmetrix applications such as TimeFinder and SRDF. All migration combinations of drive types and protection types are valid except for unprotected volumes.

As demonstrated in this proven solution, the database files move to EFD, FC, or SATA drives depending on user activity. The device migration is completely transparent to the host operating system and SQL application because the migration operation is executed against the Symmetrix device; the host address of the device is not changed and database operations are uninterrupted. Furthermore, in SRDF environments (like this validated solution) the migration does not require customers to re-establish their DR protection after the migration.

39

Chapter 5: Storage Optimization

Storage tiering considerations
Consider the following before implementing storage tiering:
- Performance improves when all of the database and log files with RAID 1 data protection are placed on EFD drives. However, this may not be the best use of the storage resource.
- Some database files will be more active than others and require higher-performing drives.
- Database applications tend to display a workload skew, where most of the I/O demand is focused on a small number of LUNs while the remaining LUNs see much less demand. Applications that behave this way are good candidates for storage tiering.

See Chapter 6 > Section D > Storage tiering test results summary and recommendations for findings.

Monitor the I/O activity pattern of the OLTP database
Monitoring the I/O pattern of the database over time is a good first step for identifying the most active databases. Usage patterns can change depending on the day of the week or the week of the month. Gather enough performance data to understand utilization patterns.

40

Chapter 5: Storage Optimization

Test OLTP application activity patterns
The test environment's OLTP application I/O patterns and IOPS values are detailed in the table below. Also, see Chapter 6 > Section A > Database components and test configuration for a detailed graphic that illustrates the three sets of database tables used in the test configuration: customer data, broker data, and market data.

Disk            IOPS        Disk            IOPS
Broker\B10      1463.1      Customer\C10    184.7
Broker\B9       1322.3      MKT DB          90.8
Broker\B8       1320.4      Customer\C9     58.5
Broker\B5       1320.2      Customer\C7     58.0
Broker\B2       1319.3      Customer\C5     57.5
Broker\B7       1318.5      Customer\C3     55.5
Broker\B3       1314.7      Customer\C2     54.7
Broker\B4       1314.7      Customer\C4     53.1
Broker\B6       1313.3      Customer\C6     52.0
Broker\B1       1312.6      Customer\C1     51.9
Broker\B0       769.9       Customer\C8     51.8
Customer\C0     311.8       Total IOPS:     15,169

Notes
Based on the data collected:
- Most of the broker file group should be migrated to EFDs, and most of the customer file group should be migrated to SATA drives.
- Some of the tables in the broker file group are more active than those in the customer file group.
- There is a wide range of activity from the most active to the least active file groups.

41
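The placement decisions in the notes above can be derived directly from the measured per-LUN IOPS. The Python sketch below applies two illustrative thresholds to a subset of the figures in the table to suggest a tier for each LUN; the threshold values are assumptions for illustration, not part of the validated configuration.

```python
# Measured average IOPS per LUN (subset of the table above).
lun_iops = {
    "Broker\\B10": 1463.1, "Broker\\B9": 1322.3, "Broker\\B0": 769.9,
    "Customer\\C0": 311.8, "Customer\\C10": 184.7, "MKT DB": 90.8,
    "Customer\\C9": 58.5, "Customer\\C1": 51.9,
}

# Illustrative cut-off values -- tune these to the observed workload skew.
EFD_THRESHOLD_IOPS = 500     # very hot LUNs go to EFDs
SATA_THRESHOLD_IOPS = 100    # cold LUNs go to SATA; everything in between stays on FC

def suggest_tier(iops: float) -> str:
    """Map a LUN's measured IOPS to a suggested storage tier."""
    if iops >= EFD_THRESHOLD_IOPS:
        return "EFD"
    if iops <= SATA_THRESHOLD_IOPS:
        return "SATA"
    return "FC"

for lun, iops in sorted(lun_iops.items(), key=lambda item: -item[1]):
    print(f"{lun:<14} {iops:8.1f} IOPS -> {suggest_tier(iops)}")
```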

Chapter 5: Storage Optimization

Migrating the OLTP application data
The following details how the OLTP application data is migrated in the test environment. See Chapter 6 > Section C > OLTP application migration test results summary and recommendations for findings.

- The SQL database and log files used in the test environment were originally placed on 128 FC drives with RAID 1 protection.
- The data and log files were moved nondisruptively to EFDs in the same array by leveraging the VLUN migration feature.
- The RAID type is also changed during the migration process, allowing the data to fit onto eight EFDs. For example:
  - Testing started with 8+8 RAID 1 on FC drives
  - Testing ended with 7+1 RAID 5 on EFDs
- In this solution, LUNs are migrated as members of a storage group. However, LUNs may also be:
  - Ungrouped
  - Members of a device group
- The migration is performed using the LUN Migration Wizard available in the Tasks view of the EMC Symmetrix Management Console (SMC). See Chapter 5 > Storage Optimization > Migrating the LUNs using the LUN Migration wizard in SMC.

42
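The move from 8+8 RAID 1 groups on FC to a 7+1 RAID 5 group on EFDs also changes the usable-capacity arithmetic, which is why the same data set fits on only eight drives. The sketch below works through that arithmetic under assumed drive capacities; the drive sizes are illustrative parameters, not the devices used in the tested configuration, and vendor formatting overhead is ignored.

```python
def usable_capacity_gb(drive_count: int, drive_size_gb: float, raid: str) -> float:
    """Simplified usable capacity of a RAID layout (ignores vendor overhead)."""
    if raid == "RAID1":
        return drive_count / 2 * drive_size_gb     # mirrored: half of raw capacity
    if raid == "RAID5":
        return (drive_count - 1) * drive_size_gb   # one drive's worth of parity
    raise ValueError(f"unsupported RAID type: {raid}")

# Illustrative drive sizes (assumptions, not the tested hardware).
FC_DRIVE_GB = 146
EFD_DRIVE_GB = 400

fc_usable = usable_capacity_gb(128, FC_DRIVE_GB, "RAID1")   # original layout
efd_usable = usable_capacity_gb(8, EFD_DRIVE_GB, "RAID5")   # 7+1 target layout

print(f"128 FC drives, RAID 1   : {fc_usable:,.0f} GB usable")
print(f"  8 EFDs,      RAID 5   : {efd_usable:,.0f} GB usable")
print("Data set to accommodate : ~1,700 GB (1.7 TB database)")
```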

Chapter 5: Storage Optimization

Migrating the LUNs using the LUN Migration wizard in SMC
Use the LUN Migration wizard available in SMC to configure the LUNs for migration to the appropriate storage tier.

Step  Action
1     Click the LUN Migration wizard hyperlink from the Task view in SMC to start the LUN Migration wizard. A Welcome screen appears.
2     Follow the onscreen prompts and type the data required to migrate the LUNs.
3     The Select Source Devices screen appears:
      - Select the Symmetrix ID.
      - Type a Session Name. The Session Name is used by SMC to track migration progress.
      - Select a Group Type.
      - Select a Group Name. Click the appropriate storage group and click OK to confirm the selection.
      - Select the LUNs to be migrated in the Available Devices pane and move them to the Selected Source Devices pane.
      - Click Next.

43

Chapter 5: Storage Optimization

Step  Action
4     The Select Migration Type screen appears:
      Note: These settings enable SMC to create appropriate target devices prior to migration.
      - Select the Unconfigured Devices radio button.
      - Select the RAID type.
      - Select the Disk Group.
      - Click Next.
5     The Summary screen indicates that the LUNs are migrating to the SATA drives using a RAID 6 14+2 protection type. Click Finish to start the migration.

44

Chapter 5: Storage Optimization

Checking the migration status
Use the LUN Migration wizard in SMC to monitor the migration.

Step  Action
1     Navigate to the Symmetrix Arrays > Migration Session folder and select the appropriate session.
2     Click the Properties tab to view device information.
3     Click the RAID Group Info tab to view the properties of the device being migrated. Properties include the source RAID type and the destination RAID type. In the example shown below, you can see a primary mirror of RAID 1 and a secondary mirror of RAID 5 (7+1). This is normal during a migration. When the migration completes, there will be only a primary mirror of RAID 5 (7+1).

45

Chapter 6: Test and Validation

Overview

Introduction
End-to-end testing of the entire infrastructure was performed to validate the achievable performance levels for this solution. Performance is measured across five key phases:
- Baseline performance
- OLTP application migration
- Storage tiering
- Replication Manager job cycle
- SQL Server database restore and recovery with the V-Max SRDF/CE integrated software

Contents
Topic                                                                          See Page
Section A: Testing methodology                                                 47
Section B: Baseline performance summary                                        53
Section C: OLTP application migration test results summary and recommendations 57
Section D: Storage tiering test results summary and recommendations            61
Section E: Replication Manager test results summary and recommendations        66
Section F: Failover clustering with the Symmetrix V-Max SRDF/CE integrated
  software test results summary and recommendations                            69

46

Chapter 6: Test and Validation

Section A: Testing methodology

Overview

Introduction
This section describes the key components of the test configuration.

Contents
Topic                                           See Page
Generating the workload for testing             48
Database components and test configuration      49

47

Chapter 6: Test and Validation

Generating the workload for testing

SQL load test tool
The SQL load test tool used in this environment simulates an OLTP workload. It comprises a set of transactional operations designed to exercise system functionality in a manner representative of a complex OLTP application environment.

OLTP workloads
The OLTP application used to generate user load in this test environment is based on the TPC Benchmark E (TPC-E) standard. TPC-E testing is composed of a set of transactions that represent the processing activities of a brokerage firm. The database schema, data population, transactions, and implementation rules have been designed to be broadly representative of modern OLTP systems. The TPC-E application models the activity of a brokerage firm by:
- Managing customer accounts
- Executing customer trade orders
- Tracking customer activity with financial markets

For further clarification, see Chapter 6 > Section A > Database components and test configuration for a detailed graphic that illustrates the three sets of database tables used in the test configuration (customer data, broker data, and market data).

48

Chapter 6: Test and Validation

Database components and test configuration

Key components of the OLTP test environment
This benchmark is composed of a set of transactions that are executed against three sets of database tables that represent market data, customer data, and broker data. A fourth set of tables contains generic dimension data such as zip codes. The following diagram illustrates the key components of the test environment.

Logical drive functionality and configuration
The following table details how the logical drives function in the test environment.

Note: The standard LUN size configured on the Symmetrix V-Max used in solution testing was 22.5 GB. However, it is possible to configure the LUNs using a smaller size.

Function                  Size      Number of LUNs   RAID type
Filesystem mount points   1.9 GB    2                1
MSDTC storage             22.5 GB   1                1
SQL system databases      22.5 GB   1                1
SQL system logs           22.5 GB   1                1
TempDB data files         22.5 GB   4                1
TempDB log                22.5 GB   1                1

49

Chapter 6: Test and Validation

Application database LUNs
The test environment integrates 23 LUNs that contain database application files, as detailed in the following table. In addition, see Chapter 6 > Section A > Database components and test configuration for a detailed graphic that illustrates the three sets of database tables used in the test configuration (customer data, broker data, and market data).

Function                                    Size      Number of LUNs   RAID type
Application Database Transaction Log file   90 GB     1                1
Database File Broker B0                     135 GB    1                1
Database File Broker B1                     90 GB     1                1
Database File Broker B2                     90 GB     1                1
Database File Broker B3                     90 GB     1                1
Database File Broker B4                     90 GB     1                1
Database File Broker B5                     90 GB     1                1
Database File Broker B6                     90 GB     1                1
Database File Broker B7                     90 GB     1                1
Database File Broker B8                     90 GB     1                1
Database File Broker B9                     90 GB     1                1
Database File Broker B10                    90 GB     1                1
Database File Customer C0                   45 GB     1                1
Database File Customer C1                   22.5 GB   1                1
Database File Customer C2                   22.5 GB   1                1
Database File Customer C3                   22.5 GB   1                1
Database File Customer C4                   22.5 GB   1                1
Database File Customer C5                   22.5 GB   1                1
Database File Customer C6                   22.5 GB   1                1
Database File Customer C7                   22.5 GB   1                1
Database File Customer C8                   22.5 GB   1                1
Database File Customer C9                   22.5 GB   1                1
Database File Customer C10                  22.5 GB   1                1

Partitioning the SQL database
SQL table partitioning is used to segment data into smaller, more manageable sections. Table partitioning can lead to better performance through parallel operations. The performance of large-scale operations across extremely large data sets (for instance, many millions of rows) can benefit by performing multiple operations against individual subsets in parallel. The number of table partitions to allocate depends on:
- Table size
- LUN utilization

The broker and customer file groups for this application are the largest and best candidates for partitioning. For more information on the file groups used in testing, see Chapter 6 > Section A > Database components and test configuration for a detailed graphic that shows the three sets of database tables used (customer data, broker data, and market data).

50

Chapter 6: Test and Validation

Broker and customer file groups
The following table details the file groups used in testing. For further clarification, see Chapter 6 > Section A > Database components and test configuration for a detailed graphic that illustrates the three sets of database tables used in the test configuration (customer data, broker data, and market data).

Note: The OLTP application's storage is configured across the FC drives using RAID 1 protection.

File group name    Table names                                    Drive (directory with mount point)
broker_fg1-10      CASH_TRANSACTION, SETTLEMENT, TRADE,           M:\Broker\B1~B10
                   TRADE_HISTORY
customer_fg1-10    HOLDING, HOLDING_HISTORY                       M:\Customer\C1~C10
broker_fg          CHARGE, COMMISSION_RATE, TRADE_TYPE,           M:\Broker\B0
                   TRADE_REQUEST, BROKER
customer_fg        ACCOUNT_PERMISSION, CUSTOMER,                  M:\Customer\C0
                   CUSTOMER_ACCOUNT, CUSTOMER_TAXRATE,
                   HOLDING_SUMMARY
market_fg          EXCHANGE, INDUSTRY, SECTOR, STATUS_TYPE,       M:\MKT DB
                   COMPANY, COMPANY_COMPETITOR, DAILY_MARKET,
                   FINANCIAL, LAST_TRADE, NEWS_ITEM, NEWS_XREF,
                   SECURITY, WATCH_ITEM, WATCH_LIST
misc_fg            TAXRATE, ZIP_CODE, ADDRESS                     M:\MKT DB
Tempdb             -                                              Y:\TEMPDB1~4
Transaction Log    -                                              L:\

51

Chapter 6: Test and Validation

Logical drive configuration
The following table details the logical drives used in the application test environment. See Chapter 6 > Section A > Database components and test configuration for a detailed graphic that illustrates the three sets of database tables used in the test configuration (customer data, broker data, and market data).

Disk                  File group        Size
M:\Broker\B0          broker_fg         135 GB
M:\Broker\B1-B10      broker_fg1-10     90 GB each
M:\Customer\C0        customer_fg       45 GB
M:\Customer\C1-C10    customer_fg1-10   22.5 GB each
M:\MKT DB             misc_fg           45 GB

52

Chapter 6: Test and Validation

Section B: Baseline performance summary

Overview

Introduction
A performance baseline of the SQL Server application is measured prior to introducing SRDF/CE, storage tiering, and EFDs into the environment. Baseline performance metrics identified:
- CPU utilization
- Database TPS
- Read/write activity
- IOPS and latency values for each database LUN

Contents
Topic                                  See Page
Baseline performance test profile      54
Baseline performance test results      55

53

Chapter 6: Test and Validation

Baseline performance test profile

Summary
A performance baseline of the SQL Server application is determined using the following profile:
- Simulated user load is provided from a utility server that initiates transactions against the database.
- The database contains 1.7 TB of data supporting 75,000 users.
- The simulated workload uses a 1 percent concurrency rate (roughly 750 of the 75,000 configured users active at any time) and zero think time, consistent with the Microsoft testing framework.

54

Chapter 6: Test and Validation

Baseline performance test results

Baseline performance CPU utilization
CPU utilization averages 73 percent, as shown in the following chart.

Baseline performance database activity
The following details database activity:
- The database is processing 2,180 TPS
- A high percentage of activity focuses on broker data
- 80 percent read activity
- 20 percent write activity

55

Chapter 6: Test and Validation

Baseline performance IOPS activity for database LUNs
The following chart shows IOPS activity for the database LUNs.

Baseline performance IOPS and latency values
The average IOPS and latency values for each database LUN are listed below.

Disk            IOPS        Latency
Broker\B0       769.9       9 ms
Broker\B1       1312.6      5 ms
Broker\B2       1319.3      4 ms
Broker\B3       1314.7      4 ms
Broker\B4       1314.7      4 ms
Broker\B5       1320.2      5 ms
Broker\B6       1313.3      5 ms
Broker\B7       1318.5      5 ms
Broker\B8       1320.4      4 ms
Broker\B9       1322.3      5 ms
Broker\B10      1463.1      5 ms
Customer\C0     311.8       6 ms
Customer\C1     51.9        5 ms
Customer\C2     54.7        5 ms
Customer\C3     55.5        5 ms
Customer\C4     53.1        5 ms
Customer\C5     57.5        5 ms
Customer\C6     52.0        5 ms
Customer\C7     58.0        5 ms
Customer\C8     51.8        5 ms
Customer\C9     58.5        5 ms
Customer\C10    184.7       8 ms
MKT DB          90.8        1 ms
Total IOPS:     15,169.3

56

Chapter 6: Test and Validation

Section C: OLTP application migration test results summary and recommendations

Overview

Introduction
The following sections detail the effect of moving the OLTP application database files to EFDs.

Contents
Topic                       See Page
Summary of test results     58
Recommendations             60

57

Chapter 6: Test and Validation

Summary of test results

CPU utilization after moving the application database to EFDs
Moving the application database to EFDs increased performance. The same user load applied in the baseline performance testing is run against the relocated application database. CPU utilization increased from 73 percent to 95 percent, as shown in the following image. This shows that the storage is now capable of processing more TPS, which translates to increased CPU load on the SQL Server.

58

Chapter 6: Test and Validation

Increased IOPS and improved disk latency
The table below indicates a dramatic difference in IOPS for the database LUNs after migration completes. Average disk latency is between 2 ms and 4 ms.

                 LUNs on FC drives         LUNs on EFDs
LUN              IOPS        Latency       IOPS        Latency
Broker\B0        769.9       9 ms          1427.0      2 ms
Broker\B1        1312.6      5 ms          3083.5      2 ms
Broker\B2        1319.3      4 ms          2083.5      2 ms
Broker\B3        1314.7      4 ms          2380.5      2 ms
Broker\B4        1314.7      4 ms          2179.3      2 ms
Broker\B5        1320.2      5 ms          2485.0      3 ms
Broker\B6        1313.3      5 ms          2535.1      2 ms
Broker\B7        1318.5      5 ms          2877.8      2 ms
Broker\B8        1320.4      4 ms          2755.3      2 ms
Broker\B9        1322.3      5 ms          3083.5      3 ms
Broker\B10       1463.1      5 ms          1427.0      3 ms
Customer\C0      311.8       6 ms          147.3       3 ms
Customer\C1      51.9        5 ms          157.2       3 ms
Customer\C2      54.7        5 ms          157.2       2 ms
Customer\C3      55.5        5 ms          173.9       2 ms
Customer\C4      53.1        5 ms          157.5       3 ms
Customer\C5      57.5        5 ms          144.2       3 ms
Customer\C6      52.0        5 ms          112.8       2 ms
Customer\C7      58.0        5 ms          149.4       2 ms
Customer\C8      51.8        5 ms          122.2       2 ms
Customer\C9      58.5        5 ms          122.5       2 ms
Customer\C10     184.7       8 ms          683.4       2 ms
MKT DB           90.8        1 ms          203.3       1 ms
Total IOPS:      15,169.3                  28,648.4

Reduction in physical disks
The number of physical disks is reduced from 128 FC drives to 8 EFDs with an increased transaction rate of 2,753 TPS.

59
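A quick way to quantify the table above is to compute the change in aggregate IOPS between the FC and EFD configurations. The short sketch below uses only the two totals reported in the table.

```python
baseline_total_iops = 15_169.3   # 128 FC drives (baseline)
efd_total_iops = 28_648.4        # 8 EFDs (after migration)

increase = efd_total_iops - baseline_total_iops
percent = increase / baseline_total_iops * 100

# Roughly an 89 percent increase in aggregate IOPS, delivered by 8 EFDs
# in place of 128 FC drives.
print(f"IOPS increase: {increase:,.1f} ({percent:.0f}% over baseline)")
```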

Chapter 6: Test and Validation

Recommendations

OLTP application migration to EFDs
OLTP application migration testing indicates that customers should:
- Use EFDs to increase IOPS while consolidating data onto a smaller footprint.
- Use EFDs for read-intensive database partitions, but not for all database partitions. For example, excellent performance can be achieved by placing less utilized database partitions on less expensive media, while reserving EFDs for heavily utilized partitions.
- Analyze OLTP workload activity to identify the partitions that would benefit most from using EFDs.

60

Chapter 6: Test and Validation

Section D: Storage tiering test results summary and recommendations

Overview

Introduction
The following sections detail the effect of moving the application database files across EFDs, FC drives, and SATA drives.

Contents
Topic                       See Page
Summary of test results     62
Recommendations             65

61

Chapter 6: Test and Validation

Summary of test results

Performance results CPU utilization
The following chart details CPU utilization after storage tiering is introduced into the test environment. The same user load applied in the baseline performance testing is run against the relocated application database; CPU utilization increased from 73 percent to 81 percent.

62

Chapter 6: Test and Validation

IOPS and latency
The following table compares IOPS and disk latency before storage tiering is implemented in the test environment with IOPS and disk latency after storage tiering.

                  Before tiering                       After tiering
LUN               IOPS       Latency   Storage type    IOPS       Latency   Storage type
Broker\B0 (1)     769.9      9 ms      FC              928.1      2 ms      Flash drive
Broker\B1 (1)     1312.6     5 ms      FC              1596.8     3 ms      Flash drive
Broker\B2         1319.3     4 ms      FC              1512.7     3 ms      Flash drive
Broker\B3         1314.7     4 ms      FC              1600.3     3 ms      Flash drive
Broker\B4 (1)     1314.7     4 ms      FC              1404.1     2 ms      Flash drive
Broker\B5         1320.2     5 ms      FC              1515.4     3 ms      Flash drive
Broker\B6         1313.3     5 ms      FC              1497.7     2 ms      Flash drive
Broker\B7         1318.5     5 ms      FC              1499.7     2 ms      Flash drive
Broker\B8         1320.4     4 ms      FC              1494.3     2 ms      Flash drive
Broker\B9         1322.3     5 ms      FC              1600.1     4 ms      Flash drive
Broker\B10        1463.1     5 ms      FC              1644.2     4 ms      Flash drive
Customer\C0 (2)   311.8      6 ms      FC              299.1      5 ms      FC
Customer\C1       51.9       5 ms      FC              52.3       5 ms      SATA
Customer\C2       54.7       5 ms      FC              46.5       5 ms      SATA
Customer\C3       55.5       5 ms      FC              47.1       5 ms      SATA
Customer\C4       53.1       5 ms      FC              58.8       6 ms      SATA
Customer\C5       57.5       5 ms      FC              59.6       6 ms      SATA
Customer\C6       52.0       5 ms      FC              44.3       5 ms      SATA
Customer\C7       58.0       5 ms      FC              53.8       5 ms      SATA
Customer\C8       51.8       5 ms      FC              46.5       5 ms      SATA
Customer\C9       58.5       5 ms      FC              59.8       5 ms      SATA
Customer\C10 (2)  184.7      8 ms      FC              204.0      6 ms      FC
MKT DB            90.8       1 ms      FC              93.1       2 ms      SATA
Total IOPS:       15,169.3                             17,358.3

(1) Note the improvement in latency by moving these LUNs to EFDs.
(2) Note the improvement in latency on FC drives after the busier LUNs are moved to EFDs. Increased performance for the remaining LUNs is observed (as there is less demand on the FC drives).

63

Chapter 6: Test and Validation

Performance results number of physical disks reduced
Solution testing validated that the number of physical disks is reduced after storage tiering is implemented. The original count of 128 FC drives is reduced to 56 drives, as follows:
- 8 EFDs
- 16 FC drives
- 32 SATA drives

Storage tiering consolidated resources significantly. Additionally, the tiered configuration, with the busiest LUNs migrated to EFDs, delivered a marked increase in performance with an improved transaction rate of 2,605 TPS. This represents an increase of 19 percent (425 TPS) over the baseline.

64

Chapter 6: Test and Validation

Recommendations

Storage tiering summary
Storage tiering testing indicates that customers should:
- Understand utilization patterns prior to introducing storage tiering into their environment.
- Use EFDs to support the busiest database partitions.
- Use SATA drives to support the less frequently accessed partitions.
- Leverage the available storage tiers to achieve higher performance levels than is possible with a single FC tier.

65

Chapter 6: Test and Validation

Section E: Replication Manager test results summary and recommendations

Introduction
The following sections detail the effect of using Replication Manager to maintain storage replicas of the OLTP application database, providing recovery points in case of site failure or data corruption.

Contents
Topic                       See Page
Summary of test results     67
Recommendations             68

66

Chapter 6: Test and Validation

Summary of test results

8-hour operational test cycle
Testing was performed to validate a full 8-hour cycle that consisted of:
- User activity (a typical user load, accessing and updating databases)
- An hourly point-in-time snapshot of the database
- Regular database maintenance that reorganizes the indexes and updates statistics

Note: After the 8-hour test cycle, a full-copy clone of the database is taken and mounted to an alternate host.

Impact on daily application activity is monitored
The 8-hour operational test is monitored during the Replication Manager cycle. Minimal impact to SQL Server application performance is observed during this timeframe.

Replication Manager job performance results
The Replication Manager job performance test results presented in the table below indicate that replicas are created and recovered within a two-hour window:

Item                                               Result
SQL Server data set size                           1.7 TB
Average time for Replication Manager snap job      10 min
Average time for Replication Manager clone job     65 min
SQL recovery from a snapshot job                   12 min to restore the data and replay the logs
SQL recovery from a clone job                      1 hour, 55 min to restore the data and replay the logs

67
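The job timings above can be checked against the hourly snapshot schedule with simple arithmetic; the sketch below performs that check using only the averages reported in the table.

```python
SNAP_JOB_MINUTES = 10        # average Replication Manager snap job
CLONE_JOB_MINUTES = 65       # average Replication Manager clone job
SNAP_INTERVAL_MINUTES = 60   # hourly point-in-time snapshots

# Fraction of each hourly cycle spent creating the snapshot replica.
snap_overhead = SNAP_JOB_MINUTES / SNAP_INTERVAL_MINUTES
print(f"Snapshot job occupies {snap_overhead:.0%} of each hourly cycle")

# The clone job is longer than one snapshot interval, which is consistent with
# running it once per day, at the end of the 8-hour operational cycle.
print(f"Clone job runs {CLONE_JOB_MINUTES} min "
      f"({CLONE_JOB_MINUTES - SNAP_INTERVAL_MINUTES} min longer than one snap interval)")
```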

Chapter 6: Test and Validation

TimeFinder/Snap and TimeFinder/Clone performance
The following chart represents the impact of TimeFinder/Snap snapshots on a LUN's response time during snapshot activity. The greatest impact to performance occurs during:
- Snapshot activation
- Snapshot termination

This finding is significant because performance is affected only while a snapshot is being activated or terminated. While the snapshot is active, no impact to performance is observed.

Recommendations

Replication Manager summary
Based on observations, replication testing indicates that:
- Recovery point snapshots can be taken at regular intervals without a significant impact on database performance.
- A clone is necessary to have a full, independent copy of the database in the event of loss of the source device.
- A transaction log backup with the truncate option must be performed after the clone/snap.

68