EMC AVAMAR FOR HIGHLY SCALED NETWORK-ATTACHED STORAGE (NAS) ENVIRONMENTS


White Paper

EMC AVAMAR FOR HIGHLY SCALED NETWORK-ATTACHED STORAGE (NAS) ENVIRONMENTS
Application and best practices of the Avamar NDMP Accelerator for NAS backup and recovery

Abstract

This white paper outlines configuration and sizing examples, growth management, and reporting of the EMC Avamar NDMP Accelerator in an environment with many terabytes of network-attached storage (NAS) capacity and many millions of files provisioned from multiple NAS systems. The uniquely efficient approach to backup and restore makes Avamar one of the best methods in the industry for protecting highly scaled NAS environments.

April 2011

Copyright 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

Part Number h8237

Table of Contents

Executive summary
Intended audience
NDMP backup solution
Suitability of the Avamar NDMP Accelerator for highly scaled-out NAS environments
Capacity planning and growth management
Native NAS system deduplication and Avamar NDMP backups
Expanding stream counts
Expanding to find total effective protected data
Expanding the configuration
Scenario
Observations from the example
Solution for the example
Notes and considerations for the solution
Points of measurement and capacity planning
Conclusion
References
Appendix
Gathering data
Additional calculations for sizing
Additional useful logs for NDMP backups

Executive summary

The EMC Avamar solution for NAS backup and recovery uses the Avamar NDMP Accelerator to deliver fast, daily full backups and one-step recovery. Unlike other solutions in the industry, Avamar stores NAS backup data on resilient, enterprise-class systems for extended retention and simple, granular-level restore.

This white paper describes the architecture, processes, performance and growth expectations, best practices in reporting, and fine-tuning suggestions for use in large NAS environments. The reader should already understand that the Avamar NDMP Accelerator is well suited for very fast, scalable backups and restores of NAS systems. Beyond the business reasons associated with deployment, this paper also describes why this solution can provide a much more reliable and scalable approach to NAS backups than traditional backup-to-tape, snapshot retention and replication, or other methods.

Intended audience

This white paper is intended for experienced Avamar administrators who either already have NDMP Accelerators deployed or intend to deploy them soon. It does not cover the fundamentals of NAS NDMP backups or the challenges associated with legacy, scaled-out NAS backups. Readers should understand both before reading this paper.

EMC Avamar represents an innovative, contemporary approach to NAS system backups that supports the high degree to which companies are growing their NAS systems. This paper provides best practices gathered by EMC BRS Integration Labs, and from the empirical data and feedback provided by valued EMC customers who are already leveraging the NDMP Accelerator to qualify, size, configure, tune, manage, and grow an enterprise NAS system backup solution.

NDMP backup solution

Given the performance results outlined in the best practices white paper EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP, we will discuss scaled-out configuration, sizing, and management best practices beyond those outlined in the EMC Avamar System Administration Guide.

Figure 1 depicts a large NAS environment with a configuration of multiple EMC Avamar NDMP Accelerators that service backups from the NAS systems to multiple Avamar grids.

Figure 1. Multiple NAS systems with multiple NDMP Accelerators

Notice in the figure that multiple NAS devices are depicted on the far left, with gigabit Ethernet connections into the EMC Avamar NDMP Accelerators. As depicted, fewer NDMP Accelerators may be needed to service a given number of NAS systems, or vice versa. The relationship is not 1:1.

Data is deduplicated at the NDMP Accelerator, usually by 96 percent to 99.9 percent, depending on the data type and change rate being serviced for backups. Because data is so intensely deduplicated, bandwidth requirements between the NDMP Accelerators and the final destination Avamar grid are low, as shown in Figure 1, where a wide area network can be leveraged. This makes the NDMP Accelerator ideal for deployment in remote offices or secondary data centers where Avamar grids are not located.

Suitability of the Avamar NDMP Accelerator for highly scaled-out NAS environments

EMC Avamar supports NetApp Data ONTAP and the EMC Celerra, VNX DART, and VNXe operating systems. Please refer to the latest interoperability matrix, which is posted on the EMC Powerlink website, for the most up-to-date information.

NAS systems vary in CPU, memory, I/O connectivity, and cluster configurations. There are many different configurations supporting a number of protocols including iSCSI, Fibre Channel SAN, FCoE, NFS, Common Internet File System (CIFS), FTP, and HTTP. Both file- and block-based storage can be serviced from these various protocols.

It is important to note that for NetApp, the Avamar NDMP Accelerator can be used for any storage type (block or file) serviced by any of the protocols. However, file-level granularity of the backups and restores is only possible with NAS or file-based data. When LUNs are backed up and restored, the entire LUN is captured. Files within the LUN cannot be identified by using NDMP. For LUN backups, it is best to install the Avamar client software on the host to which the LUN is provisioned.

With EMC Celerra or VNX, only file-based data serviced via the NFS and CIFS protocols is supported with NDMP backups. File-level granularity for restores is fully supported.

Because NAS systems can be so diverse in configuration and size, there is no clear answer or definitive method for sizing backups given a specific model of NAS system and related capacity. Therefore, gathering information from a number of sources, and possibly running some exploratory baseline testing, is the best way to ensure proper sizing with the lowest risk of mistakes.

An Avamar NDMP backup solution can be applied to almost any EMC Celerra or VNX system or NetApp FAS system that supports NDMP, regardless of the capacity. Many very large companies with large amounts of capacity serviced by NAS systems are already backed up by Avamar today. However, the variables identified in the following section and example will help determine the achievable backup window and the connectivity of the NDMP Accelerators.

Capacity planning and growth management

When planning for Avamar growth, standard reports can be generated from the usual places within Avamar such as the Enterprise Manager, the Avamar administrative console, and the MCCLI command set. EMC Data Protection Advisor (DPA) can be leveraged for mining report data for historical purposes and for cross-referencing or correlating data points with other elements of the data storage infrastructure.
A major advantage to leveraging DPA is that NAS system data can also be collected for a variety of uses, including:

- To correlate performance trends to backup window duration
- To identify performance bottlenecks
- To correlate job failures to specific elements within a solution, such as:
  - Network ports
  - System CPU and memory consumption
  - Snapshot activity
  - Snapshot space
  - Avamar system maintenance

DPA was developed to maintain data for historical reporting and trend analysis. Note that Avamar's purpose is to maintain high-performance backups and replication of data, not to maintain reports or logs for long periods of time. One will find only the most basic client, system, job, and deduplication trending reports in the Avamar console.

NAS systems tend to be landing areas for many tiers of data that can grow uncontrollably beyond expectations, which can be due to:

- Policies that allow autogrowth of underlying filesystems or volumes
- Lack of leveraging quotas
- Mass data migrations, such as the movement of all departmental data shares to centralized NAS systems

Tools to migrate older files off to long-term media storage, such as EMC File Management Appliance, can be leveraged to help manage over-retention of data files.

Since absolute resource limitations and rupture points were not found in our testing results from small test samples, the results must be extrapolated to find the maximum number of files and capacity that can be backed up by a single NDMP Accelerator. This is a plausible exercise because the maximum number of streams serviced by the NDMP Accelerator used in our testing did not tax the unit beyond its available resources. Some amount of extrapolation is therefore reasonable, given that the Avamar NDMP Accelerator is capable of supporting eight parallel streams, with round-robin load balancing across the available filesystems or volumes to be backed up.

Native NAS system deduplication and Avamar NDMP backups

Both the EMC Celerra/VNX and NetApp FAS systems can perform a level of native file-level deduplication and compression. These processes incur overhead on the NAS systems, so backup schedules should not interfere with native NAS system deduplication schedules or policies. Be mindful of the target capacity: the restoration of deduplicated files may consume more than the allocated space.
Flexibility in the DART OS code for EMC Celerra and VNX allows rehydration (decompression) of files that had been previously compressed by Celerra data deduplication, meaning that NDMP backups will not send the compressed files to the Avamar NDMP Accelerator. This is good for the NDMP Accelerator since compressed files do not deduplicate well. However, data is sent to the NDMP Accelerator in an uncompressed state, which may result in slower backups for the initial (level 0) backup. To enable this, set the switch dedupe.backupdatathreshold=0, which will rehydrate the deduplicated files prior to sending. This switch must be set per file system. Upon restore, the data is recompressed.

The following is an example of how to configure this. To set the backup data threshold for VNX File Deduplication and Compression on a Celerra Data Mover or VNX blade, use this command syntax:

$ fs_dedupe -default -set {<movername> | -all} -backup_data_threshold <percent>

<movername> = name of the Data Mover
<percent> = full percentage that a deduplicated file has to be below in order to trigger space-reduced backups for NDMP

For example, when set to 90, any deduplicated file whose physical size (compressed file plus changed blocks) is greater than 90 percent of the logical size of the file will have the entire file data backed up without attempting to back it up in a space-reduced format. Any deduplicated file whose physical size is less than 90 percent of the logical file size will be backed up in a space-reduced format. Setting this value to 0 disables space-reduced backup. The range of values is 0 to 200, and the default value is 90 percent.

Future developments to the VNX Operating Environment (the successor to DART version 6) include improvements for servicing incremental-forever methodology backups that relieve a large amount of resource consumption on the NDMP Accelerator. For this, we expect increases in throughput performance (not part of the tests outlined in this paper) when customers upgrade to future releases of VNX OS/DART 7 and Avamar NDMP Accelerator code. Additional performance and process improvements for handling natively deduplicated files will be included in future releases of the Avamar NDMP client code.

Expanding stream counts

During the baseline laboratory testing outlined in the white paper EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP, memory was not overly taxed. No more than 7 GB of memory was observed to be consumed during backups of the test sample, with four streams utilized. Memory resources were quickly released once the backup jobs completed. Therefore, an accelerator with 36 GB of memory should not develop a bottleneck from lack of memory. It is safe to assume that a 36 GB accelerator node could handle more streams than were identified in this study.
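Returning briefly to the backup data threshold described above for VNX File Deduplication and Compression, the rule can be sketched as a small decision function. This is only an illustration of the rule as stated in this paper, not code from the Data Mover. The function name and parameters are hypothetical, and the handling of a file whose physical size is exactly equal to the threshold is an assumption, since the text only covers "greater than" and "less than".

```python
def ndmp_backup_format(physical_bytes: int, logical_bytes: int,
                       threshold_percent: int = 90) -> str:
    """Illustrative model of the backup data threshold rule.

    Returns "space-reduced" or "full" for one deduplicated file.
    Hypothetical helper; names are not part of any EMC tool.
    """
    if not 0 <= threshold_percent <= 200:
        raise ValueError("threshold must be in the documented range 0 to 200")
    if logical_bytes <= 0:
        raise ValueError("logical size must be positive")

    # A threshold of 0 disables space-reduced backup entirely.
    if threshold_percent == 0:
        return "full"

    physical_percent = 100.0 * physical_bytes / logical_bytes
    # Assumption: a file exactly at the threshold is treated as "full".
    if physical_percent < threshold_percent:
        return "space-reduced"
    return "full"


if __name__ == "__main__":
    print(ndmp_backup_format(95, 100))      # physical size above 90 percent
    print(ndmp_backup_format(50, 100))      # physical size below 90 percent
    print(ndmp_backup_format(50, 100, 0))   # threshold 0 disables space reduction
```

With the default of 90, the first call falls above the threshold and the second below it, which mirrors the two cases in the example above.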
The overconsumption of memory, however, has dramatic performance implications because it causes a system to begin swapping memory pages to disk swap space. This situation severely impairs performance, and memory should be constantly monitored if one has either:

- Exceeded the recommended number of streams
- Added more accounts to the NDMP Accelerator that would incur additional streams (avtar processes)

In that same study, CPU utilization reached peaks of 70 percent consumed (maximum of 100 percent as measured). Therefore, one can assume that 30 percent additional processing power is available. It is more likely that an expanded NDMP Accelerator (one with more streams added) would run out of processing power before volatile memory resources. Applying the additional CPU overhead percentage to the conservative backup rate of 2.48 million files per hour in high file count environments, and 230 GB per hour in large file size environments, results in improved backup rates of 3.22 million files per hour and 299 GB per hour, respectively. This assumption, however, has not yet been substantiated. The influence of additional streams on the backup jobs was not indicative of a linear relationship. In addition, one must fully leverage all available streams to realize increased throughput.

Cautionary Note: Although small-scale tests show that the NDMP Accelerator can process 4.96 million files per hour for four streams, that rate was observed for only a small dataset (approximately 3.5 million files total) with 10 percent change. When the dataset was expanded to roughly 15 million total files, the aggregated backup rate dropped to only 3.1 million files per hour for four streams. This is likely due to the preparation time required by the Celerra to scan changed inodes and assemble them for the dump stream. Therefore, NAS systems can be a limiting factor for backups of very dense filesystems. And, since NAS models vary in CPU, memory, and I/O capability (both disk and RAID types), it is nearly impossible to quantify throughput behavior for all possible combinations.

Expanding to find total effective protected data

Using the values found in the EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP white paper for high-capacity, relatively moderate file count environments with mixed file types, 150 to 164 GB per hour was identified for a single stream, and up to 230 GB per hour if all four streams are leveraged appropriately. Expanding that to the best practices window of 5 hours (plus 5 additional hours for replication, for a total duration of 10 hours), with all four streams utilized, for a 10 percent change-rate environment, equates to a maximum protected capacity or effective backup of 11.5 TB per NDMP Accelerator.
And, expanding the maximum file backup rate for four streams through a 5-hour backup window, for a 10 percent change-rate environment, equates to a maximum protected capacity or effective backup range of 124 million to 155 million total files (with all available streams utilized) per NDMP Accelerator.

To summarize the findings for one NDMP Accelerator by using daily incremental (level 1) backups:

System                      Number of streams  Throughput (per hour)  Backup window  % Daily change rate  Effective total protected data
Sizing for capacity         4                  230 GB                 5 hours        10%                  11.5 TB
Sizing for high file count  4                  3.1 M files            5 hours        10%                  124 to 155 M files
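The arithmetic behind these figures can be reproduced with a short sketch. It assumes, as the text implies, that the effective protected total is the amount backed up in one window divided by the daily change rate, and that the CPU headroom estimate simply scales the measured rate by the unused CPU fraction. The function names are hypothetical and exist only for this illustration.

```python
def effective_protected(throughput_per_hour: float, window_hours: float,
                        daily_change_rate: float) -> float:
    """Total data (or files) one accelerator can protect, given that only
    the changed fraction has to move through the backup window."""
    if not 0 < daily_change_rate <= 1:
        raise ValueError("daily_change_rate must be a fraction in (0, 1]")
    return throughput_per_hour * window_hours / daily_change_rate


def with_cpu_headroom(rate_per_hour: float, peak_cpu_fraction: float) -> float:
    """Scale a measured rate by the CPU fraction left unused at peak.
    As noted in the paper, this extrapolation has not been substantiated."""
    return rate_per_hour * (1 + (1 - peak_cpu_fraction))


if __name__ == "__main__":
    # Capacity sizing: 230 GB per hour, 5-hour window, 10 percent change rate.
    capacity_gb = effective_protected(230, 5, 0.10)
    print(f"Protected capacity: {capacity_gb / 1000:.1f} TB")

    # File count sizing: 2.48 M to 3.1 M files per hour over the same window.
    low = effective_protected(2.48, 5, 0.10)
    high = effective_protected(3.1, 5, 0.10)
    print(f"Protected files: {low:.0f} M to {high:.0f} M")

    # CPU headroom estimate with peaks of 70 percent utilization.
    print(f"{with_cpu_headroom(2.48, 0.70):.2f} M files per hour")
    print(f"{with_cpu_headroom(230, 0.70):.0f} GB per hour")
```

Changing the window or the change rate in these calls shows how strongly the effective protected total depends on both, which is why the two sizing rows should be treated as starting points rather than limits.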

Do not apply both of these values together when sizing. Instead, identify the filesystems, data types, and distribution of data types among the filesystems to set accurate throughput and backup window duration expectations. These foundational values can be changed based on backup window and replication needs, and subsequently used to expand the configuration for environments with many large NAS systems. An example is outlined in a following section.

Note also that the backup window and % daily change rate can be varied to increase or decrease the effective total amount of protected data. A typical NAS environment is not expected to experience such a high (10 percent) change rate, but care should be taken with data types stored in the NAS system that may not deduplicate well, such as PST files, VMDK images, compressed files, and databases. See the Additional considerations for sizing section of the EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP white paper for an example to assist with sizing for these data types.

Expanding the configuration

NDMP Accelerators can be linearly added to the configuration of a NAS backup system or set of systems. However, the accelerators do not interoperate with any sort of data or configuration transfer between them. It is not necessary. The global deduplication process, which is the final step in the complete deduplication process, interacts with the Avamar target server or grid to query for duplicate segment IDs. Since data segments and related hash IDs inside the Avamar grid are not segregated, all data segments from all clients are compared to one another. Therefore, having multiple NDMP Accelerators will continue to provide outstanding deduplication, without the need for cross-accelerator communications.

Data can also be restored through an alternate account, such as an alternate NDMP Accelerator, provided that the same plug-in type (Celerra or NetApp) is leveraged. Requiring no advanced configuration or clustering, NDMP Accelerators are easily added to support very large NAS environments.

Scenario

The BlueBell Company has 18 NAS systems of varying size, vendors (EMC, NetApp, and Microsoft WSS Storage Server), models, OS levels, capacity, and connectivity. A thorough collection of information, as outlined in the Proper sizing guidelines section of the white paper EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP, produced the data listed in the following table:

System   Capacity (TB)   Number of filesystems   Number of files   Dept share   Database   PST
10 EMC Celerra or VNX   130   45   28 MM   80%   20%
4 NetApp FAS   46   32   12 MM   90%   10%
4 Microsoft WSS   5   39   32 MM   100%

- Types of data included simple departmental shares, VMDK images, PST files, and databases, as outlined in the table.
- The change rate of the data averages 3 percent.
- The NAS system models vary, but all were recently refreshed.
- All systems are currently backed up by using NDMP dump, local tape, or proxy backup over the network.
- NAS operating system versions are the most recent for each type.
- The expected service-level agreement (SLA) for the backup window is 7 hours for all data types.
- The customer already has five DS18 3.3 TB node Avamar grids, with varying available capacity.

We can immediately eliminate the WSS Storage Servers from planning, since the NDMP protocol is not leveraged to back up these systems. Instead, the Avamar client is installed locally on the WSS Storage Server for backup and restore. The Avamar client for WSS Storage Servers is intelligent enough to identify single-instance storage (SIS) pointers without reconstituting the entire dataset during backup. Therefore, this client software should be deployed rather than the NDMP Accelerator.

Observations from the example

- The NDMP Accelerator can be used simultaneously by both the NetApp FAS and the EMC Celerra or VNX systems.
- The collected data points can be summarized for sizing purposes. The 14 NAS systems, 156 TB, 77 filesystems, and 40 million files break down as follows:
  - 10.6 TB of unstructured data (DB and PST) distributed over 6 filesystems
  - 145.4 TB of flat file data distributed over a total of 71 filesystems
  - The majority of the 40 million files are scattered throughout the 71 filesystems of flat files (less than 1 percent of the total files are DB or PST)
- The change rate of 3 percent equates to the following data sent to the Avamar system:
  - 4.36 TB of changed files (flat files) are sent to the NDMP Accelerators daily. This is 61.4 GB per filesystem.
  - Although only a 3 percent change in DB and PST files is evident on the servers, the NAS system will likely identify a much greater file-level change rate for this capacity. Using a 50 percent rate equates to 5.3 TB of data sent daily from the 6 filesystems, or 883 GB per filesystem. This 50 percent assumption should be adjusted once the NDMP backup job has been observed for these filesystems. If previous NDMP backup jobs were executed, this percentage can be found as the ratio of the number of bytes protected to the number of bytes sent by using the dump or vbb command.
  - Since the 40 million files are well distributed over the 71 filesystems, 3 percent of 40 million files equates to 1.2 million files per day, or approximately 17,000 files per filesystem. If a high file count were concentrated in one filesystem, we might consider a separate policy for that filesystem to run on a separate schedule.
- All NAS systems have supported NDMP backups in the past, and therefore will be able to sustain backups with Avamar as well.
- Standard sizing models (using an Alinean tool that is internal to EMC) can be generated for growth expectations on the existing Avamar grids.

Solution for the example

Each filesystem will receive a stream to execute the backup. The solution for the BlueBell Company can therefore be designed to use four streams per NDMP Accelerator, since the models of NAS systems are mixed. Some filesystems may support eight streams, and some may support fewer than four streams, depending on the NAS system model and existing system load.
With four streams running, and each flat filesystem sending 17,000 files and approximately 62 GB each day, a total of 248 GB is sent simultaneously. Given the benchmark results noted previously, backups should finish in roughly 65 minutes when using the maximum rate of 230 GB per hour for every four filesystems. As stated in this example, the customer has 7 hours for backup, which equates to approximately 26 filesystems being completed per NDMP Accelerator in every backup window. A total of 71 filesystems, with 26 filesystems per NDMP Accelerator, equates to roughly 2.7 NDMP Accelerator units.

For the unstructured data, there are six filesystems sending roughly 900 GB of data per filesystem. At a 200 GB per hour scan rate, the total time per filesystem equals 4.5 hours. For the 7-hour backup window defined by the customer, this allows 1.55 filesystems per backup window. But by splitting the work over multiple accelerators, each with 7 hours available per backup window, we could state that 3.9 Accelerator units would be needed to meet the required backup window.

Combining the Accelerator units above, the total number of NDMP Accelerators equals 2.7 (flat files) plus 3.9 (unstructured data), which equates to 7 NDMP Accelerators (rounded up).

Figure 2. NAS systems backing up multiple filesystems and volumes (left) through multiple NDMP Accelerators (center) to multiple Avamar Data Stores (right)

Figure 2 outlines a simplified view of the designed solution. CIFS shares and NFS exports are provided to end-user systems by the NAS systems on the left. Avamar grids on the right provide backup services for standard clients and databases, as well as NDMP NAS backups. NDMP Accelerators, in the center, provide deduplication, backup, and restore services for data stored on the NAS systems. Centralized management is provided by the Avamar console (for consolidated policies) and Enterprise Manager (for multiple Avamar systems).
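The accelerator count for the BlueBell example can be reproduced with a short sketch. It follows the reasoning in this section: flat filesystems are backed up in batches of four parallel streams that share the aggregate rate, while each unstructured filesystem is scanned on its own at the stated scan rate. The function names are hypothetical, and the sketch carries unrounded intermediate values, so it lands on the same rounded figures as the text without rounding at each step.

```python
import math


def flat_accelerators(filesystems: int, gb_per_fs: float, streams: int,
                      aggregate_gb_per_hour: float, window_hours: float) -> float:
    """Fractional accelerators needed when `streams` filesystems run in
    parallel and share the aggregate throughput."""
    batch_hours = streams * gb_per_fs / aggregate_gb_per_hour
    fs_per_window = streams * window_hours / batch_hours
    return filesystems / fs_per_window


def unstructured_accelerators(filesystems: int, gb_per_fs: float,
                              scan_gb_per_hour: float, window_hours: float) -> float:
    """Fractional accelerators needed when each filesystem is limited by
    its own scan rate."""
    hours_per_fs = gb_per_fs / scan_gb_per_hour
    fs_per_window = window_hours / hours_per_fs
    return filesystems / fs_per_window


if __name__ == "__main__":
    flat = flat_accelerators(filesystems=71, gb_per_fs=62, streams=4,
                             aggregate_gb_per_hour=230, window_hours=7)
    unstructured = unstructured_accelerators(filesystems=6, gb_per_fs=900,
                                             scan_gb_per_hour=200, window_hours=7)
    total = math.ceil(flat + unstructured)
    print(f"Flat files:   {flat:.1f} accelerators")
    print(f"Unstructured: {unstructured:.1f} accelerators")
    print(f"Total (rounded up): {total}")
```

Re-running the sketch with an observed change rate in place of the assumed 50 percent for DB and PST data is a quick way to see how sensitive the accelerator count is to that assumption.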

Notes and considerations for the solution

- Do not split filesystem backups over multiple NDMP Accelerators.
- Take care when planning unstructured data backups that will be intermixed with flat file data backups on the same NDMP Accelerator.
- The same type of plug-in (NetApp or EMC) must be used for an account. Multiple accounts can be configured on an accelerator, but care should be taken when leveraging them at the same time.
- Five Avamar grids exist at the customer site. To increase the global deduplication effect and speed of NAS system backups, it is best to identify the Avamar grids with the most available capacity, as well as clients with potential commonality with data on the NAS systems.
- NDMP Accelerators can point to only one Avamar server, but an Avamar server can have multiple NDMP Accelerators backing up to it.
- EMC Technical Consultants should assist customers by leveraging the internal Alinean sizing tool to model the capacity and retention effect on the Avamar grids given the addition of the NAS system capacities.
- Additional NDMP Accelerators can be added as capacity and file count increase, or if the allowable backup window decreases.
- No correlation exists between the quantity of NAS systems and NDMP Accelerators.
- Although level 1 (differential incremental) backups are requested from the NAS systems, a considerable amount of data (the daily changes) must still be sent to the NDMP Accelerator. For this reason, the NDMP Accelerators should be positioned close to the NAS systems, preferably on the same network segment, with the fewest hops between the two endpoints. However, connectivity between the NDMP Accelerators and the Avamar server or grid does not need to be substantial. After deduplication, the amount of data sent to the Avamar grid is very low. This makes the NDMP Accelerator suitable for NAS systems that are located in remote offices with Avamar grids placed in central data centers.

Points of measurement and capacity planning

Avamar supports a number of methods and tools to track capacity over time, as well as to extrapolate growth and identify the points in time when Avamar grid expansion is needed. Some of those native tools include:

- The Manage Reports selection under the Avamar console
- Enterprise Manager
- The mccli command-line generated reports
- Reports provided to internal EMC employees and partners by the Avamar Avalanche call-home system (if configured)

Sample screenshots of those reports are shown in the following figures.

Figure 3. System Capacity as reported in the Avamar console, where Server Utilization is reported as a percentage of the GSAN utilized

Figure 4. Monitoring capacity and generating linear forecasts in Enterprise Manager, under System > Capacity
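As a rough illustration of the kind of linear forecast mentioned for Enterprise Manager, the sketch below fits a least-squares line to utilization samples and estimates how many days remain before a chosen threshold is reached. It is not the method Enterprise Manager itself uses. The function name, the sample values, and the 80 percent planning threshold are all assumptions made for the example.

```python
from typing import Optional, Sequence


def days_until_threshold(days: Sequence[float],
                         utilization_pct: Sequence[float],
                         threshold_pct: float = 80.0) -> Optional[float]:
    """Fit a least-squares line to (day, utilization %) samples and return
    the number of days after the last sample at which the fitted line
    reaches threshold_pct. Returns None when the trend is flat or falling.

    threshold_pct is a hypothetical planning threshold, not an Avamar limit.
    """
    n = len(days)
    if n < 2 or n != len(utilization_pct):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(utilization_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(days, utilization_pct))
    slope = sxy / sxx                      # percentage points per day
    if slope <= 0:
        return None                        # no growth, so no crossing point
    intercept = mean_y - slope * mean_x
    day_at_threshold = (threshold_pct - intercept) / slope
    return max(0.0, day_at_threshold - days[-1])


# Hypothetical samples: utilization grows 2 points every 10 days.
remaining = days_until_threshold([0, 10, 20, 30], [40.0, 42.0, 44.0, 46.0])
```

With these made-up samples the fitted line reaches 80 percent roughly 170 days after the last sample. A forecast like this is only meaningful once the system has reached steady state.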

Figure 5. Avalanche reports sent by EMC's internal Avamar call-home support system

Note that capacity forecasts are linear extrapolations of the current behavior of your Avamar system. Do not trust this trend until the system has reached steady state, which is the point when the longest retention period for the majority of data has been reached and data starts to be expired and deleted. At that point, the growth rate of the Avamar system is represented by the ratio of the number of new blocks coming into the system to the number of old blocks expired and deleted. Expect this ratio to be much lower than it was while new clients were still being added, before steady state was reached.

Identifying capacity trends allows one to plan Avamar grid expansion. It does not, however, allow one to plan the expansion of the number of streams or the number of NDMP Accelerators used for NAS system backups. For this purpose, compare the backup job durations against the expected backup window. These durations can be found in either the Activity Monitor in the Avamar console or the DPN Summary Report. By default, the Activity Monitor deletes jobs older than 72 hours, but it provides an accurate depiction of the number of bytes transferred, the elapsed time for a job, and the time a job waited before it was allowed to start.

For historical trending, leverage the DPN Summary Report. In this report, the duration (in seconds) can be used to determine the average amount of time a particular NDMP backup took to execute. The deduplication rate can be found by averaging the PcntCommon column for a given client or NDMP account over time. Note that individual filesystems within an account (as defined in the NDMP Accelerator) are not identified in this report.

Figure 6. Sample DPN Summary Report

For accurate statistics at a filesystem level, check the log of the specific job under the Activity Monitor. This log depicts the files backed up per hour, total bytes sent, and other statistics, all arranged sequentially as the job proceeded.
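The averaging described above for the DPN Summary Report can be sketched in a few lines of Python, assuming the report has been exported as CSV text. Only the PcntCommon column name comes from this paper. The Host and Seconds column names, the account names, and the sample rows are assumptions for illustration and should be adjusted to match the actual export.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical export; real reports contain many more columns and rows.
SAMPLE = """Host,Seconds,PcntCommon
nas01_acct,3600,98
nas01_acct,4200,99
nas02_acct,1800,95
"""


def summarize(csv_text: str,
              client_col: str = "Host",        # assumed column name
              seconds_col: str = "Seconds",    # assumed column name
              common_col: str = "PcntCommon") -> Dict[str, Tuple[float, float]]:
    """Return {client: (average duration in minutes, average PcntCommon)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # seconds, pcnt common, jobs
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row[client_col]]
        entry[0] += float(row[seconds_col])
        entry[1] += float(row[common_col])
        entry[2] += 1
    return {client: (secs / count / 60.0, common / count)
            for client, (secs, common, count) in totals.items()}


averages = summarize(SAMPLE)
```

Comparing the average duration per account with the backup window shows how much headroom remains before more streams or another NDMP Accelerator is needed.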

The Appendix provides more details on log file locations.

Conclusion

By using an example of a large NAS deployment, and a careful collection of input variables, we were able to provide an assessment for the total number of Avamar NDMP Accelerators needed for the environment. Based on the example and the solution described for NAS environments, the Avamar NDMP Accelerator is an ideal solution for efficient backups of very large NAS deployments. Centralized management, a variety of reporting mechanisms, and the assistance of EMC staff if needed can ensure that a large NAS backup solution that uses Avamar NDMP Accelerators can scale well into the future with high performance, resiliency, and availability.

With Avamar NDMP Accelerators deployed for NAS system backups, decreased capital and operating expenditures can be realized through:

- Extensive Avamar deduplication processes that allow for decreased network consumption
- Fast, daily full backups that only require level 1 dumps, which reduce overhead to NAS systems
- A disk-based grid architecture of the target Avamar system with RAIN for high availability, providing peace of mind
- High-performance multistreaming embedded within the policy configurations that allows for greater attainment of SLAs

These business practices, coupled with the unique technical approach to NAS system backup that leverages a well-developed protocol (NDMP), make the Avamar NDMP Accelerator an industry-proven, cost-effective, scalable solution.

References

The following documents can be found on EMC Powerlink (access required):

- EMC Avamar for Network-Attached Storage (NAS) Backups Using NDMP
- EMC Avamar for NAS Backups: An Overview and Business Case
- EMC Avamar Operational Best Practices Guide
- EMC Celerra Network Server Version 5.5 Command Reference Manual
- Configuring NDMP Backups on Celerra

The following documents can also be helpful:

- NetApp Data ONTAP 7.3 Data Protection Tape Backup and Recovery Guide
- Data Protection Strategies for Network Appliance Storage Systems

Appendix

Gathering data

A number of commands on a NetApp FAS and an EMC Celerra can assist with information gathering.

Gathering the number of kilobytes in Celerra, assuming server_2 is the Data Mover that you are sizing for backup:

    server_df server_2

Gathering the number of inodes or files in Celerra, assuming server_2 is the Data Mover that you are sizing for backup:

    server_df server_2 -inode

Gathering the number of kilobytes in NetApp:

    df

Gathering the number of inodes or files in NetApp:

    df -i

Note: Gather only the Used column of these outputs. Also, do not count Celerra filesystems that are checkpoints (snapshots).

Additional performance data can be gathered at the command line of the NDMP Accelerator. For CPU and memory consumption on the NDMP Accelerator during a backup, use the following commands:

    iostat -xtc 5 10
    vmstat 5 10

Look at the User, Nice, and System columns to see consumed CPU. The %idle column is roughly what remains after these three are subtracted from 100 percent. As shown, these commands run every 5 seconds for 10 iterations to produce an output as shown next.

Additional calculations for sizing

The following are additional math calculations, as outlined in the Solution example section.

For the flat file data, each filesystem sends 17,000 files and approximately 62 GB each day. With four filesystems backed up simultaneously, 248 GB is sent at a time. These figures are derived as follows:

- 3 percent of 40 million files equals 1.2 million files per day. A total of 1.2 million files divided by 71 filesystems equals 16,900, or approximately 17,000 files per filesystem.
- To find capacity sent, 3 percent of 145.4 TB equals 4.362 TB. A total of 4.362 TB divided by 71 filesystems equals 61.4 GB, or approximately 62 GB (rounded up) per filesystem.
- A total of 62 GB multiplied by four simultaneous streams equals 248 GB being simultaneously sent and backed up.

To find the number of accelerators needed:

- Time to completion is 248 GB ÷ 230 GB/hour, which is roughly 65 minutes for every 4 filesystems.
- Backup window = 7 hours, so (7 hr/window x 60 mins/hr) ÷ 65 minutes for every 4 filesystems = ~6.5 completed cycles x 4 filesystems per cycle = ~26 filesystems completed every backup window.
- 71 filesystems total ÷ 26 filesystems per NDMP Accelerator = ~2.7 NDMP Accelerator units.

For the unstructured data, 6 filesystems together have 10.6 TB, with 50 percent NAS system identified changes per day:

- 10.6 TB x 50% = 5.3 TB, and 5.3 TB ÷ 6 filesystems = approximately 900 GB per filesystem (883 GB, rounded up).
- Using a 200 GB/hr scan rate, 900 GB/filesystem ÷ 200 GB/hr = 4.5 hr/filesystem.
- For the 7-hour backup window defined by the customer, this allows 1.55 filesystems per backup window (7 hours ÷ 4.5 hr/filesystem = 1.55 filesystems).
- 6 filesystems ÷ 1.55 filesystems per NDMP Accelerator backup window = ~3.9 Accelerator units needed.
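The sizing arithmetic above can be reproduced with a short Python sketch. The function and parameter names are hypothetical and exist only for this illustration. The input values (71 filesystems at 62 GB each, 230 GB/hour, four streams, and 6 filesystems at 900 GB each, 200 GB/hr, inside a 7-hour window) are the ones used in the example.

```python
import math


def accelerator_units(filesystems: int,
                      gb_per_filesystem: float,
                      rate_gb_per_hr: float,
                      window_hr: float,
                      streams: int = 1) -> float:
    """Fractional Accelerator units for one class of data.

    A cycle moves `streams` filesystems at the aggregate rate. The number
    of filesystems one accelerator completes per window is then divided
    into the total filesystem count. The stream count cancels out in the
    final ratio, but it is kept to mirror the steps in the text.
    """
    hours_per_cycle = streams * gb_per_filesystem / rate_gb_per_hr
    filesystems_per_window = window_hr / hours_per_cycle * streams
    return filesystems / filesystems_per_window


# Flat file data: 71 filesystems, ~62 GB each, 4 streams at 230 GB/hour.
flat = accelerator_units(71, 62, 230, 7, streams=4)

# Unstructured data: 6 filesystems, ~900 GB each, 200 GB/hr scan rate.
unstructured = accelerator_units(6, 900, 200, 7)

# Accelerator units are fractional, so round the combined total up.
total_accelerators = math.ceil(flat + unstructured)

print(f"{flat:.1f} + {unstructured:.1f} -> {total_accelerators}")
# prints "2.7 + 3.9 -> 7"
```

Summing the fractional units before rounding, as done here, matches the way the 2.7 and 3.9 figures are combined into 7 NDMP Accelerators.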

Note: The term Accelerator unit is not an actual unit of measure, but rather a fractional measure of accelerator usage. Round the combined value up to the nearest integer to find the total number of accelerators needed. This methodology assumes that filesystem backups can be intermixed within the same account configuration, all pointing to the same Avamar grid. An account within a given NDMP Accelerator can direct data to only one target Avamar grid; however, multiple accounts can be created on a single NDMP Accelerator by using the avsetupndmp configuration menu.

Additional useful logs for NDMP backups

Useful log files are located under the /usr/local/avamar/var/<account name>/ directory on the NDMP Accelerator.