Sponsored by Veeam and HP: Meet your backup data protection goals
Eric Machabert, consultant and virtualization expert
Introduction

With virtualization becoming mainstream in recent years, backups have evolved and new challenges have emerged as ever more data is generated. One of the most common, yet complex, challenges is protecting data outside the datacenter, over hundreds of kilometers, while maintaining or increasing retention. With RTOs and RPOs constantly shrinking because of the growing importance of IT within the enterprise, it is essential to have a simple, reliable and cost-effective solution. This white paper presents a concrete, real-life scenario of protecting Veeam backup data over hundreds of kilometers.

Objectives and challenges in context

A company that has virtualized 95% of its servers in recent years has two production sites that are hundreds of kilometers apart and connected through a 40-Mbit/s MPLS network. Its business requires a recovery plan to respond to a major incident, such as a terrorist attack, industrial accident or natural disaster. Such a plan requires being able to restart the whole system from a considerable distance in a timely manner. The RTO for this type of low-probability situation is set to 24 hours and the RPO should be less than 48 hours.

The main challenge is to have a reliable copy of the backup data of the virtualized system that is no more than 24 to 48 hours old, in order to restart essential virtual machines, such as Microsoft infrastructure servers, the email system and business applications. To provide hardware resources to run those virtual machines, the company installed the production system on one site and the test and development systems on the other. Thus, in case of a major crisis, the second site would be used as the backup site.

The second challenge, parallel to securing the data over a long distance, is to preserve or increase retention.
Choosing the tools

The backup tool

It was natural to choose Veeam Backup & Replication as the main backup tool for vSphere virtualized environments, considering what it offers for the price in terms of performance, features and reliability. Its reliability, combined with innovative and strategic features such as Instant Recovery and SureBackup, makes it the ideal tool to meet the requirements mentioned above.

The storage system

While Veeam Backup & Replication has many advanced features in terms of distributed backup architecture, deduplication and replication, it does not yet allow replication of backup sets. That's why Veeam has associated Backup & Replication with the HP StoreOnce D2D array, a storage system that offers optimized deduplication and replication features.
The D2D arrays offer NAS storage (CIFS or NFS) and VTL (iSCSI or FC) powered by an inline deduplication engine working on 4-KB blocks, leading to a high deduplication rate (note that Veeam deduplicates data with a minimum block size of 256 KB). Unique blocks are then compressed to reduce storage needs. These arrays also provide an asynchronous replication feature that transfers only the compressed blocks produced by the deduplication process, thus keeping the amount of data exchanged to a minimum.

D2D arrays thus meet the two main requirements:

- Enhanced retention, thanks to a high deduplication rate
- Replication efficiency using minimal bandwidth

The big picture

Figure 1. Schematic of the production site (A) and the backup site (B)

Each site has its own D2D array and at least one Veeam Backup & Replication server. Site A is the production site and Site B is the backup site.
Veeam Backup & Replication configuration

Take advantage of a distributed architecture

Veeam Backup & Replication relies on a distributed architecture based on three roles:

- Veeam server: manages backup tasks and runs the graphical configuration interface.
- Veeam Proxy: handles backup tasks by reading the data from the production storage through Ethernet, SAN or disk hot-add. This role can be deployed on multiple servers.
- Veeam Repository: handles the storage of the backup data coming from the Proxy. This role can be deployed on multiple servers.

Adding more proxies allows backup tasks to run in parallel and thus shortens the backup window. The present setup uses two Veeam servers, the first having all roles and the second acting as both proxy and repository. Both servers are deployed as virtual machines and thus take advantage of the hot-add mode. The following schematic summarizes the whole setup:

Figure 2. Network configuration of the Veeam infrastructure

Backup mode

Veeam Backup & Replication offers different backup modes, each optimized for specific needs and storage devices. When using a D2D array, you have to avoid any backup mode that creates a synthetic full backup, because this type of array is not optimized for random reads and in-place writes. Using such a mechanism results in degraded performance and extremely slow backups, often exceeding the duration of an active full backup.
In addition, the retention time of reverse incremental backups would be affected by the maximum number of simultaneously opened files on a share (discussed later in this document).

The solution, in this context, is to use the forward incremental backup mode with a weekly active full backup. Full backups are distributed over different days of the week to maximize performance and minimize the impact on the production storage. Note that the D2D deduplicates all the data within a NAS share, so each additional active full backup file (.vbk) consumes about the same space as a delta file, making it possible to store many active full backups.

Figure 3. Operation in forward incremental backup mode

Figure 4. Settings for incremental backup with weekly active full backup

Advanced parameters

It is imperative to send uncompressed data so that HP's deduplication works efficiently and can detect similar blocks. Therefore, you should disable compression on all backup jobs targeting the D2D storage. Similarly, you can improve sequential read speed (restore tasks, offloading to tape) by disabling the internal deduplication mechanisms of Veeam Backup & Replication.
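The reason for sending uncompressed data can be illustrated with a toy model. The sketch below is not HP's algorithm: it assumes a simple fixed 4-KB block store keyed by SHA-256, uses zlib as a stand-in for job-level compression, and builds two hypothetical backups (`monday`, `tuesday`) that differ by a single byte. It then measures how many blocks of the second backup are already known to the store, with and without compression applied before the data reaches it.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # 4-KB blocks, the size quoted for the D2D engine


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reuse_ratio(first: bytes, second: bytes) -> float:
    """Fraction of the second stream's blocks already stored after the first."""
    seen = {hashlib.sha256(b).digest() for b in split_blocks(first)}
    blocks = split_blocks(second)
    hits = sum(hashlib.sha256(b).digest() in seen for b in blocks)
    return hits / len(blocks)


def make_backup(seed: int = 42, size: int = 1 << 20) -> bytes:
    """Build 1 MiB of compressible, reproducible sample data (hypothetical)."""
    rng = random.Random(seed)
    words = [b"alpha", b"bravo", b"charlie", b"delta", b"echo", b"foxtrot"]
    chunks, total = [], 0
    while total < size:
        word = rng.choice(words) + b" "
        chunks.append(word)
        total += len(word)
    return b"".join(chunks)[:size]


monday = make_backup()
changed = bytearray(monday)
changed[40_000] ^= 0xFF  # a single byte differs in the next backup
tuesday = bytes(changed)

raw_reuse = reuse_ratio(monday, tuesday)
compressed_reuse = reuse_ratio(zlib.compress(monday), zlib.compress(tuesday))

print(f"blocks reused, uncompressed input: {raw_reuse:.1%}")
print(f"blocks reused, compressed input:   {compressed_reuse:.1%}")
```

With uncompressed input, only the block containing the changed byte is new. Once the stream is compressed upstream, a small change typically alters the compressed bytes that follow it, so far fewer blocks line up with what the store has already seen.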
Figure 5. Storage settings to disable compression

Finally, Veeam now offers two more optimizations in the repository role:

- Backup file data block alignment
- Data decompression before writing to the target storage. This reduces the data moved between roles, thus optimizing network utilization without lowering the deduplication rate on the target storage.

Figure 6. Deduplication storage optimization settings

Specific case of a mixed gigabit/10-gigabit Ethernet network

In the specific case where the entire network operates at 10 gigabit, including hypervisors, and the only gigabit device is the D2D array, you should take measures to avoid poor performance when writing to the share. Note that this is not an uncommon scenario.

The last piece of equipment in the chain, which connects the two types of systems, has to absorb the speed reduction. Switch buffers are often too small to cope with this situation, which in turn leads to performance problems when the TCP recovery mechanisms come into play. Similarly, if the switch has flow control enabled, pause frames are sent to the Veeam server (the sender), slowing down its transmission rate and heavily penalizing throughput.
Take advantage of the bandwidth limitation feature in the VMware vSwitch to ensure that the maximum outbound speed is 1 Gb/s. Simply create port groups dedicated to Veeam VMs and apply bandwidth limitation to them. This solution has proven its efficiency in production by ensuring a constant maximum throughput during writes.

In addition, configuring load balancing between the physical interfaces in the port group ensures that each virtual machine interface uses a dedicated physical interface, so that Veeam servers can write at 2 x 1 Gb/s. A quick schematic to summarize the solution:

Figure 7. Bandwidth limitation ensures maximum throughput during writes

Configuration of the HP D2D array

Achieving the best performance

HP D2D arrays are used through the CIFS file protocol using two network interfaces that can be configured in different modes: load balancing (LACP), active/passive (failover) or standalone. In the scenario here, each interface is configured in standalone mode, meaning that each has its own IP address from a dedicated subnet.

This choice was motivated by the desire to maximize the throughput of each D2D array, because performance is better when you parallelize streams. For example, a D2D 4112FC will provide a maximum write throughput of 250 MB/s, whereas a single stream would not exceed 80 MB/s. This model has two gigabit interfaces, so an active/passive configuration clearly cannot provide the maximum performance. As for load balancing, it is unlikely to take real advantage of both interfaces because LACP is inefficient when there are only one or two source machines (the Veeam servers).

Thus, the two interfaces are set up in two different networks, so that CIFS shares can be accessed simultaneously (multiple jobs running in parallel), guaranteeing that both physical interfaces are used. This model theoretically gives you the ability to write at 250 MB/s by configuring four jobs to run simultaneously.
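The 250 MB/s figure can be reproduced with a back-of-the-envelope model. The sketch below assumes that each job is capped at the 80 MB/s single-stream rate quoted above, that each gigabit interface carries at most 125 MB/s (1000 Mbit/s divided by 8, ignoring protocol overhead), and that jobs are spread as evenly as possible across the two standalone interfaces. The function name and its parameters are illustrative, not part of any Veeam or HP tool.

```python
def aggregate_write_mb_s(
    jobs: int,
    per_stream_mb_s: float = 80.0,   # single-stream ceiling quoted in the text
    interfaces: int = 2,             # two standalone gigabit interfaces
    link_mbit_s: float = 1000.0,     # nominal gigabit link speed
) -> float:
    """Estimate total write throughput for jobs spread evenly over interfaces."""
    link_mb_s = link_mbit_s / 8  # 125 MB/s per link, overhead ignored
    total = 0.0
    for i in range(interfaces):
        # Round-robin placement: the first (jobs % interfaces) links get one extra job.
        jobs_on_link = jobs // interfaces + (1 if i < jobs % interfaces else 0)
        # A link delivers whichever is lower: what its jobs can push or its capacity.
        total += min(jobs_on_link * per_stream_mb_s, link_mb_s)
    return total


for n in (1, 2, 3, 4):
    print(f"{n} job(s): {aggregate_write_mb_s(n):.0f} MB/s")
```

Under these assumptions, one job is limited by the stream rate, while four jobs (two per interface) saturate both links, which is where the theoretical 250 MB/s comes from.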
Of course, this configuration implies that the Veeam servers with the Repository role have a network interface configured in each IP network that is connected to the D2D array.

CIFS shares configuration

When working with D2D arrays, keep in mind the following:

- All data stored in a share uses the same deduplication index. There is no deduplication between shares.
- Files under 24 MB are not deduplicated.
- There is a strict limit on how many deduplicated files can be opened simultaneously. The number varies depending on the model, but it is the point requiring the most attention when choosing the number of shares. Keep in mind that retention in Veeam Backup & Replication is based on increments, so that when restoring something, all increments between the selected restore point and the last full backup will be opened simultaneously.
- Replication is configured at the share level.

Replication configuration

The D2D arrays allow replication based on a schedule in which you define when it should and should not replicate. Similarly, it is possible to define a time schedule for bandwidth management. In this scenario, the replication window runs from 3:30 am to 7:00 pm, with the rest of the time being the backup window.

Two bandwidth limits have been set up to reduce the time needed for replication without having too much of an impact on the MPLS network. The replication bandwidth is limited to 35 Mb/s until 7:00 am and to 28 Mb/s thereafter. With two simultaneous replication streams, all the backups (290 GB of delta files per day) are synchronized to the backup site in less than four hours.

The backup site

To avoid any technical dependency on the production site, the backup site has an autonomous fallback instance of Veeam Backup & Replication, which is able to access the local D2D shares. Those shares are also preconfigured as a Veeam repository on the local instance to save time in case of a crisis.
A simple rescan of the repository is then enough for the last replicated restore points to show up.

It is important to note that D2D arrays are not optimized for random reads, so do not attempt to restart your entire system with vPower NFS and Instant Recovery. Taking into account the maximum number of simultaneously opened files per share, it is recommended that you reserve Instant Recovery for basic infrastructure services (Active Directory, DHCP, proxy, SMTP relay) and restore the I/O-intensive VMs (email, databases, file servers) to local storage. Your RTO will be determined by the amount of data restored locally.
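The open-file constraint mentioned above lends itself to a quick budget check before a crisis. The sketch below assumes the forward incremental mode with one increment per day and a weekly active full, so a restore opens the full backup (.vbk) plus every increment (.vib) up to the chosen restore point. It also assumes that each Instant Recovery session opens its own chain on the share. The limit of 32 open files is purely hypothetical; the real value depends on the D2D model and has to be taken from its documentation.

```python
def files_opened_for_restore(increments_since_full: int) -> int:
    """One full backup file plus every increment up to the restore point."""
    if increments_since_full < 0:
        raise ValueError("increments_since_full must be >= 0")
    return 1 + increments_since_full


def max_concurrent_restores(open_file_limit: int, increments_since_full: int) -> int:
    """How many restore sessions fit under a share's open-file limit."""
    return open_file_limit // files_opened_for_restore(increments_since_full)


HYPOTHETICAL_OPEN_FILE_LIMIT = 32  # assumption for illustration, not an HP figure

# Worst case with a weekly active full and daily increments: six increments.
worst_case_files = files_opened_for_restore(6)
sessions = max_concurrent_restores(HYPOTHETICAL_OPEN_FILE_LIMIT, 6)

print(f"files opened per restore (worst case): {worst_case_files}")
print(f"concurrent restore sessions under the limit: {sessions}")
```

A check of this kind helps decide which infrastructure VMs can realistically be started with Instant Recovery from a given share and which should be restored to local storage instead.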
Real-world numbers

High write throughput on a single stream:

Figure 8. Write throughput of a backup flow

Interesting deduplication rates:

Figure 9. Deduplication storage ratio compared to standard disk

Figure 10. Deduplication ratio for a network share
Good replication windows:

Figure 11. Replication optimization performance

Replicating a 99.5-GB VIB file takes 1 hr 26 min, equivalent to an effective rate of 153 Mbit/s, while the bandwidth actually used is 21.2 Mbit/s.

Conclusion

The combination of these two powerful tools allows you to meet the most demanding requirements in terms of backup data protection over a long distance, at a reasonable cost. The system scales linearly, considering that HP offers various models that can suit an organization of any size.

However, before starting such a project, storage and bandwidth needs should be evaluated precisely. As a rule of thumb, the storage sizing should be based on the size of a full backup (with no compression or deduplication) plus 30%. The required bandwidth is determined by the desired replication window, taking into account that only 15% of daily changes will be transferred.

Thus, to replicate a 10-TB system that is modified 10% per day to the backup site within an 8-hour window, you need the following:

- A D2D array with 13 TB (10 TB + 30%)
- A bandwidth of 42 Mbit/s (10 TB x 10% x 15% = 150 GB/day)
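The rule of thumb above can be written as a small calculator to test other scenarios. This is a sketch under the stated assumptions only: 30% storage headroom over an uncompressed full backup, 15% of daily changes actually transferred, and decimal units (1 TB = 1000 GB, 1 GB = 8000 Mbit). The function and parameter names are illustrative.

```python
def size_backup_replication(
    full_backup_tb: float,
    daily_change_rate: float,            # e.g. 0.10 for 10% modified per day
    window_hours: float,                 # desired replication window
    storage_headroom: float = 0.30,      # rule of thumb from the text
    transferred_fraction: float = 0.15,  # share of daily changes sent over the wire
) -> tuple[float, float, float]:
    """Return (storage in TB, daily transfer in GB, bandwidth in Mbit/s)."""
    storage_tb = full_backup_tb * (1 + storage_headroom)
    daily_transfer_gb = (
        full_backup_tb * 1000 * daily_change_rate * transferred_fraction
    )
    # GB -> Mbit (x 8000), spread over the replication window in seconds.
    bandwidth_mbit_s = daily_transfer_gb * 8000 / (window_hours * 3600)
    return storage_tb, daily_transfer_gb, bandwidth_mbit_s


storage, transfer, bandwidth = size_backup_replication(10, 0.10, 8)
print(f"storage: {storage:.0f} TB")
print(f"daily transfer: {transfer:.0f} GB")
print(f"bandwidth: {bandwidth:.1f} Mbit/s")
```

For the 10-TB example this gives 13 TB of storage, 150 GB per day to transfer and about 41.7 Mbit/s, which rounds to the 42 Mbit/s quoted above.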
About the author

Eric Machabert is an IT infrastructure consultant and well-known virtualization expert. His experience with many critical environments in France and internationally makes him a recognized market expert. Currently working as a consultant, he also teaches higher education courses in schools and business organizations.

About Veeam Software

Veeam Software develops innovative solutions for VMware backup, Hyper-V backup, and virtualization management. Veeam Backup & Replication is the #1 VM Backup solution. Veeam ONE is a single solution for real-time monitoring, resource optimization, documentation and management reporting for VMware and Hyper-V. Veeam extends deep VMware monitoring to Microsoft System Center with Veeam Management Pack (MP), and to HP Operations Manager with Veeam Smart Plug-In (SPI). Veeam also provides free virtualization tools. Learn more by visiting www.veeam.com.