A CW buyer's guide to VM backup & disaster recovery


One benefit of virtualisation is the substantial improvement virtual machines (VMs) have brought to live migration, server backup and disaster recovery. In this 10-page buyer's guide, Computer Weekly looks at the products suppliers have brought to market; the competition playing out between Microsoft and VMware; how to overcome some of the challenges of virtual machine backup for seamless data protection; and asks, if cloud is good enough as a secondary platform for our critical applications now, why should it not become the standard primary platform of the future?

These articles were originally published in the Computer Weekly ezine.

computerweekly.com buyer's guide

Jens Soeldner reports

Live migration entails moving active virtual machines (VMs) between physical hosts with no service interruption or downtime. It launched 11 years ago as a landmark development in datacentre infrastructure and is now a crucial part of infrastructure software and deployment.

A VM live migration allows administrators to perform maintenance and resolve a problem on a host without affecting users. Moving active VMs from one hypervisor to another means you can balance the performance and load of hypervisors or, in the case of hardware maintenance, evacuate active VMs from hypervisors. It enables users to conserve resources during non-peak hours by moving VMs to fewer servers. You can also optimise network throughput by running VMs that communicate with each other on the same hypervisor.

When live migration of VMs appeared in 2003, with VMware's ESX 2.0, it became popular in the IT community.

Six years after VMware pioneered VM live migration in 2003, Microsoft introduced a similar feature in Hyper-V that was shipped with Windows Server 2008 R2. The previous version, Quick Migration, in Windows Server 2008, required a short service interruption during migration.

Getting to grips with live migration

To understand how live migration works, it is important to be aware of the VM's basic components: storage (the virtual hard disk) and the configuration or state. Storage is often located on a storage area network (SAN), while the VM's state runs in a host server's processor and memory.

During a live migration, the VM's state and configuration are copied from one physical host to another, but the VM's storage does not move. Storage live migration (moving the disks of a VM from one location to another while the VM continues to run on the same physical host) became available in 2006/2007 with ESX 3.0/3.5.

In VMware's current offering, vSphere 5.5, vMotion (live migration of VMs) and Storage vMotion (live migration of disks) are part of the vSphere Standard edition. Automatic load balancing of VMs (distributed resource scheduling, or DRS) is available with vSphere Enterprise, and automatic load balancing of disks (Storage DRS) with vSphere Enterprise Plus.

Leveraging vMotion requires that ESXi servers are managed by vCenter Server and that they are compatible (boiling down to compatible CPUs and a couple of minor requirements) and on the same physical subnet. Moving VMs between hypervisors that are not on the same physical network segment is not supported. Administrators need to tag an existing VMkernel port or create a new one for vMotion usage; live migration can then be used with one click in the vSphere Client (either the client or the web client). Using the web client, even live migration of VMs without shared storage is possible (shared-nothing live migration, introduced with vSphere 5.1).
Shared-nothing live migration is a combination of VM live migration and storage migration. The VM's state and configuration are copied to a destination host and the file system is moved to the destination storage device. To prevent downtime, the VM's state and storage remain running on the original host and storage location until the copying process is completed.

Evolution of VMware's live migration

VMware has improved its live migration capabilities over the years and the application can now leverage multiple network interfaces to speed up live migration. In VMware's upcoming vSphere 6, rumoured to be launching in March 2015, live migration over longer distances with higher latencies and between vCenter instances is expected to be available.

DRS, which leverages vMotion to balance VM workload between physical hosts, has also been improved in recent product versions. It now boasts rules that take preferences into account and can evacuate hypervisors during non-peak hours to conserve resources using distributed power management (DPM), available with DRS as part of vSphere Enterprise edition.

VMware also updated Storage vMotion in vSphere version 5.0 by moving from a dirty block tracking algorithm to I/O mirroring, improving the performance and reliability of its storage live migration capabilities.
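The difference between dirty block tracking and I/O mirroring can be illustrated with a toy model. The sketch below is not VMware's implementation: the function names, the disk modelled as a Python list of blocks and the scripted guest writes are all assumptions made purely for illustration. It contrasts a dirty block tracking copy, which has to repeat passes for as long as the running guest keeps writing, with I/O mirroring, where writes to blocks that have already been copied are sent to both locations so a single pass is enough.

```python
def migrate_with_dirty_tracking(source, writes_per_pass):
    """Toy dirty-block-tracking copy (hypothetical model, not vendor code).

    writes_per_pass[i] lists the (block_index, value) writes the running
    guest issues while copy pass i is in progress.
    Returns the destination disk and the number of passes needed.
    """
    dest = [None] * len(source)
    to_copy = set(range(len(source)))  # the first pass copies every block
    passes = 0
    while to_copy:
        for block in to_copy:
            dest[block] = source[block]
        dirty = set()
        guest_writes = writes_per_pass[passes] if passes < len(writes_per_pass) else []
        for block, value in guest_writes:
            source[block] = value  # the write lands on the source only
            dirty.add(block)       # so this block must be copied again
        to_copy = dirty
        passes += 1
    return dest, passes


def migrate_with_mirroring(source, guest_writes):
    """Toy single-pass copy with I/O mirroring (hypothetical model).

    guest_writes maps a copy step (the index of the block about to be
    copied) to the (block_index, value) writes issued at that moment.
    """
    dest = [None] * len(source)
    for cursor in range(len(source)):
        for block, value in guest_writes.get(cursor, []):
            source[block] = value    # the VM keeps running on the source
            if block < cursor:       # region already copied:
                dest[block] = value  # mirror the write to the destination
        dest[cursor] = source[cursor]  # bulk copy of the next block
    return dest


if __name__ == "__main__":
    disk = list("abcdef")
    copy, passes = migrate_with_dirty_tracking(disk, [[(0, "A"), (4, "E")], [(1, "B")]])
    print("dirty tracking:", copy == disk, "passes:", passes)

    disk = list("abcdef")
    copy = migrate_with_mirroring(disk, {2: [(0, "A"), (4, "E")], 5: [(1, "B")]})
    print("mirroring:", copy == disk)
```

In this model the number of passes under dirty block tracking depends on how busy the guest is, whereas the mirrored copy always finishes after one sweep of the disk, which is one plausible reading of the performance and reliability gain described above.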

Microsoft stealing VMware's thunder

Microsoft introduced the ability to move VMs across Hyper-V hosts with Windows Server 2008 R2. This required VMs to reside on shared storage as part of a cluster. Even then, Hyper-V wasn't able to move multiple machines simultaneously. However, with Windows Server 2012 and Server 2012 R2, Microsoft continued to gain ground on VMware, introducing additional migration capabilities that put Microsoft more or less on par with VMware when looking at this specific feature.

Since Windows Server 2012 R2 Hyper-V can store VMs on server message block (SMB) file shares, performing live migration on running VMs stored on a central SMB share is now possible between non-clustered and clustered servers, so users can benefit from live migration capabilities without investing in clustering infrastructure.

Windows Server 2012 R2's live migration can also leverage compression, reducing the time needed to perform live migration by 50%, according to Microsoft. Live migration in Windows Server 2012 R2 can use improvements in the SMB 3.0 protocol too, which accelerate live migration without the VM having to be stored on an SMB 3.0 share. If the customer is using network interfaces that support remote direct memory access (RDMA), the flow of live migration traffic is faster and has less impact on the CPUs of the hosts involved.

Storage live migration was introduced to the Hyper-V feature set with Windows Server 2012. Windows Server 2008 R2 allowed users to move a running VM using live migration, but you had to shut down a VM to move its storage. With the current version of Hyper-V, you can transfer a VM's backing storage files to a new location with no downtime, a feature that is critical for migrating or updating storage, or when a load redistribution on the storage side is needed.
VMware vSphere 5.5 versus Microsoft Windows Server 2012 R2 Hyper-V

In their current versions, VMware's vSphere 5.5 and Microsoft Windows Server 2012 R2 Hyper-V support shared-nothing live migration, which makes it possible to change the location where the VM is being run and the backing storage location for the running VM at the same time. This feature provides additional flexibility, especially in small business environments where centralised storage is not always present.

Microsoft has gained substantial ground in many areas, but experts agree there is still a gap between Hyper-V and vSphere when looking at enterprise-level features. Hyper-V lacks features such as vSphere Storage DRS, though other features, such as Storage Spaces, offer similar functionality. But Hyper-V comes in a powerful free version, Hyper-V Server 2012, which includes native support for live migration of VMs across clustered and non-clustered hosts at no extra cost, while VMware's free hypervisor has limited functionality.

Going beyond live migration, both suppliers support replication capabilities, which are easy to set up with vSphere and Microsoft's Hyper-V. Combined with the cloud offerings of the suppliers, vCloud Air Disaster Recovery and Microsoft Azure Site Recovery, users can replicate and fail over VMs to their suppliers' cloud offerings, giving extra options for self-service recovery and business continuity.

Jens Soeldner is a cloud infrastructure and virtualisation specialist working at German consultancy Soeldner Consult.

Chris Evans discusses some of the challenges and how they can be resolved

Data protection is an essential part of all IT operations and has, until recently, been achieved by directly backing up physical servers over the network. But the move to virtualised environments has changed forever the landscape for successful backup of applications data, presenting a number of challenges.

Virtual machine backup: The performance problem

The move from physical to virtual machines provided many IT organisations with the opportunity to consolidate and reduce the amount of hardware resources needed. This was one of the main selling points of the first wave of virtualisation: consolidation to fewer servers, because most of them weren't fully utilised.

But the backup infrastructure is an area that has always struggled with performance issues, even when there is a dedicated backup network. Therefore, backing up virtual machines using physical server infrastructure and methods has often resulted in big challenges. Where once one app on one server was backed up, now multiple machines in a single box require backup. For that reason, virtual machines can experience severe bottlenecks when using backup methods that copy data from each virtual machine (VM) as if it were a physical server.

The answer here is to avoid backing up data from the guest VM and instead to deploy backup applications that can copy directly from the host using virtualisation-specific application programming interfaces (APIs) such as VMware's vStorage APIs for Data Protection (VADP). All VM-aware backup products are capable of using these APIs to back up data without having to access each guest.

One benefit of running host-based backups is that agents can be eliminated, removing a whole set of maintenance and management tasks needed to keep the agents up to date.

Virtual machine backup: The tracking problem

In the physical server world, the server is clearly identifiable and tracked through an IP address and/or DNS name. Servers rarely move or change IP address, so a backup that fails due to an inability to contact the server can be easily resolved.

In the virtual world, things aren't as simple. While it is true to say most virtual machines don't change their IP address, most also aren't backed up directly, but backed up through the host hypervisor. Virtual machines can easily be migrated between physical servers and storage, so keeping track of each VM in the infrastructure becomes more complex. The result is that a VM migration may well cause the next backup to fail.

The answer is to reference a virtual machine, not through the physical host on which it resides, but via a more abstract reference to the group of physical servers that support the VM, such as the cluster name or, in the case of vSphere, the datacentre object. By abstracting the reference to the VM, both backup and restore processes are no longer dependent on the physical host hardware, which provides operational benefits by reducing the work involved in restores for clusters that have been physically or logically reconfigured.
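The tracking problem and its fix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the API of any backup product or hypervisor: the `Cluster` class, its `placement` inventory and both backup functions are hypothetical names invented for the example. A job pinned to a physical host fails once the VM has been migrated, while a job that references the cluster resolves the current host at run time.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical inventory recording which host currently runs each VM."""
    name: str
    placement: dict = field(default_factory=dict)  # VM name -> host name

    def migrate(self, vm, new_host):
        """Record a live migration of `vm` to `new_host`."""
        self.placement[vm] = new_host


def backup_via_host(cluster, vm, host):
    """Backup job pinned to one physical host: breaks when the VM has moved."""
    if cluster.placement.get(vm) != host:
        raise LookupError(f"{vm} not found on {host}")
    return f"backed up {vm} via {host}"


def backup_via_cluster(cluster, vm):
    """Backup job that references the cluster and looks up the host at run time."""
    host = cluster.placement[vm]
    return f"backed up {vm} via {host}"


if __name__ == "__main__":
    prod = Cluster("prod", {"web01": "host-a"})
    print(backup_via_host(prod, "web01", "host-a"))

    prod.migrate("web01", "host-b")  # the VM is live-migrated overnight
    try:
        backup_via_host(prod, "web01", "host-a")
    except LookupError as err:
        print("host-pinned job failed:", err)

    print(backup_via_cluster(prod, "web01"))
```

The design point is simply that the backup job stores the name of the cluster (or datacentre object) rather than the name of a host, so reconfiguring the hosts underneath does not invalidate the job definition.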
Virtual machine backup: The granularity problem

During data recovery, most restore requests are for individual files, a directory, or for data within an application such as an email attachment. It is rare that an entire server needs to be recovered. Most restores are therefore very granular in nature, and require the recovery of a small piece of the data that constitutes a server or application.

Virtual machine backups that simply back up the files that comprise the VM may have difficulty restoring individual pieces of data unless the backup software is aware of the contents of the backup and is able to understand virtual disk formats. Worse still, if the backup software cannot decode the contents of the backup, it may be necessary to restore the entire VM, albeit to a temporary location, to recover a single file, resulting in restore delays and unnecessary network traffic.

Backup software needs to be able to understand the content of the backup and restore objects from within files directly, without having to restore more data than necessary. Today's more advanced products are able to understand the format of application data (email systems and databases, for example) and offer restores of individual application objects. Obviously, these technologies need to be used with care, as restoring parts of data into an application could lead to logical corruption.

Virtual machine backup: The media problem

Contemporary backup technology uses techniques such as changed block tracking to back up virtual machines. These systems are well-suited to storing backup data on disk, as they require access to the initial backup plus all data changes to perform restores.

But backup subsystems that rely solely on disk come with some caveats. Disk-based backup targets aren't necessarily scalable (at least not in a way that is economically desirable) and don't offer easy portability to take data off site for full disaster recovery, for example. The solution is to look at backup systems that are capable of supporting multiple media types, including tape, and those that offer the ability to create synthetic backups, such as a full system backup based on the original backup plus all subsequent incremental block changes.

Virtual machine backup: The process problem

Virtual server data protection can be achieved through various methods and technologies. As well as backup software, there are other ways to secure virtual machines that rely on the fact that VMs are stored as files on disk. This means backups can be made via snapshots or replication on shared storage.

Although array-based replication and snapshot functionality can work well, care has to be taken that using these methods will result in a consistent and comprehensive backup policy. For example, snapshots don't cover the scenario of total array failure, such as could be experienced through fire or flood, and replication may not provide the right level of granularity for recovery when the minimum recovery point is a logical unit number (LUN).

That leads to the conclusion that data protection is best implemented using a variety of techniques.
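The synthetic backup idea described under the media problem (a full backup rebuilt from the original backup plus all subsequent incremental block changes) can be sketched as follows. This is a simplified model under stated assumptions: a disk is a fixed-length Python list of blocks, and `incremental` finds changed blocks by comparing two images, whereas a real changed block tracking feature would obtain that list from the hypervisor. Both function names are hypothetical.

```python
def incremental(previous, current):
    """Return only the blocks that differ between two disk images.

    Stand-in for changed block tracking: a real product would be told
    which blocks changed rather than comparing full images.
    """
    return {i: block for i, block in enumerate(current) if previous[i] != block}


def synthetic_full(full_backup, incrementals):
    """Build a new full image from the original full backup plus the
    incremental block changes, applied oldest first."""
    image = list(full_backup)  # work on a copy; the original backup is untouched
    for changes in incrementals:
        for index, block in changes.items():
            image[index] = block
    return image


if __name__ == "__main__":
    day0 = ["a", "b", "c", "d"]  # initial full backup
    day1 = ["a", "B", "c", "d"]  # one block changed
    day2 = ["a", "B", "c", "D"]  # another block changed

    inc1 = incremental(day0, day1)
    inc2 = incremental(day1, day2)

    restored = synthetic_full(day0, [inc1, inc2])
    print(restored == day2)
```

The sketch also shows why such systems need access to the initial backup plus every set of changes: dropping either `day0` or one of the incrementals would make the latest state unrecoverable.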

One of the benefits of using virtual machines is the scope for disaster recovery. Bob Tarzey looks at the product offerings from suppliers

We do backups because we know we have to in case we lose the primary versions of data and/or the systems that create and manage that data. It could just be that the original gets accidentally deleted or changed; however, the possibility of system failure will be a top priority for many. That could be anything from a disk crash on a user's device to a datacentre crushed by a meteorite.

When such a failure happens, it is not just data that needs restoring, but the full working environment; in other words, disaster recovery. Backup and disaster recovery are not directly interchangeable terms, but disaster recovery is not possible without backup in the first place. Disaster recovery is having the tested wherewithal to get systems restored and running as quickly as possible, including the data.

The increasing use of virtualisation has changed the way disaster recovery is carried out because, in a virtual world, a system can be recovered by duplicating images of virtual machines (VMs) and recreating them elsewhere. VM replication, disaster recovery and the way the backup industry has adapted to virtualisation are critical topics to consider.

In the old days, if a server crashed then you would probably go through the following steps:

- Get a new server. Hopefully you would have a spare to hand, probably an out-of-date model if it had not been needed for some time;
- Then, either: install all the systems and applications software, attempting to get all the settings as they were before (unless, of course, you had done that in advance, which would not have been possible if you had only invested in one or two redundant servers on standby for many more live ones, not knowing which would fail);
- Or, for a really critical application, you may have had a hot standby, all fired up and ready to go. However, that would have doubled the costs of application ownership, with all the hardware and software costs paid twice;
- Restore the most recent data backup. For a database that might be almost up to date, but for a file server an overnight backup may be all that is available, so only as far back as the end of the last working day.

Anything that was in memory at the time of the failure is likely to have been lost. How far back you aim to go is defined in a disaster recovery plan as the recovery point objective (RPO).

Backup in virtualisation

Virtualisation changes everything and increases the number of options. First, data can be easily backed up as part of an image of a given virtual machine (VM), including application software, local data, settings and memory. Second, there is no need for a physical server rebuild; the VM can be recreated in any other compatible virtual environment. This may be spare in-house capacity or acquired from a third-party cloud service provider.

This means most of the costs of redundant systems disappear. Disaster recovery is cheaper, quicker, easier and more complete in a virtual world. In the idiom of disaster recovery, faster recovery time objectives (RTOs) are easier to achieve.
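The RPO idea can be made concrete with a little date arithmetic. The sketch below is illustrative only: the function names are hypothetical and the times and targets (an overnight backup at 18:00, a failure at 11:30 the next day, a four-hour RPO) are assumed values, not figures from any supplier. It computes how much work is lost when a failure follows the last backup, and checks whether a backup schedule can satisfy a given RPO in the worst case, which is a failure just before the next backup runs.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, failure):
    """Time span of changes lost: everything written since the last backup."""
    return failure - last_backup


def meets_rpo(backup_interval, rpo):
    """A schedule meets the RPO only if the worst-case loss, one full
    interval between backups, is no larger than the objective."""
    return backup_interval <= rpo


if __name__ == "__main__":
    # Assumed example: overnight file server backup, failure mid-morning.
    last_backup = datetime(2015, 1, 5, 18, 0)
    failure = datetime(2015, 1, 6, 11, 30)
    print("lost:", data_loss_window(last_backup, failure))

    # Assumed target: lose no more than four hours of work.
    rpo = timedelta(hours=4)
    print("nightly backup meets RPO:", meets_rpo(timedelta(hours=24), rpo))
    print("hourly backup meets RPO:", meets_rpo(timedelta(hours=1), rpo))
```

The RTO can be reasoned about in the same way, by adding up the expected duration of each recovery step and comparing the total with the objective.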
At least, that is the theory, but it can get more complicated with the need to co-ordinate different VMs that rely on each other (for example, an application VM and a database VM), so testing disaster recovery is still paramount and can forestall problems in live systems. There are a number of different approaches, from tightly integrated hypervisor-level VM replication through to disaster recovery as a service (DRaaS).

Integrated hypervisor replication

The leading virtualisation platform suppliers, including VMware, Microsoft Hyper-V and Citrix Xen, offer varying levels of VM replication services embedded in their products. They are tightly integrated into the hypervisor itself and so limited to a given virtual environment. However, this does give the potential to achieve the performance needed for continuous data protection (CDP) using shadow VMs as virtual hot standbys, minimising both RPOs and RTOs.

There are other products that tightly integrate VM replication at the hypervisor level, for example EMC's RecoverPoint, which supports the co-ordinated replication and recovery of multiple VMs, so it can ensure a VM running an application is consistent with its database VM. Currently this is only for VMware, but Hyper-V and cloud management stacks such as OpenStack are on the horizon.

Another is Zerto, which says it has built in better automation and orchestration than the platform suppliers, further minimising the impact on the run-time environment. Zerto currently supports just VMware but has plans to extend support to Hyper-V and Amazon Web Services (AWS), which means, in the future, it will support failover from an in-house VMware system to, say, AWS or another non-VMware-based system. Its product could also be used for pre-planned migration of workloads.

VM snapshotting

Many other virtual-aware backup tools work by taking snapshots of VMs at given intervals. This involves pausing the VM for long enough to copy its data, settings and memory before returning it to its previous state. The snapshot can be used to recreate the VM over and again. The RPO depends on how often snapshots are taken (which could be often enough to be close to CDP, but that would affect overall performance). The RTO depends on little more than how quickly access can be gained to an alternative virtual resource, which, with the right preparation, should be almost immediate.

A number of new suppliers specialise in virtual environment backup. Swiss-based Veeam launched its product in 2008 and supports VMware and Microsoft Hyper-V. Nakivo (founded 2012) only supports VMware. As these products have been built for a virtual world, they have many of the required adaptations built in from the start, for example VM snapshotting and network acceleration to make off-site replication more efficient.

The established backup suppliers have adapted their products. For example, Symantec has just released Backup Exec 2014, which it believes matches the capability and performance of the new arrivals. Dell claims that its AppAssure mimics CDP by using a smart agent that avoids freezing the VM and takes a snapshot at least once every five minutes. CommVault's Simpana and Arcserve have also had the challenge of catching up.

One difference with many of the established suppliers is their capability to support older physical environments alongside virtual ones, which remains the situation in many organisations. It also means their products are often used for migration, that is, for backing up a physical server and restoring it as a VM.

Many cloud infrastructure service providers, for example Rackspace and Amazon, provide VM replication, enabling customers to put their own failover in place, but generally this is limited to their own platforms.

Disaster recovery as a service (DRaaS) providers

The widespread use of virtualisation and availability of cloud platforms for recovering workloads has led to a proliferation of DRaaS offerings.
Here the replication of VMs is embedded in the service, so the customer has little to do other than due diligence and to sign on the dotted line. Some are offered by cloud/hosting service providers; for example, NTT Communications has a European offering in partnership with US-based DRaaS provider Geminare. Broader disaster recovery specialists such as SunGard and IBM include DRaaS in their portfolios.

DRaaS providers provide unique value to make it worth their customers' while. Some take this to a new level; for example, UK-based Plan B Disaster Recovery says its Microsoft Windows Server DRaaS offering can guarantee recovery, because it includes nightly testing of the recoverability of the images it takes of its customers' server environments. This not only ensures recoverability but often pre-empts problems the customer has yet to notice. Plan B operates at the application level so is hypervisor-neutral, supporting VMware, Hyper-V and Xen. Plan B's service can image physical servers as well as virtual ones.

Quorum offers a service called onQ that was originally developed for the US Navy to enable the rapid movement of processing from one part of a ship to another in times of battle damage, so it is very fast and very resilient, supporting physical or virtual Linux and Windows servers. onQ is also hypervisor-agnostic. In the UK it uses a local datacentre partner to recover the customer server images as VMs, which it claims allows RTOs as quick as a server reboot.

Interestingly, Plan B says that, whenever its service has been invoked to recover a physical server in a virtual environment, the customer does not go back. In other words, disaster recovery services can be used to migrate to virtual environments, but can also provide the motivation to do so in the first place.
And that may have got you thinking: if cloud is good enough as a secondary platform for even our most critical applications, could it not actually also become our primary platform in the longer term?

Bob Tarzey is a director at IT analyst company Quocirca.