Data Loss in a Virtual Environment


Solutions to successfully meet the requirements of business continuity.

An Altegrity Company

Contents

Introduction
Common Virtual Data Loss Scenarios
Recent Virtualization Data Loss Case Studies
Data Loss Consequences

Copyright 2012 Kroll Ontrack Inc. All Rights Reserved. Kroll Ontrack, Ontrack and other Kroll Ontrack brand and product names referred to herein are trademarks or registered trademarks of Kroll Ontrack Inc. and/or its parent company, Kroll Inc., in the United States and/or other countries.

Introduction

The advent of virtualization technology such as Oracle VM and Oracle VM VirtualBox has enabled business continuity planning and execution for many organizations. However, virtualization technology is complex and requires specialized skill and knowledge from both IT staff and management. In fact, if deployed or managed carelessly, virtualization can itself create business disruptions or data disasters [1], as outlined in figure 1.

Figure 1. Source of Failure: Human vs. Machine
Traditional system: 74% machine, 26% human
Virtual system: 35% machine, 65% human
Human causes: human error, lack of education
Machine causes: system and hardware failure, environmental (power outage, over-voltage, etc.)

According to Forrester Research's report on the business state of disaster recovery preparedness, a joint effort with the Disaster Recovery Journal [2], many organizations have improved their disaster recovery capabilities over the past few years. Despite a slow economy, survey respondents reported increased confidence in being prepared for a data center disaster or site failure.

Seventy-six percent of survey respondents reported no disaster or major disruption in the past five years, yet Forrester Research reports that companies should not take comfort in this statistic. Instead, it should serve as a wake-up call, because a whopping 25 percent of companies are likely to declare a disaster. Furthermore, business disruptions are much more common than declared disasters.

Getting an organization to declare a disaster can be a matter of perspective, according to Don Stewart, director of professional services at Ongoing Operations, a non-profit business continuity service provider for U.S. credit unions. "In some events, IT is so focused on fixing the problem that they don't inform senior management of the disaster event," Stewart relates. Some organizations have not defined what a business disruption is; senior management will therefore hesitate to declare a disaster if the event is perceived to be minor, for instance a phone system failure or delays in e-mail messaging.

Staying prepared requires more than having a documented business continuity plan; it requires teamwork from all stakeholders. Having a stake in planning at this level helps ensure that business operations are maintained in the event of a disruption. Stewart recommends that a good plan start with a risk impact analysis. Most companies, according to Stewart, will purchase an in-depth risk assessment and then do nothing about it; the report just sits there with no further action taken. This is as effective as making a list of essentials to pack in a kit in case of a house fire but never assembling the kit.

[1] For purposes of this article a business disruption is anything that prevents day-to-day work from being done, including power disruption, downed phone lines, and so forth. A data disaster occurs when data is corrupted. Hence, a data disaster is a subset of business disruption.
[2] Forrester Research's 2010 report on the business state of disaster recovery preparedness, a joint effort with the Disaster Recovery Journal, http://www.drj.com/images/surveys_pdf/forrester/2011forrester_survey.pdf

Common Virtual Data Loss Scenarios

When data loss happens within a virtual data center, it is usually due to human error. Other virtual data loss events result from hardware failure and are exacerbated by the lack of a disaster recovery plan. Disaster recovery plans that are weak or not regularly tested force IT staff to focus on untested repairs based on faulty troubleshooting. Nobody wants a data loss or business disruption on the systems they are responsible for. Too often, a serious data loss or business disruption results in unemployment for all who are responsible or thought to be responsible. Other data loss scenarios have been due to overconfidence in a SAN's redundancy. Precious recovery time is lost when, in the middle of a disaster, important backups are discovered to be corrupted or unreadable.
This is the worst possible time to learn that backups have been failing or that backup software has not been reporting media failures during backup sessions. The pie chart provided by Kroll Ontrack (figure 2) breaks down 2010 virtual data loss types:

Figure 2. Virtual Data Loss Types
Virtual Disk Corruption: 28%
RAID/Hardware Fault: 28%
Deleted Files on Virtual File System: 25%
Format & Reinstall: 13%
All Other: 6%
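One way to learn about failing backups before a disaster rather than during one is to check backup files routinely against digests recorded when the backup was written. The following is a minimal sketch of that idea, not a feature of any particular backup product. The manifest (a mapping of file name to SHA-256 digest), the function names and the file names used in the usage note are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare each file named in the (hypothetical) manifest with its digest.

    Returns a mapping of file name -> problem, where the problem is
    "missing", "unreadable" or "corrupt". An empty result means every
    file in the manifest was present and matched its recorded digest.
    """
    problems: dict[str, str] = {}
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        try:
            actual = sha256_of(path)
        except OSError:
            # Media or permission errors surface here instead of mid-disaster.
            problems[name] = "unreadable"
            continue
        if actual != expected:
            problems[name] = "corrupt"
    return problems
```

A scheduled job could call `verify_backup(Path("/mnt/backups/latest"), manifest)` and alert staff whenever the result is not empty; the path here is purely illustrative. A matching digest only shows that the file is unchanged since the backup was written, so a periodic test restore remains necessary.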

Recent Virtualization Data Loss Case Studies

A virtualization data loss can be catastrophic for an organization. Determining the financial impact of a business disruption is difficult because downtime involves both tangible factors, including productivity loss, missed sales opportunities and staff's hourly time, and less tangible factors such as potential non-compliance penalties, damage to corporate image and weakened customer confidence. The previously quoted Forrester-DRJ survey noted that 15 percent of respondents knew the cost of their business downtime; it averaged nearly $145,000 USD per hour. That estimate would make any director or CIO take notice of the readiness of their business continuity plan. Virtualization technology can compound those losses, as illustrated by the following cases.

The Case of the Reformatted Server

A business in Italy recently experienced a business disruption when its 4 TB virtual host server lost access to the storage system. The virtual environment contained 40 virtual machines in a mixed operating system environment; some were Linux, a few were legacy UNIX systems and the remainder consisted of Microsoft Windows servers. These supported business application servers, web servers and database servers. The virtual host server was operating a Linux-based hypervisor with two 2 TB LUNs attached.

At some point, the storage LUNs were reformatted. The reason for the reformat wasn't divulged, but the damage to the existing file system structures was severe and extensive. During the reformat process, the Linux storage manager writes EXT file system metadata in predefined areas throughout the volume. This metadata contains only a couple of thousand bytes of information, yet the impact upon the virtual host server's file system and virtual disk files was devastating. Each virtual machine had four to six virtual disk files, totaling 70-90 virtual disk files stored by the host server. Some of the virtualized Microsoft Windows servers employed dynamic disk volume configurations (the equivalent of a logical volume manager in Linux parlance) spanning multiple virtual disk files, further complicating recovery efforts.

After the organization's IT department exhausted all internal resources, a professional data recovery firm was engaged to recover the data. Despite the damage, virtual disk files were found and critical data was restored.

The Case of the Disastrous Data Merge

A United States business merger suffered a disaster while the two companies' IT departments were merging their data. Evidence suggests the disaster was caused by employee sabotage, and the cause is still under investigation by computer forensic investigators. The first company's virtual host server held over 400 virtual machines across 20 storage LUNs. During the data merge, someone with administrative access to the virtual host server systematically deleted the 400 virtual machines and their virtual disk files, causing the loss of over 440 virtual disk files and over a thousand snapshot files.

The merging company quickly engaged emergency data recovery services and prioritized core servers that provided essential services. In three days, those systems were up and running. For the next two weeks, emergency recovery efforts continued on the rest of the storage system. This required extensive recovery engineering efforts to search the unallocated areas of the storage LUN for potential virtual disk files, identifiable only by their file system attributes. Through a combined effort of backup restoration and original volume recovery, data was recovered. Most of the virtual disk files were complete, while other virtual disks required the file contents to be extracted due to file system damage.

The Case of the Off-Site SAN Reformat

Disaster recovery efforts went from bad to worse for a company in Luxembourg. During routine maintenance on the company's SAN storage that housed its virtual machines, the SAN was presented to a different physical server by accident. When the SAN storage was identified as unknown, the volume was automatically reformatted.

Initially, some staff panicked due to the potential for data loss. They were relieved when they were reminded of the identical SAN storage located off-site, which employed the SAN equipment's automated site replication technology. It was thought that this would be a minor business disruption. Upon logging into the remote SAN, the IT team discovered that the remote SAN was an identical copy of the primary site; the SAN's automated site replication technology had not been disabled prior to the maintenance. Thus, when the reformat occurred at the primary site, the secondary SAN was reformatted as well.

Through the efforts of experienced data recovery engineers, the virtual machines and virtual disk files were successfully recovered. This organization did not have any backups because it was assumed that dual storage architecture and site replication mechanisms provided complete data and system redundancy. This case is especially compelling because storage equipment features provided a false sense of security. In reality, it is industry best practices combined with IT management procedures that ensure data protection.
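The survey figure cited at the start of this section, nearly $145,000 USD per hour of downtime, can be turned into a rough exposure estimate for a given outage length. The sketch below is simple arithmetic under stated assumptions: the function name is hypothetical, the hourly figure is the survey average rather than any one organization's cost, and the 72-hour outage is an illustrative input, not a cost reported for any of the cases above.

```python
# Survey average quoted in the text (USD per hour of downtime).
AVERAGE_COST_PER_HOUR_USD = 145_000


def downtime_cost(hours: float, cost_per_hour: float = AVERAGE_COST_PER_HOUR_USD) -> float:
    """Estimate the cost of an outage as duration multiplied by hourly cost."""
    if hours < 0 or cost_per_hour < 0:
        raise ValueError("hours and cost_per_hour must be non-negative")
    return hours * cost_per_hour


if __name__ == "__main__":
    # Illustrative only: a three-day (72-hour) outage at the survey average.
    print(f"{downtime_cost(72):,.0f} USD")  # prints "10,440,000 USD"
```

An organization that knows its own hourly figure can pass it as `cost_per_hour`; the estimate covers only the tangible hourly cost and leaves out the less tangible factors described above.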

Data Loss in Virtual Environments

Data loss on virtual environments often has a high impact on the user organization, as multiple virtual machines are often affected by a single technical failure or human error. Virtualization brings an extra layer of complexity to a host system, and when data loss occurs, Kroll Ontrack has the tools to respond. Kroll Ontrack supports file formats including VHD, VDI and VMDK, which can be used by Oracle VM and Oracle VM VirtualBox, and has solutions for the following scenarios and more:

» Guest and host file system corruption
» Deleted virtual machines and snapshots
» Internal virtual disk corruption
» RAID and other storage/server hardware failures
» Deleted or corrupt files contained within the virtual disk

Tips on How to Safely Recover From Virtualized Data Loss

» Restore backups to a different volume. This ensures that all important files are good on the backup before possibly overwriting data on the active volume.
» Do not create any new files on the disk needing recovery or continue to run virtual machines until the important data is recovered. New files can overwrite the files that need recovery if restoring the backup fails. VMs using snapshots and thin-provisioned virtual disks that are still in use after the data loss can also overwrite files that need recovery.
» Do not run FSCK or CHKDSK file system repair tools on a virtual disk unless a good backup has been validated by restoring it to a different volume. These repair tools assume that there is a good backup of the data and can overwrite file pointers to make a file system consistent. If desired, these tools can be run in read-only mode to find any major corruption before repairs are made.

Tips on How to Improve Data Recovery Service Success

» Do not delete any additional files prior to a data recovery of deleted data. Additional deleted files can complicate the data recovery.
» Do not try recovery software unless you are sure it will not write anything to the disk that needs recovery. Some recovery software will attempt to write to the source disk and could damage later recovery attempts.
» If one virtual disk needs recovery but others are still running from the same volume and cannot be shut down during the recovery, clone or migrate them to another volume. If a deleted virtual disk or snapshot needs recovery, it is best to copy or clone the virtual machines instead of migrating them, so they do not turn up as deleted files during the recovery.
» Shut down or migrate any other active VMs on the same volume that are thin provisioned or are using snapshots. Any write to new blocks on the volume can overwrite recoverable data.

[3] Worldwide Disk Storage Systems Finishes 2010 with Double-Digit Growth on Strong Fourth Quarter Results, IDC, March 2011
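Two of the recovery tips above, restoring to a different volume and running repair tools in read-only mode only, lend themselves to a small pre-flight check. The sketch below is one possible way to encode them and rests on several assumptions: the function names are hypothetical, volumes are told apart by the device ID that `os.stat` reports (which may not distinguish every storage layout, for example some subvolume or bind-mount setups), and the read-only invocations are `e2fsck -n` for EXT file systems and `chkdsk` without the `/F` or `/R` switches on Windows. The commands are only built here, never executed.

```python
from __future__ import annotations

import os
from pathlib import Path


def _nearest_existing(path: Path) -> Path:
    """Walk up from a path until an existing directory or file is found."""
    path = path.resolve()
    while not path.exists():
        path = path.parent
    return path


def same_volume(source: str, restore_target: str) -> bool:
    """Return True when both paths report the same device ID.

    A True result means the restore would land on the volume that needs
    recovery, which the tips above advise against.
    """
    source_dev = os.stat(_nearest_existing(Path(source))).st_dev
    target_dev = os.stat(_nearest_existing(Path(restore_target))).st_dev
    return source_dev == target_dev


def read_only_check_command(tool: str, target: str) -> list[str]:
    """Build (but do not run) a report-only file system check command."""
    tool = tool.lower()
    if tool == "e2fsck":
        # -n opens the file system read-only and answers "no" to all prompts.
        return ["e2fsck", "-n", target]
    if tool == "chkdsk":
        # Without /F or /R, CHKDSK reports problems but does not fix them.
        return ["chkdsk", target]
    raise ValueError(f"no read-only invocation known for {tool!r}")
```

A restore script could refuse to continue when `same_volume(damaged_path, restore_path)` returns True, and an operator could review the output of the built command before deciding whether any repair is safe to attempt.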

For more information, call or visit us online.
800.872.2599 in the U.S. and Canada
+1.952.937.5161
www.ontrackdatarecovery.com

All other brand and product names are trademarks or registered trademarks of their respective owners.