WHITE PAPER BCDR: 4 CRITICAL QUESTIONS FOR YOUR COMMUNICATIONS PROVIDER


Insurance for IT Infrastructure?

We can buy insurance for contract performance, pitching arms and even vacations. But when it comes to information technology (IT) processes and systems, the risk factors to business are too great and the level of complexity makes things difficult to quantify. As an enterprise business, the only choice is to protect your own interests.

This has led to the growth of Business Continuity and Disaster Recovery (BCDR) planning, a business function that works as self-insurance against the loss of IT services, among other things. It starts with thoroughly assessing risks to determine the proper amount of coverage, then working to mitigate those risks as effectively as possible.

Many BCDR sources provide information about strategy, data center design, and data replication techniques. But few sources directly address one of the most fundamental elements of an effective BCDR strategy: the underlying communications network.

This white paper examines how the network affects the BCDR plan, describes what a highly reliable communications infrastructure is, and explains why it might belong in your BCDR plan. We give you four critical questions to ask, based on common points of failure, to help you reduce the risk of a network outage and make you better able to recover if one happens. The paper concludes with a case study scenario based on a national, geographically dispersed company.

Separate Topics, Common Considerations

Business continuity and disaster recovery are two separate disciplines. Business continuity refers to the processes, policies and procedures related to recovering or continuing critical services in the event of a natural disaster or human-induced incident.
Disaster recovery, in turn, is the process by which an organization resumes business after a disruptive event. The two topics are often grouped together because of their many common considerations.

Business Continuity

An effective business continuity plan must first classify each IT process and resource as it relates to business operations: is it critical, important, or marginal? Next, the executive team needs to establish an appropriate level of risk based on the business impact of a failure of a system or process. For example, there may be little business impact associated with an interruption to a cafeteria point-of-sale application, but an e-commerce website might be considered critical. Lastly, processes should be developed and tested to ensure that each IT function is protected in line with its established level of risk.

There are systems for which any interruption would represent a significant loss. Others, although still critical, could recover from a small delay during a switch to a backup server. In other words, there are systems we must protect from ever failing, and systems we must be able to recover.

The challenge lies less in identifying what is critical than in knowing how to protect it. There are many options for guarding the availability of IT services. Many applications have built-in resiliency to certain types of failures; protection for many other systems must be manually designed and implemented. But even with the most well-engineered and executed business continuity plan, outages can occur. This brings us to the second phase of IT risk management: disaster recovery.

Disaster Recovery

Disaster recovery planning is the way we support recovery from a loss. Disasters come in many different forms: natural, accidental and malicious. Likewise, recovery plans are often developed with multiple layers of protection.
For example, the protection of a critical system may involve the use of a RAID storage array combined with local mirroring and nightly off-site backups. Most recovery strategies rely on the use of diverse resources, either logical or physical, to avoid putting all eggs in one basket.

High Availability Networks vs. Highly Available Services

If you hear the terms "high availability" and "broadband" in the same sentence, it's probably referring to geography. That is, you can connect to this service anywhere. However, you'll seldom hear about things like "five nines" availability in respect to a DSL connection. These types of services are typically classified as consumer services and, more often than not, are delivered on a best-effort basis. Highly available services are business-class services with built-in provisions for fault tolerance and disaster recovery.

Recovery techniques and strategies are as varied as the types of disasters they are designed to mitigate. New solutions continue to help reduce risk and improve recovery times. Whether you're maintaining redundant servers or operating hardened data centers, you have many options for recovery strategies.

One such revolution in recovery services has emerged with Cloud computing. Through virtualization, Cloud computing enables businesses to deploy recovery solutions in the Cloud that are both logically and physically diverse. Virtualization technology has had a tremendous impact on disaster recovery and has introduced a whole new level of complexity. Cloud solutions are simpler to implement, yet they introduce their own unique set of BCDR concerns. In fact, these types of solutions highlight one element of the overall BCDR plan that is easy to take for granted: the communications network.

The Role of the Communications Network

Almost every BCDR plan, at some level, relies on network communications. Whether you're connecting to a redundant disk array over a local area network (LAN) or using an off-site cloud storage facility over a high-speed storage area network (SAN) connection, your communications link should be a key consideration in both business continuity and subsequent disaster recovery planning.
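
Availability figures such as the "five nines" mentioned above are easier to weigh once they are converted into a downtime budget. The following is a minimal sketch of that arithmetic, not a statement of how any carrier measures its service. The function name and the 365-day year are assumptions made for illustration; how a particular SLA defines its measurement period is something to confirm with the provider.

```python
# Convert an availability percentage into allowed downtime per year.
# Assumption: a 365-day year; real SLAs may use a different measurement period.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) permitted by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.999% availability leaves a little over five minutes of downtime per year, while 99.9% leaves nearly nine hours, which is the gap that separates best-effort services from business-class ones.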
The communications link is often overlooked when considering networks and disaster recovery. We tend to first consider protecting ourselves against the problems that occur locally, at our location. For example, if your business is located in a hurricane zone, you think about protection from flooding and other physical damage to your place of business. But odds are that your business also relies on connections to the outside world of suppliers, sales staff or remote offices. Maybe the hurricane misses your office and instead destroys the central office eight blocks away through which all of your network connections originate. While your building may be physically untouched, the business impact from the loss of your network connection could be tremendous. How do you protect against this type of disaster?

Eliminating the 4 Critical Points of Failure

From a communications network perspective, eliminating common failure points is subject to the engineering rules and best practices of each carrier. But for most BCDR planning purposes, addressing four critical points of failure can help incrementally reduce your overall risk of a network outage. You should ask your carrier pointed questions about physical network routes, gateway facility design, equipment reliability, and technical features.

Question 1: Can you demonstrate physical route diversity?

Route diversity is one of the most challenging items to address. The reasons are both physical and operational. There are limited ways to physically connect to any given location. Rights of way are strictly controlled and, although it would seem that there are an infinite number of ways to connect City A to City B, there may be as few as three or four major long-haul route paths between those cities. All telecommunications carriers serving a particular city share these common routes. This is why a single fiber cut often impacts multiple carriers.
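
If a carrier is willing to disclose the physical segments each circuit traverses, the shared-route risk described above can be checked mechanically. The sketch below is illustrative only: the function name, the circuit names and the segment labels are all hypothetical, and it assumes each route is available as a simple set of segment identifiers.

```python
from itertools import combinations


def shared_segments(routes: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of circuits to the physical segments they have in common.

    Pairs with no common segment are omitted, so an empty result means the
    listed circuits are physically diverse as far as the data shows.
    """
    overlaps = {}
    for name_a, name_b in combinations(sorted(routes), 2):
        common = routes[name_a] & routes[name_b]
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


if __name__ == "__main__":
    # Hypothetical route data for two circuits bought from different carriers.
    routes = {
        "carrier_a_circuit": {"metro-ring-1", "long-haul-route-2", "gateway-east"},
        "carrier_b_circuit": {"metro-ring-4", "long-haul-route-2", "gateway-west"},
    }
    for pair, segments in shared_segments(routes).items():
        print(f"{pair[0]} and {pair[1]} share: {sorted(segments)}")
```

In this made-up data set the two circuits come from different carriers yet ride the same long-haul route, so a single cut on that route would take down both.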

Further complicating the issue, only a few carriers actually build and maintain their own fiber connections. Many carriers simply lease or purchase connectivity from these carriers. From a business perspective, who builds and who leases is kept somewhat secret.

For example, to ensure that you are protected from Farmer John's annual spring plow fiber cut, you arrange to have communication services from your data center to your main office delivered by two different carriers. Unbeknownst to you, Carrier A has purchased fiber from Carrier B, and everyone's fiber is in the same cable running through Farmer John's field. Unless you directly ask, you may not know whether your carrier owns or leases its connections in a given area.

The bottom line: working with multiple carriers does not guarantee multiple routes. In some cases, separate paths might be best provided by a single carrier that actually operates its own fiber resources and can provide you with routes that it knows to be physically diverse.

Dark Fiber Considerations

Many businesses consider building their own communications network to gain greater control of their network. Keep in mind that if you are looking to acquire rights to use dark fiber, you'll need to deploy your own equipment at the point where the fiber terminates in a carrier's gateway. In addition to the task of lighting the fiber, you'll need to ensure that your own gateway within the carrier's facility is built reliably.

Question 2: How is your gateway facility designed for power, security, cooling and diverse connectivity?

A carrier gateway or central office represents another single point of failure, due to its single physical location. The gateway is home to the equipment used to create communications services. Whether the service is data, voice or video, the equipment that creates it requires a secure location with space, power and cool air. How do you know if your gateway is a carrier-grade facility?
Ask your provider about:

Redundant Power: Gateways can be designed with multiple power connections using separate paths to the main power grid. Multiple power supplies within the building should have UPS and generator backups.

Physical Security: Physical security protects equipment from unauthorized access. Gateways should be protected by multiple layers of access control; for example, systems may need to provide separate access to areas managed by different carriers.

Resilient Cooling: Within a gateway, highly reliable and well-designed cooling systems are critical.

Diverse Connectivity: Just as carriers share rights of way, they also share gateway space. You cannot assume that because you use multiple carriers they have separate network resources. Ask your carrier where your connections are terminated. Is it in their gateway or a leased facility?

Also, consider route diversity within the gateway. Separate physical paths can often home back to a single gateway. While this could pose an increased risk, it can be mitigated by diverse building penetrations, laterals, and cable tray routing.

Question 3: Do you use carrier-grade communications equipment?

There is an enormous difference between enterprise-grade and carrier-grade communications equipment. Both a large enterprise and a carrier may have gigabit routers in their data centers, and those routers may have very similar performance characteristics. But that's where the similarity ends. Carrier-grade equipment is designed to allow for multiple redundant power feeds. It will likely incorporate redundant power supplies and data processing cards. It is typically designed to be mounted in a hardened environment and to operate in extreme environmental conditions.

You cannot take for granted that every communications provider deploys carrier-grade equipment. This isn't to say that enterprise-grade equipment cannot provide a reliable solution in some cases. Ultimately, the best way to understand the reliability the carrier expects to derive from any particular system is to ask them to put it in writing, in the form of a Service Level Agreement (SLA). If there is no SLA for service availability, or if the SLA is best-effort, you need to carefully consider how the loss of connectivity could impact both your business continuity and disaster recovery efforts.

Question 4: Does the SLA support the service from end-to-end?

When it comes to technical features for fault tolerance, each communications technology is designed differently. Some technologies are purposefully designed with no fault tolerance. For example, if a message were lost for a certain application, trying to recover it might cost more in terms of resources than the impact of the loss itself. Optical transport systems, on the other hand, are usually designed with mechanisms that provide for nearly instantaneous fault recovery.

While understanding how these technologies work would be of value, you don't need to be a telecommunications engineer to decipher the strengths or weaknesses of each technology as it pertains to your BCDR planning. The solution is as simple as a close examination of the service SLA. The metric that you want to consider is network availability. You should understand what the availability specification is and how it is specified. You'll want to know if it includes the entire connection path or if it only includes the core carrier network. The best SLAs will include the entire service connection from end to end.

Data Replication Overview

Data replication is often one of the key processes businesses use to provide for business continuity.
Unlike a backup, which is just a snapshot in time, a replicated database is a live copy that changes with the primary image.

Synchronous replication provides the highest protection against data loss, yet also requires the most resources. In a synchronous system, whenever a new piece of information is stored, it is written to two separate storage systems. The application must wait until a write acknowledgement is received from both systems before the operation can be completed. When storage systems are in different physical locations, the connection bandwidth, reliability and latency between the locations are very important.

Asynchronous replication does not require an immediate write acknowledgement. Because of this, the replicated storage system can be located much farther away and has less stringent network connectivity requirements. However, these solutions are subject to lag: there can be a significant difference between the two storage images because of the time it takes for the write action to complete on the replicated system. The network is the primary source of this delay, so be sure to manage the amount of acceptable lag and provide a sufficient amount of bandwidth.

[Figure: replication network diagram. Labels: Medical Manufacturing Facility; Corporate HQ Datacenter; Backup Datacenter; MPLS Network; primary synchronous replication via redundant, diversely routed DWDM links; secondary asynchronous replication via MPLS network.]
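
The latency and lag considerations in the replication overview above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only. It assumes the common rule of thumb that light covers about 200 km of fiber per millisecond, and it ignores equipment, protocol and storage delays. The function names and sample figures are hypothetical, and route length means the length of the fiber path, which is usually longer than the straight-line distance.

```python
# Assumed rule of thumb: light travels roughly 200 km through fiber per millisecond.
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(route_km: float) -> float:
    """Round-trip propagation delay (ms) added to every synchronous write.

    The application waits for the remote acknowledgement, so each write pays
    for the trip to the remote site and back.
    """
    return 2.0 * route_km / FIBER_KM_PER_MS


def async_backlog_mb(change_rate_mbps: float, link_mbps: float, burst_seconds: float) -> float:
    """Unreplicated data (megabytes) left after a write burst that exceeds the link rate."""
    excess_mbps = max(0.0, change_rate_mbps - link_mbps)
    return excess_mbps * burst_seconds / 8.0  # megabits to megabytes


if __name__ == "__main__":
    for km in (50, 1000):
        print(f"{km} km route adds about {sync_write_penalty_ms(km)} ms per synchronous write")
    backlog = async_backlog_mb(change_rate_mbps=800, link_mbps=500, burst_seconds=60)
    print(f"A 60 s burst at 800 Mb/s over a 500 Mb/s link leaves about {backlog} MB unreplicated")
```

Under these assumptions a metro-distance route adds a fraction of a millisecond to each write, while a route of a thousand kilometers adds around ten, which is one reason distant copies are usually replicated asynchronously and sized by acceptable backlog instead.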

Incorporating Highly Available Services into the BCDR Plan

With an understanding of how the communications network can be designed to provide highly available services, we can better take advantage of the service benefits. Knowing the right questions to ask about how the service is provided by the carrier can help you specify the correct type of connections for a particular service in light of your specific business continuity plans.

Example Scenario

Through the assessment of their operations, an international medical products company determined that their supply chain management database is mission-critical and needs to be protected with a storage replication solution. Their manufacturing facility is located in an area subject to extreme weather, so they wanted a geographically diverse backup system. Since the database is accessed and updated often, they sought solutions for both synchronous and asynchronous replication.

The synchronous replication storage was placed within one of their branch locations in the same city as their main data center. Since DWDM would provide very low latency, they chose to use a dedicated DWDM service for the connectivity. Additionally, they chose a protected service so that a fiber cut wouldn't interrupt their operations. They also worked with the carrier to ensure that their protection fiber was built over a diverse physical route.

To allow for geographic diversity, the company set up an asynchronous replicated storage system with the remote disk array located in a leased data center space in a different city. Performance of the replication system was still important, but because it wouldn't impact their day-to-day operations as significantly, they chose to use a private MPLS-based VPN for this connectivity. While replication can require large amounts of bandwidth, its demands can be very bursty.
To control cost, the company chose a usage-based service that lets them manage costs yet still have access to high bandwidth as required. And since this service is redundant to their primary synchronous solution, they selected a communications service with an availability SLA of 99.99% that covered their entire connection from end to end.

Summary

Business continuity and disaster recovery planning requires businesses to assess their entire operation. As we look to self-insure, we must think like an insurance company. The first step to putting a policy in place should be a careful risk assessment. As your organization prepares its BCDR plan, assess all of your risks. Know the points of failure. Your communications are vital, from the voice connections you need to contact your employees and customers to the data connections used to access your off-line storage. There are highly available communications solutions that, when properly applied, can greatly help you manage your risks.

We build, operate and take end-to-end responsibility for the network solutions that connect you to the world. We put customers first and take ownership of reliability and security across our broad portfolio.

© 2014 Level 3 Communications, LLC. All Rights Reserved. Level 3, Level 3 Communications, the Level 3 Logo, MyLevel3 and Vyvx are either registered service marks or service marks of Level 3 Communications, LLC and/or one of its Affiliates in the United States and/or other countries. Level 3 services are provided by wholly owned subsidiaries of Level 3 Communications, Inc. Any other service names, product names, company names or logos included herein are the trademarks or service marks of their respective owners. REV. 08/14