A Survival Guide to Continuity of Operations
David B. Little, Senior Principal Product Specialist
Customer Perspective: Recovery Time & Objective
[Figure: RPO/RTO plotted against INVESTMENT, running from Low-Level SLA through Medium-Level SLA to High-Level SLA. Labels recoverable from the figure:]
  Security: Security Management (Firewall, IPS/IDS, Critical Systems Protection, Encryption, VM, AV)
  Backup, Vaulting: Data Protection (Backup, Recovery, Vaulting)
  Storage Checkpoints, Online Volume Management: Online Volume Management, Storage Checkpoints, Point-in-Time Copies
  LAN Clustering: Local Clustering (HA)
  WAN Clustering: Global Clustering
  Asynchronous Replication, Synchronous Replication: Replication and Remote Mirroring
Ongoing Challenges for Achieving Operational Continuity
[Figure labels:]
  Challenges: Cost, Security, Availability, Performance, Security Threats, Continuity, Compliance, Complexity
  Infrastructure tiers: Web Server, Application, Database Server, Storage
List of IT Risks That Create Outages is Growing
Business Risks
  Other Risks: Market risk, Credit risk, Interest rate risk, Currency risk
  Operational Risks
    Non-IT Risks: Business process, People and talent, Environment, Physical infrastructure
    IT Risks:
      Security: Computer crimes, Internal breaches, Cyber terrorism
      Availability: Configuration changes, Lack of redundancy in architectures, Human errors
      Performance: Distributed architectures, Peak demand, Heterogeneity in the IT landscape
      Scalability: Business growth, Provisioning bottlenecks, Siloed architectures
      Recoverability: Hardware and/or software failures, External threats such as security, Natural disasters
      Compliance: Government regulations, Corporate governance guidelines, Internal policy
IT Risks For A Government Tax Collection Organization
  Security: Identity Theft. Unauthorized access to or compromise of citizen data stored on the network.
  Availability: Inability to Process Transactions. System or network failure interrupts the ability to process transactions.
  Performance: Form Entry Bottleneck. Citizens can't transmit their returns or check refund status during peak season because of access bottlenecks in the infrastructure.
  Scalability: Inability to Handle Demand. Systems unable to handle unforecasted growth in electronic submissions.
  Recoverability: Non-Reconciliation of Accounts. Data center disaster results in transaction loss. Loss of data results in incomplete reconciliation of accounts.
  Compliance: Procedural Compliance. Inability to audit who accessed what and validate that internal procedures and external guidance have been followed.
Must address all to achieve operational continuity
Case Development
Get the problem statement right: recovery objectives
  Start with the most severe threat your organization faces:
    Natural Disaster
    Intentional Acts By Third Parties
  Have a neutral facilitator work with operations staff to determine objectives
  Work to determine recovery objectives for agency operation, not the technology
  Have a senior executive approve objectives
Get the capabilities right: account for delays
Case Development Continued
Lay Out Objectives
  Government organizations must be able to execute mission critical functions at all times and under all conditions.
Establish Capabilities
  Given today's resources we can...
Develop Alternative Courses of Action
  We can continue mission critical applications by splitting them into multiple locations.
Align Service Level Agreements (SLA) With Appropriate Organizations
Operational Vigilance Key Steps
  Update objectives at least once a year using the same business approach methodology
  Update the capabilities report after significant technology changes, each test, and each real incident
  Present an update on the gap between the business requirements for preventing risk and loss and current capabilities, and provide solution options
  Maintain consistent methodology and consistent reporting
  Document, document, document
After Action Reporting Tips
When recovery goes:
  BETTER than expected: Report it! Be the hero!
  AS expected: Report it! Call attention to how well you understand meeting business requirements with technology investment, planning and staff capabilities
  LESS than expected: Report it! Show real-world results & how investment should be made to improve recovery times
Recovery Objectives Methodology Challenges
  Lack of common definitions
  IT staff trying to facilitate a business decision
  Absence of education on the balance between process and technology solutions
  Lack of understanding that disasters are supposed to cost money, be uncomfortable, and incur some loss
Capabilities Assessment Methodology Issues
Not accounting for:
  The time it takes to identify a potential problem
  The time it takes to make a go/no-go decision to relocate
  Absence of critical staff
  The time it takes to deploy staff and assets
  Technology failures: 25% of all media typically bad at time of incident; etc.
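The delays listed on this slide can be added up and compared against a stated recovery time objective. The following is a minimal Python sketch under stated assumptions: the names RecoveryDelays, realistic_recovery_hours, and meets_objective are hypothetical, the way bad media inflates restore time is an illustrative model, and only the 25% bad-media figure comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class RecoveryDelays:
    """Hypothetical delay components, all in hours."""
    detect: float      # time to identify a potential problem
    decide: float      # time to make the go/no-go decision to relocate
    staff_wait: float  # time lost while critical staff are unavailable
    deploy: float      # time to deploy staff and assets
    restore: float     # nominal technical restore time, assuming all media are good


def realistic_recovery_hours(d: RecoveryDelays, bad_media_rate: float = 0.25) -> float:
    """Sum every delay, inflating the restore step for unusable media.

    Assumption (not from the presentation): bad media force rework in
    proportion to their share, so restore time is divided by the share
    of good media.
    """
    if not 0.0 <= bad_media_rate < 1.0:
        raise ValueError("bad_media_rate must be in [0, 1)")
    restore = d.restore / (1.0 - bad_media_rate)
    return d.detect + d.decide + d.staff_wait + d.deploy + restore


def meets_objective(d: RecoveryDelays, rto_hours: float,
                    bad_media_rate: float = 0.25) -> bool:
    """True when the realistic recovery time fits within the objective."""
    return realistic_recovery_hours(d, bad_media_rate) <= rto_hours
```

With illustrative inputs of 1 hour to detect, 2 to decide, no staff wait, 6 to deploy, and a nominal 12-hour restore, the realistic total is 25 hours, which misses a 24-hour objective even though the same plan with perfect media (21 hours) would meet it.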
Business Case Development Pitfalls to Avoid
Objectives developed with:
  Limited or no involvement from agency operations staff
  No involvement from agency executives
  Inconsistent definitions
  A "what do you want?" approach vs. "what do you need to prevent X loss?"
Capabilities:
  Reported as too ambitious
  Not realistic
Presented:
  In technology terms instead of business terms
  As availability you get for $$ spent instead of reduction of bankruptcy risk for $$ invested
  As a request for capital instead of a set of strategy options
Symantec Continuity of Operations Solutions Overview
Continuity of Operations Solutions from Symantec
  Prevention: Protect Against and Prevent Data Loss and Downtime
    Avoid outages by proactively monitoring threats and applying patch management policies
  Remediation: Fix the Problem
    Identify systems to patch, points of attack, application failures, and data loss
  Recovery: Reach RTOs/RPOs
    Restore data and application services to meet business recovery time objectives (RTO) and recovery point objectives (RPO)
Continuity of Operations from Symantec Spans from Prevention to Remediation to Recovery
[Figure: Business Continuity cycle of Prevention, Remediation, and Recovery, with the Internet as the external source. Labels recoverable from the figure:]
  Prevention: Vulnerability Identified and/or Infrastructure Instrumentation & Early Warnings Sent; Vulnerability Proactively Blocked, Application Failed Over; Availability of Application, Systems, and Data Assured
  Remediation: Identification of Systems to Patch, Points of Attack, Application Failures, Data Loss
  Recovery: Patches & Updates Implemented Across Infrastructure; Applications Recovered; Data Restored
  Feedback: Reports on attacks and outages; updates to policies and SLAs; archiving for audit
Continuity of Operations from Symantec Spans from Prevention to Remediation to Recovery
[Figure: the Prevention, Remediation, Recovery cycle annotated with products. Labels recoverable from the figure:]
  Symantec DeepSight Threat Mgmt System and Alert Services
  Symantec Managed Security Services
  VERITAS Business Continuity Management Service
  Performance Management/i3 Suite
  Symantec Client Security
  Symantec Gateway Security
  Symantec Network Security
  VERITAS NetBackup (or VERITAS Backup Exec)
  VERITAS Storage Foundation
  VERITAS Volume Replicator
  VERITAS Cluster Server
  Symantec LiveState Recovery
  Symantec ESM
  Symantec Incident Manager
  Symantec LiveState Client Management Suite
  VERITAS OpForce - Veritas Provisioning Manager
  RTO/RPO steps
  Feedback: Reports on attacks and outages; updates to policies and SLAs; archiving for audit
Continuity of Operations Solution Capabilities
  Challenge: Protect against and prevent data loss and downtime
    Symantec Solution: Characterize threats, deploy policies for shielding, patch management, deploy mitigation efforts
  Challenge: Fix the Problem
    Symantec Solution: Conduct root-cause analysis; isolate application, systems, data problems; identify points of attack, patches
  Challenge: Reach RTO/RPOs
    Symantec Solution: Invest in just-enough business continuity, monitor continuously, tune and test
Choose the Correct Configuration
  Align Continuity of Operations objectives with business and risk management requirements
    If not, your solutions can cost more than they should
  Present your case in risk management terms
    Secure needed funding, protect mission critical applications, and reset unreasonable SLAs
  Compliance guidance can be met
    Avoid the fear factor
  RTO/RPO: Realtime, 2-24 hours, 24+ hours
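The three RTO/RPO bands on this slide can be expressed as a simple classification. The following is a minimal Python sketch; the function name rto_band is hypothetical, and treating anything under 2 hours as "Realtime" is an assumption, since the slide does not say where objectives between zero and 2 hours fall.

```python
def rto_band(rto_hours: float) -> str:
    """Map a recovery objective in hours to one of the slide's three bands.

    Assumption: objectives under 2 hours are grouped with "Realtime",
    because the slide names no separate band for them.
    """
    if rto_hours < 0:
        raise ValueError("rto_hours must be non-negative")
    if rto_hours < 2:
        return "Realtime"
    if rto_hours <= 24:
        return "2-24 hours"
    return "24+ hours"
```

Classifying each application this way before choosing technology is one way to keep the configuration tied to the agreed objective rather than to what is technically possible.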
Conclusions & Recommendations
Issues with continuity of operations
  Misaligned recovery objectives
  Budgets don't align with SLAs
  Compliance is costly
Recommendation: Comprehensive Planning
  Match objectives with requirements
  Negotiate SLAs first
  Build recoverable environments
  Document for compliance
Issues with continuity of operations
  Unclear recovery capabilities
  App & network dependencies
  Unclear definition of recovery
Recommendation:
  Generate an SLA on recovery configurations
  Document and test all applications and connectivity requirements
  Provide an SLA to business users on restoring business processes
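Documented application and network dependencies can be turned into a recovery order that can then be tested. The following is a minimal Python sketch using the standard library's graphlib module (Python 3.9+); the component names and the recovery_order helper are hypothetical and are not taken from the presentation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must be
# recovered before it can start.
EXAMPLE_DEPENDENCIES = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage", "network"},
    "application": {"database"},
    "web_server": {"application", "network"},
}


def recovery_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component follows its prerequisites.

    graphlib raises CycleError if the documented dependencies are
    circular, which is itself a finding worth recording.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

Calling recovery_order(EXAMPLE_DEPENDENCIES) yields a sequence in which network and storage always precede the database, and the database precedes the application and web server. A component that nobody listed will simply be absent from the result, so the output is only as complete as the documentation behind it.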
Do's and don'ts in the real world
  Structure tests to pass
  Make assumptions as to what is available
  Rely on just data availability
Recommendation:
  Push tests to failure
  Test in a real-life environment
  Understand agency process and include all resources
Do's and don'ts in the real world
  Single Points of Failure
  Cross-train staff
  RTO for agency functions
Recommendation:
  Work through all dependencies
  Train staff at recovery site
  Include agency functions and not just technology
Why Symantec Has the Best Solutions
From a leading vendor, the ability to:
  Prevent, remediate and recover from security risks and downtime of applications and data
  Span a heterogeneous environment from client to storage/systems
  Easily tailor the solution to availability and/or uptime commitments
  No compromising on product quality
Thank you