
A Practical Guide to Avoiding Disasters in Mission-Critical Facilities
Todd Bermont

What is a Disaster?
- An event that can unexpectedly impact the continuity of your business
- Anything that injures, or has the potential to injure, your employees, data, the environment, or your facility itself
- Accidents, HAZMAT spills, fires, floods, tornadoes, hurricanes, terrorism, earthquakes, utility outages, human error, equipment failures, and virtually any other event that may injure people, data, the environment, or property

Associated Business Issues
- Significant Costs Associated with Downtime
  - META/Gartner Group: $330,000/hour
  - Strategic Research Group, for a brokerage firm: >$6.5 million/minute
- Continued Pressure on the Infrastructure
  - Fragile power grid, nature, terrorist threats, hackers, viruses
- Major Changes to Equipment in Data Centers
  - Blade servers = high-performance, high-density, heat-generating equipment
- Now Actual $$$ Liability!
  - Government regulations (HIPAA, Sarbanes-Oxley, SAS 70, SEC & FDA)
  - Service level agreements that mandate uptime

The Bottom Line Is: Downtime Is Unacceptable!
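The cited rates are easier to weigh against a specific risk once they are converted into the cost of a single outage. The sketch below is a minimal illustration that assumes a flat cost per hour of downtime (real losses are rarely linear in duration). Only the two rates come from the slide; the function name and the outage durations are hypothetical.

```python
def outage_cost(cost_per_hour: float, outage_minutes: float) -> float:
    """Estimated cost of one outage, assuming a flat cost per hour of downtime."""
    if outage_minutes < 0:
        raise ValueError("outage_minutes must be non-negative")
    return cost_per_hour * outage_minutes / 60.0


GARTNER_COST_PER_HOUR = 330_000           # META/Gartner Group figure from the slide
BROKERAGE_COST_PER_HOUR = 6_500_000 * 60  # slide's ">$6.5 million/minute", so a lower bound

# Hypothetical outage durations, for illustration only.
for minutes in (5, 30, 240):
    print(f"{minutes:>3} min outage: "
          f"${outage_cost(GARTNER_COST_PER_HOUR, minutes):>12,.0f} (META/Gartner rate), "
          f"${outage_cost(BROKERAGE_COST_PER_HOUR, minutes):>14,.0f} (brokerage rate)")
```

Because the brokerage figure is stated as "more than" $6.5 million per minute, results computed from it should be read as lower bounds.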

Risk Exposure in the Data Center
- 86% of data center downtime is due to infrastructure failure & human error! (Gartner)
- 60% of all declared disasters are due to power or hardware failure! (DR firm)
- Yet until recently, most firms have spent a disproportionate amount of their IT budget on disaster recovery instead of disaster avoidance
(Chart: causes of data center downtime: infrastructure failure 71%, human error 15%, environmental factors 14%)

Infrastructure Risk
- Over 95% of all infrastructure failures occur between the UPS and the load! (Uptime Institute)
(Diagram: DR (Disaster Recovery Plan) | UPS, Generators, HVAC | IT (Equipment))
- The key is to focus on mitigating RISK in this gap!

Four Categories of Failures Leading to Disaster
- Design Failures
- Catastrophic Failures
- Compounding Failures
- Human-Error Failures
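One way to act on "focus on mitigating RISK" is to keep a register of known vulnerabilities, tag each with one of the four failure categories, and rank them. The sketch below scores each entry as likelihood × impact, a common risk-matrix convention rather than a method given in this presentation. The 1 to 5 scales, the class and function names, and the example entries are all hypothetical.

```python
from dataclasses import dataclass

# The four failure categories named on the slide.
CATEGORIES = ("design", "catastrophic", "compounding", "human-error")


@dataclass(frozen=True)
class Vulnerability:
    name: str
    category: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (loss of critical load)

    @property
    def score(self) -> int:
        # Simple risk-matrix product; other weightings are equally defensible.
        return self.likelihood * self.impact


def rank(vulns: list[Vulnerability]) -> list[Vulnerability]:
    """Validate entries and return them ordered from highest to lowest score."""
    for v in vulns:
        if v.category not in CATEGORIES:
            raise ValueError(f"unknown category: {v.category}")
        if not (1 <= v.likelihood <= 5 and 1 <= v.impact <= 5):
            raise ValueError(f"score out of range for {v.name}")
    return sorted(vulns, key=lambda v: v.score, reverse=True)


# Hypothetical register entries, for illustration only.
register = [
    Vulnerability("Leaking chilled-water valve", "compounding", 3, 2),
    Vulnerability("Switching done without a MOP", "human-error", 4, 4),
    Vulnerability("Single static switch feeding all PDUs", "design", 2, 5),
]

for v in rank(register):
    print(f"{v.score:>2}  {v.category:<12} {v.name}")
```

A product of two ordinal scales is coarse, so ties and near-ties are best settled by judgment in the review meeting rather than by the number alone.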

Preventing Design Failures
- Develop a comprehensive design intent
- Select appropriate design firms that have experience in your specific application
- Be an active member of the design process
- Review, check, and recheck; consider using a peer review
(Image: de Havilland Comet 1A)

Preventing Catastrophic Failures
- Comprehensive maintenance program
- Predictive analysis
- Implement a lessons-learned program
- If it can break, it will. Plan for it!
(Image: a solar storm causes a transformer failure, dropping the Quebec power grid and causing power problems throughout the U.S.)

Preventing Compounding Failures
- Sweat the small stuff!
- Test mission-critical infrastructure as an integrated system
- Proactively maintain your equipment
(Image: "It was just a small leak")

Preventing Human-Error Failures
- Training
- Use switch-level, detailed Methods of Procedure (MOPs) & verify their accuracy
- Use a pilot/co-pilot approach during switching operations
- USE THE MOP!
(Image: "Most of us are NOT Einstein")

Disaster Avoidance Considerations
Data Center State = Ability to Succeed OR Disaster
- Physical Assessment: ID vulnerabilities; catalog equipment; quantify capacity; ID procedural risks; conduct annually
- Design: redundancy; maintainability; scalability; safety
  - 59% of companies say new equipment is purchased without regard for power & cooling***
- Testing & Commissioning: integrated systems testing; detailed MOPs & SOPs; match build to design intent
- Maintenance & Monitoring
  - 52% of companies had operations interrupted due to hardware failure*
  - A typical large data center requires hundreds of maintenance activities**
  - Early detection = minimal disruption

* Dulles Conference on Emergency Response Planning
** Lee Technologies maintained facilities
*** Joint InterUnity Group/AFCOM study, April 2005
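The 59% statistic and the "quantify capacity" step point to a simple pre-purchase check: will new equipment fit within the remaining UPS and cooling capacity? This sketch assumes that all IT input power ends up as heat and uses the standard conversions of about 3.412 BTU/hr per watt and 12,000 BTU/hr per ton of refrigeration. The function names, the 80% utilization ceiling, and the site figures are hypothetical planning values.

```python
BTU_PER_HR_PER_WATT = 3.412  # 1 W of electrical load dissipates about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000  # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_tons_for(load_kw: float) -> float:
    """Cooling needed for an IT load, assuming all input power becomes heat."""
    return load_kw * 1000 * BTU_PER_HR_PER_WATT / BTU_PER_HR_PER_TON


def check_new_equipment(new_kw, ups_capacity_kw, ups_load_kw,
                        cooling_capacity_tons, cooling_load_tons,
                        max_utilization=0.8):
    """Return (power_ok, cooling_ok) for adding new_kw of IT load.

    max_utilization is a hypothetical planning margin, not a standard.
    """
    power_ok = ups_load_kw + new_kw <= max_utilization * ups_capacity_kw
    cooling_ok = (cooling_load_tons + cooling_tons_for(new_kw)
                  <= max_utilization * cooling_capacity_tons)
    return power_ok, cooling_ok


# Hypothetical site: 500 kW UPS carrying 350 kW; 200 tons of cooling carrying 110 tons.
print(check_new_equipment(60, 500, 350, 200, 110))  # proposed 60 kW blade deployment
print(check_new_equipment(30, 500, 350, 200, 110))  # a smaller 30 kW alternative
```

For redundant (N+1) systems, the capacity figures should be what remains with the largest unit out of service, not the nameplate total.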

Physical Assessment
- Objective
- Comprehensive
- Documented

Design
- Design with the end in mind
- Outline your goals & objectives
- Quantify your cost of downtime
- Match resiliency with impact to the business
(Image: "My HVAC needs to be on the generator too?")

Testing & Commissioning
- Perform realistic testing, even though it can take time & $
- Utilize a systematic process of verifying and documenting the performance of the facility's equipment
- Use someone impartial who did not design or engineer your facility
- Vendor start-up is not commissioning!
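"Quantify your cost of downtime" and "match resiliency with impact to the business" can be combined by converting an availability target A into expected annual downtime: minutes per year = (1 - A) × 525,600. The sketch below assumes a 365-day year and a flat hourly cost of downtime; the function names are hypothetical, and availability targets are planning figures, not guarantees.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for an availability fraction between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly downtime cost at a flat hourly cost of downtime."""
    return annual_downtime_minutes(availability) / 60.0 * cost_per_hour


COST_PER_HOUR = 330_000  # the META/Gartner figure cited in this presentation
for a in (0.999, 0.9999, 0.99999):
    print(f"{a:.3%}  {annual_downtime_minutes(a):8.1f} min/yr  "
          f"${annual_downtime_cost(a, COST_PER_HOUR):,.0f}/yr")
```

Under these assumptions, moving from 99.9% to 99.99% availability at $330,000/hour lowers expected downtime cost from about $2.89 million to about $0.29 million per year, which gives a rough sense of how much added resiliency the business impact can justify.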

Maintenance & Monitoring
- Conduct preventive maintenance (PM) as recommended by the vendor
- Develop a comprehensive template & perform a daily walk-through
- Monitor the devices that are most critical and most likely to fail

Proactive Disaster Avoidance (diagram): MOPs & SOPs; Warning Signs; Predictive Maintenance; Escalation & DR; Safety; Ongoing Training

Internal Controls & Safety
- Detailed MOPs & SOPs
- Documented maintenance tickets
- Daily walk-thru
- Detailed maintenance schedule
- Weekly prioritization meetings
- Safety

Documentation
- As-built drawings
- Facility one-line
- MOPs & SOPs
- Maintenance tickets
- Training manuals
- Daily logs
- Assessment reports
- Accurate inventory

Proactive Maintenance
- PMs as recommended
- Regular maintenance of filters, fuel, coolant, etc. (just like your car)
- Escalation procedures for surprise issues
- 7x24x365 monitoring

Ongoing Training
- Understand how your equipment functions
- Train operators and supervisors
- Training should include:
  - Modules for equipment
  - Modules for procedures: operations, maintenance, disaster recovery/emergency response, and safety procedures
- Don't lose knowledge at your site. Capture it!
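A walk-through template pays off when every reading is compared with a limit and out-of-range results trigger a defined response. The sketch below checks one day's readings against a template and routes each finding either to immediate escalation or to a maintenance ticket. The device names, readings, limits, and the two-level escalation rule are hypothetical; real limits should come from vendor documentation and your design intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPoint:
    device: str
    reading: str
    low: float
    high: float
    critical: bool  # True if a fault here could drop the critical load


def review_walkthrough(template, readings):
    """Compare one walk-through's readings against the template.

    Returns (device, reading, value, action) tuples, in template order,
    for every reading that is missing or outside its limits.
    """
    findings = []
    for cp in template:
        value = readings.get((cp.device, cp.reading))
        if value is None:
            findings.append((cp.device, cp.reading, None, "missing: re-check"))
        elif not cp.low <= value <= cp.high:
            action = "escalate now" if cp.critical else "open maintenance ticket"
            findings.append((cp.device, cp.reading, value, action))
    return findings


# Hypothetical template and readings, for illustration only.
TEMPLATE = [
    CheckPoint("UPS-A", "load (%)", 0, 80, critical=True),
    CheckPoint("CRAC-2", "supply air (°C)", 18, 27, critical=False),
    CheckPoint("GEN-1", "day tank fuel (%)", 75, 100, critical=True),
]

today = {("UPS-A", "load (%)"): 86, ("CRAC-2", "supply air (°C)"): 22.5}
for finding in review_walkthrough(TEMPLATE, today):
    print(finding)
```

Treating a missing reading as a finding keeps a skipped check from silently passing, which matters as much as catching an out-of-range value.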

Internal Controls & Safety
- Supervise maintenance activities
- Have someone present who possesses the proper knowledge and skill set for the equipment being maintained
- Document revisions when there are changes to the scope of work and procedures
- Be consistent
- There is no such thing as an unsafe, reliable data center: make sure all safety standards are followed in both operations and maintenance

Document, Document, Document
- Include record-keeping requirements in service contracts
- Documentation generated by the service contractor provides building operations staff and management with critical information for comparing past and current conditions of equipment and system performance
- All work should be documented in an organized fashion:
  - Completed Methods of Procedure (MOPs)
  - Defective items & corrections made
  - Parts used
  - Before and after data

Document, Document, Document (continued)
- Keep documentation in soft copy & back up your data!
- Document all preventive maintenance (PM) activities. This will:
  - Help locate recurring problems
  - Provide an understanding of when equipment performance is degrading
  - Ensure that the contractor is performing to the scope of work
  - Increase total system reliability
- Document all lessons learned
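Structured PM records make one of the listed benefits, locating recurring problems, straightforward to automate. The sketch below stores the items the slide says each job should document (completed MOP, defective items and corrections, parts used, before and after data) and counts defects that recur on the same equipment across visits. The record fields, MOP identifiers, dates, and the two-visit threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PMRecord:
    """One documented PM visit, mirroring the items listed on the slide."""
    equipment: str
    performed: date
    mop_id: str  # hypothetical identifier of the completed MOP
    defects: list[str] = field(default_factory=list)        # defective items found
    corrections: list[str] = field(default_factory=list)    # corrections made
    parts_used: list[str] = field(default_factory=list)
    before: dict[str, float] = field(default_factory=dict)  # readings before the work
    after: dict[str, float] = field(default_factory=dict)   # readings after the work


def recurring_defects(records, min_count=2):
    """Return {(equipment, defect): visits} for defects seen on min_count or more visits."""
    # set() so a defect listed twice in one visit still counts as one visit.
    counts = Counter((r.equipment, d) for r in records for d in set(r.defects))
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical maintenance history, for illustration only.
history = [
    PMRecord("UPS-A", date(2024, 1, 10), "MOP-017", defects=["fan noise"]),
    PMRecord("CRAC-1", date(2024, 2, 3), "MOP-022", defects=["clogged filter"]),
    PMRecord("UPS-A", date(2024, 4, 12), "MOP-017", defects=["fan noise", "loose lug"]),
]
print(recurring_defects(history))
```

The before and after readings in each record can be compared across visits in the same way to show when equipment performance is degrading.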

Proactive Maintenance
- Fix it before it breaks
- Utilize predictive maintenance & monitoring
- Understand how your system operates and know where the weak points are
- Use your data and past experiences
- Correct weak items before they fail
- Modify procedures & scope of work to address such items
- Adjust your data gathering and collection as necessary

What Next?
How can you avoid potential disasters moving forward?
- Conduct an objective physical assessment of your mission-critical facilities
- Identify the most critical vulnerabilities in equipment & operations
- Prioritize the most critical issues
- Develop a plan to address those issues (training, operations, expansion, maintenance, & disaster recovery)
- Implement your plan

Thank You for Attending!
For more information, please contact:
Todd L. Bermont
Email: tbermont@leetechnologies.com
Phone: (847) 680-8809
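Predictive maintenance usually means trending a measured value and acting before it reaches a limit. As a minimal sketch, assuming the trend is roughly linear, the function below fits an ordinary least-squares line to dated readings and estimates how many days remain before a threshold is crossed. The function name, the example reading (a battery string's internal resistance in milliohms), and the 5.0 milliohm threshold are hypothetical; vendor guidance should set real limits.

```python
def days_until_threshold(days, values, threshold):
    """Fit a least-squares line to (day, value) pairs and estimate how many
    days after the latest observation the trend reaches `threshold`.

    Assumes a rising value means degradation. Returns None when the fitted
    trend is flat or improving, and 0.0 if the line has already crossed.
    """
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("observations must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # not trending toward the threshold
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - max(days))


# Hypothetical readings: battery string internal resistance (milliohms).
days = [0, 30, 60, 90]
resistance = [4.0, 4.2, 4.4, 4.6]
print(days_until_threshold(days, resistance, threshold=5.0))
```

Here the fitted line rises 0.2 milliohm every 30 days and reaches 5.0 at about day 150, roughly 60 days after the last reading: enough lead time to schedule the replacement rather than react to a failure.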