HowTo DR. Josh Berkus PostgreSQL Experts SCALE 2014

Similar documents
HowTo DR. Josh Berkus PostgreSQL Experts pgcon 2014

Virtual protection gets real

Running MySQL on AWS. Michael Coburn Wednesday, April 15th, 2015

What to Look for in a DRaaS Solution

Red Hat Enterprise Virtualization (RHEV) Backups by SEP

HIGH-AVAILABILITY & D/R OPTIONS FOR MICROSOFT SQL SERVER

Asigra Cloud Backup Provides Comprehensive Virtual Machine Data Protection Including Replication

PostgreSQL migration from AWS RDS to EC2

Aurora, RDS, or On-Prem, Which is right for you

Disaster Recovery (DR) Planning with the Cloud Desktop

Annual Public Safety PSAP Survey results

Kroll Ontrack VMware Forum. Survey and Report

Buyer s Guide: DRaaS features and functionality

Which technology to choose in AWS?

SERVICE CATALOG. Find more information here RDX.com /

How to Protect SAP HANA Applications with the Data Protection Suite

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

ArcGIS Enterprise: Configuring Backups, Disaster Recovery, and Replication. Harrold Sompotan and Patrick Jackson

Virtualization. Disaster Recovery. A Foundation for Disaster Recovery in the Cloud

Virtual Disaster Recovery

Real-time Protection for Microsoft Hyper-V

PGConf.Russia 2019, Moscow. Alexander Kukushkin

Business Continuity and Disaster Recovery Disaster-Proof Your Business

Boost your data protection with NetApp + Veeam. Schahin Golshani Technical Partner Enablement Manager, MENA

IBM SmartCloud Resilience offers cloud-based services to support a more rapid, reliable and cost-effective enterprise-wide resiliency.

High Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2

The Best Storage for Virtualized Environments

Recovery at a Click - where to be in 18 months

Complete Data Protection & Disaster Recovery Solution

Microsoft E xchange 2010 on VMware

BC/DR Strategy with VMware

Introducing VMware Validated Designs for Software-Defined Data Center

SQL Server Virtualization 201

Veritas Backup Exec. Powerful, flexible and reliable data protection designed for cloud-ready organizations. Key Features and Benefits OVERVIEW

SAP HANA. HA and DR Guide. Issue 03 Date HUAWEI TECHNOLOGIES CO., LTD.

Virtualization And High Availability. Howard Chow Microsoft MVP

Advanced Architectures for Oracle Database on Amazon EC2

The OnApp Cloud Platform

Oracle Autonomous Database

MySQL Backup Best Practices and Case Study:.IE Continuous Restore Process

Backup and Restore System

WHITE PAPER. Header Title. Side Bar Copy. Header Title 5 Reasons to Consider Disaster Recovery as a Service for IBM i WHITEPAPER

AWS_SOA-C00 Exam. Volume: 758 Questions

Consolidated Disaster Recovery. Paul Kangro Applied Technology Strategiest

Disaster Recovery and Mitigation: Is your business prepared when disaster hits?

The Key to Disaster Recovery

Extending the BOSH Backup and Restore Framework. Therese Stowell, Product Manager Chunyi Lyu, Engineer Platform Recovery Team, Pivotal

EMC Integrated Infrastructure for VMware. Business Continuity

Don t Panic Website disaster planning for the rest of us

How Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud. Santa Clara, California April 23th 25th, 2018

Microsoft SQL Server

Arcserve Unified Data Protection Virtualization Solution Brief

20745B: Implementing a Software- Defined DataCenter Using System Center Virtual Machine Manager

Silicon House. Phone: / / / Enquiry: Visit:

High Availability and Disaster Recovery Solutions for Perforce

SteelEye Solutions Extend Citrix XenServer. Bob Williamson

Introduction to OpenStack Trove

White Paper. How to select a cloud disaster recovery method that meets your requirements.

Simplifying HDS Thin Image (HTI) Operations

IBM Compose Managed Platform for Multiple Open Source Databases

VMware - VMware vsphere: Install, Configure, Manage [V6.7]

SQL Server Consolidation with Server Virtualization on NetApp Storage

Amazon ElastiCache 8/1/17. Why Amazon ElastiCache is important? Introduction:

VMware vsphere 5.5 Advanced Administration

Architecting DR Solutions with VMware Site Recovery Manager

Maximizing Availability With Hyper-Converged Infrastructure

Transform Availability

SRM 8.1 Technical Overview First Published On: Last Updated On:

Dell Storage Compellent Integration Tools for VMware

SRM 6.5 Technical Overview February 26, 2018

Přehled novinek v Hyper-V 2016 Kamil Roman

Advanced Architecture Design for Cloud-Based Disaster Recovery WHITE PAPER

Introducing VMware Validated Designs for Software-Defined Data Center

VMware HA: Overview & Technical Best Practices

Choosing a MySQL HA Solution Today. Choosing the best solution among a myriad of options

StorageCraft OneXafe and Veeam 9.5

Data Management at Cloud Scale CommVault Simpana v10. VMware Partner Exchange Session SPO2308 February 2013

Introducing VMware Validated Designs for Software-Defined Data Center

Conference Oracle Database Appliance Virtualized Implementation with HA and DR for Banner Database and Application Servers.

Disaster Recovery-to-the- Cloud Best Practices

Best practices for protecting Virtualization, SDDC, Cloud, and the Modern Data Center, with NetBackup

Introduction to Database Services

Resilient & Ready. May 21 23, 2018

Installing Acronis Backup Advanced Edition

Simplify Backups. Dell PowerVault DL2000 Family

4. Are you only looking for a D/R as a service target for the VMware and IBM systems? Yes

Postgres in Amazon RDS. Denish Patel Lead Database Architect

Virtualizing Oracle on VMware

PostgreSQL on Solaris. PGCon Josh Berkus, Jim Gates, Zdenek Kotala, Robert Lor Sun Microsystems

VPLEX & RECOVERPOINT CONTINUOUS DATA PROTECTION AND AVAILABILITY FOR YOUR MOST CRITICAL DATA IDAN KENTOR

Architecting a Highly Available Infrastructure: An Overview SLN187. Mark Milow Mike DiPetrillo

Azure Webinar. Resilient Solutions March Sander van den Hoven Principal Technical Evangelist Microsoft

Enhanced Protection and Manageability of Virtual Servers Scalable Options for VMware Server and ESX Server

Deployment. Chris Wilson, AfNOG / 26

EBOOK. NetApp ONTAP Cloud FOR MICROSOFT AZURE ENTERPRISE DATA MANAGEMENT IN THE CLOUD

SRM University, Kattankulathur Department of Information Technology IT1019 Information Storage and Management Model Examination

Modernize Your Backup and DR Using Actifio in AWS

MySQL and Virtualization Guide

P a g e 1. Teknologisk Institut. Online kursus k SysAdmin & DevOps Collection

MyCloud Computing Business computing in the cloud, ready to go in minutes

Transcription:

HowTo DR Josh Berkus PostgreSQL Experts SCALE 2014

Disaster Recovery The process, policies and procedures that are related to preparing for recovery or continuation of technology infrastructure which are vital to an organization after a natural or human-induced disaster. Wikipedia, February 2014

Disaster Recovery Restoring services after the unexpected.

Disaster Recovery Limiting: 1. Downtime 2. Data Loss

Do you have a DR Plan?

Is it fairly complete?

Have you tested it?

Dilbert is a copyright of Scott Adams. Used here as parody. All rights reserved to Scott Adams.

Threat Model

server failure getting hacked natural disaster

server failure storage failure network failure getting traffic hacked natural spike admin error OS / VM problem disaster bad update software bugs

server failure getting hacked natural storage failure traffic spike admin error network failure OS / VM problem disaster bad update software bugs

Accepting Loss

The Nines Nines Down/Year 99.9% 9 hours 99.99% 1 hour 99.999% 5 minutes

The Nines Treats all downtime causes as identical except the ones it ignores Doesn't address data loss Really Business Continuity also unrealistic

Disaster Downtime Data Loss Detect Server Failure Network Failure Admin Error Bad Update Storage Failure Getting Hacked Natural Disaster

Disaster Downtime Data Loss Detect Server Failure 0 0 Network Failure 0 0 Admin Error 0 0 10 yrs Bad Update 0 0 Storage Failure 0 0 Getting Hacked 0 0 10 yrs Natural Disaster 0 0

$$$$$$$$$$$$$$$$$$$$$$$

Disaster Downtime Data Loss Detect Server Failure 5min 1min Network Failure 3hrs 10min Admin Error 1hr 1hr 3 mo Bad Update 1hr 1hr Storage Failure 5min 30min Getting Hacked 1hr 1hr 3 mo Natural Disaster 6hrs 1hr

Your DR Plan

Elements of a Plan 1. Backups/Replicas 2. Replacements 3. Procedures 4. People

Backups pg_dump mysqldump rsync zfs snapshot SAN exportable snapshot

Backups++ Periodic Portable Simple Recover point-in-time

Backups-- Slow to restore Data loss interval

Backups Good for: natural disaster admin error, bad update software bugs getting hacked Bad for everything else

Replication Postgres binary replication MySQL replication Redundant HBase nodes Redis clustering DRBD GlusterFS etc.

Replication++ Continuous Fast failover Low data loss

Replication-- Extra hardware Complex High-maintenance Can hurt performance Can replicate failures

Replication Good For: server, storage, network failure Bad For: admin error, getting hacked software bugs

Continuous Backup Also PITR Continous like replication Partial recovery like backups

where you gonna restore those backups to?

Replacing Services servers network storage OS image software reversion

Procedures

Written Procedures

3AM is not the time to improvise

Procedures for each recovery step for deciding what steps

Database Server Does Not Respond 1. Determine if physical server is down a. if network is down, use plan N1. 2. If not, try to restart database using command 3. Still down? Fail over to replica using command 4. Check replica. 5. Not working? Restore backup to test server 1 using command...

Good: detailed written procedures Better: written procedures with pastable commands Best: tested single-command scripts

Fallback Procedures Sometimes recovery fails Have fallback procedures If the fallback fails time for a meeting!

Never ever ever ever improvise.

People Who You Gonna Call?

Know who to call on call staff experts in each service consultants/contractors vendors required authorizations

Contact Book Include as much contact information as possible Put copies in more than one place including paper! Keep it up to date

Test Your DR Good: when you create the procedure Better: quarterly Best: as part of daily/weekly provisioning

An untested backup is one which doesn't work.

DR in the Cloud

It's a cloud, right? That means it's redundant, right?

not necessarily for your servers unless you pay for it!

Some new problems Instance failure Resource overcommit Zone failures Admin error at scale

Some new solutions Redundant services RDS, VIP, S3 Rapid server deployment Cheap replicas

otherwise pretty much the same.

backup locations shared instance storage (EBS) fast failover for instance fail long-term storage API (S3) redundant large

Use your rapid deploy! Continuous backup to S3 Deploy scripts + server images Chef/Salt/Puppet/etc. helps here = fast recovery with low running costs

DR Tips Have multiple copies of your plan in multiple locations A SAN is not a DR solution One form of backup is seldom enough

Questions? Josh Berkus www.databasesoup.com www.pgexperts.com Coming up: NYC pgday April 3-4 pgcon May 21-24 Copyright 2013 PostgreSQL Experts Inc. Released under the Creative Commons Share-Alike 3.0 License. All images, logos and trademarks are the property of their respective owners and are used under principles of fair use unless otherwise noted.