Thanks for Live Snapshots, Where's Live Merge?

KVM Forum, 16 October 2014
Adam Litke, Red Hat

Agenda
- Introduction of Live Snapshots and Live Merge
- Managing Live Merge for Reliability and Simplicity
- Future work

Introduction of Live Snapshots and Live Merge

Live snapshots
- Capture disks and memory at a point in time
- Implemented using qcow2 volume chains
- Uses
  - Preview or revert
  - VM live backup
  - Live storage migration

Creating a live snapshot of a disk (diagram slides)
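
The diagrams for this step are not available in the transcript. As a rough illustration of what a live snapshot does to a volume chain, here is a minimal sketch in Python. The `Volume` class and `take_snapshot` helper are hypothetical names for a toy model; they do not reflect the qcow2 on-disk format or any oVirt code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Volume:
    """One layer of a toy volume chain (hypothetical model, not the qcow2 format)."""
    name: str
    parent: Optional["Volume"] = None
    blocks: Dict[int, str] = field(default_factory=dict)  # data written in this layer only

    def read(self, block: int) -> Optional[str]:
        # Fall through to the backing volume when this layer holds no data for the block.
        vol: Optional["Volume"] = self
        while vol is not None:
            if block in vol.blocks:
                return vol.blocks[block]
            vol = vol.parent
        return None


def take_snapshot(active: Volume, new_name: str) -> Volume:
    """Freeze `active` and return a new, empty active layer backed by it."""
    return Volume(name=new_name, parent=active)


base = Volume("base", blocks={0: "a", 1: "b"})
active = take_snapshot(base, "active")  # "base" now holds the point-in-time state
active.blocks[1] = "B"                  # later guest writes land only in the new layer
```

In this model the snapshot itself copies no data: the old layer simply stops receiving writes, which is why the chain grows by one volume per snapshot.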

Deleting a snapshot
- Why?
  - Increase performance
  - May free up some storage space
  - To support symmetric snapshot operations
- Historically, oVirt has only supported deleting snapshots when a VM is powered off
- Live snapshot deletion is called live merge because two or more adjacent volumes are merged together
- Scenarios
  - Merge direction: forward vs. backward
  - Merge source: active layer vs. internal

Backward internal merge (diagram slides)
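
A backward merge commits the data of a volume into its parent and then drops the volume from the chain. The following sketch shows that idea on a toy chain, assuming a simple block-map model; `Layer`, `read`, and `backward_merge` are hypothetical names, not libvirt or vdsm functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Layer:
    name: str
    blocks: Dict[int, str] = field(default_factory=dict)  # data written in this layer only


def read(chain: List[Layer], block: int) -> Optional[str]:
    # The chain is ordered base first, so search from the active layer downwards.
    for layer in reversed(chain):
        if block in layer.blocks:
            return layer.blocks[block]
    return None


def backward_merge(chain: List[Layer], top_name: str) -> List[Layer]:
    """Commit layer `top_name` into its parent and return the shortened chain."""
    idx = [layer.name for layer in chain].index(top_name)
    if idx == 0:
        raise ValueError("the base volume has no parent to merge into")
    chain[idx - 1].blocks.update(chain[idx].blocks)  # data in top is newer, so it wins
    return chain[:idx] + chain[idx + 1:]


chain = [Layer("base", {0: "a", 1: "b"}), Layer("snap", {1: "B"}), Layer("active", {2: "c"})]
merged = backward_merge(chain, "snap")
```

The guest-visible contents are the same before and after the merge; only the number of layers changes.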

Backward active merge (diagram slides)

Forward active merge (diagram slides)
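
A forward merge moves data in the opposite direction: blocks from lower volumes are pulled up into the active layer, which then no longer needs those volumes. This is a minimal sketch under the same toy block-map assumption; `forward_merge` is a hypothetical helper and only approximates what a real block pull does.

```python
from typing import Dict, List, Optional, Tuple

Layer = Tuple[str, Dict[int, str]]  # (volume name, blocks written in that layer)


def read(chain: List[Layer], block: int) -> Optional[str]:
    # The chain is ordered base first; the newest layer holding the block wins.
    for _, blocks in reversed(chain):
        if block in blocks:
            return blocks[block]
    return None


def forward_merge(chain: List[Layer], base_name: str) -> List[Layer]:
    """Pull every layer above `base_name` into the active (last) layer."""
    idx = [name for name, _ in chain].index(base_name)
    if idx == len(chain) - 1:
        raise ValueError("the active layer cannot be its own backing volume")
    active_name, active_blocks = chain[-1]
    for _, blocks in reversed(chain[idx + 1:-1]):  # nearest to the active layer first
        for block, data in blocks.items():
            active_blocks.setdefault(block, data)  # never overwrite newer data
    return chain[:idx + 1] + [(active_name, active_blocks)]


chain = [("base", {0: "a", 1: "b"}), ("snap", {1: "B", 2: "c"}), ("active", {2: "C"})]
merged = forward_merge(chain, "base")
```

Compared with a backward merge, the amount of data copied depends on the size of the lower volume rather than the upper one, which is why the direction can matter for I/O cost.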

Choosing a merge strategy
- Forward merge
  - libvirt virDomainBlockRebase API
  - Must merge to the active layer
- Backward merge
  - libvirt virDomainBlockCommit API
  - May merge any volume into its parent
- Backward merge is the only feasible choice today, but we'd like to have both directions in the future

Libvirt APIs
int virDomainBlockCommit(virDomainPtr dom, const char *disk, const char *base, const char *top, unsigned long bandwidth, unsigned int flags)
- Starts a live merge operation
- Merges top into base
- You can rate-limit the copy operation if desired
- Flags allow you to set special behavior
  - Request an active layer merge
  - Preserve relative backingStore pointers
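
As an illustration of how the flags combine, here is a sketch of a wrapper around the Python binding's `blockCommit` method. The flag values are copied here (on the assumption that they match libvirt's `virDomainBlockCommitFlags`) so the example runs without the libvirt module; real code would use `libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE` and `libvirt.VIR_DOMAIN_BLOCK_COMMIT_RELATIVE`. `start_live_merge` and `RecordingDomain` are hypothetical names.

```python
# Assumed to mirror libvirt's virDomainBlockCommitFlags; verify against your headers.
VIR_DOMAIN_BLOCK_COMMIT_ACTIVE = 1 << 2    # request an active layer merge
VIR_DOMAIN_BLOCK_COMMIT_RELATIVE = 1 << 3  # preserve relative backingStore pointers


def start_live_merge(dom, disk, base, top, active_layer, bandwidth=0):
    """Start merging `top` into `base`; bandwidth 0 means no rate limit."""
    flags = VIR_DOMAIN_BLOCK_COMMIT_RELATIVE
    if active_layer:
        flags |= VIR_DOMAIN_BLOCK_COMMIT_ACTIVE
    dom.blockCommit(disk, base, top, bandwidth, flags)
    return flags


class RecordingDomain:
    """Hypothetical stand-in for libvirt.virDomain that only records the call."""

    def __init__(self):
        self.calls = []

    def blockCommit(self, disk, base, top, bandwidth=0, flags=0):
        self.calls.append((disk, base, top, bandwidth, flags))
        return 0


dom = RecordingDomain()
internal_flags = start_live_merge(dom, "vda", "base.qcow2", "snap.qcow2", active_layer=False)
active_flags = start_live_merge(dom, "vda", "snap.qcow2", "active.qcow2", active_layer=True)
```

With a real connection, `dom` would come from something like `conn.lookupByName(...)`, and errors would surface as `libvirt.libvirtError` exceptions.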

Libvirt APIs
int virDomainGetBlockJobInfo(virDomainPtr dom, const char *disk, virDomainBlockJobInfoPtr info, unsigned int flags)
- Get information about currently running block jobs
  - Job type, bandwidth limit, progress cursor
- When jobs finish, an event is emitted and they will no longer be reported by this API

Libvirt APIs
int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, unsigned int flags)
- Cancel the currently running block job on a disk
- Also used to request a pivot after active layer merge
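
The two calls above are typically used together: poll the job, and for an active layer merge request a pivot once the cursor reaches the end. The sketch below assumes the Python binding's `blockJobInfo` returns a dict with `cur` and `end` keys (empty when no job exists) and that the pivot flag has the value shown; `poll_block_job` and `ScriptedDomain` are hypothetical names.

```python
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 1 << 1  # assumed to mirror the libvirt constant


def poll_block_job(dom, disk, active_layer):
    """Return "gone", "running", or "pivot-requested" for the job on `disk`."""
    info = dom.blockJobInfo(disk, 0)
    if not info:
        return "gone"  # libvirt no longer reports the job
    if active_layer and info["end"] > 0 and info["cur"] == info["end"]:
        dom.blockJobAbort(disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
        return "pivot-requested"
    return "running"


class ScriptedDomain:
    """Hypothetical stand-in for libvirt.virDomain that replays canned job info."""

    def __init__(self, infos):
        self._infos = list(infos)
        self.pivots = 0

    def blockJobInfo(self, disk, flags=0):
        return self._infos.pop(0) if self._infos else {}

    def blockJobAbort(self, disk, flags=0):
        self.pivots += 1
        return 0


dom = ScriptedDomain([{"cur": 50, "end": 100}, {"cur": 100, "end": 100}])
states = [poll_block_job(dom, "vda", active_layer=True) for _ in range(3)]
```

The `end > 0` check is a defensive assumption so that a job that has not yet reported any progress is not mistaken for a finished one.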

Managing Live Merge for Reliability and Simplicity

oVirt Architecture (diagram slide)

Entry points (diagram slides)

Management value-add
- Hide complexity behind a simple UI and API
  - Snapshots involving multiple disks and memory
- Prevent invalid or potentially dangerous operations
  - Revert to a partially merged snapshot
  - Migration during live merge
  - Merge into a shared volume
- Fault tolerance and recovery

High level flow (diagram slide)

1. Begin merge job
- Ask vdsm to start a live merge for a single VM disk
- Assume merge started unless explicit error is returned
- Vdsm stores some job info to local storage
  - Job UUID, top and base volumes, etc.
  - Allows vdsm to gracefully recover state when restarting
- Can be repeated for each disk in the snapshot
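
To make the "job info on local storage" idea concrete, here is a minimal sketch of persisting a job record so that a restarted daemon can find it again. The file name, field names, and JSON layout are all hypothetical; the talk does not describe how vdsm actually stores this data.

```python
import json
import os
import tempfile
import uuid


def load_jobs(path):
    """Return all tracked jobs, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_job(path, job):
    """Add `job` to the tracked set, replacing the file atomically."""
    jobs = load_jobs(path)
    jobs[job["job_id"]] = job
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(jobs, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: a crash leaves either the old or the new file


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "merge_jobs.json")
    job = {"job_id": str(uuid.uuid4()), "disk": "vda", "top": "snap", "base": "base"}
    save_job(path, job)
    recovered = load_jobs(path)  # what a restarted daemon would see
```

Writing to a temporary file and renaming it is one common way to avoid a half-written record if the process dies mid-save.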

2. Wait for merge job
- Engine periodically polls vdsm to get active jobs
  - Just wait for the job to disappear from the results
- Meanwhile, vdsm polls libvirt for active jobs
  - When libvirt stops reporting a job, vdsm must clean up
  - After cleanup, vdsm stops reporting the job to engine
- This is a synchronization point with the following rules
  - Vdsm must report the job until it's completely resolved
    - Even if it restarts
  - If engine does not receive any job info it must try again
  - When the job stops being reported, advance to the next step
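
The engine-side rules above can be sketched as a small polling loop. The important detail is that a failed poll (no job info at all) is treated differently from a successful poll that no longer lists the job. `wait_for_job`, `fetch_jobs`, and the use of `ConnectionError` to model a missing reply are assumptions made for illustration; the engine itself is not written in Python.

```python
def wait_for_job(fetch_jobs, job_id, max_polls, pause=lambda: None):
    """Return True once a successful poll no longer lists `job_id`."""
    for _ in range(max_polls):
        try:
            jobs = fetch_jobs()
        except ConnectionError:
            pause()
            continue  # no job info received: state is unknown, so try again
        if job_id not in jobs:
            return True  # the host stopped reporting the job: advance to the next step
        pause()
    return False


# Scripted replies: one failed poll (host restarting), two polls with the job, then none.
replies = iter([ConnectionError("restarting"), {"job-1": {}}, {"job-1": {}}, {}])


def fetch_jobs():
    reply = next(replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


done = wait_for_job(fetch_jobs, "job-1", max_polls=10)
```

In a real deployment `pause` would sleep for the polling interval; it is a no-op here so the example runs instantly.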

2. Wait for merge job (cleanup)
- Step 1: Manual pivot (active layer merge only)
  - Indicator: libvirt job info cursor is at 100%
  - Vdsm asks libvirt to pivot to the new active volume
- Step 2: Volume chain synchronization (all merges)
  - Indicator: libvirt job no longer reported
  - Vdsm gets new volume chain from libvirt XML
  - Vdsm syncs volume chain metadata on storage
  - Vdsm syncs in-memory volume chain info for VM
- Step 3: Stop reporting live merge job to engine
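
For the chain synchronization in Step 2, the volume chain can be read from the nested `<backingStore>` elements of a disk in the libvirt domain XML. The sketch below parses a hand-written sample; the paths are illustrative and `volume_chain` is a hypothetical helper, not vdsm code.

```python
import xml.etree.ElementTree as ET

# Illustrative disk element in the shape libvirt reports for a two-volume chain.
SAMPLE_DISK_XML = """
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/images/active.qcow2"/>
  <backingStore type="file" index="1">
    <format type="qcow2"/>
    <source file="/images/base.qcow2"/>
    <backingStore/>
  </backingStore>
  <target dev="vda" bus="virtio"/>
</disk>
"""


def volume_chain(disk_xml):
    """Return the volume paths for one disk, active layer first."""
    node = ET.fromstring(disk_xml.strip())
    chain = []
    while node is not None:
        source = node.find("source")
        if source is None:
            break  # an empty <backingStore/> terminates the chain
        # File-based volumes use the "file" attribute, block devices use "dev".
        chain.append(source.get("file") or source.get("dev"))
        node = node.find("backingStore")
    return chain


chain = volume_chain(SAMPLE_DISK_XML)
```

Comparing this list with the chain recorded in storage metadata shows which volume was removed by the merge.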

3. Resolve merge status
- Engine requests live VM configuration from vdsm
  - Vdsm has updated volume chain after merge finished
  - Merge succeeded if 'top' volume is no longer in chain
- If VM is not running (recovery flow)
  - Are we sure the VM is not running (fence)?
  - Use SPM host to walk volume chain with qemu-img
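
The success rule is simple enough to state as code. This is a sketch of the decision only, with hypothetical names; how the chain is obtained (from vdsm or from qemu-img on the SPM host) is outside its scope.

```python
def resolve_merge(job_reported, chain, top):
    """Classify a merge from whether the job is still reported and the current chain."""
    if job_reported:
        return "running"
    # Once the job is gone, the chain alone tells us the outcome.
    return "failed" if top in chain else "succeeded"


after_success = resolve_merge(False, ["base", "active"], top="snap")
after_failure = resolve_merge(False, ["base", "snap", "active"], top="snap")
```

Deriving the result from the chain rather than from a stored status means the answer is still available after any component restarts.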

4. Delete merged volume
- Merge was successful and 'top' should be deleted
- Use SPM host to remove volume
- Sanity checks prevent:
  - Deleting a volume which still has children
  - Starting a VM using an old/stale leaf
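
The first sanity check can be sketched as a lookup over parent pointers. The `parent_of` mapping (child volume to parent volume) and the function name are hypothetical; the actual checks live in the storage layer and are not described in the talk.

```python
def check_can_delete(parent_of, volume):
    """Raise ValueError if any volume still uses `volume` as its parent."""
    children = sorted(child for child, parent in parent_of.items() if parent == volume)
    if children:
        raise ValueError(f"{volume} still has children: {', '.join(children)}")


# After a successful merge of "snap" into "base", the active layer points at "base".
merged = {"base": None, "snap": "base", "active": "base"}
check_can_delete(merged, "snap")  # passes: nothing depends on "snap" any more

# If the merge did not complete, "active" still depends on "snap".
unmerged = {"base": None, "snap": "base", "active": "snap"}
try:
    check_can_delete(unmerged, "snap")
    refused = False
except ValueError:
    refused = True
```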

Polling vs. event handling
- Events would be passed from qemu -> libvirt -> vdsm -> ovirt-engine
  - If any component goes down we might miss an event
- To ensure recoverability, it must be possible to query the current state by asking two questions of the system
  - Is the job active?
  - If not, did it succeed or fail?
- Event handling could be added in the future as an optimization to eliminate polling delays for the common case.

Scenario: qemu crash
- Is merge running? No.
- Use SPM host to run a recovery command
  - Run qemu-img to get current volume chain
  - Discard mirrored leaf if present
  - Correct vdsm metadata and return new chain
- Mirrored leaves are flagged in vdsm metadata before a pivot is requested
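
A sketch of the recovery command, assuming the chain is read with `qemu-img info --backing-chain --output=json`, which prints a JSON array starting at the given image and ending at the base. The sample output is hand-written and trimmed to the fields used here; `recover_chain` and the `flagged_leaves` set are hypothetical stand-ins for the vdsm metadata mentioned above.

```python
import json

# Illustrative, trimmed output for a leaf volume backed by one base volume.
SAMPLE_QEMU_IMG_JSON = """
[
  {"filename": "leaf.qcow2", "format": "qcow2", "backing-filename": "base.qcow2"},
  {"filename": "base.qcow2", "format": "qcow2"}
]
"""


def recover_chain(qemu_img_json, flagged_leaves):
    """Return the corrected chain (leaf first) after a qemu crash."""
    chain = [image["filename"] for image in json.loads(qemu_img_json)]
    if chain and chain[0] in flagged_leaves:
        # The leaf was flagged before a pivot was requested, so its data is
        # assumed to be fully mirrored into its parent and it can be discarded.
        chain = chain[1:]
    return chain


pivot_pending = recover_chain(SAMPLE_QEMU_IMG_JSON, {"leaf.qcow2"})
no_pivot = recover_chain(SAMPLE_QEMU_IMG_JSON, set())
```

Flagging the leaf before requesting the pivot is what makes this decision safe to take after the fact, without knowing whether qemu completed the pivot.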

Scenario: manager and vdsm restart
- VDSM restart
  - Manager has a lapse in merge job info during restart and will continue to poll until reporting resumes
  - Upon restart vdsm recovers the list of tracked jobs and handling will resume the next time libvirt is polled
- Manager restart
  - Live merge is a sequence of individual commands
  - Current progress is saved to the DB and the command state will be restored upon restart

Scenario: manager / virt host destroyed
- Manager
  - In this case the DB is lost and it will be necessary to rebuild the oVirt environment
  - Data domain import allows you to recover VMs and templates from the previous install
  - We must check all VMs on import as if qemu crashed
- Virtualization host
  - Admin must confirm that the host has been rebooted
  - Use qemu crash logic to synchronize and then the VM can be restarted on a different host

Future Work

Future work
- Automatic removal of temporary snapshots after live storage migration and VM backup
- Data domain import hook to sync VM volume chains
- Use forward merge if it would reduce I/O
- Improve responsiveness by using libvirt events and emitting vdsm events
- Delete multiple adjacent snapshots in one operation
- Rolling snapshots

THANK YOU!
http://www.ovirt.org/features/live_merge
devel@ovirt.org
#ovirt irc.oftc.net