Dirty Memory Tracking for Performant Checkpointing Solutions


Lei Cao, lei.cao@stratus.com
August 25, 2016

Software Fault Tolerance

Checkpointing is a technique for creating a fault-tolerant virtual machine by connecting a pair of servers and periodically sending VM state from a primary server to a standby server. Checkpointing supplies a greater level of availability than typical HA or cluster-style solutions, in that failures cause no downtime and no data-transaction loss. This presentation gives an overview of checkpointing and then describes a set of KVM changes to improve checkpointing performance.

Agenda
- Fault tolerance via checkpointing
- Motivation
- Design goals
- Proposed KVM changes
- Upstream status

Checkpointing Overview
- A protected guest (OS and applications) runs inside a virtual machine
- The hypervisor contains support to:
  - Pause the VM
  - Capture static and incremental I/O state
  - Capture incremental memory state (pages dirtied since the last checkpoint)
  - Resume the VM
- Together these operations are called a checkpoint; the stretch of guest execution between checkpoints is an epoch
- The captured state is sent to another physical (standby) server whose hypervisor runs a paused VM with the same configuration
- On a failure of the active server/VM, the standby has sufficient state to resume guest operation from the last checkpoint
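As a rough illustration of this loop, here is a minimal C sketch of one epoch followed by a checkpoint. Every type and helper name here (run_guest, pause_vm, capture_io_state, and so on) is invented for illustration and does not correspond to a real KVM or QEMU API.

```c
#include <stddef.h>

/* Hypothetical types and helpers -- stand-ins for the steps above. */
struct vm;
struct standby_conn;
struct checkpoint {
    void  *io_state;     /* static + incremental I/O state           */
    void  *dirty_pages;  /* only the pages dirtied during this epoch */
    size_t npages;
};

void run_guest(struct vm *vm, unsigned epoch_us);
void pause_vm(struct vm *vm);
void resume_vm(struct vm *vm);
void capture_io_state(struct vm *vm, struct checkpoint *cp);
void capture_dirty_pages(struct vm *vm, struct checkpoint *cp);
void send_to_standby(struct standby_conn *standby, struct checkpoint *cp);

/* One epoch plus one checkpoint, as in the slide above. */
static void run_one_epoch(struct vm *vm, struct standby_conn *standby)
{
    struct checkpoint cp;

    run_guest(vm, 10000);          /* guest runs for one epoch        */

    pause_vm(vm);                  /* checkpoint begins: guest paused */
    capture_io_state(vm, &cp);
    capture_dirty_pages(vm, &cp);
    resume_vm(vm);                 /* guest resumes; the transmission
                                      below overlaps the next epoch   */
    send_to_standby(standby, &cp); /* standby applies and ACKs        */
}
```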

[Timeline diagram: the active host alternates guest-run epochs (N-1, N, N+1) with checkpoints N-1 and N; the standby host receives checkpoints N-1 and N, and after a failover resumes guest execution at epoch N+1 from the last checkpoint.]

[Timeline diagram: while the active host runs epoch N+1, checkpoint N is transmitted to the standby, which applies it and ACKs; the I/O buffered during epoch N is released only after checkpoint N is acknowledged, and likewise for checkpoint N+1.]

Known Open-Source Checkpointing Solutions: Active
- COLO
  - A checkpointing enhancement; needs an underlying checkpointing mechanism
  - Originally released for Xen in 2012 by Intel/Huawei, leveraging Remus
  - KVM upstream effort started in 2014, leveraging the MicroCheckpointing project
  - Patch submission started in 2015; the project is very active, with widening participation

Known Open-Source Checkpointing Solutions: Inactive or Less Active
- Remus
  - Created in 2007 at the University of British Columbia (and Citrix)
  - Accepted upstream in Xen 4.0 in 2009; no KVM activity
- Kemari
  - Created in 2008 at NTT Cyber Space Labs for Xen
  - KVM patches created in 2010 but never upstreamed
- MicroCheckpointing
  - Created in 2013 at the IBM Watson Research Center
  - Upstreaming activity now dormant, possibly superseded by COLO

Known Proprietary Checkpointing Solutions
- VMware FT
  - Released in 2015 (preceded by a non-checkpointing single-core version)
- Stratus everRun
  - Built on the former Marathon MX product (released in 2010, preceded by a non-checkpointing single-core version); portions GPL (e.g. the KVM modifications) and portions proprietary
- Avaya Machine Preserving High Availability
  - An option for Aura Application Enablement Services (2012); available only for the Avaya environment, not general purpose

Motivation for Proposed KVM Changes
- Checkpointing performs anywhere from >90% of a non-checkpointing VM for CPU-intensive loads down to 25% for high-bandwidth, low-latency network-intensive loads
- Realistic commercial workloads typically perform at around 50% of a non-checkpointing VM
- The majority of a checkpoint is spent capturing dirty pages

Current Memory Tracking Mechanism
- A VM-sized bitmap is used to track dirty memory
- The number of dirty pages is bounded in a checkpointing system; for commercial workloads:
  - Checkpoints per second: 150 to 1500
  - Dirty pages per checkpoint: 300 to 3000
  - Compare to roughly 2300k total pages for an 8 GB VM
- Traversing a large, sparsely populated bitmap every checkpoint is time-consuming
- Copying the bitmap to user space every checkpoint is time-consuming
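To see why the bitmap is the bottleneck, compare the two collection strategies below; this is an illustration of the cost argument, not KVM code. With the slide's numbers (roughly 2300k pages but only 300 to 3000 dirty), the bitmap walk touches on the order of 36,000 64-bit words per checkpoint, while a compact list touches only the dirty entries.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitmap approach: cost is O(total pages), even when almost no bit
 * is set. For ~2300k pages that is ~36,000 words per checkpoint. */
static size_t collect_from_bitmap(const uint64_t *bitmap, size_t nr_pages,
                                  uint64_t *out)
{
    size_t n = 0;
    for (size_t word = 0; word < (nr_pages + 63) / 64; word++) {
        uint64_t bits = bitmap[word];
        while (bits) {
            unsigned bit = (unsigned)__builtin_ctzll(bits);
            out[n++] = word * 64 + bit;   /* dirty GFN */
            bits &= bits - 1;             /* clear lowest set bit */
        }
    }
    return n;
}

/* Compact-list approach: cost is O(dirty pages), typically 300-3000. */
static size_t collect_from_list(const uint64_t *list, size_t count,
                                uint64_t *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = list[i];
    return count;
}
```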

Design Goals
- Easily portable to various kernel versions
  - CentOS 6.4, 6.5, 6.6, 6.7, 7.2; Ubuntu 14.04; SLES 12
- No change to existing KVM functionality
  - New ioctls
  - Co-exists with the current dirty-memory-logging facilities
- Usable by live migration as well as checkpointing
- No dynamic memory allocation or freeing during the checkpointing cycle
  - Allocation and freeing happen only when the VM enters/exits checkpointing mode

Proposed Changes (1 of 3)
- Compact lists of dirty GFNs
  - One list per online vcpu: avoids locking when vcpus dirty memory
  - One global list: pages dirtied by KVM itself, plus overflow from the per-vcpu lists
- Avoid duplicates via a bitmap (see the sketch below)
  - Duplicates are undesirable because the lists are fixed-size
  - Duplicate sources: guest time updates by KVM, PV EOI set/clear by KVM
  - Can reuse the current dirty bitmap
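Below is a hedged sketch of what such structures might look like. All names and sizes here are invented for illustration and are not taken from the actual patch series.

```c
#include <stdint.h>

#define MT_MAX_DIRTY_GFNS 4096        /* fixed size: no allocation while
                                         checkpointing (see design goals) */

struct dirty_gfn_list {
    uint32_t count;
    uint64_t gfns[MT_MAX_DIRTY_GFNS];
};

struct mt_state {
    struct dirty_gfn_list *vcpu_lists;   /* one per online vcpu; only that
                                            vcpu appends, so no locking   */
    struct dirty_gfn_list  global_list;  /* pages dirtied by KVM itself,
                                            plus per-vcpu overflow        */
    uint64_t              *dedup_bitmap; /* could reuse the existing
                                            dirty bitmap                  */
};

/* Record a dirty GFN once per epoch; duplicates (e.g. the guest-time
 * page KVM rewrites repeatedly) would waste the fixed-size lists. */
static void mt_mark_dirty(struct mt_state *mt, struct dirty_gfn_list *list,
                          uint64_t gfn)
{
    uint64_t mask = 1ull << (gfn % 64);

    if (mt->dedup_bitmap[gfn / 64] & mask)
        return;                          /* already recorded this epoch */
    mt->dedup_bitmap[gfn / 64] |= mask;

    if (list->count < MT_MAX_DIRTY_GFNS)
        list->gfns[list->count++] = gfn;
    else if (mt->global_list.count < MT_MAX_DIRTY_GFNS)
        /* overflow goes to the global list (locking elided here) */
        mt->global_list.gfns[mt->global_list.count++] = gfn;
    /* else: lists full -- the forced-exit threshold on the next slide
       should have fired before this point */
}
```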

Proposed Changes (2 of 3)
- Dirty-log-full forced VM exit
  - The number of dirty pages per epoch is bounded by the limited buffering; exceeding the buffer size results in an expensive resynchronization
  - Force a VM exit to user space when the number of dirty pages reaches a threshold (see the sketch below)
  - The threshold is calculated by user space and passed to KVM during memory-tracking initialization
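A sketch of the threshold check, with invented names; in the real patches the exit plumbing would go through KVM's request machinery, which is elided here.

```c
#include <stdbool.h>
#include <stdint.h>

struct mt_counters {
    uint64_t dirty_count;      /* GFNs recorded so far this epoch   */
    uint64_t max_dirty_pages;  /* threshold chosen by user space,
                                  passed in at memory-tracking init */
};

/* Called after each newly recorded dirty GFN. Returning true tells
 * the caller to force a VM exit to user space, so a checkpoint can
 * be taken before the fixed-size lists overflow and force a resync. */
static bool mt_dirty_page_recorded(struct mt_counters *c)
{
    return ++c->dirty_count >= c->max_dirty_pages;
}
```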

Proposed Changes (3 of 3)
- Initialization/cleanup (KVM_INIT_MT)
  - User space indicates initialization or cleanup
  - During initialization, user space specifies the maximum number of dirty pages per checkpoint cycle
- Activate/deactivate (KVM_ENABLE_MT)
  - Allocates/frees the dirty lists
  - Enables/disables the dirty traps

Proposed Changes (3 of 3), continued
- Prepare for a new checkpoint cycle (KVM_PREPARE_MT_CP)
  - Resets the indexes/counters for all dirty lists
- Fetch dirty list (KVM_MT_SUBLIST_FETCH)
  - Supports fetching from multiple user-space threads
- Rearm the dirty traps (KVM_RESET_DIRTY_PAGES)

Execution Flow
- Enter checkpointing mode: Init -> Enable
- Checkpoint cycles (repeated): Prepare -> Fetch -> Reset
- Exit checkpointing mode: Disable -> Cleanup
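A hypothetical user-space driver for this flow is sketched below. Only the ioctl names come from the slides; the ioctl numbers, argument structures, opcodes, and helper functions are all invented for illustration and the real ABI may differ.

```c
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Invented ABI -- these numbers and structs are not in any real header. */
struct kvm_mt_init { int op; unsigned long max_dirty_pages; };
enum { MT_OP_INIT = 1, MT_OP_CLEANUP = 2 };
#define KVM_INIT_MT            _IOW('k', 0xe0, struct kvm_mt_init)
#define KVM_ENABLE_MT          _IOW('k', 0xe1, int)
#define KVM_PREPARE_MT_CP      _IO('k', 0xe2)
#define KVM_RESET_DIRTY_PAGES  _IO('k', 0xe3)

/* Hypothetical helpers supplied by the checkpointing user space. */
int  still_checkpointing(void);
void run_guest_epoch(void);
void fetch_dirty_sublists(int vm_fd);   /* wraps KVM_MT_SUBLIST_FETCH */

void checkpointing_session(int vm_fd)
{
    struct kvm_mt_init init = {
        .op = MT_OP_INIT,
        .max_dirty_pages = 3000,        /* per-cycle bound from the slides */
    };
    int enable = 1, disable = 0;

    /* Enter checkpointing mode: Init -> Enable */
    ioctl(vm_fd, KVM_INIT_MT, &init);           /* allocate lists once    */
    ioctl(vm_fd, KVM_ENABLE_MT, &enable);       /* arm the dirty traps    */

    while (still_checkpointing()) {
        /* Checkpoint cycle: Prepare -> Fetch -> Reset */
        ioctl(vm_fd, KVM_PREPARE_MT_CP, 0);     /* reset indexes/counters */
        run_guest_epoch();                      /* ends on timer or on the
                                                   dirty-log-full exit    */
        fetch_dirty_sublists(vm_fd);            /* possibly multi-threaded */
        ioctl(vm_fd, KVM_RESET_DIRTY_PAGES, 0); /* rearm the dirty traps  */
    }

    /* Exit checkpointing mode: Disable -> Cleanup */
    ioctl(vm_fd, KVM_ENABLE_MT, &disable);
    init.op = MT_OP_CLEANUP;
    ioctl(vm_fd, KVM_INIT_MT, &init);           /* free the lists         */
}
```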

How About Live Migration?
- The proposed changes do not break live migration
- Checkpointing mode can be used for live migration
  - Needs user-space support
  - Improves the predictability of live migrations of memory-write-intensive workloads
- Auto-converge tries to address this problem via CPU throttling
  - CPU throttling may not be effective for workloads where memory write speed does not depend on CPU execution speed

Upstream Status
- Version 1 submitted to the KVM mailing list
  - [PATCH 0/6] KVM: Dirty memory tracking for performant checkpointing and improved live migration
  - http://www.spinics.net/lists/kvm/msg131356.html
- Version 2 planned for September submission