Status Update About COLO (COLO: COarse-grained LOck-stepping Virtual Machines for Non-stop Service)


Status Update About COLO
(COLO: COarse-grained LOck-stepping Virtual Machines for Non-stop Service)
eddie.dong@intel.com
arei.gonglei@huawei.com
yanghy@cn.fujitsu.com

Agenda
- Background
- Introduction of COLO
- Current Status
- Performance
- Summary

What is COLO?
COLO (COarse-grain LOck-stepping) is an application-agnostic solution for non-stop service in the cloud.
Typical non-stop service:
- Requires expensive hardware for redundancy
- Extensive software customization
COLO:
- Cheap
- Application-agnostic
- Non-stop virtual machine

What Happened?
Goal: make COLO upstream into KVM/Xen
- 2013: Academic paper published at the ACM Symposium on Cloud Computing (SOCC 13), originated by Intel; COLO introduced at KVM Forum 2013, with an active response from the community
- 2014: Collaboration between Huawei and Intel, announced in FusionSphere 3.0
- 2015: Collaboration between Huawei, Intel, and Fujitsu, focused on open-source development in the KVM community; COLO introduced into KVM/Xen, with the proposal already accepted and an active response

Agenda
- Background
- Introduction of COLO
- Current Status
- Performance
- Summary

Non-stop Service Solutions
Hardware solution: robust hardware components
- Very expensive
- Still unpredictable, e.g. bit flips from alpha particles in cosmic rays
Software solution: replication
- Construct backup replicas at run time
- Failover: backups take over when the primary fails

Different Layers of Replication
Application layer:
- Extensive software customization
- Impractical for legacy software
OS layer:
- Large complexity, not commoditized
Virtual machine layer:
- Application- and OS-agnostic

VM Replication
[Diagram: the primary node runs the PVM (APPs/OS) on a VMM over hardware; the secondary node runs the SVM likewise. VM replication carries network and storage state from primary to secondary; on a hardware failure, the SVM takes over via failover.]

Existing VM Replication Approaches
Lock-stepping: replicating per instruction
- Deterministic instructions execute in parallel
- Lock and step for nondeterministic instructions
Checkpointing: replicating per epoch
- Output is buffered within an epoch
- External observers see exact machine-state matching
Both replicate the exact machine state from the PVM to the SVM.
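The per-epoch output-buffering rule can be sketched as a toy model. This is purely illustrative, not QEMU/Xen code: the class and method names (`EpochBufferedReplicator`, `emit`, `commit_checkpoint`) are invented for this sketch.

```python
from collections import deque

class EpochBufferedReplicator:
    """Toy model of periodic-checkpoint replication (Remus-style):
    outputs produced inside an epoch are buffered and released to the
    external observer only after the epoch's checkpoint is committed."""

    def __init__(self):
        self.buffer = deque()   # outputs held back in the current epoch
        self.released = []      # outputs the external observer has seen

    def emit(self, packet):
        # Within an epoch, output is buffered -> extra client latency.
        self.buffer.append(packet)

    def commit_checkpoint(self):
        # Once the backup holds the epoch's state, buffered outputs are
        # safe to release: the backup can reproduce what the client saw.
        while self.buffer:
            self.released.append(self.buffer.popleft())

rep = EpochBufferedReplicator()
rep.emit("resp-1")
rep.emit("resp-2")
assert rep.released == []   # nothing visible before the checkpoint
rep.commit_checkpoint()
assert rep.released == ["resp-1", "resp-2"]
```

The model makes the latency cost concrete: no response escapes until the checkpoint commits, which is exactly the overhead COLO's on-demand approach avoids.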

Problems
Lock-stepping:
- Excessive replication overhead for MP guests, because memory access in an MP guest is nondeterministic
Periodic checkpointing:
- Extra network latency
- Excessive VM checkpoint overhead

Why Exact Machine State Matching?
- A valid / invalid replica (SVM) is one that is able / unable to take over the service while respecting the application semantics
- Exact machine-state matching is an overly strong condition
[Figure: machine-state divergence over time t, with a hard boundary separating valid replicas from invalid replicas.]

COarse-grain LOck-stepping (COLO)
- Replicates with less machine-state matching: the PVM and SVM execute in parallel, as if the machine state were within the hard boundary
- A non-stop service system keeps its machine-state contrail fully within the hard boundary
[Figure: machine-state divergence over time t; the machine-state contrail stays inside the hard boundary between valid and invalid replicas.]

Challenge 1
How to guarantee that the machine-state contrail stays fully within the hard boundary?
- The initial SVM state may be made identical to the PVM's
- Machine-state matching may be enforced at any time
[Figure: machine-state divergence over time t; state enforcement pulls the contrail back into the valid-replica region.]

Challenge 2
How to identify the boundary?
- It is impractical to find the exact hard boundary
- Any boundary within the hard boundary works
[Figure: machine-state divergence over time t, showing a workable boundary inside the hard boundary.]
Can we find a practical workable boundary?

An Example Solution
A practical instance of COLO: identify the workable boundary based on the output responses
- The initial SVM state is identical to the PVM's
- The SVM and PVM are fed the same inputs
- The SVM is a valid replica if and only if it generates responses identical to the PVM's
- Checkpoint when the SVM is no longer a valid replica
Further exploration is required to identify more workable boundaries.
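The output-based boundary above can be sketched as a comparison loop: release the PVM's responses immediately and checkpoint only on divergence. This is a toy model under the slide's assumptions; `colo_run` and its arguments are invented names, not the real COLO implementation.

```python
def colo_run(pvm_responses, svm_responses):
    """Toy model of COLO's workable boundary: PVM responses are
    released without buffering; a checkpoint is forced only when the
    SVM's response diverges from the PVM's (SVM no longer a valid
    replica). Returns (released responses, checkpoint count)."""
    checkpoints = 0
    released = []
    for p, s in zip(pvm_responses, svm_responses):
        released.append(p)   # PVM output goes out immediately
        if p != s:           # responses diverge -> invalid replica
            checkpoints += 1 # on-demand PVM -> SVM state sync
    return released, checkpoints

# Identical response streams: no checkpoint is ever needed.
_, n = colo_run(["a", "b", "c"], ["a", "b", "c"])
assert n == 0
# One divergent response triggers exactly one on-demand checkpoint.
_, n = colo_run(["a", "b", "c"], ["a", "X", "c"])
assert n == 1
```

The checkpoint count falls to zero when the two VMs happen to produce identical outputs, which is why response similarity (next slides) directly determines performance.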

Why It Works
Response model for a client/server (C/S) system: the response is modeled as a function of the request series and the execution results of nondeterministic instructions (the model's symbols are lost in this transcription).
- Each response packet produced by the model is a semantic response
- Failover at the k-th packet succeeds under a condition on C, where C is the packet series the client received (the condition's formula is likewise lost)
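Since the slide's formula was lost in transcription, here is a hedged reading of the failover condition as a prefix check: failover preserves client/server semantics if everything the client has already received matches what the SVM would have produced. The function and argument names (`failover_ok`, `client_received`, `svm_responses`) are invented for illustration, not the paper's exact equation.

```python
def failover_ok(client_received, svm_responses):
    """Illustrative failover condition: a failover to the SVM is
    semantically safe if the packet series C the client has received
    is a prefix of the SVM's own response series, i.e. the SVM can
    continue the conversation without contradicting what the client
    already saw."""
    return svm_responses[:len(client_received)] == client_received

assert failover_ok(["r1", "r2"], ["r1", "r2", "r3"])      # C is a prefix
assert not failover_ok(["r1", "rX"], ["r1", "r2", "r3"])  # semantics break
```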

Why Better
Compared with periodic VM checkpointing:
- No buffering-introduced latency
- Lower checkpoint frequency: on demand vs. periodic
Compared with lock-stepping:
- Eliminates the excessive overhead of nondeterministic instruction execution caused by MP-guest memory access

Architecture of COLO
[Diagram: a primary node and a secondary node, each running QEMU (with failover and VM-checkpoint support), a COLO disk manager, a proxy module, and KVM in the kernel. The nodes exchange heartbeats over an internal network, handle disk I/O and net I/O through the COLO disk manager and proxy module, and connect to storage and the external network.]
Pnode: primary node; PVM: primary VM; Snode: secondary node; SVM: secondary VM

Implementations
Targeting fail-stop failures:
- Most hardware failures are self-corrected in modern servers
- Unrecoverable failures are fail-stop failures
Based on the Xen VM checkpointing solution (Remus):
- Extends Remus's passive checkpointing to support active checkpointing

Performance Consideration
- The checkpoint frequency is critical; it depends heavily on the output similarity (response similarity)
- The key focus is TCP packets
- Performance might even be worse than periodic checkpointing if VM checkpoints happen too frequently
Response similarity determines the checkpoint frequency.

Improving Response Similarity
Minor modifications to the guest TCP/IP stack:
- Coarse-grained timestamps
- Per-connection comparison
- Highly deterministic ACK mechanism
- Deterministic segmentation with the Nagle algorithm
- Coarse-grained notification of window size
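The effect of coarsening nondeterministic header fields can be sketched as a normalization step applied before comparing PVM and SVM packets. This is a toy model: the field names (`tsval`, `window`) echo TCP headers, but the granularities and the `normalize` function are invented for illustration, not COLO's actual guest patches.

```python
def normalize(pkt, ts_granularity=8, win_granularity=4096):
    """Toy normalization of a response 'packet' before PVM/SVM
    comparison: rounding the timestamp and advertised window down to
    coarse buckets makes independently generated responses compare
    equal more often, lowering the checkpoint frequency."""
    norm = dict(pkt)
    # Coarse-grained timestamp: round down to a granularity bucket.
    norm["tsval"] = pkt["tsval"] - pkt["tsval"] % ts_granularity
    # Coarse-grained window-size notification.
    norm["window"] = pkt["window"] - pkt["window"] % win_granularity
    return norm

pvm = {"payload": b"hello", "tsval": 1001, "window": 14600}
svm = {"payload": b"hello", "tsval": 1006, "window": 14200}
assert pvm != svm                        # raw packets differ -> checkpoint
assert normalize(pvm) == normalize(svm)  # coarsened packets match
```

The design trade-off is visible here: coarser buckets raise similarity (fewer checkpoints) but loosen what the external observer is guaranteed to see matched.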

Failover Mode

Device State Lock-Stepping
Local storage:
- Viewed as external interaction: write operations are treated as responses to an external observer, which leads to increased checkpointing frequency
- Viewed as internal interaction: the write operation result is part of the internal machine state, so writes need not lead to an immediate VM checkpoint, but storage write operations must be forwarded and compared
Remote storage:
- Shared access: rely on the remote storage system
- Exclusive access: same policy as local storage
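The "internal interaction" view above can be sketched as forward-and-compare on writes. This is a toy model; `ColoDisk` and its methods are invented names, not the COLO disk manager's real interface.

```python
class ColoDisk:
    """Toy model of COLO disk handling under the internal-interaction
    view: the PVM's writes are forwarded to the secondary and compared
    against the SVM's own writes, instead of forcing a checkpoint per
    write."""

    def __init__(self):
        self.forwarded = {}   # sector -> data written by the PVM
        self.mismatches = 0   # divergent writes flagged for checkpoint

    def pvm_write(self, sector, data):
        # Forward the primary's write; no immediate checkpoint needed.
        self.forwarded[sector] = data

    def svm_write(self, sector, data):
        # Compare the secondary's write with the forwarded primary
        # write; a mismatch means the disks would diverge.
        if self.forwarded.get(sector) != data:
            self.mismatches += 1

disk = ColoDisk()
disk.pvm_write(7, b"journal")
disk.svm_write(7, b"journal")
assert disk.mismatches == 0   # identical writes: replicas consistent
disk.svm_write(8, b"extra")
assert disk.mismatches == 1   # divergent write -> checkpoint needed
```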

Agenda
- Background
- Introduction of COLO
- Current Status
- Performance
- Summary

Current Status
Academic paper published at the ACM Symposium on Cloud Computing (SOCC 13):
- COLO: COarse-grained LOck-stepping Virtual Machines for Non-stop Service
- http://www.socc2013.org/home/program
- Refer to the paper for technical details
Industry announcement:
- Huawei FusionSphere uses COLO: http://e.huawei.com/en/news/global/2013/hw_308817
Wiki pages:
- COLO on Xen: http://wiki.xen.org/wiki/colo_-_coarse_grain_lock_stepping
- COLO on QEMU/KVM: http://wiki.qemu.org/features/colo

Upstream Status
- COLO-Frame: patch series v8 posted on the QEMU mailing list
  [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping (COLO) Virtual Machines for Non-stop Service (FT)
  http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg05713.html
- Block replication: patch series v8 posted on the QEMU mailing list
  [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints
  http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg01585.html
- COLO-proxy in QEMU: proof of concept posted on the QEMU mailing list
  http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
- The netfilter for QEMU, which could be useful for the COLO proxy, posted (v6)
  http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00883.html
- Patches for Xen sent to the mailing list (v8)
  [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg02911.html

Agenda
- Background
- Introduction of COLO
- Current Status
- Performance
- Summary

Configurations & Benchmarking
Hardware:
- 2.7 GHz 8-core Xeon processor, 128 GB RAM
- 1 Gbps external connection, 10 Gbps internal connection
Software:
- Guest: RHEL5U5 with 2 GB memory, PV disk/NIC
- Dom0: RHEL6U1 (kernel updated to 3.2)
- Hypervisor: Xen 4.1, with a 10 MB log buffer for disk write operations
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Impact of TCP/IP Modification
- Concurrent TCP connections in COLO, native TCP/IP stack, in a WAN: total bandwidth 80 Mbps, stddev 4.4
- Concurrent TCP connections in COLO, modified TCP/IP stack, in a WAN: total bandwidth 80 Mbps, stddev 3.1
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Source: Intel

Web Server Performance - Web Bench
[Chart: throughput (Mbps) vs. number of threads (1-256) for Native, Remus-20, Remus-40, and COLO.]

Web Server Performance - Web Bench (MP)
[Chart: normalized throughput (COLO/Native) vs. number of threads (1-256) for 1-CPU, 2-CPU, and 4-CPU guests.]

PostgreSQL Performance - Pgbench
[Chart: Pgbench throughput results.]

PostgreSQL Performance - Pgbench (MP)
[Chart: normalized throughput (COLO/Native) vs. number of request clients (1-256) for 1-CPU, 2-CPU, and 4-CPU guests.]

Agenda
- Background
- Introduction of COLO
- Current Status
- Performance
- Summary

Summary
- COLO can achieve native performance for CPU-intensive workloads
- COLO is MP-neutral, and outperforms Remus by 69% in Web Bench and 46% in Pgbench, respectively
Next steps:
- Fix based on review comments
- Optimize performance
- Anything else?

Backups

Machine State Evolvement in COLO
[Figure: machine-state divergence over time t, contrasting the evolvement without matching enforcement, which crosses into the invalid-replica region, with the evolvement in COLO, where matching enforcement at the boundary keeps the replica valid.]

Sysbench & Kernel Build
[Chart: Sysbench and kernel-build performance results.]

Kernel Build vs. Log Buffer Size
[Chart: kernel-build performance as a function of log buffer size.]

Output Similarity of Web Server - Web Bench
[Chart: output similarity measured with Web Bench.]

Output Similarity of PostgreSQL - Pgbench
[Chart: output similarity measured with Pgbench.]

VM Checkpoint Cost - Web Bench
[Chart: VM checkpoint cost under Web Bench.]

VM Checkpoint Cost - Pgbench
[Chart: VM checkpoint cost under Pgbench.]

Limitations
[Chart: performance degradation from non-blocking sending in Web Bench.]