Virtualize Everything but Time

Timothy Broomhead (t.broomhead@ugrad.unimelb.edu.au)
Laurence Cremean (l.cremean@ugrad.unimelb.edu.au)
Julien Ridoux (jrid@unimelb.edu.au)
Darryl Veitch (dveitch@unimelb.edu.au)

Centre for Ultra-Broadband Information Networks
The University of Melbourne

Introduction

- Clock synchronization, who cares?
  - Network monitoring / traffic analysis
  - Telecommunications industry; finance; gaming, ...
  - Distributed "scheduling": timestamps instead of message passing
- Status quo under Xen
  - Based on ntpd, amplifies its flaws
  - Fails under live VM migration
- We propose a new architecture
  - Based on the RADclock client synchronization solution
  - Robust, accurate, scalable
  - Enables the dependent clock paradigm
  - Seamless migration

Key Idea

- Each physical host has a single clock, which never migrates
- Only a (stateless) clock read function migrates

Para-Virtualization and Xen

- Hypervisor: a minimal kernel managing physical resources
- Para-virtualization
  - Guest OSs have access to the hypervisor via hypercalls
  - Full virtualization is more complex, not addressed here
- Focus on Xen
  - But the approach has general applicability
- Focus on Linux OSs (2.6.31.13 Xen pvops branch)
  - Guest OSs:
    - Dom0: privileged access to hardware devices
    - DomU: access managed by Dom0
  - Xen hypervisor 4.0 used mainly

Hardware Counters

- Clocks are built on local hardware (oscillators -> counters)
  - HPET, ACPI, TSC
  - Counters are imperfect, they drift (temperature driven)
  - Affected by OS: ticking rate, access latency
- TSC (counts CPU cycles; see the read sketch below)
  - Highest resolution and lowest latency: preferred! but...
  - May be unreliable:
    - multi-core -> multiple unsynchronised TSCs
    - power management -> variable rate, including stopping
- HPET
  - Reliable, but lower resolution, higher latency
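
A minimal sketch (not from the talk) of reading the TSC, the raw cycle counter discussed above. On x86, RDTSC returns the 64-bit cycle count split across the EDX:EAX registers; the caveats from the slide still apply (per-core TSCs may be unsynchronised, and on older CPUs the rate varies with power management).

```c
#include <stdint.h>
#include <stdio.h>

/* Read the Time Stamp Counter: raw CPU cycles since reset.
 * GCC/Clang inline assembly, x86 only. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint64_t a = read_tsc();
    uint64_t b = read_tsc();
    printf("tsc delta: %llu cycles\n", (unsigned long long)(b - a));
    return 0;
}
```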

Xen Clocksource

A hardware/software hybrid timer provided by the hypervisor.

- Purpose
  - Combine the reliability of HPET with the low latency of the TSC
  - Compensate for TSC unreliability
  - Provides a 1 GHz, 64-bit counter
- Performance of XCS versus HPET
  - XCS performs well: low latency and high stability
  - HPET not that far behind, and a lot simpler

Clock Fundamentals

- Timekeeping and timestamping are distinct
- Raw timestamps and clock timestamps are distinct
- A scaled counter is not a good clock: drift!
- The purpose of a clock sync algorithm is to correct for drift
- Network-based sync is convenient: exchange timing packets
  [Diagram: timing packet exchange between Host and Server across the Network, with one-way delays d in each direction and round-trip time r]
- Two key problems (illustrated in the sketch below)
  - Dealing with delay variability (complex, but possible)
  - Path asymmetry (simple, but impossible)
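
A minimal sketch (not from the talk) of the classic four-timestamp exchange behind NTP-style sync, showing both problems: the round-trip time absorbs delay variability, while the offset estimate is only exact if the path is symmetric.

```c
#include <stdio.h>

/* Four-timestamp exchange: t1 client send, t2 server receive,
 * t3 server send, t4 client receive (example values in seconds). */
int main(void)
{
    double t1 = 100.000, t2 = 100.010, t3 = 100.011, t4 = 100.025;

    double rtt    = (t4 - t1) - (t3 - t2);         /* total network delay  */
    double offset = ((t2 - t1) + (t3 - t4)) / 2.0; /* assumes symmetric
                                                      forward/return paths */

    printf("rtt = %.3f s, offset = %.3f s\n", rtt, offset);
    return 0;
}
```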

Synchronisation Algorithms

- NTP (ntpd): the status quo
  - Feedback based
  - Event timestamps are system clock stamps
  - A feedback controller (PLL/FLL) tries to lock onto the rate
  - Intimate relationship with the system clock (API, dynamics, ...)
  - In Xen, ntpd uses the Xen Clocksource
- RADclock (Robust Absolute and Difference Clock)
  - Algorithm developed in 2004, extensively tested
  - Feedforward based
  - Event timestamps are raw stamps
  - Clock error estimates are made, and removed when the clock is read
  - The "system clock" has no dynamics, just a function call
  - Can use any raw counter: here HPET and Xen Clocksource

Experimental Methodology

[Diagram: testbed. The host runs the Xen hypervisor with Dom0 and DomU, each running ntpd and RADclock (internal monitor). A GPS-synchronized DAG capture card on a hub provides the external monitor. A stratum-1 NTP server (atomic clock, GPS, PPS sync) serves the NTP flow; a Unix PC acts as UDP sender and receiver. NTP flow, UDP flow and timestamping points are marked.]

What's the problem? ntpd can perform well

- Ideal setup
  - Quality stratum-1 time server
  - Client on the same LAN, lightly loaded, barely any traffic
  - Constrained, small polling period: 16 s
  [Plot: ntpd clock error [µs] vs time [day]; y-axis 0-80 µs, over 20 days]

Or less well...

- Different configuration (the one ntpd recommends!)
  - Multiple servers
  - Relaxed constraint on the polling period
  - Still no load, no traffic, high-quality servers
  [Plot: ntpd clock error [µs] over 180 hours for a single server, 3 co-located servers, and 3 nearby servers; y-axis ±1000 µs]
- When/why? Loss of stability is a complex function of the parameters: unreliable

The Xen Context

Three examples of the inadequacy of the ntpd-based solution:

1) Dependent ntpd clock
2) Independent ntpd clock
3) Migrating independent ntpd clock

1) Dependent ntpd Clock

- The solution
  - Only Dom0 runs ntpd
  - It periodically updates a "boot time" variable in the hypervisor
  - DomU uses the Xen Clocksource to interpolate
- The result (2.6.26 kernel)
  [Plot: dependent ntpd clock error [µs] vs time [hours]; y-axis ±4000 µs, over 1.6 hours]

2) Independent ntpd Clock (current solution)

- The solution
  - All guests run entirely separate ntpd daemons
  - Resource hungry
- The result
  - When all is well, it works as before, but with a bit more noise
  - When it works (parallel comparison on Dom0, stratum-1 on LAN):
  [Plot: ntpd and RADclock clock error [µs] vs time [day]; y-axis 0-80 µs, over 20 days]

2) Independent ntpd Clock (current solution)

- The solution
  - All guests run entirely separate ntpd daemons
  - Resource hungry
- The result
  - Increased noise makes instability more likely
  - When it fails (DomU with some load, variable polling period, guest churn):
  [Plot: ntpd clock error [µs] vs time [hours]; y-axis ±5000 µs, over 16 hours]

3) Migrating Independent ntpd Clock

- The solution
  - Independent clock as before, which migrates
  - It starts talking to a new system clock, a new counter
- The result
  - Migration shock! More soon

RADclock Architecture Principles

- Timestamping
  - Raw counter reads, not clock reads
  - Independent of the clock algorithm
- Synchronization algorithm
  - Based on raw timestamps and server timestamps (feedforward)
  - Estimates clock parameters and makes them available
  - Concentrated in a single module (in userland)
- Clock reading
  - Combines a raw timestamp with the retrieved clock parameters
  - Stateless

More Concretely

- Timestamping
  - Read the chosen counter, say HPET(t)
- Sync algorithm maintains:
  - Period: a long-term average (barely changes) -> rate stability
  - K: sets the origin to the desired timescale (e.g. UTC)
  - E: estimate of error, updated on each stamp exchange
- Clock reading (see the sketch below)
  - Absolute clock: Ca(t) = Period * HPET(t) + K - E(t)
    used for absolute time, and for differences above a critical scale
  - Difference clock: Cd(t1, t2) = Period * (HPET(t2) - HPET(t1))
    used for time differences under some critical timescale
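
A minimal sketch of the two read functions from this slide. The struct and function names are illustrative, not the real RADclock API; the point is that reading is stateless, just a composition of a raw counter value with the latest parameters from the sync daemon.

```c
#include <stdint.h>
#include <stdio.h>

struct clockdata {
    double period;  /* seconds per counter tick (long-term average) */
    double K;       /* origin: maps counter zero to the timescale   */
    double E;       /* current estimate of absolute clock error     */
};

/* Absolute clock: Ca(t) = Period * HPET(t) + K - E(t) */
static double read_absolute(const struct clockdata *cd, uint64_t raw)
{
    return cd->period * (double)raw + cd->K - cd->E;
}

/* Difference clock: Cd(t1,t2) = Period * (HPET(t2) - HPET(t1)).
 * E cancels out, so short intervals avoid absolute-error jumps. */
static double read_difference(const struct clockdata *cd,
                              uint64_t raw1, uint64_t raw2)
{
    return cd->period * (double)(raw2 - raw1);
}

int main(void)
{
    struct clockdata cd = { 1e-9, 1.0e9, 0.0 };  /* 1 GHz counter, toy values */
    printf("Ca = %.9f s, Cd = %.9f s\n",
           read_absolute(&cd, 1000), read_difference(&cd, 1000, 2000));
    return 0;
}
```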

Implementation

- Timestamping: "feedforward support"
  - Create a cumulative, wide (64-bit) form of the counter
  - Make it accessible from both kernel and user context
  - Under Linux, modify the Clocksource abstraction
- Sync algorithm
  - Make the clock parameters available via a user thread
- Clock reading
  - Read the counter, retrieve the clock data, compose
  - Fixed-point code to enable the clock to be read from the kernel (sketch below)
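
A minimal sketch (illustrative, not RADclock's actual kernel code) of why fixed-point matters: the kernel avoids floating point, so the period can be held as a multiplier/shift pair, in the style of the Linux clocksource abstraction where ns = (cycles * mult) >> shift.

```c
#include <stdint.h>
#include <stdio.h>

struct ffclock_data {
    uint32_t mult;   /* period in ns, scaled by 2^shift */
    uint32_t shift;
    int64_t  K_ns;   /* origin offset, nanoseconds      */
    int64_t  E_ns;   /* current error estimate, ns      */
};

static int64_t ffclock_read_ns(const struct ffclock_data *d, uint64_t raw)
{
    /* 128-bit intermediate avoids overflow of raw * mult
     * (__uint128_t is a GCC/Clang extension). */
    int64_t scaled = (int64_t)(((__uint128_t)raw * d->mult) >> d->shift);
    return scaled + d->K_ns - d->E_ns;
}

int main(void)
{
    /* 1 GHz counter: exactly 1 ns per tick when mult = 1 << shift. */
    struct ffclock_data d = { 1u << 20, 20, 0, 0 };
    printf("%lld ns\n", (long long)ffclock_read_ns(&d, 123456789ULL));
    return 0;
}
```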

On Xen!

- The dependent clock is now very natural
  - Dom0 maintains a RADclock daemon, talks to the timeserver
  - It makes Period, K, E available through the Xenstore filesystem
  - Each DomU just reads the counter, retrieves the clockdata, composes
- All guest clocks are identically the same, but:
  - Small delay (~1 ms) in Xenstore updates -> stale data possible, but very unlikely, small impact
  - Latency to read the counter is higher on DomU
- Support needed (the feedforward paradigm is a perfect match to para-virtualisation)
  - Expose HPET to the Clocksource in guest OSs
  - Add a hypercall to access the platform timer (HPET here)
  - Add read/write functions to access the clockdata from Xenstore (sketch below)
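
A minimal sketch of a DomU retrieving a clock parameter published by Dom0 through Xenstore, using the libxenstore API. The key path and value encoding are assumptions for illustration; the slides only say that Period, K and E are made available via the Xenstore filesystem.

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>  /* link with -lxenstore */

int main(void)
{
    struct xs_handle *xsh = xs_open(0);   /* connect to Xenstore */
    if (!xsh) { perror("xs_open"); return 1; }

    unsigned int len;
    /* Hypothetical key where Dom0's RADclock daemon publishes data. */
    char *val = xs_read(xsh, XBT_NULL, "/local/radclock/period", &len);
    if (val) {
        double period = atof(val);
        printf("period = %.12g s/tick\n", period);
        free(val);
    }

    xs_close(xsh);
    return 0;
}
```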

Independent RADclock on Xen

Concurrent test on two DomUs, separate NTP streams.

[Plot: RADclock error [µs] vs time [min] for Xen Clocksource and HPET, within about ±20 µs over 350 minutes. Error histograms: HPET median 2.5 µs, IQR 9.3 µs; Xen Clocksource median 3.4 µs, IQR 9.5 µs]

Migration on Xen

- The feedforward paradigm is a perfect match to migration
- Clocks don't migrate, only a clock reading function does
- Each Dom0 has its own RADclock daemon
  - DomU only ever calls a function, no state is migrated
- Caveats (see the sketch below)
  - A local copy of the clockdata is used to limit syscalls; it needs refreshing
  - Host asymmetry will change, resulting in a small clock jump
    - Asymmetry effects differ between Dom0 (hence the clock itself) and DomU
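
A minimal sketch (all names hypothetical) of the first caveat: the guest caches the clockdata to limit syscalls, refreshing it when stale. After a migration the next refresh simply picks up the new host's parameters, so the guest carries no clock state.

```c
#include <stdint.h>
#include <stdio.h>

struct clockdata { double period, K, E; };

/* Stubs standing in for the real raw-counter read and clockdata fetch
 * (e.g. from Xenstore or a syscall). */
static uint64_t read_counter(void) { static uint64_t t; return t += 1000; }
static void fetch_clockdata(struct clockdata *out)
{
    out->period = 1e-9; out->K = 0.0; out->E = 0.0;
}

static struct clockdata cache;
static uint64_t cache_raw;                      /* counter at last refresh */
static const uint64_t REFRESH_TICKS = 1000000;  /* validity window         */

static double read_clock(void)
{
    uint64_t raw = read_counter();
    if (cache_raw == 0 || raw - cache_raw > REFRESH_TICKS) {
        fetch_clockdata(&cache);                /* refresh the stale copy */
        cache_raw = raw;
    }
    return cache.period * (double)raw + cache.K - cache.E;
}

int main(void)
{
    printf("t = %.9f s\n", read_clock());
    return 0;
}
```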

Migration Comparison

[Plot: clock error [µs] vs time [hours] for Dom0 Tastiger, Dom0 Kultarr, the migrated guest's RADclock, and the migrated guest's ntpd; y-axis -50 to 250 µs, over 5 hours]

- Setup
  - Two machines, each Dom0 running a RADclock
  - One DomU migrates, with a dependent RADclock and an independent ntpd

Noise Overhead of Xen and Guests

[Plots: RTT to host [µs]. Top: native and Dom0 with 1-4 guests, y-axis 30-70 µs. Bottom: DomU #1-#4 with 1-4 guests, y-axis 100-200 µs]

Noise Penalty Under C-States

[Plot: hypervisor RTT to host [µs] for Xen Clocksource and HPET across C-states C0-C3; y-axis 50-110 µs]

Algo Performance Under C-States

[Plot: RADclock error, E - median(E) [µs], for RADclock on Xen Clocksource and on HPET across C-states C0-C3; y-axis ±20 µs]

Conclusion

- The feedforward approach has many advantages
  - A difference clock is defined
  - The absolute clock can be made much more robust
  - Time can be replayed
  - Simpler kernel support
- A good match to the needs of para-virtualisation
  - Enables a dependent clock mode that works
  - Allows seamless live migration
- The RADclock project
  - Aims to replace ntpd
  - Client and server code
  - Packages for FreeBSD and Linux (Xen now supported)
  - http://www.cubinlab.ee.unimelb.edu.au/radclock/