Fault Tolerant Java Virtual Machine. Roy Friedman and Alon Kama, Technion, Haifa, Israel


Objective
- Create a framework for transparent fault tolerance
- Support legacy applications
- Intended for long-lived, highly available applications
- Fast recovery and continuation of execution upon failure of one of the replicas
- Deterministic behavior on all replicas: no divergence of state

At What Level to Replicate?
- Hardware / machine instructions
- Operating system
- At the application, using middleware

At What Level to Replicate?
- Virtual machine
  - A new opportunity due to the proliferation of virtual machines
- Advantages
  - Transparent
  - OS and hardware independent
- Disadvantages
  - Many programs invoke native methods or execute external programs, which are beyond the control of the VM

In This Talk
- An implementation of a replicated Java Virtual Machine using the active replication (or leader/follower) paradigm
- Technical analysis of the realization
- Performance measurements

System Model Assumptions
- Benign failures only
- Similar architecture on all replicas
- All class files exist on all replicas
- No non-deterministic native methods
- Files are accessible from all replicas
- On SMPs, no data races
- When the backup takes over, it can handle changes in the values of environment variables

Jikes Research Virtual Machine
- An open source JVM
- Mostly written in Java, except for a few lines of platform-specific C code
- In fact, can be used as a VM for other languages
- Ported to Intel and PowerPC, Linux and AIX
- Uses a JIT compiler

JikesRVM

JikesRVM Threads
- Implements its own thread package
- Threads are not completely preemptive
  - Instead, the JIT inserts safe yield points at method prologues and back-edges (if then else)
  - Once every X safe yield points, the thread is preempted
  - X is either a fixed number or based on a timeout
  - By default, X is based on a timeout
- Threads are also switched out on wait, yield, sleep, or blocking I/O
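The fixed-count option described on this slide can be illustrated with a small counter. This is only a sketch under assumptions: the names `YieldPointCounter`, `at_yield_point`, and `preemption_points` are hypothetical and do not come from JikesRVM. It shows why counting yield points, rather than using a timer, makes preemption land at the same place on every run.

```python
class YieldPointCounter:
    """Hypothetical model of fixed-count preemption (not JikesRVM code)."""

    def __init__(self, quantum):
        if quantum <= 0:
            raise ValueError("quantum must be positive")
        self.quantum = quantum      # the slide's X: yield points per time slice
        self.remaining = quantum    # yield points left before the next preemption

    def at_yield_point(self):
        """Called at each safe yield point; True means 'preempt now'."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.quantum
            return True
        return False


def preemption_points(quantum, total_yield_points):
    """Return the indices (1-based) of the yield points at which preemption occurs."""
    counter = YieldPointCounter(quantum)
    return [i for i in range(1, total_yield_points + 1) if counter.at_yield_point()]
```

With a quantum of 3 and 10 yield points, `preemption_points(3, 10)` gives `[3, 6, 9]` on every run, whereas a timeout-based X would depend on wall-clock timing.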

(Old) JikesRVM Scheduling
- Threads are quasi-preemptive
  - Yield at safe yield points, after a fixed number of them or a timeout
- At each scheduling point, check for ready file descriptors
  - If there are any, schedule the corresponding thread
  - Otherwise, schedule the first ready thread
- When a thread asks for a lock and the lock is not granted, the thread is placed on a special queue
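A rough model of this scheduling decision is sketched below. The function name `pick_next` and its data structures are assumptions made for illustration, and scanning descriptors in sorted order is an arbitrary choice of this sketch. The point is that the result depends on which descriptors happen to be ready, which is where timing-dependent behavior enters.

```python
from collections import deque


def pick_next(ready_fds, waiting_on_fd, ready_queue):
    """Pick the next thread to run (hypothetical sketch of the old policy).

    ready_fds:      set of file descriptors that currently have I/O available
    waiting_on_fd:  dict mapping a file descriptor to the thread blocked on it
    ready_queue:    deque of runnable threads, in FIFO order
    """
    # Threads whose I/O has completed take priority.
    for fd in sorted(ready_fds):
        if fd in waiting_on_fd:
            return waiting_on_fd.pop(fd)
    # Otherwise run the first ready thread, if any.
    if ready_queue:
        return ready_queue.popleft()
    return None
```

Calling `pick_next({4}, {4: "io_thread"}, deque(["a", "b"]))` selects `"io_thread"`, while the same call with an empty ready set selects `"a"`; two replicas that observe I/O completion at different moments would therefore schedule differently.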

Sources of Non-Determinism
- Preemption
  - We use the deterministic preemption option
- I/O-based scheduling
  - We must ensure that I/O is available to the same threads at the same scheduling points
- Now scheduling is completely deterministic!

Sources of Non-Determinism
- Data races
  - On uniprocessors, deterministic scheduling is enough
  - On SMPs, we must disallow data races in the application
  - In Java, programs must be thread-safe: access shared objects only within synchronized methods
  - In this case, all that is needed is to ensure that locks are obtained in the same order on both primary and backup
- Environment-specific attributes
  - Clock
  - Environment variables
  - IP address, etc.
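One way to picture "locks are obtained in the same order" is a replay table on the backup. The sketch below is an assumption-laden illustration: the class `LockOrderReplayer` and the `(lock_id, thread_id)` log format are hypothetical, not the talk's actual mechanism. The backup lets a thread take a lock only when that thread is next in the order recorded on the primary.

```python
from collections import deque


class LockOrderReplayer:
    """Hypothetical backup-side gate that enforces the primary's lock order."""

    def __init__(self, recorded):
        # recorded: list of (lock_id, thread_id) pairs in primary acquisition order
        self.pending = {}
        for lock_id, thread_id in recorded:
            self.pending.setdefault(lock_id, deque()).append(thread_id)

    def may_acquire(self, lock_id, thread_id):
        """True if thread_id is the next recorded owner of lock_id."""
        queue = self.pending.get(lock_id)
        return bool(queue) and queue[0] == thread_id

    def acquired(self, lock_id, thread_id):
        """Consume one log entry once the thread actually holds the lock."""
        if not self.may_acquire(lock_id, thread_id):
            raise RuntimeError("lock acquired out of recorded order")
        self.pending[lock_id].popleft()
```

Given the log `[("L", "t1"), ("L", "t2")]`, thread `t2` is held back on lock `L` until `t1` has acquired it, so both replicas see the same sequence of owners.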

Frames in FT JVM
- Frames ensure consistent I/O-based scheduling
- A frame ends after a fixed number of context switches, or when all threads are waiting for I/O
- I/O completion is checked on the primary at the beginning of each frame
  - On the backup, the data is taken from the primary
- Frames are also used to ensure that all accesses to environment attributes return the same values on both replicas and that locks are obtained in the same order
- The backup always runs one frame behind the primary
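The frame mechanism can be sketched as a record-and-replay pair. Everything named here (`PrimaryFrame`, `BackupFrame`, the dictionary message format) is a hypothetical illustration, not the FT-JVM wire format. The primary logs each environment read during a frame and decides when the frame ends; the backup replays the logged values instead of consulting its own clock.

```python
import time


class PrimaryFrame:
    """Hypothetical primary-side frame: records environment reads."""

    def __init__(self, frame_no, max_switches):
        self.frame_no = frame_no
        self.max_switches = max_switches  # fixed number of context switches per frame
        self.switches = 0
        self.env_values = []

    def read_clock(self, clock=time.time):
        value = clock()
        self.env_values.append(value)     # logged so the backup sees the same value
        return value

    def context_switch(self, all_threads_blocked_on_io=False):
        """Count a context switch; True means the frame has ended."""
        self.switches += 1
        return self.switches >= self.max_switches or all_threads_blocked_on_io

    def end_of_frame_message(self):
        return {"frame": self.frame_no, "env": list(self.env_values)}


class BackupFrame:
    """Hypothetical backup-side frame: replays the primary's values."""

    def __init__(self, message):
        self.frame_no = message["frame"]
        self._env = iter(message["env"])

    def read_clock(self):
        return next(self._env)            # never reads the local clock
```

Because the backup only starts a frame after receiving the primary's end-of-frame message for it, it naturally lags by one frame.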

Frames in FT JVM
[Figure: timeline of frames separated by end-of-frame markers. While the primary executes frame i and then frame i+1, the backup executes frame i-1 and then frame i.]

Failure Detection and Recovery
- Primary and backup exchange heartbeats
- If the primary fails, the backup runs its current frame to completion
- Starting in the next frame, if it has partial information, it uses it
- Once this is exhausted, the backup starts running as a primary
- To become primary, the replica may also need to restore file descriptors, sockets, etc.
- With more than two replicas, a group communication (GC) toolkit can be used
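The takeover sequence above can be modeled as a small state machine. This is a sketch only: `BackupReplica`, the timeout-based suspicion rule, and the string frame payloads are assumptions for illustration. The backup keeps consuming whatever frame data the primary managed to send, and switches role only once that data is exhausted and the primary is suspected.

```python
from collections import deque


class BackupReplica:
    """Hypothetical backup that promotes itself after the primary fails."""

    def __init__(self, timeout):
        self.timeout = timeout            # seconds of silence before suspecting failure
        self.last_heartbeat = 0.0
        self.buffered_frames = deque()    # frame data received but not yet executed
        self.role = "backup"

    def on_heartbeat(self, now):
        self.last_heartbeat = now

    def on_frame_data(self, data):
        self.buffered_frames.append(data)

    def primary_suspected(self, now):
        return now - self.last_heartbeat > self.timeout

    def next_frame_input(self, now):
        """Return buffered data for the next frame, or None if there is none."""
        if self.buffered_frames:
            return self.buffered_frames.popleft()
        if self.primary_suspected(now):
            # Nothing left to replay: start making decisions locally.
            self.role = "primary"
        return None
```

A real takeover would also have to restore file descriptors and sockets, as the slide notes; that step is omitted here.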

Implementation Hurdles
- Internal JVM tasks are run within application threads
  - These tasks do not necessarily occur in both replicas at the same time (or at all)
  - Their time is taken from the thread within which they run
  - They may acquire global locks that are beyond our control, yet these locks affect scheduling of all threads
- Examples
  - JIT compilation
  - Since the primary does real I/O and the backup gets the data from a buffer, the cost is different
- Solutions
  - Save and restore preemption counters before and after such tasks
  - Disable thread scheduling during such tasks
  - Care was taken to make sure that this does not cause deadlocks when a thread has acquired a global lock
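The two solutions (save and restore the counters, disable scheduling) fit naturally into a scoped guard. The sketch below uses hypothetical names (`ThreadState`, `internal_jvm_task`) and a plain integer as the preemption counter. It shows that whatever an internal task consumes is invisible to the application thread's deterministic schedule once the guard exits.

```python
from contextlib import contextmanager


class ThreadState:
    """Hypothetical per-thread scheduling state."""

    def __init__(self, yield_points_left):
        self.yield_points_left = yield_points_left
        self.scheduling_enabled = True


@contextmanager
def internal_jvm_task(thread):
    """Run an internal task (e.g. JIT compilation) without disturbing the counter."""
    saved = thread.yield_points_left
    thread.scheduling_enabled = False     # no context switches during the task
    try:
        yield thread
    finally:
        thread.yield_points_left = saved  # mask the task's cost
        thread.scheduling_enabled = True
```

For example, a thread with 40 yield points left that spends 25 of them inside `with internal_jvm_task(t):` still has 40 left afterwards, on both replicas, regardless of whether the task ran on only one of them.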

Garbage Collector
- GC is done on a separate thread
- It is invoked when a request for memory cannot be fulfilled, so this is non-deterministic
- The calling thread is moved to the head of the running queue, and its preemption counters are saved, so when GC finishes, it will be rerun in a way that masks this interruption
- Problem
  - On SMPs, a GC would not start on one processor until all GC threads on all processors were invoked
- Solution
  - All queues on all processors, except for the idle thread, were disabled when GC runs
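The "head of the running queue" step can be sketched for a single processor as follows. The function `run_gc_for`, the `counters` dictionary, and the `collect` callback are hypothetical stand-ins, and the SMP queue-disabling fix is not modeled.

```python
from collections import deque


def run_gc_for(thread, run_queue, counters, collect):
    """Run a collection on behalf of `thread`, then return the thread to resume.

    run_queue: deque of runnable threads
    counters:  dict mapping each thread to its preemption counter
    collect:   callable that performs the collection
    """
    saved = counters[thread]
    run_queue.appendleft(thread)   # the caller must be the first to run after GC
    collect()
    counters[thread] = saved       # the GC pause costs the caller nothing
    return run_queue.popleft()
```

Because the caller is resumed first and with its original counter, the rest of the schedule is unchanged whether or not a collection happened at that allocation.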

Performance Measurements
- Dual Intel Xeon 1.7 GHz with 1 GB RDRAM memory
- RedHat Linux 7.2
- Fast Ethernet (100 Mbps)
- We compared the performance of FT-JVM on two replicas with an unmodified, non-replicated version of JikesRVM
- Measurements were done with the JIT enabled
- We varied the length of frames
  - Short frames imply frequent synchronization
  - Higher overhead, but reduced data loss
- With 1-second frames, the results are similar to the original JVM

SciMark 2.0 Benchmark

I/O-bound LZW Address book

Overhead Breakdown
[Figure: percent of time per frame (0% to 100%) versus number of context switches per frame (500 to 100000), for Raytrace, LZW, and Address Book. Bars: 1 = application run time, 2 = send queue bottleneck, 3 = waiting for end-of-frame acknowledgement.]

SMP Performance

Limitations
- Benign failures only
- Assumes data-race-free programs (on SMPs)
- Assumes that all native methods are deterministic
- Existing implementation does not replicate sockets
- Writes to files must be to deterministic locations
- High overhead for I/O-intensive applications

Related Work
- Deterministic execution for active replication
  - Jiménez-Peris, Patiño-Martínez, Arévalo, SRDS '00
  - Narasimhan, Moser, Melliar-Smith, SRDS '99
- Deterministic replay of events
  - Choi and Srinivasan, ACM SIGMETRICS SPDT '98
  - Konuru, Srinivasan, Choi, IPDPS '00
- Hypervisor fault tolerance
  - Bressoud and Schneider, SOSP '95
- FT JVM using primary/backup
  - Napper, Alvisi and Vin, DSN '03

Conclusion
- An active replication implementation of FT-JVM inside the VM layer
- Design that prevents non-determinism, especially on SMPs
- Performance measurements
  - For non-I/O-intensive applications, the performance compares favorably with the original JVM with frames of about 1 second
- Open problem
  - Dynamic load balancing between processors of an SMP, especially on the backup