Fault Tolerant Java Virtual Machine Roy Friedman and Alon Kama Technion Haifa, Israel
Objective Create framework for transparent fault-tolerance Support legacy applications Intended for long-lived, highly available apps Fast recovery and continuation of execution upon failure of one of the replicas Deterministic behavior on all replicas no divergence of state
At What Level to Replicate? Hardware / machine instructions Operating System At the application, using a middleware
At What Level to Replicate? Virtual machine A new opportunity due to the proliferation of virtual machines Advantages Transparent OS and hardware independent Disadvantages Many programs invoke native methods, or may execute external programs, which are beyond the control of the VM
In this talk An implementation of a replicated Java Virtual Machine using the active replication (or leader/follower) paradigm Technical analysis of realization Performance measurements
System Model Assumptions Benign failures only Similar architecture on all replicas All class files exist on all replicas No non-deterministic native methods Files are accessible from all replicas On SMPs, no data-races When backup takes over, it can handle changes in environment variable values
Jikes Research Virtual Machine An open source JVM Mostly written in Java except for a few lines of C code that are platform specific In fact, can be used as a VM for other languages Ported to Intel and PowerPC, Linux and AIX. Uses a JIT compiler
JikesRVM
JikesRVM Threads Implements its own thread package Threads are not completely preemptive Instead, the JIT inserts safe yield points at method prologues and loop back-edges Once every X safe yield points, the thread is preempted X is either a fixed number or based on a timeout By default, based on timeout Also, on wait, yield, sleep, or blocking I/O
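The counting rule above (preempt once every X safe yield points) can be sketched as follows. This is a minimal illustration assuming a fixed X; the class and method names are hypothetical and are not JikesRVM APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are JikesRVM APIs.
final class YieldPointCounter {
    private final int quantum; // X: yield points per time slice (assumed fixed)
    private int remaining;     // yield points left before the next preemption

    YieldPointCounter(int quantum) {
        if (quantum <= 0) {
            throw new IllegalArgumentException("quantum must be positive");
        }
        this.quantum = quantum;
        this.remaining = quantum;
    }

    // Called at every safe yield point; true means "preempt the thread now".
    boolean atYieldPoint() {
        remaining--;
        if (remaining == 0) {
            remaining = quantum; // start a fresh time slice
            return true;
        }
        return false;
    }

    // Returns the 1-based indices of the yield points at which preemption occurs.
    static List<Integer> preemptionPoints(int quantum, int totalYieldPoints) {
        YieldPointCounter counter = new YieldPointCounter(quantum);
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i <= totalYieldPoints; i++) {
            if (counter.atYieldPoint()) {
                points.add(i);
            }
        }
        return points;
    }
}
```

Because the decision depends only on the count of yield points and not on wall-clock time, two runs of the same code preempt at the same places, which is the property a replicated VM needs.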
(Old) JikesRVM Scheduling Threads are quasi-preemptive Yield at safe yield points, after a fixed number or a timeout Check for ready file-descriptors If there are, schedule corresponding thread Otherwise, schedule first ready thread When a thread asks for a lock, if the lock is not granted, the thread is placed on a special queue
Sources of Non-Determinism Preemption We use the deterministic preemption option I/O based scheduling We must ensure that I/O is available to the same threads at the same scheduling points Now scheduling is completely deterministic!
Sources of Non-Determinism Data-races On uniprocessors, deterministic scheduling is enough On SMPs, we must disallow data-races in the application In Java, programs must be thread-safe: access shared objects only within synchronized methods In this case, all that is needed is to ensure that locks are obtained in the same order on both primary and backup Environment specific attributes Clock Environment variables IP address, etc
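One way to picture "locks are obtained in the same order on both primary and backup" is a per-lock log of grants that the backup consults before granting a lock. The sketch below is single-threaded and purely illustrative; `LockOrderReplay`, its methods, and the integer lock and thread ids are assumptions, not the FT-JVM implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of replaying the primary's lock acquisition order.
final class LockOrderReplay {
    // For each lock id, the thread ids in the order the primary granted the lock.
    private final Map<Integer, Queue<Integer>> grantOrder = new HashMap<>();

    // Primary side (or log delivery on the backup): remember one grant.
    void record(int lockId, int threadId) {
        grantOrder.computeIfAbsent(lockId, k -> new ArrayDeque<>()).add(threadId);
    }

    // Backup side: grant the lock only if this thread is next in the recorded order.
    boolean tryAcquire(int lockId, int threadId) {
        Queue<Integer> queue = grantOrder.get(lockId);
        if (queue == null || queue.isEmpty()) {
            return false; // nothing recorded yet: the thread must wait
        }
        if (queue.peek() != threadId) {
            return false; // another thread obtained the lock first on the primary
        }
        queue.remove();
        return true;
    }
}
```

A thread whose `tryAcquire` returns false would be parked and retried later, so the backup reproduces the primary's interleaving for every lock.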
Frames in FT JVM Frames ensure consistent I/O based scheduling A frame ends after a fixed number of context switches, or when all threads are waiting for I/O I/O completion is checked on the primary at the beginning of each frame On the backup, the data is taken from the primary Frames are also used to ensure that all accesses to environment attributes return the same values on both replicas and that locks are obtained in the same order The backup always runs one frame after the primary
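As an illustration of how frames can make environment attributes agree, the sketch below logs every clock value the primary reads during a frame and hands the list to the backup at end of frame. The classes `PrimaryClock` and `BackupClock` are hypothetical, and the transfer between replicas is reduced to a method call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical primary-side wrapper: reads the real clock and logs each value.
final class PrimaryClock {
    private final LongSupplier source;
    private List<Long> frame = new ArrayList<>();

    PrimaryClock(LongSupplier source) {
        this.source = source;
    }

    long read() {
        long value = source.getAsLong();
        frame.add(value); // remember what the application saw
        return value;
    }

    // End of frame: hand over the logged values and start a new frame.
    List<Long> endFrame() {
        List<Long> finished = frame;
        frame = new ArrayList<>();
        return finished;
    }
}

// Hypothetical backup-side wrapper: never touches the real clock.
final class BackupClock {
    private final Deque<Long> pending = new ArrayDeque<>();

    void deliverFrame(List<Long> values) {
        pending.addAll(values);
    }

    long read() {
        Long value = pending.poll();
        if (value == null) {
            throw new IllegalStateException("no logged value for this read");
        }
        return value;
    }
}
```

Since the backup runs one frame behind, the values for the frame it is executing have already been delivered when its application code asks for them.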
Frames in FT JVM Frame i Frame i+1 primary End-of-frame End-of-frame Frame i-1 Frame i backup
Failure Detection and Recovery Primary and backup exchange heartbeats If the primary fails, the backup runs its current frame to completion Starting in the next frame, if it has partial information, it uses it Once this is exhausted, the backup starts running as a primary To become primary, the replica may also need to restore file descriptors, sockets, etc. With more than two replicas, can use a group communication (GC) toolkit
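The takeover sequence can be pictured as a small state machine on the backup: follow the primary, drain whatever logged information remains after a missed heartbeat, then act as primary. The sketch below is an assumption-laden illustration: the timeout rule, the `Mode` names, and the use of strings for logged events are all hypothetical, and running the current frame to completion is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of heartbeat-based failover on the backup replica.
final class BackupFailover {
    enum Mode { FOLLOWING, DRAINING, PRIMARY }

    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private final Deque<String> loggedEvents = new ArrayDeque<>();
    private Mode mode = Mode.FOLLOWING;

    BackupFailover(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // Partial information received from the primary before it failed.
    void onLoggedEvent(String event) {
        loggedEvents.add(event);
    }

    // Suspect the primary once no heartbeat arrived within the timeout.
    Mode check(long nowMillis) {
        if (mode == Mode.FOLLOWING && nowMillis - lastHeartbeatMillis > timeoutMillis) {
            mode = loggedEvents.isEmpty() ? Mode.PRIMARY : Mode.DRAINING;
        }
        return mode;
    }

    // Consume the next logged event; once the log is exhausted, become primary.
    String nextEvent() {
        String event = loggedEvents.poll();
        if (mode == Mode.DRAINING && loggedEvents.isEmpty()) {
            mode = Mode.PRIMARY;
        }
        return event;
    }

    Mode mode() {
        return mode;
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real detector would also have to restore file descriptors and sockets before serving as primary, as the slide notes.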
Implementation Hurdles Internal JVM tasks are run within application threads These tasks do not necessarily occur in both replicas at the same time (or at all) Their time is taken from the thread within which they run They may acquire global locks that are beyond our control, yet these locks affect scheduling of all threads Examples JIT compilation Since primary does real I/O and backup gets the data from a buffer, the cost is different Solutions Save and restore preemption counters before and after such tasks Disable thread scheduling during such tasks Care was taken to make sure that this does not cause deadlocks when a thread has acquired a global lock
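The "save and restore preemption counters" solution can be sketched as a wrapper around internal tasks, so that yield points consumed by, say, JIT compilation do not count against the application thread. The class below is a hypothetical illustration, not code from the FT-JVM.

```java
// Hypothetical sketch: mask the yield points consumed by an internal VM task.
final class PreemptionGuard {
    private int yieldPointsLeft;

    PreemptionGuard(int initial) {
        this.yieldPointsLeft = initial;
    }

    // Called at each safe yield point reached by the thread.
    void tick() {
        yieldPointsLeft--;
    }

    int yieldPointsLeft() {
        return yieldPointsLeft;
    }

    // Runs an internal task and restores the counter afterwards, even if the task throws.
    void runInternalTask(Runnable task) {
        int saved = yieldPointsLeft;
        try {
            task.run();
        } finally {
            yieldPointsLeft = saved;
        }
    }
}
```

With this wrapper the counter observed by application code is the same whether or not the internal task ran, which keeps primary and backup in step even when only one of them compiles a method.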
Garbage Collector GC is done on a separate thread It is invoked when a request for memory cannot be fulfilled, so this is non-deterministic The calling thread is moved to the head of the running queue, and its preemption counters are saved, so when GC finishes, it will be rerun in a way that masks this interruption Problem On SMPs, a GC would not start in one processor until all GC threads on all processors were invoked Solution All queues on all processors, except for the idle thread, were disabled when GC runs
Performance Measurements Dual Intel Xeon 1.7 GHz with 1GB RDRAM memory RedHat Linux 7.2 Fast Ethernet (100Mbps) We compared the performance of FT-JVM on two replicas with an unmodified, non-replicated version of JikesRVM Measurements were done with JIT enabled We varied the length of frames Short frames imply frequent synchronization Higher overhead, but reduced data-loss With 1 second frames, the results are similar to original JVM
SciMark 2.0 Benchmark
I/O-bound LZW Address book
Overhead Breakdown [Figure: percent of time per frame (0% to 100%) against number of context switches per frame (500, 1000, 5000, 10000, 50000, 100000) Benchmarks: Raytrace, LZW, Address Book Components: application run time, send queue bottleneck, waiting for end-of-frame acknowledgement]
SMP Performance
Limitations Benign failures only Assume data-race free programs (on SMPs) Assume that all native methods are deterministic Existing implementation does not replicate sockets Writes to files must be to deterministic locations High overhead for I/O intensive applications
Related Work Deterministic execution for active replication Jiménez-Peris, Patiño-Martínez, Arévalo - SRDS 00 Narasimhan, Moser, Melliar-Smith - SRDS 99 Deterministic replay of events Choi and Srinivasan - ACM SIGMETRICS SPDT 98 Konuru, Srinivasan, Choi - IPDPS 00 Hypervisor fault-tolerance Bressoud and Schneider - SOSP 95 FT JVM using primary/backup Napper, Alvisi and Vin - DSN 03
Conclusion An active replication implementation of FT-JVM inside the VM layer Design that prevents non-determinism, especially on SMPs Performance measurements For non-I/O-intensive applications, the performance compares favorably with original JVM with frames of about 1 second Open problem Dynamic load balancing between processors of an SMP, especially on the backup