Detecting and Fixing Memory-Related Performance Problems in Managed Languages

Similar documents
Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs

Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine

Yak: A High-Performance Big-Data-Friendly Garbage Collector. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian Shan Lu

PERFBLOWER: Quickly Detecting Memory-Related Performance Problems via Amplification

1 Overview and Research Impact

Shark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker

Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

MapReduce Design Patterns

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications

Big data systems 12/8/17

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Towards Parallel, Scalable VM Services

Dynamic Analysis of Inefficiently-Used Containers

Homework 3: Wikipedia Clustering Cliff Engle & Antonio Lupher CS 294-1

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

GLADE: A Scalable Framework for Efficient Analytics. Florin Rusu University of California, Merced

Improving the MapReduce Big Data Processing Framework

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

CSE 444: Database Internals. Lecture 23 Spark

Zing Vision. Answering your toughest production Java performance questions

Yak: A High-Performance Big-Data-Friendly Garbage Collector

A Dynamic Evaluation of the Precision of Static Heap Abstractions

Speculative Region-based Memory Management for Big Data Systems

Developing MapReduce Programs

Exploiting the Behavior of Generational Garbage Collector

Free-Me: A Static Analysis for Automatic Individual Object Reclamation

Introduction to MapReduce

Yak: A High-Performance Big-Data-Friendly Garbage Collector

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

A new Mono GC. Paolo Molaro October 25, 2006

Large-Scale API Protocol Mining for Automated Bug Detection

Go with the Flow: Profiling Copies to Find Run-time Bloat

CS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014

Harp-DAAL for High Performance Big Data Computing

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Shark: SQL and Rich Analytics at Scale. Reynold Xin UC Berkeley

OS-caused Long JVM Pauses - Deep Dive and Solutions

Efficient Runtime Tracking of Allocation Sites in Java

MapReduce: A Programming Model for Large-Scale Distributed Computation

TI2736-B Big Data Processing. Claudia Hauff

Managed runtimes & garbage collection

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Trace-based JIT Compilation

RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine

Probabilistic Calling Context

ORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

2/26/2017. For instance, consider running Word Count across 20 splits

CS427 Multicore Architecture and Parallel Computing

Heapy a memory profiler and debugger for Python

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Spark 2. Alexey Zinovyev, Java/BigData Trainer in EPAM

Apache Flink- A System for Batch and Realtime Stream Processing

Run-time Environments - 3

GLADE: A Scalable Framework for Efficient Analytics. Florin Rusu (University of California, Merced) Alin Dobra (University of Florida)

Towards Garbage Collection Modeling

Shark: Hive (SQL) on Spark

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia

THE TROUBLE WITH MEMORY

Data Partitioning and MapReduce

Accelerate Big Data Insights

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University

Uncovering Performance Problems in Java Applications with Reference Propagation Profiling

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services

DON T CRY OVER SPILLED RECORDS Memory elasticity of data-parallel applications and its application to cluster scheduling

Motivation for Dynamic Memory. Dynamic Memory Allocation. Stack Organization. Stack Discussion. Questions answered in this lecture:

Introduction to MapReduce Algorithms and Analysis

String Deduplication for Java-based Middleware in Virtualized Environments

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Hadoop MapReduce Framework

Runtime Application Self-Protection (RASP) Performance Metrics

the gamedesigninitiative at cornell university Lecture 9 Memory Management

Recent Performance Analysis with Memphis. Collin McCurdy Future Technologies Group

Announcements. Reading Material. Map Reduce. The Map-Reduce Framework 10/3/17. Big Data. CompSci 516: Database Systems

HashKV: Enabling Efficient Updates in KV Storage via Hashing

Huge market -- essentially all high performance databases work this way

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Tuning Intelligent Data Lake Performance

Evolution From Shark To Spark SQL:

MapReduce: Recap. Juliana Freire & Cláudio Silva. Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

A Parallel R Framework

Q1: /8 Q2: /30 Q3: /30 Q4: /32. Total: /100

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

Efficient Data Race Detection for Unified Parallel C

Compilers. 8. Run-time Support. Laszlo Böszörmenyi Compilers Run-time - 1

Analysis in the Big Data Era

Transcription:

Detecting and Fixing -Related Performance Problems in Managed Languages Lu Fang Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky University of California, Irvine May 26, 2017, Irvine, CA, USA Lu Fang (UC Irvine) Final Defense May 26, 2017 1 / 51

Performance Problems in Real World Lu Fang (UC Irvine) Final Defense May 26, 2017 2 / 51

Performance Problems in Real World Lu Fang (UC Irvine) Final Defense May 26, 2017 2 / 51

Performance Problems in Real World Lu Fang (UC Irvine) Final Defense May 26, 2017 2 / 51

Performance Problems in Real World Many distributed systems, such as Spark, Hadoop, also suffer from performance problems java.lang.outoferror: Java heap space Lu Fang (UC Irvine) Final Defense May 26, 2017 2 / 51

Performance Problems Commonly exist in real world applications Single-machine apps, such as Eclipse, IE Traditional databases, web servers, such as MySQL, Tomcat Big Data systems, such as Hadoop, Spark Lu Fang (UC Irvine) Final Defense May 26, 2017 3 / 51

Performance Problems Commonly exist in real world applications Single-machine apps, such as Eclipse, IE Traditional databases, web servers, such as MySQL, Tomcat Big Data systems, such as Hadoop, Spark Further exacerbated by managed languages Such as Java, C# Big overhead introduced by automatic memory management Lu Fang (UC Irvine) Final Defense May 26, 2017 3 / 51

Performance Problems Commonly exist in real world applications Single-machine apps, such as Eclipse, IE Traditional databases, web servers, such as MySQL, Tomcat Big Data systems, such as Hadoop, Spark Further exacerbated by managed languages Such as Java, C# Big overhead introduced by automatic memory management Cannot be optimized by compilers Cannot understand the deep semantics Cannot guarantee the correctness Lu Fang (UC Irvine) Final Defense May 26, 2017 3 / 51

Performance Problems Difficult to find, especially during development Invisible effect Often escape to production runs Lu Fang (UC Irvine) Final Defense May 26, 2017 4 / 51

Performance Problems Difficult to find, especially during development Invisible effect Often escape to production runs Difficult to fix Large systems are complicated Enough diagnostic information is necessary Problems may be located deeply in systems Lu Fang (UC Irvine) Final Defense May 26, 2017 4 / 51

Performance Problems Difficult to find, especially during development Invisible effect Often escape to production runs Difficult to fix Large systems are complicated Enough diagnostic information is necessary Problems may be located deeply in systems Can lead to severe problems Scalability reductions Programs hang and crash Financial losses Lu Fang (UC Irvine) Final Defense May 26, 2017 4 / 51

Existing Solutions Many solutions are proposed Pattern-based Mining-based Learning-based Lu Fang (UC Irvine) Final Defense May 26, 2017 5 / 51

Existing Solutions Many solutions are proposed Pattern-based Mining-based Learning-based Most are postmortem debugging techniques Require user logs/input to trigger bugs Bugs already escape to production runs Lu Fang (UC Irvine) Final Defense May 26, 2017 5 / 51

Drawbacks in Existing Works Lacking a general way to describe problems Cannot detect problems under small workload Lacking a systematic approach to tune memory usage in data-intensive systems Lu Fang (UC Irvine) Final Defense May 26, 2017 6 / 51

Drawbacks in Existing Works Our Solutions Lacking a general way to describe problems Instrumentation Specification Language (ISL) Cannot detect problems under small workload Lacking a systematic approach to tune memory usage in data-intensive systems Lu Fang (UC Irvine) Final Defense May 26, 2017 6 / 51

Drawbacks in Existing Works Our Solutions Lacking a general way to describe problems Instrumentation Specification Language (ISL) Cannot detect problems under small workload PerfBlower Lacking a systematic approach to tune memory usage in data-intensive systems Lu Fang (UC Irvine) Final Defense May 26, 2017 6 / 51

Drawbacks in Existing Works Our Solutions Lacking a general way to describe problems Instrumentation Specification Language (ISL) Cannot detect problems under small workload PerfBlower Lacking a systematic approach to tune memory usage in data-intensive systems I Lu Fang (UC Irvine) Final Defense May 26, 2017 6 / 51

Lu Fang, Liang Dou, Guoqing Xu PerfBlower: Quickly Detecting -Related Performance Problems via Amplification ECOOP 15 Lu Fang (UC Irvine) Final Defense May 26, 2017 7 / 51

Instrumentation Specification Language Motivation 1: an easy way to develop new detectors Lu Fang (UC Irvine) Final Defense May 26, 2017 8 / 51

Instrumentation Specification Language Motivation 1: an easy way to develop new detectors Motivation 2: detect the problems with small effects Lu Fang (UC Irvine) Final Defense May 26, 2017 8 / 51

Instrumentation Specification Language Focus on problems with observable heap symptoms Users define symptoms/counter-evidence in events Two important actions: amplify and deamplify Lu Fang (UC Irvine) Final Defense May 26, 2017 9 / 51

Amplification and Deamplification amplify: increases the penalty Lu Fang (UC Irvine) Final Defense May 26, 2017 10 / 51

Amplification and Deamplification amplify: increases the penalty deamplify: resets the penalty Lu Fang (UC Irvine) Final Defense May 26, 2017 10 / 51

Amplification and Deamplification amplify: increases the penalty deamplify: resets the penalty Virtual space overhead (VSO) VSO = Sum penalty +Size live heap Size live heap Reflects the severity on 2 dementions: Time and Size Lu Fang (UC Irvine) Final Defense May 26, 2017 10 / 51

An ISL Program Example Detecting Leaking Object Arrays Context TypeContext { type = ''java.lang.object[]''; } History UseHistory { type = ''boolean''; size = 10; } Partition AllPartition { kind = all; history = UseHistory; } TObject TrackedObject { include = TypeContext; partition = AllPartition; instance boolean useflag = false; } Event on_rw(object o, Field f, Word w1, Word w2) { o.useflag = true; deamplify(o); } Event on_reachedonce(object o) { UseHistory h = gethistory(o); h.update(o.useflag); if (h.isfull() &&!h.contains(true)) amplify(o); o.useflag = false; } Lu Fang (UC Irvine) Final Defense May 26, 2017 11 / 51

An ISL Program Example 1 Context defines the type 2 History of partition instance 3 Heap partitioning 4 Tracked objects Detecting Leaking Object Arrays Context TypeContext { type = ''java.lang.object[]''; } History UseHistory { type = ''boolean''; size = 10; } Partition AllPartition { kind = all; history = UseHistory; } TObject TrackedObject { include = TypeContext; partition = AllPartition; instance boolean useflag = false; } Event on_rw(object o, Field f, Word w1, Word w2) { o.useflag = true; deamplify(o); } Event on_reachedonce(object o) { UseHistory h = gethistory(o); h.update(o.useflag); if (h.isfull() &&!h.contains(true)) amplify(o); o.useflag = false; } Lu Fang (UC Irvine) Final Defense May 26, 2017 11 / 51

An ISL Program Example 1 Context defines the type 2 History of partition instance 3 Heap partitioning 4 Tracked objects 5 The actions on events Detecting Leaking Object Arrays Context TypeContext { type = ''java.lang.object[]''; } History UseHistory { type = ''boolean''; size = 10; } Partition AllPartition { kind = all; history = UseHistory; } TObject TrackedObject { include = TypeContext; partition = AllPartition; instance boolean useflag = false; } Event on_rw(object o, Field f, Word w1, Word w2) { o.useflag = true; deamplify(o); } Event on_reachedonce(object o) { UseHistory h = gethistory(o); h.update(o.useflag); if (h.isfull() &&!h.contains(true)) amplify(o); o.useflag = false; } Lu Fang (UC Irvine) Final Defense May 26, 2017 11 / 51

PerfBlower A general performance testing framework Supports ISL Can capture problems with small effects Reports reference path to problematic objects Lu Fang (UC Irvine) Final Defense May 26, 2017 12 / 51

PerfBlower Lu Fang (UC Irvine) Final Defense May 26, 2017 13 / 51

Heap Reference Path 1 Object leak is referenced by array Leak is reference by whom? Object[] array = new Object[10]; // Allocation site 1, creating the leak. Object leak = new Object(); // Object leak is referenced by array array[0] = leak; // Keep using Object leak... //... Never use leak again. // However, leak is referenced by array, // GC cannot reclaim object leak. Lu Fang (UC Irvine) Final Defense May 26, 2017 14 / 51

Heap Reference Path 1 Object leak is referenced by array 2 Knowing allocation site 1 is not enough Leak is reference by whom? Object[] array = new Object[10]; // Allocation site 1, creating the leak. Object leak = new Object(); // Object leak is referenced by array array[0] = leak; // Keep using Object leak... //... Never use leak again. // However, leak is referenced by array, // GC cannot reclaim object leak. Lu Fang (UC Irvine) Final Defense May 26, 2017 14 / 51

Heap Reference Path 1 Object leak is referenced by array 2 Knowing allocation site 1 is not enough 3 Key point: array keeps a reference to leak, which can be shown by leak s heap reference path Leak is reference by whom? Object[] array = new Object[10]; // Allocation site 1, creating the leak. Object leak = new Object(); // Object leak is referenced by array array[0] = leak; // Keep using Object leak... //... Never use leak again. // However, leak is referenced by array, // GC cannot reclaim object leak. Lu Fang (UC Irvine) Final Defense May 26, 2017 14 / 51

Mirroring the reference path Original Objects Mirroring Ref. Path Stack stack = new stack; // Allocation site 1, creating the leak. Object obj = new Object(); // stack.elements[0] = leak stack.push(); // Keep using Object leak... Mirror Objects //... Never use obj again // However, leak is referenced by stack, // GC cannot reclaim object leak. Lu Fang (UC Irvine) Final Defense May 26, 2017 15 / 51

Experiments Three detectors leak amplifier Under-utilized container amplifier Over-populated container amplifier DaCapo benchmarks with 500MB heap Lu Fang (UC Irvine) Final Defense May 26, 2017 16 / 51

VSOs Leak Reported Amplifierby Leak Amplifier 60 VSOs caused by confirmed memory leaks 50 Basic VSOs 40 30 20 10 0 VSO is large The program is likely to have leaks antlr bloat eclipse fop luindexlusearch pmd hsqldb jython xalan Programs with confirmed unknown leaks Lu Fang (UC Irvine) Final Defense May 26, 2017 17 / 51

Under-Utilized VSOs Reported Container by Under-Utilized Amplifier Container Amplifier 60 50 40 VSOs caused by confirmed under-utilized containers Basic VSOs 30 20 10 0 VSO is large The program is very likely to have UUCs antlr bloat eclipse fop luindexlusearch pmd hsqldb jython xalan Programs with confirmed unknown UUCs Lu Fang (UC Irvine) Final Defense May 26, 2017 18 / 51

Over-Populated VSOs Reported Container by Over-Populated Amplifier Container Amplifier 30 25 20 VSOs caused by confirmed over-populated containers Basic VSOs 15 10 VSO is large The program is very likely to have OPCs 5 0 antlr bloat eclipse fop luindexlusearch pmd xalan hsqldb jython Programs with confirmed unknown OPCs Lu Fang (UC Irvine) Final Defense May 26, 2017 19 / 51

Performance Improvements Benchmark Space Reduction Time Reduction xalan-leak 25.4% 14.6% jython-leak 24.3% 7.4% hsqldb-leak 15.6% 3.1% xalan-uuc 5.4% 34.1% jython-uuc 19.1% 1.1% hsqldb-uuc 17.4% 0.7% hsqldb-opc 14.9% 2.9% Lu Fang (UC Irvine) Final Defense May 26, 2017 20 / 51

The Effectiveness of PerfBlower VSOs indicate the existence of problems 8 unknown problems are detected All reports contain useful diagnostic information Low overhead Space overheads are 1.23 1.25 Time overheads are 2.39 2.74 Lu Fang (UC Irvine) Final Defense May 26, 2017 21 / 51

Fixing Performance Problems Fixing performance problems is hard Enough information is necessary Have to understand the logic of the system The problem exists deeply in the system pressure A common performance problem in data-paralle systems Lu Fang (UC Irvine) Final Defense May 26, 2017 22 / 51

Lu Fang, Khanh Nguyen, Guoqing Xu, Brian Demsky, Shan Lu Interruptible s: Treating Pressure As Interrupts for Highly Scalable Data-Parallel Programs SOSP 15 Lu Fang (UC Irvine) Final Defense May 26, 2017 23 / 51

Pressure in Data-Parallel Systems Data-parallel system Input data are divided into independent partitions Many popular big data systems Lu Fang (UC Irvine) Final Defense May 26, 2017 24 / 51

Pressure in Data-Parallel Systems Data-parallel system Input data are divided into independent partitions Many popular big data systems pressure on single nodes Our study Search out of memory and data parallel in StackOverflow We have collected 126 related problems Lu Fang (UC Irvine) Final Defense May 26, 2017 24 / 51

Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 25 / 51

Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time CRASH OutOf Error Lu Fang (UC Irvine) Final Defense May 26, 2017 25 / 51

Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time CRASH OutOf Error SLOW Huge GC effort Lu Fang (UC Irvine) Final Defense May 26, 2017 25 / 51

Root Cause 1: Hot Keys Key-value pairs Lu Fang (UC Irvine) Final Defense May 26, 2017 26 / 51

Root Cause 1: Hot Keys Key-value pairs Popular keys have many associated values Lu Fang (UC Irvine) Final Defense May 26, 2017 26 / 51

Root Cause 1: Hot Keys Key-value pairs Popular keys have many associated values Case study (from StackOverflow) Process StackOverflow posts Long and popular posts Many tasks process long and popular posts Lu Fang (UC Irvine) Final Defense May 26, 2017 26 / 51

Root Cause 2: Large Intermediate Results Temporary data structures Lu Fang (UC Irvine) Final Defense May 26, 2017 27 / 51

Root Cause 2: Large Intermediate Results Temporary data structures Case study (from StackOverflow) Use NLP library to process customers reviews Some reviews are quite long NLP library creates giant temporary data structures for long reviews Lu Fang (UC Irvine) Final Defense May 26, 2017 27 / 51

Existing Solutions More memory? Not really! Data double in size every two years, [http://goo.gl/tm92i0] double in size every three years, [http://goo.gl/50rrgk] Lu Fang (UC Irvine) Final Defense May 26, 2017 28 / 51

Existing Solutions More memory? Not really! Data double in size every two years, [http://goo.gl/tm92i0] double in size every three years, [http://goo.gl/50rrgk] Application-level solutions Configuration tuning Skew fixing Lu Fang (UC Irvine) Final Defense May 26, 2017 28 / 51

Existing Solutions More memory? Not really! Data double in size every two years, [http://goo.gl/tm92i0] double in size every three years, [http://goo.gl/50rrgk] Application-level solutions Configuration tuning Skew fixing System-level solutions Cluster-wide resource manager, such as YARN Lu Fang (UC Irvine) Final Defense May 26, 2017 28 / 51

Existing Solutions More memory? Not really! Data double in size every two years, [http://goo.gl/tm92i0] double in size every three years, [http://goo.gl/50rrgk] Application-level solutions Configuration tuning Skew fixing System-level solutions Cluster-wide resource manager, such as YARN We need a systematic and effective solution! Lu Fang (UC Irvine) Final Defense May 26, 2017 28 / 51

Our Solution Interruptible : treat memory pressure as interrupt Dynamically change parallelism degree Lu Fang (UC Irvine) Final Defense May 26, 2017 29 / 51

Why Does Our Technique Help consumption Program starts with multiple tasks Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Program pushes heap limit Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Long and useless GC Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption OutOf Error Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Long and useless GCs are detected Heap size Execution time Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Long and useless GCs are detected, start interrupting tasks Heap size Execution time Killed Killed Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Processed Input Unprocessed Input Output Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Unprocessed Input Output Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Output Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Final result: push out and released Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Final result: push out and released Intermediate result: kept in memory, can be serialized Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption Program executes without memory pressure Heap size Execution time Killed Killed Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Why Does Our Technique Help consumption If there is enough memory, increase parallelism degree Heap size Execution time Killed Killed Newly created Lu Fang (UC Irvine) Final Defense May 26, 2017 30 / 51

Making Interruptible Is Non-trivial Interrupted???? Released or Kept in? Released or Kept in? Released or Kept in? Released or Kept in? Lu Fang (UC Irvine) Final Defense May 26, 2017 31 / 51

Making Interruptible Is Non-trivial Interrupted? Released or Kept in?? Released or Kept in? Require Semantics? Released or Kept in?? Released or Kept in? Lu Fang (UC Irvine) Final Defense May 26, 2017 31 / 51

Challenges How to expose semantics How to interrupt/reactivate tasks Lu Fang (UC Irvine) Final Defense May 26, 2017 32 / 51

Challenges How to expose semantics a programming model How to interrupt/reactivate tasks Lu Fang (UC Irvine) Final Defense May 26, 2017 32 / 51

Challenges How to expose semantics a programming model How to interrupt/reactivate tasks a runtime system Lu Fang (UC Irvine) Final Defense May 26, 2017 32 / 51

Challenges How to expose semantics a programming model How to interrupt/reactivate tasks a runtime system Lu Fang (UC Irvine) Final Defense May 26, 2017 32 / 51

The Programming Model An I requires more semantics Separate processed and unprocessed input Specify how to serialize and deserialize Safely interrupt tasks Specify the actions when interrupt happens Merge the intermediate results Lu Fang (UC Irvine) Final Defense May 26, 2017 33 / 51

The Programming Model An I requires more semantics Separate processed and unprocessed input Specify how to serialize and deserialize Safely interrupt tasks Specify the actions when interrupt happens Merge the intermediate results A unified representation of input/output A definition of an interruptible task Lu Fang (UC Irvine) Final Defense May 26, 2017 33 / 51

Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang (UC Irvine) Final Defense May 26, 2017 34 / 51

Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class 1 A cursor points to the first unprocessed tuple // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang (UC Irvine) Final Defense May 26, 2017 34 / 51

Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class 1 A cursor points to the first unprocessed tuple 2 Users implement serialize and deserialize methods // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang (UC Irvine) Final Defense May 26, 2017 34 / 51

Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang (UC Irvine) Final Defense May 26, 2017 35 / 51

Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class 1 In interrupt, we define how to deal with partial results // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang (UC Irvine) Final Defense May 26, 2017 35 / 51

Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class 1 In interrupt, we define how to deal with partial results 2 s are always interrupted at the beginning in the scaleloop // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang (UC Irvine) Final Defense May 26, 2017 35 / 51

Multiple Input for an I How to merge intermediate results MI Abstract Class // The MI interface in the library abstract class MI extends I{ // Most parts are the same as I... // The only difference boolean scaleloop( PartitionIterator<DataPartition> i) { // Iterate partitions through iterator while (i.hasnext()) { DataPartition dp = (DataPartition) i.next(); // Iterate all the data tuples in this partition... } return true; } } Lu Fang (UC Irvine) Final Defense May 26, 2017 36 / 51

Multiple Input for an I How to merge intermediate results 1 scaleloop takes a PartitionIterator as input MI Abstract Class // The MI interface in the library abstract class MI extends I{ // Most parts are the same as I... // The only difference boolean scaleloop( PartitionIterator<DataPartition> i) { // Iterate partitions through iterator while (i.hasnext()) { DataPartition dp = (DataPartition) i.next(); // Iterate all the data tuples in this partition... } return true; } } Lu Fang (UC Irvine) Final Defense May 26, 2017 36 / 51

I WordCount on Hyracks HDFS Map Operator Map Operator Map Operator... Final Reduce Operator Final Final Shuffling Reduce Operator... Reduce Operator MapOperator class MapOperator extends I implements HyracksOperator { void interrupt() { // Push out final // results to shuffling... } // Some other fields and methods... } 1 1 1 n n Merge Operator Merge Operator HDFS Lu Fang (UC Irvine) Final Defense May 26, 2017 37 / 51

I WordCount on Hyracks HDFS 1 Map Operator Map Operator Map Operator Final Final Final Shuffling Reduce Operator Reduce Operator... Reduce Operator 1 1 n n... ReduceOperator class ReduceOperator extends I implements HyracksOperator { void interrupt() { // Tag the results; // Output as intermediate // results... } // Some other fields and methods... } Merge Operator Merge Operator HDFS Lu Fang (UC Irvine) Final Defense May 26, 2017 37 / 51

I WordCount on Hyracks HDFS Map Operator Map Operator Map Operator... Final Reduce Operator Final Final Shuffling Reduce Operator... Reduce Operator MergeOperator class Merge extends MI { void interrupt() { // Tag the results; // Output as intermediate // results } // Some other fields and methods... } 1 1 1 n n Merge Operator Merge Operator HDFS Lu Fang (UC Irvine) Final Defense May 26, 2017 37 / 51

Challenges How to expose semantics a programming model How to interrupt/activate tasks a runtime system Lu Fang (UC Irvine) Final Defense May 26, 2017 38 / 51

I Runtime System Scheduler Partition Manager Data Partition Data Partition Monitor Data Partition Data Partition I Runtime System Lu Fang (UC Irvine) Final Defense May 26, 2017 39 / 51

I Runtime System Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang (UC Irvine) Final Defense May 26, 2017 39 / 51

I Runtime System Is Disk Input/Output Serialize/Deserialize Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang (UC Irvine) Final Defense May 26, 2017 39 / 51

I Runtime System Is Disk Interrupt/Create Input/Output Serialize/Deserialize Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang (UC Irvine) Final Defense May 26, 2017 39 / 51

Evaluation Environments We have implemented I on Hadoop 2.6.0 Hyracks 0.2.14 Lu Fang (UC Irvine) Final Defense May 26, 2017 40 / 51

Evaluation Environments We have implemented I on Hadoop 2.6.0 Hyracks 0.2.14 An 11-node Amazon EC2 cluster Each machine: 8 cores, 15GB, 80GB*2 SSD Lu Fang (UC Irvine) Final Defense May 26, 2017 40 / 51

Experiments on Hadoop Goal Show the effectiveness on real-world problems Lu Fang (UC Irvine) Final Defense May 26, 2017 41 / 51

Experiments on Hadoop Goal Show the effectiveness on real-world problems Benchmarks Original: five real-world programs collected from Stack Overflow RFix: apply the fixes recommended on websites I: apply I on original programs Name Map-Side Aggregation (MSA) In-Map Combiner (IMC) Inverted-Index Building (IIB) Word Cooccurrence Matrix (WCM) Customer Review Processing (CRP) Dataset Stack Overflow Full Dump Wikipedia Full Dump Wikipedia Full Dump Wikipedia Full Dump Wikipedia Sample Dump Lu Fang (UC Irvine) Final Defense May 26, 2017 41 / 51

Improvements Benchmark Original Time RFix Time I Time Speed Up MSA 1047 (crashed) 48 72-33.3% IMC 5200 (crashed) 337 238 41.6% IIB 1322 (crashed) 2568 1210 112.2% WCM 2643 (crashed) 2151 1287 67.1% CRP 567 (crashed) 6761 2001 237.9% With I, all programs survive memory pressure On average, I versions are 62.5% faster than RFix Lu Fang (UC Irvine) Final Defense May 26, 2017 42 / 51

Experiments on Hyracks Goal Show the improvements on performance Show the improvements on scalability Lu Fang (UC Irvine) Final Defense May 26, 2017 43 / 51

Experiments on Hyracks Goal Show the improvements on performance Show the improvements on scalability Benchmarks Original: five hand-optimized applications from repository I: apply I on original programs Name WordCount (WC) Heap Sort (HS) Inverted Index (II) Hash Join (HJ) Group By (GR) Dataset Yahoo Web Map and Its Subgraphs Yahoo Web Map and Its Subgraphs Yahoo Web Map and Its Subgraphs TPC-H Data TPC-H Data Lu Fang (UC Irvine) Final Defense May 26, 2017 43 / 51

Tuning Configurations for Original Programs Configurations for best performance Name Thread Number Granularity WordCount (WC) 2 32KB Heap Sort (HS) 6 32KB Inverted Index (II) 8 16KB Hash Join (HJ) 8 32KB Group By (GR) 6 16KB Configurations for best scalability Name Thread Number Granularity WordCount (WC) 1 4KB Heap Sort (HS) 1 4KB Inverted Index (II) 1 4KB Hash Join (HJ) 1 4KB Group By (GR) 1 4KB Lu Fang (UC Irvine) Final Defense May 26, 2017 44 / 51

Improvements on Performance Normalized Speed Up 2 1 0 Original Best I 1.67 1.61 1.4 1.28 1 1 1.11 1 1 1 WC HS II HJ GR On average, I is 34.4% faster Lu Fang (UC Irvine) Final Defense May 26, 2017 45 / 51

Improvements on Scalability Normalized Dataset Size 20 10 0 5.1 6 5 2.7 1 1 1 1 1 WC HS II HJ GR 24 Original Best I On average, I scales to 6.3 + larger datasets Lu Fang (UC Irvine) Final Defense May 26, 2017 46 / 51

The Effectiveness of I I is pratical it has helped 13 real-world applications survive memory problems I improves performance and scalability On Hadoop, I is 62.5% faster On Hyracks, I is 34.4% faster I helps programs scale to 6.3 larger datasets A programming model + a runtime system Non-intrusive Easy to use Lu Fang (UC Irvine) Final Defense May 26, 2017 47 / 51

Conclusions First general technique to amplify problems A class of performance problems Reveals pontential problems during testing A general performance testing framework Includes a compiler and a runtime system Very pratical First systematic approach to address memory pressure Consists of a programming model and a runtime system Solves real-world problems Significantly improves data-parallel tasks performance and scalability Lu Fang (UC Irvine) Final Defense May 26, 2017 48 / 51

Future Works Extend ISL Add support into production JVMs Consider more factors to improve test oracle Instantiate I in more data-parallel systems Lu Fang (UC Irvine) Final Defense May 26, 2017 49 / 51

Publications K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, O. Mutlu Yak: A High-Performance Big-Data-Friendly Garbage Collector OSDI 16 Z. Zuo, L. Fang, S. Khoo, G. Xu, S. Lu Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement OOPSLA 16 K. Nguyen, L. Fang, G. Xu, B. Demsky. Speculative Region-based Management for Big Data Systems PLOS 15 L. Fang, K. Nguyen, G. Xu, B. Demsky, S. Lu Interruptible s: Treating Pressure As Interrupts for Highly Scalable Data-Parallel Programs SOSP 15 L. Fang, L. Dou, G. Xu PerfBlower: Quickly Detecting -Related Performance Problems via Amplification ECOOP 15 K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, G. Xu Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications ASPLOS 15 Lu Fang (UC Irvine) Final Defense May 26, 2017 50 / 51

Thank You Q & A Lu Fang (UC Irvine) Final Defense May 26, 2017 51 / 51