Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs

Size: px

Start display at page:

Download "Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs"

Maximilian O’Brien’
6 years ago
Views:

1 Interruptible s: Treating Pressure as Interrupts for Highly Scalable Data-Parallel Programs Lu Fang 1, Khanh Nguyen 1, Guoqing(Harry) Xu 1, Brian Demsky 1, Shan Lu 2 1 University of California, Irvine 2 University of Chicago SOSP 15, October 7, 2015, Monterey, California, USA Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

2 Motivation Data-parallel system Input data are divided into independent partitions Many popular big data systems Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

3 Motivation Data-parallel system Input data are divided into independent partitions Many popular big data systems pressure on single nodes Our study Search out of memory and data parallel in StackOverflow We have collected 126 related problems Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

4 Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

5 Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time CRASH OutOf Error Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

6 Pressure in the Real World pressure on individual nodes Executions push heap limit (using managed language) Data-parallel systems struggle for memory consumption OutOfError point Long and useless GC Heap size Execution time CRASH OutOf Error SLOW Huge GC effort Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

7 Root Cause 1: Hot Keys Key-value pairs Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

8 Root Cause 1: Hot Keys Key-value pairs Popular keys have many associated values Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

9 Root Cause 1: Hot Keys Key-value pairs Popular keys have many associated values Case study (from StackOverflow) Process StackOverflow posts Long and popular posts Many tasks process long and popular posts Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

10 Root Cause 2: Large Intermediate Results Temporary data structures Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

11 Root Cause 2: Large Intermediate Results Temporary data structures Case study (from StackOverflow) Use NLP library to process customers review Some reviews are quite long NLP library creates giant temporary data structures for long reviews Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

12 Existing Solutions More memory? Not really! Data double in size every two years, [ double in size every three years, [ Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

13 Existing Solutions More memory? Not really! Data double in size every two years, [ double in size every three years, [ Application-level solutions Configuration tuning Skew fixing Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

14 Existing Solutions More memory? Not really! Data double in size every two years, [ double in size every three years, [ Application-level solutions Configuration tuning Skew fixing System-level solutions Cluster-wide resource manager, such as YARN Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

15 Existing Solutions More memory? Not really! Data double in size every two years, [ double in size every three years, [ Application-level solutions Configuration tuning Skew fixing System-level solutions Cluster-wide resource manager, such as YARN We need a systematic and effective solution! Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

16 Our Solution Interruptible : treat memory pressure as interrupt Dynamically change parallelism degree Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

17 Why Does Our Technique Help consumption Program starts with multiple tasks Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

18 Why Does Our Technique Help consumption Program pushes heap limit Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

19 Why Does Our Technique Help consumption Long and useless GC Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

20 Why Does Our Technique Help consumption OutOf Error Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

21 Why Does Our Technique Help consumption Long and useless GCs are detected Heap size Execution time Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

22 Why Does Our Technique Help consumption Long and useless GCs are detected, start interrupting tasks Heap size Execution time Killed Killed Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

23 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Processed Input Unprocessed Input Output Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

24 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Unprocessed Input Output Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

25 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Output Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

26 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

27 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Final result: push out and released Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

28 Why Does Our Technique Help consumption Release the memory, memory pressure is gone Heap size Execution time Killed Killed Local Data Structures Released Processed Input Released Unprocessed Input Kept in memory, can be serialized Output Final result: push out and released Intermediate result: kept in memory, can be serialized Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

29 Why Does Our Technique Help consumption Program executes without memory pressure Heap size Execution time Killed Killed Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

30 Why Does Our Technique Help consumption If there is enough memory, increase parallelism degree Heap size Execution time Killed Killed Newly created Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

31 Challenges How to expose semantics How to interrupt/reactivate tasks Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

32 Challenges How to expose semantics a programming model How to interrupt/reactivate tasks Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

33 Challenges How to expose semantics a programming model How to interrupt/reactivate tasks a runtime system Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

34 Challenges How to expose semantics a programming model How to interrupt/reactivate tasks a runtime system Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

35 The Programming Model A unified representation of input/output Separate processed and unprocessed input Specify how to serialize and deserialize Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

36 The Programming Model A unified representation of input/output Separate processed and unprocessed input Specify how to serialize and deserialize A definition of an interruptible task Safely interrupt tasks Specify the actions when interrupt happens Merge the intermediate results Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

37 Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

38 Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class 1 A cursor points to the first unprocessed tuple // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

39 Representing Input/Output as DataPartitions How to separate processed and unprocessed input How to serialize and deserialize the data DataPartition Abstract Class 1 A cursor points to the first unprocessed tuple 2 Users implement serialize and deserialize methods // The DataPartition abstract class abstract class DataPartition { // Some fields and methods... // A cursor points to the first // unprocessed tuple int cursor; // Serialize the DataPartition abstract void serialize(); // Deserialize the DataPartition abstract DataPartition deserialize(); } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

40 Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

41 Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class 1 In interrupt, we define how to deal with partial results // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

42 Defining an I What actions should be taken when interrupt happens How to safely interrupt a task I Abstract Class 1 In interrupt, we define how to deal with partial results 2 s are always interrupted at the beginning in the scaleloop // The I interface in the library abstract class I { // Some methods... abstract void interrupt(); boolean scaleloop(datapartition dp) { // Iterate dp, and process each tuple while (dp.hasnext()) { // If pressure occurs, interrupt if (HasPressure()) { interrupt(); return false; } process(); } } } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

43 Multiple Input for an I How to merge intermediate results MI Abstract Class // The MI interface in the library abstract class MI extends I{ // Most parts are the same as I... // Only difference boolean scaleloop( PartitionIterator<DataPartition> i) { // Iterate partitions through iterator while (i.hasnext()) { DataPartition dp = (DataPartition) i.next(); // Iterate all the data tuples in this partition... } return true; } } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

44 Multiple Input for an I How to merge intermediate results 1 scaleloop takes a PartitionIterator as input MI Abstract Class // The MI interface in the library abstract class MI extends I{ // Most parts are the same as I... // Only difference boolean scaleloop( PartitionIterator<DataPartition> i) { // Iterate partitions through iterator while (i.hasnext()) { DataPartition dp = (DataPartition) i.next(); // Iterate all the data tuples in this partition... } return true; } } Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

45 I WordCount on Hyracks HDFS Map Operator Map Operator Map Operator... Final Reduce Operator Final Final Shuffling Reduce Operator... Reduce Operator MapOperator class MapOperator extends I implements HyracksOperator { void interrupt() { // Push out final // results to shuffling... } // Some other fields and methods... } n n Merge Operator Merge Operator HDFS Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

46 I WordCount on Hyracks HDFS 1 Map Operator Map Operator Map Operator Final Final Final Shuffling Reduce Operator Reduce Operator... Reduce Operator 1 1 n n... ReduceOperator class ReduceOperator extends I implements HyracksOperator { void interrupt() { // Tag the results; // Output as intermediate // results... } // Some other fields and methods... } Merge Operator Merge Operator HDFS Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

47 I WordCount on Hyracks HDFS Map Operator Map Operator Map Operator... Final Reduce Operator Final Final Shuffling Reduce Operator... Reduce Operator MergeOperator class Merge extends MI { void interrupt() { // Tag the results; // Output as intermediate // results } // Some other fields and methods... } n n Merge Operator Merge Operator HDFS Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

48 Challenges How to expose semantics a programming model How to interrupt/activate tasks a runtime system Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

49 I Runtime System Scheduler Partition Manager Data Partition Data Partition Monitor Data Partition Data Partition I Runtime System Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

50 I Runtime System Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

51 I Runtime System Is Disk Input/Output Serialize/Deserialize Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

52 I Runtime System Is Disk Interrupt/Create Input/Output Serialize/Deserialize Scheduler Partition Manager Data Partition Grow/Reduce Data Partition Check Monitor Reduce Data Partition Data Partition I Runtime System Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

53 Evaluation Environments We have implemented I on Hadoop Hyracks Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

54 Evaluation Environments We have implemented I on Hadoop Hyracks An 11-node Amazon EC2 cluster Each machine: 8 cores, 15GB, 80GB*2 SSD Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

55 Experiments on Hadoop Goal Show the effectiveness on real-world problems Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

56 Experiments on Hadoop Goal Show the effectiveness on real-world problems Benchmarks Original: five real-world programs collected from Stack Overflow RFix: apply the fixes recommended on websites I: apply I on original programs Name Map-Side Aggregation (MSA) In-Map Combiner (IMC) Inverted-Index Building (IIB) Word Cooccurrence Matrix (WCM) Customer Review Processing (CRP) Dataset Stack Overflow Full Dump Wikipedia Full Dump Wikipedia Full Dump Wikipedia Full Dump Wikipedia Sample Dump Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

57 Improvements Benchmark Original Time RFix Time I Time Speed Up MSA 1047 (crashed) % IMC 5200 (crashed) % IIB 1322 (crashed) % WCM 2643 (crashed) % CRP 567 (crashed) % With I, all programs survive memory pressure On average, I versions are 62.5% faster than RFix Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

58 Experiments on Hyracks Goal Show the improvements on performance Show the improvements on scalability Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

59 Experiments on Hyracks Goal Show the improvements on performance Show the improvements on scalability Benchmarks Original: five hand-optimized applications from repository I: apply I on original programs Name WordCount (WC) Heap Sort (HS) Inverted Index (II) Hash Join (HJ) Group By (GR) Dataset Yahoo Web Map and Its Subgraphs Yahoo Web Map and Its Subgraphs Yahoo Web Map and Its Subgraphs TPC-H Data TPC-H Data Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

60 Tuning Configurations for Original Programs Configurations for best performance Name Thread Number Granularity WordCount (WC) 2 32KB Heap Sort (HS) 6 32KB Inverted Index (II) 8 16KB Hash Join (HJ) 8 32KB Group By (GR) 6 16KB Configurations for best scalability Name Thread Number Granularity WordCount (WC) 1 4KB Heap Sort (HS) 1 4KB Inverted Index (II) 1 4KB Hash Join (HJ) 1 4KB Group By (GR) 1 4KB Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

61 Improvements on Performance Normalized Speed Up Original Best I WC HS II HJ GR On average, I is 34.4% faster Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

62 Improvements on Scalability Normalized Dataset Size WC HS II HJ GR 24 Original Best I On average, I scales to larger datasets Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

63 Conclusions A programming model + a runtime system Non-intrusive Easy to use Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

64 Conclusions A programming model + a runtime system Non-intrusive Easy to use First systematic approach Help data-parallel tasks survive memory pressure I improves performance and scalability On Hadoop, I is 62.5% faster On Hyracks, I is 34.4% faster I helps programs scale to 6.3 larger datasets Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

65 Thank You Q & A Lu Fang et al. Interruptible s SOSP 15, October 7, / 25

Detecting and Fixing Memory-Related Performance Problems in Managed Languages

Detecting and Fixing Memory-Related Performance Problems in Managed Languages Detecting and Fixing -Related Performance Problems in Managed Languages Lu Fang Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky University of California, Irvine May 26, 2017,