Dynamic Resource Allocation for Distributed Dataflows. Lauritz Thamsen Technische Universität Berlin

Size: px

Start display at page:

Download "Dynamic Resource Allocation for Distributed Dataflows. Lauritz Thamsen Technische Universität Berlin"

Charlotte Morgan
5 years ago
Views:

1 Dynamic Resource Allocation for Distributed Dataflows Lauritz Thamsen Technische Universität Berlin

2 Distributed Dataflows E.g. MapReduce, SCOPE, Spark, and Flink Used for scalable processing in many domains, e.g. Search and data aggregation Relational processing Graph analysis Machine learning Dynamic Resource Allocation for Distributed Dataflows 2

3 Distributed Dataflow Jobs Stage Synchronization barrier e.g. Map or Filter e.g. Reduce Dynamic Resource Allocation for Distributed Dataflows 3

4 Compared to HPC High-level programming abstractions Comprehensive frameworks and distributed execution engines Usable even without extensive knowledge of parallel and distributed programming Dynamic Resource Allocation for Distributed Dataflows 4

5 Yet Users Need to Allocate Resources Job and scale-out Worker Node C C Cluster Manager Worker Node C C Worker Node C C C C C C C C Dynamic Resource Allocation for Distributed Dataflows 5

6 Constraints of Production Jobs Users need to reserve resources for runtime targets: Usability constraints, Service level agreements, Cluster reservation only valid for a certain time But do not necessarily understand workload dynamics Dynamic Resource Allocation for Distributed Dataflows 6

7 Two-Fold Problem Runtime Prediction: Estimating performance upfront is difficult as it depends on many factors, e.g. programs, data, systems, and infrastructure Runtime Variance: Significant performance variance due to data locality, interference, and failures Dynamic Resource Allocation for Distributed Dataflows 7

8 Given a runtime target for a recurring batch job, how can minimal resources be allocated automatically?

9 Methods & Results

Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out L. Thamsen, I. Verbitskiy, F. Schmidt, T. Renner, and O.

SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows. PDCAT 2017. IEEE. L. Thamsen, I. Verbitskiy, J. Beilharz, T. Renner, A.

10 Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out L. Thamsen, I. Verbitskiy, F. Schmidt, T. Renner, and O. Kao. Selecting Resources for Distributed Dataflow Systems According to Runtime Targets. IPCCC IEEE. J. Koch, L. Thamsen, F. Schmidt, and O. Kao. SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows. PDCAT IEEE. L. Thamsen, I. Verbitskiy, J. Beilharz, T. Renner, A. Polze, and O. Kao. Ellis: Dynamically Scaling Distributed Dataflows To Meet Runtime Targets. CloudCom IEEE Dynamic Resource Allocation for Distributed Dataflows 10

11 Dynamic Resource Allocation Job + Runtime Target Workload History Similarity Matching Scale-out Models Dynamic Resource Allocation for Distributed Dataflows 11

12 Assumptions Distributed dataflow jobs with multiple stages Dedicated homogeneous clusters Recurring batch processing jobs Dynamic Resource Allocation for Distributed Dataflows 12

13 Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out Dynamic Resource Allocation for Distributed Dataflows 13

14 Scale-out Modeling Find a function that predicts the runtimes of a job based on previous runs Example Job A: Example Job B: Runtime Runtime Scale-out Scale-out Dynamic Resource Allocation for Distributed Dataflows 14

15 Parametric Regression Non-Negative Least Squares (NNLS) for a model of distributed data-parallel processing [Ernest] Parallel portion Collect-like communication patterns!"#$%&' = ) * + ), Sequential portion 1 /0#$1%#'!2 + ) 3 log /0#$1%#'!2 + ) 7 /0#$1%#'!2 Tree-like communication patterns Dynamic Resource Allocation for Distributed Dataflows 15

16 Non-parametric Regression Assuming locally defined behavior with Local Linear Regression (LLR) Runtime Scale-out Dynamic Resource Allocation for Distributed Dataflows 16

17 Automatic Model Selection Goal: model with high accuracy when possible, otherwise use a robust fallback Approach: Use parametric model for extrapolation Dynamically select prediction model for interpolation with k-fold cross-validation Dynamic Resource Allocation for Distributed Dataflows 17

18 Evaluation Setup Using up to 60 commodity nodes (8 cores, 16 GB RAM) and exemplary distributed dataflow jobs Job System Dataset Input Size WordCount Flink Wiki 250 GB TPC-H Query 10 Flink Tables 200 GB K-Means Flink Points 50 GB Grep Spark Wiki 250 GB SGD Spark Features 10 GB PageRank Spark Graph 3.4 GB Dynamic Resource Allocation for Distributed Dataflows 18

19 Cross-Validation of Models Repeated random sub-sampling Random training selection Runtime Scale-out Dynamic Resource Allocation for Distributed Dataflows 19

20 Cross-Validation of Models Repeated random sub-sampling Runtime With increasing amounts of training data, n = Scale-out Dynamic Resource Allocation for Distributed Dataflows 20

21 Relative Prediction Error Grep (Spark) SGD (Spark) Dynamic Resource Allocation for Distributed Dataflows 21

22 Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out Addresses Problem of Runtime Prediction Problem of Runtime Variance Dynamic Resource Allocation for Distributed Dataflows 22

23 Select Executions for Model Training Recurring jobs that run on updated data, code, and parameters, yielding different runtimes Idea: Similar executions à similar runtimes ex cur Similarity Matching discard ex prev ex prev ex prev ex prev Job History use ex prev ex prev ex prev Dynamic Resource Allocation for Distributed Dataflows 23

24 Different Similarity Measures Functions s (ex current, ex previous ) ϵ [0,1] Measure Type Input size Scale-out Offline t t t Runtime of stages Convergence Online Resource utilization Dynamic Resource Allocation for Distributed Dataflows 24

25 Evaluation of Similarity Measures 1 Accuracy / Share / Average Accuracy 0 0 Similarity Dynamic Resource Allocation for Distributed Dataflows 25

26 Normalized Similarity Quality Accuracy normalized over share of selected points 1 1 Accuracy / Share / Average Accuracy Average Accuracy 0 0 Similarity % Percentage of Most Similar Runs 0% 1 Top 5% Top 25% Dynamic Resource Allocation for Distributed Dataflows 26

27 Evaluation of Similarity Measures Connected Components Page Rank Stochastic Gradient Descent 0,9 0,9 0,9 Average Accuracy 0,7 0,5 0,7 0,5 0,7 0,5 0,3 0,3 0,3 100% 0 0% 1 0,01 100% 0% 100% 0 0% % Percentage of Most Similar of Most Runs Similar Runs % of Most Similar Runs % of Most Similar Runs Input Convergence Network-I/O Dynamic Resource Allocation for Distributed Dataflows 27

28 Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out Addresses Problem of Runtime Prediction Problem of Runtime Variance Dynamic Resource Allocation for Distributed Dataflows 28

29 Resource Allocation Search for minimal scale-out for a runtime target Runtime Scale-out Scale-out Scale-out Dynamic Resource Allocation for Distributed Dataflows 29

30 Stage-wise Dynamic Allocations Job + Runtime Target Runtime Dynamic Resource Allocation Scale-out Scale-out Scale-out Dynamic Resource Allocation for Distributed Dataflows 30

31 Prototype System: Ellis Integrated and usable on a per-job basis with YARN Cluster App Client Resource Manager Workload History Container App Master Ellis Container App Worker Container Container App Worker App Worker Dynamic Resource Allocation for Distributed Dataflows 31

32 Evaluation Setup Using up to 40 commodity nodes (8 cores, 16 GB RAM) and exemplary iterative Spark jobs (MLlib) Job Dataset Input Size Multi-Layer Perceptron (MLP) Multiclass 29 GB Gradient Boosted Trees (GBT) Vandermonde 111 GB Stochastic Gradient Descent (SGD) Tables 37 GB K-Means Points 50 GB Dynamic Resource Allocation for Distributed Dataflows 32

33 Results for Gradient Boosted Trees Ellis static (only initially): Ellis dynamic (per stage): Dynamic Resource Allocation for Distributed Dataflows 33

34 Results for All Benchmark Jobs Comparison of!""#$ %&'()#*!""#$ $+(+#* in three metrics: Job Violation Count Violation Sum Resources SGD 25% 7% 94% K-Means 35% 5% 80% GBT 47% 29% 100% MLP 69% 13% 127% Dynamic Resource Allocation for Distributed Dataflows 34

35 Scale-out Modeling Dynamic Resource Allocation Similarity Matching Resource Allocation Runtime Scale-out Addresses Problem of Runtime Prediction Problem of Runtime Variance Dynamic Resource Allocation for Distributed Dataflows 35

36 Summary New methods for dynamic resource allocation that are applicable to distributed dataflow frameworks Prototype integrated and usable with YARN Promising evaluation results Dynamic Resource Allocation for Distributed Dataflows 36

37 References [GoogleMR] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51.1 (2008). ACM [Twitter] G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin. Fast Data in the Era of Big Data: Twitter s Real-time Related Query Suggestion Architecture ACM SIGMOD International Conference on Management of Data (SIGMOD 13). ACM [Google Trace] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis, Third ACM Symposium on Cloud Computing (SoCC 12). ACM [Quasar] C. Delimitrou and C. Kozyrakis, Quasar: Resource-efficient and QoS- aware Cluster Management, 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 14). ACM [Morpheus] S. A. Jyothi, C. Curino, I. Menache, S. M. Narayanamurthy, A. Tumanov, J. Yaniv, R. Mavlyutov, I. n. Goiri, S. Krishnan, J. Kulkarni, and S. Rao, Morpheus: Towards Automated SLOs for Enterprise Clusters, 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 16). USENIX Dynamic Resource Allocation for Distributed Dataflows 38

SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows

SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows Jannis Koch, Lauritz Thamsen, Florian Schmidt, and Odej Kao Technische Universität Berlin, Germany {firstname.lastname}@tu-berlin.de