Yak: A High-Performance Big-Data-Friendly Garbage Collector. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian Shan Lu
|
|
- Amelia Cameron
- 5 years ago
- Views:
Transcription
1 Yak: A High-Performance Big-Data-Friendly Garbage Collector Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian Shan Lu University of California, Irvine University of Chicago Onur Mutlu ETH Zürich
2 2
3 Naiad 2
4 Background: GC Heap 3
5 Background: GC Heap 4
6 Background: GC Heap 4
7 Background: GC Heap 4
8 Background: GC Heap 4
9 Background: GC Heap 4
10 5
11 Scalability JVM crashes due to OutOfMemory error at early stage [Fang et al., SOSP 15] 5
12 Scalability JVM crashes due to OutOfMemory error at early stage [Fang et al., SOSP 15] Management cost GC time accounts for up to 50% of the execution time [Bu et al., ISMM 13] 5
13 Scalability JVM crashes due to OutOfMemory error at early stage [Fang et al., SOSP 15] Management cost GC time accounts for up to 50% of the execution time [Bu et al., ISMM 13] 5
14 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF 6
15 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF 6
16 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Control path Data path 6
17 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Control path 6
18 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller MASTER Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Control path 6
19 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller MASTER Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Control path 6
20 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller MASTER SLAVE Node Controller... Node Controller SLAVE Aggregate Join UDF Aggregate Join UDF Control path 6
21 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Control path 6
22 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Data path 6
23 Two Paths, Two Hypotheses Data Loads and Feeds Queries and Results Data Publishing Cloud Partitioner Cluster Controller Node Controller... Node Controller Aggregate Join UDF Aggregate Join UDF Data path 6
24 WordCount 7
25 WordCount Document Mapper 7
26 WordCount Document Mapper setup 7
27 WordCount Document Mapper setup 7
28 WordCount Document Mapper setup map 7
29 WordCount Document Mapper setup Heap map 7
30 WordCount Document Mapper setup Heap map 7
31 WordCount Document Mapper setup Heap map 7
32 WordCount Document Mapper setup Heap map 7
33 WordCount Document Mapper setup Heap map 7
34 WordCount Document Mapper setup Heap map cleanup 7
35 WordCount Document Mapper setup Heap map cleanup 7
36 WordCount Document Mapper setup Heap map cleanup 7
37 WordCount Document setup Many data objects have the same life span: Mapper Heap map cleanup 7
38 WordCount Document setup Many data objects have the same life span: Mapper Heap GC map cleanup 7
39 WordCount Document setup Many data objects have the same life span: Mapper Heap GC map cleanup GC run in the middle is wasted 7
40 WordCount Document setup Many data objects have the same life span: Mapper Memory Heap Region GC map cleanup GC run in the middle is wasted Can be efficiently reclaimed together 7
41 WordCount Document setup Many data objects have the same life span: Mapper Memory Heap Region GC map cleanup GC run in the middle is wasted Can be efficiently reclaimed together 7
42 WordCount Document setup Many data objects have the same life span: Mapper Memory Region GC map cleanup GC run in the middle is wasted Can be efficiently reclaimed together 7
43 Region-based Memory Management Sophisticated static analysis won t work for data-intensive systems 8
44 Region-based Memory Management Sophisticated static analysis won t work for data-intensive systems What about control path? 8
45 Region-based Memory Management Sophisticated static analysis won t work for data-intensive systems What about control path? generational GC region-based memory management 8
46 Yak Approach Heap 9
47 Yak Approach Control Space Data Space 9
48 Yak Approach Control Space Data Space Generational GC 9
49 Yak Approach Control Space Data Space Generational GC Region-based Memory Management 9
50 Yak Approach Control Space Data Space Generational GC Region-based Memory Management Reduced memory management cost 9
51 Yak Approach Control Space Data Space Generational GC Region-based Memory Management Reduced memory management cost annotate epoch boundary: - epoch_start() - epoch_end() 9
52 Correctness Region 10
53 Correctness Region 10
54 Correctness User-based approach solution: Facade [Nguyen et al., ASPLOS 15] 10
55 Correctness User-based approach solution: Facade [Nguyen et al., ASPLOS 15] annotation & refactoring 10
56 Correctness User-based approach solution: Facade [Nguyen et al., ASPLOS 15] annotation & refactoring Developers 10
57 Correctness User-based approach solution: Facade [Nguyen et al., ASPLOS 15] annotation & refactoring Developers Yak 10
58 Yak: An Automatic Solution Yak: the first hybrid GC 11
59 Yak: An Automatic Solution Yak: the first hybrid GC Implemented in OpenJDK 8 Modified the interpreter, two JIT compilers, the heap layout, the Parallel Scavenge GC NO code refactoring needed; correctness guaranteed by the system 11
60 Yak: An Automatic Solution Yak: the first hybrid GC Implemented in OpenJDK 8 Modified the interpreter, two JIT compilers, the heap layout, the Parallel Scavenge GC NO code refactoring needed; correctness guaranteed by the system On average, vs. default production GC (PS): Reduce 33% execution time Reduce 78% GC time 11
61 Challenges How to create regions? How to reclaim regions correctly? 12
62 How to Create Regions? A region starts at a call to epoch-start and ends at a call to epoch-end The location of epochs affects performance but not correctness Regions are thread-local Regions can be nested 13
63 How to Create Regions? void main() { } //end of main 14
64 How to Create Regions? void main() { CS,* } //end of main 14
65 How to Create Regions? void main() { epoch_start(); for( ) { epoch #1 CS,* 1,t 1 1,t 2 } epoch_end(); } //end of main 14
66 void main() { epoch_start(); for( ) { epoch_start(); for( ) { How to Create Regions? epoch #1 epoch #2 1,t 1 CS,* 1,t 2 2,t 1 2,t 2 } epoch_end(); } epoch_end(); } //end of main 14
67 void main() { epoch_start(); for( ) { epoch_start(); for( ) { epoch_start(); for( ) { How to Create Regions? epoch #1 epoch #2 epoch #3 1,t 1 CS,* 1,t 2 } epoch_end(); 2,t 1 2,t 2 } epoch_end(); } epoch_end(); 3,t 1 3,t 2 } //end of main 14
68 Region Semilattice void main() { epoch_start(); for( ) { epoch #1 CS,* epoch_start(); for( ) { epoch_start(); for( ) { epoch #2 epoch #3 1,t 1 1,t 2 } epoch_end(); region 2,t 1 2,t 2 } epoch_end(); } epoch_end(); 3,t 1 3,t 2 } //end of main 14
69 Region Semilattice void main() { epoch_start(); for( ) { epoch #1 CS,* epoch_start(); for( ) { epoch_start(); for( ) { epoch #2 epoch #3 1,t 1 1,t 2 } epoch_end(); region 2,t 1 2,t 2 } epoch_end(); } epoch_end(); } //end of main 3,t 1 3,t 2 JOIN( 3,t 1, 2,t 1 ) = 2,t 1 14
70 Region Semilattice void main() { epoch_start(); for( ) { epoch #1 CS,* epoch_start(); for( ) { epoch_start(); for( ) { epoch #2 epoch #3 1,t 1 1,t 2 } epoch_end(); region 2,t 1 2,t 2 } epoch_end(); } epoch_end(); } //end of main 3,t 1 3,t 2 JOIN( 3,t 1, 2,t 1 ) = 2,t 1 14
71 Region Semilattice void main() { epoch_start(); for( ) { epoch #1 CS,* epoch_start(); for( ) { epoch_start(); for( ) { epoch #2 epoch #3 1,t 1 1,t 2 } epoch_end(); region 2,t 1 2,t 2 } epoch_end(); } epoch_end(); } //end of main 3,t 1 3,t 2 JOIN( 2,t 1, 2,t 2 ) = CS,* 14
72 Region Semilattice void main() { epoch_start(); for( ) { epoch #1 CS,* epoch_start(); for( ) { epoch_start(); for( ) { epoch #2 epoch #3 1,t 1 1,t 2 } epoch_end(); region 2,t 1 2,t 2 } epoch_end(); } epoch_end(); } //end of main 3,t 1 3,t 2 JOIN( 2,t 1, 2,t 2 ) = CS,* 14
73 How to Reclaim Regions Correctly? Object Promotion Algorithm 15
74 How to Reclaim Regions Correctly? Object Promotion Algorithm Two key tasks: What: Identify escaping objects: 15
75 How to Reclaim Regions Correctly? Object Promotion Algorithm Two key tasks: What: Identify escaping objects: Tracking of cross-region/space references in write barrier A fast path for intra-region references Inter-region references are recorded in the remember sets of their destination regions 15
76 How to Reclaim Regions Correctly? Object Promotion Algorithm Two key tasks: What: Identify escaping objects: Tracking of cross-region/space references in write barrier A fast path for intra-region references Inter-region references are recorded in the remember sets of their destination regions Where: Decide the relocation destination: 15
77 How to Reclaim Regions Correctly? Object Promotion Algorithm Two key tasks: What: Identify escaping objects: Tracking of cross-region/space references in write barrier A fast path for intra-region references Inter-region references are recorded in the remember sets of their destination regions Where: Decide the relocation destination: Query region semilattice 15
78 Region Deallocation <CS,*> <r 1, t 1 > epoch_end() <r 2, t 1 > 16
79 Region Deallocation <CS,*> <r 1, t 1 > epoch_end() Remember Set <r 2, t 1 > 16
80 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() Remember Set <r 2, t 1 > 16
81 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() Remember Set <r 2, t 1 > 16
82 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() Remember Set Barrier <r 2, t 1 > 16
83 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set Barrier <r 2, t 1 > 16
84 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set Barrier <r 2, t 1 > 16
85 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
86 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
87 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
88 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
89 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
90 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
91 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
92 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
93 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Remember Set escaping roots Barrier <r 2, t 1 > 16
94 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack Barrier 16
95 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() t 2 s stack t 3 s stack 16
96 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack epoch_end() 16
97 Region Deallocation <CS,*> <r 1, t 1 > t 1 s stack 16
98 Evaluations 3 real-world systems, 9 applications: Hadoop Popular distributed MapReduce implementation Hyracks [Borkar et al., ICDE 11] Data-parallel platform to run data-intensive jobs on a cluster of shared-nothing machines GraphChi [Kyrola et al., OSDI 12] High-performance graph analytical framework for a single machine 17
99 Improvement Summary Normalized Performance (to PS) Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi GC Time App. Time Total Time 18
100 Improvement Summary Normalized Performance (to PS) Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi GC Time App. Time Total Time 18
101 Improvement Summary Normalized Performance (to PS) Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi GC Time App. Time Total Time 18
102 Improvement Summary Normalized Performance (to PS) Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi Hyracks Hadoop GraphChi GC Time App. Time Total Time 18
103 More Data in the Paper Statistics on Yak s heap Overhead breakdown Write barrier cost Extra space cost 19
104 Conclusion Goal: Reduce memory management cost in Big Data systems Solution: Yak, the first hybrid GC 33% execution time savings 78% GC time reduction Requires almost zero user effort 20
105 Thank You!
Yak: A High-Performance Big-Data-Friendly Garbage Collector
Yak: A High-Performance Big-Data-Friendly Garbage Collector Khanh Nguyen, Lu Fang, Guoqing Xu, and Brian Demsky; University of California, Irvine; Shan Lu, University of Chicago; Sanazsadat Alamian, University
More informationYak: A High-Performance Big-Data-Friendly Garbage Collector
Yak: A High-Performance Big-Data-Friendly Garbage Collector Khanh Nguyen Lu Fang Guoqing Xu Brian Demsky Shan Lu Sanazsadat Alamian Onur Mutlu University of California, Irvine University of Chicago ETH
More informationDetecting and Fixing Memory-Related Performance Problems in Managed Languages
Detecting and Fixing -Related Performance Problems in Managed Languages Lu Fang Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky University of California, Irvine May 26, 2017,
More informationSpeculative Region-based Memory Management for Big Data Systems
Speculative Region-based Memory Management for Big Data Systems Khanh Nguyen Lu Fang Guoqing Xu Brian Demsky University of California, Irvine {khanhtn1, lfang3, guoqingx, bdemsky}@uci.edu Abstract Most
More informationInterruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs
Interruptible s: Treating Pressure as Interrupts for Highly Scalable Data-Parallel Programs Lu Fang 1, Khanh Nguyen 1, Guoqing(Harry) Xu 1, Brian Demsky 1, Shan Lu 2 1 University of California, Irvine
More information1 Overview and Research Impact
1 Overview and Research Impact The past decade has witnessed the explosion of Big Data, a phenomenon where massive amounts of information are created at an unprecedented scale and speed. The availability
More informationNG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications
NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications Rodrigo Bruno Luis Picciochi Oliveira Paulo Ferreira 03-160447 Tomokazu HIGUCHI Paper Information Published
More informationDon t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems
Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems David Lion, Adrian Chiu, Hailong Sun*, Xin Zhuang, Nikola Grcevski, Ding Yuan University
More informationHiTune. Dataflow-Based Performance Analysis for Big Data Cloud
HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241
More informationDeveloping MapReduce Programs
Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes
More informationGraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC
GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC Kai Wang, Guoqing Xu, University of California, Irvine Zhendong Su,
More informationInnovatus Technologies
HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String
More informationDynamic Vertical Memory Scalability for OpenJDK Cloud Applications
Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Rodrigo Bruno, Paulo Ferreira: INESC-ID / Instituto Superior Técnico, University of Lisbon Ruslan Synytsky, Tetiana Fydorenchyk: Jelastic
More informationRuntime. The optimized program is ready to run What sorts of facilities are available at runtime
Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic
More informationMulti-threaded Queries. Intra-Query Parallelism in LLVM
Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted
More informationPause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie
Pause-Less GC for Improving Java Responsiveness Charlie Gracie IBM Senior Software Developer charlie_gracie@ca.ibm.com @crgracie charliegracie 1 Important Disclaimers THE INFORMATION CONTAINED IN THIS
More informationHadoop Map Reduce 10/17/2018 1
Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018
More informationManaged runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley
Managed runtimes & garbage collection CSE 6341 Some slides by Kathryn McKinley 1 Managed runtimes Advantages? Disadvantages? 2 Managed runtimes Advantages? Reliability Security Portability Performance?
More informationSustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)
COMP 412 FALL 2017 Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) Copyright 2017, Keith D. Cooper & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice
More informationMemory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps
Garbage Collection Garbage collection makes memory management easier for programmers by automatically reclaiming unused memory. The garbage collector in the CLR makes tradeoffs to assure reasonable performance
More informationManaged runtimes & garbage collection
Managed runtimes Advantages? Managed runtimes & garbage collection CSE 631 Some slides by Kathryn McKinley Disadvantages? 1 2 Managed runtimes Portability (& performance) Advantages? Reliability Security
More informationShenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat
Shenandoah: An ultra-low pause time garbage collector for OpenJDK Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Why do we need it? What does it do? How does it work? What's
More informationAndrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs
Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why
More informationAcknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides
Garbage Collection Last time Compiling Object-Oriented Languages Today Motivation behind garbage collection Garbage collection basics Garbage collection performance Specific example of using GC in C++
More informationLecture 15 Garbage Collection
Lecture 15 Garbage Collection I. Introduction to GC -- Reference Counting -- Basic Trace-Based GC II. Copying Collectors III. Break Up GC in Time (Incremental) IV. Break Up GC in Space (Partial) Readings:
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC 330 - Spring 2013 1 Memory Attributes! Memory to store data in programming languages has the following lifecycle
More informationJAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder
JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance
More informationLecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!
Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm
More informationAlgorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank
Algorithms for Dynamic Memory Management (236780) Lecture 4 Lecturer: Erez Petrank!1 March 24, 2014 Topics last week The Copying Garbage Collector algorithm: Basics Cheney s collector Additional issues:
More informationLocality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng
Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 2: MapReduce Algorithm Design (2/2) January 14, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationGarbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff
Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis Presented by Edward Raff Motivational Setup Java Enterprise World High end multiprocessor servers Large
More informationData Analytics Job Guarantee Program
Data Analytics Job Guarantee Program 1. INSTALLATION OF VMWARE 2. MYSQL DATABASE 3. CORE JAVA 1.1 Types of Variable 1.2 Types of Datatype 1.3 Types of Modifiers 1.4 Types of constructors 1.5 Introduction
More informationTRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS
TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS Martin Maas* Tim Harris KrsteAsanovic* John Kubiatowicz* *University of California, Berkeley Oracle Labs, Cambridge Why you should care
More informationCoroutines & Data Stream Processing
Coroutines & Data Stream Processing An application of an almost forgotten concept in distributed computing Zbyněk Šlajchrt, slajchrt@avast.com, @slajchrt Agenda "Strange" Iterator Example Coroutines MapReduce
More informationExploiting the Behavior of Generational Garbage Collector
Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationImproving Hadoop MapReduce Performance on Supercomputers with JVM Reuse
Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core
More informationContents. 8-1 Copyright (c) N. Afshartous
Contents 1. Introduction 2. Types and Variables 3. Statements and Control Flow 4. Reading Input 5. Classes and Objects 6. Arrays 7. Methods 8. Scope and Lifetime 9. Utility classes 10 Introduction to Object-Oriented
More informationMapReduce Design Patterns
MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together
More informationThe C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf
The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 2: MapReduce Algorithm Design (2/2) January 12, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationParallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case
1 / 39 Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case PAN Jie 1 Yann LE BIANNIC 2 Frédéric MAGOULES 1 1 Ecole Centrale Paris-Applied Mathematics and Systems
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC 330 Spring 2017 1 Memory Attributes Memory to store data in programming languages has the following lifecycle
More informationMapReduce Algorithm Design
MapReduce Algorithm Design Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul Big
More informationOptimising Multicore JVMs. Khaled Alnowaiser
Optimising Multicore JVMs Khaled Alnowaiser Outline JVM structure and overhead analysis Multithreaded JVM services JVM on multicore An observational study Potential JVM optimisations Basic JVM Services
More informationTI2736-B Big Data Processing. Claudia Hauff
TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement
More informationMapReduce Algorithms
Large-scale data processing on the Cloud Lecture 3 MapReduce Algorithms Satish Srirama Some material adapted from slides by Jimmy Lin, 2008 (licensed under Creation Commons Attribution 3.0 License) Outline
More informationAgenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1
Agenda CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Summer 2004 Java virtual machine architecture.class files Class loading Execution engines Interpreters & JITs various strategies
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationLecture 13: Garbage Collection
Lecture 13: Garbage Collection COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer/Mikkel Kringelbach 1 Garbage Collection Every modern programming language allows programmers
More informationCS 241 Honors Memory
CS 241 Honors Memory Ben Kurtovic Atul Sandur Bhuvan Venkatesh Brian Zhou Kevin Hong University of Illinois Urbana Champaign February 20, 2018 CS 241 Course Staff (UIUC) Memory February 20, 2018 1 / 35
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationMAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti
International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationIntroduction to BigData, Hadoop:-
Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,
More informationShenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Principal Software Engineer Red Hat
Shenandoah: An ultra-low pause time garbage collector for OpenJDK Christine Flood Principal Software Engineer Red Hat 1 Why do we need another Garbage Collector? OpenJDK currently has: SerialGC ParallelGC
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationGLADE: A Scalable Framework for Efficient Analytics. Florin Rusu (University of California, Merced) Alin Dobra (University of Florida)
DE: A Scalable Framework for Efficient Analytics Florin Rusu (University of California, Merced) Alin Dobra (University of Florida) Big Data Analytics Big Data Storage is cheap ($100 for 1TB disk) Everything
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationCMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection
CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC330 Fall 2018 1 Memory Attributes Memory to store data in programming languages has the following lifecycle
More informationImplementation of Aggregation of Map and Reduce Function for Performance Improvisation
2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation
More informationMapReduce and Hadoop
Università degli Studi di Roma Tor Vergata MapReduce and Hadoop Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference Big Data stack High-level Interfaces Data Processing
More informationGLADE: A Scalable Framework for Efficient Analytics. Florin Rusu University of California, Merced
GLADE: A Scalable Framework for Efficient Analytics Florin Rusu University of California, Merced Motivation and Objective Large scale data processing Map-Reduce is standard technique Targeted to distributed
More informationShenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke
Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1 What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root
More informationTask-Aware Garbage Collection in a Multi-Tasking Virtual Machine
Task-Aware Garbage Collection in a Multi-Tasking Virtual Machine Sunil Soman Computer Science Department University of California, Santa Barbara Santa Barbara, CA 9316, USA sunils@cs.ucsb.edu Laurent Daynès
More informationZing Vision. Answering your toughest production Java performance questions
Zing Vision Answering your toughest production Java performance questions Outline What is Zing Vision? Where does Zing Vision fit in your Java environment? Key features How it works Using ZVRobot Q & A
More informationHaLoop Efficient Iterative Data Processing on Large Clusters
HaLoop Efficient Iterative Data Processing on Large Clusters Yingyi Bu, Bill Howe, Magdalena Balazinska, and Michael D. Ernst University of Washington Department of Computer Science & Engineering Presented
More informationAssignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis
Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Due by 11:59:59pm on Tuesday, March 16, 2010 This assignment is based on a similar assignment developed at the University of Washington. Running
More informationTHE TROUBLE WITH MEMORY
THE TROUBLE WITH MEMORY OUR MARKETING SLIDE Kirk Pepperdine Authors of jpdm, a performance diagnostic model Co-founded Building the smart generation of performance diagnostic tooling Bring predictability
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationMyths and Realities: The Performance Impact of Garbage Collection
Myths and Realities: The Performance Impact of Garbage Collection Tapasya Patki February 17, 2011 1 Motivation Automatic memory management has numerous software engineering benefits from the developer
More informationRunning class Timing on Java HotSpot VM, 1
Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return
More informationStructure of Programming Languages Lecture 10
Structure of Programming Languages Lecture 10 CS 6636 4536 Spring 2017 CS 6636 4536 Lecture 10: Classes... 1/23 Spring 2017 1 / 23 Outline 1 1. Types Type Coercion and Conversion Type Classes, Generics,
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part
More informationMEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION
MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION 2 1. What is the Heap Size: 2 2. What is Garbage Collection: 3 3. How are Java objects stored in memory? 3 4. What is the difference between stack and
More informationData Partitioning and MapReduce
Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve
More informationCS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers
CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers Heap allocation is a necessity for modern programming tasks, and so is automatic reclamation of heapallocated memory. However,
More informationCertified Big Data and Hadoop Course Curriculum
Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationBigData and Map Reduce VITMAC03
BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to
More informationData. Big: TiB - PiB. Small: MiB - GiB. Supervised Classification Regression Recommender. Learning. Model
2 Supervised Classification Regression Recommender Data Big: TiB - PiB Learning Model Small: MiB - GiB Unsupervised Clustering Dimensionality reduction Topic modeling 3 Example Formation Examples Modeling
More informationNew Java performance developments: compilation and garbage collection
New Java performance developments: compilation and garbage collection Jeroen Borgers @jborgers #jfall17 Part 1: New in Java compilation Part 2: New in Java garbage collection 2 Part 1 New in Java compilation
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationLecture Notes on Garbage Collection
Lecture Notes on Garbage Collection 15-411: Compiler Design André Platzer Lecture 20 1 Introduction In the previous lectures we have considered a programming language C0 with pointers and memory and array
More informationNoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious
More informationProgramming in C. main. Level 2. Level 2 Level 2. Level 3 Level 3
Programming in C main Level 2 Level 2 Level 2 Level 3 Level 3 1 Programmer-Defined Functions Modularize with building blocks of programs Divide and Conquer Construct a program from smaller pieces or components
More informationPreemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Wei Chen, Jia Rao*, and Xiaobo Zhou University of Colorado, Colorado Springs * University of Texas at Arlington Data Center
More informationThe Z Garbage Collector Low Latency GC for OpenJDK
The Z Garbage Collector Low Latency GC for OpenJDK Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team Jfokus VM Tech Summit 2018 Safe Harbor Statement The following is intended to outline our
More informationBoosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout Ghent University, Belgium
More informationHadoop/MapReduce Computing Paradigm
Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationApache Pig Releases. Table of contents
Table of contents 1 Download...3 2 News... 3 2.1 19 June, 2017: release 0.17.0 available...3 2.2 8 June, 2016: release 0.16.0 available...3 2.3 6 June, 2015: release 0.15.0 available...3 2.4 20 November,
More informationCompiler construction 2009
Compiler construction 2009 Lecture 3 JVM and optimization. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int
More informationCA31-1K DIS. Pointers. TA: You Lu
CA31-1K DIS Pointers TA: You Lu Pointers Recall that while we think of variables by their names like: int numbers; Computer likes to think of variables by their memory address: 0012FED4 A pointer is a
More informationWhat is Gluent? The Gluent Data Platform
What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationBig Data 7. Resource Management
Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage
More information