Yak: A High-Performance Big-Data-Friendly Garbage Collector. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian, Shan Lu, Onur Mutlu

1 Yak: A High-Performance Big-Data-Friendly Garbage Collector. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian, Shan Lu (University of California, Irvine; University of Chicago); Onur Mutlu (ETH Zürich)

3 [Figure: only the label "Naiad" is recoverable]

4 Background: GC. [Figure: Heap]

12 Scalability: JVM crashes due to OutOfMemory error at early stage [Fang et al., SOSP '15]. Management cost: GC time accounts for up to 50% of the execution time [Bu et al., ISMM '13].

16 Two Paths, Two Hypotheses. [Figure: cluster architecture with the labels Data Loads and Feeds, Queries and Results, Data Publishing, Cloud, Partitioner, Cluster Controller (MASTER), and Node Controller ... Node Controller (SLAVE), with Aggregate, Join, and UDF operators under the node controllers. Successive frames highlight the control path and the data path.]

41 WordCount. [Figure labels: Document, Mapper (setup, map, cleanup), Heap, GC, Memory Region.] Many data objects have the same life span. A GC run in the middle is wasted. The objects can be efficiently reclaimed together.
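The lifetime pattern on this slide can be illustrated with a small, self-contained sketch. It does not use the Hadoop API; `WordCountMapper`, `setup`, `map`, `cleanup`, and `run` are hypothetical names chosen to mirror the slide labels. The point is only that the intermediate map and its entries are created during the task and become unreachable at the same moment, in `cleanup`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a MapReduce mapper (not the Hadoop API).
class WordCountMapper {
    // Intermediate data: allocated in setup/map, dropped in cleanup.
    private Map<String, Integer> counts;

    void setup() {
        counts = new HashMap<>();
    }

    void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    Map<String, Integer> cleanup() {
        // Copy out the result, then drop the working map. The HashMap and its
        // internal entries all become unreachable together at this point, so
        // a collection that ran during map() would have found them all live.
        Map<String, Integer> result = new TreeMap<>(counts);
        counts = null;
        return result;
    }

    // Drives one task from start to end.
    static Map<String, Integer> run(List<String> document) {
        WordCountMapper mapper = new WordCountMapper();
        mapper.setup();
        for (String line : document) {
            mapper.map(line);
        }
        return mapper.cleanup();
    }
}
```

A tracing collector that runs between `setup` and `cleanup` has to visit every entry of `counts` and reclaims almost nothing, which is the wasted work the slide refers to.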

45 Region-based Memory Management. Sophisticated static analysis won't work for data-intensive systems. What about the control path? [Labels: generational GC, region-based memory management]

51 Yak Approach. The heap is divided into a Control Space (Generational GC) and a Data Space (Region-based Memory Management). Reduced memory management cost. Annotate epoch boundary: - epoch_start() - epoch_end()

57 Correctness. [Figure: Region] User-based approach solution: Facade [Nguyen et al., ASPLOS '15], annotation & refactoring. [Labels: Developers, Yak]

60 Yak: An Automatic Solution. Yak: the first hybrid GC. Implemented in OpenJDK 8: modified the interpreter, two JIT compilers, the heap layout, and the Parallel Scavenge GC. No code refactoring needed; correctness guaranteed by the system. On average, vs. the default production GC (PS): 33% less execution time, 78% less GC time.

61 Challenges. How to create regions? How to reclaim regions correctly?

62 How to Create Regions? A region starts at a call to epoch_start and ends at a call to epoch_end. The location of epochs affects performance but not correctness. Regions are thread-local. Regions can be nested.

67 How to Create Regions?

    void main() {
      epoch_start();            // epoch #1
      for( ) {
        epoch_start();          // epoch #2
        for( ) {
          epoch_start();        // epoch #3
          for( ) {
          }
          epoch_end();
        }
        epoch_end();
      }
      epoch_end();
    } //end of main

Regions shown next to the code: <CS,*>; <1,t1> and <1,t2> for epoch #1; <2,t1> and <2,t2> for epoch #2; <3,t1> and <3,t2> for epoch #3.
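A minimal sketch of how thread-local, nested regions could be tracked follows. It is an illustration only, not Yak's implementation: `EpochRuntime`, `epochStart`, `epochEnd`, and `currentRegion` are hypothetical names, using the nesting depth as the region id is an assumption made to match the <1,t>, <2,t>, <3,t> labels on the slide, and 0 stands in for the control space.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of epoch_start()/epoch_end(); no memory is actually managed.
class EpochRuntime {
    // One stack of open region ids per thread: regions are thread-local and nest.
    private static final ThreadLocal<Deque<Integer>> OPEN =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Opens a new region nested inside the current one.
    static void epochStart() {
        Deque<Integer> open = OPEN.get();
        open.push(open.size() + 1); // assumption: region id = nesting depth
    }

    // Closes the innermost region and returns its id.
    // A real runtime would reclaim the region's memory here.
    static int epochEnd() {
        Deque<Integer> open = OPEN.get();
        if (open.isEmpty()) {
            throw new IllegalStateException("epoch_end without matching epoch_start");
        }
        return open.pop();
    }

    // Region that would receive the calling thread's next data allocation;
    // 0 means no epoch is open (control space in this sketch).
    static int currentRegion() {
        Deque<Integer> open = OPEN.get();
        return open.isEmpty() ? 0 : open.peek();
    }
}
```

Because the stack is thread-local, two threads executing the same annotated loop open distinct regions, which matches the per-thread labels on the slide.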

72 Region Semilattice. The nested-epoch main() from the previous slide is shown again, with the regions <CS,*>, <1,t1>, <1,t2>, <2,t1>, <2,t2>, <3,t1>, <3,t2> arranged as a semilattice. Examples: JOIN(<3,t1>, <2,t1>) = <2,t1>; JOIN(<2,t1>, <2,t2>) = <CS,*>.
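The two JOIN examples are consistent with treating the regions as a tree rooted at <CS,*>, where each region's parent is the region that encloses it on the same thread, and JOIN returns the nearest common ancestor. The sketch below works under that assumption; `LatticeRegion` and its fields are hypothetical names, not taken from the slides.

```java
// Hypothetical model of the region semilattice as a tree rooted at <CS,*>.
final class LatticeRegion {
    static final LatticeRegion CS = new LatticeRegion("CS", "*", null);

    final String id;            // "CS" or the epoch nesting level
    final String thread;        // owning thread, "*" for CS
    final LatticeRegion parent; // enclosing region; null only for CS

    LatticeRegion(String id, String thread, LatticeRegion parent) {
        this.id = id;
        this.thread = thread;
        this.parent = parent;
    }

    // Number of parent links between this region and CS.
    private int depth() {
        int d = 0;
        for (LatticeRegion r = this; r.parent != null; r = r.parent) {
            d++;
        }
        return d;
    }

    // Nearest common ancestor of a and b (either may be the ancestor itself).
    static LatticeRegion join(LatticeRegion a, LatticeRegion b) {
        int da = a.depth();
        int db = b.depth();
        while (da > db) { a = a.parent; da--; } // lift the deeper region
        while (db > da) { b = b.parent; db--; }
        while (a != b) {                        // climb together until they meet
            a = a.parent;
            b = b.parent;
        }
        return a;
    }

    @Override
    public String toString() {
        return "<" + id + "," + thread + ">";
    }
}
```

With regions built as on the slide, joining <3,t1> with <2,t1> yields <2,t1> (same thread, outer region), and joining <2,t1> with <2,t2> yields <CS,*> (different threads meet only at the root).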

77 How to Reclaim Regions Correctly? Object Promotion Algorithm. Two key tasks:
- What: identify escaping objects. Cross-region/space references are tracked in the write barrier, with a fast path for intra-region references. Inter-region references are recorded in the remember sets of their destination regions.
- Where: decide the relocation destination by querying the region semilattice.
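The "What" task can be sketched as a write barrier over a toy object model. This is a simulation under stated assumptions, not HotSpot code: `WbRegion`, `WbObject`, and `WriteBarrier.store` are hypothetical names, each object has a single reference field, and the remember set stores the referring object (what exactly a real barrier records is not stated on the slide).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical region with a remember set of incoming references.
class WbRegion {
    final String name;
    // Assumption: we record the objects outside this region that point into it.
    final List<WbObject> rememberSet = new ArrayList<>();

    WbRegion(String name) {
        this.name = name;
    }
}

// Toy heap object with one reference field.
class WbObject {
    final WbRegion region;
    WbObject field;

    WbObject(WbRegion region) {
        this.region = region;
    }
}

class WriteBarrier {
    // Simulates the store `src.field = target` with the barrier applied.
    static void store(WbObject src, WbObject target) {
        src.field = target;
        // Fast path: null stores and intra-region references need no bookkeeping.
        if (target == null || src.region == target.region) {
            return;
        }
        // Slow path: an inter-region reference is recorded in the remember set
        // of the destination region (the region that holds the target).
        target.region.rememberSet.add(src);
    }
}
```

When a region is about to be reclaimed, its remember set then identifies which of its objects are still referenced from outside.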

93 Region Deallocation. [Figure: animation of an epoch_end() call. Labels: <CS,*>, <r1,t1>, <r2,t1>, t1's stack, t2's stack, t3's stack, Remember Set, escaping roots, Barrier. In the closing frames <r2,t1>, the Remember Set, the escaping roots, the Barrier, and the stacks of t2 and t3 disappear in turn, leaving <CS,*>, <r1,t1>, and t1's stack.]
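The deallocation steps suggested by the animation (collect escaping roots, relocate what they reach, free the rest) can be sketched as follows. This is a simplified simulation, not Yak's algorithm: `DObj` and `RegionReclaimer.epochEnd` are hypothetical names, regions are plain strings, and every escaping object is moved to one caller-supplied destination, whereas the slides say the destination is chosen by querying the region semilattice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Toy heap object: a name, the region it currently lives in, and outgoing references.
class DObj {
    final String name;
    String region;
    final List<DObj> refs = new ArrayList<>();

    DObj(String name, String region) {
        this.name = name;
        this.region = region;
    }
}

class RegionReclaimer {
    // Simulates epoch_end() for the region `dying`.
    // escapingRoots: objects referenced from outside the region (for example via
    // the remember set or another thread's stack).
    // Returns the names of the objects that are freed with the region.
    static List<String> epochEnd(String dying, String destination,
                                 Collection<DObj> regionObjects,
                                 Collection<DObj> escapingRoots) {
        Deque<DObj> work = new ArrayDeque<>(escapingRoots);
        while (!work.isEmpty()) {
            DObj o = work.pop();
            if (!o.region.equals(dying)) {
                continue; // already relocated, or never in the dying region
            }
            o.region = destination; // "relocate" the escaping object
            work.addAll(o.refs);    // everything it reaches escapes as well
        }
        List<String> freed = new ArrayList<>();
        for (DObj o : regionObjects) {
            if (o.region.equals(dying)) {
                freed.add(o.name);  // not reachable from any escaping root
            }
        }
        return freed;
    }
}
```

Only the objects reachable from escaping roots are traversed; everything else in the region is released without being visited, which is where the savings over a tracing collection come from.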

98 Evaluations. 3 real-world systems, 9 applications:
- Hadoop: popular distributed MapReduce implementation
- Hyracks [Borkar et al., ICDE '11]: data-parallel platform to run data-intensive jobs on a cluster of shared-nothing machines
- GraphChi [Kyrola et al., OSDI '12]: high-performance graph analytical framework for a single machine

99 Improvement Summary. [Figure: performance normalized to PS, in three panels (GC Time, App. Time, Total Time), each showing Hyracks, Hadoop, and GraphChi.]

103 More Data in the Paper. Statistics on Yak's heap. Overhead breakdown: write barrier cost, extra space cost.

104 Conclusion. Goal: reduce memory management cost in Big Data systems. Solution: Yak, the first hybrid GC. 33% execution time savings; 78% GC time reduction. Requires almost zero user effort.

105 Thank You!

Yak: A High-Performance Big-Data-Friendly Garbage Collector

Yak: A High-Performance Big-Data-Friendly Garbage Collector Yak: A High-Performance Big-Data-Friendly Garbage Collector Khanh Nguyen, Lu Fang, Guoqing Xu, and Brian Demsky; University of California, Irvine; Shan Lu, University of Chicago; Sanazsadat Alamian, University

More information

Yak: A High-Performance Big-Data-Friendly Garbage Collector

Yak: A High-Performance Big-Data-Friendly Garbage Collector Yak: A High-Performance Big-Data-Friendly Garbage Collector Khanh Nguyen Lu Fang Guoqing Xu Brian Demsky Shan Lu Sanazsadat Alamian Onur Mutlu University of California, Irvine University of Chicago ETH

More information

Detecting and Fixing Memory-Related Performance Problems in Managed Languages

Detecting and Fixing Memory-Related Performance Problems in Managed Languages Detecting and Fixing -Related Performance Problems in Managed Languages Lu Fang Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky University of California, Irvine May 26, 2017,

More information

Speculative Region-based Memory Management for Big Data Systems

Speculative Region-based Memory Management for Big Data Systems Speculative Region-based Memory Management for Big Data Systems Khanh Nguyen Lu Fang Guoqing Xu Brian Demsky University of California, Irvine {khanhtn1, lfang3, guoqingx, bdemsky}@uci.edu Abstract Most

More information

Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs

Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs Interruptible s: Treating Pressure as Interrupts for Highly Scalable Data-Parallel Programs Lu Fang 1, Khanh Nguyen 1, Guoqing(Harry) Xu 1, Brian Demsky 1, Shan Lu 2 1 University of California, Irvine

More information

1 Overview and Research Impact

1 Overview and Research Impact 1 Overview and Research Impact The past decade has witnessed the explosion of Big Data, a phenomenon where massive amounts of information are created at an unprecedented scale and speed. The availability

More information

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications Rodrigo Bruno Luis Picciochi Oliveira Paulo Ferreira 03-160447 Tomokazu HIGUCHI Paper Information Published

More information

Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems David Lion, Adrian Chiu, Hailong Sun*, Xin Zhuang, Nikola Grcevski, Ding Yuan University

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC Kai Wang, Guoqing Xu, University of California, Irvine Zhendong Su,

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Rodrigo Bruno, Paulo Ferreira: INESC-ID / Instituto Superior Técnico, University of Lisbon Ruslan Synytsky, Tetiana Fydorenchyk: Jelastic

More information

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic

More information

Multi-threaded Queries. Intra-Query Parallelism in LLVM

Multi-threaded Queries. Intra-Query Parallelism in LLVM Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted

More information

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie Pause-Less GC for Improving Java Responsiveness Charlie Gracie IBM Senior Software Developer charlie_gracie@ca.ibm.com @crgracie charliegracie 1 Important Disclaimers THE INFORMATION CONTAINED IN THIS

More information

Hadoop Map Reduce 10/17/2018 1

Hadoop Map Reduce 10/17/2018 1 Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018

More information

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley Managed runtimes & garbage collection CSE 6341 Some slides by Kathryn McKinley 1 Managed runtimes Advantages? Disadvantages? 2 Managed runtimes Advantages? Reliability Security Portability Performance?

More information

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) COMP 412 FALL 2017 Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) Copyright 2017, Keith D. Cooper & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice

More information

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps Garbage Collection Garbage collection makes memory management easier for programmers by automatically reclaiming unused memory. The garbage collector in the CLR makes tradeoffs to assure reasonable performance

More information

Managed runtimes & garbage collection

Managed runtimes & garbage collection Managed runtimes Advantages? Managed runtimes & garbage collection CSE 631 Some slides by Kathryn McKinley Disadvantages? 1 2 Managed runtimes Portability (& performance) Advantages? Reliability Security

More information

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat Shenandoah: An ultra-low pause time garbage collector for OpenJDK Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Why do we need it? What does it do? How does it work? What's

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides Garbage Collection Last time Compiling Object-Oriented Languages Today Motivation behind garbage collection Garbage collection basics Garbage collection performance Specific example of using GC in C++

More information

Lecture 15 Garbage Collection

Lecture 15 Garbage Collection Lecture 15 Garbage Collection I. Introduction to GC -- Reference Counting -- Basic Trace-Based GC II. Copying Collectors III. Break Up GC in Time (Incremental) IV. Break Up GC in Space (Partial) Readings:

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC 330 - Spring 2013 1 Memory Attributes! Memory to store data in programming languages has the following lifecycle

More information

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance

More information

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)! Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm

More information

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank Algorithms for Dynamic Memory Management (236780) Lecture 4 Lecturer: Erez Petrank!1 March 24, 2014 Topics last week The Copying Garbage Collector algorithm: Basics Cheney s collector Additional issues:

More information

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 2: MapReduce Algorithm Design (2/2) January 14, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis Presented by Edward Raff Motivational Setup Java Enterprise World High end multiprocessor servers Large

More information

Data Analytics Job Guarantee Program

Data Analytics Job Guarantee Program Data Analytics Job Guarantee Program 1. INSTALLATION OF VMWARE 2. MYSQL DATABASE 3. CORE JAVA 1.1 Types of Variable 1.2 Types of Datatype 1.3 Types of Modifiers 1.4 Types of constructors 1.5 Introduction

More information

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS Martin Maas* Tim Harris KrsteAsanovic* John Kubiatowicz* *University of California, Berkeley Oracle Labs, Cambridge Why you should care

More information

Coroutines & Data Stream Processing

Coroutines & Data Stream Processing Coroutines & Data Stream Processing An application of an almost forgotten concept in distributed computing Zbyněk Šlajchrt, slajchrt@avast.com, @slajchrt Agenda "Strange" Iterator Example Coroutines MapReduce

More information

Exploiting the Behavior of Generational Garbage Collector

Exploiting the Behavior of Generational Garbage Collector Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core

More information

Contents. 8-1 Copyright (c) N. Afshartous

Contents. 8-1 Copyright (c) N. Afshartous Contents 1. Introduction 2. Types and Variables 3. Statements and Control Flow 4. Reading Input 5. Classes and Objects 6. Arrays 7. Methods 8. Scope and Lifetime 9. Utility classes 10 Introduction to Object-Oriented

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 2: MapReduce Algorithm Design (2/2) January 12, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case
