How to Write a Distributed Garbage Collector

Size: px
Start display at page:

Download "How to Write a Distributed Garbage Collector"

Transcription

1 How to Write a Distributed Garbage ollector Name: ri Zilka, Saravanan Subbiah Terracotta, Inc. 1

2 Goals > earn a little... > about Terracotta > about garbage collection > about distributed algorithms > about complexity 2 2

3 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server arrays > onclusions 3 3

4 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server arrays > onclusions 4 4

5 Hypothesis lgorithm design outweighs platform importance > Some say, don t distribute but the push is in the opposite direction > distribution / multi-threading is more about algorithms than platform synchronous nature leaks everywhere Variations in environment and inputs force this leak the more bare metal, the better?!?! 5 5

6 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server arrays > onclusions 6 6

7 ase Study: Virtual heaps What are they? > Terracotta can make application heap page in and out of the JVM machine > It can persist application heap to disk > It can provide centralized object identity across distributed JVMs across a T/I network 7 7

8 Virtual heap pictorial Java Virtual Machine pplication instance H I J K 8 8

9 Virtual heap pictorial Java Virtual Machine - pplication instance H I J K 8 8

10 Virtual heap pictorial Java Virtual Machine pplication instance H I J K page to Terracotta Terracotta Server app heap in memory H I J K 9 9

11 Virtual heap pictorial Java Virtual Machine pplication instance H I J K page to Terracotta Terracotta Server -G app heap in memory H I J K 9 9

12 Virtual heap pictorial Java Virtual Machine pplication instance H I J K page to Terracotta Terracotta Server app heap in memory H I J K page to disk app heap on disk Terracotta Server's disk H I J K 10 10

13 Virtual heap pictorial Java Virtual Machine pplication instance H I J K page to Terracotta Terracotta Server app heap in memory H I J K page to disk -O app heap on disk Terracotta Server's disk H I J K 10 10

14 The Garbage ollection hallenge Virtual heaps represent a distributed computing challenge > How do we collect garbage in a distributed world? > No one layer has everything > rawling disk on Terracotta server is too slow > rawling memory in application instances is incorrect as heap is now virtual > NOTE: platform would not help with any of these issues

15 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server arrays > onclusions 12 12

16 Terminology et s get on the same page > 1 application node, JVM machine > 2 Terracotta server instance > root top-most object in a graph clustered by Terracotta (for example, a ) > garbage in Terracotta garbage in Java no reference from any object reachable from roots not resident in any application heap 13 13

17 How distributed garbage gets created the last bit of context... > reate a shared object by adding it to a shared graph > Remove the shared object from the graph in 1 > G in 1 collects object > Now the object is garbage and needs to be collected > ollect it in the 2 > Example: map.put( k,v); map.remove(k); // value is garbage 14 14

18 reating distributed garbage n existing object graph Java Virtual Machine #1 Java Virtual Machine #2 pplication instance pplication instance // #1 creates,, and // links them together = new object();.child = ;.child = ; //

19 reating distributed garbage Mutating that graph Java Virtual Machine #1 Java Virtual Machine #2 pplication instance pplication instance E // later, #2 changes to E //....sethild( E ); //

20 reating distributed garbage Mutating that graph Java Virtual Machine #1 Java Virtual Machine #2 pplication instance pplication instance E // later, #2 changes to E //....sethild( E ); //

21 reating distributed garbage Mutating that graph Java Virtual Machine #1 Java Virtual Machine #2 pplication instance pplication instance E // later, #2 changes to E //....sethild( E ); //... becomes distributed garbage, even though JVM #2 never accesses it

22 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding young generations > Step 3: DG for Terracotta server arrays > onclusions 17 17

23 DG in a single Terracotta server > oncurrent mark / sweep algorithm > hases: mark, rescue1, rescue2, delete > Mark objects still reachable from the set of all roots (rootset) > monitor references in app heap for deltas > rescue any objects not marked but joining the graph during mark phase (rescueset) > rescue any objects not reachable but still in 1 heap > ause 2, rescue again, and delete garbage 18 18

24 Marking reachable objects G_andidates exclude rootset, reachables objects paged in to 1 heap G_andidates: M N O 19 19

25 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: M N O 19 19

26 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: M N O 19 19

27 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: M N O 19 19

28 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: E F G 19 19

29 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: F G 19 19

30 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: F G M N O 19 19

31 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: G M N O 19 19

32 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: M N O 19 19

33 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: M N O 19 19

34 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: N O 19 19

35 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: O 19 19

36 Marking reachable objects G_andidates exclude rootset, reachables rootset:, objects paged in to 1 heap G_andidates: 19 19

37 Rescue phase 1 Reference monitoring informs 2 of changes objects paged in to 1 heap 20 20

38 Rescue phase 1 Reference monitoring informs 2 of changes objects paged in to 1 heap reference deleted 20 20

39 Rescue phase 1 Deleted references that are still paged in are special objects paged in to 1 heap 21 21

40 Rescue phase 1 Deleted references that are still paged in are special objects paged in to 1 heap reference deleted 21 21

41 Rescue phase 1 ecause they can be rescued by other paged in objects objects paged in to 1 heap 22 22

42 Rescue phase 1 ecause they can be rescued by other paged in objects objects paged in to 1 heap reference rescued 22 22

43 Rescue phase 1 The resultant garbage in this scenario objects paged in to 1 heap G_andidates: G 23 23

44 Rescue phase 2 One quick pause and we are done > ogically same as rescue 1 > but no changes are allowed into 2 could not be collected even if not rescued because it cannot be Ged > So apply any rescueset objects to G_andidates > Then collect garbage > In the example, G, are garbage > We re done

45 Drawbacks to simple DG Might require tuning > Frequent DG pauses app too often uses up 2 cycles / adds workload > Infrequent DG more time to delete increase chance that reachable objects have flushed to disk disk can fill up more overhead in bookkeeping garbage 25 25

46 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server arrays > onclusions 26 26

47 Generational collector trying to avoid distribution > If we partition data across 2s, each will have less data to manage divide and conquer data divide and conquer G...but more later > Generational collection We can avoid marking anything that is on disk can help avoid disk I/O maybe help avoid the need to distribute 2s 27 27

48 Generational collection Differences from simple DG > Similar to simple DG > redefine rootset to be roots currently in memory > add oldgen, younggen spaces oldgen objects that have fallen out of cache at some point younggen newly created objects in 2 > rememberedset backpointers from oldgen to younggen this one is tricky 28 28

49 Young generation pictorial oldgen, younggen, rootset D E F younggen in memory oldgen on disk 29 29

50 Young generation pictorial oldgen, younggen, rootset rootset: D E F younggen in memory oldgen on disk 29 29

51 The rememberedset backpointers help skip I/O rootset: rememberedset:,m D E F younggen in memory oldgen on disk 30 30

52 The rememberedset et s mark in younggen rootset: rememberedset:,m D E F younggen in memory oldgen on disk G_andidates: M 31 31

53 The rememberedset et s mark in younggen rootset: rememberedset:,m D E F younggen in memory oldgen on disk G_andidates: M 31 31

54 The rememberedset et s mark in younggen rootset: rememberedset:,m D E F younggen in memory oldgen on disk G_andidates: M 31 31

55 The rememberedset et s mark in younggen rootset: rememberedset:,m DISK RRIER D E F younggen in memory oldgen on disk G_andidates: M 31 31

56 The rememberedset et s mark in younggen rootset: rememberedset:,m DISK RRIER D E F younggen in memory oldgen on disk G_andidates: M 31 31

57 The rememberedset et s mark in younggen rootset: rememberedset:,m DISK RRIER D E F younggen in memory oldgen on disk G_andidates: 31 31

58 Generational collection summary We need a generic large scale algorithm > enefits We avoided all disk I/O Younggen G run in milliseconds an run very frequently avoids sending objects to oldgen / disk > Issues Doesn t visit all objects, doesn t find all garbage Full DGs must still occur Distributed garbage must be created often 32 32

59 genda > Hypothesis > ase study: virtual heaps / distributed gc > Terminology > Step 1: DG in single Terracotta server > Step 2: dding generational collection > Step 3: DG for Terracotta server stripes > onclusions 33 33

60 DG for Server Stripes lots of challenges to address > Distributing the object graphs lets us handle arbitrary use cases > We can do fast updates w/o distributed transaction overhead another talk though... > ut we can t get around distributed algorithm for G no single 2 knows the whole graph shape what if nodes fail? network fails? async updates means DG is totally async 34 34

61 Graphical view of server stripes Terracotta server 1 Terracotta server 2 Terracotta server

62 ore algorithm > oordinator 2 decides when to start a DG > oordinator 2 decides when it is complete > pass marking token on to other 2s > Share no lists, sets, etc. amongst 2s > Make 2s respond to a global ticker event to track progress 36 36

63 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: D E M F G N O 37 37

64 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: D E M F G N O 37 37

65 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: D E M F G N O 37 37

66 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: E M F G N O 37 37

67 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: M F G N O 37 37

68 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: M F G N O outbound mark request 37 37

69 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: M F G N O outbound mark request 37 37

70 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: M G N O outbound mark request 37 37

71 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: M N O outbound mark request 37 37

72 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: outbound mark request 37 37

73 Distributed marking Terracotta server 1 Terracotta server 2 Terracotta server 3 rootset: rootset: null rootset: G_andidates: G_andidates: G_andidates: outbound mark request 37 37

74 DG for stripes details > Mark starts at roots Requests to mark are batched, asynch for, and > We find the M,N,O graph to be garbage > Since no single machine has all G_andidates we... sked Server #2 to follow from for us > We ignore failures and abandon the DG run Nothing persisted / changed till last stage will get results when it comes back to life 38 38

75 The TIKER TOKEN our coordinating device > Ticker token a token that gets passed around the server ring > oordinator initiates > Each node adds its status and passes token on outbound inbound farmed marking tasks to other servers tasks completed on behalf of others > hase complete when inbound - outbound==0 > hase not complete, start a new token on next tick 39 39

76 DG for stripes is totally asynchronous > We do... mark in parallel rescue in parallel accept graph changes in parallel > We don t... share G_andidates, rootset or rememberedset have any synchronous remote calls 40 40

77 latform independent > We could have used messaging JGroups RMI raw T or UD protocols RESTful HTT interface > Transport doesn t really matter. > and neither does language or framework, we think 41 41

78 Have we proven our hypothesis? > lgorithm design outweighs platform importance > Specific asynch frameworks may or may not help > Not a dig on any technology but the TIKER TOKEN the mark/sweep and more, make us go fast and those are algorithms > Was distribution valuable? On 10G of data: 1 2: 2.2MM objects, 1 hour 8 2s: 2.2MM objects, 55 seconds 42 42

79 Future optimizations > Generational collection for stripes > Object migration and packing, train algorithms > Resume after crash (only restarts today) 43 43

80 Want to learn more? > ome by our booth > sk about avoiding distributed transactions lock leasing and fairness > There are all sorts of cool runtime optimizations inside the Terracotta array >

81 ri Zilka, Saravanan 45

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic

More information

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley Managed runtimes & garbage collection CSE 6341 Some slides by Kathryn McKinley 1 Managed runtimes Advantages? Disadvantages? 2 Managed runtimes Advantages? Reliability Security Portability Performance?

More information

Managed runtimes & garbage collection

Managed runtimes & garbage collection Managed runtimes Advantages? Managed runtimes & garbage collection CSE 631 Some slides by Kathryn McKinley Disadvantages? 1 2 Managed runtimes Portability (& performance) Advantages? Reliability Security

More information

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie Pause-Less GC for Improving Java Responsiveness Charlie Gracie IBM Senior Software Developer charlie_gracie@ca.ibm.com @crgracie charliegracie 1 Important Disclaimers THE INFORMATION CONTAINED IN THIS

More information

CS 241 Honors Memory

CS 241 Honors Memory CS 241 Honors Memory Ben Kurtovic Atul Sandur Bhuvan Venkatesh Brian Zhou Kevin Hong University of Illinois Urbana Champaign February 20, 2018 CS 241 Course Staff (UIUC) Memory February 20, 2018 1 / 35

More information

Garbage Collection. Steven R. Bagley

Garbage Collection. Steven R. Bagley Garbage Collection Steven R. Bagley Reference Counting Counts number of pointers to an Object deleted when the count hits zero Eager deleted as soon as it is finished with Problem: Circular references

More information

OS-caused Long JVM Pauses - Deep Dive and Solutions

OS-caused Long JVM Pauses - Deep Dive and Solutions OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction

More information

Lecture Notes on Garbage Collection

Lecture Notes on Garbage Collection Lecture Notes on Garbage Collection 15-411: Compiler Design André Platzer Lecture 20 1 Introduction In the previous lectures we have considered a programming language C0 with pointers and memory and array

More information

The Generational Hypothesis

The Generational Hypothesis Generational GC 1 The Generational Hypothesis Hypothesis: most objects "die young" i.e., they are only needed for a short time, and can be collected soon A great example of empirical systems work Found

More information

Exploiting the Behavior of Generational Garbage Collector

Exploiting the Behavior of Generational Garbage Collector Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,

More information

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd. Kodewerk tm Java Performance Services The War on Latency Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd. Me Work as a performance tuning freelancer Nominated Sun Java Champion www.kodewerk.com

More information

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) COMP 412 FALL 2017 Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) Copyright 2017, Keith D. Cooper & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice

More information

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017 The G1 GC in JDK 9 Erik Duveblad Senior Member of Technical Staf racle JVM GC Team ctober, 2017 Copyright 2017, racle and/or its affiliates. All rights reserved. 3 Safe Harbor Statement The following is

More information

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides Garbage Collection Last time Compiling Object-Oriented Languages Today Motivation behind garbage collection Garbage collection basics Garbage collection performance Specific example of using GC in C++

More information

Java On Steroids: Sun s High-Performance Java Implementation. History

Java On Steroids: Sun s High-Performance Java Implementation. History Java On Steroids: Sun s High-Performance Java Implementation Urs Hölzle Lars Bak Steffen Grarup Robert Griesemer Srdjan Mitrovic Sun Microsystems History First Java implementations: interpreters compact

More information

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank Algorithms for Dynamic Memory Management (236780) Lecture 4 Lecturer: Erez Petrank!1 March 24, 2014 Topics last week The Copying Garbage Collector algorithm: Basics Cheney s collector Additional issues:

More information

Java Performance Tuning

Java Performance Tuning 443 North Clark St, Suite 350 Chicago, IL 60654 Phone: (312) 229-1727 Java Performance Tuning This white paper presents the basics of Java Performance Tuning and its preferred values for large deployments

More information

Automatic Memory Management

Automatic Memory Management Automatic Memory Management Why Automatic Memory Management? Storage management is still a hard problem in modern programming Why Automatic Memory Management? Storage management is still a hard problem

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table

More information

Cooperative Memory Management in Embedded Systems

Cooperative Memory Management in Embedded Systems ooperative Memory Management in Embedded Systems, Philip Taffner, hristoph Erhardt, hris7an Dietrich, Michael S7lkerich Department of omputer Science 4 Distributed Systems and Opera7ng Systems 1 2 Motivation

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC 330 - Spring 2013 1 Memory Attributes! Memory to store data in programming languages has the following lifecycle

More information

Java Performance Tuning and Optimization Student Guide

Java Performance Tuning and Optimization Student Guide Java Performance Tuning and Optimization Student Guide D69518GC10 Edition 1.0 June 2011 D73450 Disclaimer This document contains proprietary information and is protected by copyright and other intellectual

More information

Heap Management. Heap Allocation

Heap Management. Heap Allocation Heap Management Heap Allocation A very flexible storage allocation mechanism is heap allocation. Any number of data objects can be allocated and freed in a memory pool, called a heap. Heap allocation is

More information

MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION

MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION 2 1. What is the Heap Size: 2 2. What is Garbage Collection: 3 3. How are Java objects stored in memory? 3 4. What is the difference between stack and

More information

Lecture 15 Advanced Garbage Collection

Lecture 15 Advanced Garbage Collection Lecture 15 Advanced Garbage Collection I. Break Up GC in Time (Incremental) II. Break Up GC in Space (Partial) Readings: Ch. 7.6.4-7.7.4 CS243: Advanced Garbage Collection 1 Trace-Based GC: Memory Life-Cycle

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Designing experiments Performing experiments in Java Intel s Manycore Testing Lab

Designing experiments Performing experiments in Java Intel s Manycore Testing Lab Designing experiments Performing experiments in Java Intel s Manycore Testing Lab High quality results that capture, e.g., How an algorithm scales Which of several algorithms performs best Pretty graphs

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC 330 Spring 2017 1 Memory Attributes Memory to store data in programming languages has the following lifecycle

More information

A new Mono GC. Paolo Molaro October 25, 2006

A new Mono GC. Paolo Molaro October 25, 2006 A new Mono GC Paolo Molaro lupus@novell.com October 25, 2006 Current GC: why Boehm Ported to the major architectures and systems Featurefull Very easy to integrate Handles managed pointers in unmanaged

More information

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35 Garbage Collection Akim Demaille, Etienne Renault, Roland Levillain May 15, 2017 CCMP2 Garbage Collection May 15, 2017 1 / 35 Table of contents 1 Motivations and Definitions 2 Reference Counting Garbage

More information

Spark 2. Alexey Zinovyev, Java/BigData Trainer in EPAM

Spark 2. Alexey Zinovyev, Java/BigData Trainer in EPAM Spark 2 Alexey Zinovyev, Java/BigData Trainer in EPAM With IT since 2007 With Java since 2009 With Hadoop since 2012 With EPAM since 2015 About Secret Word from EPAM itsubbotnik Big Data Training 3 Contacts

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

Distributed Systems. The main method of distributed object communication is with remote method invocation

Distributed Systems. The main method of distributed object communication is with remote method invocation Distributed Systems Unit III Syllabus:Distributed Objects and Remote Invocation: Introduction, Communication between Distributed Objects- Object Model, Distributed Object Modal, Design Issues for RMI,

More information

Diagnostics in Testing and Performance Engineering

Diagnostics in Testing and Performance Engineering Diagnostics in Testing and Performance Engineering This document talks about importance of diagnostics in application testing and performance engineering space. Here are some of the diagnostics best practices

More information

Robust Memory Management Schemes

Robust Memory Management Schemes Robust Memory Management Schemes Prepared by : Fadi Sbahi & Ali Bsoul Supervised By: Dr. Lo ai Tawalbeh Jordan University of Science and Technology Robust Memory Management Schemes Introduction. Memory

More information

Java Without the Jitter

Java Without the Jitter TECHNOLOGY WHITE PAPER Achieving Ultra-Low Latency Table of Contents Executive Summary... 3 Introduction... 4 Why Java Pauses Can t Be Tuned Away.... 5 Modern Servers Have Huge Capacities Why Hasn t Latency

More information

Myths and Realities: The Performance Impact of Garbage Collection

Myths and Realities: The Performance Impact of Garbage Collection Myths and Realities: The Performance Impact of Garbage Collection Tapasya Patki February 17, 2011 1 Motivation Automatic memory management has numerous software engineering benefits from the developer

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Habanero Extreme Scale Software Research Project

Habanero Extreme Scale Software Research Project Habanero Extreme Scale Software Research Project Comp215: Garbage Collection Zoran Budimlić (Rice University) Adapted from Keith Cooper s 2014 lecture in COMP 215. Garbage Collection In Beverly Hills...

More information

Virtual Memory Primitives for User Programs

Virtual Memory Primitives for User Programs Virtual Memory Primitives for User Programs Andrew W. Appel & Kai Li Department of Computer Science Princeton University Presented By: Anirban Sinha (aka Ani), anirbans@cs.ubc.ca 1 About the Authors Andrew

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

THE TROUBLE WITH MEMORY

THE TROUBLE WITH MEMORY THE TROUBLE WITH MEMORY OUR MARKETING SLIDE Kirk Pepperdine Authors of jpdm, a performance diagnostic model Co-founded Building the smart generation of performance diagnostic tooling Bring predictability

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Programmer Directed GC for C++ Michael Spertus N2286= April 16, 2007

Programmer Directed GC for C++ Michael Spertus N2286= April 16, 2007 Programmer Directed GC for C++ Michael Spertus N2286=07-0146 April 16, 2007 Garbage Collection Automatically deallocates memory of objects that are no longer in use. For many popular languages, garbage

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Concurrent Garbage Collection

Concurrent Garbage Collection Concurrent Garbage Collection Deepak Sreedhar JVM engineer, Azul Systems Java User Group Bangalore 1 @azulsystems azulsystems.com About me: Deepak Sreedhar JVM student at Azul Systems Currently working

More information

Lecture 13: Garbage Collection

Lecture 13: Garbage Collection Lecture 13: Garbage Collection COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer/Mikkel Kringelbach 1 Garbage Collection Every modern programming language allows programmers

More information

Reference Counting. Reference counting: a way to know whether a record has other users

Reference Counting. Reference counting: a way to know whether a record has other users Garbage Collection Today: various garbage collection strategies; basic ideas: Allocate until we run out of space; then try to free stuff Invariant: only the PL implementation (runtime system) knows about

More information

Kotlin/Native concurrency model. nikolay

Kotlin/Native concurrency model. nikolay Kotlin/Native concurrency model nikolay igotti@jetbrains What do we want from concurrency? Do many things concurrently Easily offload tasks Get notified once task a task is done Share state safely Mutate

More information

September 15th, Finagle + Java. A love story (

September 15th, Finagle + Java. A love story ( September 15th, 2016 Finagle + Java A love story ( ) @mnnakamura hi, I m Moses Nakamura Twitter lives on the JVM When Twitter realized we couldn t stay on a Rails monolith and continue to scale at the

More information

Zing Vision. Answering your toughest production Java performance questions

Zing Vision. Answering your toughest production Java performance questions Zing Vision Answering your toughest production Java performance questions Outline What is Zing Vision? Where does Zing Vision fit in your Java environment? Key features How it works Using ZVRobot Q & A

More information

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems Distributed Architectures & Microservices CS 475, Spring 2018 Concurrent & Distributed Systems GFS Architecture GFS Summary Limitations: Master is a huge bottleneck Recovery of master is slow Lots of success

More information

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs Performance of Non-Moving Garbage Collectors Hans-J. Boehm HP Labs Why Use (Tracing) Garbage Collection to Reclaim Program Memory? Increasingly common Java, C#, Scheme, Python, ML,... gcc, w3m, emacs,

More information

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

Workload Characterization and Optimization of TPC-H Queries on Apache Spark Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research

More information

Parallel Memory Defragmentation on a GPU

Parallel Memory Defragmentation on a GPU Parallel Memory Defragmentation on a GPU Ronald Veldema, Michael Philippsen University of Erlangen-Nuremberg Germany Informatik 2 Programmiersysteme Martensstraße 3 91058 Erlangen Motivation Application

More information

6.172 Performance Engineering of Software Systems Spring Lecture 9. P after. Figure 1: A diagram of the stack (Image by MIT OpenCourseWare.

6.172 Performance Engineering of Software Systems Spring Lecture 9. P after. Figure 1: A diagram of the stack (Image by MIT OpenCourseWare. 6.172 Performance Engineering of Software Systems Spring 2009 Lecture 9 MIT OpenCourseWare Dynamic Storage Allocation Stack allocation: LIFO (last-in-first-out) Array and pointer A used unused P before

More information

A NEW PLATFORM FOR A NEW ERA. Copyright 2014 Pivotal. All rights reserved.

A NEW PLATFORM FOR A NEW ERA. Copyright 2014 Pivotal. All rights reserved. A NEW PLATFORM FOR A NEW ERA 1 2 Java Memory Leaks in Modular Environments Mark Thomas November 2016 Introductions Mark Thomas markt@apache.org Tomcat committer and PMC member Joined Tomcat community in

More information

G1 Garbage Collector Details and Tuning. Simone Bordet

G1 Garbage Collector Details and Tuning. Simone Bordet G1 Garbage Collector Details and Tuning Who Am I - @simonebordet Lead Architect at Intalio/Webtide Jetty's HTTP/2, SPDY and HTTP client maintainer Open Source Contributor Jetty, CometD, MX4J, Foxtrot,

More information

Lecture 15 Garbage Collection

Lecture 15 Garbage Collection Lecture 15 Garbage Collection I. Introduction to GC -- Reference Counting -- Basic Trace-Based GC II. Copying Collectors III. Break Up GC in Time (Incremental) IV. Break Up GC in Space (Partial) Readings:

More information

Lecture 6: Bridging & Switching. Last time. Today. CSE 123: Computer Networks Chris Kanich. How do multiple hosts share a single channel?

Lecture 6: Bridging & Switching. Last time. Today. CSE 123: Computer Networks Chris Kanich. How do multiple hosts share a single channel? Lecture 6: ridging & Switching SE 3: omputer Networks hris Kanich Project countdown: 5 days Last time How do multiple hosts share a single channel? Medium ccess ontrol (M) protocols hannel partitioning

More information

Using Automated Network Management at Fiserv. June 2012

Using Automated Network Management at Fiserv. June 2012 Using Automated Network Management at Fiserv June 2012 Brought to you by Join Group Vivit Network Automation Special Interest Group (SIG) Leaders: Chris Powers & Wendy Wheeler Your input is welcomed on

More information

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection Deallocation Mechanisms User-controlled Deallocation Allocating heap space is fairly easy. But how do we deallocate heap memory no longer in use? Sometimes we may never need to deallocate! If heaps objects

More information

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS Martin Maas* Tim Harris KrsteAsanovic* John Kubiatowicz* *University of California, Berkeley Oracle Labs, Cambridge Why you should care

More information

Garbage Collection (1)

Garbage Collection (1) Garbage Collection (1) Advanced Operating Systems Lecture 7 This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/

More information

Scaling Your Cache & Caching at Scale. Alex

Scaling Your Cache & Caching at Scale. Alex Scaling Your Cache & Caching at Scale Alex Miller @puredanger Mission Why does caching work? What s hard about caching? How do we make choices as we design a caching architecture? How do we test a cache

More information

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance

More information

Generational Garbage Collection Theory and Best Practices

Generational Garbage Collection Theory and Best Practices Chris Bailey, Java Support Architect Generational Garbage Collection Theory and Best Practices Overview Introduction to Generational Garbage Collection The nursery space aka the young generation The tenured

More information

Heap Compression for Memory-Constrained Java

Heap Compression for Memory-Constrained Java Heap Compression for Memory-Constrained Java CSE Department, PSU G. Chen M. Kandemir N. Vijaykrishnan M. J. Irwin Sun Microsystems B. Mathiske M. Wolczko OOPSLA 03 October 26-30 2003 Overview PROBLEM:

More information

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014 GC (Chapter 14) Eleanor Ainy December 16 th 2014 1 Outline of Today s Talk How to use parallelism in each of the 4 components of tracing GC: Marking Copying Sweeping Compaction 2 Introduction Till now

More information

Last Time. Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads

Last Time. Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads Last Time Cost of nearly full resources RAM is limited Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads Embedded C Extensions for

More information

Arrays are a very commonly used programming language construct, but have limited support within relational databases. Although an XML document or

Arrays are a very commonly used programming language construct, but have limited support within relational databases. Although an XML document or Performance problems come in many flavors, with many different causes and many different solutions. I've run into a number of these that I have not seen written about or presented elsewhere and I want

More information

Remote Invocation. Today. Next time. l Overlay networks and P2P. l Request-reply, RPC, RMI

Remote Invocation. Today. Next time. l Overlay networks and P2P. l Request-reply, RPC, RMI Remote Invocation Today l Request-reply, RPC, RMI Next time l Overlay networks and P2P Types of communication " Persistent or transient Persistent A submitted message is stored until delivered Transient

More information

CS143 Final Fall 2009

CS143 Final Fall 2009 CS143 Final Fall 2009 Please read all instructions (including these) carefully. There are 4 questions on the exam, all with multiple parts. You have 2 hours to work on the exam. The exam is closed book,

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Today l Basic distributed file systems l Two classical examples Next time l Naming things xkdc Distributed File Systems " A DFS supports network-wide sharing of files and devices

More information

Java performance - not so scary after all

Java performance - not so scary after all Java performance - not so scary after all Holly Cummins IBM Hursley Labs 2009 IBM Corporation 2001 About me Joined IBM Began professional life writing event framework for WebSphere 2004 Moved to work on

More information

Designing for Scalability. Patrick Linskey EJB Team Lead BEA Systems

Designing for Scalability. Patrick Linskey EJB Team Lead BEA Systems Designing for Scalability Patrick Linskey EJB Team Lead BEA Systems plinskey@bea.com 1 Patrick Linskey EJB Team Lead at BEA OpenJPA Committer JPA 1, 2 EG Member 2 Agenda Define and discuss scalability

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential

More information

Finally! Real Java for low latency and low jitter

Finally! Real Java for low latency and low jitter Finally! Real Java for low latency and low jitter Gil Tene, CTO & co-founder, Azul Systems High level agenda Java in a low latency application world Why Stop-The-World is a problem (Duh?) Java vs. actual,

More information

HBase Practice At Xiaomi.

HBase Practice At Xiaomi. HBase Practice At Xiaomi huzheng@xiaomi.com About This Talk Async HBase Client Why Async HBase Client Implementation Performance How do we tuning G1GC for HBase CMS vs G1 Tuning G1GC G1GC in XiaoMi HBase

More information

Chapter 9: Virtual Memory

Chapter 9: Virtual Memory Chapter 9: Virtual Memory Silberschatz, Galvin and Gagne 2013 Chapter 9: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1 What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root

More information

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis Presented by Edward Raff Motivational Setup Java Enterprise World High end multiprocessor servers Large

More information

Consistency: Relaxed. SWE 622, Spring 2017 Distributed Software Engineering

Consistency: Relaxed. SWE 622, Spring 2017 Distributed Software Engineering Consistency: Relaxed SWE 622, Spring 2017 Distributed Software Engineering Review: HW2 What did we do? Cache->Redis Locks->Lock Server Post-mortem feedback: http://b.socrative.com/ click on student login,

More information

Pragmatic Clustering. Mike Cannon-Brookes CEO, Atlassian Software Systems

Pragmatic Clustering. Mike Cannon-Brookes CEO, Atlassian Software Systems Pragmatic Clustering Mike Cannon-Brookes CEO, Atlassian Software Systems 1 Confluence Largest enterprise wiki in the world 2000 customers in 60 countries J2EE application, ~500k LOC Hibernate, Lucene,

More information

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017 ECE 550D Fundamentals of Computer Systems and Engineering Fall 2017 The Operating System (OS) Prof. John Board Duke University Slides are derived from work by Profs. Tyler Bletsch and Andrew Hilton (Duke)

More information

JPDM, A Structured approach To Performance Tuning. Copyright 2017 Kirk Pepperdine. All rights reserved

JPDM, A Structured approach To Performance Tuning. Copyright 2017 Kirk Pepperdine. All rights reserved JPDM, A Structured approach To Performance Tuning About Us Performance Consulting Java Performance Tuning Workshops Co-Founded jclarity Disclaimer Our Typical Customer Application isn t performing to project

More information

ANALYZING THE MOST COMMON PERFORMANCE AND MEMORY PROBLEMS IN JAVA. 18 October 2017

ANALYZING THE MOST COMMON PERFORMANCE AND MEMORY PROBLEMS IN JAVA. 18 October 2017 ANALYZING THE MOST COMMON PERFORMANCE AND MEMORY PROBLEMS IN JAVA 18 October 2017 Who am I? Working in Performance and Reliability Engineering Team at Hotels.com Part of Expedia Inc, handling $72billion

More information

Garbage Collection. Hwansoo Han

Garbage Collection. Hwansoo Han Garbage Collection Hwansoo Han Heap Memory Garbage collection Automatically reclaim the space that the running program can never access again Performed by the runtime system Two parts of a garbage collector

More information

CS533 Modeling and Performance Evaluation of Network and Computer Systems

CS533 Modeling and Performance Evaluation of Network and Computer Systems CS533 Modeling and Performance Evaluation of Network and Computer Systems Monitors (Chapter 7) 1 Monitors That which is monitored improves. Source unknown A monitor is a tool used to observe system Observe

More information

Attila Szegedi, Software

Attila Szegedi, Software Attila Szegedi, Software Engineer @asz Everything I ever learned about JVM performance tuning @twitter Everything More than I ever wanted to learned about JVM performance tuning @twitter Memory tuning

More information

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection CMSC 330: Organization of Programming Languages Memory Management and Garbage Collection CMSC330 Fall 2018 1 Memory Attributes Memory to store data in programming languages has the following lifecycle

More information

Distributed Computation Models

Distributed Computation Models Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case

More information

SELF-AWARE APPLICATIONS AUTOMATIC PRODUCTION DIAGNOSIS DINA GOLDSHTEIN

SELF-AWARE APPLICATIONS AUTOMATIC PRODUCTION DIAGNOSIS DINA GOLDSHTEIN SELF-AWARE APPLICATIONS AUTOMATIC PRODUCTION DIAGNOSIS DINA GOLDSHTEIN Agenda Motivation Hierarchy of self-monitoring CPU profiling GC monitoring Heap analysis Deadlock detection 2 Agenda Motivation Hierarchy

More information

JVM Memory Model and GC

JVM Memory Model and GC JVM Memory Model and GC Developer Community Support Fairoz Matte Principle Member Of Technical Staff Java Platform Sustaining Engineering, Copyright 2015, Oracle and/or its affiliates. All rights reserved.

More information

Software Speculative Multithreading for Java

Software Speculative Multithreading for Java Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com

More information

Dynamic Storage Allocation

Dynamic Storage Allocation 6.172 Performance Engineering of Software Systems LECTURE 10 Dynamic Storage Allocation Charles E. Leiserson October 12, 2010 2010 Charles E. Leiserson 1 Stack Allocation Array and pointer A un Allocate

More information

CS842: Automatic Memory Management and Garbage Collection. Mark and sweep

CS842: Automatic Memory Management and Garbage Collection. Mark and sweep CS842: Automatic Memory Management and Garbage Collection Mark and sweep 1 Schedule M W Sept 14 Intro/Background Basics/ideas Sept 21 Allocation/layout GGGGC Sept 28 Mark/Sweep Mark/Sweep cto 5 Copying

More information

CSE 544: Principles of Database Systems

CSE 544: Principles of Database Systems CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;

More information

Advanced Programming & C++ Language

Advanced Programming & C++ Language Advanced Programming & C++ Language ~6~ Introduction to Memory Management Ariel University 2018 Dr. Miri (Kopel) Ben-Nissan Stack & Heap 2 The memory a program uses is typically divided into four different

More information

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection

More information