How to Write a Distributed Garbage Collector

1 How to Write a Distributed Garbage Collector
Ari Zilka, Saravanan Subbiah
Terracotta, Inc.

2 Goals
> Learn a little...
> about Terracotta
> about garbage collection
> about distributed algorithms
> about complexity

3 Agenda
> Hypothesis
> Case study: virtual heaps / distributed GC
> Terminology
> Step 1: DGC in single Terracotta server
> Step 2: Adding generational collection
> Step 3: DGC for Terracotta server arrays
> Conclusions

5 Hypothesis: Algorithm design outweighs platform importance
> Some say, don't distribute, but the push is in the opposite direction
> Distribution / multi-threading is more about algorithms than platform: asynchronous nature leaks everywhere; variations in environment and inputs force this leak; the more bare metal, the better?!?!

7 Case Study: Virtual heaps. What are they?
> Terracotta can make the application heap page in and out of the JVM
> It can persist the application heap to disk
> It can provide centralized object identity across distributed JVMs across a TCP/IP network

8-13 Virtual heap pictorial
[Figure: an application instance in a Java Virtual Machine holds objects H, I, J, K. The objects page to the Terracotta server (app heap in memory: H, I, J, K), and from there page to disk (app heap on the Terracotta server's disk: H, I, J, K).]

14 The Garbage Collection Challenge: Virtual heaps represent a distributed computing challenge
> How do we collect garbage in a distributed world?
> No one layer has everything
> Crawling disk on the Terracotta server is too slow
> Crawling memory in application instances is incorrect, as the heap is now virtual
> NOTE: platform would not help with any of these issues

16 Terminology: Let's get on the same page
> L1: application node, JVM
> L2: Terracotta server instance
> root: top-most object in a graph clustered by Terracotta (for example, a ...)
> garbage in Terracotta is not the same as garbage in Java: no reference from any object reachable from roots; not resident in any application heap

17 How distributed garbage gets created: the last bit of context...
> Create a shared object by adding it to a shared graph
> Remove the shared object from the graph in L1
> GC in L1 collects the object
> Now the object is garbage and needs to be collected
> Collect it in the L2
> Example: map.put(k, v); map.remove(k); // value is garbage
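The life cycle on this slide can be sketched in plain Java. This is only an illustration under assumptions: `SharedObject`, `reachableFrom`, `isDistributedGarbage` and the `residentInL1` set are hypothetical names, not Terracotta APIs. The check simply combines the two conditions from the terminology slide (unreachable from roots, and not resident in any application heap).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a clustered object; not a Terracotta class.
final class SharedObject {
    final String id;
    final List<SharedObject> refs = new ArrayList<>();
    SharedObject(String id) { this.id = id; }
}

final class DistributedGarbageSketch {
    // Collect every object reachable from the given roots.
    static Set<SharedObject> reachableFrom(List<SharedObject> roots) {
        Set<SharedObject> seen = new HashSet<>();
        Deque<SharedObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            SharedObject o = work.pop();
            if (seen.add(o)) {
                work.addAll(o.refs);
            }
        }
        return seen;
    }

    // Garbage for the L2: unreachable from roots AND not resident in any L1 heap.
    static boolean isDistributedGarbage(SharedObject o, List<SharedObject> roots,
                                        Set<SharedObject> residentInL1) {
        return !reachableFrom(roots).contains(o) && !residentInL1.contains(o);
    }

    public static void main(String[] args) {
        SharedObject map = new SharedObject("map");  // root of the shared graph
        SharedObject value = new SharedObject("v");
        List<SharedObject> roots = List.of(map);
        Set<SharedObject> residentInL1 = new HashSet<>();

        map.refs.add(value);        // like map.put(k, v): v joins the shared graph
        residentInL1.add(value);    // v is still paged in to the L1 that created it
        map.refs.remove(value);     // like map.remove(k): v leaves the graph
        System.out.println(isDistributedGarbage(value, roots, residentInL1)); // false
        residentInL1.remove(value); // the local GC in L1 collects v
        System.out.println(isDistributedGarbage(value, roots, residentInL1)); // true
    }
}
```

Only after the second step does the L2 have something to collect, which is why the slide lists the L1 collection before the L2 collection.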

18-21 Creating distributed garbage
[Figure: two application instances, Java Virtual Machine #1 and Java Virtual Machine #2, share an existing object graph. JVM #1 creates three objects and links them together through their child fields. Later, JVM #2 mutates that graph by calling setChild(E) to swap in a new object E. The object that E replaced becomes distributed garbage, even though JVM #2 never accesses it.]

23 DGC in a single Terracotta server
> Concurrent mark / sweep algorithm
> Phases: mark, rescue1, rescue2, delete
> Mark objects still reachable from the set of all roots (rootset)
> Monitor references in app heap for deltas
> Rescue any objects not marked but joining the graph during the mark phase (rescueset)
> Rescue any objects not reachable but still in an L1 heap
> Pause L2, rescue again, and delete garbage
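A minimal sketch of the mark phase alone, assuming the L2 object store can be modelled as a map from object id to the ids it references. The names `MarkPhaseSketch`, `markCandidates` and `store`, and the object ids R, D, E, M, N, O, are illustrative assumptions rather than anything taken from Terracotta. Every object starts as a GC candidate and is removed from the candidate set when marking reaches it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MarkPhaseSketch {
    // store: object id -> ids it references (a stand-in for the L2 object store).
    // Assumes every root and every referenced id is a key of the store.
    static Set<String> markCandidates(Map<String, List<String>> store, Set<String> rootset) {
        Set<String> candidates = new HashSet<>(store.keySet()); // everything starts as a candidate
        Deque<String> work = new ArrayDeque<>(rootset);
        while (!work.isEmpty()) {
            String id = work.pop();
            if (candidates.remove(id)) {        // first visit: mark by removing
                work.addAll(store.get(id));     // then follow its references
            }
        }
        return new TreeSet<>(candidates);       // sorted only to make printing stable
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = Map.of(
            "R", List.of("D"),
            "D", List.of("E"),
            "E", List.of(),
            "M", List.of("N"),
            "N", List.of("O"),
            "O", List.of("M"));                 // M -> N -> O -> M is an unreachable cycle
        System.out.println(markCandidates(store, Set.of("R"))); // [M, N, O]
    }
}
```

The sketch is single-threaded; the concurrent part of the algorithm on the slide is exactly what the two rescue phases exist to compensate for.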

24-36 Marking reachable objects: GC_Candidates exclude rootset, reachables
[Figure: animation frames. Marking starts from the rootset and walks the graph; the legend distinguishes objects paged in to the L1 heap. Each object reached is dropped from the GC_Candidates list, which shrinks frame by frame; objects E, F, G, M, N and O appear in the list along the way.]

37-43 Rescue phase 1
> Reference monitoring informs L2 of changes
> Deleted references that are still paged in are special
> Because they can be rescued by other paged in objects
> The resultant garbage in this scenario: GC_Candidates: G
[Figure: objects paged in to the L1 heap; one reference is deleted, and the object it pointed to is then rescued by another paged-in object.]

44 Rescue phase 2: One quick pause and we are done
> Logically the same as rescue 1
> but no changes are allowed into L2
> ... could not be collected even if not rescued because it cannot be GCed
> So apply any rescueset objects to GC_Candidates
> Then collect garbage
> In the example, G, ... are garbage
> We're done
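A rough sketch of how the two rescue passes and the delete step could fit together. It assumes the rescueset records every object that gained a reference, or was resident in an L1 heap, since marking began; that reading of the slides is an assumption, and `RescuePhaseSketch`, `rescue`, `collect` and the object ids are hypothetical names. Rescuing an object also rescues whatever it references.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RescuePhaseSketch {
    // Drop from candidates everything reachable from the rescueset.
    static void rescue(Map<String, List<String>> store, Set<String> rescueset,
                       Set<String> candidates) {
        Deque<String> work = new ArrayDeque<>(rescueset);
        while (!work.isEmpty()) {
            String id = work.pop();
            if (candidates.remove(id)) {
                work.addAll(store.getOrDefault(id, List.of()));
            }
        }
    }

    // rescue1 runs while changes still arrive; rescue2 runs under the short L2 pause.
    static Set<String> collect(Map<String, List<String>> store, Set<String> candidates,
                               Set<String> rescueset1, Set<String> rescueset2) {
        rescue(store, rescueset1, candidates);
        rescue(store, rescueset2, candidates);
        store.keySet().removeAll(candidates);   // delete phase
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        store.put("R", List.of("F"));           // R -> F was added after marking started
        store.put("F", List.of("M"));
        store.put("M", List.of());
        store.put("G", List.of());
        Set<String> candidates = new HashSet<>(Set.of("F", "G", "M")); // result of the mark phase
        Set<String> garbage = collect(store, candidates, Set.of("F"), Set.of());
        System.out.println(garbage);                        // [G]
        System.out.println(new TreeSet<>(store.keySet()));  // [F, M, R]
    }
}
```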

45 Drawbacks to simple DGC: Might require tuning
> Frequent DGC: pauses the app too often; uses up L2 cycles / adds workload
> Infrequent DGC: more time to delete; increased chance that reachable objects have flushed to disk; disk can fill up; more overhead in bookkeeping garbage

47 Generational collector: trying to avoid distribution
> If we partition data across L2s, each will have less data to manage: divide and conquer data; divide and conquer GC; ...but more later
> Generational collection: we can avoid marking anything that is on disk; can help avoid disk I/O; maybe help avoid the need to distribute L2s

48 Generational collection: Differences from simple DGC
> Similar to simple DGC
> Redefine rootset to be roots currently in memory
> Add oldgen, younggen spaces: oldgen holds objects that have fallen out of cache at some point; younggen holds newly created objects in L2
> rememberedset: backpointers from oldgen to younggen; this one is tricky
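One way to picture the rememberedset is the sketch below. It models the rememberedset as the set of younggen object ids that some oldgen object points to, which is an assumption about its shape; `YoungGenSketch`, `youngCandidates`, `inMemoryRefs` and the ids R, D, E, F, M, X are hypothetical. Marking starts from the in-memory roots plus the rememberedset and never follows a reference into oldgen, so no disk read is needed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class YoungGenSketch {
    // inMemoryRefs: references of objects that are in memory; oldgen objects are absent.
    static Set<String> youngCandidates(Map<String, List<String>> inMemoryRefs,
                                       Set<String> younggen,
                                       Set<String> rootset,
                                       Set<String> rememberedset) {
        Set<String> candidates = new HashSet<>(younggen);
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(rootset);
        work.addAll(rememberedset);             // younggen objects kept alive from oldgen
        while (!work.isEmpty()) {
            String id = work.pop();
            if (!visited.add(id)) {
                continue;                       // already processed
            }
            candidates.remove(id);
            for (String child : inMemoryRefs.getOrDefault(id, List.of())) {
                if (younggen.contains(child)) { // stop at the disk barrier
                    work.push(child);
                }
            }
        }
        return new TreeSet<>(candidates);       // sorted only to make printing stable
    }

    public static void main(String[] args) {
        // R is an in-memory root; X lives in oldgen (on disk) and points to M.
        Map<String, List<String>> inMemoryRefs = Map.of(
            "R", List.of("D"), "D", List.of("E"), "E", List.of("X"),
            "F", List.of(), "M", List.of());
        Set<String> younggen = Set.of("D", "E", "F", "M");
        System.out.println(youngCandidates(inMemoryRefs, younggen, Set.of("R"), Set.of("M"))); // [F]
        System.out.println(youngCandidates(inMemoryRefs, younggen, Set.of("R"), Set.of()));    // [F, M]
    }
}
```

The second call shows why the slide calls the rememberedset tricky: without the backpointer, M would be treated as garbage although an oldgen object still refers to it.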

49-57 Young generation pictorial: oldgen, younggen, rootset
> The rememberedset: backpointers help skip I/O
> Let's mark in younggen
[Figure: animation frames. The graph (nodes D, E, F visible) is split between younggen in memory and oldgen on disk, separated by the DISK BARRIER. The rootset and rememberedset are listed beside the graph. M is listed in GC_Candidates in the early frames and is gone from the list in the last frame.]

58 Generational collection summary: We need a generic large scale algorithm
> Benefits: we avoided all disk I/O; younggen GC runs in milliseconds; can run very frequently; avoids sending objects to oldgen / disk
> Issues: doesn't visit all objects, doesn't find all garbage; full DGCs must still occur; distributed garbage must be created often

60 DGC for Server Stripes: lots of challenges to address
> Distributing the object graphs lets us handle arbitrary use cases
> We can do fast updates w/o distributed transaction overhead; another talk though...
> But we can't get around a distributed algorithm for GC: no single L2 knows the whole graph shape; what if nodes fail? network fails?; async updates mean DGC is totally async

61 Graphical view of server stripes
[Figure: Terracotta servers arranged side by side as stripes.]

62 Core algorithm
> Coordinator L2 decides when to start a DGC
> Coordinator L2 decides when it is complete
> Pass marking token on to other L2s
> Share no lists, sets, etc. amongst L2s
> Make L2s respond to a global ticker event to track progress

63-73 Distributed marking
[Figure: animation frames across Terracotta server 1, Terracotta server 2 and Terracotta server 3. Each server has its own rootset (server 2's is null) and its own GC_Candidates list; objects D, E, F, G, M, N and O appear in the lists. As marking proceeds, marked objects drop out of the lists, and an outbound mark request crosses from one server to another.]

74 DGC for stripes: details
> Mark starts at roots: requests to mark are batched, asynch for ..., ... and ...
> We find the M, N, O graph to be garbage
> Since no single machine has all GC_Candidates we... asked Server #2 to follow from ... for us
> We ignore failures and abandon the DGC run: nothing persisted / changed till last stage; ... will get results when it comes back to life
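The idea that no server holds all GC_Candidates can be illustrated with a small single-threaded simulation. Everything here is an assumption made for the sketch: the `Server` class, the `owner` routing map, the per-server `inbox` standing in for mark requests, and the object ids. Real requests would be batched and asynchronous, which the loop below does not attempt to model.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StripeMarkSketch {
    static final class Server {
        final Map<String, List<String>> local = new HashMap<>(); // objects this server owns
        final Set<String> rootset = new HashSet<>();
        final Set<String> candidates = new HashSet<>();          // never shared with other servers
        final Deque<String> inbox = new ArrayDeque<>();          // mark requests received
    }

    // owner: object id -> index of the server that stores it (hypothetical routing table).
    static void mark(List<Server> servers, Map<String, Integer> owner) {
        for (Server s : servers) {
            s.candidates.addAll(s.local.keySet());
            s.inbox.addAll(s.rootset);                           // mark starts at roots
        }
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Server s : servers) {
                while (!s.inbox.isEmpty()) {
                    progress = true;
                    String id = s.inbox.pop();
                    if (!s.candidates.remove(id)) {
                        continue;                                // already marked
                    }
                    for (String child : s.local.get(id)) {
                        // local work, or an outbound mark request to the owning server
                        servers.get(owner.get(child)).inbox.add(child);
                    }
                }
            }
        }
    }

    static Server server(Map<String, List<String>> objects, Set<String> roots) {
        Server s = new Server();
        s.local.putAll(objects);
        s.rootset.addAll(roots);
        return s;
    }

    public static void main(String[] args) {
        List<Server> servers = List.of(
            server(Map.of("R", List.of("D"), "D", List.of("E")), Set.of("R")),
            server(Map.of("E", List.of(), "M", List.of("N")), Set.of()),
            server(Map.of("N", List.of("O"), "O", List.of("M")), Set.of()));
        Map<String, Integer> owner = new HashMap<>();
        for (int i = 0; i < servers.size(); i++) {
            for (String id : servers.get(i).local.keySet()) {
                owner.put(id, i);
            }
        }
        mark(servers, owner);
        for (Server s : servers) {
            System.out.println(new TreeSet<>(s.candidates));     // [] then [M] then [N, O]
        }
    }
}
```

Each server only ever touches its own candidate set; the M, N, O cycle survives as garbage even though it is spread over two servers.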

75 The TICKER TOKEN: our coordinating device
> Ticker token: a token that gets passed around the server ring
> Coordinator initiates
> Each node adds its status and passes the token on: outbound = marking tasks farmed out to other servers; inbound = tasks completed on behalf of others
> Phase complete when inbound - outbound == 0
> Phase not complete: start a new token on the next tick
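The completion test on this slide amounts to summing two counters around the ring. A minimal sketch, assuming each server keeps one running count per direction; `TickerTokenSketch`, `Status`, `Token` and `passToken` are hypothetical names, and the ring is modelled as a plain list visited in order.

```java
import java.util.List;

final class TickerTokenSketch {
    // Per-server counters, named after the slide's wording.
    static final class Status {
        final long outbound; // marking tasks farmed out to other servers
        final long inbound;  // tasks completed on behalf of other servers
        Status(long outbound, long inbound) {
            this.outbound = outbound;
            this.inbound = inbound;
        }
    }

    static final class Token {
        long outbound;
        long inbound;
    }

    // One trip of the token around the ring; returns true when the phase is complete.
    static boolean passToken(List<Status> ring) {
        Token token = new Token();              // the coordinator initiates
        for (Status node : ring) {              // each node adds its status and passes it on
            token.outbound += node.outbound;
            token.inbound += node.inbound;
        }
        return token.inbound - token.outbound == 0;
    }

    public static void main(String[] args) {
        // Tick 1: three tasks farmed out, only two completed so far.
        List<Status> tick1 = List.of(new Status(2, 0), new Status(1, 2), new Status(0, 0));
        System.out.println(passToken(tick1));   // false: start a new token on the next tick
        // Tick 2: the third server has finished the remaining task.
        List<Status> tick2 = List.of(new Status(2, 0), new Status(1, 2), new Status(0, 1));
        System.out.println(passToken(tick2));   // true: phase complete
    }
}
```

A real implementation would also have to cope with counters changing while the token is still travelling; the slides do not say how that is handled, so the sketch leaves it out.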

76 DGC for stripes is totally asynchronous
> We do... mark in parallel; rescue in parallel; accept graph changes in parallel
> We don't... share GC_Candidates, rootset or rememberedset; have any synchronous remote calls

77 Platform independent
> We could have used: messaging; JGroups; RMI; raw TCP or UDP protocols; RESTful HTTP interface
> Transport doesn't really matter.
> and neither does language or framework, we think

78 Have we proven our hypothesis?
> Algorithm design outweighs platform importance
> Specific asynch frameworks may or may not help
> Not a dig on any technology, but the TICKER TOKEN, the mark/sweep and more make us go fast, and those are algorithms
> Was distribution valuable? On 10GB of data: 1 L2: 2.2MM objects, 1 hour; 8 L2s: 2.2MM objects, 55 seconds
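As a rough reading of those two figures (taking "1 hour" as exactly 3600 seconds, which is an assumption), the run on 8 L2s is about 65 times faster, roughly 8 times more than an even split across 8 servers would give. The slides do not break down where the extra gain comes from.

```latex
\[
\frac{T_{1\ \mathrm{L2}}}{T_{8\ \mathrm{L2s}}}
  = \frac{3600\ \mathrm{s}}{55\ \mathrm{s}} \approx 65.5,
\qquad
\frac{65.5}{8} \approx 8.2
\]
```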

79 Future optimizations
> Generational collection for stripes
> Object migration and packing, train algorithms
> Resume after crash (only restarts today)

80 Want to learn more?
> Come by our booth
> Ask about: avoiding distributed transactions; lock leasing and fairness
> There are all sorts of cool runtime optimizations inside the Terracotta array

81 Ari Zilka, Saravanan
