Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez


Memory systems expose an inexpressive interface

The program sees a flat address space (0x0000-0xFFFF) and issues arbitrary loads and stores to the memory hierarchy, which holds objects such as Obj. A and Obj. B. Programmers, however, think in terms of objects and pointers among objects.

Modern languages expose an object-based memory model

The program performs object accesses against an object-based model; the runtime/compiler translates these into loads and stores to the flat address space (0x0000-0xFFFF, holding Obj. A and Obj. B) served by the memory hierarchy.

Strictly hiding the flat address space provides many benefits:
Memory safety prevents memory corruption bugs
Automatic memory management (garbage collection) simplifies programming

The inexpressive flat address space is inefficient

There is a semantic gap between programs and the memory hierarchy: the runtime/compiler flattens the object-based model onto a flat address space (0x0000-0xFFFF) backed by a core, L1 $, L2 $, and main memory. Objects A, B, and C end up scattered across cache lines, causing a mismatch between objects and cache lines and costly associative lookups.

Hotpads: An object-based memory hierarchy

A memory hierarchy designed from the ground up for object-based programs:
Provides first-class support for objects and pointers in the ISA
Hides the memory layout from software and takes control over it

The program issues object operations through an object-based ISA, and Hotpads manages objects across a hierarchy of pads (core, L1 pad, L2 pad, L3 pad):
Hotpads manages objects instead of cache lines
Hotpads rewrites pointers to reduce associative lookups
Hotpads provides architectural support for in-hierarchy object allocation and recycling

Prior architectural support for object-based programs

Object-oriented/typed systems [iAPX 432, Dally ISCA 85, CHERI ISCA 14, Kim et al. ASPLOS 17] focus on core microarchitecture design (e.g., a type-check unit and a vfunction unit next to the core) to accelerate virtual calls, object references, and dynamic type checks.

Hardware accelerators for GC [The Lisp Machine, Joao et al. ISCA 09, Maas et al. ISCA 18] add a GC unit next to the core.

Prior work uses standard cache hierarchies (L1 $, L2 $); we focus on redesigning the memory hierarchy itself.

Hotpads overview

Each level of the hierarchy (L1 pad, L2 pad, L3 pad) consists of:
Data array: managed as a circular buffer using simple bump-pointer allocation; stores variable-sized objects (e.g., Obj. A, Obj. B) compactly, packed before the free space
C-Tags: a decoupled tag store used only for a fraction of accesses
Per-word and per-object metadata: pointer? valid? dirty? recently-used? bits
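Bump-pointer allocation in the data array can be sketched as follows. This is a minimal software model under stated assumptions: the class and method names are illustrative, and the real pad is a circular buffer maintained in hardware, which this linear sketch simplifies.

```java
// Minimal sketch of bump-pointer allocation in a pad's data array.
// Illustrative only; the real Hotpads pad is a hardware circular buffer.
public class PadAlloc {
    private final int capacity;
    private int bump = 0;  // next free offset in the data array

    public PadAlloc(int capacity) { this.capacity = capacity; }

    // Returns the new object's offset, or -1 if the pad is full
    // (in Hotpads, running out of space triggers a collection-eviction).
    public int alloc(int size) {
        if (bump + size > capacity) return -1;
        int offset = bump;
        bump += size;      // O(1) bump; variable-sized objects stay packed
        return offset;
    }
}
```

Because allocation is just a pointer bump, consecutive objects land back-to-back, which is why the data array stores variable-sized objects compactly.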

Hotpads example

class Node { int value; Node next; }

Initial state: the register file (r0-r3) holds pointers; objects A and B reside in main memory, and the L1 and L2 pads hold only free space.

Hotpads moves objects implicitly

Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value

The core issues an access to A, and A is copied into the L1 pad. All loads and stores follow a single addressing mode: base+offset. Bump-pointer allocation stores A compactly after the other objects.

Hotpads rewrites pointers to avoid associative lookups

Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value

The core issues an access to A; A is copied into the L1 pad, and r1 is rewritten to A's L1 pad address. Subsequent dereferences of r1 access A's L1 copy directly, without associative lookups (like a scratchpad). Hotpads can rewrite pointers safely because it hides the memory layout from software.
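The effect of pointer rewriting can be modeled in software as follows. This is a toy model, not the hardware mechanism: the L1-bit pointer encoding and the map standing in for the c-tags are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy software model of pointer rewriting (not the actual hardware).
// A pointer whose (assumed) L1 bit is set already names an L1 copy and
// dereferences directly; otherwise the c-tags (modeled as a map) are
// consulted once, and the returned, rewritten pointer is used from then on.
public class Rewrite {
    static final long L1_BIT = 1L << 40;            // assumed encoding
    static Map<Long, Long> cTags = new HashMap<>(); // canonical addr -> L1 addr
    static int associativeLookups = 0;

    // Returns the pointer the caller should use from now on.
    static long deref(long ptr) {
        if ((ptr & L1_BIT) != 0) return ptr;        // direct, scratchpad-like hit
        associativeLookups++;                       // one c-tag lookup
        return cTags.computeIfAbsent(ptr, p -> p | L1_BIT); // fetch + rewrite
    }
}
```

Only the first dereference pays for an associative lookup; every later dereference through the rewritten pointer goes straight to the L1 copy.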

Pointer rewriting applies to L1 pad data as well

Program code: v = A.next.value;
Hotpads instructions: derefptr r2, (r1).next; ld r3, (r2).value

B is copied into L1, and A's pointer to it is rewritten. Subsequent dereferences of A.next access the L1 copy of B directly, without associative lookups. The c-tags (now holding A and B) let dereferences of other pointers to A and B find their L1 copies.

Hotpads supports in-hierarchy object allocation

Program code: Node C = new Node();
Hotpads instruction: alloc r3, type=Node

The core allocates new object C directly in the L1 pad. In-hierarchy allocation reduces data movement and requires no backing storage in main memory or larger pads.

Hotpads unifies garbage collection and object evictions

When a pad fills up, it triggers a collection-eviction (CE) to free space:
Discards dead objects
Evicts live, non-recently-used objects to the next level in bulk

Example: the L1 pad is full, holding A, B, C, and D (with a stale copy of B in main memory). C is dead (unreferenced); the other objects are live, and only B is recently used. The L1 collection-eviction collects dead C and evicts live A and D to the L2 pad, leaving a large contiguous chunk of free space.

CEs happen concurrently with program execution and are hierarchical: each pad can perform a CE independently from larger, higher-level pads, making CE cost proportional to pad size.

Invariant: Objects at a particular level may only point to objects at the same or larger levels.
Result: No need to check the L2 pad when performing a collection-eviction in the L1 pad.
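The example above can be sketched as a toy collection-eviction, under the invariant that L1 objects are referenced only from the core's registers or other L1 objects. The object graph, names, and recently-used bits below are illustrative assumptions, not the hardware implementation.

```java
import java.util.*;

// Toy model of a collection-eviction (CE): mark objects reachable from the
// roots (registers and other L1 objects), keep recently-used live objects,
// evict the rest of the live objects, and drop dead ones. Illustrative only.
public class CE {
    static class Obj {
        String name; boolean recentlyUsed; List<Obj> refs = new ArrayList<>();
        Obj(String n, boolean ru) { name = n; recentlyUsed = ru; }
    }

    // Returns [kept-in-L1, evicted-to-L2]; unmarked objects are dead and vanish.
    static List<Set<String>> collectEvict(List<Obj> pad, List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {               // mark reachable objects
            Obj o = stack.pop();
            if (live.add(o)) stack.addAll(o.refs);
        }
        Set<String> keep = new TreeSet<>(), evict = new TreeSet<>();
        for (Obj o : pad)
            if (live.contains(o)) (o.recentlyUsed ? keep : evict).add(o.name);
        return List.of(keep, evict);             // dead objects are reclaimed
    }
}
```

On the slide's example (A, B, C, D in the pad; C unreferenced; only B recently used), this keeps B, evicts A and D, and collects C.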

Collection-evictions reduce data movement

Hotpads unifies the locality principle and the generational hypothesis, acting like a super-generational collector:
Accesses to short-lived objects are cheap and fast
Most main-memory data is live
Most objects are collected in the L1 pad
90% of object bytes never reach main memory

See paper for additional features

Supporting large objects with subobject fetches
Object-level pad coherence
Legacy mode to support flat-address-based programs
and more details!

Evaluation

We simulate Hotpads using MaxSim [Rodchenko et al., ISPASS 17], a simulator combining ZSim and the Maxine JVM.

Modeled system: 4 OOO cores with a 3-level cache or pad hierarchy (per-core L1 and L2, shared L3).

Workloads: 13 Java workloads from DaCapo, SPECjbb, and JGraphT, with the JVM modified to use the Hotpads ISA.

Hotpads outperforms conventional hierarchies

34% performance improvement:
1. In-hierarchy allocation reduces memory stalls in application code
2. Hardware-based collection-evictions reduce GC overheads

Hotpads reduces dynamic memory hierarchy energy

2.6x reduction:
1. Pointer rewriting and direct accesses reduce L1 energy by 2.3x
2. Hierarchical collection-evictions reduce memory and GC energy

Hotpads also provides benefits on compiled code

We study an allocation-heavy binary-tree benchmark written in C and compare Hotpads with tcmalloc, a state-of-the-art memory allocator. Hotpads improves performance and energy efficiency over manual memory management (2.7x and 3.6x reductions).

See paper for more results

Results for multithreaded workloads
Detailed analysis of pointer rewriting and CEs
Comparison with other cache-based techniques:
Enhanced baseline using DRRIP and stream prefetchers
Cache scrubbing and zeroing [Sartor et al., PACT 14]
Legacy mode performance on SPEC CPU apps

An object-based memory hierarchy provides tremendous benefits

Modern programs operate on objects, not cache lines.
Hotpads is an object-based memory hierarchy that supports objects in the ISA and hides the memory layout.
Hotpads outperforms conventional cache hierarchies because it:
Moves objects rather than cache lines
Avoids most associative lookups with pointer rewriting
Provides hardware support for in-hierarchy allocation and unified collection-eviction
Hotpads also unlocks new memory hierarchy optimizations.

Thanks! Questions?


More information

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill Binding and Storage Björn B. Brandenburg The University of North Carolina at Chapel Hill Based in part on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. What s

More information

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands Grassroots ASPLOS can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands ASPLOS-17 Doctoral Workshop London, March 4th, 2012 1 Current

More information

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs Performance of Non-Moving Garbage Collectors Hans-J. Boehm HP Labs Why Use (Tracing) Garbage Collection to Reclaim Program Memory? Increasingly common Java, C#, Scheme, Python, ML,... gcc, w3m, emacs,

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Computer Systems A Programmer s Perspective 1 (Beta Draft) Computer Systems A Programmer s Perspective 1 (Beta Draft) Randal E. Bryant David R. O Hallaron August 1, 2001 1 Copyright c 2001, R. E. Bryant, D. R. O Hallaron. All rights reserved. 2 Contents Preface

More information

STEPS Towards Cache-Resident Transaction Processing

STEPS Towards Cache-Resident Transaction Processing STEPS Towards Cache-Resident Transaction Processing Stavros Harizopoulos joint work with Anastassia Ailamaki VLDB 2004 Carnegie ellon CPI OLTP workloads on modern CPUs 6 4 2 L2-I stalls L2-D stalls L1-I

More information

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

Chapter 10: Case Studies. So what happens in a real operating system?

Chapter 10: Case Studies. So what happens in a real operating system? Chapter 10: Case Studies So what happens in a real operating system? Operating systems in the real world Studied mechanisms used by operating systems Processes & scheduling Memory management File systems

More information

Limiting the Number of Dirty Cache Lines

Limiting the Number of Dirty Cache Lines Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Using the Singularity Research Development Kit

Using the Singularity Research Development Kit Using the Research Development Kit James Larus & Galen Hunt Microsoft Research ASPLOS 08 Tutorial March 1, 2008 Outline Overview (Jim) Rationale & key decisions architecture Details (Galen) Safe Languages

More information

LECTURE 12. Virtual Memory

LECTURE 12. Virtual Memory LECTURE 12 Virtual Memory VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a cache for magnetic disk. The mechanism by which this is accomplished

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query

More information

Achieving Out-of-Order Performance with Almost In-Order Complexity

Achieving Out-of-Order Performance with Almost In-Order Complexity Achieving Out-of-Order Performance with Almost In-Order Complexity Comprehensive Examination Part II By Raj Parihar Background Info: About the Paper Title Achieving Out-of-Order Performance with Almost

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

File Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System

File Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System File Systems Kartik Gopalan Chapter 4 From Tanenbaum s Modern Operating System 1 What is a File System? File system is the OS component that organizes data on the raw storage device. Data, by itself, is

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"

More information

Automatic Memory Management

Automatic Memory Management Automatic Memory Management Why Automatic Memory Management? Storage management is still a hard problem in modern programming Why Automatic Memory Management? Storage management is still a hard problem

More information

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1 What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

CPSC 213. Introduction to Computer Systems. Winter Session 2017, Term 2. Unit 1c Jan 24, 26, 29, 31, and Feb 2

CPSC 213. Introduction to Computer Systems. Winter Session 2017, Term 2. Unit 1c Jan 24, 26, 29, 31, and Feb 2 CPSC 213 Introduction to Computer Systems Winter Session 2017, Term 2 Unit 1c Jan 24, 26, 29, 31, and Feb 2 Instance Variables and Dynamic Allocation Overview Reading Companion: Reference 2.4.4-5 Textbook:

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

Name, Scope, and Binding. Outline [1]

Name, Scope, and Binding. Outline [1] Name, Scope, and Binding In Text: Chapter 3 Outline [1] Variable Binding Storage bindings and lifetime Type bindings Type Checking Scope Lifetime vs. Scope Referencing Environments N. Meng, S. Arthur 2

More information

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl JOP: A Java Optimized Processor for Embedded Real-Time Systems Martin Schoeberl JOP Research Targets Java processor Time-predictable architecture Small design Working solution (FPGA) JOP Overview 2 Overview

More information

15-740/ Computer Architecture

15-740/ Computer Architecture 15-740/18-740 Computer Architecture Lecture 19: Caching II Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/31/2011 Announcements Milestone II Due November 4, Friday Please talk with us if you

More information

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Area-Efficient Error Protection for Caches

Area-Efficient Error Protection for Caches Area-Efficient Error Protection for Caches Soontae Kim Department of Computer Science and Engineering University of South Florida, FL 33620 sookim@cse.usf.edu Abstract Due to increasing concern about various

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for

More information

Cooperative Cache Scrubbing

Cooperative Cache Scrubbing Cooperative Cache Scrubbing Jennifer B. Sartor Ghent University Belgium Wim Heirman Intel ExaScience Lab Belgium Stephen M. Blackburn Australian National University Australia Lieven Eeckhout Ghent University

More information

15-740/ Computer Architecture Lecture 12: Advanced Caching. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 12: Advanced Caching. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 12: Advanced Caching Prof. Onur Mutlu Carnegie Mellon University Announcements Chuck Thacker (Microsoft Research) Seminar Tomorrow RARE: Rethinking Architectural

More information

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY

More information

Run-Time Environments/Garbage Collection

Run-Time Environments/Garbage Collection Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs

More information

The New C Standard (Excerpted material)

The New C Standard (Excerpted material) The New C Standard (Excerpted material) An Economic and Cultural Commentary Derek M. Jones derek@knosof.co.uk Copyright 2002-2008 Derek M. Jones. All rights reserved. 39 3.2 3.2 additive operators pointer

More information

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,

More information

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance 6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,

More information

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont.   History Table. Correlating Prediction Table Lecture 15 History Table Correlating Prediction Table Prefetching Latest A0 A0,A1 A3 11 Fall 2018 Jon Beaumont A1 http://www.eecs.umich.edu/courses/eecs470 Prefetch A3 Slides developed in part by Profs.

More information

Concurrent Garbage Collection

Concurrent Garbage Collection Concurrent Garbage Collection Deepak Sreedhar JVM engineer, Azul Systems Java User Group Bangalore 1 @azulsystems azulsystems.com About me: Deepak Sreedhar JVM student at Azul Systems Currently working

More information

Chapter 6 Objectives

Chapter 6 Objectives Chapter 6 Memory Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

15-740/ Computer Architecture

15-740/ Computer Architecture 15-740/18-740 Computer Architecture Lecture 16: Runahead and OoO Wrap-Up Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/17/2011 Review Set 9 Due this Wednesday (October 19) Wilkes, Slave Memories

More information

SECOMP Efficient Formally Secure Compilers to a Tagged Architecture. Cătălin Hrițcu INRIA Paris

SECOMP Efficient Formally Secure Compilers to a Tagged Architecture. Cătălin Hrițcu INRIA Paris SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 1 SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 5 year vision

More information

Introduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem

Introduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem Introduction Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: Increase computation power Make the best use of available bandwidth We study the bandwidth

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

CS527 Software Security

CS527 Software Security Security Policies Purdue University, Spring 2018 Security Policies A policy is a deliberate system of principles to guide decisions and achieve rational outcomes. A policy is a statement of intent, and

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

Portland State University ECE 587/687. Caches and Memory-Level Parallelism Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each

More information

The University of Texas at Austin

The University of Texas at Austin EE382N: Principles in Computer Architecture Parallelism and Locality Fall 2009 Lecture 24 Stream Processors Wrapup + Sony (/Toshiba/IBM) Cell Broadband Engine Mattan Erez The University of Texas at Austin

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

Mark-Sweep and Mark-Compact GC

Mark-Sweep and Mark-Compact GC Mark-Sweep and Mark-Compact GC Richard Jones Anthony Hoskins Eliot Moss Presented by Pavel Brodsky 04/11/14 Our topics today Two basic garbage collection paradigms: Mark-Sweep GC Mark-Compact GC Definitions

More information

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout Ghent University, Belgium

More information