Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez


Memory systems expose an inexpressive interface

The program sees a flat address space (0x0000-0xFFFF) and issues arbitrary loads and stores to the memory hierarchy, which holds objects such as Obj. A and Obj. B. Programmers, however, think in terms of objects and pointers among objects.

Modern languages expose an object-based memory model

The program performs object accesses against an object-based model; the runtime/compiler translates these into loads and stores to the flat address space (0x0000-0xFFFF, holding Obj. A and Obj. B) served by the memory hierarchy.

Strictly hiding the flat address space provides many benefits:
Memory safety prevents memory corruption bugs
Automatic memory management (garbage collection) simplifies programming

The inexpressive flat address space is inefficient

There is a semantic gap between programs and the memory hierarchy: the runtime/compiler flattens the object-based model onto a flat address space (0x0000-0xFFFF) backed by a core, L1 $, L2 $, and main memory. Objects A, B, and C end up scattered across cache lines, causing a mismatch between objects and cache lines and costly associative lookups.

Hotpads: An object-based memory hierarchy

A memory hierarchy designed from the ground up for object-based programs:
Provides first-class support for objects and pointers in the ISA
Hides the memory layout from software and takes control over it

The program issues object operations through an object-based ISA, and Hotpads manages objects across a hierarchy of pads (core, L1 pad, L2 pad, L3 pad):
Hotpads manages objects instead of cache lines
Hotpads rewrites pointers to reduce associative lookups
Hotpads provides architectural support for in-hierarchy object allocation and recycling

Prior architectural support for object-based programs

Object-oriented/typed systems [iAPX 432, Dally ISCA 85, CHERI ISCA 14, Kim et al. ASPLOS 17] focus on core microarchitecture design (e.g., a type-check unit and a vfunction unit next to the core) to accelerate virtual calls, object references, and dynamic type checks.

Hardware accelerators for GC [The Lisp Machine, Joao et al. ISCA 09, Maas et al. ISCA 18] add a GC unit next to the core.

Prior work uses standard cache hierarchies (L1 $, L2 $); we focus on redesigning the memory hierarchy itself.

Hotpads overview

Each level of the hierarchy (L1 pad, L2 pad, L3 pad) consists of:
Data array: managed as a circular buffer using simple bump-pointer allocation; stores variable-sized objects (e.g., Obj. A, Obj. B) compactly, packed before the free space
C-Tags: a decoupled tag store used only for a fraction of accesses
Per-word and per-object metadata: pointer? valid? dirty? recently-used? bits
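Bump-pointer allocation in the data array can be sketched as follows. This is a minimal software model under stated assumptions: the class and method names are illustrative, and the real pad is a circular buffer maintained in hardware, which this linear sketch simplifies.

```java
// Minimal sketch of bump-pointer allocation in a pad's data array.
// Illustrative only; the real Hotpads pad is a hardware circular buffer.
public class PadAlloc {
    private final int capacity;
    private int bump = 0;  // next free offset in the data array

    public PadAlloc(int capacity) { this.capacity = capacity; }

    // Returns the new object's offset, or -1 if the pad is full
    // (in Hotpads, running out of space triggers a collection-eviction).
    public int alloc(int size) {
        if (bump + size > capacity) return -1;
        int offset = bump;
        bump += size;      // O(1) bump; variable-sized objects stay packed
        return offset;
    }
}
```

Because allocation is just a pointer bump, consecutive objects land back-to-back, which is why the data array stores variable-sized objects compactly.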

Hotpads example

class Node { int value; Node next; }

Initial state: the register file (r0-r3) holds pointers; objects A and B reside in main memory, and the L1 and L2 pads hold only free space.

Hotpads moves objects implicitly

Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value

The core issues an access to A, and A is copied into the L1 pad. All loads and stores follow a single addressing mode: base+offset. Bump-pointer allocation stores A compactly after the other objects.

Hotpads rewrites pointers to avoid associative lookups

Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value

The core issues an access to A; A is copied into the L1 pad, and r1 is rewritten to A's L1 pad address. Subsequent dereferences of r1 access A's L1 copy directly, without associative lookups (like a scratchpad). Hotpads can rewrite pointers safely because it hides the memory layout from software.
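The effect of pointer rewriting can be modeled in software as follows. This is a toy model, not the hardware mechanism: the L1-bit pointer encoding and the map standing in for the c-tags are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy software model of pointer rewriting (not the actual hardware).
// A pointer whose (assumed) L1 bit is set already names an L1 copy and
// dereferences directly; otherwise the c-tags (modeled as a map) are
// consulted once, and the returned, rewritten pointer is used from then on.
public class Rewrite {
    static final long L1_BIT = 1L << 40;            // assumed encoding
    static Map<Long, Long> cTags = new HashMap<>(); // canonical addr -> L1 addr
    static int associativeLookups = 0;

    // Returns the pointer the caller should use from now on.
    static long deref(long ptr) {
        if ((ptr & L1_BIT) != 0) return ptr;        // direct, scratchpad-like hit
        associativeLookups++;                       // one c-tag lookup
        return cTags.computeIfAbsent(ptr, p -> p | L1_BIT); // fetch + rewrite
    }
}
```

Only the first dereference pays for an associative lookup; every later dereference through the rewritten pointer goes straight to the L1 copy.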

Pointer rewriting applies to L1 pad data as well

Program code: v = A.next.value;
Hotpads instructions: derefptr r2, (r1).next; ld r3, (r2).value

B is copied into L1, and A's pointer to it is rewritten. Subsequent dereferences of A.next access the L1 copy of B directly, without associative lookups. The c-tags (now holding A and B) let dereferences of other pointers to A and B find their L1 copies.

Hotpads supports in-hierarchy object allocation

Program code: Node C = new Node();
Hotpads instruction: alloc r3, type=Node

The core allocates new object C directly in the L1 pad. In-hierarchy allocation reduces data movement and requires no backing storage in main memory or larger pads.

Hotpads unifies garbage collection and object evictions

When a pad fills up, it triggers a collection-eviction (CE) to free space:
Discards dead objects
Evicts live, non-recently-used objects to the next level in bulk

Example: the L1 pad is full, holding A, B, C, and D (with a stale copy of B in main memory). C is dead (unreferenced); the other objects are live, and only B is recently used. The L1 collection-eviction collects dead C and evicts live A and D to the L2 pad, leaving a large contiguous chunk of free space.

CEs happen concurrently with program execution and are hierarchical: each pad can perform a CE independently from larger, higher-level pads, making CE cost proportional to pad size.

Invariant: Objects at a particular level may only point to objects at the same or larger levels.
Result: No need to check the L2 pad when performing a collection-eviction in the L1 pad.
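The example above can be sketched as a toy collection-eviction, under the invariant that L1 objects are referenced only from the core's registers or other L1 objects. The object graph, names, and recently-used bits below are illustrative assumptions, not the hardware implementation.

```java
import java.util.*;

// Toy model of a collection-eviction (CE): mark objects reachable from the
// roots (registers and other L1 objects), keep recently-used live objects,
// evict the rest of the live objects, and drop dead ones. Illustrative only.
public class CE {
    static class Obj {
        String name; boolean recentlyUsed; List<Obj> refs = new ArrayList<>();
        Obj(String n, boolean ru) { name = n; recentlyUsed = ru; }
    }

    // Returns [kept-in-L1, evicted-to-L2]; unmarked objects are dead and vanish.
    static List<Set<String>> collectEvict(List<Obj> pad, List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {               // mark reachable objects
            Obj o = stack.pop();
            if (live.add(o)) stack.addAll(o.refs);
        }
        Set<String> keep = new TreeSet<>(), evict = new TreeSet<>();
        for (Obj o : pad)
            if (live.contains(o)) (o.recentlyUsed ? keep : evict).add(o.name);
        return List.of(keep, evict);             // dead objects are reclaimed
    }
}
```

On the slide's example (A, B, C, D in the pad; C unreferenced; only B recently used), this keeps B, evicts A and D, and collects C.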

Collection-evictions reduce data movement

Hotpads unifies the locality principle and the generational hypothesis, acting like a super-generational collector:
Accesses to short-lived objects are cheap and fast
Most main-memory data is live
Most objects are collected in the L1 pad
90% of object bytes never reach main memory

See paper for additional features

Supporting large objects with subobject fetches
Object-level pad coherence
Legacy mode to support flat-address-based programs
and more details!

Evaluation

We simulate Hotpads using MaxSim [Rodchenko et al., ISPASS 17], a simulator combining ZSim and the Maxine JVM.

Modeled system: 4 OOO cores with a 3-level cache or pad hierarchy (per-core L1 and L2, shared L3).

Workloads: 13 Java workloads from DaCapo, SPECjbb, and JGraphT, with the JVM modified to use the Hotpads ISA.

Hotpads outperforms conventional hierarchies

34% performance improvement:
1. In-hierarchy allocation reduces memory stalls in application code
2. Hardware-based collection-evictions reduce GC overheads

Hotpads reduces dynamic memory hierarchy energy

2.6x reduction:
1. Pointer rewriting and direct accesses reduce L1 energy by 2.3x
2. Hierarchical collection-evictions reduce memory and GC energy

Hotpads also provides benefits on compiled code

We study an allocation-heavy binary-tree benchmark written in C and compare Hotpads with tcmalloc, a state-of-the-art memory allocator. Hotpads improves performance and energy efficiency over manual memory management (2.7x and 3.6x reductions).

See paper for more results

Results for multithreaded workloads
Detailed analysis of pointer rewriting and CEs
Comparison with other cache-based techniques:
Enhanced baseline using DRRIP and stream prefetchers
Cache scrubbing and zeroing [Sartor et al., PACT 14]
Legacy mode performance on SPEC CPU apps

An object-based memory hierarchy provides tremendous benefits

Modern programs operate on objects, not cache lines.
Hotpads is an object-based memory hierarchy that supports objects in the ISA and hides the memory layout.
Hotpads outperforms conventional cache hierarchies because it:
Moves objects rather than cache lines
Avoids most associative lookups with pointer rewriting
Provides hardware support for in-hierarchy allocation and unified collection-eviction
Hotpads also unlocks new memory hierarchy optimizations.

Thanks! Questions?


More information

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill Binding and Storage Björn B. Brandenburg The University of North Carolina at Chapel Hill Based in part on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. What s

More information

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands Grassroots ASPLOS can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands ASPLOS-17 Doctoral Workshop London, March 4th, 2012 1 Current

More information

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs Performance of Non-Moving Garbage Collectors Hans-J. Boehm HP Labs Why Use (Tracing) Garbage Collection to Reclaim Program Memory? Increasingly common Java, C#, Scheme, Python, ML,... gcc, w3m, emacs,

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Computer Systems A Programmer s Perspective 1 (Beta Draft) Computer Systems A Programmer s Perspective 1 (Beta Draft) Randal E. Bryant David R. O Hallaron August 1, 2001 1 Copyright c 2001, R. E. Bryant, D. R. O Hallaron. All rights reserved. 2 Contents Preface

More information

STEPS Towards Cache-Resident Transaction Processing

STEPS Towards Cache-Resident Transaction Processing STEPS Towards Cache-Resident Transaction Processing Stavros Harizopoulos joint work with Anastassia Ailamaki VLDB 2004 Carnegie ellon CPI OLTP workloads on modern CPUs 6 4 2 L2-I stalls L2-D stalls L1-I

More information

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

Chapter 10: Case Studies. So what happens in a real operating system?

Chapter 10: Case Studies. So what happens in a real operating system? Chapter 10: Case Studies So what happens in a real operating system? Operating systems in the real world Studied mechanisms used by operating systems Processes & scheduling Memory management File systems

More information

Limiting the Number of Dirty Cache Lines

Limiting the Number of Dirty Cache Lines Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Using the Singularity Research Development Kit

Using the Singularity Research Development Kit Using the Research Development Kit James Larus & Galen Hunt Microsoft Research ASPLOS 08 Tutorial March 1, 2008 Outline Overview (Jim) Rationale & key decisions architecture Details (Galen) Safe Languages

More information

LECTURE 12. Virtual Memory

LECTURE 12. Virtual Memory LECTURE 12 Virtual Memory VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a cache for magnetic disk. The mechanism by which this is accomplished

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query

More information

Achieving Out-of-Order Performance with Almost In-Order Complexity

Achieving Out-of-Order Performance with Almost In-Order Complexity Achieving Out-of-Order Performance with Almost In-Order Complexity Comprehensive Examination Part II By Raj Parihar Background Info: About the Paper Title Achieving Out-of-Order Performance with Almost

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

File Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System

File Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System File Systems Kartik Gopalan Chapter 4 From Tanenbaum s Modern Operating System 1 What is a File System? File system is the OS component that organizes data on the raw storage device. Data, by itself, is

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"

More information

Automatic Memory Management

Automatic Memory Management Automatic Memory Management Why Automatic Memory Management? Storage management is still a hard problem in modern programming Why Automatic Memory Management? Storage management is still a hard problem

More information

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1 What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

CPSC 213. Introduction to Computer Systems. Winter Session 2017, Term 2. Unit 1c Jan 24, 26, 29, 31, and Feb 2

CPSC 213. Introduction to Computer Systems. Winter Session 2017, Term 2. Unit 1c Jan 24, 26, 29, 31, and Feb 2 CPSC 213 Introduction to Computer Systems Winter Session 2017, Term 2 Unit 1c Jan 24, 26, 29, 31, and Feb 2 Instance Variables and Dynamic Allocation Overview Reading Companion: Reference 2.4.4-5 Textbook:

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

Name, Scope, and Binding. Outline [1]

Name, Scope, and Binding. Outline [1] Name, Scope, and Binding In Text: Chapter 3 Outline [1] Variable Binding Storage bindings and lifetime Type bindings Type Checking Scope Lifetime vs. Scope Referencing Environments N. Meng, S. Arthur 2

More information

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl JOP: A Java Optimized Processor for Embedded Real-Time Systems Martin Schoeberl JOP Research Targets Java processor Time-predictable architecture Small design Working solution (FPGA) JOP Overview 2 Overview

More information

15-740/ Computer Architecture

15-740/ Computer Architecture 15-740/18-740 Computer Architecture Lecture 19: Caching II Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/31/2011 Announcements Milestone II Due November 4, Friday Please talk with us if you

More information

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Area-Efficient Error Protection for Caches

Area-Efficient Error Protection for Caches Area-Efficient Error Protection for Caches Soontae Kim Department of Computer Science and Engineering University of South Florida, FL 33620 sookim@cse.usf.edu Abstract Due to increasing concern about various

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for

More information

Cooperative Cache Scrubbing

Cooperative Cache Scrubbing Cooperative Cache Scrubbing Jennifer B. Sartor Ghent University Belgium Wim Heirman Intel ExaScience Lab Belgium Stephen M. Blackburn Australian National University Australia Lieven Eeckhout Ghent University

More information

15-740/ Computer Architecture Lecture 12: Advanced Caching. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 12: Advanced Caching. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 12: Advanced Caching Prof. Onur Mutlu Carnegie Mellon University Announcements Chuck Thacker (Microsoft Research) Seminar Tomorrow RARE: Rethinking Architectural

More information

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY

More information

Run-Time Environments/Garbage Collection

Run-Time Environments/Garbage Collection Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs

More information

The New C Standard (Excerpted material)

The New C Standard (Excerpted material) The New C Standard (Excerpted material) An Economic and Cultural Commentary Derek M. Jones derek@knosof.co.uk Copyright 2002-2008 Derek M. Jones. All rights reserved. 39 3.2 3.2 additive operators pointer

More information

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,

More information

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance 6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,

More information

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont.   History Table. Correlating Prediction Table Lecture 15 History Table Correlating Prediction Table Prefetching Latest A0 A0,A1 A3 11 Fall 2018 Jon Beaumont A1 http://www.eecs.umich.edu/courses/eecs470 Prefetch A3 Slides developed in part by Profs.

More information

Concurrent Garbage Collection

Concurrent Garbage Collection Concurrent Garbage Collection Deepak Sreedhar JVM engineer, Azul Systems Java User Group Bangalore 1 @azulsystems azulsystems.com About me: Deepak Sreedhar JVM student at Azul Systems Currently working

More information

Chapter 6 Objectives

Chapter 6 Objectives Chapter 6 Memory Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

15-740/ Computer Architecture

15-740/ Computer Architecture 15-740/18-740 Computer Architecture Lecture 16: Runahead and OoO Wrap-Up Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/17/2011 Review Set 9 Due this Wednesday (October 19) Wilkes, Slave Memories

More information

SECOMP Efficient Formally Secure Compilers to a Tagged Architecture. Cătălin Hrițcu INRIA Paris

SECOMP Efficient Formally Secure Compilers to a Tagged Architecture. Cătălin Hrițcu INRIA Paris SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 1 SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 5 year vision

More information

Introduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem

Introduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem Introduction Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: Increase computation power Make the best use of available bandwidth We study the bandwidth

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

CS527 Software Security

CS527 Software Security Security Policies Purdue University, Spring 2018 Security Policies A policy is a deliberate system of principles to guide decisions and achieve rational outcomes. A policy is a statement of intent, and

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

Portland State University ECE 587/687. Caches and Memory-Level Parallelism Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each

More information

The University of Texas at Austin

The University of Texas at Austin EE382N: Principles in Computer Architecture Parallelism and Locality Fall 2009 Lecture 24 Stream Processors Wrapup + Sony (/Toshiba/IBM) Cell Broadband Engine Mattan Erez The University of Texas at Austin

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

Mark-Sweep and Mark-Compact GC

Mark-Sweep and Mark-Compact GC Mark-Sweep and Mark-Compact GC Richard Jones Anthony Hoskins Eliot Moss Presented by Pavel Brodsky 04/11/14 Our topics today Two basic garbage collection paradigms: Mark-Sweep GC Mark-Compact GC Definitions

More information

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout Ghent University, Belgium

More information