Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez
Memory systems expose an inexpressive interface
- Programs issue arbitrary loads and stores to a flat address space (0x0000-0xFFFF), which the memory hierarchy serves.
- But programmers think in terms of objects (A, B) and pointers among objects.
Modern languages expose an object-based memory model
- The program performs object accesses; the runtime/compiler turns them into loads and stores to the flat address space served by the memory hierarchy.
- Strictly hiding the flat address space provides many benefits:
  - Memory safety prevents memory corruption bugs.
  - Automatic memory management (garbage collection) simplifies programming.
The inexpressive flat address space is inefficient
- There is a semantic gap between programs and the memory hierarchy (core, L1 and L2 caches, main memory).
- Objects (A, B, C) are mismatched with fixed-size cache lines.
- Accesses pay for costly associative lookups.
Hotpads: An object-based memory hierarchy
- A memory hierarchy designed from the ground up for object-based programs:
  - Provides first-class support for objects and pointers in the ISA.
  - Hides the memory layout from software and takes control over it.
- The program issues object operations through the object-based ISA; Hotpads manages objects across a hierarchy of pads (L1, L2, L3).
- Hotpads manages objects instead of cache lines.
- Hotpads rewrites pointers to reduce associative lookups.
- Hotpads provides architectural support for in-hierarchy object allocation and recycling.
Prior architectural support for object-based programs
- Object-oriented/typed systems [iAPX 432; Dally, ISCA 85; CHERI, ISCA 14; Kim et al., ASPLOS 17] focus on core microarchitecture design: they accelerate virtual calls, object references, and dynamic type checks (e.g., type-check and vfunction units next to the core).
- Hardware accelerators for GC [the Lisp Machine; Joao et al., ISCA 09; Maas et al., ISCA 18] add a GC unit next to the core.
- Prior work uses standard cache hierarchies; we focus on redesigning the memory hierarchy itself.
Hotpads overview
- Each level of the hierarchy is a pad (L1, L2, L3). A pad has a data array and c-tags.
- Data array: managed as a circular buffer using simple bump-pointer allocation; stores variable-sized objects compactly, followed by free space.
- C-tags: a decoupled tag store used for only a fraction of accesses.
- Per-word/per-object metadata: pointer?, valid?, dirty?, recently-used?
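The data-array organization above can be modeled in a few lines. This is an illustrative software sketch of bump-pointer allocation into a pad's data array, not the actual Hotpads hardware; class and method names are hypothetical.

```python
class Pad:
    """Toy model of one pad's data array: variable-sized objects are
    packed contiguously with a bump pointer, leaving one contiguous
    region of free space (circular-buffer wraparound omitted)."""

    def __init__(self, size):
        self.size = size
        self.alloc_ptr = 0       # bump pointer into the data array
        self.objects = {}        # address -> (name, length)

    def free_space(self):
        return self.size - self.alloc_ptr

    def bump_alloc(self, name, length):
        """Allocate `length` words at the bump pointer. Returns None when
        the pad is full; in real Hotpads this triggers a collection-eviction."""
        if self.alloc_ptr + length > self.size:
            return None
        addr = self.alloc_ptr
        self.alloc_ptr += length          # objects stored compactly, back to back
        self.objects[addr] = (name, length)
        return addr

pad = Pad(size=64)
a = pad.bump_alloc("A", 3)   # first object lands at address 0
b = pad.bump_alloc("B", 5)   # packed immediately after A, at address 3
```

Unlike a set-associative cache, placement needs no index/tag computation: the next object simply goes wherever the bump pointer is.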
Hotpads example
class Node { int value; Node next; }
Initial state: register file r0-r3; objects A and B live in main memory; the L1 and L2 pads hold only free space.
Hotpads moves objects implicitly
Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value
- The core issues an access to A, and A is copied into the L1 pad.
- All loads/stores use a single addressing mode: base + offset.
- Bump-pointer allocation stores A compactly after other objects.
Hotpads rewrites pointers to avoid associative lookups
Program code: int v = A.value;
Hotpads instruction: ld r0, (r1).value
- When A is copied into the L1 pad, r1 is rewritten to A's L1-pad address.
- Subsequent dereferences of r1 access A's L1 copy directly, without associative lookups (like a scratchpad).
- Hotpads can rewrite pointers safely because it hides the memory layout from software.
Pointer rewriting applies to L1-pad data as well
Program code: v = A.next.value;
Hotpads instructions: derefptr r2, (r1).next; ld r3, (r2).value
- B is copied into L1, and A's pointer to B is rewritten.
- Subsequent dereferences of A.next access the L1 copy of B directly, without associative lookups.
- The c-tags (now holding entries for A and B) let dereferences of other pointers to A and B find their L1 copies.
Hotpads supports in-hierarchy object allocation
Program code: Node C = new Node();
Hotpads instruction: alloc r3, type=Node
- The core allocates the new object C directly in the L1 pad.
- In-hierarchy allocation reduces data movement and requires no backing storage in main memory or larger pads.
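A minimal sketch of what makes in-hierarchy allocation cheap, with hypothetical names; the point is what does NOT happen, compared with a write-allocate cache:

```python
l1_pad = []          # the L1 pad's data array, simplified to a list
main_mem_reads = 0   # a write-allocate cache would fetch a line on the
                     # allocating store; Hotpads performs no such fetch

def alloc(type_name, **fields):
    """Allocate a new object directly in the L1 pad: no main-memory read,
    no backing storage reserved at larger levels. The object reaches a
    larger pad only if it survives a collection-eviction."""
    obj = {"type": type_name, **fields}
    l1_pad.append(obj)            # bump-pointer allocation in the L1 pad
    return len(l1_pad) - 1        # the object's L1-pad address

# Node C = new Node();  ->  alloc r3, type=Node
c = alloc("Node", value=0, next=None)
```

Most objects die young (see the collection-eviction results), so most allocations live and die entirely inside the L1 pad without ever touching main memory.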
Hotpads unifies garbage collection and object evictions
- When a pad fills up, it triggers a collection-eviction (CE) to free space: it discards dead objects and evicts live, non-recently-used objects to the next level in bulk.
- Example: the L1 pad is full, holding A, B, C, and D (main memory still has a stale copy of B). C is dead (unreferenced); the other objects are live, and only B is recently used. The L1 CE collects dead C and evicts live A and D to the L2 pad, leaving a large contiguous chunk of free space.
- CEs happen concurrently with program execution and are hierarchical: each pad can perform a CE independently from larger, higher-level pads, which makes CE cost proportional to pad size.
- Invariant: objects at a particular level may only point to objects at the same or larger levels.
- Result: no need to check the L2 pad when performing a collection-eviction in the L1 pad.
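The A/B/C/D example above can be walked through in code. This is an illustrative, much-simplified model of a single pad's CE (it omits concurrency, pointer fixup, and the per-word metadata); names are hypothetical:

```python
def collection_eviction(pad, roots, next_level):
    """One pad's collection-eviction: mark live objects from the roots,
    discard dead ones in place, keep recently used live ones, and move
    the remaining live objects to the next level in bulk."""
    # Mark phase: everything reachable from the roots is live.
    live, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in pad and name not in live:
            live.add(name)
            stack.extend(pad[name]["refs"])
    # Sweep phase over a snapshot of the pad's contents.
    for name in list(pad):
        if name not in live:
            del pad[name]                      # dead: collected, no writeback
        elif not pad[name]["recent"]:
            next_level[name] = pad.pop(name)   # live, cold: evicted in bulk

# The slide's example: C is unreferenced, only B is recently used.
l1 = {"A": {"refs": ["D"], "recent": False},
      "B": {"refs": [],    "recent": True},
      "C": {"refs": [],    "recent": False},
      "D": {"refs": [],    "recent": False}}
l2 = {}
collection_eviction(l1, roots=["A", "B"], next_level=l2)
# l1 now holds only B; A and D moved to l2; C vanished without a writeback.
```

Note how the level invariant pays off here: because L2 objects cannot point into L1, the mark phase never has to scan the (much larger) L2 pad.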
Collection-evictions reduce data movement
- Hotpads unifies the locality principle and the generational hypothesis: it acts like a super-generational collector.
- Accesses to short-lived objects are cheap and fast, and most main-memory data is live.
- Most objects are collected in the L1 pad; 90% of object bytes never reach main memory.
See the paper for additional features
- Supporting large objects with subobject fetches
- Object-level pad coherence
- A legacy mode to support flat-address-based programs
- ...and more details!
Evaluation
- We simulate Hotpads using MaxSim [Rodchenko et al., ISPASS 17], a simulator combining ZSim and the Maxine JVM.
- Modeled system: 4 out-of-order cores with a 3-level cache or pad hierarchy (per-core L1s and L2s, shared L3).
- Workloads: 13 Java workloads from DaCapo, SPECjbb, and JGraphT, with the JVM modified to use the Hotpads ISA.
Hotpads outperforms conventional hierarchies
- 34% performance improvement, because:
  1. In-hierarchy allocation reduces memory stalls in application code.
  2. Hardware-based collection-evictions reduce GC overheads.
Hotpads reduces dynamic memory hierarchy energy
- 2.6x energy reduction, because:
  1. Pointer rewriting and direct accesses reduce L1 energy by 2.3x.
  2. Hierarchical collection-evictions reduce memory and GC energy.
Hotpads also provides benefits on compiled code
- We study an allocation-heavy binary-tree benchmark written in C and compare Hotpads with tcmalloc, a state-of-the-art memory allocator.
- Hotpads improves performance and energy efficiency over manual memory management (2.7x and 3.6x reductions, respectively).
See the paper for more results
- Results for multithreaded workloads
- Detailed analysis of pointer rewriting and CEs
- Comparisons with other cache-based techniques: an enhanced baseline using DRRIP and stream prefetchers, and cache scrubbing and zeroing [Sartor et al., PACT 14]
- Legacy-mode performance on SPEC CPU apps
An object-based memory hierarchy provides tremendous benefits
- Modern programs operate on objects, not cache lines.
- Hotpads is an object-based memory hierarchy that supports objects in the ISA and hides the memory layout.
- Hotpads outperforms conventional cache hierarchies because it:
  - Moves objects rather than cache lines
  - Avoids most associative lookups with pointer rewriting
  - Provides hardware support for in-hierarchy allocation and unified collection-eviction
- Hotpads also unlocks new memory hierarchy optimizations.
Thanks! Questions?
More informationChapter 10: Case Studies. So what happens in a real operating system?
Chapter 10: Case Studies So what happens in a real operating system? Operating systems in the real world Studied mechanisms used by operating systems Processes & scheduling Memory management File systems
More informationLimiting the Number of Dirty Cache Lines
Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology
More informationMemory. Objectives. Introduction. 6.2 Types of Memory
Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationUsing the Singularity Research Development Kit
Using the Research Development Kit James Larus & Galen Hunt Microsoft Research ASPLOS 08 Tutorial March 1, 2008 Outline Overview (Jim) Rationale & key decisions architecture Details (Galen) Safe Languages
More informationLECTURE 12. Virtual Memory
LECTURE 12 Virtual Memory VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a cache for magnetic disk. The mechanism by which this is accomplished
More informationArchitecture-Conscious Database Systems
Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query
More informationAchieving Out-of-Order Performance with Almost In-Order Complexity
Achieving Out-of-Order Performance with Almost In-Order Complexity Comprehensive Examination Part II By Raj Parihar Background Info: About the Paper Title Achieving Out-of-Order Performance with Almost
More informationMemory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar
More informationFile Systems. Kartik Gopalan. Chapter 4 From Tanenbaum s Modern Operating System
File Systems Kartik Gopalan Chapter 4 From Tanenbaum s Modern Operating System 1 What is a File System? File system is the OS component that organizes data on the raw storage device. Data, by itself, is
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationThe Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!
The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"
More informationAutomatic Memory Management
Automatic Memory Management Why Automatic Memory Management? Storage management is still a hard problem in modern programming Why Automatic Memory Management? Storage management is still a hard problem
More informationShenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke
Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1 What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationCPSC 213. Introduction to Computer Systems. Winter Session 2017, Term 2. Unit 1c Jan 24, 26, 29, 31, and Feb 2
CPSC 213 Introduction to Computer Systems Winter Session 2017, Term 2 Unit 1c Jan 24, 26, 29, 31, and Feb 2 Instance Variables and Dynamic Allocation Overview Reading Companion: Reference 2.4.4-5 Textbook:
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationHybrid Static-Dynamic Analysis for Statically Bounded Region Serializability
Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming
More informationCHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang
CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365
More informationName, Scope, and Binding. Outline [1]
Name, Scope, and Binding In Text: Chapter 3 Outline [1] Variable Binding Storage bindings and lifetime Type bindings Type Checking Scope Lifetime vs. Scope Referencing Environments N. Meng, S. Arthur 2
More informationJOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl
JOP: A Java Optimized Processor for Embedded Real-Time Systems Martin Schoeberl JOP Research Targets Java processor Time-predictable architecture Small design Working solution (FPGA) JOP Overview 2 Overview
More information15-740/ Computer Architecture
15-740/18-740 Computer Architecture Lecture 19: Caching II Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/31/2011 Announcements Milestone II Due November 4, Friday Please talk with us if you
More informationThe C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf
The C4 Collector Or: the Application memory wall will remain until compaction is solved Gil Tene Balaji Iyengar Michael Wolf High Level Agenda 1. The Application Memory Wall 2. Generational collection
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationGPU Fundamentals Jeff Larkin November 14, 2016
GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate
More informationLecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"
Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3
More informationArea-Efficient Error Protection for Caches
Area-Efficient Error Protection for Caches Soontae Kim Department of Computer Science and Engineering University of South Florida, FL 33620 sookim@cse.usf.edu Abstract Due to increasing concern about various
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016
Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss
More informationAnnouncements. ! Previous lecture. Caches. Inf3 Computer Architecture
Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for
More informationCooperative Cache Scrubbing
Cooperative Cache Scrubbing Jennifer B. Sartor Ghent University Belgium Wim Heirman Intel ExaScience Lab Belgium Stephen M. Blackburn Australian National University Australia Lieven Eeckhout Ghent University
More information15-740/ Computer Architecture Lecture 12: Advanced Caching. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 12: Advanced Caching Prof. Onur Mutlu Carnegie Mellon University Announcements Chuck Thacker (Microsoft Research) Seminar Tomorrow RARE: Rethinking Architectural
More informationCHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN
CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY
More informationRun-Time Environments/Garbage Collection
Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs
More informationThe New C Standard (Excerpted material)
The New C Standard (Excerpted material) An Economic and Cultural Commentary Derek M. Jones derek@knosof.co.uk Copyright 2002-2008 Derek M. Jones. All rights reserved. 39 3.2 3.2 additive operators pointer
More informationExam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence
Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationEECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table
Lecture 15 History Table Correlating Prediction Table Prefetching Latest A0 A0,A1 A3 11 Fall 2018 Jon Beaumont A1 http://www.eecs.umich.edu/courses/eecs470 Prefetch A3 Slides developed in part by Profs.
More informationConcurrent Garbage Collection
Concurrent Garbage Collection Deepak Sreedhar JVM engineer, Azul Systems Java User Group Bangalore 1 @azulsystems azulsystems.com About me: Deepak Sreedhar JVM student at Azul Systems Currently working
More informationChapter 6 Objectives
Chapter 6 Memory Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationSoftware-Controlled Multithreading Using Informing Memory Operations
Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University
More information15-740/ Computer Architecture
15-740/18-740 Computer Architecture Lecture 16: Runahead and OoO Wrap-Up Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/17/2011 Review Set 9 Due this Wednesday (October 19) Wilkes, Slave Memories
More informationSECOMP Efficient Formally Secure Compilers to a Tagged Architecture. Cătălin Hrițcu INRIA Paris
SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 1 SECOMP Efficient Formally Secure Compilers to a Tagged Architecture Cătălin Hrițcu INRIA Paris 5 year vision
More informationIntroduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem
Introduction Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: Increase computation power Make the best use of available bandwidth We study the bandwidth
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationCS527 Software Security
Security Policies Purdue University, Spring 2018 Security Policies A policy is a deliberate system of principles to guide decisions and achieve rational outcomes. A policy is a statement of intent, and
More informationHardware-Based Speculation
Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationPortland State University ECE 587/687. Caches and Memory-Level Parallelism
Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each
More informationThe University of Texas at Austin
EE382N: Principles in Computer Architecture Parallelism and Locality Fall 2009 Lecture 24 Stream Processors Wrapup + Sony (/Toshiba/IBM) Cell Broadband Engine Mattan Erez The University of Texas at Austin
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationMark-Sweep and Mark-Compact GC
Mark-Sweep and Mark-Compact GC Richard Jones Anthony Hoskins Eliot Moss Presented by Pavel Brodsky 04/11/14 Our topics today Two basic garbage collection paradigms: Mark-Sweep GC Mark-Compact GC Definitions
More informationBoosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout Ghent University, Belgium
More information