Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation
1 Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation
Xiangyao Yu (1), Christopher Hughes (2), Nadathur Satish (2), Onur Mutlu (3), Srinivas Devadas (1)
(1) MIT, (2) Intel Labs, (3) ETH Zürich
2 High-Bandwidth In-Package DRAM
In-package DRAM has
- 5X higher bandwidth than off-package DRAM
- Similar latency to off-package DRAM
- Limited capacity (up to 16 GB)
In-package DRAM can be used as a cache
[Figure: on-chip core, SRAM hierarchy, and memory controller connected to in-package DRAM (16 GB, > 400 GB/s) and off-package DRAM (384 GB, 90 GB/s)]
* Numbers from Intel Knights Landing
3 Bandwidth Inefficiency in Existing DRAM Designs
Drawback 1: Metadata traffic (e.g., tags, LRU bits, frequency counters)
[Figure: DRAM traffic breakdown in bytes per instruction (hit, metadata) for Unison, Alloy, BEAR, and Tagless]
4 Bandwidth Inefficiency in Existing DRAM Designs
Drawback 1: Metadata traffic (e.g., tags, LRU bits, frequency counters)
Drawback 2: Replacement traffic
- Especially for coarse-granularity (e.g., page-granularity) DRAM cache designs
[Figure: DRAM traffic breakdown in bytes per instruction (hit, metadata, replacement) for Unison, Alloy, BEAR, and Tagless, with designs labeled coarse-granularity or fine-granularity]
5 Banshee Improves DRAM Bandwidth Efficiency
Idea 1: Page-table-based contents tracking with efficient translation lookaside buffer (TLB) coherence
- Track contents of DRAM cache using page tables and TLBs
- Lightweight TLB coherence mechanism
[Figure: DRAM traffic breakdown in bytes per instruction (hit, metadata, replacement) for Unison, Alloy, BEAR, Tagless, and Banshee (this paper)]
6 Banshee Improves DRAM Bandwidth Efficiency
Idea 1: Page-table-based contents tracking with efficient translation lookaside buffer (TLB) coherence
Idea 2: Bandwidth-aware frequency-based replacement (FBR) policy
- Replacement traffic reduction: Limit the rate of DRAM cache replacement
- Metadata traffic reduction: Access metadata for a sampled fraction of memory accesses
[Figure: DRAM traffic breakdown in bytes per instruction (hit, metadata, replacement) for Unison, Alloy, BEAR, Tagless, and Banshee (this paper)]
7 Page-Table-Based DRAM Contents Tracking
Track DRAM cache contents using the virtual memory mechanism
Advantage
- Zero overhead for tag storage and lookup
Disadvantage
- TLB coherence overhead
- Replacement overhead
[Figure: software page table (page table entry: PPN) and hardware TLB (TLB entry: VPN, PPN), alongside the core, SRAM hierarchy, memory controller, and in-package and off-package DRAM]
* Assuming 4-way set-associative DRAM cache
8 Idea 1: Efficient TLB Coherence
Track DRAM cache contents using page tables and TLBs
[Figure: page table entries (PPN, Mapping) in software and TLB entries (VPN, PPN, Mapping) in hardware; the mapping holds Cached (1 bit) and Way (2 bits)*; core, SRAM hierarchy, memory controller, in-package and off-package DRAM]
* Assuming 4-way set-associative DRAM cache
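The mapping field on this slide can be illustrated with a small Python sketch that packs a cached bit and a 2-bit way index into three bits. The bit positions and the names `encode_mapping` and `decode_mapping` are assumptions made for illustration; the slides do not specify the actual layout.

```python
from typing import Tuple

WAY_BITS = 2  # 4-way set-associative DRAM cache => 2 bits select the way


def encode_mapping(cached: bool, way: int) -> int:
    """Pack (cached, way) into 3 bits: [cached | way1 way0] (assumed layout)."""
    if not 0 <= way < (1 << WAY_BITS):
        raise ValueError("way must fit in WAY_BITS bits")
    return (int(cached) << WAY_BITS) | way


def decode_mapping(mapping: int) -> Tuple[bool, int]:
    """Unpack the 3-bit mapping back into (cached, way)."""
    cached = bool((mapping >> WAY_BITS) & 1)
    way = mapping & ((1 << WAY_BITS) - 1)
    return cached, way
```

With this layout, a page cached in way 3 encodes to 7, and an uncached page with way 0 encodes to 0.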
9 Idea 1: Efficient TLB Coherence
Track DRAM cache contents using page tables and TLBs
Maintain latest mapping for recently remapped pages in the Tag Buffer
[Figure: page table entries (PPN, Mapping) in software and TLB entries (VPN, PPN, Mapping) in hardware; the mapping holds Cached (1 bit) and Way (2 bits)*; a Tag Buffer with entries (PPN, V, Mapping) sits at the memory controller]
* Assuming 4-way set-associative DRAM cache
10 Idea 1: Efficient TLB Coherence
Track DRAM cache contents using page tables and TLBs
Maintain latest mapping for recently remapped pages in the Tag Buffer
Enforce TLB coherence lazily when the Tag Buffer is full to amortize the cost
[Figure: page table entries (PPN, Mapping) in software, with a reverse mapping that finds all PTEs that map to a given PPN; TLB entries (VPN, PPN, Mapping) in hardware; the mapping holds Cached (1 bit) and Way (2 bits)*; a Tag Buffer with entries (PPN, V, Mapping) sits at the memory controller]
* Assuming 4-way set-associative DRAM cache
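The lazy coherence scheme described on these slides can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not Banshee's implementation: the class name `TagBuffer`, the dictionary-based storage, and the `flush_callback` hook (standing in for the software step that updates page tables through the reverse mapping and then refreshes TLBs) are all hypothetical.

```python
class TagBuffer:
    """Holds the latest mapping for recently remapped pages (illustrative model)."""

    def __init__(self, capacity, flush_callback):
        self.capacity = capacity
        self.entries = {}  # PPN -> latest mapping
        # Assumed hook: software updates PTEs and TLBs for the flushed pages.
        self.flush_callback = flush_callback

    def lookup(self, ppn, tlb_mapping):
        # A Tag Buffer entry overrides a possibly stale mapping from the TLB.
        return self.entries.get(ppn, tlb_mapping)

    def record_remap(self, ppn, mapping):
        # Coherence is enforced only when the buffer has no room left,
        # so one flush is amortized over many remappings.
        if ppn not in self.entries and len(self.entries) >= self.capacity:
            self.flush()
        self.entries[ppn] = mapping

    def flush(self):
        self.flush_callback(dict(self.entries))
        self.entries.clear()
```

In this sketch a lookup never triggers coherence work; only a remap that finds the buffer full pays the flush cost.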
11 Idea 2: Bandwidth-Aware Replacement
DRAM cache replacement incurs significant DRAM traffic
- Replacement traffic
- Metadata traffic (e.g., frequency counter lookups/updates)
[Figure: memory controller with hits (64 B) to in-package DRAM and misses (64 B) to off-package DRAM]
12 Idea 2: Bandwidth-Aware Replacement
DRAM cache replacement incurs significant DRAM traffic
- Replacement traffic
- Metadata traffic (e.g., frequency counter lookups/updates)
[Figure: memory controller with hits (64 B) to in-package DRAM, misses (64 B) to off-package DRAM, and replacements (4096 B) between the two]
13 Idea 2: Bandwidth-Aware Replacement
DRAM cache replacement incurs significant DRAM traffic
- Replacement traffic
- Metadata traffic (e.g., frequency counter lookups/updates)
[Figure: memory controller with hits (64 B) and frequency counter accesses to in-package DRAM, misses (64 B) to off-package DRAM, and replacements (4096 B) between the two]
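The byte sizes on these slides make the cost of replacement easy to quantify. The Python sketch below multiplies event counts by the sizes shown (64 B per hit or miss, 4096 B per replacement); the function names and the example counts are assumptions chosen for illustration, and frequency counter traffic is left out because the slides give no size for it.

```python
LINE_BYTES = 64    # bytes moved per hit or miss, as shown on the slide
PAGE_BYTES = 4096  # bytes moved per page replacement, as shown on the slide


def lines_per_replacement():
    """Number of 64 B accesses that move as many bytes as one replacement."""
    return PAGE_BYTES // LINE_BYTES


def traffic_bytes(hits, misses, replacements):
    """Bytes moved by each event category for the given event counts."""
    return {
        "hits": hits * LINE_BYTES,
        "misses": misses * LINE_BYTES,
        "replacements": replacements * PAGE_BYTES,
    }
```

One replacement moves as many bytes as 64 hits, so in a hypothetical run with 1000 hits and 10 replacements, the replacements account for 40960 bytes against 64000 bytes of hit traffic.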
14 Idea 2: Bandwidth-Aware Replacement
DRAM cache replacement incurs significant DRAM traffic
Limit cache replacement rate
- Replace only when the incoming page's frequency counter is greater than the victim page's counter by a threshold
[Figure: memory controller with hits (64 B) and frequency counter accesses to in-package DRAM, misses (64 B) to off-package DRAM, and limited replacements between the two]
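The threshold rule can be sketched as a short Python function. Two points are assumptions rather than details from the slides: the victim is taken to be the cached page with the lowest counter, and "greater by a threshold" is read as `incoming > victim + threshold`. The name `pick_replacement` is hypothetical.

```python
def pick_replacement(cached_counts, incoming_count, threshold):
    """Return the way to evict, or None if the incoming page should not be cached.

    cached_counts: frequency counters of the pages currently in the set,
    indexed by way. The victim is the way with the lowest counter (assumed).
    """
    victim_way = min(range(len(cached_counts)), key=cached_counts.__getitem__)
    # Replace only if the incoming page is hotter than the victim by a margin.
    if incoming_count > cached_counts[victim_way] + threshold:
        return victim_way
    return None
```

For a set with counters `[5, 2, 9, 4]` and a threshold of 2, an incoming page with count 3 is rejected (3 is not greater than 2 + 2), while one with count 5 replaces way 1.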
15 Idea 2: Bandwidth-Aware Replacement
DRAM cache replacement incurs significant DRAM traffic
Limit cache replacement rate
Reduce metadata traffic
- Access frequency counters for a randomly sampled fraction of memory accesses
[Figure: memory controller with hits (64 B) and sampled frequency counter accesses to in-package DRAM, misses (64 B) to off-package DRAM, and limited replacements between the two]
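Sampling the counter updates can be illustrated with the Python sketch below. The function name `sampled_access`, the dictionary of counters, and the `sample_rate` parameter are assumptions for illustration; the slides do not give the sampling rate or the counter format.

```python
import random


def sampled_access(counters, page, sample_rate, rng):
    """Update the page's frequency counter for a random fraction of accesses.

    Returns True when this access touched the counter metadata, which is
    the case that would cost extra DRAM traffic.
    """
    if rng.random() < sample_rate:
        counters[page] = counters.get(page, 0) + 1
        return True
    return False


# Usage: with a 10% sample rate, roughly one access in ten touches metadata.
rng = random.Random(0)
counters = {}
touched = sum(sampled_access(counters, "page_a", 0.1, rng) for _ in range(10000))
```

Because hot pages are accessed more often, they still accumulate higher counts than cold pages, even though most accesses skip the metadata entirely.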
16 Banshee Extensions
Supporting large pages (e.g., 2 MB)
- A large page is cached either in its entirety or not at all
Supporting multi-socket processors
- Coherent DRAM caches
- Partitioned DRAM caches
17 Performance Evaluation
ZSim simulator [1]
16 cores (4-issue, out-of-order, 2.7 GHz)
In-package DRAM (1 GB, 84 GB/s)
Off-package DRAM (21 GB/s)
Tag Buffer
- One Tag Buffer per memory controller (MC)
- [...] entries, 5 KB in size
[1] Sanchez, Daniel, and Christos Kozyrakis. "ZSim: fast and accurate microarchitectural simulation of thousand-core systems." ISCA,
18 Speedup (Normalized to Off-Package DRAM Only)
[Figure: normalized speedup for Unison, Alloy, BEAR, Tagless, and Banshee, with a "Perfect in-package DRAM" reference; annotations: "[...] % within perfect DRAM cache" and "15% improvement"]
Banshee improves performance by 15% on average over the best previous (i.e., BEAR) latency-optimized DRAM cache design
19 DRAM Bandwidth Efficiency
[Figure: in-package DRAM traffic breakdown in bytes per instruction (hit, metadata, replacement) for Unison, Alloy, BEAR, Tagless, and Banshee; annotation: "36% in-package DRAM traffic reduction"]
Banshee reduces in-package DRAM traffic by 36% relative to the best previous design
20 DRAM Bandwidth Efficiency
[Figure, left: in-package DRAM traffic breakdown in bytes per instruction (hit, metadata, replacement) for Unison, Alloy, BEAR, Tagless, and Banshee]
[Figure, right: off-package DRAM traffic in bytes per instruction for Unison, Alloy, BEAR, Tagless, and Banshee]
Banshee reduces in-package DRAM traffic by 36% relative to the best previous design
Banshee reduces off-package DRAM traffic by 3% relative to the best previous design
21 Effect of Replacement Traffic Reduction
[Figure: normalized speedup for Banshee LRU, Banshee FBR (No Sample), and Banshee, with the gains attributed to limiting the replacement rate and to sampling frequency counters]
Limiting replacement rate and sampling frequency counters are both important for bandwidth efficiency in Banshee
22 More Analysis in the Paper
- Performance with large (2 MB) pages
- Balancing in- and off-package DRAM bandwidth
- Overhead for page table update and TLB coherence
- Storing tags in SRAM
- Sweep DRAM cache latency and bandwidth
- Sampling coefficient
- DRAM cache associativity
23 Summary
Need to optimize for bandwidth efficiency to fully exploit the performance of in-package DRAM
Idea 1: Improving page-table-based DRAM cache designs with efficient translation lookaside buffer (TLB) coherence
Idea 2: Bandwidth-aware frequency-based replacement (FBR) policy
Banshee improves performance by 15% and reduces in-package DRAM traffic by 36% over the best previous latency-optimized DRAM cache design
25 Backup Slides
26 Summary of Operational Characteristics of Different State-of-the-Art DRAM Designs
27 Tag Buffer Organization
28 DRAM Layout
In-package DRAM, 4-way set-associative
- Data row layout: 4 KB pages stored in each row
- Metadata row layout: metadata for one set covers 4 cached pages + 5 candidate pages
- Metadata for one page: Tag, Freq Cntr, V, D
29 Speedup Normalized to No
30 In-Package DRAM Traffic Breakdown
31 Off-Package DRAM Traffic
32 Sensitivity to Page Table Update Cost
33 Sensitivity to DRAM Latency and Bandwidth
Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation
Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, Srinivas Devadas (CSAIL MIT, Intel Labs, ETH Zürich)
arXiv v1 [cs.AR], 10 Apr 2017
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationIntro to Computer Architecture, Spring 2012 Midterm Exam II
18-447 Intro to Computer Architecture, Spring 2012 Midterm Exam II Instructor: Onur Mutlu Teaching Assistants: Chris Fallin, Lavanya Subramanian, Abeer Agrawal Date: April 11, 2012 Name: SOLUTIONS Problem
More informationVirtual Memory Nov 9, 2009"
Virtual Memory Nov 9, 2009" Administrivia" 2! 3! Motivations for Virtual Memory" Motivation #1: DRAM a Cache for Disk" SRAM" DRAM" Disk" 4! Levels in Memory Hierarchy" cache! virtual memory! CPU" regs"
More informationvirtual memory Page 1 CSE 361S Disk Disk
CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple
More informationLecture 24: Memory, VM, Multiproc
Lecture 24: Memory, VM, Multiproc Today s topics: Security wrap-up Off-chip Memory Virtual memory Multiprocessors, cache coherence 1 Spectre: Variant 1 x is controlled by attacker Thanks to bpred, x can
More informationReadings and References. Virtual Memory. Virtual Memory. Virtual Memory VPN. Reading. CSE Computer Systems December 5, 2001.
Readings and References Virtual Memory Reading Chapter through.., Operating System Concepts, Silberschatz, Galvin, and Gagne CSE - Computer Systems December, Other References Chapter, Inside Microsoft
More informationCS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation
CS 152 Computer Architecture and Engineering Lecture 9 - Address Translation Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationSystems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-6-) Timothy Roscoe Herbstsemester 26 AS 26 Virtual Memory 8: Virtual Memory Computer Architecture
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationSEESAW: Set Enhanced Superpage Aware caching
SEESAW: Set Enhanced Superpage Aware caching http://synergy.ece.gatech.edu/ Set Associativity Mayank Parasar, Abhishek Bhattacharjee Ω, Tushar Krishna School of Electrical and Computer Engineering Georgia
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More information