ChargeCache. Reducing DRAM Latency by Exploiting Row Access Locality
|
|
- Lucas Cobb
- 5 years ago
- Views:
Transcription
1 ChargeCache Reducing DRAM Latency by Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu
2 Executive Summary Goal: Reduce average DRAM access latency with no modification to the existing DRAM chips Observations: 1) A highly-charged DRAM row can be accessed with low latency 2) A row s charge is restored when the row is accessed 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL) Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again ChargeCache: Low cost & no modifications to the DRAM Higher performance ( % on average for 8-core) Lower DRAM energy (7.9% on average) 2
3 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 3
4 DRAM Stores Data as Charge DRAM Cell Three steps of charge movement MemCtrl 1. Sensing 2. Restore 3. Precharge CPU Sense-Amplifier 4
5 charge DRAM Charge over Time Cell Ready to Access Cell Ready to Precharge Ready to Access Charge Level Data 1 Sense Amplifier Sense-Amplifier ACT Sensing trcd tras Restore R/W PRE Data 0 Precharge time 5
6 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 6
7 charge Accessing Highly-charged Rows Ready to Access Ready to Precharge Cell Data 1 Sense-Amplifier Data 0 ACT Sensing Restore Precharge trcd R/W tras PRE time 7
8 Observation 1 A highly-charged DRAM row can be accessed with low latency trcd: 44% tras: 37% How does a row become highly-charged? 8
9 charge How Does a Row Become Highly-Charged? DRAM cells lose charge over time Two ways of restoring a row s charge: Refresh Operation Access Refresh Access Refresh time 9
10 Observation 2 A row s charge is restored when the row is accessed How likely is a recently-accessed row to be accessed again? 10
11 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 11
12 Fraction of Accesses Row Level Temporal Locality (RLTL) A recently-accessed DRAM row is likely to be accessed again. t-rltl: Fraction of rows that are accessed within time t after their previous access 100% 100% 80% 60% 80% 40% 60% 20% 40% 20% 0% 86% 97% 8ms RLTL for single-core eight-core workloads 12
13 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 13
14 Summary of the Observations 1. A highly-charged DRAM row can be accessed with low latency 2. A row s charge is restored when the row is accessed 3. A recently-accessed DRAM row is likely to be accessed again: Row Level Temporal Locality (RLTL) 14
15 Key Idea Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again 15
16 ChargeCache Overview Memory Controller ChargeCache A D DRAM :A :B :C :D :E :F Requests: A D A ChargeCache Miss: Hit: Use Default Lower Timings 16
17 Area and Power Overhead Modeled with CACTI Area ~5KB for 128-entry ChargeCache 0.24% of a 4MB Last Level Cache (LLC) area Power Consumption 0.15 mw on average (static + dynamic) 0.23% of the 4MB LLC power consumption 17
18 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 18
19 Methodology Simulator DRAM Simulator (Ramulator [Kim+, CAL 15]) Workloads 22 single-core workloads SPEC CPU2006, TPC, STREAM 20 multi-programmed 8-core workloads By randomly choosing from single-core workloads Execute at least 1 billion representative instructions per core (Pinpoints) System Parameters 1/8 core system with 4MB LLC Default trcd/tras of 11/28 cycles 19
20 Mechanisms Evaluated Non-Uniform Access Time Memory Controller (NUAT) [Shin+, HPCA 14] Key idea: Access only recently-refreshed rows with lower timing parameters Recently-refreshed rows can be accessed faster Only a small fraction (10-12%) of accesses go to recently-refreshed rows ChargeCache Recently-accessed rows can be accessed faster A large fraction (86-97%) of accesses go to recentlyaccessed rows (RLTL) 128 entries per core, On hit: trcd-7, tras-20 cycles Upper Bound: Low Latency DRAM Works as ChargeCache with 100% Hit Ratio On all DRAM accesses: trcd-7, tras-20 cycles 20
21 Speedup Single-core Performance NUAT ChargeCache + NUAT ChargeCache LL-DRAM (Upper bound) 16% 12% 8% 4% 0% ChargeCache improves single-core performance 21
22 Speedup Eight-core Performance NUAT 2.5% ChargeCache 9% ChargeCache + NUAT LL-DRAM (Upperbound) 13% 16% 12% 8% 4% 0% ChargeCache significantly improves multi-core performance 22
23 DRAM Energy Reduction DRAM Energy Savings 15% Average Maximum 10% 5% 0% Single-core Eight-core ChargeCache reduces DRAM energy 23
24 Other Results In The Paper Detailed analysis of the Row Level Temporal Locality phenomenon ChargeCache hit-rate analysis Sensitivity studies osensitivity to t in t-rltl ochargecache capacity 24
25 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion 25
26 Conclusion ChargeCache reduces average DRAM access latency at low cost Observations: 1) A highly-charged DRAM row can be accessed with low latency 2) A row s charge is restored when the row is accessed 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL) Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again ChargeCache: Low cost & no modifications to the DRAM Higher performance ( % on average for 8-core) Lower DRAM energy (7.9% on average) Source code will be available in May 26
27 ChargeCache Reducing DRAM Latency by Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu
28 Backup Slides 28
29 Detailed Design Highly-charged Row Address Cache (HCRAC) 1 PRE Insert Row Address 3 Invalidation Mechanism 2 ACT Lookup the Address 29
30 Fraction of Accesses tpch6 apache20 GemsFDTD mcf sphinx3 tpch2 astar hmmer milc bwaves lbm omnetpp tonto bzip2 lislie3d sjeng tpcc64 cactusadm libquantum spolex tpch17 STREAMcopy AVERAGE RLTL Distribution 0.125ms - RLTL 0.25ms - RLTL 0.5ms - RLTL 1ms - RLTL 32ms - RLTL
31 Sensitivity on Capacity 31
32 Hit-rate Analysis 32
33 Sensitivity on t-rltl 33
SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies
SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk
More informationDesign-Induced Latency Variation in Modern DRAM Chips:
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms Donghyuk Lee 1,2 Samira Khan 3 Lavanya Subramanian 2 Saugata Ghose 2 Rachata Ausavarungnirun
More informationLinearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons,
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationPractical Data Compression for Modern Memory Hierarchies
Practical Data Compression for Modern Memory Hierarchies Thesis Oral Gennady Pekhimenko Committee: Todd Mowry (Co-chair) Onur Mutlu (Co-chair) Kayvon Fatahalian David Wood, University of Wisconsin-Madison
More informationUnderstanding Reduced-Voltage Operation in Modern DRAM Devices
Understanding Reduced-Voltage Operation in Modern DRAM Devices Experimental Characterization, Analysis, and Mechanisms Kevin Chang A. Giray Yaglikci, Saugata Ghose,Aditya Agrawal *, Niladrish Chatterjee
More informationReducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University
Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck
More informationDecoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching
Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison 1 Please find the power point presentation
More informationExploi'ng Compressed Block Size as an Indicator of Future Reuse
Exploi'ng Compressed Block Size as an Indicator of Future Reuse Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons, Michael A. Kozuch Execu've Summary In a compressed
More informationWhat Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study
What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study Saugata Ghose, A. Giray Yağlıkçı, Raghav Gupta, Donghyuk Lee, Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationBalancing DRAM Locality and Parallelism in Shared Memory CMP Systems
Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More informationTiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu Carnegie Mellon University HPCA - 2013 Executive
More informationEvaluating STT-RAM as an Energy-Efficient Main Memory Alternative
Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationThe Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory
The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory Lavanya Subramanian* Vivek Seshadri* Arnab Ghosh* Samira Khan*
More informationSpatial Memory Streaming (with rotated patterns)
Spatial Memory Streaming (with rotated patterns) Michael Ferdman, Stephen Somogyi, and Babak Falsafi Computer Architecture Lab at 2006 Stephen Somogyi The Memory Wall Memory latency 100 s clock cycles;
More informationLow-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM
Low-Cost Inter-Linked ubarrays (LIA) Enabling Fast Inter-ubarray Data Movement in DRAM Kevin Chang rashant Nair, Donghyuk Lee, augata Ghose, Moinuddin Qureshi, and Onur Mutlu roblem: Inefficient Bulk Data
More informationLinearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko Advisors: Todd C. Mowry and Onur Mutlu Computer Science Department, Carnegie Mellon
More informationDRAM Tutorial Lecture. Vivek Seshadri
DRAM Tutorial 18-447 Lecture Vivek Seshadri DRAM Module and Chip 2 Goals Cost Latency Bandwidth Parallelism Power Energy 3 DRAM Chip Bank I/O 4 Sense Amplifier top enable Inverter bottom 5 Sense Amplifier
More informationFlexible Cache Error Protection using an ECC FIFO
Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1 ECC FIFO Goal: to reduce on-chip ECC overhead
More informationImproving Cache Performance using Victim Tag Stores
Improving Cache Performance using Victim Tag Stores SAFARI Technical Report No. 2011-009 Vivek Seshadri, Onur Mutlu, Todd Mowry, Michael A Kozuch {vseshadr,tcm}@cs.cmu.edu, onur@cmu.edu, michael.a.kozuch@intel.com
More informationComputer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University Main Memory Lectures These slides are from the Scalable Memory Systems course taught at ACACES 2013 (July 15-19,
More informationMemory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez
Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection
More informationLightweight Memory Tracing
Lightweight Memory Tracing Mathias Payer*, Enrico Kravina, Thomas Gross Department of Computer Science ETH Zürich, Switzerland * now at UC Berkeley Memory Tracing via Memlets Execute code (memlets) for
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationMinimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era
Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Dimitris Kaseridis Electrical and Computer Engineering The University of Texas at Austin Austin, TX, USA kaseridis@mail.utexas.edu
More informationA Fast Instruction Set Simulator for RISC-V
A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.
More informationA Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach
A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. Mishra Onur Mutlu Chita R. Das Executive summary Problem: Current day NoC designs are agnostic to application requirements
More informationImproving DRAM Performance by Parallelizing Refreshes with Accesses
Improving DRAM Performance by Parallelizing Refreshes with Accesses Kevin Chang Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu Executive Summary DRAM refresh interferes
More informationThesis Defense Lavanya Subramanian
Providing High and Predictable Performance in Multicore Systems Through Shared Resource Management Thesis Defense Lavanya Subramanian Committee: Advisor: Onur Mutlu Greg Ganger James Hoe Ravi Iyer (Intel)
More informationAdaptive-Latency DRAM (AL-DRAM)
This is a summary of the original paper, entitled Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case which appears in HPCA 2015 [64]. Adaptive-Latency DRAM (AL-DRAM) Donghyuk Lee Yoongu
More informationPerceptron Learning for Reuse Prediction
Perceptron Learning for Reuse Prediction Elvira Teran Zhe Wang Daniel A. Jiménez Texas A&M University Intel Labs {eteran,djimenez}@tamu.edu zhe2.wang@intel.com Abstract The disparity between last-level
More informationDynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories
SAFARI Technical Report No. 2-5 (December 6, 2) : A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon hanbinyoon@cmu.edu Justin Meza meza@cmu.edu
More informationCS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines
CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell
More informationEnergy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012
Energy Proportional Datacenter Memory Brian Neel EE6633 Fall 2012 Outline Background Motivation Related work DRAM properties Designs References Background The Datacenter as a Computer Luiz André Barroso
More informationCS698Y: Modern Memory Systems Lecture-16 (DRAM Timing Constraints) Biswabandan Panda
CS698Y: Modern Memory Systems Lecture-16 (DRAM Timing Constraints) Biswabandan Panda biswap@cse.iitk.ac.in https://www.cse.iitk.ac.in/users/biswap/cs698y.html Row decoder Accessing a Row Access Address
More informationNightWatch: Integrating Transparent Cache Pollution Control into Dynamic Memory Allocation Systems
NightWatch: Integrating Transparent Cache Pollution Control into Dynamic Memory Allocation Systems Rentong Guo 1, Xiaofei Liao 1, Hai Jin 1, Jianhui Yue 2, Guang Tan 3 1 Huazhong University of Science
More informationAddressing End-to-End Memory Access Latency in NoC-Based Multicores
Addressing End-to-End Memory Access Latency in NoC-Based Multicores Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das The Pennsylvania State University University Park, PA, 682, USA {akbar,euk39,kandemir,das}@cse.psu.edu
More informationEvalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve
Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationOpenPrefetch. (in-progress)
OpenPrefetch Let There Be Industry-Competitive Prefetching in RISC-V Processors (in-progress) Bowen Huang, Zihao Yu, Zhigang Liu, Chuanqi Zhang, Sa Wang, Yungang Bao Institute of Computing Technology(ICT),
More informationPrefetch-Aware DRAM Controllers
Prefetch-Aware DRAM Controllers Chang Joo Lee Onur Mutlu Veynu Narasiman Yale N. Patt High Performance Systems Group Department of Electrical and Computer Engineering The University of Texas at Austin
More informationDASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture
DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical
More informationImproving Writeback Efficiency with Decoupled Last-Write Prediction
Improving Writeback Efficiency with Decoupled Last-Write Prediction Zhe Wang Samira M. Khan Daniel A. Jiménez The University of Texas at San Antonio {zhew,skhan,dj}@cs.utsa.edu Abstract In modern DDRx
More informationExploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency
Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency Hasan Hassan 1,2,3 Gennady Pekhimenko 4,2 Nandita Vijaykumar 2 Vivek Seshadri 5,2 Donghyuk Lee 6,2 Oguz Ergin 3 Onur Mutlu
More informationarxiv: v1 [cs.ar] 23 Sep 2016
TOBB UNIVERSITY OF ECONOMICS AND TECHNOLOGY INSTITUTE OF NATURAL AND APPLIED SCIENCES arxiv:1609.07234v1 [cs.ar] 23 Sep 2016 REDUCING DRAM ACCESS LATENCY BY EXPLOITING DRAM LEAKAGE CHARACTERISTICS AND
More informationA Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps
A Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps Nandita Vijaykumar Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarangnirun,
More information15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling
More informationVirtualized and Flexible ECC for Main Memory
Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon and Mattan Erez Dept. Electrical and Computer Engineering The University of Texas at Austin ASPLOS 2010 1 Memory Error Protection Applying ECC
More informationAB-Aware: Application Behavior Aware Management of Shared Last Level Caches
AB-Aware: Application Behavior Aware Management of Shared Last Level Caches Suhit Pai, Newton Singh and Virendra Singh Computer Architecture and Dependable Systems Laboratory Department of Electrical Engineering
More informationRethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization
Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Fazal Hameed and Jeronimo Castrillon Center for Advancing Electronics Dresden (cfaed), Technische Universität Dresden,
More informationData Prefetching by Exploiting Global and Local Access Patterns
Journal of Instruction-Level Parallelism 13 (2011) 1-17 Submitted 3/10; published 1/11 Data Prefetching by Exploiting Global and Local Access Patterns Ahmad Sharif Hsien-Hsin S. Lee School of Electrical
More informationPredicting Performance Impact of DVFS for Realistic Memory Systems
Predicting Performance Impact of DVFS for Realistic Memory Systems Rustam Miftakhutdinov Eiman Ebrahimi Yale N. Patt The University of Texas at Austin Nvidia Corporation {rustam,patt}@hps.utexas.edu ebrahimi@hps.utexas.edu
More informationReducing Memory Access Latency with Asymmetric DRAM Bank Organizations
Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations Young Hoon Son Seongil O Yuhwan Ro JaeW.Lee Jung Ho Ahn Seoul National University Sungkyunkwan University Seoul, Korea Suwon, Korea
More informationDesign-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms DONGHYUK LEE, NVIDIA and Carnegie Mellon University SAMIRA KHAN, University of Virginia
More informationA Hybrid Adaptive Feedback Based Prefetcher
A Feedback Based Prefetcher Santhosh Verma, David M. Koppelman and Lu Peng Department of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 78 sverma@lsu.edu, koppel@ece.lsu.edu,
More informationOAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches
OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches Jue Wang, Xiangyu Dong, Yuan Xie Department of Computer Science and Engineering, Pennsylvania State University Qualcomm Technology,
More informationIN modern systems, the high latency of accessing largecapacity
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 10, OCTOBER 2016 3071 BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling Lavanya Subramanian, Donghyuk
More informationA Comprehensive Review of the Challenges and Opportunities Confronting Cache Memory System Performance
A Comprehensive Review of the Challenges and Opportunities Confronting Cache Memory System Performance Richard A. Kramer, Mathias Elmlinger, Abhishek Ramamurthy, Siva Pranav Kumar Timmireddy kramerri@oregonstate.edu,
More informationAdaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case
Carnegie Mellon University Research Showcase @ CMU Department of Electrical and Computer Engineering Carnegie Institute of Technology 2-2015 Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case
More informationarxiv: v2 [cs.ar] 15 May 2017
Understanding and Exploiting Design-Induced Latency Variation in Modern DRAM Chips Donghyuk Lee Samira Khan ℵ Lavanya Subramanian Saugata Ghose Rachata Ausavarungnirun Gennady Pekhimenko Vivek Seshadri
More informationExploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies
Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S. Schreiber Intelligent Infrastructure Lab (IIL), Hewlett-Packard
More informationTransparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh
Transparent Offloading and Mapping () Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O Connor, Nandita Vijaykumar,
More informationSandbox Based Optimal Offset Estimation [DPC2]
Sandbox Based Optimal Offset Estimation [DPC2] Nathan T. Brown and Resit Sendag Department of Electrical, Computer, and Biomedical Engineering Outline Motivation Background/Related Work Sequential Offset
More informationBanshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation!
Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation! Xiangyao Yu 1, Christopher Hughes 2, Nadathur Satish 2, Onur Mutlu 3, Srinivas Devadas 1 1 MIT 2 Intel Labs 3 ETH Zürich 1 High-Bandwidth
More informationComputer Architecture: (Shared) Cache Management
Computer Architecture: (Shared) Cache Management Prof. Onur Mutlu Carnegie Mellon University (small edits and reorg by Seth Goldstein) Readings Required Qureshi et al., A Case for MLP-Aware Cache Replacement,
More informationRow Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Executive Summary Different memory technologies have different
More informationPrefetch-Aware DRAM Controllers
Prefetch-Aware DRAM Controllers Chang Joo Lee Onur Mutlu Veynu Narasiman Yale N. Patt Department of Electrical and Computer Engineering The University of Texas at Austin {cjlee, narasima, patt}@ece.utexas.edu
More informationA Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Overview Emerging memories such as PCM offer higher density than
More informationEmerging NVM Memory Technologies
Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement
More informationNear-Threshold Computing: How Close Should We Get?
Near-Threshold Computing: How Close Should We Get? Alaa R. Alameldeen Intel Labs Workshop on Near-Threshold Computing June 14, 2014 Overview High-level talk summarizing my architectural perspective on
More informationarxiv: v1 [cs.ar] 1 Feb 2016
SAFARI Technical Report No. - Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism Kevin K. Chang, Gabriel H. Loh, Mithuna Thottethodi, Yasuko Eckert, Mike
More informationMultiperspective Reuse Prediction
ABSTRACT Daniel A. Jiménez Texas A&M University djimenezacm.org The disparity between last-level cache and memory latencies motivates the search for e cient cache management policies. Recent work in predicting
More informationExploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance Rachata Ausavarungnirun Saugata Ghose, Onur Kayiran, Gabriel H. Loh Chita Das, Mahmut Kandemir, Onur Mutlu Overview of This Talk Problem:
More informationInsertion and Promotion for Tree-Based PseudoLRU Last-Level Caches
Insertion and Promotion for Tree-Based PseudoLRU Last-Level Caches Daniel A. Jiménez Department of Computer Science and Engineering Texas A&M University ABSTRACT Last-level caches mitigate the high latency
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationFootprint-based Locality Analysis
Footprint-based Locality Analysis Xiaoya Xiang, Bin Bao, Chen Ding University of Rochester 2011-11-10 Memory Performance On modern computer system, memory performance depends on the active data usage.
More informationPerformance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Performance Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Defining Performance (1) Which airplane has the best performance? Boeing 777 Boeing
More information15-740/ Computer Architecture Lecture 5: Project Example. Jus%n Meza Yoongu Kim Fall 2011, 9/21/2011
15-740/18-740 Computer Architecture Lecture 5: Project Example Jus%n Meza Yoongu Kim Fall 2011, 9/21/2011 Reminder: Project Proposals Project proposals due NOON on Monday 9/26 Two to three pages consisang
More informationHybrid Cache Architecture (HCA) with Disparate Memory Technologies
Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement:
More informationDEMM: a Dynamic Energy-saving mechanism for Multicore Memories
DEMM: a Dynamic Energy-saving mechanism for Multicore Memories Akbar Sharifi, Wei Ding 2, Diana Guttman 3, Hui Zhao 4, Xulong Tang 5, Mahmut Kandemir 5, Chita Das 5 Facebook 2 Qualcomm 3 Intel 4 University
More informationMicro-sector Cache: Improving Space Utilization in Sectored DRAM Caches
Micro-sector Cache: Improving Space Utilization in Sectored DRAM Caches Mainak Chaudhuri Mukesh Agrawal Jayesh Gaur Sreenivas Subramoney Indian Institute of Technology, Kanpur 286, INDIA Intel Architecture
More informationSolar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines
Solar-: Reducing Access Latency by Exploiting the Variation in Local Bitlines Jeremie S. Kim Minesh Patel Hasan Hassan Onur Mutlu Carnegie Mellon University ETH Zürich latency is a major bottleneck for
More informationIntroduction to memory system :from device to system
Introduction to memory system :from device to system Jianhui Yue Electrical and Computer Engineering University of Maine The Position of DRAM in the Computer 2 The Complexity of Memory 3 Question Assume
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third
More informationAdaptive Placement and Migration Policy for an STT-RAM-Based Hybrid Cache
Adaptive Placement and Migration Policy for an STT-RAM-Based Hybrid Cache Zhe Wang Daniel A. Jiménez Cong Xu Guangyu Sun Yuan Xie Texas A&M University Pennsylvania State University Peking University AMD
More informationCPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
More informationBias Scheduling in Heterogeneous Multi-core Architectures
Bias Scheduling in Heterogeneous Multi-core Architectures David Koufaty Dheeraj Reddy Scott Hahn Intel Labs {david.a.koufaty, dheeraj.reddy, scott.hahn}@intel.com Abstract Heterogeneous architectures that
More information562 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 2, FEBRUARY 2016
562 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 2, FEBRUARY 2016 Memory Bandwidth Management for Efficient Performance Isolation in Multi-Core Platforms Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Member,
More informationEfficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero
Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero The Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT) 11-15
More informationA Comparison of Capacity Management Schemes for Shared CMP Caches
A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip
More informationUnderstanding and Improving Latency of DRAM-Based Memory Systems
Understanding and Improving Latency of DRAM-Based Memory ystems Thesis Oral Kevin Chang Committee: rof. Onur Mutlu (Chair) rof. James Hoe rof. Kayvon Fatahalian rof. tephen Keckler (NVIDIA, UT Austin)
More informationGather-Scatter DRAM In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses
Gather-Scatter DRAM In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses Vivek Seshadri Thomas Mullins, AmiraliBoroumand, Onur Mutlu, Phillip B. Gibbons, Michael A.
More informationVirtualized ECC: Flexible Reliability in Memory Systems
Virtualized ECC: Flexible Reliability in Memory Systems Doe Hyun Yoon Advisor: Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Motivation Reliability concerns are growing
More informationSILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization
2017 IEEE International Symposium on High Performance Computer Architecture SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization Jee Ho Ryoo The University of Texas at Austin Austin, TX
More informationPerformance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor
Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Sarah Bird ϕ, Aashish Phansalkar ϕ, Lizy K. John ϕ, Alex Mericas α and Rajeev Indukuru α ϕ University
More informationDRAM Disturbance Errors
http://www.ddrdetective.com/ http://users.ece.cmu.edu/~yoonguk/ Flipping Bits in Memory Without Accessing Them An Experimental Study of DRAM Disturbance Errors Yoongu Kim Ross Daly, Jeremie Kim, Chris
More informationDecoupled Dynamic Cache Segmentation
Appears in Proceedings of the 8th International Symposium on High Performance Computer Architecture (HPCA-8), February, 202. Decoupled Dynamic Cache Segmentation Samira M. Khan, Zhe Wang and Daniel A.
More informationReMAP: Reuse and Memory Access Cost Aware Eviction Policy for Last Level Cache Management
This is a pre-print, author's version of the paper to appear in the 32nd IEEE International Conference on Computer Design. ReMAP: Reuse and Memory Access Cost Aware Eviction Policy for Last Level Cache
More informationA Comprehensive Review of the Challenges and Opportunities Confronting Cache Memory System Performance
A Comprehensive Review of the Challenges and Opportunities Confronting Cache Memory System Performance Photograph of Intel Xeon processor 7500 series die showing cache memories [1] R. Kramer, M. Elmlinger,
More information