WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems


1 WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems
Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik
* University of Illinois at Chicago; Northwestern University
DATE 2018, IEEE/ACM Design, Automation, and Test in Europe, March 2, Dresden, Germany

2 Introduction
Increasing demand for memory capacity
- Increasing number of cores on multicore processors
  - Intel Sandy Bridge: 8 cores (6 threads)
  - IBM POWER7: 8 cores (32 threads)
- Increasing data set sizes
  - Graph, database, scientific workloads
Problems with DRAM
- Scalability limitations
  - Slowed down
  - Below 6nm seems difficult
- Periodic refresh operations

3 Phase Change Memory
Promising technology
- Denser than DRAM (3 2 )
- Non-volatile storage
Shortcomings, especially for WRITE operations
- Higher access latency (4 2 DRAM)
- Higher dynamic energy (2 40 DRAM)
- Limited write endurance
[Figure: PCM cell showing the wordline, the bitline, and the PCM storage element]

4 Existing Solutions
- DRAM + PCM hybrid main memories
  - DRAM as a cache to PCM
- Modifications to PCM-based main memories
  - Optimization on PCM architecture
  - Reducing the number of writebacks from LLC to PCM
[Figure: CPU cores with a shared LLC, backed either by a DRAM cache in front of large PCM storage or by a PCM main memory]

5 Summary of Contributions
In this work, we propose WALL, a novel Writeback-Aware LLC management scheme that
- reduces the number of writebacks from the Last Level Cache (LLC) to a PCM-based main memory
- consists of
  - a writeback-aware set-balancing mechanism
  - a writeback-aware replacement policy

6 Outline
- Introduction
- Background
- Motivation
- Evaluation Results
- Conclusion

7 Background: Impact of reducing write traffic on PCM
- Lifetime enhancement: PCM lifetime ∝ 1 / write traffic
- Performance improvement: writes increase latency of reads by .2 to .8 times [Arjomand_ISCA2016]
- Reduction in energy consumption: writes consume ~0 of reads
[Figure: Normalized Lifetime ( ) versus Write Reduction (%)]
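The lifetime relation on this slide can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, assuming lifetime is exactly inversely proportional to write traffic and ignoring wear-leveling and write distribution effects; the function name `normalized_lifetime` is hypothetical and not from the talk.

```python
def normalized_lifetime(write_reduction_pct: float) -> float:
    """Lifetime relative to the baseline, assuming lifetime is
    inversely proportional to write traffic (a first-order model)."""
    if not 0.0 <= write_reduction_pct < 100.0:
        raise ValueError("write reduction must be in [0, 100)")
    remaining_traffic = 1.0 - write_reduction_pct / 100.0
    return 1.0 / remaining_traffic


if __name__ == "__main__":
    # Print the model's lifetime gain for a few write-reduction levels.
    for pct in (0, 10, 25, 50):
        print(f"{pct:>3}% fewer writes -> {normalized_lifetime(pct):.2f}x lifetime")
```

Under this model the gain grows faster than linearly with write reduction, which is why modest reductions in writebacks can matter for endurance. Measured lifetimes depend on more than total write volume, so the model is only a rough guide.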

8 Related Work
WADE [Wang_TACO2013]
- Reduces the number of writebacks to PCM
- Partitions a set's blocks into frequent writeback and non-frequent writeback
- Tries to keep the frequent writeback blocks in the set
- Considers the set's blocks as the only replacement candidates
- Complex implementation
Set-Balancing Cache (SBC) [Rolán_Micro2009]
- Balances the pressure on cache sets to reduce miss rate
- Does not reduce writebacks

9 Motivation
Writebacks are not uniformly distributed among LLC sets.
[Figure: percentage of writebacks versus percentage of sets, in three panels: sp from NAS, gcc from SPEC2006, streamcluster from PARSEC]
A set with few writebacks can be used to store the dirty eviction victims of a set with many writebacks.

10 Set Balancing
LLC sets are classified into three categories
- Writer: frequent writebacks
- Non-writer: infrequent writebacks, underutilized
- Neutral: neither writer nor non-writer
Each writer set is partnered with a non-writer set.
[Figure: the LRU dirty block evicted from a WRITER set is inserted into its partner NON-WRITER set rather than written to PCM main memory]
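The partnering idea can be sketched in a few lines. This is only an illustration: the slide does not say how partners are selected, so the greedy index-order pairing below is an assumption, and the names `pair_sets` and `route_dirty_victim` are hypothetical.

```python
from typing import Dict


def pair_sets(set_types: Dict[int, str]) -> Dict[int, int]:
    """Pair each writer set with a distinct non-writer set.

    Assumption: pairing is greedy, in ascending set-index order.
    Writers left over when non-writers run out stay unpartnered.
    """
    writers = sorted(i for i, t in set_types.items() if t == "writer")
    non_writers = sorted(i for i, t in set_types.items() if t == "non-writer")
    return dict(zip(writers, non_writers))


def route_dirty_victim(set_index: int, partners: Dict[int, int]) -> str:
    """Decide where a dirty LRU victim evicted from `set_index` goes."""
    if set_index in partners:
        # Writer set with a partner: keep the block on chip.
        return f"insert into set {partners[set_index]}"
    # No partner: the block is written back to PCM main memory.
    return "write back to PCM"


if __name__ == "__main__":
    types = {0: "writer", 1: "neutral", 2: "non-writer", 3: "writer"}
    partners = pair_sets(types)
    for s in sorted(types):
        print(s, types[s], "->", route_dirty_victim(s, partners))
```

A block placed in a partner set must also be found there on a later lookup, which this sketch leaves out.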

11 Set Balancing (cont.)
To determine set types, two counters are used
- Writeback counter
- Saturation counter [Rolán_Micro2009]: measures the degree to which a set can hold its working set
Counter thresholds
- τ_sat = K/4, where K is the set associativity
- τ_low_wb and τ_high_wb
[Figure: writeback thresholds shown relative to the arithmetic mean and the overall average of writeback counts; saturation counter updated on each access (hit or miss)]
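A per-set saturation counter in the style of the Set-Balancing Cache can be sketched as follows. The update rule here (increment on a miss, decrement on a hit, clamped to the range 0 to 2K-1) is an assumption based on the cited SBC design and is not spelled out in this transcript; the class name `SaturationCounter` and the helper `tau_sat` are hypothetical. Only the threshold τ_sat = K/4 comes from the slide.

```python
class SaturationCounter:
    """Tracks how well one cache set holds its working set.

    Assumed behavior: a miss raises the counter, a hit lowers it,
    and the value is clamped to [0, 2K - 1] for associativity K.
    """

    def __init__(self, associativity: int) -> None:
        if associativity <= 0:
            raise ValueError("associativity must be positive")
        self.max_value = 2 * associativity - 1
        self.value = 0

    def on_access(self, hit: bool) -> None:
        if hit:
            self.value = max(0, self.value - 1)
        else:
            self.value = min(self.max_value, self.value + 1)


def tau_sat(associativity: int) -> float:
    """Saturation threshold from the slide: K / 4."""
    return associativity / 4


if __name__ == "__main__":
    counter = SaturationCounter(associativity=16)
    for hit in (False, False, True, False):
        counter.on_access(hit)
    print("counter:", counter.value, "threshold:", tau_sat(16))
```

A low counter value suggests the set is under little pressure, which is what makes it a candidate non-writer set.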

12 Set Balancing (cont.)
For a set with writeback count wb and saturation counter sat
- Set is writer if wb is above τ_high_wb
- Set is non-writer if sat is below τ_sat and wb is below τ_low_wb
[Figure: percentage of sets classified as writer and non-writer under the wb and sat criteria, for sp, ua, stream, dedup, gcc, mcf, mix, mix2]
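The classification rule can be written as a small function. This is a minimal sketch, assuming strict comparisons, since the transcript does not preserve whether the thresholds are inclusive; `classify_set` and its parameter names are hypothetical, and the threshold values in the usage lines are made up for illustration.

```python
def classify_set(wb: int, sat: int,
                 tau_low_wb: float, tau_high_wb: float,
                 tau_sat: float) -> str:
    """Classify one LLC set from its writeback and saturation counters.

    Assumption: comparisons are strict (> and <).
    """
    if wb > tau_high_wb:
        return "writer"
    if sat < tau_sat and wb < tau_low_wb:
        return "non-writer"
    return "neutral"


if __name__ == "__main__":
    # Illustrative thresholds only; K = 16 gives tau_sat = 4.
    thresholds = dict(tau_low_wb=10, tau_high_wb=100, tau_sat=4)
    print(classify_set(wb=250, sat=20, **thresholds))
    print(classify_set(wb=3, sat=1, **thresholds))
    print(classify_set(wb=50, sat=1, **thresholds))
```

Checking the writer condition first means a set with a high writeback count is never treated as a non-writer, whatever its saturation counter says.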

13 Replacement Policy
- Frequent writeback block: frequently reused after being evicted from the cache
- Frequent writeback blocks are given a second chance upon eviction to stay in the cache and be accessed
- To avoid a performance penalty, the replacement policy is applied only to neutral and non-writer sets
[Figure: INSERT, ACCESS, and EVICT flow in a non-writer or neutral set; an LRU dirty block with FV = 0 is written to PCM main memory]
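The second-chance idea can be illustrated with a small victim-selection routine. This is a sketch under stated assumptions, not the authors' exact policy: FV is treated as a one-bit per-block flag, a dirty LRU block with FV = 1 has the flag cleared and is moved to MRU, and how FV gets set is not covered by the transcript. The names `Block` and `select_victim` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    tag: int
    dirty: bool = False
    fv: int = 0  # assumed one-bit flag marking a frequent-writeback block


def select_victim(lru_to_mru: List[Block]) -> Tuple[Block, bool]:
    """Evict one block from a set ordered from LRU (index 0) to MRU.

    Returns the victim and whether it must be written to PCM.
    """
    if not lru_to_mru:
        raise ValueError("cannot evict from an empty set")
    # Each block can receive at most one second chance per call.
    for _ in range(len(lru_to_mru)):
        candidate = lru_to_mru[0]
        if candidate.dirty and candidate.fv == 1:
            candidate.fv = 0                          # use up the second chance
            lru_to_mru.append(lru_to_mru.pop(0))      # move to MRU
            continue                                  # look for another victim
        break
    victim = lru_to_mru.pop(0)
    return victim, victim.dirty


if __name__ == "__main__":
    blocks = [Block(1, dirty=True, fv=1), Block(2), Block(3, dirty=True)]
    victim, to_pcm = select_victim(blocks)
    print("victim tag:", victim.tag, "written to PCM:", to_pcm)
```

Clearing FV when the second chance is used keeps the loop bounded: if every block is dirty with FV = 1, all flags are cleared in one pass and the original LRU block is evicted.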

14 Design
- ST (Set Type): bits ST[0] and ST[1] of Set(n) encode writer, non-writer, or neutral
- Each block carries a dirty bit and an FV bit
- Possible actions for a victim: insert into the partner of Set(n), write to PCM, or move to MRU and find another victim
- Storage overhead: less than 0.6% of the LLC capacity
[Figure: decision table mapping set type and the LRU block's dirty and FV bits to the actions above]

15 Experimental Setup
- Simulator: GEM5 integrated with NVMAIN
- Cores: 8 cores, out-of-order, 2.0GHz
- Caches: 32KB L (2 cycles), 256KB L2 (2 cycles), shared LLC 8MB (35 cycles), MOESI directory
- PCM main memory: 4GB, 4 channels, rank/channel, 4 banks/rank; t_set= 50ns, t_reset= 00ns, t_rcd= 20ns; cell endurance = writes
- Four memory controllers: one read and write queue; write drain threshold: high = 80%, low = 50%

16 Experimental Setup: Workloads
- Multi-threaded applications: NAS and PARSEC benchmarks
- Multi-programmed workloads: SPEC CPU2006
- We run the workloads for 2 billion instructions, after two billion instructions of cache warm-up
We compare WALL with
- baseline: uses the LRU replacement policy
- double-way: a baseline with double the associativity
- WADE: the scheme proposed by Wang et al. [Wang_TACO2013]

17 LLC Writeback
Compared to baseline, writebacks are reduced by 26.6% on average
- For writer sets, reduced by 39.5% on average
- For non-writer sets, increased from 0.4% to 3.%
- For neutral sets, reduced by 28.6% on average
[Figure: normalized writebacks split into non-writer, neutral, and writer sets, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

18 Writeback Reduction
- Compared to baseline double-way, by 23.3% on average
- Compared to WADE, by 6.4% on average
[Figure: LLC writeback reduction relative to double-way and WADE, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

19 MPKI
Compared to baseline, MPKI is reduced by 2.4% on average
- For writer sets, reduced by 27.8% on average
- For non-writer sets, increased from 2.0% to 6.2%
- For neutral sets, increased from 57.3% to 59.%
[Figure: normalized MPKI split into non-writer, neutral, and writer sets, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

20 MPKI
- Compared to baseline double-way, increased by .0% on average
- Compared to WADE, reduced by 0.3% on average
[Figure: normalized MPKI relative to double-way and WADE, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

21 Main Memory Energy
- Compared to baseline, reduced by 9.2% on average
- Compared to baseline double-way, reduced by 6.5% on average
- Compared to WADE, reduced by .3% on average
[Figure: main memory energy relative to baseline, double-way, and WADE, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

22 IPC
- Compared to baseline, improved by 6.7% on average
- Compared to baseline double-way, improved by 4.9% on average
- Compared to WADE, improved by 3.2% on average
[Figure: normalized IPC relative to baseline, double-way, and WADE, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

23 PCM Lifetime
- Compared to baseline scheme, increased by .25 on average
- Compared to baseline double-way, increased by .2 on average
- Compared to WADE, increased by .7 on average
[Figure: lifetime (years) for baseline, double-way, WADE, and the proposed scheme, for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

24 Conclusion
We proposed WALL to reduce the number of writebacks from the LLC to the PCM main memory. It includes
- A set-balancing mechanism: uses the non-writer sets as storage for the writer sets' writebacks
- A writeback-aware replacement policy: keeps the frequently reused dirty lines of the sets
Results show that WALL can achieve
- Writeback reduction, by 26.6% on average
- PCM lifetime enhancement, by .25 on average
- Main memory energy efficiency, by 9.2% on average

25 Thank You! Questions?

Row Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu

Row Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Executive Summary Different memory technologies have different

More information

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks : Defending Against Cache-Based Side Channel Attacks Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Presented by Mengjia

More information

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement:

More information

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Flexible Cache Error Protection using an ECC FIFO

Flexible Cache Error Protection using an ECC FIFO Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1 ECC FIFO Goal: to reduce on-chip ECC overhead

More information

Silent Shredder: Zero-Cost Shredding For Secure Non-Volatile Main Memory Controllers

Silent Shredder: Zero-Cost Shredding For Secure Non-Volatile Main Memory Controllers Silent Shredder: Zero-Cost Shredding For Secure Non-Volatile Main Memory Controllers 1 ASPLOS 2016 2-6 th April Amro Awad (NC State University) Pratyusa Manadhata (Hewlett Packard Labs) Yan Solihin (NC

More information

A Comparison of Capacity Management Schemes for Shared CMP Caches

A Comparison of Capacity Management Schemes for Shared CMP Caches A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip

More information

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications

More information

Lecture-16 (Cache Replacement Policies) CS422-Spring

Lecture-16 (Cache Replacement Policies) CS422-Spring Lecture-16 (Cache Replacement Policies) CS422-Spring 2018 Biswa@CSE-IITK 1 2 4 8 16 32 64 128 From SPEC92 Miss rate: Still Applicable Today 0.14 0.12 0.1 0.08 0.06 0.04 1-way 2-way 4-way 8-way Capacity

More information

Emerging NVM Memory Technologies

Emerging NVM Memory Technologies Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement

More information

A Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu

A Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Overview Emerging memories such as PCM offer higher density than

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Scalable High Performance Main Memory System Using PCM Technology

Scalable High Performance Main Memory System Using PCM Technology Scalable High Performance Main Memory System Using PCM Technology Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers IBM T. J. Watson Research Center, Yorktown Heights, NY International Symposium on

More information

WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System

WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System ZHE WANG, Texas A&M University SHUCHANG SHAN, Chinese Institute of Computing Technology TING CAO, Australian National University

More information

TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches

TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches Majed Valad Beigi and Gokhan Memik Department of EECS, Northwestern University, Evanston, IL majed.beigi@northwestern.edu and memik@eecs.northwestern.edu

More information

Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture

Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu Carnegie Mellon University HPCA - 2013 Executive

More information

PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on. Blaise-Pascal Tine Sudhakar Yalamanchili

PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on. Blaise-Pascal Tine Sudhakar Yalamanchili PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on Blaise-Pascal Tine Sudhakar Yalamanchili Outline Background: Memory Security Motivation Proposed Solution Implementation Evaluation Conclusion

More information

Spring 2018 :: CSE 502. Cache Design Basics. Nima Honarmand

Spring 2018 :: CSE 502. Cache Design Basics. Nima Honarmand Cache Design Basics Nima Honarmand Storage Hierarchy Make common case fast: Common: temporal & spatial locality Fast: smaller, more expensive memory Bigger Transfers Registers More Bandwidth Controlled

More information

An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories

An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories NVM DIMM An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories Reza Salkhordeh, Onur Mutlu, and Hossein Asadi arxiv:93.7v [cs.ar] Mar 9 Abstract Emerging Non-Volatile

More information

Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache

Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache Tyler Stocksdale Advisor: Frank Mueller Mentor: Mu-Tien Chang Manager: Hongzhong Zheng 11/13/2017 Background Commodity

More information

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Fazal Hameed and Jeronimo Castrillon Center for Advancing Electronics Dresden (cfaed), Technische Universität Dresden,

More information

AB-Aware: Application Behavior Aware Management of Shared Last Level Caches

AB-Aware: Application Behavior Aware Management of Shared Last Level Caches AB-Aware: Application Behavior Aware Management of Shared Last Level Caches Suhit Pai, Newton Singh and Virendra Singh Computer Architecture and Dependable Systems Laboratory Department of Electrical Engineering

More information

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

A Page-Based Storage Framework for Phase Change Memory

A Page-Based Storage Framework for Phase Change Memory A Page-Based Storage Framework for Phase Change Memory Peiquan Jin, Zhangling Wu, Xiaoliang Wang, Xingjun Hao, Lihua Yue University of Science and Technology of China 2017.5.19 Outline Background Related

More information

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical

More information

Computer Sciences Department

Computer Sciences Department Computer Sciences Department SIP: Speculative Insertion Policy for High Performance Caching Hongil Yoon Tan Zhang Mikko H. Lipasti Technical Report #1676 June 2010 SIP: Speculative Insertion Policy for

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea

More information

CONTENT-AWARE SPIN-TRANSFER TORQUE MAGNETORESISTIVE RANDOM-ACCESS MEMORY (STT-MRAM) CACHE DESIGNS

CONTENT-AWARE SPIN-TRANSFER TORQUE MAGNETORESISTIVE RANDOM-ACCESS MEMORY (STT-MRAM) CACHE DESIGNS CONTENT-AWARE SPIN-TRANSFER TORQUE MAGNETORESISTIVE RANDOM-ACCESS MEMORY (STT-MRAM) CACHE DESIGNS By QI ZENG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

More information

Reducing Writebacks Through In-Cache Displacement

Reducing Writebacks Through In-Cache Displacement 1 Reducing Writebacks Through In-Cache Displacement MOHAMMAD BAKHSHALIPOUR, Sharif University of Technology, Iran and Institute for Research in Fundamental Sciences (IPM), Iran AYDIN FARAJI, Sharif University

More information

A Bandwidth-aware Memory-subsystem Resource Management using. Non-invasive Resource Profilers for Large CMP Systems

A Bandwidth-aware Memory-subsystem Resource Management using. Non-invasive Resource Profilers for Large CMP Systems A Bandwidth-aware Memory-subsystem Resource Management using Non-invasive Resource Profilers for Large CMP Systems Dimitris Kaseridis, Jeffrey Stuecheli, Jian Chen and Lizy K. John Department of Electrical

More information

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations

More information

Perceptron Learning for Reuse Prediction

Perceptron Learning for Reuse Prediction Perceptron Learning for Reuse Prediction Elvira Teran Zhe Wang Daniel A. Jiménez Texas A&M University Intel Labs {eteran,djimenez}@tamu.edu zhe2.wang@intel.com Abstract The disparity between last-level

More information

Limiting the Number of Dirty Cache Lines

Limiting the Number of Dirty Cache Lines Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

More information

AS the processor-memory speed gap continues to widen,

AS the processor-memory speed gap continues to widen, IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 7, JULY 2004 843 Design and Optimization of Large Size and Low Overhead Off-Chip Caches Zhao Zhang, Member, IEEE, Zhichun Zhu, Member, IEEE, and Xiaodong Zhang,

More information

Locality-Aware Data Replication in the Last-Level Cache

Locality-Aware Data Replication in the Last-Level Cache Locality-Aware Data Replication in the Last-Level Cache George Kurian, Srinivas Devadas Massachusetts Institute of Technology Cambridge, MA USA {gkurian, devadas}@csail.mit.edu Omer Khan University of

More information

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"

More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

Cache Replacement Championship. The 3P and 4P cache replacement policies

Cache Replacement Championship. The 3P and 4P cache replacement policies 1 Cache Replacement Championship The 3P and 4P cache replacement policies Pierre Michaud INRIA June 20, 2010 2 Optimal replacement? Offline (we know the future) Belady Online (we don t know the future)

More information

MAC: A NOVEL SYSTEMATICALLY MULTILEVEL

MAC: A NOVEL SYSTEMATICALLY MULTILEVEL MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY Shenchen Ruan, Haixia Wang and Dongsheng Wang Tsinghua National Laboratory for Information Science and Technology, Tsinghua

More information

ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction

ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction Vinson Young, Chiachen Chou, Aamer Jaleel *, and Moinuddin K. Qureshi Georgia Institute of Technology

More information

Reducing Access Latency of MLC PCMs through Line Striping

Reducing Access Latency of MLC PCMs through Line Striping Reducing Access Latency of MLC PCMs through Line Striping Morteza Hoseinzadeh Mohammad Arjomand Hamid Sarbazi-Azad HPCAN Lab, Department of Computer Engineering, Sharif University of Technology, Tehran,

More information

The V-Way Cache : Demand-Based Associativity via Global Replacement

The V-Way Cache : Demand-Based Associativity via Global Replacement The V-Way Cache : Demand-Based Associativity via Global Replacement Moinuddin K. Qureshi David Thompson Yale N. Patt Department of Electrical and Computer Engineering The University of Texas at Austin

More information

Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold

Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Farrukh Hijaz, Omer Khan University of Connecticut Power Efficiency Performance/Watt Multicores enable efficiency Power-performance

More information

Virtualized and Flexible ECC for Main Memory

Virtualized and Flexible ECC for Main Memory Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon and Mattan Erez Dept. Electrical and Computer Engineering The University of Texas at Austin ASPLOS 2010 1 Memory Error Protection Applying ECC

More information

SEESAW: Set Enhanced Superpage Aware caching

SEESAW: Set Enhanced Superpage Aware caching SEESAW: Set Enhanced Superpage Aware caching http://synergy.ece.gatech.edu/ Set Associativity Mayank Parasar, Abhishek Bhattacharjee Ω, Tushar Krishna School of Electrical and Computer Engineering Georgia

More information

Optimizing Replication, Communication, and Capacity Allocation in CMPs

Optimizing Replication, Communication, and Capacity Allocation in CMPs Optimizing Replication, Communication, and Capacity Allocation in CMPs Zeshan Chishti, Michael D Powell, and T. N. Vijaykumar School of ECE Purdue University Motivation CMP becoming increasingly important

More information

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons,

More information

Cache Memory Introduction and Analysis of Performance Amongst SRAM and STT-RAM from The Past Decade

Cache Memory Introduction and Analysis of Performance Amongst SRAM and STT-RAM from The Past Decade Cache Memory Introduction and Analysis of Performance Amongst S and from The Past Decade Carlos Blandon Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 386-36

More information

Improving Virtual Machine Scheduling in NUMA Multicore Systems

Improving Virtual Machine Scheduling in NUMA Multicore Systems Improving Virtual Machine Scheduling in NUMA Multicore Systems Jia Rao, Xiaobo Zhou University of Colorado, Colorado Springs Kun Wang, Cheng-Zhong Xu Wayne State University http://cs.uccs.edu/~jrao/ Multicore

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Australian Journal of Basic and Applied Sciences Journal home page: www.ajbasweb.com Adaptive Replacement and Insertion Policy for Last Level Cache 1 Muthukumar S. and 2 HariHaran S. 1 Professor,

More information

A Power and Temperature Aware DRAM Architecture

A Power and Temperature Aware DRAM Architecture A Power and Temperature Aware DRAM Architecture Song Liu, Seda Ogrenci Memik, Yu Zhang, and Gokhan Memik Department of Electrical Engineering and Computer Science Northwestern University, Evanston, IL

More information

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers Microsoft ssri@microsoft.com Santhosh Srinath Onur Mutlu Hyesoon Kim Yale N. Patt Microsoft Research

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard

More information

Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories

Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories HanBin Yoon, Justin Meza, Naveen Muralimanohar*, Onur Mutlu, Norm Jouppi* Carnegie Mellon University * Hewlett-Packard

More information

SLIP: Reducing Wire Energy in the Memory Hierarchy

SLIP: Reducing Wire Energy in the Memory Hierarchy SLIP: Reducing Wire Energy in the Memory Hierarchy Subhasis Das Tor M. Aamodt William J. Dally Stanford University, University of British Columbia, NVIDIA subhasis@stanford.edu, aamodt@ece.ubc.ca, dally@stanford.edu

More information

DRAM Disturbance Errors

DRAM Disturbance Errors http://www.ddrdetective.com/ http://users.ece.cmu.edu/~yoonguk/ Flipping Bits in Memory Without Accessing Them An Experimental Study of DRAM Disturbance Errors Yoongu Kim Ross Daly, Jeremie Kim, Chris

More information

Sharing-aware Efficient Private Caching in Many-core Server Processors

Sharing-aware Efficient Private Caching in Many-core Server Processors 17 IEEE 35th International Conference on Computer Design Sharing-aware Efficient Private Caching in Many-core Server Processors Sudhanshu Shukla Mainak Chaudhuri Department of Computer Science and Engineering,

More information

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection

More information

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model

More information

Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve

Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Page Replacement Algorithms

Page Replacement Algorithms Page Replacement Algorithms MIN, OPT (optimal) RANDOM evict random page FIFO (first-in, first-out) give every page equal residency LRU (least-recently used) MRU (most-recently used) 1 9.1 Silberschatz,

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information

Nonblocking Memory Refresh. Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian

Nonblocking Memory Refresh. Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian Nonblocking Memory Refresh Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian Latency (ns) History of DRAM 2 Refresh Latency Bus Cycle Time Min. Read Latency 512 550 16 13.5 0.5 0.75 1968 DRAM

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

STLAC: A Spatial and Temporal Locality-Aware Cache and Network-on-Chip

STLAC: A Spatial and Temporal Locality-Aware Cache and Network-on-Chip STLAC: A Spatial and Temporal Locality-Aware Cache and Network-on-Chip Codesign for Tiled Manycore Systems Mingyu Wang and Zhaolin Li Institute of Microelectronics, Tsinghua University, Beijing 100084,

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Bijay K.Paikaray Debabala Swain Dept. of CSE, CUTM Dept. of CSE, CUTM Bhubaneswer, India Bhubaneswer, India

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

Improving Cache Performance using Victim Tag Stores

Improving Cache Performance using Victim Tag Stores Improving Cache Performance using Victim Tag Stores SAFARI Technical Report No. 2011-009 Vivek Seshadri, Onur Mutlu, Todd Mowry, Michael A Kozuch {vseshadr,tcm}@cs.cmu.edu, onur@cmu.edu, michael.a.kozuch@intel.com

Tiny Directory: Efficient Shared Memory in Many-core Systems with Ultra-low-overhead Coherence Tracking

Tiny Directory: Efficient Shared Memory in Many-core Systems with Ultra-low-overhead Coherence Tracking Tiny Directory: Efficient Shared Memory in Many-core Systems with Ultra-low-overhead Coherence Tracking Sudhanshu Shukla Mainak Chaudhuri Department of Computer Science and Engineering, Indian Institute

Amnesic Cache Management for Non-Volatile Memory

Amnesic Cache Management for Non-Volatile Memory Amnesic Cache Management for Non-Volatile Memory Dongwoo Kang, Seungjae Baek, Jongmoo Choi Dankook University, South Korea {kangdw, baeksj, chiojm}@dankook.ac.kr Donghee Lee University of Seoul, South

Selective Fill Data Cache

Selective Fill Data Cache Selective Fill Data Cache Rice University ELEC525 Final Report Anuj Dharia, Paul Rodriguez, Ryan Verret Abstract Here we present an architecture for improving data cache miss rate. Our enhancement seeks

Flash-Conscious Cache Population for Enterprise Database Workloads

Flash-Conscious Cache Population for Enterprise Database Workloads IBM Research ADMS 2014 1st September 2014 Flash-Conscious Cache Population for Enterprise Database Workloads Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clem Dickey,

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches

OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches Jue Wang, Xiangyu Dong, Yuan Xie Department of Computer Science and Engineering, Pennsylvania State University Qualcomm Technology,

Memory Cocktail Therapy: A General Learning-Based Framework to Optimize Dynamic Tradeoffs in NVMs

Memory Cocktail Therapy: A General Learning-Based Framework to Optimize Dynamic Tradeoffs in NVMs Memory Cocktail Therapy: A General Learning-Based Framework to Optimize Dynamic Tradeoffs in NVMs Zhaoxia Deng UC Santa Barbara zhaoxia@cs.ucsb.edu Lunkai Zhang University of Chicago lunkai@uchicago.edu

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation Xiangyao Yu 1 Christopher J. Hughes 2 Nadathur Satish 2 Onur Mutlu 3 Srinivas Devadas 1 1 MIT 2 Intel Labs 3 ETH Zürich ABSTRACT

Improving Cache Management Policies Using Dynamic Reuse Distances

Improving Cache Management Policies Using Dynamic Reuse Distances Improving Cache Management Policies Using Dynamic Reuse Distances Nam Duong, Dali Zhao, Taesu Kim, Rosario Cammarota, Mateo Valero and Alexander V. Veidenbaum University of California, Irvine Universitat

Lecture 12: Large Cache Design. Topics: Shared vs. private, centralized vs. decentralized, UCA vs. NUCA, recent papers

Lecture 12: Large Cache Design. Topics: Shared vs. private, centralized vs. decentralized, UCA vs. NUCA, recent papers Lecture 12: Large Cache Design Topics: Shared vs. private, centralized vs. decentralized, UCA vs. NUCA, recent papers 1 Shared Vs. Private SHR: No replication of blocks SHR: Dynamic allocation of space among

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it

Moneta: A High-performance Storage Array Architecture for Next-generation, Micro 2010

Moneta: A High-performance Storage Array Architecture for Next-generation, Micro 2010 Moneta: A High-performance Storage Array Architecture for Next-generation, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed

Improving Cache Performance by Exploiting Read-Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploiting Read-Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploiting Read-Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more critical than write misses

15-740/ Computer Architecture Lecture 5: Project Example. Justin Meza Yoongu Kim Fall 2011, 9/21/2011

15-740/ Computer Architecture Lecture 5: Project Example. Justin Meza Yoongu Kim Fall 2011, 9/21/2011 15-740/18-740 Computer Architecture Lecture 5: Project Example Justin Meza Yoongu Kim Fall 2011, 9/21/2011 Reminder: Project Proposals Project proposals due NOON on Monday 9/26 Two to three pages consisting

Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University Main Memory Lectures These slides are from the Scalable Memory Systems course taught at ACACES 2013 (July 15-19,

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing Saurabh Gupta Oak Ridge National Laboratory Oak Ridge, USA guptas@ornl.gov Abstract In modern multi-core processors, last-level caches

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

Cache/Memory Optimization. - Krishna Parthaje

Cache/Memory Optimization. - Krishna Parthaje Cache/Memory Optimization - Krishna Parthaje Hybrid Cache Architecture Replacing SRAM Cache with Future Memory Technology Suji Lee, Jongpil Jung, and Chong-Min Kyung Department of Electrical Engineering,KAIST

DFPC: A Dynamic Frequent Pattern Compression Scheme in NVM-based Main Memory

DFPC: A Dynamic Frequent Pattern Compression Scheme in NVM-based Main Memory DFPC: A Dynamic Frequent Pattern Compression Scheme in NVM-based Main Memory Yuncheng Guo, Yu Hua, Pengfei Zuo Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology Huazhong

1. Creates the illusion of an address space much larger than the physical memory

1. Creates the illusion of an address space much larger than the physical memory Virtual memory Main Memory Disk I P D L1 L2 M Goals Physical address space Virtual address space 1. Creates the illusion of an address space much larger than the physical memory 2. Make provisions for

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. M6 Memory Hierarchy Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. Events on a Cache Miss Events on a Cache Miss Stall the pipeline.

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

Architecture Tuning Study: the SimpleScalar Experience

Architecture Tuning Study: the SimpleScalar Experience Architecture Tuning Study: the SimpleScalar Experience Jianfeng Yang Yiqun Cao December 5, 2005 Abstract SimpleScalar is software toolset designed for modeling and simulation of processor performance.

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array

3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016

3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016 3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016 Hybrid L2 NUCA Design and Management Considering Data Access Latency, Energy Efficiency, and Storage
