Mitigating Bitline Crosstalk Noise in DRAM Memories / Leveraging ECC to Mitigate Read Disturbance, False Reads and Write Faults in STT-RAM


Transcription

1 MEMSYS 2017: Mitigating Bitline Crosstalk Noise in DRAM Memories
Mohammad Seyedzadeh*, Donald Kline Jr, Alex K. Jones, Rami Melhem, University of Pittsburgh
DSN 2016: Leveraging ECC to Mitigate Read Disturbance, False Reads and Write Faults in STT-RAM
Mohammad Seyedzadeh, Rakan Maddah, Alex K. Jones, Rami Melhem, University of Pittsburgh
October 4, 2017

2 Executive Summary
DRAM scaling places cells closer together and increases coupling noise between cells.
Observation: There is a correlation between weak cells and data patterns that reduces the reliability of the system.
Our approach: Periodic Flip Encoding (PFE)
Fault-Oblivious PFE (PFE_FO) minimizes the number of bad patterns.
Fault-Aware PFE (PFE_FA) minimizes (or eliminates) the occurrence of crosstalk errors using weak cell information.
Key results: Large improvement in reliability with a low performance overhead of 1-2%.

3-5 Motivation
Technology scaling of DRAM cells: DRAM scaling enabled high capacity.
Cell-to-cell crosstalk

6-7 Motivation
The bad pattern occurs when a bit line swings in the opposite direction of its two neighboring bit lines.
In contrast, the best pattern happens when the neighboring bit lines of a reference bit line swing in opposite directions.
[Figure: bit lines B(i-1), B(i), B(i+1) and their complements B(i-1)', B(i)', B(i+1)' swinging high (H) or low (L). Panels: (a) worst-case 000, (b) worst-case 111, (c) best-case 101]
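
The pattern definition above can be turned into a small checker. This is only a sketch: it assumes the bad data patterns are the two panels labeled "worst-case" in the figure (000 and 111), and the function names are hypothetical. A different array layout may map other data patterns onto opposing bit line swings, so the pattern set is a parameter.

```python
# Assumption: the bad data patterns are the figure's "worst-case" labels.
BAD_PATTERNS = ("000", "111")


def bad_pattern_centers(bits, bad_patterns=BAD_PATTERNS):
    """Return indices of cells that sit at the center of a bad 3-bit pattern."""
    return [
        i
        for i in range(1, len(bits) - 1)
        if bits[i - 1:i + 2] in bad_patterns
    ]


def count_bad_patterns(bits, bad_patterns=BAD_PATTERNS):
    """Count the 3-bit windows of `bits` that match a bad pattern."""
    return len(bad_pattern_centers(bits, bad_patterns))


if __name__ == "__main__":
    word = "0001110"
    print(bad_pattern_centers(word))  # centers of the 000 and 111 windows
    print(count_bad_patterns("101010"))  # alternating data has no bad window
```

Counting by center cell rather than by window start is a deliberate choice here, since a later slide describes weak cells that overlap with the center of a bad pattern.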

8-11 Prior Solution to Bitline Crosstalk Noise
Four-to-Five Encoding (FFE)
[Table: dataword-to-codeword mapping; the entries did not survive extraction]
Advantage: removes bad patterns from 5-bit groups
Disadvantage: 25% overhead
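
One way to see why four data bits need five code bits is to count the words that contain no bad pattern. The sketch below is not the FFE codebook, which is not recoverable from the slides. It only enumerates pattern-free words under the assumption that the bad patterns are 000 and 111, and the helper names are hypothetical.

```python
from itertools import product

# Assumption: bad patterns are the figure's "worst-case" labels.
BAD_PATTERNS = ("000", "111")


def is_clean(word, bad_patterns=BAD_PATTERNS):
    """True if no 3-bit window of `word` is a bad pattern."""
    return not any(word[i:i + 3] in bad_patterns for i in range(len(word) - 2))


def clean_words(width, bad_patterns=BAD_PATTERNS):
    """All `width`-bit words that contain no bad pattern."""
    words = ("".join(bits) for bits in product("01", repeat=width))
    return [w for w in words if is_clean(w, bad_patterns)]


if __name__ == "__main__":
    for width in (4, 5):
        print(width, len(clean_words(width)))
```

Under this assumption only 10 of the 4-bit words are clean, while 5 bits give 16 clean words, enough to encode every 4-bit dataword. The counts are the same if the bad patterns are taken to be 010 and 101 instead. One extra bit per four data bits is the 25% overhead listed on the slide.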

12-17 Our Solution: Periodic Flip Encoding (PFE_FO)
Partition the data into 3-bit groups and then flip the same bit position of each group.
[Figure: encoding the original codeword into candidate codewords (a), (b), (c), (d)]
Use two auxiliary bits per cache line to specify the codeword used.
PFE_FO: pick the codeword with the minimum number of bad patterns.
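
The selection step can be sketched as follows. Several details are assumptions rather than facts from the slides: candidate 0 is the unflipped word and candidates 1 to 3 flip bit positions 0 to 2 of every 3-bit group, the bad patterns are 000 and 111, and ties go to the lowest index. All function names are hypothetical.

```python
# Assumptions: bad patterns 000/111; candidate 0 is the original word and
# candidates 1..3 flip position 0..2 of each 3-bit group.
BAD_PATTERNS = ("000", "111")
GROUP = 3


def count_bad_patterns(bits):
    """Count 3-bit windows of `bits` that match a bad pattern."""
    return sum(bits[i:i + GROUP] in BAD_PATTERNS for i in range(len(bits) - GROUP + 1))


def flip_position(bits, pos):
    """Flip bit `pos` of every 3-bit group."""
    flipped = list(bits)
    for i in range(pos, len(flipped), GROUP):
        flipped[i] = "1" if flipped[i] == "0" else "0"
    return "".join(flipped)


def candidates(bits):
    """The four candidate codewords, indexed by the 2-bit auxiliary value."""
    return [bits] + [flip_position(bits, pos) for pos in range(GROUP)]


def pfe_fo_encode(bits):
    """Return (aux, codeword) with the fewest bad patterns."""
    cands = candidates(bits)
    aux = min(range(len(cands)), key=lambda k: count_bad_patterns(cands[k]))
    return aux, cands[aux]


def pfe_decode(aux, codeword):
    """Undo the flip selected by `aux`; flipping twice restores the data."""
    return codeword if aux == 0 else flip_position(codeword, aux - 1)


if __name__ == "__main__":
    aux, code = pfe_fo_encode("000000000000")
    print(aux, code, pfe_decode(aux, code))
```

Because a flip is its own inverse, the decoder needs only the auxiliary value, which fits in the two bits mentioned on the slide.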

18 Our Solution: Periodic Flip Encoding (PFE_FA)
Fault-Oblivious PFE (PFE_FO) minimizes the number of bad patterns.
Fault-Aware PFE (PFE_FA) minimizes (or eliminates) the occurrence of crosstalk errors using a Weak Cell Map (WCM).

19-20 Our Solution: Periodic Flip Encoding (PFE_FA)
Given the location of weak cells, pick the codeword with no overlap between weak cells and bad patterns.
[Figure: candidate codewords (a), (b), (c), (d)]
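
A fault-aware selection might look like the sketch below. It assumes that "overlap" means a weak cell sitting at the center of a bad pattern, that the bad patterns are 000 and 111, and that the four candidates are the original word plus one flip per bit position of each 3-bit group. The weak cell set stands in for the Weak Cell Map, and all names are hypothetical.

```python
# Assumptions: bad patterns 000/111; a weak cell is exposed when it is the
# center cell of a bad pattern; candidate numbering as in a 2-bit aux value.
BAD_PATTERNS = ("000", "111")
GROUP = 3


def bad_pattern_centers(bits):
    """Set of indices that are the center cell of a bad 3-bit pattern."""
    return {i + 1 for i in range(len(bits) - 2) if bits[i:i + 3] in BAD_PATTERNS}


def flip_position(bits, pos):
    """Flip bit `pos` of every 3-bit group."""
    flipped = list(bits)
    for i in range(pos, len(flipped), GROUP):
        flipped[i] = "1" if flipped[i] == "0" else "0"
    return "".join(flipped)


def pfe_fa_encode(bits, weak_cells):
    """Return (aux, codeword, exposed) minimizing weak cells under bad patterns."""
    weak = set(weak_cells)
    cands = [bits] + [flip_position(bits, pos) for pos in range(GROUP)]
    exposure = [len(bad_pattern_centers(c) & weak) for c in cands]
    aux = min(range(len(cands)), key=lambda k: exposure[k])
    return aux, cands[aux], exposure[aux]


if __name__ == "__main__":
    # Cell 1 is weak; the unflipped word puts it at the center of 000.
    print(pfe_fa_encode("000011100", {1}))
    # With no weak cells, every candidate is equally safe and index 0 wins.
    print(pfe_fa_encode("000011100", set()))
```

Unlike the fault-oblivious variant, this selection ignores bad patterns that fall on healthy cells, so it can reach zero exposure even when some bad patterns remain in the codeword.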

21 Our Solution: Periodic Flip Encoding (PFE_FA)
Memory controller implementation of fault-aware PFE_FA
[Figure: a modified memory controller sits between the last level cache and main memory. Address bits, data and the WCM feed Encoder Module_0 through Encoder Module_15. Inside Encoder Module_i, Encoder_0 to Encoder_3 produce codewords CW0 to CW3, and a 4:1 mux selects CW_ij; a 2:4 encoder is also shown. The selected codewords pass through the original memory controller to main memory.]

22 Experimental Methodology
We use a PIN-based simulator to model the cache hierarchy in order to determine the accesses to main memory.
CPU: 4-core, 8-issue width per core, out of order
L1 Cache: 16K private Inst. & Data, 8-way set-assoc.
L2 Cache: 1MB shared, 16-way set-assoc.
Cache Block: 512 bits
Write Buffer: 64 entries
Benchmark: PARSEC, SPEC CPU2006
Weak Cell Map: Bayesian distribution
Fault Rate: 0.01%, 0.1%, 1%

23-25 Experimental Methodology
Metric: Uncorrectable Bit Error Rate (UBER)
ECP_FO: protects against potential faults by pointing to weak cells and providing reliable storage for their content.
ECP_FA: points to and stores the values of weak cells that overlap with the center of any bad pattern.
Overhead per n-bit block:
ECC-k: k⌈log n⌉ + 1
FFE: ⌈n/4⌉
PFE: 2
ECP-k: k(⌈log n⌉ + 1) + 1
Overhead % per scheme (the block-size and overhead-bit rows are only partly recoverable; a block size of 32 survives for ECC-1 and ECC-2):
ECC-1 (32): 18.75%
ECC-2 (32): 34.37%
ECC (label truncated): 6.25%
FFE: 25%
PFE: 6.25%
ECP-3: 6.05%
ECP-12: 23.63%
Annotation added on slides 24-25: ISO-area
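
The overhead formulas can be checked numerically against the listed percentages. In the sketch below the logarithm is assumed to be base 2 and the brackets are read as a ceiling. Only the block size of 32 for ECC-1 and ECC-2 comes from the slide. The other block sizes (128 for the third ECC column with k = 1, 32 for FFE and PFE, 512 for ECP) are assumptions chosen because they reproduce the slide's percentages.

```python
from math import ceil, log2


def ecc_bits(n, k):
    """ECC-k overhead bits for an n-bit block: k*ceil(log2 n) + 1."""
    return k * ceil(log2(n)) + 1


def ffe_bits(n):
    """FFE overhead bits: one extra bit per 4 data bits."""
    return ceil(n / 4)


def pfe_bits(n):
    """PFE overhead bits: two auxiliary bits regardless of block size."""
    return 2


def ecp_bits(n, k):
    """ECP-k overhead bits: k pointers with a value bit each, plus one bit."""
    return k * (ceil(log2(n)) + 1) + 1


def percent(bits, n):
    return 100.0 * bits / n


# (scheme, block size, overhead bits); sizes other than 32 for ECC-1 and
# ECC-2 are assumptions, see the note above.
ROWS = [
    ("ECC-1", 32, ecc_bits(32, 1)),
    ("ECC-2", 32, ecc_bits(32, 2)),
    ("ECC-1", 128, ecc_bits(128, 1)),
    ("FFE", 32, ffe_bits(32)),
    ("PFE", 32, pfe_bits(32)),
    ("ECP-3", 512, ecp_bits(512, 3)),
    ("ECP-12", 512, ecp_bits(512, 12)),
]

if __name__ == "__main__":
    for name, n, bits in ROWS:
        print(f"{name:7s} n={n:4d} bits={bits:4d} overhead={percent(bits, n):.2f}%")
```

With these block sizes the computed overheads agree with the slide to within rounding, which supports the base-2 reading but does not confirm the block sizes the authors used.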

26 UBER (0.01% incidence of weak cells)
[Figure: UBER (lower is better), log scale from 1.E-04 down to 1.E-09, for FFE, ECP_FO-12 and PFE_FO+ECC-1 (label 32). Benchmarks: blackscholes, bodytrack, ferret, fluidanimate, freqmine, raytrace, swaptions, vips, x264, canneal, dedup, streamcluster, PARSEC mean, bzip2, gobmk, hmmer, libquantum, mcf, sjeng, cactusADM, calculix, GemsFDTD, lbm, milc, namd, SPEC mean, mean]

27 UBER (0.01% incidence of weak cells)
[Figure: two panels over the same benchmarks, UBER (lower is better). First panel: FFE, ECP_FO-12 and PFE_FO+ECC-1 (label 32), log scale from 1.E-04 down to 1.E-09. Second panel: ECC-1 (label 128), ECP_FA-3 and PFE_FA, log scale from 1.E-04 down to 1.E-12; a marker labeled 10-6 also appears]

28 Performance (IPC) overhead of being cost aware
Baseline (PFE_FO): fault-oblivious scheme
MemCTL (PFE_FA): encoding in MemCTL, with fault information cached in MemCTL
MemDIMM (PFE_FA): encoding in the DIMM, with fault information residing on the DIMM
[Figure: IPC of Baseline, MemCTL and MemDIMM across the PARSEC and SPEC CPU2006 benchmarks, with suite means and overall mean]

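29 UBER for different fault mitigation schemes
[Figure: UBER (lower is better), log scale from 1.E-02 down to below 1.0E-11, grouped by incidence of weak cells; the axis labels 0.10% and 1.00% survive extraction, and an annotation beginning "Less than" is truncated]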

30 Conclusion
DRAM scaling places cells closer together and increases coupling noise between cells.
Observation: There is a correlation between weak cells and data patterns that reduces the reliability of the system.
Our approach: Periodic Flip Encoding (PFE)
Fault-Oblivious PFE (PFE_FO) minimizes the number of bad patterns.
Fault-Aware PFE (PFE_FA) minimizes (or eliminates) the occurrence of crosstalk errors using weak cell information.
Key results: Large improvement in reliability with a low performance overhead of 1-2%.



More information

ARI: Adaptive LLC-Memory Traffic Management

ARI: Adaptive LLC-Memory Traffic Management 0 ARI: Adaptive LLC-Memory Traffic Management VIACHESLAV V. FEDOROV, Texas A&M University SHENG QIU, Texas A&M University A. L. NARASIMHA REDDY, Texas A&M University PAUL V. GRATZ, Texas A&M University

More information

Virtual Snooping: Filtering Snoops in Virtualized Multi-cores

Virtual Snooping: Filtering Snoops in Virtualized Multi-cores Appears in the 43 rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-43) Virtual Snooping: Filtering Snoops in Virtualized Multi-cores Daehoon Kim, Hwanju Kim, and Jaehyuk Huh Computer

More information

Multi-Cache Resizing via Greedy Coordinate Descent

Multi-Cache Resizing via Greedy Coordinate Descent Noname manuscript No. (will be inserted by the editor) Multi-Cache Resizing via Greedy Coordinate Descent I. Stephen Choi Donald Yeung Received: date / Accepted: date Abstract To reduce power consumption

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 36 Performance 2010-04-23 Lecturer SOE Dan Garcia How fast is your computer? Every 6 months (Nov/June), the fastest supercomputers in

More information

DynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories

DynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories SAFARI Technical Report No. 2-5 (December 6, 2) : A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon hanbinyoon@cmu.edu Justin Meza meza@cmu.edu

More information

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger Data Criticality in Network-On-Chip Design Joshua San Miguel Natalie Enright Jerger Network-On-Chip Efficiency Efficiency is the ability to produce results with the least amount of waste. Wasted time Wasted

More information

Shared Last-Level TLBs for Chip Multiprocessors

Shared Last-Level TLBs for Chip Multiprocessors Shared Last-Level TLBs for Chip Multiprocessors Abhishek Bhattacharjee Dept. of Computer Science Rutgers University abhib@cs.rutgers.edu Daniel Lustig Dept. of Electrical Engineering Princeton University

More information

A Front-end Execution Architecture for High Energy Efficiency

A Front-end Execution Architecture for High Energy Efficiency A Front-end Execution Architecture for High Energy Efficiency Ryota Shioya, Masahiro Goshima and Hideki Ando Department of Electrical Engineering and Computer Science, Nagoya University, Aichi, Japan Information

More information

DEMM: a Dynamic Energy-saving mechanism for Multicore Memories

DEMM: a Dynamic Energy-saving mechanism for Multicore Memories DEMM: a Dynamic Energy-saving mechanism for Multicore Memories Akbar Sharifi, Wei Ding 2, Diana Guttman 3, Hui Zhao 4, Xulong Tang 5, Mahmut Kandemir 5, Chita Das 5 Facebook 2 Qualcomm 3 Intel 4 University

More information

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical

More information

Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors

Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors Jie Meng, Tiansheng Zhang, and Ayse K. Coskun Electrical and Computer Engineering Department, Boston University,

More information

Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations

Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations Young Hoon Son Seongil O Yuhwan Ro JaeW.Lee Jung Ho Ahn Seoul National University Sungkyunkwan University Seoul, Korea Suwon, Korea

More information

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Dimitris Kaseridis Electrical and Computer Engineering The University of Texas at Austin Austin, TX, USA kaseridis@mail.utexas.edu

More information

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons,

More information

Architecture of Parallel Computer Systems - Performance Benchmarking -

Architecture of Parallel Computer Systems - Performance Benchmarking - Architecture of Parallel Computer Systems - Performance Benchmarking - SoSe 18 L.079.05810 www.uni-paderborn.de/pc2 J. Simon - Architecture of Parallel Computer Systems SoSe 2018 < 1 > Definition of Benchmark

More information

Portal del coneixement obert de la UPC

Portal del coneixement obert de la UPC UPCommons Portal del coneixement obert de la UPC http://upcommons.upc.edu/e-prints Aquesta és una còpia de la versió author s final draft d'un article publicat a la revista IEEE journal of selected topics

More information

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Adwait Jog Asit K. Mishra Cong Xu Yuan Xie Vijaykrishnan Narayanan Ravishankar Iyer Chita R. Das The Pennsylvania State

More information

Bias Scheduling in Heterogeneous Multi-core Architectures

Bias Scheduling in Heterogeneous Multi-core Architectures Bias Scheduling in Heterogeneous Multi-core Architectures David Koufaty Dheeraj Reddy Scott Hahn Intel Labs {david.a.koufaty, dheeraj.reddy, scott.hahn}@intel.com Abstract Heterogeneous architectures that

More information

Critical Packet Prioritisation by Slack-Aware Re-routing in On-Chip Networks

Critical Packet Prioritisation by Slack-Aware Re-routing in On-Chip Networks Critical Packet Prioritisation by Slack-Aware Re-routing in On-Chip Networks Abhijit Das, Sarath Babu, John Jose, Sangeetha Jose and Maurizio Palesi Dept. of Computer Science and Engineering, Indian Institute

More information

AMNESIAC: Amnesic Automatic Computer

AMNESIAC: Amnesic Automatic Computer AMNESIAC: Amnesic Automatic Computer Trading Computation for Communication for Energy Efficiency Ismail Akturk Ulya R. Karpuzcu University of Minnesota, Twin Cities {aktur002,ukarpuzc}@umn.edu Abstract

More information

Insertion and Promotion for Tree-Based PseudoLRU Last-Level Caches

Insertion and Promotion for Tree-Based PseudoLRU Last-Level Caches Insertion and Promotion for Tree-Based PseudoLRU Last-Level Caches Daniel A. Jiménez Department of Computer Science and Engineering Texas A&M University ABSTRACT Last-level caches mitigate the high latency

More information

Data Prefetching by Exploiting Global and Local Access Patterns

Data Prefetching by Exploiting Global and Local Access Patterns Journal of Instruction-Level Parallelism 13 (2011) 1-17 Submitted 3/10; published 1/11 Data Prefetching by Exploiting Global and Local Access Patterns Ahmad Sharif Hsien-Hsin S. Lee School of Electrical

More information

OpenPrefetch. (in-progress)

OpenPrefetch. (in-progress) OpenPrefetch Let There Be Industry-Competitive Prefetching in RISC-V Processors (in-progress) Bowen Huang, Zihao Yu, Zhigang Liu, Chuanqi Zhang, Sa Wang, Yungang Bao Institute of Computing Technology(ICT),

More information

Spatial Memory Streaming (with rotated patterns)

Spatial Memory Streaming (with rotated patterns) Spatial Memory Streaming (with rotated patterns) Michael Ferdman, Stephen Somogyi, and Babak Falsafi Computer Architecture Lab at 2006 Stephen Somogyi The Memory Wall Memory latency 100 s clock cycles;

More information

BUNSHIN: Compositing Security Mechanisms through Diversification (with Appendix)

BUNSHIN: Compositing Security Mechanisms through Diversification (with Appendix) BUNSHIN: Compositing Security Mechanisms through Diversification (with Appendix) Meng Xu, Kangjie Lu, Taesoo Kim, Wenke Lee Georgia Institute of Technology in practice. One reason is that the slowdown

More information

Best-Offset Hardware Prefetching

Best-Offset Hardware Prefetching Best-Offset Hardware Prefetching Pierre Michaud March 2016 2 BOP: yet another data prefetcher Contribution: offset prefetcher with new mechanism for setting the prefetch offset dynamically - Improvement

More information

Reducing Writebacks Through In-Cache Displacement

Reducing Writebacks Through In-Cache Displacement 1 Reducing Writebacks Through In-Cache Displacement MOHAMMAD BAKHSHALIPOUR, Sharif University of Technology, Iran and Institute for Research in Fundamental Sciences (IPM), Iran AYDIN FARAJI, Sharif University

More information

PIPELINING AND PROCESSOR PERFORMANCE

PIPELINING AND PROCESSOR PERFORMANCE PIPELINING AND PROCESSOR PERFORMANCE Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 1, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Performance Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Defining Performance (1) Which airplane has the best performance? Boeing 777 Boeing

More information

Computer Sciences Department

Computer Sciences Department Computer Sciences Department Compiler Construction of Idempotent Regions Marc de Kruijf Karthikeyan Sankaralingam Somesh Jha Technical Report #1700 November 2011 Compiler Construction of Idempotent Regions

More information

Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014

Novel Nonvolatile Memory Hierarchies to Realize Normally-Off Mobile Processors ASP-DAC 2014 Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014 Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe Toshiba Corporation, R&D Center Advanced

More information

Bunshin: Compositing Security Mechanisms through Diversification

Bunshin: Compositing Security Mechanisms through Diversification Bunshin: Compositing Security Mechanisms through Diversification Meng Xu, Kangjie Lu, Taesoo Kim, and Wenke Lee, Georgia Institute of Technology https://www.usenix.org/conference/atc17/technical-sessions/presentation/xu-meng

More information