Virtualized and Flexible ECC for Main Memory


1 Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez
Dept. of Electrical and Computer Engineering, The University of Texas at Austin
ASPLOS

2 Memory Error Protection
Applying ECC uniformly (ECC DIMMs) is simple and transparent to programmers, but the error protection level is a fixed, design-time decision.
Chipkill-correct, used in high-end servers, constrains the memory module design space: it allows only x4 DRAMs, which have lower energy efficiency than x8 DRAMs.
Virtualized ECC objectives: to provide flexible memory error protection, and to relax the design constraints of chipkill.

3 Virtualized ECC
Two-tiered error protection:
Tier-1 Error Code (T1EC): a simple error code for detection or light-weight correction.
Tier-2 Error Code (T2EC): a strong error-correcting code, stored within the memory namespace itself and managed by the OS.
Flexible memory error protection: different T2ECs for different data pages, with stronger protection for more important data.

4 Virtualized ECC Example
[Figure: virtual pages i, j, and k map to physical page frames i, j, and k, ordered from low to high error protection level. Every frame stores data with T1EC; through a physical-frame-to-ECC-page mapping, frames j and k are additionally mapped to ECC pages j and k, which hold T2EC for chipkill and T2EC for double chipkill, respectively.]

5 VIRTUALIZED ECC

6 Observations on Memory Errors
The per-system error rate is still low: most of the time, error detection finds no error.
Detecting errors is a common-case operation, so it needs a low-latency, low-complexity mechanism: T1EC.
Correcting errors is an uncommon-case operation, so correction can be complex and take a long time. The error correction information still has to be kept somewhere: virtualized T2EC.

7 Uniform ECC
[Figure: a virtual address (VPN, offset) is translated to a physical address (PFN, offset); each page frame in physical memory stores data together with its ECC.]

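8 Virtualized ECC
[Figure: as with uniform ECC, a virtual address (VPN, offset) is translated to a physical address (PFN, offset), and each page frame stores data with T1EC. In addition, the OS manages a PFN-to-EPN translation that maps the physical address to an ECC address (EA: ECC page number, offset) pointing to the T2EC in an ECC page; this translation scales according to the T2EC size.]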
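The PFN-to-EPN translation of slide 8 can be illustrated with a short Python sketch. It assumes 4 KiB pages, 64B cache lines, a fixed T2EC size per line, and a hypothetical OS-maintained table `t2ec_base` that maps each PFN to the byte address where that frame's T2EC region begins; the slides do not specify the real table format, and `pa_to_ea` and `t2ec_line_and_slot` are illustrative names.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages
LINE_SIZE = 64    # 64B cache lines, as in the slides


def pa_to_ea(pa: int, t2ec_bytes: int, t2ec_base: dict[int, int]) -> int:
    """Translate a physical address to the ECC address (EA) of its line's T2EC.

    t2ec_base is a hypothetical OS-managed map: PFN -> start of that frame's
    T2EC region. The region holds one T2EC per cache line, so its size scales
    with the T2EC size (e.g. 64 lines x 2B = 128B per 4 KiB frame).
    """
    pfn, offset = divmod(pa, PAGE_SIZE)
    line_index = offset // LINE_SIZE
    return t2ec_base[pfn] + line_index * t2ec_bytes


def t2ec_line_and_slot(ea: int, t2ec_bytes: int) -> tuple[int, int]:
    """Split an EA into the address of its T2EC cache line and the slot within it."""
    line_addr = ea - ea % LINE_SIZE
    slot = (ea % LINE_SIZE) // t2ec_bytes
    return line_addr, slot


if __name__ == "__main__":
    base = {0: 0x100000}  # hypothetical: frame 0's T2EC region starts here
    ea = pa_to_ea(0x0200, t2ec_bytes=2, t2ec_base=base)
    print(hex(ea), t2ec_line_and_slot(ea, 2))
```

Because consecutive data lines get consecutive T2EC slots in this layout, the T2ECs of 32 neighboring 64B lines (with 2B T2ECs) share one T2EC cache line.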

9 Virtualized ECC Operation
Read: fetch data and T1EC; T2EC is not needed in most cases.
Write: update data, T1EC, and T2EC.
The T2ECs of consecutive data lines map to a single T2EC line, so T2EC lines can be partially valid; only valid T2ECs are updated to DRAM.
ECC Address Translation Unit: fast PA-to-EA translation.
[Figure: example operation involving the LLC, the ECC Address Translation Unit (e.g., a write-back to PA 0x0200 translated to an EA), and DRAM ranks 0 and 1 storing data with T1EC, along with T2EC for rank 0 and rank 1 data.]
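The partially valid T2EC lines of slide 9 can be modeled with a minimal Python sketch. It assumes a 64B T2EC line with one valid bit per T2EC slot, and it assumes a T2EC line is allocated in the LLC on a write without first fetching its old contents from DRAM, which is what makes partial validity possible. The `T2ECLine` class and the masked-write model of DRAM are illustrative, not the hardware described in the slides.

```python
LINE_SIZE = 64  # bytes per cache line


class T2ECLine:
    """Hypothetical model of a partially valid T2EC line held in the LLC."""

    def __init__(self, t2ec_bytes: int) -> None:
        self.t2ec_bytes = t2ec_bytes
        self.buf = bytearray(LINE_SIZE)
        # One valid bit per T2EC slot (e.g. 32 slots for 2B T2ECs).
        self.valid = [False] * (LINE_SIZE // t2ec_bytes)

    def update(self, slot: int, t2ec: bytes) -> None:
        """Record the new T2EC of one data line on a write."""
        if len(t2ec) != self.t2ec_bytes:
            raise ValueError("T2EC has the wrong size")
        lo = slot * self.t2ec_bytes
        self.buf[lo:lo + self.t2ec_bytes] = t2ec
        self.valid[slot] = True

    def write_back(self, dram_line: bytearray) -> int:
        """Merge only the valid T2ECs into the DRAM copy; return bytes written."""
        written = 0
        for slot, is_valid in enumerate(self.valid):
            if is_valid:
                lo = slot * self.t2ec_bytes
                hi = lo + self.t2ec_bytes
                dram_line[lo:hi] = self.buf[lo:hi]
                written += self.t2ec_bytes
        return written


if __name__ == "__main__":
    dram = bytearray(b"\xaa" * LINE_SIZE)  # older T2EC line contents in DRAM
    line = T2ECLine(t2ec_bytes=2)
    line.update(8, b"\x12\x34")
    print(line.write_back(dram), dram[16:18].hex())
```

Invalid slots leave the DRAM copy untouched, so the T2ECs of data lines that were not written since allocation are preserved.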

10 Penalty with V-ECC
Increased data miss rate: T2EC lines in the LLC reduce the effective LLC size.
Increased traffic due to T2EC write-back: this is one-way write-back traffic and is not on the critical path.

11 CHIPKILL-CORRECT

12 Chipkill-correct
Single device-error correct, double device-error detect: it can tolerate one DRAM failure and detect a second DRAM failure.
Chipkill requires x4 DRAMs (x8 chipkill is impractical), yet x8 DRAMs are more energy efficient.

13 Baseline x4 Chipkill
Two x4 ECC DIMMs: 128-bit data + 16-bit ECC (redundancy overhead: 12.5%), using a 4-check-symbol error code with 4-bit symbols.
Access granularity: 64B in DDR2 (minimum burst of 4 x 128 bits) and 128B in DDR3 (minimum burst of 8 x 128 bits).
[Figure: 36 x4 DRAMs on a 144-bit wide data bus.]

14 x8 Chipkill
x8 chipkill with the same access granularity requires a 152-bit wide data path: 128-bit data + 24-bit ECC (redundancy overhead: 18.75%).
This needs a custom-designed DIMM, which increases the system cost considerably.
[Figure: 19 x8 DRAMs on a 152-bit wide data bus.]

15 x8 Chipkill with Standard DIMMs
Using standard DIMMs increases the access granularity: 128B in DDR2 (minimum burst of 4 x 256 bits) and 256B in DDR3 (minimum burst of 8 x 256 bits).
[Figure: 35 x8 DRAMs on a 280-bit wide data bus.]
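The access granularities and redundancy overheads of the three chipkill organizations on slides 13 to 15 follow directly from the data bus width and the minimum burst length. This Python sketch only reproduces that arithmetic; the helper names are illustrative. The 9.375% overhead for the 280-bit organization (24 ECC bits over 256 data bits) is derived here and is not stated on the slides.

```python
def access_granularity_bytes(data_bits: int, burst: int) -> int:
    """Bytes of data moved by one minimum-length burst."""
    return data_bits * burst // 8


def redundancy(data_bits: int, ecc_bits: int) -> float:
    """ECC storage overhead relative to data."""
    return ecc_bits / data_bits


# (name, DRAM device width, data bits, ECC bits) for each organization
CONFIGS = [
    ("baseline x4 chipkill", 4, 128, 16),
    ("x8 chipkill, custom DIMM", 8, 128, 24),
    ("x8 chipkill, standard DIMMs", 8, 256, 24),
]

for name, width, data, ecc in CONFIGS:
    bus = data + ecc
    print(
        f"{name}: {bus}-bit bus, {bus // width} DRAMs, "
        f"DDR2 (burst 4) {access_granularity_bytes(data, 4)}B, "
        f"DDR3 (burst 8) {access_granularity_bytes(data, 8)}B, "
        f"overhead {redundancy(data, ecc):.3%}"
    )
```

The comparison makes the trade-off explicit: x8 chipkill either pays more redundancy on a custom 152-bit DIMM or doubles the access granularity on standard DIMMs.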

16 V-ECC for Chipkill
Use a 3-check-symbol error code: single symbol-error correct, double symbol-error detect.
T1EC: 2 check symbols, which detect up to 2 symbol errors.
T2EC: the 3rd check symbol.
Combined, T1EC and T2EC provide chipkill.
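The slides do not name the code, so the following is one construction with the stated properties, assumed here for illustration only. Over GF(2^8), the data and T1EC symbols satisfy two Vandermonde (Reed-Solomon-style) parity checks, so any two symbol errors are detected without T2EC; the T2EC symbol adds a third check row in which only it has a unit entry. Every three columns of the resulting parity-check matrix are linearly independent, which gives minimum distance 4: single symbol-error correct, double symbol-error detect. The primitive polynomial 0x11D and all function names are assumptions.

```python
# GF(2^8) tables for the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (assumed).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def div(a: int, b: int) -> int:
    """a / b in GF(2^8); b must be nonzero."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def alpha(j: int) -> int:
    """alpha^j, the Vandermonde node of symbol position j."""
    return EXP[j % 255]


def syndromes(word, t2ec=0):
    """s0 = sum c_j, s1 = sum c_j*a^j, s2 = sum c_j*a^(2j) + T2EC."""
    s0 = s1 = s2 = 0
    for j, c in enumerate(word):
        s0 ^= c
        s1 ^= mul(c, alpha(j))
        s2 ^= mul(c, alpha(2 * j))
    return s0, s1, s2 ^ t2ec


def encode_t1ec(data):
    """Two T1EC symbols chosen so that s0 = s1 = 0 over data + T1EC."""
    n = len(data)
    a, b, _ = syndromes(data)
    p1 = div(b ^ mul(a, alpha(n)), alpha(n) ^ alpha(n + 1))
    return [a ^ p1, p1]


def encode_t2ec(word):
    """Third check symbol over data + T1EC (stored virtualized)."""
    return syndromes(word)[2]


def t1ec_detect(word):
    """Common case: detect up to 2 symbol errors from data + T1EC only."""
    s0, s1, _ = syndromes(word)
    return s0 != 0 or s1 != 0


def correct(word, t2ec):
    """Uncommon case: using T2EC, fix 1 symbol error or flag 2 as uncorrectable."""
    s0, s1, s2 = syndromes(word, t2ec)
    if s0 == 0 and s1 == 0:
        return list(word), ("ok" if s2 == 0 else "t2ec-symbol-error")
    if s0 != 0 and mul(s1, s1) == mul(s0, s2):
        j = LOG[div(s1, s0)]  # s1 / s0 = alpha^j for a single error at j
        if j < len(word):
            fixed = list(word)
            fixed[j] ^= s0  # s0 is the error value
            return fixed, "corrected"
    return None, "uncorrectable"


if __name__ == "__main__":
    data = list(range(1, 33))        # 32 data symbols
    word = data + encode_t1ec(data)  # 34 symbols: data + T1EC
    t2 = encode_t2ec(word)
    bad = list(word)
    bad[5] ^= 0x3C                   # one corrupted symbol (e.g. a failed chip)
    print(t1ec_detect(word), t1ec_detect(bad), correct(bad, t2)[1])
```

On a normal read only `t1ec_detect` runs; the T2EC symbol is fetched and `correct` is invoked only after an error is detected.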

17 V-ECC: ECC x4 Configuration
Use an 8-bit symbol error code: two consecutive beats of a burst from a x4 DRAM form an 8-bit symbol (modern DRAMs have a minimum burst of 4 or 8).
1 x4 ECC DIMM + 1 x4 non-ECC DIMM.
Each DRAM access in DDR2 (burst 4) transfers 64B of data and 4B of T1EC; the 2B T2EC is virtualized within the memory namespace, so 32 T2ECs fit in one 64B cache line.
[Figure: a 136-bit wide data bus formed by 18 x4 DRAMs on the ECC DIMM (data and T1EC) and 16 x4 DRAMs on the non-ECC DIMM (data); T2EC is virtualized within memory.]
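Forming an 8-bit symbol from a x4 DRAM can be sketched in Python as below. The sketch assumes the earlier beat of each pair supplies the high nibble; the actual bit ordering is not given on the slides, and `x4_beats_to_symbols` is an illustrative name.

```python
def x4_beats_to_symbols(beats: list[int]) -> list[int]:
    """Pack the 4-bit beats one x4 DRAM drives during a burst into 8-bit symbols.

    Assumed ordering: beat 2k is the high nibble, beat 2k+1 the low nibble.
    """
    if len(beats) % 2 != 0:
        raise ValueError("burst length must be even")
    if any(not 0 <= b < 16 for b in beats):
        raise ValueError("each beat of a x4 DRAM carries 4 bits")
    return [(beats[i] << 4) | beats[i + 1] for i in range(0, len(beats), 2)]


if __name__ == "__main__":
    # One x4 DRAM in a DDR2 burst of 4 supplies 4 beats, i.e. 2 symbols.
    print([hex(s) for s in x4_beats_to_symbols([0xA, 0xB, 0xC, 0xD])])
```

With 34 x4 DRAMs each yielding 2 symbols per burst-4 access, an access carries 68 symbols: 64 data symbols (64B) and 4 T1EC symbols (4B).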

18 V-ECC: ECC x8 Configuration
Use an 8-bit symbol error code with 2 x8 ECC DIMMs.
Each DRAM access in DDR2 (burst 4) transfers 64B of data and 8B of T1EC; the 4B T2EC is virtualized, so 16 T2ECs fit in one 64B cache line.
[Figure: a 144-bit wide data bus formed by two ECC DIMMs of 9 x8 DRAMs each (data and T1EC); T2EC is virtualized within memory.]
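The per-access numbers for both configurations on slides 17 and 18 can be derived from the device width, the chip counts, and the burst length. This Python sketch assumes each codeword takes one 8-bit symbol from every chip (so a failed chip corrupts at most one symbol per codeword) and that each codeword has exactly one virtualized T2EC check symbol; `Layout` and `vecc_layout` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    bus_bits: int
    data_bytes: int
    t1ec_bytes: int
    t2ec_bytes: int
    t2ecs_per_line: int


def vecc_layout(device_width: int, data_chips: int, t1ec_chips: int,
                burst: int = 4, symbol_bits: int = 8, line_bytes: int = 64) -> Layout:
    """Per-access V-ECC layout under the one-symbol-per-chip-per-codeword assumption."""
    bits_per_chip = device_width * burst
    codewords = bits_per_chip // symbol_bits  # symbols each chip supplies per access
    t2ec_bytes = codewords * symbol_bits // 8  # one T2EC symbol per codeword
    return Layout(
        bus_bits=(data_chips + t1ec_chips) * device_width,
        data_bytes=data_chips * bits_per_chip // 8,
        t1ec_bytes=t1ec_chips * bits_per_chip // 8,
        t2ec_bytes=t2ec_bytes,
        t2ecs_per_line=line_bytes // t2ec_bytes,
    )


if __name__ == "__main__":
    # ECC x4: 16 data + 2 T1EC chips on the ECC DIMM, 16 data chips on the non-ECC DIMM.
    print("ECC x4:", vecc_layout(device_width=4, data_chips=32, t1ec_chips=2))
    # ECC x8: two ECC DIMMs, each with 8 data chips + 1 T1EC chip.
    print("ECC x8:", vecc_layout(device_width=8, data_chips=16, t1ec_chips=2))
```

Under this assumption the x8 configuration carries twice as many codewords per access as x4, which is why both its T1EC and its virtualized T2EC are twice as large.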

19 Flexible Error Protection
A single hardware design with V-ECC can provide chipkill-detect, chipkill-correct, and double chipkill-correct by using different T2ECs for different pages, trading reliability against performance.
T2EC size per 64B cache line:
ECC x4: 0B (chipkill-detect), 2B (chipkill-correct), 4B (double chipkill-correct)
ECC x8: 0B (chipkill-detect), 4B (chipkill-correct), 8B (double chipkill-correct)
Chipkill-detect maximizes performance and power efficiency; stronger protection comes at the cost of additional T2EC accesses.
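How the cost of protection scales per page can be sketched by converting the T2EC sizes above into ECC-page storage per data page. This Python sketch assumes 4 KiB data pages (the page size is not stated on the slides) and counts only virtualized T2EC, since T1EC lives in the DIMM's ECC devices; `ecc_page_bytes` is an illustrative name.

```python
PAGE_SIZE = 4096  # assumed 4 KiB data pages
LINE_SIZE = 64

# T2EC bytes per 64B line, from the slide:
# (chipkill-detect, chipkill-correct, double chipkill-correct)
T2EC_BYTES = {"ECC x4": (0, 2, 4), "ECC x8": (0, 4, 8)}
LEVELS = ("chipkill-detect", "chipkill-correct", "double chipkill-correct")


def ecc_page_bytes(t2ec_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """ECC-page storage needed for one data page at a given T2EC size."""
    return (page_size // LINE_SIZE) * t2ec_bytes


if __name__ == "__main__":
    for config, sizes in T2EC_BYTES.items():
        for level, t2ec in zip(LEVELS, sizes):
            nbytes = ecc_page_bytes(t2ec)
            print(f"{config}, {level}: {nbytes}B per page ({nbytes / PAGE_SIZE:.3%})")
```

Pages left at chipkill-detect need no ECC-page storage at all, which is what makes the cost of error protection proportional to the protection level chosen for each page.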

20 EVALUATION

21 Simulator/Workload
Simulator: GEMS + DRAMsim, with an out-of-order SPARC V9 core and an exclusive two-level cache hierarchy.
Memory: DDR2-800 (800 MT/s), 12.8GB/s over a 128-bit wide data path, 1 channel, 4 ranks.
Power model: WATTCH for processor power (scaled to 45nm), CACTI for cache power (45nm), and the Micron model for DRAM power (commodity DRAMs).
Workloads: 12 data-intensive applications from SPEC CPU 2006 and PARSEC; microbenchmarks STREAM and GUPS.

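22 Normalized Execution Time
Less than 1% performance penalty on average. Factors in the performance penalty: spatial locality and write-back traffic.
[Figure: normalized execution time of Baseline x4, ECC x4, and ECC x8 for SPEC 2006 (bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3) and PARSEC (canneal, dedup, fluid, freq), their average, and the STREAM and GUPS microbenchmarks.]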

23 System Energy Efficiency
Energy-delay product (EDP) gain: 1.1% on average for ECC x4 and 12.0% on average for ECC x8.
[Figure: normalized EDP of Baseline x4, ECC x4, and ECC x8 for the same SPEC 2006 and PARSEC applications, their average, and STREAM and GUPS.]
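EDP is the product of energy and execution time. Assuming that "EDP gain" means the relative reduction in EDP versus the baseline (the slides do not define it), it can be computed as in this Python sketch; the numbers in the example are hypothetical and are not results from the slides.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product."""
    return energy * delay


def edp_gain(base_energy: float, base_delay: float,
             new_energy: float, new_delay: float) -> float:
    """Relative EDP reduction of a new configuration versus a baseline."""
    return 1.0 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


if __name__ == "__main__":
    # Hypothetical example: 13% less energy at 1% longer execution time.
    print(f"{edp_gain(1.0, 1.0, 0.87, 1.01):.1%}")
```

Because EDP multiplies the two terms, a small execution-time penalty can be outweighed by a larger energy saving, which is the trade-off the ECC x8 configuration targets.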

24 Flexible Error Protection
[Figure: normalized execution time and normalized EDP with chipkill-detect, chipkill-correct, and double chipkill-correct for the same SPEC 2006 and PARSEC applications, their average, and STREAM and GUPS.]

25 Conclusion
Virtualized ECC: two-tiered error protection with virtualized T2EC.
Improved system energy efficiency with chipkill: DRAM power consumption reduced by 27% and system EDP improved by 12%, with a 1% performance penalty on average.
Error protection even for non-ECC DIMMs; can be used for GPU memory error protection.
Flexibility in error protection: the protection level adapts to user/system demand, and the cost of error protection is proportional to the protection level.


More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983

More information

Nonblocking Memory Refresh. Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian

Nonblocking Memory Refresh. Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian Nonblocking Memory Refresh Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian Latency (ns) History of DRAM 2 Refresh Latency Bus Cycle Time Min. Read Latency 512 550 16 13.5 0.5 0.75 1968 DRAM

More information

Memory hierarchy review. ECE 154B Dmitri Strukov

Memory hierarchy review. ECE 154B Dmitri Strukov Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal

More information

Virtual Memory: From Address Translation to Demand Paging

Virtual Memory: From Address Translation to Demand Paging Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 12, 2014

More information

COSC 6385 Computer Architecture - Memory Hierarchies (II)

COSC 6385 Computer Architecture - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity

More information

Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching

Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching Chang Hyun Park, Taekyung Heo, Jaehyuk Huh School of Computing, KAIST {changhyunpark, tkheo}@calab.kaist.ac.kr, and

More information

Lecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)

Lecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache Is the cache indexed with virtual or physical address? To index with a physical address, we

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses

More information

Lecture 17. Fall 2007 Prof. Thomas Wenisch. row enable. _bitline. Lecture 18 Slide 1 EECS 470

Lecture 17. Fall 2007 Prof. Thomas Wenisch. row enable. _bitline. Lecture 18 Slide 1 EECS 470 Lecture 17 DRAM Memory row enable Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs4 70 _bitline Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen,

More information

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies

More information

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Transparent Offloading and Mapping () Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O Connor, Nandita Vijaykumar,

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 29: an Introduction to Virtual Memory Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Virtual memory used to protect applications

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons,

More information

Main Memory (Fig. 7.13) Main Memory

Main Memory (Fig. 7.13) Main Memory Main Memory (Fig. 7.13) CPU CPU CPU Cache Multiplexor Cache Cache Bus Bus Bus Memory Memory bank 0 Memory bank 1 Memory bank 2 Memory bank 3 Memory b. Wide memory organization c. Interleaved memory organization

More information

COP: To Compress and Protect Main Memory

COP: To Compress and Protect Main Memory COP: To Compress and Protect Main Memory David J. Palframan Nam Sung Kim Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin Madison palframan@wisc.edu, nskim3@wisc.edu,

More information

Introduction to cache memories

Introduction to cache memories Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories

More information

Energy-centric DVFS Controlling Method for Multi-core Platforms

Energy-centric DVFS Controlling Method for Multi-core Platforms Energy-centric DVFS Controlling Method for Multi-core Platforms Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom Seoul National University, Korea MuCoCoS 2012 Salt Lake City, Utah Abstract Goal To

More information

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems EE382 (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems Mattan Erez The University of Texas at Austin EE382: Principles of Computer Architecture, Fall 2011 -- Lecture

More information

Spatial Memory Streaming (with rotated patterns)

Spatial Memory Streaming (with rotated patterns) Spatial Memory Streaming (with rotated patterns) Michael Ferdman, Stephen Somogyi, and Babak Falsafi Computer Architecture Lab at 2006 Stephen Somogyi The Memory Wall Memory latency 100 s clock cycles;

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Caches Martha A. Kim Columbia University Fall 215 Illustrations Copyright 27 Elsevier 1 / 23 Computer Systems Performance depends on which is slowest: the processor or

More information

CIT 668: System Architecture. Computer Systems Architecture

CIT 668: System Architecture. Computer Systems Architecture CIT 668: System Architecture Computer Systems Architecture 1. System Components Topics 2. Bandwidth and Latency 3. Processor 4. Memory 5. Storage 6. Network 7. Operating System 8. Performance Implications

More information