Flexible Cache Error Protection using an ECC FIFO
|
|
- Cecil Sims
- 5 years ago
- Views:
Transcription
1 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1
2 ECC FIFO Goal: to reduce on-chip ECC overhead Two-tiered error protection T1EC: light-weight on-chip error code T2EC: strong error correcting code Off-load T2EC overhead to FIFO in DRAM Why FIFO? It s simple to manage 15-25% LLC area reduction 10-17% LLC power saving Just 1% performance penalty 2
3 BACKGROUND 3
4 Error Correcting Codes 1-bit parity for error detection SEC-DED (Hamming) codes Single-bit Error Correction and Double-bit Error Detection 8bit ECC for 64bit data DEC-TED Double-bit Error Correction and Triple-bit Error Detection 15bit ECC for 64bit data 4
5 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 5
6 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 6
7 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors Baseline cache error protection 8 way interleaved SEC-DED Can correct up to 8-bit burst errors 8B ECC per 64B cache line 7
8 Uniform Error Protection Tag Data ECC 8 ways 2 11 sets 64B 8B ECC increases area AND leakage/dynamic power 8
9 RELATED WORK 9
10 Soft Errors: Observations Still, Soft Error Rate (SER) is low Every cache access tries to detect errors, but finds no error in most cases Error Detection Common case Need a low cost, low overhead error detection mechanism Error Correction Uncommon case Correction can be slow But, still need to maintain error correction info somewhere Memory hierarchy provides redundancy inherently for clean data Only dirty lines need error correcting codes 10
11 PERC [Sorin 06] / Energy Efficient [Li 04] Tag Data EDC ECC 8 ways 2 11 sets 64B 1B 7B Read only Data and EDC saves dynamic power Power gate ECC of clean lines saves static power 11
12 Area Efficient [Kim 06] 4 ways Tag Data EDC ECC 2 12 sets 2 12 sets 64B 1B 8B Allow only 1 dirty line per set 12
13 MAXn scheme Tag Data EDC 8 ways 2 11 sets n ways (n<8) 2 11 sets Tag ECC 64B 1B 8B Allow only n dirty lines per set May cause detrimental cleaning traffic 13
14 Two-tiered error protection Tier-1 Error Code (T1EC) On-chip light-weight error code Uniform error protection Tier-2 Error Code (T2EC) Strong error codes only for dirty lines Corrects Detected but Uncorrected Errors (DUE) of T1EC 14
15 Memory Mapped ECC [Yoon 09] Tag Data T1EC T2EC 8 ways 2 11 sets 64B 1B 8B On-Chip DRAM T2EC is memory mapped AND cached 15
16 ECC FIFO 16
17 ECC FIFO Use Two-tiered error protection T2EC is off-loaded to FIFO in DRAM LLC caching behavior is unaffected FIFO Simple to manage Coalesce buffer To better utilize DRAM channel 17
18 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 18
19 Rest of cache hierarchy Dirty line eviction to LLC Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 19
20 Rest of cache hierarchy Last Level Cache Data T1EC Encode T2EC and TAG T2EC TAG encoder T2EC Coalesce Buffer DRAM ECC FIFO 20
21 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG T2EC Push to Coalesce Buffer Coalesce Buffer DRAM ECC FIFO 21
22 Rest of cache hierarchy Next dirty line comes Data T1EC Last Level Cache DRAM T2EC encoder TAG TAG T2EC T2EC Tag/T2EC buffered in Coalesce Buffer Coalesce Buffer ECC FIFO 22
23 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC T2EC Coalesce Buffer is now FULL Coalesce Buffer DRAM ECC FIFO 23
24 Rest of cache hierarchy Last Level Cache DRAM Data T1EC Write the coalesced T2EC into ECC FIFO T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC Coalesce T2EC Buffer T2EC write size matches to DRAM burst size ECC FIFO T2EC 24
25 Rest of cache hierarchy Last Level Cache DRAM Data T1EC T2EC encoder Coalesce Buffer Coalesce Buffer becomes empty ECC FIFO T2EC 25
26 More on ECC FIFO Write-back data, but write-through ECC Potential performance degradation Increased DRAM traffic due to T2EC writes Error correction Search the matching TAG in coalesce buffer AND ECC FIFO May take a long time Not a problem since SER is low Sometimes, may not find the matching TAG ECC FIFO is finite Potentially unprotected dirty lines discussed later 26
27 EVALUATION 27
28 Performance Evaluation GEMS + DRAMsim An out-of-order SPARC V9 core Exclusive two-level cache hierarchy DDR2 667MHz 533GB/s Eager write-back Clean dirty lines periodically Workloads 16 data intensive applications SPEC CPU 2006, PARSEC, and SPLASH2 28
29 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC
30 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC DRAM 267 GB/s 63% 68% 23% SPLASH2 PARSEC SPEC
31 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC
32 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC
33 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC
34 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 1% 09 SPLASH2 PARSEC SPEC
35 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME % 36% Max1 Max2 Max4 MME ECC FIFO 11 11% 11% 8% 1 4% 1% 09 SPLASH2 PARSEC SPEC
36 Execution Time [cycle] 260E E+09 Comparison to MME OCEAN 258x258 Baseline MME ECC FIFO 220E E E % 160E E KB 512KB 1MB 2MB LLC size 36
37 LLC Area and Power CACTI 5 model of a 45nm process Parity T1EC + SEC-DED T2EC 15% area and 9% power saving Compared to SEC-DED baseline SEC-DED T1EC + DEC-TED T2EC 15% area and 10% power saving Compared to DEC-TED baseline More error code examples in the paper 37
38 FIFO OVERWRITE 38
39 ECC FIFO is Finite ECC FIFO is implemented as a circular buffer Each T2EC push overwrites the oldest entry in the FIFO If the associated data is still valid even after T2EC is overwritten The cache line is no longer protected Errors on this line cannot be corrected using T2EC 39
40 t k LLC ECC FIFO A small LLC and an ECC FIFO with k entries (no coalesce buffer for the simplicity) 40
41 t 0 dirty line eviction to LLC TAG/T2EC to ECC FIFO 0 0 k LLC ECC FIFO 41
42 t 0 1 dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 42
43 t dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 43
44 t k-3 k-2 k k-2 3 k-1 k-3 k-1 k-2 k k LLC ECC FIFO 44
45 t k-3 k-2 k-1 k dirty line eviction to LLC TAG/T2EC to ECC FIFO k k-2 3 k-1 k-3 k k-1 k k LLC ECC FIFO T2EC 0 is evicted from the FIFO dirty line 0 is no longer protected 45 0
46 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time Window of vulnerability T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected 46
47 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected T2EC valid-time Reuse or evict Window of vulnerability t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time < T2EC overwrite-time No window of vulnerability 47
48 How to Avoid (or Mitigate) It? T2EC overwrite-time Make the FIFO bigger The bigger the FIFO, the longer T2EC lifetime Not sure how big it has to be T2EC valid-time Inherent memory access pattern A dirty line is re-used or evicted Make the T2EC obsolete before it gets overwritten Eager write-back Limit the lifetime of dirty lines 48
49 Eager Write-Back Scan LLC lines periodically Eagerly write-back dirty lines older than the period It s cleaning, so it doesn t affect hit-rate 1M cycle period Limit the T2EC valid-time (max: 2xEWB period) More T2ECs become obsolete before it gets overwritten Eager write-back improves performance 6-10% in general, 26% in libquantum 49
50 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]
51 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]
52 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]
53 Required FIFO size 100k entry FIFO, the worst case 1MB of storage in DRAM More in the paper A simple analytic model on Probability of unprotected Required FIFO size with regards to eager write-back period how to guarantee zero probability of unprotected 53
54 Conclusion ECC FIFO is an efficient low cost error protection mechanism A simple FIFO off-loads the overhead of strong T2EC 15-25% LLC area reduction 10-17% LLC power saving Does not affect LLC caching behavior Choosing error code for T1EC/T2EC is flexible See more error code examples in the paper Penalties Average 1% performance degradation negligible Increased error correction latency but SER is low Potentially unprotected cache lines Can make this quite low or even guarantee not to occur 54
55 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 55
Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez
Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection
More informationVirtualized and Flexible ECC for Main Memory
Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon and Mattan Erez Dept. Electrical and Computer Engineering The University of Texas at Austin ASPLOS 2010 1 Memory Error Protection Applying ECC
More informationVirtualized ECC: Flexible Reliability in Memory Systems
Virtualized ECC: Flexible Reliability in Memory Systems Doe Hyun Yoon Advisor: Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Motivation Reliability concerns are growing
More informationTHE DYNAMIC GRANULARITY MEMORY SYSTEM
THE DYNAMIC GRANULARITY MEMORY SYSTEM Doe Hyun Yoon IIL, HP Labs Michael Sullivan Min Kyu Jeong Mattan Erez ECE, UT Austin MEMORY ACCESS GRANULARITY The size of block for accessing main memory Often, equal
More informationVirtualized and Flexible ECC for Main Memory
Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon Electrical and Computer Engineering Department The University of Texas at Austin doehyun.yoon@gmail.com Mattan Erez Electrical and Computer Engineering
More informationLet Software Decide: Matching Application Diversity with One- Size-Fits-All Memory
Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory Mattan Erez The University of Teas at Austin 2010 Workshop on Architecting Memory Systems March 1, 2010 iggest Problems
More informationBalancing DRAM Locality and Parallelism in Shared Memory CMP Systems
Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard
More informationLeveraging ECC to Mitigate Read Disturbance, False Reads Mitigating Bitline Crosstalk Noise in DRAM Memories and Write Faults in STT-RAM
1 MEMSYS 2017 DSN 2016 Leveraging ECC to Mitigate ead Disturbance, False eads Mitigating Bitline Crosstalk Noise in DAM Memories and Write Faults in STT-AM Mohammad Seyedzadeh, akan. Maddah, Alex. Jones,
More informationDecoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching
Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison 1 Please find the power point presentation
More informationExploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies
Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S. Schreiber Intelligent Infrastructure Lab (IIL), Hewlett-Packard
More informationChargeCache. Reducing DRAM Latency by Exploiting Row Access Locality
ChargeCache Reducing DRAM Latency by Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu Executive Summary Goal: Reduce
More informationEnergy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012
Energy Proportional Datacenter Memory Brian Neel EE6633 Fall 2012 Outline Background Motivation Related work DRAM properties Designs References Background The Datacenter as a Computer Luiz André Barroso
More informationA Cache-Assisted ScratchPad Memory for Multiple Bit Error Correction
A Cache-Assisted ScratchPad Memory for Multiple Bit Error Correction Hamed Farbeh, Student Member, IEEE, Nooshin Sadat Mirzadeh, Nahid Farhady Ghalaty, Seyed-Ghassem Miremadi, Senior Member, IEEE, Mahdi
More informationWALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems
: A Writeback-Aware LLC Management for PCM-based Main Memory Systems Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik * University of Illinois at Chicago Northwestern University
More informationHybrid Cache Architecture (HCA) with Disparate Memory Technologies
Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement:
More informationMEMORY reliability is a major challenge in the design of
3766 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 12, DECEMBER 2016 Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems Hsing-Min Chen, Supreet Jeloka,
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationCOP: To Compress and Protect Main Memory
COP: To Compress and Protect Main Memory David J. Palframan Nam Sung Kim Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin Madison palframan@wisc.edu, nskim3@wisc.edu,
More informationReducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University
Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck
More informationEmerging NVM Memory Technologies
Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement
More informationLecture 5: Scheduling and Reliability. Topics: scheduling policies, handling DRAM errors
Lecture 5: Scheduling and Reliability Topics: scheduling policies, handling DRAM errors 1 PAR-BS Mutlu and Moscibroda, ISCA 08 A batch of requests (per bank) is formed: each thread can only contribute
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationNear-Threshold Computing: How Close Should We Get?
Near-Threshold Computing: How Close Should We Get? Alaa R. Alameldeen Intel Labs Workshop on Near-Threshold Computing June 14, 2014 Overview High-level talk summarizing my architectural perspective on
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016
Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss
More informationSnatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks
Snatch: Opportunistically Reassigning Power Allocation between and in 3D Stacks Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim,
More informationData Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger
Data Criticality in Network-On-Chip Design Joshua San Miguel Natalie Enright Jerger Network-On-Chip Efficiency Efficiency is the ability to produce results with the least amount of waste. Wasted time Wasted
More informationEvaluating STT-RAM as an Energy-Efficient Main Memory Alternative
Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationEfficient RAS support for 3D Die-Stacked DRAM
Efficient RAS support for 3D Die-Stacked DRAM Hyeran Jeon University of Southern California hyeranje@usc.edu Gabriel H. Loh AMD Research gabriel.loh@amd.com Murali Annavaram University of Southern California
More informationDASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture
DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationTowards Energy-Proportional Datacenter Memory with Mobile DRAM
Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University
More informationPageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on. Blaise-Pascal Tine Sudhakar Yalamanchili
PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on Blaise-Pascal Tine Sudhakar Yalamanchili Outline Background: Memory Security Motivation Proposed Solution Implementation Evaluation Conclusion
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationRethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization
Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Fazal Hameed and Jeronimo Castrillon Center for Advancing Electronics Dresden (cfaed), Technische Universität Dresden,
More informationNON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1
NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1 Alberto Ros Universidad de Murcia October 17th, 2017 1 A. Ros, T. E. Carlson, M. Alipour, and S. Kaxiras, "Non-Speculative Load-Load Reordering in TSO". ISCA,
More informationAdapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]
Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM
More informationTag Tables. Sean Franey & Mikko Lipasti University of Wisconsin - Madison
Tag Tables Sean Franey & Mikko Lipasti University of Wisconsin - Madison sfraney@wisc.edu, mikko@engr.wisc.edu Abstract Tag Tables enable storage of tags for very large setassociative caches - such as
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip
More informationAnswers to comments from Reviewer 1
Answers to comments from Reviewer 1 Question A-1: Though I have one correction to authors response to my Question A-1,... parity protection needs a correction mechanism (e.g., checkpoints and roll-backward
More informationEnergy Models for DVFS Processors
Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July
More informationA Comparison of Capacity Management Schemes for Shared CMP Caches
A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip
More informationReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard
More informationPower Gating with Block Migration in Chip-Multiprocessor Last-Level Caches
Power Gating with Block Migration in Chip-Multiprocessor Last-Level Caches David Kadjo, Hyungjun Kim, Paul Gratz, Jiang Hu and Raid Ayoub Department of Electrical and Computer Engineering Texas A& M University,
More informationCache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two
Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Bushra Ahsan and Mohamed Zahran Dept. of Electrical Engineering City University of New York ahsan bushra@yahoo.com mzahran@ccny.cuny.edu
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third
More informationPageForge: A Near-Memory Content- Aware Page-Merging Architecture
PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign MICRO-50 @ Boston Motivation: Server
More information3D Memory Architecture. Kyushu University
3D Memory Architecture Koji Inoue Kyushu University 1 Outline Why 3D? Will 3D always work well? Support Adaptive Execution! Memory Hierarchy Run time Optimization Conclusions 2 Outline Why 3D? Will 3D
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationMain Memory (Fig. 7.13) Main Memory
Main Memory (Fig. 7.13) CPU CPU CPU Cache Multiplexor Cache Cache Bus Bus Bus Memory Memory bank 0 Memory bank 1 Memory bank 2 Memory bank 3 Memory b. Wide memory organization c. Interleaved memory organization
More informationCS/CoE 1541 Exam 2 (Spring 2019).
CS/CoE 1541 Exam 2 (Spring 2019) Name: Question 1 (5+5+5=15 points): Show the content of each of the caches shown below after the two memory references 35, 44 Use the notation [tag, M(address),] to describe
More informationAB-Aware: Application Behavior Aware Management of Shared Last Level Caches
AB-Aware: Application Behavior Aware Management of Shared Last Level Caches Suhit Pai, Newton Singh and Virendra Singh Computer Architecture and Dependable Systems Laboratory Department of Electrical Engineering
More informationImproving Cache Performance using Victim Tag Stores
Improving Cache Performance using Victim Tag Stores SAFARI Technical Report No. 2011-009 Vivek Seshadri, Onur Mutlu, Todd Mowry, Michael A Kozuch {vseshadr,tcm}@cs.cmu.edu, onur@cmu.edu, michael.a.kozuch@intel.com
More informationUsing Silent Writes in Low-Power Traffic-Aware ECC
Using Silent Writes in Low-Power Traffic-Aware ECC Mostafa Kishani 1, Amirali Baniasadi, Hossein Pedram 1 1 Department of Computer Engineering and Information Technology, Amirkabir University of Technology,
More informationLocality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example
Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationReducing Writebacks Through In-Cache Displacement
1 Reducing Writebacks Through In-Cache Displacement MOHAMMAD BAKHSHALIPOUR, Sharif University of Technology, Iran and Institute for Research in Fundamental Sciences (IPM), Iran AYDIN FARAJI, Sharif University
More informationDecoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching
Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti Computer Sciences Department University of Wisconsin-Madison somayeh@cs.wisc.edu David
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationMinimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era
Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Dimitris Kaseridis Electrical and Computer Engineering The University of Texas at Austin Austin, TX, USA kaseridis@mail.utexas.edu
More informationThe Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!
The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"
More informationSpare Block Cache Architecture to Enable Low-Voltage Operation
Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 1-1-2011 Spare Block Cache Architecture to Enable Low-Voltage Operation Nafiul Alam Siddique Portland State University
More informationArea-Efficient Error Protection for Caches
Area-Efficient Error Protection for Caches Soontae Kim Department of Computer Science and Engineering University of South Florida, FL 33620 sookim@cse.usf.edu Abstract Due to increasing concern about various
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationIntroduction to cache memories
Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal
More informationEnergy-centric DVFS Controlling Method for Multi-core Platforms
Energy-centric DVFS Controlling Method for Multi-core Platforms Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom Seoul National University, Korea MuCoCoS 2012 Salt Lake City, Utah Abstract Goal To
More informationChapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1
Chapter 02 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 2.1 The levels in a typical memory hierarchy in a server computer shown on top (a) and in
More informationThe Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory
The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory Lavanya Subramanian* Vivek Seshadri* Arnab Ghosh* Samira Khan*
More informationECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems
ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems Xun Jian University of Illinois at Urbana-Champaign Email: xunjian1@illinois.edu Rakesh Kumar University of
More informationfor High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami
3D Implemented dsram/dram HbidC Hybrid Cache Architecture t for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami Kyushu University
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationarxiv: v1 [cs.ar] 10 Apr 2017
Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, Srinivas Devadas CSAIL, MIT, Intel Labs, ETH Zurich {yxy, devadas}@mit.edu,
More informationPOPS: Coherence Protocol Optimization for both Private and Shared Data
POPS: Coherence Protocol Optimization for both Private and Shared Data Hemayet Hossain, Sandhya Dwarkadas, and Michael C. Huang University of Rochester Rochester, NY 14627, USA {hossain@cs, sandhya@cs,
More informationMarten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen. Department of Electrical & Computer Engineering University of Connecticut
CSE 5095 & ECE 4451 & ECE 5451 Spring 2017 Lecture 5a Caching Review Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen Department of Electrical & Computer Engineering University of Connecticut
More informationSCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS
SCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS 1 JUNG KYU PARK, 2* JAEHO KIM, 3 HEUNG SEOK JEON 1 Department of Digital Media Design and Applications, Seoul Women s University,
More informationFootprint-based Locality Analysis
Footprint-based Locality Analysis Xiaoya Xiang, Bin Bao, Chen Ding University of Rochester 2011-11-10 Memory Performance On modern computer system, memory performance depends on the active data usage.
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More informationRAMP-White / FAST-MP
RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White
More informationVulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch February 6, 2017
More informationImproving Writeback Efficiency with Decoupled Last-Write Prediction
Improving Writeback Efficiency with Decoupled Last-Write Prediction Zhe Wang Samira M. Khan Daniel A. Jiménez The University of Texas at San Antonio {zhew,skhan,dj}@cs.utsa.edu Abstract In modern DDRx
More informationCS152 Computer Architecture and Engineering
CS152 Computer Architecture and Engineering Caches and the Memory Hierarchy Assigned 9/17/2016 Problem Set #2 Due Tue, Oct 4 http://inst.eecs.berkeley.edu/~cs152/fa16 The problem sets are intended to help
More informationLimiting the Number of Dirty Cache Lines
Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationKey Point. What are Cache lines
Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible
More informationPARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites Christian Bienia (Princeton University), Sanjeev Kumar (Intel), Kai Li (Princeton University) Outline Overview What
More informationLecture 11 Cache. Peng Liu.
Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationMAC: A NOVEL SYSTEMATICALLY MULTILEVEL
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY Shenchen Ruan, Haixia Wang and Dongsheng Wang Tsinghua National Laboratory for Information Science and Technology, Tsinghua
More informationUNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. Motivation: Why Care About I/O? CPU Performance:
More informationECE 30 Introduction to Computer Engineering
ECE 0 Introduction to Computer Engineering Study Problems, Set #9 Spring 01 1. Given the following series of address references given as word addresses:,,, 1, 1, 1,, 8, 19,,,,, 7,, and. Assuming a direct-mapped
More informationWADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System
WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System ZHE WANG, Texas A&M University SHUCHANG SHAN, Chinese Institute of Computing Technology TING CAO, Australian National University
More informationECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU
More informationImproving Cache Management Policies Using Dynamic Reuse Distances
Improving Cache Management Policies Using Dynamic Reuse Distances Nam Duong, Dali Zhao, Taesu Kim, Rosario Cammarota, Mateo Valero and Alexander V. Veidenbaum University of California, Irvine Universitat
More informationCaches Part 1. Instructor: Sören Schwertfeger. School of Information Science and Technology SIST
CS 110 Computer Architecture Caches Part 1 Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's
More informationVirtual Snooping: Filtering Snoops in Virtualized Multi-cores
Appears in the 43 rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-43) Virtual Snooping: Filtering Snoops in Virtualized Multi-cores Daehoon Kim, Hwanju Kim, and Jaehyuk Huh Computer
More informationEE 457 Unit 7b. Main Memory Organization
1 EE 457 Unit 7b Main Memory Organization 2 Motivation Organize main memory to Facilitate byte-addressability while maintaining Efficient fetching of the words in a cache block Low order interleaving (L.O.I)
More information