Flexible Cache Error Protection using an ECC FIFO

Size: px
Start display at page:

Download "Flexible Cache Error Protection using an ECC FIFO"

Transcription

1 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1

2 ECC FIFO Goal: to reduce on-chip ECC overhead Two-tiered error protection T1EC: light-weight on-chip error code T2EC: strong error correcting code Off-load T2EC overhead to FIFO in DRAM Why FIFO? It s simple to manage 15-25% LLC area reduction 10-17% LLC power saving Just 1% performance penalty 2

3 BACKGROUND 3

4 Error Correcting Codes 1-bit parity for error detection SEC-DED (Hamming) codes Single-bit Error Correction and Double-bit Error Detection 8bit ECC for 64bit data DEC-TED Double-bit Error Correction and Triple-bit Error Detection 15bit ECC for 64bit data 4

5 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 5

6 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 6

7 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors Baseline cache error protection 8 way interleaved SEC-DED Can correct up to 8-bit burst errors 8B ECC per 64B cache line 7

8 Uniform Error Protection Tag Data ECC 8 ways 2 11 sets 64B 8B ECC increases area AND leakage/dynamic power 8

9 RELATED WORK 9

10 Soft Errors: Observations Still, Soft Error Rate (SER) is low Every cache access tries to detect errors, but finds no error in most cases Error Detection Common case Need a low cost, low overhead error detection mechanism Error Correction Uncommon case Correction can be slow But, still need to maintain error correction info somewhere Memory hierarchy provides redundancy inherently for clean data Only dirty lines need error correcting codes 10

11 PERC [Sorin 06] / Energy Efficient [Li 04] Tag Data EDC ECC 8 ways 2 11 sets 64B 1B 7B Read only Data and EDC saves dynamic power Power gate ECC of clean lines saves static power 11

12 Area Efficient [Kim 06] 4 ways Tag Data EDC ECC 2 12 sets 2 12 sets 64B 1B 8B Allow only 1 dirty line per set 12

13 MAXn scheme Tag Data EDC 8 ways 2 11 sets n ways (n<8) 2 11 sets Tag ECC 64B 1B 8B Allow only n dirty lines per set May cause detrimental cleaning traffic 13

14 Two-tiered error protection Tier-1 Error Code (T1EC) On-chip light-weight error code Uniform error protection Tier-2 Error Code (T2EC) Strong error codes only for dirty lines Corrects Detected but Uncorrected Errors (DUE) of T1EC 14

15 Memory Mapped ECC [Yoon 09] Tag Data T1EC T2EC 8 ways 2 11 sets 64B 1B 8B On-Chip DRAM T2EC is memory mapped AND cached 15

16 ECC FIFO 16

17 ECC FIFO Use Two-tiered error protection T2EC is off-loaded to FIFO in DRAM LLC caching behavior is unaffected FIFO Simple to manage Coalesce buffer To better utilize DRAM channel 17

18 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 18

19 Rest of cache hierarchy Dirty line eviction to LLC Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 19

20 Rest of cache hierarchy Last Level Cache Data T1EC Encode T2EC and TAG T2EC TAG encoder T2EC Coalesce Buffer DRAM ECC FIFO 20

21 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG T2EC Push to Coalesce Buffer Coalesce Buffer DRAM ECC FIFO 21

22 Rest of cache hierarchy Next dirty line comes Data T1EC Last Level Cache DRAM T2EC encoder TAG TAG T2EC T2EC Tag/T2EC buffered in Coalesce Buffer Coalesce Buffer ECC FIFO 22

23 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC T2EC Coalesce Buffer is now FULL Coalesce Buffer DRAM ECC FIFO 23

24 Rest of cache hierarchy Last Level Cache DRAM Data T1EC Write the coalesced T2EC into ECC FIFO T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC Coalesce T2EC Buffer T2EC write size matches to DRAM burst size ECC FIFO T2EC 24

25 Rest of cache hierarchy Last Level Cache DRAM Data T1EC T2EC encoder Coalesce Buffer Coalesce Buffer becomes empty ECC FIFO T2EC 25

26 More on ECC FIFO Write-back data, but write-through ECC Potential performance degradation Increased DRAM traffic due to T2EC writes Error correction Search the matching TAG in coalesce buffer AND ECC FIFO May take a long time Not a problem since SER is low Sometimes, may not find the matching TAG ECC FIFO is finite Potentially unprotected dirty lines discussed later 26

27 EVALUATION 27

28 Performance Evaluation GEMS + DRAMsim An out-of-order SPARC V9 core Exclusive two-level cache hierarchy DDR2 667MHz 533GB/s Eager write-back Clean dirty lines periodically Workloads 16 data intensive applications SPEC CPU 2006, PARSEC, and SPLASH2 28

29 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC

30 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC DRAM 267 GB/s 63% 68% 23% SPLASH2 PARSEC SPEC

31 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

32 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

33 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

34 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 1% 09 SPLASH2 PARSEC SPEC

35 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME % 36% Max1 Max2 Max4 MME ECC FIFO 11 11% 11% 8% 1 4% 1% 09 SPLASH2 PARSEC SPEC

36 Execution Time [cycle] 260E E+09 Comparison to MME OCEAN 258x258 Baseline MME ECC FIFO 220E E E % 160E E KB 512KB 1MB 2MB LLC size 36

37 LLC Area and Power CACTI 5 model of a 45nm process Parity T1EC + SEC-DED T2EC 15% area and 9% power saving Compared to SEC-DED baseline SEC-DED T1EC + DEC-TED T2EC 15% area and 10% power saving Compared to DEC-TED baseline More error code examples in the paper 37

38 FIFO OVERWRITE 38

39 ECC FIFO is Finite ECC FIFO is implemented as a circular buffer Each T2EC push overwrites the oldest entry in the FIFO If the associated data is still valid even after T2EC is overwritten The cache line is no longer protected Errors on this line cannot be corrected using T2EC 39

40 t k LLC ECC FIFO A small LLC and an ECC FIFO with k entries (no coalesce buffer for the simplicity) 40

41 t 0 dirty line eviction to LLC TAG/T2EC to ECC FIFO 0 0 k LLC ECC FIFO 41

42 t 0 1 dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 42

43 t dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 43

44 t k-3 k-2 k k-2 3 k-1 k-3 k-1 k-2 k k LLC ECC FIFO 44

45 t k-3 k-2 k-1 k dirty line eviction to LLC TAG/T2EC to ECC FIFO k k-2 3 k-1 k-3 k k-1 k k LLC ECC FIFO T2EC 0 is evicted from the FIFO dirty line 0 is no longer protected 45 0

46 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time Window of vulnerability T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected 46

47 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected T2EC valid-time Reuse or evict Window of vulnerability t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time < T2EC overwrite-time No window of vulnerability 47

48 How to Avoid (or Mitigate) It? T2EC overwrite-time Make the FIFO bigger The bigger the FIFO, the longer T2EC lifetime Not sure how big it has to be T2EC valid-time Inherent memory access pattern A dirty line is re-used or evicted Make the T2EC obsolete before it gets overwritten Eager write-back Limit the lifetime of dirty lines 48

49 Eager Write-Back Scan LLC lines periodically Eagerly write-back dirty lines older than the period It s cleaning, so it doesn t affect hit-rate 1M cycle period Limit the T2EC valid-time (max: 2xEWB period) More T2ECs become obsolete before it gets overwritten Eager write-back improves performance 6-10% in general, 26% in libquantum 49

50 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

51 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

52 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

53 Required FIFO size 100k entry FIFO, the worst case 1MB of storage in DRAM More in the paper A simple analytic model on Probability of unprotected Required FIFO size with regards to eager write-back period how to guarantee zero probability of unprotected 53

54 Conclusion ECC FIFO is an efficient low cost error protection mechanism A simple FIFO off-loads the overhead of strong T2EC 15-25% LLC area reduction 10-17% LLC power saving Does not affect LLC caching behavior Choosing error code for T1EC/T2EC is flexible See more error code examples in the paper Penalties Average 1% performance degradation negligible Increased error correction latency but SER is low Potentially unprotected cache lines Can make this quite low or even guarantee not to occur 54

55 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 55

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection

More information

Virtualized and Flexible ECC for Main Memory

Virtualized and Flexible ECC for Main Memory Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon and Mattan Erez Dept. Electrical and Computer Engineering The University of Texas at Austin ASPLOS 2010 1 Memory Error Protection Applying ECC

More information

Virtualized ECC: Flexible Reliability in Memory Systems

Virtualized ECC: Flexible Reliability in Memory Systems Virtualized ECC: Flexible Reliability in Memory Systems Doe Hyun Yoon Advisor: Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Motivation Reliability concerns are growing

More information

THE DYNAMIC GRANULARITY MEMORY SYSTEM

THE DYNAMIC GRANULARITY MEMORY SYSTEM THE DYNAMIC GRANULARITY MEMORY SYSTEM Doe Hyun Yoon IIL, HP Labs Michael Sullivan Min Kyu Jeong Mattan Erez ECE, UT Austin MEMORY ACCESS GRANULARITY The size of block for accessing main memory Often, equal

More information

Virtualized and Flexible ECC for Main Memory

Virtualized and Flexible ECC for Main Memory Virtualized and Flexible ECC for Main Memory Doe Hyun Yoon Electrical and Computer Engineering Department The University of Texas at Austin doehyun.yoon@gmail.com Mattan Erez Electrical and Computer Engineering

More information

Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory

Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory Mattan Erez The University of Teas at Austin 2010 Workshop on Architecting Memory Systems March 1, 2010 iggest Problems

More information

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard

More information

Leveraging ECC to Mitigate Read Disturbance, False Reads Mitigating Bitline Crosstalk Noise in DRAM Memories and Write Faults in STT-RAM

Leveraging ECC to Mitigate Read Disturbance, False Reads Mitigating Bitline Crosstalk Noise in DRAM Memories and Write Faults in STT-RAM 1 MEMSYS 2017 DSN 2016 Leveraging ECC to Mitigate ead Disturbance, False eads Mitigating Bitline Crosstalk Noise in DAM Memories and Write Faults in STT-AM Mohammad Seyedzadeh, akan. Maddah, Alex. Jones,

More information

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison 1 Please find the power point presentation

More information

Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies

Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S. Schreiber Intelligent Infrastructure Lab (IIL), Hewlett-Packard

More information

ChargeCache. Reducing DRAM Latency by Exploiting Row Access Locality

ChargeCache. Reducing DRAM Latency by Exploiting Row Access Locality ChargeCache Reducing DRAM Latency by Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu Executive Summary Goal: Reduce

More information

Energy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012

Energy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012 Energy Proportional Datacenter Memory Brian Neel EE6633 Fall 2012 Outline Background Motivation Related work DRAM properties Designs References Background The Datacenter as a Computer Luiz André Barroso

More information

A Cache-Assisted ScratchPad Memory for Multiple Bit Error Correction

A Cache-Assisted ScratchPad Memory for Multiple Bit Error Correction A Cache-Assisted ScratchPad Memory for Multiple Bit Error Correction Hamed Farbeh, Student Member, IEEE, Nooshin Sadat Mirzadeh, Nahid Farhady Ghalaty, Seyed-Ghassem Miremadi, Senior Member, IEEE, Mahdi

More information

WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems

WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems : A Writeback-Aware LLC Management for PCM-based Main Memory Systems Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik * University of Illinois at Chicago Northwestern University

More information

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement:

More information

MEMORY reliability is a major challenge in the design of

MEMORY reliability is a major challenge in the design of 3766 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 12, DECEMBER 2016 Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems Hsing-Min Chen, Supreet Jeloka,

More information

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses

More information

COP: To Compress and Protect Main Memory

COP: To Compress and Protect Main Memory COP: To Compress and Protect Main Memory David J. Palframan Nam Sung Kim Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin Madison palframan@wisc.edu, nskim3@wisc.edu,

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Emerging NVM Memory Technologies

Emerging NVM Memory Technologies Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement

More information

Lecture 5: Scheduling and Reliability. Topics: scheduling policies, handling DRAM errors

Lecture 5: Scheduling and Reliability. Topics: scheduling policies, handling DRAM errors Lecture 5: Scheduling and Reliability Topics: scheduling policies, handling DRAM errors 1 PAR-BS Mutlu and Moscibroda, ISCA 08 A batch of requests (per bank) is formed: each thread can only contribute

More information

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses

More information

Near-Threshold Computing: How Close Should We Get?

Near-Threshold Computing: How Close Should We Get? Near-Threshold Computing: How Close Should We Get? Alaa R. Alameldeen Intel Labs Workshop on Near-Threshold Computing June 14, 2014 Overview High-level talk summarizing my architectural perspective on

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks

Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks Snatch: Opportunistically Reassigning Power Allocation between and in 3D Stacks Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim,

More information

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger Data Criticality in Network-On-Chip Design Joshua San Miguel Natalie Enright Jerger Network-On-Chip Efficiency Efficiency is the ability to produce results with the least amount of waste. Wasted time Wasted

More information

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Efficient RAS support for 3D Die-Stacked DRAM

Efficient RAS support for 3D Die-Stacked DRAM Efficient RAS support for 3D Die-Stacked DRAM Hyeran Jeon University of Southern California hyeranje@usc.edu Gabriel H. Loh AMD Research gabriel.loh@amd.com Murali Annavaram University of Southern California

More information

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Towards Energy-Proportional Datacenter Memory with Mobile DRAM

Towards Energy-Proportional Datacenter Memory with Mobile DRAM Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University

More information

PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on. Blaise-Pascal Tine Sudhakar Yalamanchili

PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on. Blaise-Pascal Tine Sudhakar Yalamanchili PageVault: Securing Off-Chip Memory Using Page-Based Authen?ca?on Blaise-Pascal Tine Sudhakar Yalamanchili Outline Background: Memory Security Motivation Proposed Solution Implementation Evaluation Conclusion

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Fazal Hameed and Jeronimo Castrillon Center for Advancing Electronics Dresden (cfaed), Technische Universität Dresden,

More information

NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1

NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1 NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1 Alberto Ros Universidad de Murcia October 17th, 2017 1 A. Ros, T. E. Carlson, M. Alipour, and S. Kaxiras, "Non-Speculative Load-Load Reordering in TSO". ISCA,

More information

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM

More information

Tag Tables. Sean Franey & Mikko Lipasti University of Wisconsin - Madison

Tag Tables. Sean Franey & Mikko Lipasti University of Wisconsin - Madison Tag Tables Sean Franey & Mikko Lipasti University of Wisconsin - Madison sfraney@wisc.edu, mikko@engr.wisc.edu Abstract Tag Tables enable storage of tags for very large setassociative caches - such as

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip

More information

Answers to comments from Reviewer 1

Answers to comments from Reviewer 1 Answers to comments from Reviewer 1 Question A-1: Though I have one correction to authors response to my Question A-1,... parity protection needs a correction mechanism (e.g., checkpoints and roll-backward

More information

Energy Models for DVFS Processors

Energy Models for DVFS Processors Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July

More information

A Comparison of Capacity Management Schemes for Shared CMP Caches

A Comparison of Capacity Management Schemes for Shared CMP Caches A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

Power Gating with Block Migration in Chip-Multiprocessor Last-Level Caches

Power Gating with Block Migration in Chip-Multiprocessor Last-Level Caches Power Gating with Block Migration in Chip-Multiprocessor Last-Level Caches David Kadjo, Hyungjun Kim, Paul Gratz, Jiang Hu and Raid Ayoub Department of Electrical and Computer Engineering Texas A& M University,

More information

Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two

Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Bushra Ahsan and Mohamed Zahran Dept. of Electrical Engineering City University of New York ahsan bushra@yahoo.com mzahran@ccny.cuny.edu

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. 13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third

More information

PageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge: A Near-Memory Content- Aware Page-Merging Architecture PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign MICRO-50 @ Boston Motivation: Server

More information

3D Memory Architecture. Kyushu University

3D Memory Architecture. Kyushu University 3D Memory Architecture Koji Inoue Kyushu University 1 Outline Why 3D? Will 3D always work well? Support Adaptive Execution! Memory Hierarchy Run time Optimization Conclusions 2 Outline Why 3D? Will 3D

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Main Memory (Fig. 7.13) Main Memory

Main Memory (Fig. 7.13) Main Memory Main Memory (Fig. 7.13) CPU CPU CPU Cache Multiplexor Cache Cache Bus Bus Bus Memory Memory bank 0 Memory bank 1 Memory bank 2 Memory bank 3 Memory b. Wide memory organization c. Interleaved memory organization

More information

CS/CoE 1541 Exam 2 (Spring 2019).

CS/CoE 1541 Exam 2 (Spring 2019). CS/CoE 1541 Exam 2 (Spring 2019) Name: Question 1 (5+5+5=15 points): Show the content of each of the caches shown below after the two memory references 35, 44 Use the notation [tag, M(address),] to describe

More information

AB-Aware: Application Behavior Aware Management of Shared Last Level Caches

AB-Aware: Application Behavior Aware Management of Shared Last Level Caches AB-Aware: Application Behavior Aware Management of Shared Last Level Caches Suhit Pai, Newton Singh and Virendra Singh Computer Architecture and Dependable Systems Laboratory Department of Electrical Engineering

More information

Improving Cache Performance using Victim Tag Stores

Improving Cache Performance using Victim Tag Stores Improving Cache Performance using Victim Tag Stores SAFARI Technical Report No. 2011-009 Vivek Seshadri, Onur Mutlu, Todd Mowry, Michael A Kozuch {vseshadr,tcm}@cs.cmu.edu, onur@cmu.edu, michael.a.kozuch@intel.com

More information

Using Silent Writes in Low-Power Traffic-Aware ECC

Using Silent Writes in Low-Power Traffic-Aware ECC Using Silent Writes in Low-Power Traffic-Aware ECC Mostafa Kishani 1, Amirali Baniasadi, Hossein Pedram 1 1 Department of Computer Engineering and Information Technology, Amirkabir University of Technology,

More information

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Reducing Writebacks Through In-Cache Displacement

Reducing Writebacks Through In-Cache Displacement 1 Reducing Writebacks Through In-Cache Displacement MOHAMMAD BAKHSHALIPOUR, Sharif University of Technology, Iran and Institute for Research in Fundamental Sciences (IPM), Iran AYDIN FARAJI, Sharif University

More information

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti Computer Sciences Department University of Wisconsin-Madison somayeh@cs.wisc.edu David

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Dimitris Kaseridis Electrical and Computer Engineering The University of Texas at Austin Austin, TX, USA kaseridis@mail.utexas.edu

More information

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!!

The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! The Reuse Cache Downsizing the Shared Last-Level Cache! Jorge Albericio 1, Pablo Ibáñez 2, Víctor Viñals 2, and José M. Llabería 3!!! 1 2 3 Modern CMPs" Intel e5 2600 (2013)! SLLC" AMD Orochi (2012)! SLLC"

More information

Spare Block Cache Architecture to Enable Low-Voltage Operation

Spare Block Cache Architecture to Enable Low-Voltage Operation Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 1-1-2011 Spare Block Cache Architecture to Enable Low-Voltage Operation Nafiul Alam Siddique Portland State University

More information

Area-Efficient Error Protection for Caches

Area-Efficient Error Protection for Caches Area-Efficient Error Protection for Caches Soontae Kim Department of Computer Science and Engineering University of South Florida, FL 33620 sookim@cse.usf.edu Abstract Due to increasing concern about various

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Introduction to cache memories

Introduction to cache memories Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal

More information

Energy-centric DVFS Controlling Method for Multi-core Platforms

Energy-centric DVFS Controlling Method for Multi-core Platforms Energy-centric DVFS Controlling Method for Multi-core Platforms Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom Seoul National University, Korea MuCoCoS 2012 Salt Lake City, Utah Abstract Goal To

More information

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 02 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 2.1 The levels in a typical memory hierarchy in a server computer shown on top (a) and in

More information

The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory

The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory Lavanya Subramanian* Vivek Seshadri* Arnab Ghosh* Samira Khan*

More information

ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems

ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems Xun Jian University of Illinois at Urbana-Champaign Email: xunjian1@illinois.edu Rakesh Kumar University of

More information

for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami

for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami 3D Implemented dsram/dram HbidC Hybrid Cache Architecture t for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami Kyushu University

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

arxiv: v1 [cs.ar] 10 Apr 2017

arxiv: v1 [cs.ar] 10 Apr 2017 Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, Srinivas Devadas CSAIL, MIT, Intel Labs, ETH Zurich {yxy, devadas}@mit.edu,

More information

POPS: Coherence Protocol Optimization for both Private and Shared Data

POPS: Coherence Protocol Optimization for both Private and Shared Data POPS: Coherence Protocol Optimization for both Private and Shared Data Hemayet Hossain, Sandhya Dwarkadas, and Michael C. Huang University of Rochester Rochester, NY 14627, USA {hossain@cs, sandhya@cs,

More information

Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen. Department of Electrical & Computer Engineering University of Connecticut

Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen. Department of Electrical & Computer Engineering University of Connecticut CSE 5095 & ECE 4451 & ECE 5451 Spring 2017 Lecture 5a Caching Review Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen Department of Electrical & Computer Engineering University of Connecticut

More information

SCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS

SCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS SCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS 1 JUNG KYU PARK, 2* JAEHO KIM, 3 HEUNG SEOK JEON 1 Department of Digital Media Design and Applications, Seoul Women s University,

More information

Footprint-based Locality Analysis

Footprint-based Locality Analysis Footprint-based Locality Analysis Xiaoya Xiang, Bin Bao, Chen Ding University of Rochester 2011-11-10 Memory Performance On modern computer system, memory performance depends on the active data usage.

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe

More information

RAMP-White / FAST-MP

RAMP-White / FAST-MP RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White

More information

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch February 6, 2017

More information

Improving Writeback Efficiency with Decoupled Last-Write Prediction

Improving Writeback Efficiency with Decoupled Last-Write Prediction Improving Writeback Efficiency with Decoupled Last-Write Prediction Zhe Wang Samira M. Khan Daniel A. Jiménez The University of Texas at San Antonio {zhew,skhan,dj}@cs.utsa.edu Abstract In modern DDRx

More information

CS152 Computer Architecture and Engineering

CS152 Computer Architecture and Engineering CS152 Computer Architecture and Engineering Caches and the Memory Hierarchy Assigned 9/17/2016 Problem Set #2 Due Tue, Oct 4 http://inst.eecs.berkeley.edu/~cs152/fa16 The problem sets are intended to help

More information

Limiting the Number of Dirty Cache Lines

Limiting the Number of Dirty Cache Lines Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Key Point. What are Cache lines

Key Point. What are Cache lines Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible

More information

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites Christian Bienia (Princeton University), Sanjeev Kumar (Intel), Kai Li (Princeton University) Outline Overview What

More information

Lecture 11 Cache. Peng Liu.

Lecture 11 Cache. Peng Liu. Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

MAC: A NOVEL SYSTEMATICALLY MULTILEVEL

MAC: A NOVEL SYSTEMATICALLY MULTILEVEL MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY Shenchen Ruan, Haixia Wang and Dongsheng Wang Tsinghua National Laboratory for Information Science and Technology, Tsinghua

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. Motivation: Why Care About I/O? CPU Performance:

More information

ECE 30 Introduction to Computer Engineering

ECE 30 Introduction to Computer Engineering ECE 0 Introduction to Computer Engineering Study Problems, Set #9 Spring 01 1. Given the following series of address references given as word addresses:,,, 1, 1, 1,, 8, 19,,,,, 7,, and. Assuming a direct-mapped

More information

WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System

WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System WADE: Writeback-Aware Dynamic Cache Management for NVM-Based Main Memory System ZHE WANG, Texas A&M University SHUCHANG SHAN, Chinese Institute of Computing Technology TING CAO, Australian National University

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

Improving Cache Management Policies Using Dynamic Reuse Distances

Improving Cache Management Policies Using Dynamic Reuse Distances Improving Cache Management Policies Using Dynamic Reuse Distances Nam Duong, Dali Zhao, Taesu Kim, Rosario Cammarota, Mateo Valero and Alexander V. Veidenbaum University of California, Irvine Universitat

More information

Caches Part 1. Instructor: Sören Schwertfeger. School of Information Science and Technology SIST

Caches Part 1. Instructor: Sören Schwertfeger.   School of Information Science and Technology SIST CS 110 Computer Architecture Caches Part 1 Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's

More information

Virtual Snooping: Filtering Snoops in Virtualized Multi-cores

Virtual Snooping: Filtering Snoops in Virtualized Multi-cores Appears in the 43 rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-43) Virtual Snooping: Filtering Snoops in Virtualized Multi-cores Daehoon Kim, Hwanju Kim, and Jaehyuk Huh Computer

More information

EE 457 Unit 7b. Main Memory Organization

EE 457 Unit 7b. Main Memory Organization 1 EE 457 Unit 7b Main Memory Organization 2 Motivation Organize main memory to Facilitate byte-addressability while maintaining Efficient fetching of the words in a cache block Low order interleaving (L.O.I)

More information