Hybrid Cache Architecture (HCA) with Disparate Memory Technologies
|
|
- Lorraine Austin
- 6 years ago
- Views:
Transcription
1 Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement: Elmootazbellah (Mootaz) Elnozahy, Hung Le, Balaram Sinharoy, William (Bill) J. Starke, and Chung-Lung Kevin Shum 1
2 Outline Motivation and Introduction Methodology Level based Hybrid Cache Architecture Region based Hybrid Cache Architecture 3D Hybrid Cache Stacking Conclusion 2
3 Introduction Traditional SRAM-based Cache Architecture - Limited size with CMP: cache-core balance - Leakage power - More cache levels: design overhead, coherence - Non-Uniform Cache Architecture (wire delay) Improve cache power-performance with Emerging Memory Technologies, under the same chip area/footprint - Embedded DRAM - Magnetic RAM - Phase Change RAM - Three-dimensional space 3
4 Different Memory Technologies SRAM 6T structure DRAM 1T1C structure MTJ (Magnetic Tunnel Junction) Magnetic RAM 1T1J structure Phase Change RAM 1T1J structure 4
5 Comparisons SRAM edram MRAM PRAM Density (ratio) Low (1) High (4) High (4) High(16) Dynamic Power Low Medium Low for read; High for write Reduce Cache miss rate Increase hit latency Medium for read; High for write Leakage Power High Medium Low Low Speed Very Fast Fast Low leakage power High dynamic power Fast for read; Slow for write Slow for read; Very slow for write Non-volatility No No Yes Yes Scalability Yes Yes Yes Yes Endurance >
6 Motivation 1.4 1M-SRAM 4M-DRAM 4M-MRAM 16M-PRAM No 1 single memory technology has astar bzip2 gcc gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean Normalized IPC the best power-performance Static Dynamic Hybrid Cache may outperform its counterpart of single technology astar bzip2 gcc gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean Normalized Power 6
7 Outline Introduction and Motivation Methodology Level based Hybrid Cache Architecture Region based Hybrid Cache Architecture 3D Hybrid Cache Stacking Conclusions 7
8 Evaluation Methodology Fast Slow (edram/ MRAM) (PRAM) Baseline: a 2D 8-core CMP (3-level SRAM Caches) Fast A LHCA Fast Fast Middle (edram/ MRAM) Slow (PRAM) (edram/ MRAM/ PRAM) (edram/ MRAM/ PRAM) Flattening and with hybrid cache 3DHCA 3DHCA 3DHCA Slow E(eDRAM/ DMiddle C MRAM) (PRAM) (edram/ MRAM) Slow (PRAM) L1s Fast Fast B Slow (edram/ RHCA MRAM/ Slow PRAM) (edram/ MRAM/ 3D design PRAM) scenario w/ L1s 3D Layer 1 (edram/ MRAM) (edram/ MRAM) L4 (PRAM) (PRAM) 3D Layer 2 Flattening and L4 with hybrid cache Flattening, and L4 with hybrid cache A cache design scenario with 3D chip integration 8
9 Evaluation Setup Cache Density Latency (cycles) Dyn. eng (nj) Static power (W) SRAM(1MB) edram(4mb) MRAM(4MB) 4 Read:20, write:60 PRAM(16MB) 16 Read:40 write:200 Read:0.4 write:2.3 Read:0.8 write: Processor L1 //L4 Memory 8-way issue out-of-order, 8-core, 4GHz 32KB DL1, 32KB IL1, 128B, 4-way, 1 R/W port See corresponding design cases 400 cycles latency Benchmarks: SpecInt06, Specjbb, NAS, Bioperf, Parsec Simulator: SystemSim full system simulator 9
10 Outline Introduction and Motivation Methodology Level based Hybrid Cache Architecture Intra-Level Hybrid Cache Architecture 3D Hybrid Cache Stacking Conclusions 10
11 LHCA: Performance and Power SRAM edram-lhca MRAM-LHCA PRAM-LHCA 11 astar bzip2 gcc gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean A (edram/ MRAM/ PRAM) Simple LHCA can provide LHCA Performance and Static Power Dynamic benefits Processor 8-way issure out-of-order, 8-core, 4GHz astar bzip2 gcc gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean L1 32KB DL1, 32KB IL1, 128B, 4-way, 1 R/W port 256KB, 128B, 4-way, 1 R/W port Parameters from the methodology 0.59 Normalized Power Normalized IPC
12 Outline Introduction and Motivation Methodology Level based Hybrid Cache Architecture Region based Hybrid Cache Architecture 3D Hybrid Cache Stacking Conclusions 12
13 RHCA Features Fast and slow regions in one cache level Mutually Exclusive Regions Parallel search Unified LRU Intra-cache data movement policy: Move frequently used data to the fast region RHCA Drowsy RHCA: Keep slow region in drowsy mode, wake up latency (details in paper) 13
14 RHCA Policy Cache line migration policy Access cache line Allocate the new line and replace an LRU line if needed, across all regions Reset 2-bit saturation counter; set sticky bit N Hit? No action Y N Provide data; increment saturation counter MSB set (hot) in slow region? Y Select an LRU cache line of the same set in the fast region Clear sticky bit Y Sticky bit set? N Cancel swap Swap; reset saturation counter; set sticky bit 14
15 RHCA Hardware Support Tag & Status Array Decoder Data Array Fast Region Fast Region Sticky bit and Saturating counter(s) valid tag Index Tag Offset Offset Slow Region Saturating counter(s) =? valid tag Index Slow Region swap buffer Tag =? Hardware support Saturating counter in slow and sticky bit in fast, swap buffer Minimum hardware support: 1-bit sticky bit in fast region 15
16 RHCA Configuration RHCA (fast+slow) Fast region total size (latency) SRAM+eDRAM 256KB (6 cycles) 4MB (24 cycles) SRAM+MRAM w/ 256KB L1s (6 cycles) 4MB (r:20, w:60) Fast SRAM+PRAM 256KB (6 cycles) B 16MB (r:40, w:200) RHCA Slow region: (edram/ 256KB/bank, 1 r/w port, block size 128B, associativity:16, MRAM/ 16, 64 RHCA is 256KB less size than corresponding LHCA Slow PRAM) Avoid odd-sized cache DNUCA policy: more fine grained, move a line to a closer bank on each hit, bank-based, same size 16
17 RHCA Result astar bzip2 gcc gobmk h264 hmmer-sp h264 libquantum mcf omnetpp perl sjeng blast bt blast cg bt clustalw cg hmmer clustalw lu bt hmmer mg cg lu sp clustalw mg ua hmmer sp specjbb lu ua dedup mg specjbb fluidanimate sp dedup freqmine ua fluidanimate streamcluster specjbb freqmine Geomean dedup streamcluster fluidanimate Geomean freqmine streamcluster Geomean SRAM-eDRAM RHCA RHCA achieves better performance level SRAM Best LHCA DNUCA RHCA than SRAM 3-level SRAM baseline Best LHCA DNUCA and RHCA LHCA level SRAM Best LHCA DNUCA RHCA SRAM-MRAM RHCA SRAM-PRAM RHCA astar bzip2 gcc gobmk astar bzip2 gcc gobmk h264 hmmer-sp hmmer-sp libquantum mcf omnetpp perl sjeng libquantum mcf omnetpp perl sjeng blast PRAM is not promising for Cache Normalized IPCNormalized IPC 17
18 Outline Introduction and Motivation Methodology Level based Hybrid Cache Architecture Region based Hybrid Cache Architecture 3D Hybrid Cache Stacking Conclusions 18
19 3DHCA-configuration 3D Layer 1 edram/ Fast Middle edram/ 3DHCA 3DHCA 3DHCA C D E Slow edram/ Fast L4 (PRAM) 3D Layer 2 Slow (PRAM) (PRAM) 3DHCA-C (3D LHCA): 256KB SRAM, 4M edram, 32M L4 PRAM 3DHCA-D: 32M fast, middle, slow region (3D RHCA) Data in slow region can be moved to fast and middle regions 3DHCA-E: 4M fast+slow region, 32M PRAM (LHCA+RHCA) 19
20 DHCA-result 3-level SRAM Best LHCA RHCA 3DHCA-C 3DHCA-D 3DHCA-E astar bzip2 gcc gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean Static Normal_dyn Swap_dyn gobmk h264 hmmer-sp libquantum mcf omnetpp perl sjeng blast bt cg clustalw hmmer lu mg sp ua specjbb dedup fluidanimate freqmine streamcluster Geomean 20 astar bzip2 gcc Normalized Power Normalized IPC
21 Conclusion Hybrid cache architecture is promising to improve cache power-performance under same chip area/footprint RHCA and LHCA achieve better power-performance than SRAM-based design RHCA outperforms LHCA with minimal hardware support 3DHCA achieves better performance than LHCA and RHCA, while still maintains lower power than 2D SRAM baseline 21
Emerging NVM Memory Technologies
Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement
More informationWALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems
: A Writeback-Aware LLC Management for PCM-based Main Memory Systems Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik * University of Illinois at Chicago Northwestern University
More informationEvaluating STT-RAM as an Energy-Efficient Main Memory Alternative
Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationA Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors
, July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As
More informationFlexible Cache Error Protection using an ECC FIFO
Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1 ECC FIFO Goal: to reduce on-chip ECC overhead
More informationMemory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez
Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection
More informationBalancing DRAM Locality and Parallelism in Shared Memory CMP Systems
Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard
More informationA Comparison of Capacity Management Schemes for Shared CMP Caches
A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip
More informationOAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches
OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches Jue Wang, Xiangyu Dong, Yuan Xie Department of Computer Science and Engineering, Pennsylvania State University Qualcomm Technology,
More informationCache/Memory Optimization. - Krishna Parthaje
Cache/Memory Optimization - Krishna Parthaje Hybrid Cache Architecture Replacing SRAM Cache with Future Memory Technology Suji Lee, Jongpil Jung, and Chong-Min Kyung Department of Electrical Engineering,KAIST
More informationCouture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung
Couture: Tailoring STT-MRAM for Persistent Main Memory Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Executive Summary Motivation: DRAM plays an instrumental role in modern
More information3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016
3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016 Hybrid L2 NUCA Design and Management Considering Data Access Latency, Energy Efficiency, and Storage
More informationOptimizing Replication, Communication, and Capacity Allocation in CMPs
Optimizing Replication, Communication, and Capacity Allocation in CMPs Zeshan Chishti, Michael D Powell, and T. N. Vijaykumar School of ECE Purdue University Motivation CMP becoming increasingly important
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationEnergy Models for DVFS Processors
Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July
More informationLecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections )
Lecture 8: Virtual Memory Today: DRAM innovations, virtual memory (Sections 5.3-5.4) 1 DRAM Technology Trends Improvements in technology (smaller devices) DRAM capacities double every two years, but latency
More informationConstructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology
Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology LEI JIANG, BO ZHAO, JUN YANG, and YOUTAO ZHANG, University of Pittsburgh Modern mobile processors
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationEnergy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags
Energy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags Jinwook Jung, Yohei Nakata, Masahiko Yoshimoto, and Hiroshi Kawaguchi Graduate School of System Informatics, Kobe
More informationLeveraging ECC to Mitigate Read Disturbance, False Reads Mitigating Bitline Crosstalk Noise in DRAM Memories and Write Faults in STT-RAM
1 MEMSYS 2017 DSN 2016 Leveraging ECC to Mitigate ead Disturbance, False eads Mitigating Bitline Crosstalk Noise in DAM Memories and Write Faults in STT-AM Mohammad Seyedzadeh, akan. Maddah, Alex. Jones,
More informationDecoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching
Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison 1 Please find the power point presentation
More informationOpenPrefetch. (in-progress)
OpenPrefetch Let There Be Industry-Competitive Prefetching in RISC-V Processors (in-progress) Bowen Huang, Zihao Yu, Zhigang Liu, Chuanqi Zhang, Sa Wang, Yungang Bao Institute of Computing Technology(ICT),
More informationEnergy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012
Energy Proportional Datacenter Memory Brian Neel EE6633 Fall 2012 Outline Background Motivation Related work DRAM properties Designs References Background The Datacenter as a Computer Luiz André Barroso
More informationA Fast Instruction Set Simulator for RISC-V
A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.
More informationCompiler-Assisted Refresh Minimization for Volatile STT-RAM Cache
Compiler-Assisted Refresh Minimization for Volatile STT-RAM Cache Qingan Li, Jianhua Li, Liang Shi, Chun Jason Xue, Yiran Chen, Yanxiang He City University of Hong Kong University of Pittsburg Outline
More informationKruiser: Semi-synchronized Nonblocking Concurrent Kernel Heap Buffer Overflow Monitoring
NDSS 2012 Kruiser: Semi-synchronized Nonblocking Concurrent Kernel Heap Buffer Overflow Monitoring Donghai Tian 1,2, Qiang Zeng 2, Dinghao Wu 2, Peng Liu 2 and Changzhen Hu 1 1 Beijing Institute of Technology
More informationLinearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons,
More informationRow Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Executive Summary Different memory technologies have different
More informationSystem Simulator for x86
MARSS Micro Architecture & System Simulator for x86 CAPS Group @ SUNY Binghamton Presenter Avadh Patel http://marss86.org Present State of Academic Simulators Majority of Academic Simulators: Are for non
More informationBloom Filtering Cache Misses for Accurate Data Speculation and Prefetching
Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching Jih-Kwon Peir, Shih-Chang Lai, Shih-Lien Lu, Jared Stark, Konrad Lai peir@cise.ufl.edu Computer & Information Science and Engineering
More informationA Brief Compendium of On Chip Memory Highlighting the Tradeoffs Implementing SRAM,
A Brief Compendium of On Chip Memory Highlighting the Tradeoffs Implementing, RAM, or edram Justin Bates Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 3816-36
More informationChargeCache. Reducing DRAM Latency by Exploiting Row Access Locality
ChargeCache Reducing DRAM Latency by Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu Executive Summary Goal: Reduce
More informationCache Memory Configurations and Their Respective Energy Consumption
Cache Memory Configurations and Their Respective Energy Consumption Dylan Petrae Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 32816-2362 Abstract When it
More informationLightweight Memory Tracing
Lightweight Memory Tracing Mathias Payer*, Enrico Kravina, Thomas Gross Department of Computer Science ETH Zürich, Switzerland * now at UC Berkeley Memory Tracing via Memlets Execute code (memlets) for
More informationDASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture
DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture Junwhan Ahn *, Sungjoo Yoo, and Kiyoung Choi * junwhan@snu.ac.kr, sungjoo.yoo@postech.ac.kr, kchoi@snu.ac.kr * Department of Electrical
More informationA Page-Based Storage Framework for Phase Change Memory
A Page-Based Storage Framework for Phase Change Memory Peiquan Jin, Zhangling Wu, Xiaoliang Wang, Xingjun Hao, Lihua Yue University of Science and Technology of China 2017.5.19 Outline Background Related
More informationTAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches
TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches Majed Valad Beigi and Gokhan Memik Department of EECS, Northwestern University, Evanston, IL majed.beigi@northwestern.edu and memik@eecs.northwestern.edu
More informationEmerging NVM Enabled Storage Architecture:
Emerging NVM Enabled Storage Architecture: From Evolution to Revolution. Yiran Chen Electrical and Computer Engineering University of Pittsburgh Sponsors: NSF, DARPA, AFRL, and HP Labs 1 Outline Introduction
More informationManaging Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems
Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems Zimeng Zhou, Lei Ju, Zhiping Jia, Xin Li School of Computer Science and Technology Shandong University, China Outline
More informationDynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories
SAFARI Technical Report No. 2-5 (December 6, 2) : A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon hanbinyoon@cmu.edu Justin Meza meza@cmu.edu
More informationA Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers
A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea
More informationROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture
ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture Jie Zhang, Miryeong Kwon, Changyoung Park, Myoungsoo Jung, Songkuk Kim Computer Architecture and Memory
More informationImproving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.
Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses
More informationVirtualized ECC: Flexible Reliability in Memory Systems
Virtualized ECC: Flexible Reliability in Memory Systems Doe Hyun Yoon Advisor: Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Motivation Reliability concerns are growing
More informationArchitectural Aspects in Design and Analysis of SOTbased
Architectural Aspects in Design and Analysis of SOTbased Memories Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril & Mehdi Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationStructure of Computer Systems
222 Structure of Computer Systems Figure 4.64 shows how a page directory can be used to map linear addresses to 4-MB pages. The entries in the page directory point to page tables, and the entries in a
More informationDESIGNING LARGE HYBRID CACHE FOR FUTURE HPC SYSTEMS
DESIGNING LARGE HYBRID CACHE FOR FUTURE HPC SYSTEMS Jiacong He Department of Electrical Engineering The University of Texas at Dallas 800 W Campbell Rd, Richardson, TX, USA Email: jiacong.he@utdallas.edu
More informationCONTENT-AWARE SPIN-TRANSFER TORQUE MAGNETORESISTIVE RANDOM-ACCESS MEMORY (STT-MRAM) CACHE DESIGNS
CONTENT-AWARE SPIN-TRANSFER TORQUE MAGNETORESISTIVE RANDOM-ACCESS MEMORY (STT-MRAM) CACHE DESIGNS By QI ZENG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
More informationfor High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami
3D Implemented dsram/dram HbidC Hybrid Cache Architecture t for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami Kyushu University
More informationLow-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM
Low-Cost Inter-Linked ubarrays (LIA) Enabling Fast Inter-ubarray Data Movement in DRAM Kevin Chang rashant Nair, Donghyuk Lee, augata Ghose, Moinuddin Qureshi, and Onur Mutlu roblem: Inefficient Bulk Data
More informationMohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu
Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly
More informationRespin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory
Respin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory Computer Architecture Research Lab h"p://arch.cse.ohio-state.edu Universal Demand for Low Power Mobility Ba"ery life Performance
More informationEvalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve
Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More informationExploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies
Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S. Schreiber Intelligent Infrastructure Lab (IIL), Hewlett-Packard
More informationThe Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory
The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory Lavanya Subramanian* Vivek Seshadri* Arnab Ghosh* Samira Khan*
More informationA Low-Power Hybrid Magnetic Cache Architecture Exploiting Narrow-Width Values
A Low-Power Hybrid Magnetic Cache Architecture Exploiting Narrow-Width Values Mohsen Imani, Abbas Rahimi, Yeseong Kim, Tajana Rosing Computer Science and Engineering, UC San Diego, La Jolla, CA 92093,
More informationAC-DIMM: Associative Computing with STT-MRAM
AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:
More informationEnhanced Operating System Security Through Efficient and Fine-grained Address Space Randomization
Enhanced Operating System Security Through Efficient and Fine-grained Address Space Randomization Anton Kuijsten Andrew S. Tanenbaum Vrije Universiteit Amsterdam 21st USENIX Security Symposium Bellevue,
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationBandwidth-Aware Reconfigurable Cache Design with Hybrid Memory Technologies
Bandwidth-Aware Reconfigurable Cache Design with Hybrid Memory Technologies Jishen Zhao, Cong Xu, Yuan Xie Computer Science and Engineering Department, Pennsylvania State University {juz138,czx12,yuanxie}@cse.psu.edu
More informationBenchmarking the Memory Hierarchy of Modern GPUs
1 of 30 Benchmarking the Memory Hierarchy of Modern GPUs In 11th IFIP International Conference on Network and Parallel Computing Xinxin Mei, Kaiyong Zhao, Chengjian Liu, Xiaowen Chu CS Department, Hong
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationNovel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014
Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014 Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe Toshiba Corporation, R&D Center Advanced
More informationMigration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM
Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM Hyunchul Seok Daejeon, Korea hcseok@core.kaist.ac.kr Youngwoo Park Daejeon, Korea ywpark@core.kaist.ac.kr Kyu Ho Park Deajeon,
More informationImproving Energy Efficiency of Write-asymmetric Memories by Log Style Write
Improving Energy Efficiency of Write-asymmetric Memories by Log Style Write Guangyu Sun 1, Yaojun Zhang 2, Yu Wang 3, Yiran Chen 2 1 Center for Energy-efficient Computing and Applications, Peking University
More informationCloudCache: Expanding and Shrinking Private Caches
CloudCache: Expanding and Shrinking Private Caches Hyunjin Lee, Sangyeun Cho, and Bruce R. Childers Computer Science Department, University of Pittsburgh {abraham,cho,childers}@cs.pitt.edu Abstract The
More informationInitial Results on the Performance Implications of Thread Migration on a Chip Multi-Core
3 rd HiPEAC Workshop IBM, Haifa 17-4-2007 Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core, P. Michaud, L. He, D. Fetis, C. Ioannou, P. Charalambous and A. Seznec
More informationImproving Virtual Machine Scheduling in NUMA Multicore Systems
Improving Virtual Machine Scheduling in NUMA Multicore Systems Jia Rao, Xiaobo Zhou University of Colorado, Colorado Springs Kun Wang, Cheng-Zhong Xu Wayne State University http://cs.uccs.edu/~jrao/ Multicore
More informationAdaptive Placement and Migration Policy for an STT-RAM-Based Hybrid Cache
Adaptive Placement and Migration Policy for an STT-RAM-Based Hybrid Cache Zhe Wang Daniel A. Jiménez Cong Xu Guangyu Sun Yuan Xie Texas A&M University Pennsylvania State University Peking University AMD
More informationCache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs
Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Technical Report CSE--00 Monday, June 20, 20 Adwait Jog Asit K. Mishra Cong Xu Yuan Xie adwait@cse.psu.edu amishra@cse.psu.edu
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationSoftware-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems
Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Min Lee Vishal Gupta Karsten Schwan College of Computing Georgia Institute of Technology {minlee,vishal,schwan}@cc.gatech.edu
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationSpring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand
Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications
More informationEFFICIENTLY ENABLING CONVENTIONAL BLOCK SIZES FOR VERY LARGE DIE- STACKED DRAM CACHES
EFFICIENTLY ENABLING CONVENTIONAL BLOCK SIZES FOR VERY LARGE DIE- STACKED DRAM CACHES MICRO 2011 @ Porte Alegre, Brazil Gabriel H. Loh [1] and Mark D. Hill [2][1] December 2011 [1] AMD Research [2] University
More informationCS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines
CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell
More informationNear-Threshold Computing: How Close Should We Get?
Near-Threshold Computing: How Close Should We Get? Alaa R. Alameldeen Intel Labs Workshop on Near-Threshold Computing June 14, 2014 Overview High-level talk summarizing my architectural perspective on
More informationDEMM: a Dynamic Energy-saving mechanism for Multicore Memories
DEMM: a Dynamic Energy-saving mechanism for Multicore Memories Akbar Sharifi, Wei Ding 2, Diana Guttman 3, Hui Zhao 4, Xulong Tang 5, Mahmut Kandemir 5, Chita Das 5 Facebook 2 Qualcomm 3 Intel 4 University
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationBottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt
Bottleneck Identification and Scheduling in Multithreaded Applications José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt Executive Summary Problem: Performance and scalability of multithreaded applications
More informationSecurity-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat
Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Common Processor Performance Metrics Identifying and Analyzing Bottlenecks Benchmarking and Workload Selection Performance
More informationA Hybrid Adaptive Feedback Based Prefetcher
A Feedback Based Prefetcher Santhosh Verma, David M. Koppelman and Lu Peng Department of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 78 sverma@lsu.edu, koppel@ece.lsu.edu,
More informationCouture: Tailoring STT-MRAM for Persistent Main Memory
: Tailoring for Persistent Main Memory Mustafa M Shihab 1, Jie Zhang 3, Shuwen Gao 2, Joseph Callenes-Sloan 1, and Myoungsoo Jung 3 1 University of Texas at Dallas, 2 Intel Corporation 3 School of Integrated
More informationUnderstanding The Effects of Wrong-path Memory References on Processor Performance
Understanding The Effects of Wrong-path Memory References on Processor Performance Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt The University of Texas at Austin 2 Motivation Processors spend
More informationAS the processor-memory speed gap continues to widen,
IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 7, JULY 2004 843 Design and Optimization of Large Size and Low Overhead Off-Chip Caches Zhao Zhang, Member, IEEE, Zhichun Zhu, Member, IEEE, and Xiaodong Zhang,
More informationPerceptron Learning for Reuse Prediction
Perceptron Learning for Reuse Prediction Elvira Teran Zhe Wang Daniel A. Jiménez Texas A&M University Intel Labs {eteran,djimenez}@tamu.edu zhe2.wang@intel.com Abstract The disparity between last-level
More informationDynamic Cache Pooling in 3D Multicore Processors
Dynamic Cache Pooling in 3D Multicore Processors TIANSHENG ZHANG, JIE MENG, and AYSE K. COSKUN, BostonUniversity Resource pooling, where multiple architectural components are shared among cores, is a promising
More informationSpeeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns
March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations
More informationBanked Multiported Register Files for High-Frequency Superscalar Microprocessors
Banked Multiported Register Files for High-Frequency Superscalar Microprocessors Jessica H. T seng and Krste Asanoviü MIT Laboratory for Computer Science, Cambridge, MA 02139, USA ISCA2003 1 Motivation
More informationAddressing End-to-End Memory Access Latency in NoC-Based Multicores
Addressing End-to-End Memory Access Latency in NoC-Based Multicores Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das The Pennsylvania State University University Park, PA, 682, USA {akbar,euk39,kandemir,das}@cse.psu.edu
More informationDynamically Reconfigurable Hybrid Cache: An Energy-Efficient Last-Level Cache Design
Dynamically Reconfigurable Hybrid Cache: An Energy-Efficient Last-Level Cache Design Yu-Ting Chen, Jason Cong, Hui Huang, Bin Liu, Chunyue Liu, Miodrag Potkonjak, and Glenn Reinman Computer Science Department,
More informationA Framework for Providing Quality of Service in Chip Multi-Processors
A Framework for Providing Quality of Service in Chip Multi-Processors Fei Guo 1, Yan Solihin 1, Li Zhao 2, Ravishankar Iyer 2 1 North Carolina State University 2 Intel Corporation The 40th Annual IEEE/ACM
More informationMemory hierarchy and cache
Memory hierarchy and cache QUIZ EASY 1). What is used to design Cache? a). SRAM b). DRAM c). Blend of both d). None. 2). What is the Hierarchy of memory? a). Processor, Registers, Cache, Tape, Main memory,
More informationCache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs
Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Adwait Jog Asit K. Mishra Cong Xu Yuan Xie Vijaykrishnan Narayanan Ravishankar Iyer Chita R. Das The Pennsylvania State
More informationEvaluation of RISC-V RTL with FPGA-Accelerated Simulation
Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanovic CARRV 2017 10/14/2017 Evaluation Methodologies For Computer
More informationComputer Sciences Department
Computer Sciences Department SIP: Speculative Insertion Policy for High Performance Caching Hongil Yoon Tan Zhang Mikko H. Lipasti Technical Report #1676 June 2010 SIP: Speculative Insertion Policy for
More information3D Memory Architecture. Kyushu University
3D Memory Architecture Koji Inoue Kyushu University 1 Outline Why 3D? Will 3D always work well? Support Adaptive Execution! Memory Hierarchy Run time Optimization Conclusions 2 Outline Why 3D? Will 3D
More informationMestrado em Informática
Sistemas de Computação e Desempenho Arquitecturas Paralelas Mestrado em Informática 2010/11 A.J.Proença Tema Arquitecturas Paralelas (1) Estrutura do tema AP 1. A evolução das arquitecturas pelo paralelismo
More information