Respin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory

Size: px
Start display at page:

Download "Respin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory"

Transcription

1 Respin: Rethinking Near- Threshold Multiprocessor Design with Non-Volatile Memory Computer Architecture Research Lab h"p://arch.cse.ohio-state.edu

2 Universal Demand for Low Power Mobility Ba"ery life Performance Power constraints Energy cost Environment 2

3 Near-Threshold Operation Vth NT Vdd Nominal Vdd 0 ~1/ Voltage (V) ~1/ Frequency 1 3

4 Challenges in Near-Threshold Performance degradanon Amplified process varianon Leakage power dominates The ininal idea of Respin Build caches in NT-CMP with leakage-free nonvolanle memories to reduce power consumpnon Probability distribution Frac=on of Total Chip Power Nominal Vdd Relative frequency Near-Threshold Vdd 900mV, Vth σ/µ= 9% 900mV, Vth σ/µ=12% 400mV, Vth σ/µ= 9% 400mV, Vth σ/µ=12% Dynamic and Leakage Power Breakdown for a 64-Core CMP ~39% Total Leakage ~73% Total Leakage ~35% Cache Leakage dynamic-core dynamic-cache leakage-core leakage-cache 4

5 Outline Basics of NVM Respin Architecture Methodology and EvaluaNon Conclusion 5

6 Non-Volatile Memory Basics Non-VolaNlity Resistance as data representanon (e.g. PCRAM, STT-RAM, ReRAM, etc.) Near-Zero Leakage Power Good fit for future power-constrained compunng High Density Great design candidate in the big data era Good Performance Feasible for on-chip storage replacement 6

7 STT-RAM Unique features of STT-RAM: fast read speed, low read energy, unlimited write endurance, and good companbility with CMOS technology Shortcomings: long write latency and high write energy MTJ (MagneNc Tunnel JuncNon) STT-RAM Cell (1T-1MTJ) AnN-parallel state: highresistance, represennng a 1 Parallel state: low-resistance, represennng a 0 7

8 STT-RAM is Good Fit for NT-CMP NT-CMP High leakage power Low operanng frequency Large numbers of cores requiring high cache capacity Unreliable funcnonal units STT-RAM Near-zero leakage power Long latency writes High density (~4x denser than SRAM) Robust from soh-errors 8

9 Outline Basics of NVM Respin Architecture Methodology and EvaluaNon Conclusion 9

10 Core 3 Core 8 Core 9 Core 10 Core11 Core 12 Core 13 Core 14 Core 15 Core 3 Core 8 Core 9 Core 10 Core11 Core 12 Core 13 Core 14 Core 15 Core 4 Core 4 Core 5 Core 5 Core 6 Core 6 Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 Core 6 Core 7 Core 8 Core 9 Core 10 Core11 Core 12 Core 13 Core 14 Core 15 Core 3 Core 8 Core 9 Core 10 Core11 Core 12 Core 13 Core 14 Core 15 Core 4 Core 5 Core 6 Respin Architecture NT Vdd High Vdd NT Vdd High Vdd Core 0 Core 1 Core 2 Core 7 Cluster 0 Cluster 1 Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 Core 6 Core 7 L3 Cache L2 Cache L1D L1I L2 Cache Core 0 Core 1 Core 2 Core 7 Core 0 Core 1 Core 2 Core 7 Cluster 2 Cluster 3 Core 8 Core 9 Core 10 Core11 Core 12 Core 13 Core 14 Core 15 (a) Cores operate at NT-Vdd rail with low frequencies Caches are built with STT-RAM and operate at high-vdd rail making read speed extremely fast Clustered-CMP with fast STT-RAM read enables within-cluster shared L1 cache design, removing coherence costs (b) 10

11 Shared Cache Controller Cache CLK (0.4ns) Core0 (1.6ns) Core1 (2.4ns) Core2 (1.6ns) Core3 (1.6ns) Core4 (2.0ns) cycle 0 cycle 1 cycle 2 cycle 3 cycle 4 Level-shifting overhead Level-shifting overhead service C0 service C2 half-miss C3 cannot service C3 in time service C3 cycle 5 cycle 6 service C4 service C1 Request serviced (a) Cache CLK (0.4ns) Core0 (1.6ns) Core1 (2.4ns) Core2 (1.6ns) Core3 (1.6ns) Core4 (2.0ns) cycle 0 cycle 1 cycle 2 cycle 3 cycle 4 cycle 5 cycle 6 service C0 service C2 service C3 service C4 service C1 half-miss C half-miss (b) 11

12 Dynamic Core Consolidation High process varianon and leakage in NT-CMP lead to fast cores more energy-efficient than slow ones Dynamically consolidate threads onto more efficient cores with greedy search at runnme can further save energy VC 15 VC 14 VC Virtual Core Monitor Filter 0 1 System Firmware (ACPI) Virtual Cores VC 4 OS VC 3 Core Remapper Energy Table Phy. ID Weight Status OFF OFF 0 1 VC 2 ID Map Virt. ID Phy. ID 3 2 VC 1 VC 0 Retired Inst. Energy Power Control Mechanism implemented in system firmware, energy-perinstrucnon used as greedy selecnon metric, and instrucnon VC 14 VC 13 VC C 15 C 14 C Physical Cores VC 3 C 4 VC 0 VC 4 C 3 VC 1 VC 2 C 2 OFF C 1 OFF C 0 count used as evaluanon interval 12

13 Greedy Selection A greedy approach is used to search for opnmal system configuranons at applicanon runnme Energy-Per-InstrucNon (EPI) as evaluanon metric InstrucNon count as evaluanon interval 13

14 Outline Basics of NVM Respin Architecture Methodology and Evalua=on Conclusion 14

15 Methodology Level Size (mm²) Block Size Associa=vity Read/Write Ports L1I (Private/Shared within Cluster) L1D (Private/Shared within Cluster) L2 (Shared within Cluster) L3 (Shared within Chip) 16KB (Private)/256KB (Shared within Cluster) 8MB (Small)/16MB (Medium)/32MB (Large) 32B 64B 24MB (Small)/48MB (Medium)/ 128B 96MB (Large) Table 1. Summary of Cache Parameters. Vdd Rail Area (mm²) Read/Write Latency (ns) 2-way 4-way 8-way 16-way Read/Write Energy (pj) 1/1 Leakage Power (mw) SRAM (16KB 16) Low (0.65V) SRAM (256KB) STT-RAM (256KB) High (1.0V) / / Table 2. Comparison of SRAM vs. STT-RAM Technology Parameters. 15

16 Methodology SimulaNon Framework: SESC for architectural simulanon CACTI, McPAT, and NVSim for Benchmarks: latency, power, energy, and area simulanon SPLASH2 and PARSEC Main Evaluated ConfiguraNon: 64-core CMP with four 16-core clusters Medium size L2 and L3 caches 0.4ns shared L1 cache read latency Cores CMP Architecture 64 out-of-order Fetch/Issue/Commit Width 2/2/2 Register File Size InstrucNon Window Size Reorder Buffer Size Load/Store Queue Size NoC Interconnect Coherence Protocol Consistency Model Technology NT-Vdd 76 int, 56 fp 56 int, 24 fp 80 entries 38 entries 2D Torus MESI Release Consistency 22nm 0.4V (Core), 0.65V (Cache) Nominal-Vdd 1.0V Core Frequency Range Median Core Frequency Vth std. dev./mean (σ/μ) VariaNon Parameters 375MHz 725MHz 500MHz 12% (Chip), 10% (Cluster) Table 3. Summary of Experimental Parameters. 16

17 Normalized Total Chip Power Small Cache Power and Performance Power Reduc=on of Proposed Design with Three L2/L3 Cache Sizes Medium Cache ~3% ~14% Large Cache ~23% leakage Respin achieved 14% power reducnon and 11% performance improvement with medium sized cache dynamic Rela=ve Execu=on Time for Medium Cache Size Normalized Execu=on Time GEOMEAN PR-SRAM-NT HP-SRAM-CMP SH-SRAM-Nom SH-STT 17

18 Energy Consumption For medium sized cache, Respin achieved 22% energy savings with the basic shared STT-RAM cache design plus addinonal 10% with core consolidanon enabled 1.2 Energy Consump=on for Three L2/L3 Cache Sizes 1.4 Energy Consump=on for Different Core Consolida=on 1.39 Configura=ons Normalized Energy Consump=on Small Cache Medium Cache Large Cache PR-SRAM-NT SH-SRAM-Nom SH-STT Normalized Energy Consump=on GEOMEAN PR-SRAM-NT HP-SRAM-CMP SH-SRAM-Nom SH-STT SH-STT-CC SH-STT-CC-Oracle PR-STT-CC SH-STT-CC-OS 18

19 Core Consolidation Analysis In most cases our greedy algorithm matches well with the oracle while in very few cases subopnmal selecnon becomes the barrier to slow down the pace of our greedy mechanism SH-STT-CC SH-STT-CC-Oracle radix (SPLASH2): greedy achieved 48% energy savings while oracle achieved 50% Number of OFF Cores Number of Million Instructions 16 SH-STT-CC SH-STT-CC-Oracle 14 lu (SPLASH2): greedy achieved 29% energy savings while oracle achieved 38% Number of OFF Cores Sub-op=mal Number of Million Instructions 19

20 Shared Cache Access Load More than 95% of the incoming requests can be serviced in one processor core cycle Percentage of Total Cache Cycles Shared DL1 Cache U=liza=on Rate in One NT-CMP Cluster >= 4 Num of Requests Arriving at DL1 Cache blackscholes cholesky radix raytrace streamcluster AVERAGE Percentage of DL1 Read Hit Requests Frac=on of Read Hit Requests Serviced by the Shared DL1 in 1, 2, or more cycles blackscholes cholesky radix raytrace streamcluster AVERAGE 1 2 >= 3 Num of Core Cycles Serviced 20

21 Sensitivity Study on Cluster Size 16-core cluster provides the best trade-off between data sharing among cores and shared cache access load, giving ~11% performance gain Cluster Size (#cores) Shared L1 (I/D) Size (KB) Performance Gain (%)

22 Number of Active Cores in Dynamic Core Consolidation Strong phased behavior can be observed across all applicanons, making our dynamic core consolidanon mechanism very useful Average Number of Active Cores barnes cholesky fft lu SH-STT-CC SH-STT-CC-Oracle ocean radiosity radix raytracewater-nsquared AVG swaptions streamcluster bodytrack blackscholes 22

23 Outline Basics of NVM Respin Architecture Methodology and EvaluaNon Conclusion 23

24 Conclusion The first work to explore the use of non-volanle caches in near-threshold chip muln-processors A novel architecture designed to enhance NT-CMP performance and reduce energy consumpnon by sharing L1 caches and implemennng dynamic core consolidanon mechanism Achieved energy reducnon by 33% and improved performance by 11% 24

25 Questions? Thank you! 25

Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014

Novel Nonvolatile Memory Hierarchies to Realize Normally-Off Mobile Processors ASP-DAC 2014 Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014 Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe Toshiba Corporation, R&D Center Advanced

More information

Emerging NVM Memory Technologies

Emerging NVM Memory Technologies Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement

More information

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger

Data Criticality in Network-On-Chip Design. Joshua San Miguel Natalie Enright Jerger Data Criticality in Network-On-Chip Design Joshua San Miguel Natalie Enright Jerger Network-On-Chip Efficiency Efficiency is the ability to produce results with the least amount of waste. Wasted time Wasted

More information

Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores

Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores Magnus Ekman Per Stenstrom Department of Computer Engineering, Department of Computer Engineering, Outline Problem statement Assumptions

More information

Architectural Aspects in Design and Analysis of SOTbased

Architectural Aspects in Design and Analysis of SOTbased Architectural Aspects in Design and Analysis of SOTbased Memories Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril & Mehdi Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING

More information

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly

More information

Compiler-Assisted Refresh Minimization for Volatile STT-RAM Cache

Compiler-Assisted Refresh Minimization for Volatile STT-RAM Cache Compiler-Assisted Refresh Minimization for Volatile STT-RAM Cache Qingan Li, Jianhua Li, Liang Shi, Chun Jason Xue, Yiran Chen, Yanxiang He City University of Hong Kong University of Pittsburg Outline

More information

Cache/Memory Optimization. - Krishna Parthaje

Cache/Memory Optimization. - Krishna Parthaje Cache/Memory Optimization - Krishna Parthaje Hybrid Cache Architecture Replacing SRAM Cache with Future Memory Technology Suji Lee, Jongpil Jung, and Chong-Min Kyung Department of Electrical Engineering,KAIST

More information

A Framework for Modeling GPUs Power Consumption

A Framework for Modeling GPUs Power Consumption A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January

More information

Near-Threshold Computing: Reclaiming Moore s Law

Near-Threshold Computing: Reclaiming Moore s Law 1 Near-Threshold Computing: Reclaiming Moore s Law Dr. Ronald G. Dreslinski Research Fellow Ann Arbor 1 1 Motivation 1000000 Transistors (100,000's) 100000 10000 Power (W) Performance (GOPS) Efficiency (GOPS/W)

More information

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy Power Reduction Techniques in the Memory System Low Power Design for SoCs ASIC Tutorial Memories.1 Typical Memory Hierarchy On-Chip Components Control edram Datapath RegFile ITLB DTLB Instr Data Cache

More information

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies

Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Hybrid Cache Architecture (HCA) with Disparate Memory Technologies Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie Pennsylvania State University IBM Austin Research Laboratory Acknowledgement:

More information

NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1

NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1 NON-SPECULATIVE LOAD LOAD REORDERING IN TSO 1 Alberto Ros Universidad de Murcia October 17th, 2017 1 A. Ros, T. E. Carlson, M. Alipour, and S. Kaxiras, "Non-Speculative Load-Load Reordering in TSO". ISCA,

More information

Stash Directory: A Scalable Directory for Many- Core Coherence! Socrates Demetriades and Sangyeun Cho

Stash Directory: A Scalable Directory for Many- Core Coherence! Socrates Demetriades and Sangyeun Cho Stash Directory: A Scalable Directory for Many- Core Coherence! Socrates Demetriades and Sangyeun Cho 20 th Interna+onal Symposium On High Performance Computer Architecture (HPCA). Orlando, FL, February

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

Lecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections )

Lecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections ) Lecture 8: Virtual Memory Today: DRAM innovations, virtual memory (Sections 5.3-5.4) 1 DRAM Technology Trends Improvements in technology (smaller devices) DRAM capacities double every two years, but latency

More information

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations

More information

The Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August

The Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August The Engine & DRAM Endurance and Speed with STT MRAM Les Crudele / Andrew J. Walker PhD August 2018 1 Contents The Leaking Creaking Pyramid STT-MRAM: A Compelling Replacement STT-MRAM: A Unique Endurance

More information

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

Locality-Aware Data Replication in the Last-Level Cache

Locality-Aware Data Replication in the Last-Level Cache Locality-Aware Data Replication in the Last-Level Cache George Kurian, Srinivas Devadas Massachusetts Institute of Technology Cambridge, MA USA {gkurian, devadas}@csail.mit.edu Omer Khan University of

More information

An Architecture-level Cache Simulation Framework Supporting Advanced PMA STT-MRAM

An Architecture-level Cache Simulation Framework Supporting Advanced PMA STT-MRAM An Architecture-level Cache Simulation Framework Supporting Advanced PMA STT-MRAM Bi Wu, Yuanqing Cheng,YingWang, Aida Todri-Sanial, Guangyu Sun, Lionel Torres and Weisheng Zhao School of Software Engineering

More information

Recent Advancements in Spin-Torque Switching for High-Density MRAM

Recent Advancements in Spin-Torque Switching for High-Density MRAM Recent Advancements in Spin-Torque Switching for High-Density MRAM Jon Slaughter Everspin Technologies 7th International Symposium on Advanced Gate Stack Technology, September 30, 2010 Everspin Technologies,

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two

Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Bushra Ahsan and Mohamed Zahran Dept. of Electrical Engineering City University of New York ahsan bushra@yahoo.com mzahran@ccny.cuny.edu

More information

Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold

Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Farrukh Hijaz, Omer Khan University of Connecticut Power Efficiency Performance/Watt Multicores enable efficiency Power-performance

More information

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models Lecture: Memory, Multiprocessors Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models 1 Refresh Every DRAM cell must be refreshed within a 64 ms window A row

More information

for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami

for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami 3D Implemented dsram/dram HbidC Hybrid Cache Architecture t for High Performance and Low Power Consumption Koji Inoue, Shinya Hashiguchi, Shinya Ueno, Naoto Fukumoto, and Kazuaki Murakami Kyushu University

More information

SpongeDirectory: Flexible Sparse Directories Utilizing Multi-Level Memristors

SpongeDirectory: Flexible Sparse Directories Utilizing Multi-Level Memristors SpongeDirectory: Flexible Sparse Directories Utilizing Multi-Level Memristors Lunkai Zhang, Dmitri Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin State Key Laboratory of Computer

More information

Improving Energy Efficiency of Write-asymmetric Memories by Log Style Write

Improving Energy Efficiency of Write-asymmetric Memories by Log Style Write Improving Energy Efficiency of Write-asymmetric Memories by Log Style Write Guangyu Sun 1, Yaojun Zhang 2, Yu Wang 3, Yiran Chen 2 1 Center for Energy-efficient Computing and Applications, Peking University

More information

WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems

WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems : A Writeback-Aware LLC Management for PCM-based Main Memory Systems Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik * University of Illinois at Chicago Northwestern University

More information

A Non-Volatile Microcontroller with Integrated Floating-Gate Transistors

A Non-Volatile Microcontroller with Integrated Floating-Gate Transistors A Non-Volatile Microcontroller with Integrated Floating-Gate Transistors Wing-kei Yu, Shantanu Rajwade, Sung-En Wang, Bob Lian, G. Edward Suh, Edwin Kan Cornell University 2 of 32 Self-Powered Devices

More information

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites Christian Bienia (Princeton University), Sanjeev Kumar (Intel), Kai Li (Princeton University) Outline Overview What

More information

ABSTRACT. Mu-Tien Chang Doctor of Philosophy, 2013

ABSTRACT. Mu-Tien Chang Doctor of Philosophy, 2013 ABSTRACT Title of dissertation: TECHNOLOGY IMPLICATIONS FOR LARGE LAST-LEVEL CACHES Mu-Tien Chang Doctor of Philosophy, 3 Dissertation directed by: Professor Bruce Jacob Department of Electrical and Computer

More information

OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches

OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches Jue Wang, Xiangyu Dong, Yuan Xie Department of Computer Science and Engineering, Pennsylvania State University Qualcomm Technology,

More information

3D Memory Architecture. Kyushu University

3D Memory Architecture. Kyushu University 3D Memory Architecture Koji Inoue Kyushu University 1 Outline Why 3D? Will 3D always work well? Support Adaptive Execution! Memory Hierarchy Run time Optimization Conclusions 2 Outline Why 3D? Will 3D

More information

A Design Space Exploration Methodology for Parameter Optimization in Multicore Processors

A Design Space Exploration Methodology for Parameter Optimization in Multicore Processors A Design Space Exploration Methodology for Parameter Optimization in Multicore Processors Prasanna Kansakar, Student Member, IEEE and Arslan Munir, Member, IEEE 1 Abstract The need for application specific

More information

Spin-Hall Effect MRAM Based Cache Memory: A Feasibility Study

Spin-Hall Effect MRAM Based Cache Memory: A Feasibility Study Spin-Hall Effect MRAM Based Cache Memory: A Feasibility Study Jongyeon Kim, Bill Tuohy, Cong Ma, Won Ho Choi, Ibrahim Ahmed, David Lilja, and Chris H. Kim University of Minnesota Dept. of ECE 1 Overview

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory

BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory HotStorage 18 BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory Gyuyoung Park 1, Miryeong Kwon 1, Pratyush Mahapatra 2, Michael Swift 2, and Myoungsoo Jung 1 Yonsei University Computer

More information

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course

More information

Efficient Systems. Micrel lab, DEIS, University of Bologna. Advisor

Efficient Systems. Micrel lab, DEIS, University of Bologna. Advisor Row-based Design Methodologies To Compensate Variability For Energy- Efficient Systems Micrel lab, DEIS, University of Bologna Mohammad Reza Kakoee PhD Student m.kakoee@unibo.it it Luca Benini Advisor

More information

CMP annual meeting, January 23 rd, 2014

CMP annual meeting, January 23 rd, 2014 J.P.Nozières, G.Prenat, B.Dieny and G.Di Pendina Spintec, UMR-8191, CEA-INAC/CNRS/UJF-Grenoble1/Grenoble-INP, Grenoble, France CMP annual meeting, January 23 rd, 2014 ReRAM V wr0 ~-0.9V V wr1 V ~0.9V@5ns

More information

Don t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration

Don t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration Don t Forget the : Automatic Block RAM Modelling, Optimization, and Architecture Exploration S. Yazdanshenas, K. Tatsumura *, and V. Betz University of Toronto, Canada * Toshiba Corporation, Japan : An

More information

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

More information

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches Nikos Hardavellas Michael Ferdman, Babak Falsafi, Anastasia Ailamaki Carnegie Mellon and EPFL Data Placement in Distributed

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

Emerging NVM Enabled Storage Architecture:

Emerging NVM Enabled Storage Architecture: Emerging NVM Enabled Storage Architecture: From Evolution to Revolution. Yiran Chen Electrical and Computer Engineering University of Pittsburgh Sponsors: NSF, DARPA, AFRL, and HP Labs 1 Outline Introduction

More information

VIPS: SIMPLE, EFFICIENT, AND SCALABLE CACHE COHERENCE

VIPS: SIMPLE, EFFICIENT, AND SCALABLE CACHE COHERENCE VIPS: SIMPLE, EFFICIENT, AND SCALABLE CACHE COHERENCE Alberto Ros 1 Stefanos Kaxiras 2 Kostis Sagonas 2 Mahdad Davari 2 Magnus Norgen 2 David Klaftenegger 2 1 Universidad de Murcia aros@ditec.um.es 2 Uppsala

More information

CS311 Lecture 21: SRAM/DRAM/FLASH

CS311 Lecture 21: SRAM/DRAM/FLASH S 14 L21-1 2014 CS311 Lecture 21: SRAM/DRAM/FLASH DARM part based on ISCA 2002 tutorial DRAM: Architectures, Interfaces, and Systems by Bruce Jacob and David Wang Jangwoo Kim (POSTECH) Thomas Wenisch (University

More information

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities

More information

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

Portland State University ECE 587/687. Caches and Memory-Level Parallelism Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each

More information

An Energy Efficient Parallel Architecture Using Near Threshold Operation

An Energy Efficient Parallel Architecture Using Near Threshold Operation An Energy Efficient Parallel Architecture Using Near Threshold Operation Ronald G. Dreslinski, Bo Zhai, Trevor Mudge, David Blaauw, and Dennis Sylvester University of Michigan - Ann Arbor, MI {rdreslin,

More information

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs Technical Report CSE--00 Monday, June 20, 20 Adwait Jog Asit K. Mishra Cong Xu Yuan Xie adwait@cse.psu.edu amishra@cse.psu.edu

More information

Cache Memory Configurations and Their Respective Energy Consumption

Cache Memory Configurations and Their Respective Energy Consumption Cache Memory Configurations and Their Respective Energy Consumption Dylan Petrae Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 32816-2362 Abstract When it

More information

c 2014 Aditya Agrawal

c 2014 Aditya Agrawal c 2014 Aditya Agrawal REFRESH REDUCTION IN DYNAMIC MEMORIES BY ADITYA AGRAWAL DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 14: Photonic Interconnect Instructor: Ron Dreslinski Winter 2016 1 1 Announcements 2 Remaining lecture schedule 3/15: Photonics

More information

Evaluation of Hybrid Memory Technologies using SOT-MRAM for On-Chip Cache Hierarchy

Evaluation of Hybrid Memory Technologies using SOT-MRAM for On-Chip Cache Hierarchy 1 Evaluation of Hybrid Memory Technologies using SOT-MRAM for On-Chip Cache Hierarchy Fabian Oboril, Rajendra Bishnoi, Mojtaba Ebrahimi and Mehdi B. Tahoori Abstract Magnetic Random Access Memory (MRAM)

More information

C1C: A Configurable, Compiler-Guided STT-RAM L1 Cache

C1C: A Configurable, Compiler-Guided STT-RAM L1 Cache 52 C1C: A Configurable, Compiler-Guided STT-RAM L1 Cache YONG LI, YAOJUN ZHANG, HAI LI, YIRAN CHEN, and ALEX K. JONES, University of Pittsburgh Spin-Transfer Torque RAM (STT-RAM), a promising alternative

More information

Maximize energy efficiency in a normally-off system using NVRAM. Stéphane Gros Yeter Akgul

Maximize energy efficiency in a normally-off system using NVRAM. Stéphane Gros Yeter Akgul Maximize energy efficiency in a normally-off system using NVRAM Stéphane Gros Yeter Akgul Summary THE COMPANY THE CONTEXT THE TECHNOLOGY THE SYSTEM THE CO-DEVELOPMENT CONCLUSION May 31, 2017 2 Summary

More information

Developing a Prototyping Board for Emerging Memory

Developing a Prototyping Board for Emerging Memory Developing a Prototyping Board for Emerging Memory 2013. 10. 25 Sungjoo Yoo Embedded System Architecture Lab. POSTECH Introduction scaling problem [ITRS, 2012] Year 2012 2013 2014 2015 2016 2017 2018 2019

More information

AC-DIMM: Associative Computing with STT-MRAM

AC-DIMM: Associative Computing with STT-MRAM AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:

More information

Fault Injection Attacks on Emerging Non-Volatile Memories

Fault Injection Attacks on Emerging Non-Volatile Memories Lab of Green and Secure Integrated Circuit Systems (LOGICS) Fault Injection Attacks on Emerging Non-Volatile Memories Mohammad Nasim Imtiaz Khan and Swaroop Ghosh School of Electrical Engineering and Computer

More information

MTJ-Based Nonvolatile Logic-in-Memory Architecture

MTJ-Based Nonvolatile Logic-in-Memory Architecture 2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory

More information

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, intro to multi-thread programming models

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, intro to multi-thread programming models Lecture: Memory, Coherence Protocols Topics: wrap-up of memory systems, intro to multi-thread programming models 1 Refresh Every DRAM cell must be refreshed within a 64 ms window A row read/write automatically

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Memories: Memory Technology

Memories: Memory Technology Memories: Memory Technology Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Memory Hierarchy

More information

Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero

Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero The Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT) 11-15

More information

Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on

Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on Introduction Contech s Task Graph Representation Parallel Program Instrumentation (Break) Analysis and Usage of a Contech Task Graph Hands-on Exercises 2 Contech is An LLVM compiler pass to instrument

More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy. ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM

More information

Routing Path Reuse Maximization for Efficient NV-FPGA Reconfiguration

Routing Path Reuse Maximization for Efficient NV-FPGA Reconfiguration Routing Path Reuse Maximization for Efficient NV-FPGA Reconfiguration Yuan Xue, Patrick ronin, hengmo Yang and Jingtong Hu 01/27/2016 Outline Introduction NV-FPGA benefits and challenges Routing optimization

More information

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Couture: Tailoring STT-MRAM for Persistent Main Memory Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Executive Summary Motivation: DRAM plays an instrumental role in modern

More information

Asymmetry-Aware Work-Stealing Runtimes

Asymmetry-Aware Work-Stealing Runtimes Asymmetry-Aware Work-Stealing Runtimes Christopher Torng, Moyang Wang, and Christopher atten School of Electrical and Computer Engineering Cornell University 43rd Int l Symp. on Computer Architecture,

More information

Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve

Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Evalua&ng STT- RAM as an Energy- Efficient Main Memory Alterna&ve Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform H. Mizuno, N. Irie, K. Uchiyama, Y. Yanagisawa 1, S. Yoshioka 1, I. Kawasaki 1, and T. Hattori 2 Hitachi Ltd.,

More information

ASIC Design of Shared Vector Accelerators for Multicore Processors

ASIC Design of Shared Vector Accelerators for Multicore Processors 26 th International Symposium on Computer Architecture and High Performance Computing 2014 ASIC Design of Shared Vector Accelerators for Multicore Processors Spiridon F. Beldianu & Sotirios G. Ziavras

More information

A Brief Compendium of On Chip Memory Highlighting the Tradeoffs Implementing SRAM,

A Brief Compendium of On Chip Memory Highlighting the Tradeoffs Implementing SRAM, A Brief Compendium of On Chip Memory Highlighting the Tradeoffs Implementing, RAM, or edram Justin Bates Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 3816-36

More information

ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES

ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES Shashikiran H. Tadas & Chaitali Chakrabarti Department of Electrical Engineering Arizona State University Tempe, AZ, 85287. tadas@asu.edu, chaitali@asu.edu

More information

Phastlane: A Rapid Transit Optical Routing Network

Phastlane: A Rapid Transit Optical Routing Network Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens

More information

Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies

Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Exploring Latency-Power Tradeoffs in Deep Nonvolatile Memory Hierarchies Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S. Schreiber Intelligent Infrastructure Lab (IIL), Hewlett-Packard

More information

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

! Memory.  RAM Memory.  Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell.  Used in most commercial chips ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 5, 8 Memory: Periphery circuits Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery " Serial Access Memories

More information

Memory technology and optimizations ( 2.3) Main Memory

Memory technology and optimizations ( 2.3) Main Memory Memory technology and optimizations ( 2.3) 47 Main Memory Performance of Main Memory: Latency: affects Cache Miss Penalty» Access Time: time between request and word arrival» Cycle Time: minimum time between

More information

A Low-Power Hybrid Magnetic Cache Architecture Exploiting Narrow-Width Values

A Low-Power Hybrid Magnetic Cache Architecture Exploiting Narrow-Width Values A Low-Power Hybrid Magnetic Cache Architecture Exploiting Narrow-Width Values Mohsen Imani, Abbas Rahimi, Yeseong Kim, Tajana Rosing Computer Science and Engineering, UC San Diego, La Jolla, CA 92093,

More information

Performance of coherence protocols

Performance of coherence protocols Performance of coherence protocols Cache misses have traditionally been classified into four categories: Cold misses (or compulsory misses ) occur the first time that a block is referenced. Conflict misses

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

The Runahead Network-On-Chip

The Runahead Network-On-Chip The Network-On-Chip Zimo Li University of Toronto zimo.li@mail.utoronto.ca Joshua San Miguel University of Toronto joshua.sanmiguel@mail.utoronto.ca Natalie Enright Jerger University of Toronto enright@ece.utoronto.ca

More information

Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis

Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis Accelerating Multi-core Processor Design Space Evaluation Using Automatic Multi-threaded Workload Synthesis Clay Hughes & Tao Li Department of Electrical and Computer Engineering University of Florida

More information

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics ,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also

More information

TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches

TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches TAPAS: Temperature-aware Adaptive Placement for 3D Stacked Hybrid Caches Majed Valad Beigi and Gokhan Memik Department of EECS, Northwestern University, Evanston, IL majed.beigi@northwestern.edu and memik@eecs.northwestern.edu

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

A Low Power 1Mbit MRAM based based on 1T1MTJ Bit Cell Integrated with Copper Interconnects

A Low Power 1Mbit MRAM based based on 1T1MTJ Bit Cell Integrated with Copper Interconnects A Low Power 1Mbit MRAM based based on 1T1MTJ Bit Cell Integrated with Copper Interconnects M. Durlam, P. Naji, A. Omair, M. DeHerrera, J. Calder, J. M. Slaughter, B. Engel, N. Rizzo, G. Grynkewich, B.

More information

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System 1 1 1 Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman

More information

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

! Memory Overview. ! ROM Memories. ! RAM Memory  SRAM  DRAM. ! This is done because we can build.  large, slow memories OR ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 2: April 5, 26 Memory Overview, Memory Core Cells Lecture Outline! Memory Overview! ROM Memories! RAM Memory " SRAM " DRAM 2 Memory Overview

More information