SPINTRONIC MEMORY ARCHITECTURE


1 SPINTRONIC MEMORY ARCHITECTURE Anand Raghunathan, Integrated Systems Laboratory, School of ECE, Purdue University, with Rangharajan Venkatesan, Shankar Ganesh Ramasubramanian, Ashish Ranjan, and Kaushik Roy. 7th NCN-NEEDS Summer School, Spintronics: Science, Circuits, and Systems

2 AGENDA Background Memory architecture Designing caches with STT-MRAM Exploring DWM for on-chip memory

3 DEMAND FOR ON-CHIP MEMORY Over 50% of chip area is devoted to memory. Multi-cores and the growing processor-memory gap accelerate the demand for on-chip memory. [Figures: cache trends in Intel microprocessors (cache transistors in millions, and % of chip transistors in cache); on-chip memory in SoCs]

4 STORAGE HIERARCHIES Fundamental tradeoff between speed and capacity Would like to have both Solution: Organize memory/storage in a hierarchical manner Source: Avnet

5 MICROPROCESSOR ON-CHIP MEMORY HIERARCHY Microprocessors use multiple levels of on-chip cache to hide memory latency and improve bandwidth. This exploits two key properties of memory accesses. Temporal locality: if a location is referenced, it is likely to be referenced again in the near future. Spatial locality: if a location is referenced, it is likely that locations near it will be referenced in the near future. [Figures: typical on-chip memory hierarchy; Intel Core i7 Mobile CPU cache hierarchy]

6 CACHE BASICS A cache holds a copy of some subset of data from the next lower level of the hierarchy Each cache line consists of an address tag and a data block Cache address = Tag + Index + Offset Source: MIT OCW, Course 6.823

7 CACHE BASICS Cache access: given an address, check if the tag is in the cache. If yes, cache hit: return data from the cache. If no, cache miss: retrieve data from the next level of the hierarchy. Where should the data be written in the cache? What if data is already present at that location? Average access time = Hit time + Miss rate * Miss penalty. Source: MIT OCW, Course 6.823

8 CACHE BASICS Caches help hide memory latency and increase throughput. Why? LITTLE'S LAW: the long-term average number of customers in a stable system, L, equals the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W; expressed algebraically: L = λW.
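Little's Law can be read as a concurrency requirement for the memory system: to sustain a given request rate at a given latency, that many requests must be in flight. A minimal illustration (the numbers below are hypothetical, not from the slides):

```python
# Little's Law: L = lambda * W.
# L = average number of requests in flight, lambda = arrival rate, W = latency.

def outstanding_requests(arrival_rate_per_ns: float, latency_ns: float) -> float:
    """L = lambda * W."""
    return arrival_rate_per_ns * latency_ns

# 1 request/ns at 100 ns average memory latency -> 100 requests in flight.
# A cache that serves most requests in ~2 ns drastically cuts the required
# concurrency, which is how caches raise throughput, not just lower latency.
print(outstanding_requests(1.0, 100.0))  # 100.0
print(outstanding_requests(1.0, 2.0))    # 2.0
```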

9 CACHE BASICS Mapping of locations (addresses) may be direct (many-to-one) or associative (many-to-many) Direct-mapped 2-way associative Why associativity? Source: Wikipedia

10 CACHE BASICS A cache consists of a tag array, data array, and control logic. A tag array lookup indicates whether an access is a hit or a miss. The tag and data arrays may be accessed serially or in parallel. [Figure: organization of a direct-mapped cache] Source: MIT OCW, Course 6.823; Wikipedia

11 CACHE BASICS Various design choices present a rich tradeoff space Cache Size Cache Block Associativity Replacement policy Inclusivity Write policy Average access time = Hit time + Miss rate * Miss penalty
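The average access time formula on this slide composes across hierarchy levels: the L1 miss penalty is itself the average access time of L2. A small sketch with hypothetical latencies and miss rates:

```python
# Average access time = Hit time + Miss rate * Miss penalty (AMAT).
# All numbers below are illustrative, not measurements from the slides.

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty

# Two-level hierarchy: the L1 miss penalty is the L2 average access time.
l2 = amat(hit_time=10.0, miss_rate=0.2, miss_penalty=100.0)  # 30.0 ns
l1 = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=l2)     # 2.5 ns
print(l1, l2)
```

This is why a larger but slower L2 (e.g., STT-MRAM) can still pay off: it lowers the L1 miss penalty seen by the processor as long as its own miss rate drops enough.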

12 THE CASE FOR SPINTRONIC ON-CHIP MEMORIES The combination of endurance, speed, non-volatility and density make spintronic memories promising for on-chip applications Source: Toshiba Corp., IEDM 2012

13 DESIGNING CACHES WITH STT-MRAM [Figure: STT-MRAM bit-cell based on a magnetic tunnel junction with a fixed layer, tunneling oxide, and free layer]

14 STT-MRAM CACHE CHARACTERISTICS Iso-capacity comparison of 1MB SRAM and STT-MRAM caches. [Figure: SRAM vs. STT-MRAM compared on area, read latency, write latency, leakage, read energy, and write energy, with each metric marked as adverse or favorable for STT-MRAM]

15 WHERE IS STT-MRAM USED IN THE CACHE HIERARCHY? [Figure: CPU with an SRAM L1 cache (small, optimized for latency) and a Spin-Transfer Torque MRAM (STT-MRAM) L2 cache (large, optimized for density)] Due to its high write latency, STT-MRAM is suitable for the lower levels of the cache hierarchy. Architectural techniques are required to address inefficient writes.

16 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
Hybrid cache (X. Wu et al. ISCA 2009; A. Jadidi et al. ISLPED 2011)
Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010); partial line update (S. P. Park et al. DAC 2012)
Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011; A. Jog et al. DAC 2012)
Write asymmetry aware architecture (K. Kwon et al. TVLSI 2014)

17 HYBRID SRAM/STT-MRAM L2 CACHE Observation: writes are concentrated in a small portion of the address space. Hybrid cache: split the cache into a write-intensive region (SRAM) and a read-intensive region (STT-MRAM), with a control policy deciding which region a cache block resides in. [Figure: conventional 8-way cache (Way0-Way7) vs. hybrid cache with an SRAM write region (Way0-Way1) and an STT-MRAM read region (Way2-Way7)] X. Wu et al. ISCA 2009

18 HYBRID SRAM/STT-MRAM L2 CACHE Aims to combine the benefits of SRAM and STT-MRAM. The large part of the cache uses STT-MRAM: high density, low leakage. A small part uses SRAM to store write-intensive cache blocks: efficient writes. [Figure: leakage, density, and write characteristics of SRAM, STT-MRAM, and the hybrid cache]

19 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
Hybrid cache (X. Wu et al. ISCA 2009; A. Jadidi et al. ISLPED 2011)
Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010); partial line update (S. P. Park et al. DAC 2012)
Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011; A. Jog et al. DAC 2012)
Write asymmetry aware architecture (K. Kwon et al. TVLSI 2014)

20 REDUCING WRITE INTENSITY: WRITE BIASING Writes to the STT-MRAM L2 come from dirty block evictions from the L1 cache and from writes on misses. Typical eviction policies (LRU, FIFO) do not distinguish between clean and dirty blocks, yet only dirty blocks need to be written to L2! Write biasing: modify the eviction policy to increase the residency of dirty blocks in L1.

21 REDUCING WRITE INTENSITY: WRITE BIASING
Traditional eviction policy (LRU), status stack of a 4-way set-associative cache: the last-accessed block is always inserted at the top of stack (TOS). Example, from A B C D (highest to lowest priority): Load C -> C A B D; Store E -> E C A B.
Write biasing policy: newly written blocks are inserted at position K (here K=2) instead of TOS; blocks with repeated writes are promoted to TOS, so they are prioritized over recently accessed clean blocks. Example, from A B C D:
Load C -> C A B D (promote to TOS)
Store E -> C A E B (insert at position K)
Store E -> E C A B (promote to TOS)
Store G -> E C G A (insert at position K)
Store G -> G E C A (promote to TOS)
Load A -> G E A C (promote only to position K, as all blocks above K are dirty)
M. Rasquinha et al. ISLPED 2010
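The stack trace above can be sketched in code. This is a hedged reconstruction of the write-biased insertion policy (after M. Rasquinha et al., ISLPED 2010), not the exact published mechanism; the class name, the choice K=2, and the clean-load promotion rule are our simplifications, tuned to reproduce the slide's example:

```python
# Write-biased LRU sketch. stack[0] is top-of-stack (TOS); the last entry is
# the eviction victim. First-touch stores insert at position K instead of TOS;
# repeated writes promote to TOS; a clean load is promoted only to position K
# when every block above K is dirty.

K = 2  # insertion position for newly written blocks (illustrative)

class WriteBiasedSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.stack = []      # recency stack of block ids
        self.dirty = set()

    def access(self, block, is_store):
        if block in self.stack:                    # hit
            self.stack.remove(block)
            if is_store or block in self.dirty:
                pos = 0                            # repeated write: to TOS
            else:
                # clean load: promote to TOS unless all blocks above K are dirty
                pos = 0 if any(b not in self.dirty for b in self.stack[:K]) else K
            self.stack.insert(pos, block)
        else:                                      # miss
            if len(self.stack) >= self.ways:
                self.dirty.discard(self.stack.pop())   # evict LRU victim
            self.stack.insert(K if is_store else 0, block)
        if is_store:
            self.dirty.add(block)

s = WriteBiasedSet()
for b in ["D", "C", "B", "A"]:
    s.access(b, is_store=False)   # warm up: stack is A B C D, all clean
s.access("E", is_store=True)      # first write inserted at position K
print(s.stack)                    # ['A', 'B', 'E', 'C']: E stays below clean TOS blocks
```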

22 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
Hybrid cache (X. Wu et al. ISCA 2009; A. Jadidi et al. ISLPED 2011)
Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010); partial line update (S. P. Park et al. DAC 2012)
Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011; A. Jog et al. DAC 2012)
Write asymmetry aware architecture (K. Kwon et al. TVLSI 2014)

23 REDUCING WRITE INTENSITY Observation: most bits in dirty cache blocks are unchanged, so writing the whole block upon eviction performs redundant writes. Solution: partial line update. [Figure: redundant bit writes to L2 across SPEC 2006 benchmarks] P. Zhou et al. ICCAD 2009

24 PARTIAL LINE UPDATE SCHEME Conventional cache: when a dirty cache line is evicted from the L1 cache, the entire 64B line is written to the L2 cache. Partial line update: track dirty data at a finer granularity (four 16B partial lines per 64B line) using additional history bits in the L1 tag array; when a line is evicted, only the dirty parts are written to the L2 cache. [Figure: conventional L1-to-L2 write path (full 64B line) vs. partial line update (only modified 16B partial lines written to the MRAM L2)] S. P. Park et al. DAC 2012
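The bookkeeping behind partial line update is small. A minimal sketch (after S. P. Park et al., DAC 2012); the class and method names are ours, and the 64B line / 16B sub-block sizes follow the slide:

```python
# Partial line update: the L1 tag entry carries one dirty bit per 16B
# sub-block of a 64B line, so an eviction writes only modified sub-blocks
# to the STT-MRAM L2 instead of the whole line.

SUBBLOCK = 16
LINE = 64
NUM_SUB = LINE // SUBBLOCK   # 4 partial lines per 64B line

class L1Line:
    def __init__(self):
        self.sub_dirty = [False] * NUM_SUB

    def write(self, offset, length):
        # mark every sub-block the store touches as dirty
        first = offset // SUBBLOCK
        last = (offset + length - 1) // SUBBLOCK
        for sub in range(first, last + 1):
            self.sub_dirty[sub] = True

    def bytes_written_on_evict(self):
        # bytes actually sent to L2 instead of the full 64B line
        return sum(self.sub_dirty) * SUBBLOCK

line = L1Line()
line.write(offset=4, length=8)    # touches only sub-block 0
line.write(offset=40, length=4)   # touches only sub-block 2
print(line.bytes_written_on_evict())  # 32 bytes instead of 64
```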

25 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
Hybrid cache (X. Wu et al. ISCA 2009; A. Jadidi et al. ISLPED 2011)
Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010); partial line update (S. P. Park et al. DAC 2012)
Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011; A. Jog et al. DAC 2012)
Write asymmetry aware architecture (K. Kwon et al. TVLSI 2014)

26 VOLATILE STT-MRAM The lifetime of cache blocks in caches is small. [Figure: lifetimes of cache blocks in the L2 cache] Relax retention time to improve write efficiency. Retention time vs. write latency for an STT-MRAM L2 cache: 10 years -> 11 ns; 1 sec -> 6 ns; 10 ms -> 3 ns. A. Jog et al. DAC 2012

27 VOLATILE STT-MRAM ARCHITECTURE Reducing retention time makes the cache volatile, requiring refresh or write-back for dirty blocks. A 2-bit state machine per block determines block status over its lifetime (10 ms retention time), with 4 states: S0, initial state (just written); S1, intermediate state; S2, diminishing state (about to become invalid); S3, invalid state. A counter pulse of width T advances the state; a write (W) returns the block to S0. A. Jog et al. DAC 2012

28 VOLATILE STT-MRAM ARCHITECTURE A 2-bit counter per block in the tag array stores the block status (S0, S1, S2, S3). For blocks in the S2 (diminishing) state: refresh, if more recently used (MRU); write back, if less recently used (LRU). [Figure: tag array with per-way block states, showing refresh of MRU ways and write-back of LRU ways] A. Jog et al. DAC 2012
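The two slides above can be sketched together. This is a hedged reconstruction of the 2-bit block-status scheme (after A. Jog et al., DAC 2012); the exact timing of the refresh/write-back decision relative to the counter pulse is our simplification:

```python
# Volatile STT-MRAM block status: a periodic counter pulse advances every
# valid block S0 -> S1 -> S2; on reaching the diminishing state S2, an MRU
# block is refreshed (rewritten, back to S0) and an LRU block is written
# back and invalidated (S3) before its data decays. A write resets to S0.

S0, S1, S2, S3 = 0, 1, 2, 3   # fresh, intermediate, diminishing, invalid

class Block:
    def __init__(self):
        self.state, self.dirty = S3, False

    def write(self):
        self.state, self.dirty = S0, True

    def tick(self, is_mru):
        """One counter pulse T; returns the action taken, if any."""
        if self.state == S3:
            return None                  # invalid blocks need no action
        self.state += 1
        if self.state == S2:
            if is_mru:
                self.state = S0          # refresh: rewrite the data in place
                return "refresh"
            self.state = S3              # write back dirty data, invalidate
            self.dirty = False
            return "writeback"
        return None
```

In the real design the "MRU" input comes from the cache's existing replacement state, so the scheme adds only the 2-bit counter per block.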

29 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
Hybrid cache (X. Wu et al. ISCA 2009; A. Jadidi et al. ISLPED 2011)
Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010); partial line update (S. P. Park et al. DAC 2012)
Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011; A. Jog et al. DAC 2012)
Write asymmetry aware architecture (K. Kwon et al. TVLSI 2014)

30 ASYMMETRIC WRITE ARCHITECTURE WITH REDUNDANT BLOCKS (AWARE) Observation: the write latency of STT-MRAM bit-cells is asymmetric (AP->P is ~3X faster than P->AP; encoding 0: AP, 1: P). Idea: provide slow and fast writes, and increase the frequency of fast writes, which involve only AP->P switching. Add redundant blocks (RBLs) to the cache data array, pre-set to the AP state. Fast write (5.5 ns, AP->P): write into a clean RBL and swap it with the data block (DBL). Slow write (16.5 ns, P->AP): when no clean RBL is available, update the DBL in place and clean all RBLs that share the word line. K. Kwon et al. TVLSI 2014
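The fast/slow write decision can be sketched as a small state machine per word line. This is a hedged simplification of AWARE (after K. Kwon et al., TVLSI 2014): the class name and single-RBL-per-word-line configuration are ours; the 5.5 ns / 16.5 ns figures are the slide's examples:

```python
# AWARE sketch: RBLs pre-set to the all-AP state absorb writes using only
# fast AP->P transitions; when the RBL pool is exhausted, a slow in-place
# write occurs and the word line's RBLs are cleaned back to AP as a side
# effect, re-enabling fast writes.

FAST_NS, SLOW_NS = 5.5, 16.5

class WordLine:
    def __init__(self, num_rbls=1):
        self.num_rbls = num_rbls
        self.clean_rbls = num_rbls   # RBLs currently pre-set to AP

    def write(self):
        if self.clean_rbls > 0:
            self.clean_rbls -= 1           # fast write into a clean RBL,
            return FAST_NS                 # which then swaps roles with the DBL
        self.clean_rbls = self.num_rbls    # slow write; RBLs cleaned to AP
        return SLOW_NS

wl = WordLine()
print(wl.write())  # 5.5  (fast: clean RBL available)
print(wl.write())  # 16.5 (slow: pool exhausted; cleaning restores the RBL)
print(wl.write())  # 5.5
```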

31 SUMMARY STT-MRAM is a promising candidate for lower-level caches High density, non-volatility and low leakage Key challenges: High write energy, high write latency, asymmetric writes Suitable architectural optimizations can significantly mitigate the impact of inefficient writes

32 EXPLORING DOMAIN WALL MEMORY FOR ON-CHIP CACHES [Figure: ferromagnetic wire with a fixed-layer MTJ; read/write0 and write1 currents, and shift-left/shift-right currents]

33 BACKGROUND: DOMAIN WALL MEMORY (DWM) Structure: a ferromagnetic wire and a Magnetic Tunnel Junction (MTJ); data is stored in the magnetic domains of the ferromagnetic wire (free layer, over the MTJ's fixed layer). Operations: shift, read, and write. Read/write operations are performed using the MTJ; bits are shifted along the ferromagnetic wire by applying a current pulse.

34 DWM HISTORY Initially proposed for secondary storage and storage-class memory applications; recent efforts explore its potential for on-chip memory and logic. [Timeline: concept (S. Parkin et al., Science); prototypes (NEC, IBM); applications: accelerator memory (NanoArch 2011), re-configurable logic (TMag. 2011), general-purpose caches (ISLPED 2012, DATE 2013, DAC 2013)]

35 DWM: BENEFITS AND CHALLENGES DWM offers excellent density; variable access latency is a unique challenge. [Figure: access time (1 ns to 1 ms) vs. cell area per bit (F^2) for SRAM, DRAM, STT-MRAM, FeRAM, PCRAM, FLASH-NOR, FLASH-NAND, and DWM, annotated with idle power (low/high) and write energy (low/medium/high); data from S. Parkin et al. Science 2008]

36 BASIC DWM BIT-CELL DWM bit-cell structure: a DWM device with shift ports and a read/write (RW) port. The shift ports are minimum sized; the RW port is large. Bit-cell area is dominated by the access transistors. [Figure: schematic and layout of the DWM cell, with shift transistors TS1/TS2 on the shift word lines (SWL) and RW transistors TRW1/TRW2 on the MTJ] R. Venkatesan et al. ISLPED 2012

37 DWM: LOGICAL VIEW A DWM tape stores multiple bits, with a tape head controlled by a shift controller. Head status is stored to track the current location of the tape head. Access latency is variable, giving a tradeoff between density and access latency. [Figure: shift controller and head status logic driving a tape head across an 8-location DWM tape (0x0-0x7); density vs. latency tradeoff] R. Venkatesan et al. ISLPED 2012
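The variable access latency above has a simple model: an access must first shift the tape until the target domain sits under the head. A minimal sketch; the cost parameters are illustrative, not device numbers from the slides:

```python
# DWM tape model: access latency = (shift distance from current head
# position) * per-shift cost + read/write cost. The head position after
# an access is exactly what the "head status" on the slide tracks.

class DWMTape:
    def __init__(self, bits_per_tape=8, shift_ns=0.5, rw_ns=1.0):
        self.bits = bits_per_tape
        self.shift_ns, self.rw_ns = shift_ns, rw_ns
        self.head = 0                      # current head position (head status)

    def access(self, pos):
        shifts = abs(pos - self.head)      # latency depends on the last access
        self.head = pos                    # head stays where the access ended
        return shifts * self.shift_ns + self.rw_ns

t = DWMTape()
print(t.access(0x5))  # 5 shifts from position 0 -> 3.5
print(t.access(0x6))  # spatial locality: only 1 shift -> 1.5
```

More bits per tape raises density (one RW port amortized over more domains) but stretches the worst-case shift distance, which is exactly the density vs. latency tradeoff on the slide.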

38 DWM BIT-CELL OPTIMIZATIONS Shift-based write: fast and energy-efficient writes performed by shifting domain walls from fixed domains into the free domain. Varying bits per tape (1bitDWM vs. MultibitDWM): tradeoff between latency and area. Read-only ports: accelerate performance-critical reads (MultibitDWM with read-only ports). [Figure: bit-cell schematics, and a comparison of SRAM, STT-MRAM, and DWM on area, read/write latency, leakage, and read/write energy] R. Venkatesan et al. DATE 2013; S. Fukami et al. VLSI Symp. 2009

39 TAPECACHE: DWM-BASED CACHE FOR GENERAL-PURPOSE PROCESSORS Motivation: in a 1MB SRAM cache, both area and energy are dominated by the data array. [Figure: area/energy distribution between tag and data arrays] TapeCache design: L1 caches use 1bitDWM; a hybrid L2 cache pairs a 1bitDWM-based tag array with a MultibitDWM-based data array. R. Venkatesan et al. ISLPED 2012; R. Venkatesan et al. DATE 2013

40 TAPECACHE: DWM TAPE CLUSTERS Bit-interleaved DWM Tape Cluster (DTC) organization: each bit of a cache block is stored in a different DWM tape, so tape i holds bit i of blocks 0 through N. This allows a parallel read of all bits in a cache block and amortizes the head control circuitry across the cluster. [Figure: M tapes of N bits each, bit-interleaved across cache blocks] R. Venkatesan et al. ISLPED 2012

41 TAPECACHE DATA ARRAY ORGANIZATION Data array: randomly addressable, built from bit-interleaved DTCs of MultibitDWM bit-cells. Head status array: stores the current head location for each DTC. Shift control logic: determines the number of shifts required to access a block. The index bits are split into two parts: decode bits and shift bits. [Figure: address path through the tag array, decoder, head status array, DTCs, shift control logic, sense amps, and output drivers] R. Venkatesan et al. ISLPED 2012

42 TAPECACHE MANAGEMENT POLICY
Tape head selection. Static: each cache block is statically assigned a tape head. Dynamic: select the tape head nearest to the required cache block.
Tape head update. Eager: restore the tape head to a default location after each access. Lazy: update the head status to track the tape head location, exploiting spatial locality.
Pre-shifting: predict the bit likely to be accessed next and position the tape head in advance.
[Figure: 8-location tape (0x0-0x7) with two tape heads and the head status array]
R. Venkatesan et al. ISLPED 2012; R. Venkatesan et al. DATE 2013
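Dynamic head selection and lazy update compose naturally. A minimal sketch (after R. Venkatesan et al., ISLPED 2012); the two-head configuration and starting positions are illustrative:

```python
# Dynamic selection: pick the head with the fewest shifts to the target.
# Lazy update: leave the head where the access ended (tracked in the head
# status array) instead of restoring a default position, so nearby accesses
# pay few shifts.

class TwoHeadTape:
    def __init__(self, head_positions=(0x0, 0x4)):
        self.heads = list(head_positions)

    def access(self, addr):
        # dynamic selection: nearest head wins
        h = min(range(len(self.heads)), key=lambda i: abs(self.heads[i] - addr))
        shifts = abs(self.heads[h] - addr)
        self.heads[h] = addr               # lazy update of head status
        return shifts

tape = TwoHeadTape()
print(tape.access(0x5))  # head at 0x4 is nearest: 1 shift
print(tape.access(0x1))  # head at 0x0 is nearest: 1 shift
```

An eager policy would instead reset each head to its default slot after the access, trading extra shift operations for a bounded worst-case distance on the next access.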

43 DEVICE-TO-ARCHITECTURE SIMULATION FRAMEWORK Architectural evaluation: multi-core systems with L1/L2 caches and spintronic (STT-MRAM/DWM) cache levels, simulated in SimpleScalar/gem5. Cache design: DWM-CACTI models the tag array, data array, and DTCs. Device level: domain wall engineering via device simulation (C. Augustine et al. IEDM 2011).

44 EXPERIMENTAL SETUP System configuration. Processor core: Alpha pipeline, issue width 4, 3GHz. Functional units: integer, 8 ALUs and 4 multipliers; floating point, 2 ALUs and 2 multipliers. I/D-cache: 32 KB, direct-mapped, 32B line size. Unified L2 cache: 1MB, 4-way set-associative, 128B line size, 4 banks. Technology node: 32nm. Benchmarks: SPEC2K6. Evaluated energy, area and performance using iso-capacity replacement.

45 SPINTRONIC CACHES FOR GPGPUS GPUs are everywhere, from mobile to servers, with huge parallelism and high memory bandwidth requirements. On-chip memory in GPUs increases with each generation, and memory consumes a significant fraction of GPU area and power; energy is dominated by leakage and reads. [Figure: on-chip memory size (MB) by year for Nvidia (G80, GT200, GF104, GK104, GK110) and AMD (Radeon 7970, Radeon 290X) GPUs]

46 STAG: OVERVIEW Challenge: accesses from multiple SMs are interleaved at the L2, and consecutive accesses from an SM belong to different warps, resulting in low locality. Proposal: warp-ID based prediction, which exploits intra-warp locality using a history table (warp ID, address, stride, confidence); and a shift-aware prefetch buffer (SaPB) design, which eliminates contention from different SMs. [Figure: GPGPU with streaming multiprocessors (warp scheduler, register file, cores, shared memory, L1 I/D, texture and constant caches) sharing a DWM L2 cache built from a 1bitDWM tag array and a MultibitDWM data array] R. Venkatesan et al. ISCA 2014
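The warp-ID based prediction can be sketched as a stride predictor whose history table is indexed by warp ID, so each warp's stream is untangled from the interleaved access order. This is a hedged simplification (after R. Venkatesan et al., ISCA 2014); the table format and confidence rule are ours:

```python
# Per-warp stride prediction: track (last address, stride, confidence) per
# warp ID. Indexing by warp ID recovers regular strides that are invisible
# in the raw interleaved sequence of accesses from many warps and SMs.

class WarpStridePredictor:
    def __init__(self):
        self.table = {}   # warp_id -> (last_addr, stride, confidence)

    def train_and_predict(self, warp_id, addr):
        last, stride, conf = self.table.get(warp_id, (addr, 0, 0))
        new_stride = addr - last
        # confidence rises only when the same nonzero stride repeats
        conf = conf + 1 if new_stride == stride and stride != 0 else 0
        self.table[warp_id] = (addr, new_stride, conf)
        # issue a prefetch target only once the stride is confirmed
        return addr + new_stride if conf >= 1 else None

p = WarpStridePredictor()
p.train_and_predict(127, 0x1000)                    # first access: no prediction
p.train_and_predict(127, 0x1080)                    # stride learned, unconfirmed
print(hex(p.train_and_predict(127, 0x1100)))        # 0x1180 predicted
```

In STAG the predicted address feeds the shift-aware prefetch buffer, so the tape can be pre-shifted before the demand access arrives.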

47 STAG: RESULTS Simulation setup: GPGPU-Sim with Rodinia and Parboil benchmarks. Baseline system configuration. Processor: 16 SMs with 32 cores/SM, 32KB register file, 48KB shared memory. L1 caches: I-cache 4KB, 64B blocks, 4 ways; D-cache 16KB, 128B blocks, 4 ways; constant cache 4KB, 64B blocks, 2 ways; texture cache 16KB, 64B blocks, 16 ways. L2 cache: 768 KB, 128B blocks, 16-way associative, 6 banks. Results [Figure: normalized IPC and energy for STT-MRAM, All-1bitDWM, STAG, and DWM-Ideal]: compared to iso-area SRAM, 26% IPC improvement and 4X energy reduction; compared to iso-area STT-MRAM, an IPC improvement and 3.6X energy reduction. R. Venkatesan et al. ISCA 2014

48 SUMMARY While initially proposed for secondary storage, DWM is also a promising candidate for on-chip memories Excellent density, in addition to non-volatility and low leakage Key challenge: Variable latency due to shift operations Suitable bit-cell design, cache architecture and management policies can eliminate the impact on performance

49 REFERENCES: STT-MRAM
1. S. P. Park, S. Gupta, N. Mojumder, A. Raghunathan, and K. Roy, Future cache design using STT MRAMs for improved energy efficiency: devices, circuits and architecture, in Proc. DAC, 2012.
2. K. Kwon, S. H. Choday, Y. Kim, and K. Roy, AWARE (Asymmetric Write Architecture With REdundant Blocks): A High Write Speed STT-MRAM Cache Architecture, IEEE TVLSI, 2014.
3. M. Rasquinha, D. Choudhary, S. Chatterjee, S. Mukhopadhyay, and S. Yalamanchili, An energy efficient cache design using Spin Torque Transfer (STT) RAM, in Proc. ISLPED, 2010.
4. X. Wu, J. Li, L. Zhang, E. Speight, R. Rajamony, and Y. Xie, Hybrid cache architecture with disparate memory technologies, in Proc. ISCA, 2009.
5. A. Jadidi, M. Arjomand, and H. S. Azad, High-endurance and performance efficient design of hybrid cache architectures through adaptive line replacement, in Proc. ISLPED, 2011.
6. C. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. Stan, Relaxing non-volatility for fast and energy-efficient STT-RAM caches, in Proc. HPCA, 2011.
7. A. Jog, A. Mishra, C. Xu, Y. Xie, V. Narayanan, R. Iyer, and C. Das, Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs, in Proc. DAC, 2012.

50 REFERENCES: DWM
1. R. Venkatesan, V. Kozhikkottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, TapeCache: A High Density, Energy Efficient Cache based on Domain Wall Memory, in Proc. ISLPED, 2012.
2. R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, DWM-TAPESTRI - An Energy Efficient All-Spin Cache using Domain wall Shift based Writes, in Proc. DATE, 2013.
3. R. Venkatesan, M. Sharad, V. J. Kozhikottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, Cache Design with Domain Wall Memory, IEEE Transactions on Computers (accepted for publication).
4. R. Venkatesan, S. Ramasubramanium, S. Venkataramani, K. Roy, and A. Raghunathan, STAG: Spintronic-Tape Architecture for GPGPU Cache Hierarchies, in Proc. ISCA, 2014.
5. M. Sharad, R. Venkatesan, A. Raghunathan, and K. Roy, Multi-level Magnetic RAM using Domain wall Shift for Energy-Efficient, High-Density Caches, in Proc. ISLPED.
6. R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, Dense, Energy-efficient All-Spin Cache Hierarchy using Shift based Writes and Multi-Level Storage, ACM Journal on Emerging Technologies in Computing Systems (accepted for publication).

51 THANK YOU! Questions?

This material is based upon work supported in part by Intel Corporation /DATE13/ c 2013 EDAA



More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance

More information

Caches. Samira Khan March 23, 2017

Caches. Samira Khan March 23, 2017 Caches Samira Khan March 23, 2017 Agenda Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed

More information

Revolutionizing Technological Devices such as STT- RAM and their Multiple Implementation in the Cache Level Hierarchy

Revolutionizing Technological Devices such as STT- RAM and their Multiple Implementation in the Cache Level Hierarchy Revolutionizing Technological s such as and their Multiple Implementation in the Cache Level Hierarchy Michael Mosquera Department of Electrical and Computer Engineering University of Central Florida Orlando,

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Phase Change Memory An Architecture and Systems Perspective

Phase Change Memory An Architecture and Systems Perspective Phase Change Memory An Architecture and Systems Perspective Benjamin C. Lee Stanford University bcclee@stanford.edu Fall 2010, Assistant Professor @ Duke University Benjamin C. Lee 1 Memory Scaling density,

More information

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

More information

1/19/2009. Data Locality. Exploiting Locality: Caches

1/19/2009. Data Locality. Exploiting Locality: Caches Spring 2009 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic Data Locality Temporal: if data item needed now, it is likely to be needed again in near future Spatial: if data item needed now, nearby

More information

Memory Hierarchy Y. K. Malaiya

Memory Hierarchy Y. K. Malaiya Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath

More information

ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture

ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture Jie Zhang, Miryeong Kwon, Changyoung Park, Myoungsoo Jung, Songkuk Kim Computer Architecture and Memory

More information

Architectural Aspects in Design and Analysis of SOTbased

Architectural Aspects in Design and Analysis of SOTbased Architectural Aspects in Design and Analysis of SOTbased Memories Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril & Mehdi Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Couture: Tailoring STT-MRAM for Persistent Main Memory Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Executive Summary Motivation: DRAM plays an instrumental role in modern

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

The Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August

The Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August The Engine & DRAM Endurance and Speed with STT MRAM Les Crudele / Andrew J. Walker PhD August 2018 1 Contents The Leaking Creaking Pyramid STT-MRAM: A Compelling Replacement STT-MRAM: A Unique Endurance

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Cache Memory Configurations and Their Respective Energy Consumption

Cache Memory Configurations and Their Respective Energy Consumption Cache Memory Configurations and Their Respective Energy Consumption Dylan Petrae Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 32816-2362 Abstract When it

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could

More information

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now? cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Limiting the Number of Dirty Cache Lines

Limiting the Number of Dirty Cache Lines Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections ) Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case

More information

Caches. Hiding Memory Access Times

Caches. Hiding Memory Access Times Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY

More information

CS311 Lecture 21: SRAM/DRAM/FLASH

CS311 Lecture 21: SRAM/DRAM/FLASH S 14 L21-1 2014 CS311 Lecture 21: SRAM/DRAM/FLASH DARM part based on ISCA 2002 tutorial DRAM: Architectures, Interfaces, and Systems by Bruce Jacob and David Wang Jangwoo Kim (POSTECH) Thomas Wenisch (University

More information

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy The course has 4 modules Module 1 Instruction Set Architecture (ISA) Simple Pipelining and Hazards Module 2 Superscalar Architectures

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Question?! Processor comparison!

Question?! Processor comparison! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!

More information

Energy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags

Energy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags Energy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags Jinwook Jung, Yohei Nakata, Masahiko Yoshimoto, and Hiroshi Kawaguchi Graduate School of System Informatics, Kobe

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

CENG3420 Lecture 08: Memory Organization

CENG3420 Lecture 08: Memory Organization CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory

More information

Survey results. CS 6354: Memory Hierarchy I. Variety in memory technologies. Processor/Memory Gap. SRAM approx. 4 6 transitors/bit optimized for speed

Survey results. CS 6354: Memory Hierarchy I. Variety in memory technologies. Processor/Memory Gap. SRAM approx. 4 6 transitors/bit optimized for speed Survey results CS 6354: Memory Hierarchy I 29 August 2016 1 2 Processor/Memory Gap Variety in memory technologies SRAM approx. 4 6 transitors/bit optimized for speed DRAM approx. 1 transitor + capacitor/bit

More information

AC-DIMM: Associative Computing with STT-MRAM

AC-DIMM: Associative Computing with STT-MRAM AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Memory hierarchy Outline

Memory hierarchy Outline Memory hierarchy Outline Performance impact Principles of memory hierarchy Memory technology and basics 2 Page 1 Performance impact Memory references of a program typically determine the ultimate performance

More information

CS 6354: Memory Hierarchy I. 29 August 2016

CS 6354: Memory Hierarchy I. 29 August 2016 1 CS 6354: Memory Hierarchy I 29 August 2016 Survey results 2 Processor/Memory Gap Figure 2.2 Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time

More information

Computer Architecture

Computer Architecture Computer Architecture Lecture 7: Memory Hierarchy and Caches Dr. Ahmed Sallam Suez Canal University Spring 2015 Based on original slides by Prof. Onur Mutlu Memory (Programmer s View) 2 Abstraction: Virtual

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip

More information

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications

More information

ECE468 Computer Organization and Architecture. Memory Hierarchy

ECE468 Computer Organization and Architecture. Memory Hierarchy ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information

Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory

Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory CHAPTER Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory 20 M.H. Samavatian 1, M. Arjomand 1, R. Bashizade 1, H. Sarbazi-Azad 1,2 Sharif University of Technology, Tehran, Iran

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Register File Organization

Register File Organization Register File Organization Sudhakar Yalamanchili unless otherwise noted (1) To understand the organization of large register files used in GPUs Objective Identify the performance bottlenecks and opportunities

More information

Loadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array

Loadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array Loadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array Wujie Wen, Yaojun Zhang, Lu Zhang and Yiran Chen University of Pittsburgh Loadsa: a slang language means lots of Outline Introduction

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

Caches 3/23/17. Agenda. The Dataflow Model (of a Computer)

Caches 3/23/17. Agenda. The Dataflow Model (of a Computer) Agenda Caches Samira Khan March 23, 2017 Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed

More information

15-740/ Computer Architecture, Fall 2011 Midterm Exam II

15-740/ Computer Architecture, Fall 2011 Midterm Exam II 15-740/18-740 Computer Architecture, Fall 2011 Midterm Exam II Instructor: Onur Mutlu Teaching Assistants: Justin Meza, Yoongu Kim Date: December 2, 2011 Name: Instructions: Problem I (69 points) : Problem

More information

Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory

Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory Youngbin Jin, Mustafa Shihab, and Myoungsoo Jung Computer Architecture and Memory Systems Laboratory Department of Electrical

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea

More information

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need

More information