SPINTRONIC MEMORY ARCHITECTURE
1 SPINTRONIC MEMORY ARCHITECTURE Anand Raghunathan, Integrated Systems Laboratory, School of ECE, Purdue University; Rangharajan Venkatesan, Shankar Ganesh Ramasubramanian, Ashish Ranjan, Kaushik Roy. 7th NCN-NEEDS Summer School, Spintronics: Science, Circuits, and Systems
2 AGENDA Background Memory architecture Designing caches with STT-MRAM Exploring DWM for on-chip memory
3 DEMAND FOR ON-CHIP MEMORY Over 50% of chip area is devoted to memory. Multi-cores and the increasing processor-memory gap accelerate the demand for on-chip memory. [Figure: cache trends in Intel microprocessors, showing cache transistors (millions) and the % of chip transistors in cache; on-chip memory in SoCs]
4 STORAGE HIERARCHIES There is a fundamental tradeoff between speed and capacity, and we would like to have both. Solution: organize memory/storage in a hierarchical manner. Source: Avnet
5 MICROPROCESSOR ON-CHIP MEMORY HIERARCHY Microprocessors utilize multiple levels of on-chip cache to hide memory latency and improve bandwidth, exploiting key properties of memory accesses. Temporal locality: if a location is referenced, it is likely to be referenced again in the near future. Spatial locality: if a location is referenced, it is likely that locations near it will be referenced in the near future. [Figures: typical on-chip memory hierarchy; Intel Core i7 Mobile CPU cache hierarchy]
6 CACHE BASICS A cache holds a copy of some subset of data from the next lower level of the hierarchy. Each cache line consists of an address tag and a data block. Cache address = Tag + Index + Offset. Source: MIT OCW, Course 6.823
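The Tag + Index + Offset split can be made concrete with a small sketch. The parameters below are illustrative, chosen to match the L1 configuration given later in the experimental setup (32 KB direct-mapped cache, 32B lines); `split_address` is a hypothetical helper, not from the slides.

```python
# Hypothetical parameters: 32 KB direct-mapped cache, 32 B lines, 32-bit addresses.
LINE_SIZE = 32                                 # bytes per cache line
NUM_LINES = 32 * 1024 // LINE_SIZE             # 1024 lines

OFFSET_BITS = LINE_SIZE.bit_length() - 1       # log2(32)  = 5
INDEX_BITS = NUM_LINES.bit_length() - 1        # log2(1024) = 10

def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_SIZE - 1)            # which byte within the line
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)   # which cache line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
```

Concatenating the three fields back together reproduces the original address, which is a handy sanity check on any field-width choice.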
7 CACHE BASICS Cache access: given an address, check if the tag is in the cache. If yes (cache hit), return data from the cache. If no (cache miss), retrieve data from the next level of the hierarchy. Where should the data be written in the cache? What if data is already present at that location? Average access time = Hit time + Miss rate * Miss penalty. Source: MIT OCW, Course 6.823
8 CACHE BASICS Caches help hide memory latency and increase throughput. Why? LITTLE'S LAW: the long-term average number of customers in a stable system, L, is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W; or, expressed algebraically: L = λW.
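Applied to the memory system, Little's Law says that to sustain a demand bandwidth (arrival rate λ) against a fixed memory latency (W), the hierarchy must keep L = λW requests in flight. A back-of-the-envelope sketch with purely illustrative numbers:

```python
# Little's Law: L = lambda * W.
# Illustrative example: sustaining 10 GB/s of miss traffic against a
# 100 ns memory latency, moving data in 64 B cache lines.
bandwidth_bytes_per_s = 10e9
latency_s = 100e-9
line_bytes = 64

arrival_rate = bandwidth_bytes_per_s / line_bytes   # lines per second (lambda)
outstanding = arrival_rate * latency_s              # L = lambda * W
```

Here `outstanding` comes to about 16, i.e. roughly 16 misses must be in flight at all times, which is the kind of memory-level parallelism that multi-level caches and miss-handling registers provide.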
9 CACHE BASICS Mapping of locations (addresses) may be direct (many-to-one) or associative (many-to-many). Why associativity? [Figure: direct-mapped vs. 2-way associative placement] Source: Wikipedia
10 CACHE BASICS A cache consists of a tag array, data array, and control logic. A tag array lookup indicates whether an access is a hit or a miss. The tag and data arrays may be accessed serially or in parallel. [Figure: organization of a direct-mapped cache] Sources: MIT OCW, Course 6.823; Wikipedia
11 CACHE BASICS Various design choices present a rich tradeoff space: cache size, block size, associativity, replacement policy, inclusivity, write policy. Average access time = Hit time + Miss rate * Miss penalty
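The average-access-time formula composes across levels of the hierarchy, since the L1 miss penalty is itself the average access time of L2. A minimal sketch, with illustrative cycle counts (not measured values):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Two-level example with hypothetical latencies and miss rates:
# the L1 miss penalty is the L2 average access time.
l2_amat = amat(hit_time=10, miss_rate=0.2, miss_penalty=100)
l1_amat = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_amat)
```

With these numbers the L2 averages 30 cycles and the overall hierarchy 2.5 cycles per access, showing how a small L1 miss rate keeps the large L2/memory latencies mostly hidden.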
12 THE CASE FOR SPINTRONIC ON-CHIP MEMORIES The combination of endurance, speed, non-volatility and density makes spintronic memories promising for on-chip applications. Source: Toshiba Corp., IEDM 2012
13 DESIGNING CACHES WITH STT-MRAM [Figure: STT-MRAM bit-cell MTJ stack (fixed layer, tunneling oxide, free layer)]
14 STT-MRAM CACHE CHARACTERISTICS Iso-capacity comparison of 1MB SRAM and STT-MRAM caches. [Chart: STT-MRAM relative to SRAM in area, read latency, write latency, leakage, read energy, and write energy, each marked adverse or favorable]
15 WHERE IS STT-MRAM USED IN THE CACHE HIERARCHY? CPU, then L1 cache (SRAM): small size, optimized for latency; then L2 cache (STT-MRAM): large size, optimized for density. Spin-Transfer Torque MRAM (STT-MRAM) is suitable for lower levels of the cache hierarchy (due to its high write latency). Architectural techniques are required to address inefficient writes.
16 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM Hybrid cache (X. Wu et al. ISCA 2009, A. Jadidi et al. ISLPED 2011); Reduce write intensity: write biasing (M. Rasquinha et al. ISLPED 2010), partial line update (S. P. Park et al. DAC 2012); Volatile STT-MRAM (C. W. Smullen et al. HPCA 2011, A. Jog et al. DAC 2012); Write-asymmetry-aware architecture (K. Kwon et al. TVLSI 2014)
17 HYBRID SRAM/STT-MRAM L2 CACHE Observation: writes are concentrated in a small portion of the address space. Hybrid cache: split the cache into read-intensive and write-intensive regions, with the write region in SRAM and the read region in STT-MRAM, plus a policy to control which region a cache block resides in. [Figure: conventional cache (tag array, data array, ways W0-W7) vs. hybrid cache with an SRAM write region (Way0-Way1), an STT-MRAM read region (Way2-Way7), and a control policy] X. Wu et al. ISCA 2009
18 HYBRID SRAM/STT-MRAM L2 CACHE Aims to combine the benefits of SRAM and STT-MRAM: large parts of the cache use STT-MRAM (high density, low leakage), while a small part uses SRAM to store write-intensive cache blocks (efficient writes). [Chart: leakage, density, and write characteristics of SRAM, STT-MRAM, and the hybrid cache]
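One simple way to realize such a placement policy is a saturating per-block write counter. This is an illustrative sketch of the idea, not the exact policy of X. Wu et al.; the threshold, counter width, and `HybridPlacement` class are all hypothetical.

```python
# Illustrative sketch: a saturating write counter per block decides whether
# a block lives in the small SRAM (write) region or the large STT-MRAM
# (read) region. Threshold and counter width are hypothetical.
THRESHOLD = 4   # writes before a block is considered write-intensive
CTR_MAX = 7     # 3-bit saturating counter

class HybridPlacement:
    def __init__(self):
        self.write_count = {}   # block address -> saturating write counter

    def on_write(self, block):
        c = self.write_count.get(block, 0)
        self.write_count[block] = min(c + 1, CTR_MAX)

    def region(self, block):
        # Write-intensive blocks go to SRAM; everything else to STT-MRAM.
        return "SRAM" if self.write_count.get(block, 0) >= THRESHOLD else "STT-MRAM"

p = HybridPlacement()
for _ in range(5):
    p.on_write(0x40)        # block 0x40 is written repeatedly
```

After the repeated writes, block 0x40 is steered to the SRAM region while untouched blocks stay in STT-MRAM.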
19 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
20 REDUCING WRITE INTENSITY: WRITE BIASING CPU, L1 cache (SRAM), L2 cache (STT-MRAM). Writes to L2 come from dirty block evictions from the L1 cache and from writes on a miss. Typical eviction policies (LRU, FIFO) do not distinguish between clean and dirty blocks, yet only dirty blocks need to be written to L2! Write biasing: modify the eviction policy to increase the residency of dirty blocks in L1.
21 REDUCING WRITE INTENSITY: WRITE BIASING Traditional cache eviction policy (LRU), on the status stack of a 4-way set-associative cache: starting from A B C D, Load C gives C A B D and Store E gives E C A B; the last-accessed block is always inserted at the top of the stack (TOS). Write-biasing policy (insert position K=2, distinguishing clean and dirty blocks): starting from A B C D, Load C promotes to TOS (C A B D); Store E inserts at position K (C A E B); Store E again promotes to TOS (E C A B); Store G inserts at position K (E C G A); Store G again promotes to TOS (G E C A); Load A promotes only to position K, since all blocks above K are dirty (G E A C). Blocks with repeated writes are prioritized over recently accessed clean blocks. M. Rasquinha et al. ISLPED 2010
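The stack behavior above can be captured in a few lines. This is a simplified sketch of the write-biasing idea from M. Rasquinha et al., reconstructed from the slide's worked example; the `WriteBiasedSet` class and its exact promotion rules are an interpretation, not the paper's implementation.

```python
# Sketch of write-biased replacement for one 4-way set. Index 0 of the
# stack is the top (TOS, highest priority); eviction takes the bottom.
K = 2  # insertion/promotion position from the slide's example

class WriteBiasedSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.stack = []          # recency stack, index 0 = TOS
        self.dirty = set()       # blocks with pending writes

    def access(self, block, is_store):
        if block in self.stack:                       # hit
            self.stack.remove(block)
            if is_store:
                self.dirty.add(block)
                self.stack.insert(0, block)           # repeated write -> TOS
            elif all(b in self.dirty for b in self.stack[:K]):
                self.stack.insert(K, block)           # don't displace hot dirty blocks
            else:
                self.stack.insert(0, block)           # ordinary promotion to TOS
        else:                                         # miss
            if len(self.stack) == self.ways:
                victim = self.stack.pop()             # evict lowest priority
                self.dirty.discard(victim)
            if is_store:
                self.dirty.add(block)
                self.stack.insert(min(K, len(self.stack)), block)  # insert at K
            else:
                self.stack.insert(0, block)

s = WriteBiasedSet()
s.stack, s.dirty = list("ABCD"), set()   # warm set from the slide's example
```

Replaying the slide's trace (Load C, Store E, Store E, Store G, Store G, Load A) reproduces the stack states shown on the slide, ending at G E A C.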
22 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
23 REDUCING WRITE INTENSITY Observation: most bits in dirty cache blocks are unchanged, so writing the whole block upon eviction performs redundant writes. Solution: partial line update. [Chart: redundant bit writes to L2 in SPEC 2006 benchmarks, P. Zhou et al. ICCAD 2009]
24 PARTIAL LINE UPDATE SCHEME Conventional cache: when a dirty cache line is evicted from the L1 cache, the entire 64B line is written to the L2 cache. Partial line update: track dirty data at a finer granularity (four 16B partial lines per 64B line) using additional history (H) bits in the tag array of the L1 cache; when the line is evicted, only the dirty parts are written to the L2 cache. [Figure: CPU, L1 cache (SRAM) with per-partial-line history bits, and L2 cache (MRAM)] S. P. Park et al. DAC 2012
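The bookkeeping amounts to a small dirty bitmask per line. A minimal sketch after S. P. Park et al., with one history bit per 16B sub-block of a 64B line; the `L1Line` class is a hypothetical model, not the hardware design.

```python
# Illustrative sketch of partial line update: a 64 B L1 line tracked as
# four 16 B sub-blocks, with one history (dirty) bit per sub-block.
LINE_BYTES = 64
SUB_BYTES = 16
NUM_SUB = LINE_BYTES // SUB_BYTES        # 4 sub-blocks per line

class L1Line:
    def __init__(self):
        self.dirty_mask = 0              # one bit per 16 B sub-block

    def write(self, offset_in_line):
        # Mark only the sub-block that the write actually touched.
        self.dirty_mask |= 1 << (offset_in_line // SUB_BYTES)

    def evict(self):
        """Return the byte ranges that must be written back to L2."""
        ranges = [(i * SUB_BYTES, (i + 1) * SUB_BYTES)
                  for i in range(NUM_SUB) if self.dirty_mask & (1 << i)]
        self.dirty_mask = 0
        return ranges

line = L1Line()
line.write(5)     # touches sub-block 0 (bytes 0-15)
line.write(40)    # touches sub-block 2 (bytes 32-47)
```

On eviction only the two dirty 16B sub-blocks are written to L2 instead of the full 64B line, cutting the STT-MRAM write traffic in half in this example.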
25 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
26 VOLATILE STT-MRAM The lifetimes of cache blocks in caches are small (see lifetimes of cache blocks in the L2 cache), so the retention time can be relaxed to improve write efficiency. Retention time vs. write latency for an STT-MRAM L2 cache: 10 years retention, 11ns write latency; 1 sec, 6ns; 10 ms, 3ns. A. Jog et al. DAC 2012
27 VOLATILE STT-MRAM ARCHITECTURE Reducing the retention time makes the cache volatile, requiring refresh or write-back for dirty blocks. A 2-bit state machine per block determines block status over its lifetime (10 ms retention time), advancing one state per counter pulse of width T: initial state S0 (just written), intermediate state S1, diminishing state S2 (about to become invalid), and invalid state S3; a write (W) returns the block to S0. A. Jog et al. DAC 2012
28 VOLATILE STT-MRAM ARCHITECTURE A 2-bit counter per block in the tag array stores the block status (S0, S1, S2, S3). For blocks in the S2 (diminishing) state: refresh, if more recently used (MRU); write back, if less recently used (LRU). [Figure: tag and data arrays annotated with per-way block states, MRU/LRU order, and refresh/write-back actions] A. Jog et al. DAC 2012
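The per-block state machine is small enough to sketch directly. This is an illustrative model after A. Jog et al.; the `VolatileBlock` class and its `tick` interface are assumptions made for the sketch, not the paper's circuit.

```python
# Illustrative 2-bit per-block state machine: each counter pulse of
# width T advances S0 -> S1 -> S2 -> S3 (invalid); a write resets to S0.
# In S2 (diminishing), MRU blocks are refreshed and LRU blocks are
# written back and allowed to expire.
S0, S1, S2, S3 = 0, 1, 2, 3   # just written / intermediate / diminishing / invalid

class VolatileBlock:
    def __init__(self):
        self.state = S0

    def write(self):
        self.state = S0           # a write restores full retention

    def tick(self, is_mru):
        """One counter pulse of width T; returns the action taken, if any."""
        if self.state == S2:
            if is_mru:
                self.state = S0   # refresh the recently used block
                return "refresh"
            self.state = S3       # let the cold block expire...
            return "write-back"   # ...after saving its dirty data
        if self.state < S3:
            self.state += 1       # age toward expiry
        return None
```

A hot (MRU) block cycles S0 through S2 and back via refresh; a cold (LRU) block reaching S2 is written back once and then simply becomes invalid, avoiding repeated refresh energy.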
29 ARCHITECTURAL OPTIMIZATIONS FOR STT-MRAM
30 ASYMMETRIC WRITE ARCHITECTURE WITH REDUNDANT BLOCKS (AWARE) Observation: the write latency of STT-MRAM bit-cells is asymmetric (AP→P switching is ~3X faster than P→AP). Idea: provide slow and fast writes, where fast writes involve only AP→P switching, and increase the frequency of fast writes. Add redundant blocks (RBLs) to the cache data array, pre-set to the AP state (0: AP, 1: P). Fast write (5.5ns, AP→P): write to a clean RBL and swap it with the data block (DBL). Slow write (16.5ns, P→AP): when no clean RBLs are available, update the DBL in place and clean all RBLs that share the word-line. [Example: write operation into DBL2] K. Kwon et al. TVLSI 2014
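The latency effect of the RBL pool can be sketched at the granularity of one word-line. This is a simplified model of the AWARE idea from K. Kwon et al.; the RBL count, the timing of RBL cleaning, and the `AwareWordLine` class are assumptions for illustration.

```python
# Illustrative AWARE sketch for one word-line: clean RBLs are pre-set to
# all-AP, so a write into one needs only fast AP->P transitions. When no
# clean RBL remains, a slow in-place write happens and the word-line's
# RBLs are cleaned (pre-set) again as a side effect.
FAST_NS, SLOW_NS = 5.5, 16.5   # latencies from the slide

class AwareWordLine:
    def __init__(self, num_rbl=2):       # hypothetical RBL count
        self.num_rbl = num_rbl
        self.clean_rbls = num_rbl        # RBLs currently pre-set to AP

    def write(self):
        """Return the latency of one block write under this sketch."""
        if self.clean_rbls > 0:
            self.clean_rbls -= 1         # fast write into a pre-set RBL,
            return FAST_NS               # then remap it as the data block
        self.clean_rbls = self.num_rbl   # slow write; re-clean all RBLs
        return SLOW_NS

wl = AwareWordLine()
latencies = [wl.write() for _ in range(4)]
```

With two RBLs, only every third write in a burst pays the slow P→AP latency; the rest complete at the fast AP→P speed.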
31 SUMMARY STT-MRAM is a promising candidate for lower-level caches: high density, non-volatility, and low leakage. Key challenges: high write energy, high write latency, asymmetric writes. Suitable architectural optimizations can significantly mitigate the impact of inefficient writes.
32 EXPLORING DOMAIN WALL MEMORY FOR ON-CHIP CACHES [Figure: ferromagnetic wire with a fixed-layer MTJ; currents I_read/write0, I_write1, I_shift-left, I_shift-right]
33 BACKGROUND: DOMAIN WALL MEMORY (DWM) Structure: a ferromagnetic wire (the free layer) and a Magnetic Tunnel Junction (MTJ) with a fixed layer; data is stored in the magnetic domains of the ferromagnetic wire. Operation: shift, read, and write. Read/write operations are performed using the MTJ, and bits are shifted along the ferromagnetic wire by applying a current pulse.
34 DWM HISTORY Initially proposed for secondary storage and storage-class memory applications; recent efforts explore its potential for on-chip memory and logic. Concept: S. Parkin et al., Science 2008. Prototypes: NEC, IBM. Applications: accelerator memory (NanoArch 2011), re-configurable logic (TMag. 2011), general-purpose cache (ISLPED 2012, DATE 2013, DAC 2013)
35 DWM: BENEFITS AND CHALLENGES DWM offers excellent density; variable access latency is a unique challenge. [Chart: access time (1ns to 1ms) vs. cell area per bit (F²) for SRAM, DRAM, STT-MRAM, PCRAM, FeRAM, FLASH-NOR, FLASH-NAND, and DWM, annotated with idle power (low/high) and write energy (low/medium/high), each marked adverse or favorable; data from S. Parkin et al. Science 2008]
36 BASIC DWM BIT-CELL DWM bit-cell structure: a DWM device with shift ports and a read/write (RW) port; the shift ports use minimum-sized transistors while the RW port is large. Bit-cell area is dominated by the access transistors. [Figure: schematic and layout of the DWM cell, with bitlines BL/BLB, word-lines WL/SWL, shift transistors T_S1/T_S2, and RW transistors T_RW1/T_RW2] R. Venkatesan et al. ISLPED 2012
37 DWM: LOGICAL VIEW A tape capable of storing multiple bits, with a tape head controlled by a shift controller; the head status is stored to track the current location of the tape head. Access latency is variable, giving a tradeoff between density and access latency. [Figure: shift controller, head status, and an 8-location DWM tape (0x0-0x7)] R. Venkatesan et al. ISLPED 2012
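The variable latency can be made concrete with a tiny tape model. A minimal sketch, assuming an 8-bit tape and a hypothetical per-shift latency; the `DwmTape` class and `SHIFT_NS` value are illustrative, not device data.

```python
# Illustrative DWM tape: a shift register with one head. Access latency
# scales with the distance between the current head position and the
# requested bit, which is the density vs. latency tradeoff.
SHIFT_NS = 0.5   # hypothetical latency per single-domain shift

class DwmTape:
    def __init__(self, bits=8):
        self.bits = bits
        self.head = 0            # head status: current head location

    def shifts_for(self, index):
        return abs(index - self.head)

    def access(self, index):
        """Shift the tape so `index` is under the head; return latency."""
        n = self.shifts_for(index)
        self.head = index        # track the new head location
        return n * SHIFT_NS

tape = DwmTape()
lat_far = tape.access(5)   # 5 shifts from the initial position
lat_near = tape.access(3)  # only 2 shifts from the new position
```

Packing more bits per tape raises density but stretches the worst-case shift distance, which is exactly the tradeoff the slide's density-vs-latency plot illustrates.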
38 DWM BIT-CELL OPTIMIZATIONS Shift-based write: write by shifting magnetization from fixed domains into the free domain (I_write0/I_write1); fast and energy-efficient. Varying bits per tape (1bitDWM vs. MultibitDWM): a tradeoff between latency and area. Read-only ports (MultibitDWM with read-only ports): accelerate performance-critical reads. [Charts: area, read/write latency, leakage, and read/write energy of SRAM, STT-MRAM, and DWM; bit-cell schematics with BL, SL, WWL, RWL, and SWL] R. Venkatesan et al. DATE 2013, S. Fukami et al. VLSI Symp. 2009
39 TAPECACHE: DWM-BASED CACHE FOR GENERAL-PURPOSE PROCESSORS L1 caches use 1bitDWM. The L2 cache is hybrid: a 1bitDWM-based tag array and a MultibitDWM-based data array. [Figures: processor with L1/L2 tag and data arrays; cache organization with decoder, tag comparator, bitlines, column mux, sense amps, and output drivers; area/energy distribution between tag and data arrays for a 1MB SRAM cache] R. Venkatesan et al. ISLPED 2012, R. Venkatesan et al. DATE 2013
40 TAPECACHE: DWM TAPE CLUSTERS Bit-interleaved DWM Tape Cluster (DTC) organization: each bit of a cache block resides in a different DWM tape, enabling a parallel read of all bits in a block and amortizing the head control circuitry across the cluster. [Figure: M tapes of N bits each, where tape i holds bit i of blocks 0 through N, with a shared tape head position] R. Venkatesan et al. ISLPED 2012
41 TAPECACHE DATA ARRAY ORGANIZATION Data array organization: a randomly addressable data array built from bit-interleaved DTCs of MultibitDWM bit-cells. A head status array stores the current head location for each DTC, and shift control logic determines the number of shifts required to access a block. The index bits are split into two parts: decode bits and shift bits. [Figure: tag array and comparator, decoder, head status array, DTC-based data array, sense amps, output drivers, and shift control logic] R. Venkatesan et al. ISLPED 2012
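The decode/shift split and the shift-control computation can be sketched in a few lines. Bit widths here are hypothetical (64 domains per tape, so 6 shift bits), and `decode_index`/`shifts_needed` are illustrative helpers, not the paper's logic.

```python
# Illustrative TapeCache index split: decode bits select the DTC, shift
# bits give the target position within the tape, and the shift control
# logic compares that against the head status array.
DOMAINS_PER_TAPE = 64
SHIFT_BITS = 6                               # log2(64)

def decode_index(index):
    shift = index & (DOMAINS_PER_TAPE - 1)   # low bits: position in tape
    decode = index >> SHIFT_BITS             # high bits: select the DTC
    return decode, shift

def shifts_needed(head_status, index):
    """Return (DTC selected, number of shifts required)."""
    decode, target = decode_index(index)
    return decode, abs(target - head_status[decode])

heads = {0: 10, 1: 0}                        # head status array (per DTC)
```

For example, index 67 decodes to DTC 1, position 3, so with that DTC's head at position 0 the access needs 3 shifts.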
42 TAPECACHE MANAGEMENT POLICY Tape head selection can be static (each cache block is assigned a tape head statically) or dynamic (select the tape head nearest to the required cache block). Tape head update can be eager (restore the tape head to a default location after each access) or lazy (update the head status to track the tape head location, exploiting spatial locality). Pre-shifting: predict the bit that is likely to be accessed next and position the tape head accordingly. [Figure: an 8-location tape (0x0-0x7) with two tape heads and the head status] R. Venkatesan et al. ISLPED 2012, R. Venkatesan et al. DATE 2013.
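The dynamic-selection and lazy-update policies can be combined in a small model. A sketch assuming two heads on one 8-location tape; the `TwoHeadTape` class and default head positions are illustrative choices, not the paper's configuration.

```python
# Illustrative head management for one tape with two heads: dynamic
# selection picks the head nearest the target, and lazy update leaves
# the head where the access ends (recorded in the head status) rather
# than restoring a default position.
class TwoHeadTape:
    def __init__(self, head_positions=(0, 4)):
        self.heads = list(head_positions)   # head status

    def access(self, index, lazy=True):
        """Return (head used, shifts) under the nearest-head policy."""
        h = min(range(len(self.heads)),
                key=lambda i: abs(index - self.heads[i]))
        shifts = abs(index - self.heads[h])
        if lazy:
            self.heads[h] = index   # lazy update: remember where we left it
        # an eager policy would instead shift back to the default location
        return h, shifts

t = TwoHeadTape()
first = t.access(5)    # head 1 (at 4) is nearest: 1 shift
second = t.access(6)   # lazy update left head 1 at 5: again 1 shift
```

The second access benefits from spatial locality precisely because the lazy policy kept the head where the first access ended.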
43 DEVICE-TO-ARCHITECTURE SIMULATION FRAMEWORK Architectural evaluation: a multi-core system (cores with I-L1/D-L1 caches and STT-MRAM L2 caches, DWM L2/L3 caches, and an OS scheduler) modeled in SimpleScalar/gem5. Cache design: tag and data arrays built from DTCs, modeled in DWM-CACTI. Device level: domain wall engineering via device simulation (C. Augustine et al. IEDM 2011).
44 EXPERIMENTAL SETUP System configuration: Processor core: Alpha pipeline, issue width 4, 3GHz. Functional units: integer, 8 ALUs and 4 multipliers; floating point, 2 ALUs and 2 multipliers. I/D-cache: 32 KB, direct-mapped, 32B line size. Unified L2 cache: 1MB, 4-way set-associative, 128B line size, 4 banks. Technology node: 32nm. Benchmarks: SPEC2K6. Evaluated energy, area, and performance using iso-capacity replacement.
45 SPINTRONIC CACHES FOR GPGPUS GPUs are everywhere, with huge parallelism and high memory bandwidth requirements. On-chip memory in GPUs increases with each generation, and memory consumes a significant fraction of GPU area and power, with energy dominated by leakage and reads. [Chart: on-chip memory size (MB) by year for mobile, desktop, and server GPUs: Nvidia G80, GT200, GF104, GK104, GK110; AMD Radeon 7970, Radeon 290X]
46 STAG: OVERVIEW Challenge: interleaved accesses from multiple SMs, and consecutive accesses from one SM belong to different warps, resulting in low locality. Proposal: warp-ID-based prediction, which exploits intra-warp locality (a history table of warp ID, address, stride, and confidence produces a predicted address), and a shift-aware prefetch buffer (SaPB) design, which eliminates contention from different SMs. [Figure: GPGPU with streaming multiprocessors (warp scheduler and dispatch, register file, cores, shared memory, L1 I/D, texture, and constant caches), a DWM-based L2 cache with a 1bitDWM tag array and MultibitDWM data array, and the history table feeding the shift-aware prefetch buffer] R. Venkatesan et al. ISCA 2014
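The history-table mechanism can be sketched as a classic stride predictor keyed by warp ID. This is an illustrative reading of STAG's table (warp ID, address, stride, confidence); the field layout, confidence threshold, and `WarpPredictor` class are assumptions, not the paper's exact design.

```python
# Illustrative warp-ID-based stride prediction: a history table entry per
# warp holds the last address, the observed stride, and a confidence
# counter; a confident entry yields a predicted (pre-shift/prefetch)
# address. The threshold is hypothetical.
CONF_THRESHOLD = 2

class WarpPredictor:
    def __init__(self):
        self.table = {}   # warp_id -> [last_addr, stride, confidence]

    def observe(self, warp_id, addr):
        """Update the warp's entry; return a predicted next address or None."""
        e = self.table.get(warp_id)
        if e is None:
            self.table[warp_id] = [addr, 0, 0]   # allocate a new entry
            return None
        stride = addr - e[0]
        if stride == e[1]:
            e[2] += 1                 # same stride seen again: more confident
        else:
            e[1], e[2] = stride, 1    # learn the new stride
        e[0] = addr
        if e[2] >= CONF_THRESHOLD:
            return addr + e[1]        # confident: predict the next address
        return None
```

After a warp issues two accesses with the same stride, the third access yields a prediction that can drive the pre-shifting of the DWM tapes and the shift-aware prefetch buffer.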
47 STAG: RESULTS Simulation setup: GPGPU-Sim with Rodinia and Parboil benchmarks. Baseline system configuration: 16 SMs with 32 cores/SM, 32KB register file, 48KB shared memory; L1 caches: I-cache 4KB, 64B blocks, 4 ways; D-cache 16KB, 128B blocks, 4 ways; constant cache 4KB, 64B blocks, 2 ways; texture cache 16KB, 64B blocks, 16 ways; L2 cache: 768KB, 128B blocks, 16-way associative, 6 banks. [Charts: normalized IPC and energy for STT-MRAM, All-1bitDWM, STAG, and DWM-Ideal] Compared to iso-area SRAM: 26% IPC improvement, 4X energy reduction. Compared to iso-area STT-MRAM: an IPC improvement and 3.6X energy reduction. R. Venkatesan et al. ISCA 2014
48 SUMMARY While initially proposed for secondary storage, DWM is also a promising candidate for on-chip memories: excellent density, in addition to non-volatility and low leakage. Key challenge: variable latency due to shift operations. Suitable bit-cell designs, cache architectures, and management policies can eliminate the impact on performance.
49 REFERENCES: STT-MRAM
1. S. P. Park, S. Gupta, N. Mojumder, A. Raghunathan, and K. Roy, Future cache design using STT MRAMs for improved energy efficiency: devices, circuits and architecture, in Proc. DAC, 2012.
2. K.-W. Kwon, S. H. Choday, Y. Kim, and K. Roy, AWARE (Asymmetric Write Architecture With REdundant Blocks): A High Write Speed STT-MRAM Cache Architecture, IEEE TVLSI, 2014.
3. M. Rasquinha, D. Choudhary, S. Chatterjee, S. Mukhopadhyay, and S. Yalamanchili, An energy efficient cache design using Spin Torque Transfer (STT) RAM, in Proc. ISLPED, 2010.
4. X. Wu, J. Li, L. Zhang, E. Speight, R. Rajamony, and Y. Xie, Hybrid cache architecture with disparate memory technologies, in Proc. ISCA, 2009.
5. A. Jadidi, M. Arjomand, and H. S. Azad, High-endurance and performance efficient design of hybrid cache architectures through adaptive line replacement, in Proc. ISLPED, 2011.
6. C. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. Stan, Relaxing non-volatility for fast and energy-efficient STT-RAM caches, in Proc. HPCA, 2011.
7. A. Jog, A. Mishra, C. Xu, Y. Xie, V. Narayanan, R. Iyer, and C. Das, Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs, in Proc. DAC, 2012.
50 REFERENCES: DWM
1. R. Venkatesan, V. Kozhikkottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, TapeCache: A High Density, Energy Efficient Cache based on Domain Wall Memory, in Proc. ISLPED, 2012.
2. R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, DWM-TAPESTRI: An Energy Efficient All-Spin Cache using Domain wall Shift based Writes, in Proc. DATE, 2013.
3. R. Venkatesan, M. Sharad, V. J. Kozhikottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, Cache Design with Domain Wall Memory, IEEE Transactions on Computers (accepted for publication).
4. R. Venkatesan, S. Ramasubramanium, S. Venkataramani, K. Roy, and A. Raghunathan, STAG: Spintronic-Tape Architecture for GPGPU Cache Hierarchies, in Proc. ISCA, 2014.
5. M. Sharad, R. Venkatesan, A. Raghunathan, and K. Roy, Multi-level Magnetic RAM using Domain wall Shift for Energy-Efficient, High-Density Caches, in Proc. ISLPED.
6. R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, Dense, Energy-efficient All-Spin Cache Hierarchy using Shift based Writes and Multi-Level Storage, ACM Journal on Emerging Technologies in Computing Systems (accepted for publication).
51 THANK YOU! Questions?
More informationArchitectural Aspects in Design and Analysis of SOTbased
Architectural Aspects in Design and Analysis of SOTbased Memories Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril & Mehdi Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationCouture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung
Couture: Tailoring STT-MRAM for Persistent Main Memory Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Executive Summary Motivation: DRAM plays an instrumental role in modern
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationThe Memory Hierarchy & Cache
Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationThe Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August
The Engine & DRAM Endurance and Speed with STT MRAM Les Crudele / Andrew J. Walker PhD August 2018 1 Contents The Leaking Creaking Pyramid STT-MRAM: A Compelling Replacement STT-MRAM: A Unique Endurance
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationCache Memory Configurations and Their Respective Energy Consumption
Cache Memory Configurations and Their Respective Energy Consumption Dylan Petrae Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 32816-2362 Abstract When it
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could
More informationCPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationLimiting the Number of Dirty Cache Lines
Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationLecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )
Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationCS311 Lecture 21: SRAM/DRAM/FLASH
S 14 L21-1 2014 CS311 Lecture 21: SRAM/DRAM/FLASH DARM part based on ISCA 2002 tutorial DRAM: Architectures, Interfaces, and Systems by Bruce Jacob and David Wang Jangwoo Kim (POSTECH) Thomas Wenisch (University
More informationComputer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg
Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy The course has 4 modules Module 1 Instruction Set Architecture (ISA) Simple Pipelining and Hazards Module 2 Superscalar Architectures
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016
Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss
More informationQuestion?! Processor comparison!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!
More informationEnergy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags
Energy-Efficient Spin-Transfer Torque RAM Cache Exploiting Additional All-Zero-Data Flags Jinwook Jung, Yohei Nakata, Masahiko Yoshimoto, and Hiroshi Kawaguchi Graduate School of System Informatics, Kobe
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationLecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter
Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "
More informationLet!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies
1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationCENG3420 Lecture 08: Memory Organization
CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory
More informationSurvey results. CS 6354: Memory Hierarchy I. Variety in memory technologies. Processor/Memory Gap. SRAM approx. 4 6 transitors/bit optimized for speed
Survey results CS 6354: Memory Hierarchy I 29 August 2016 1 2 Processor/Memory Gap Variety in memory technologies SRAM approx. 4 6 transitors/bit optimized for speed DRAM approx. 1 transitor + capacitor/bit
More informationAC-DIMM: Associative Computing with STT-MRAM
AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationMemory hierarchy Outline
Memory hierarchy Outline Performance impact Principles of memory hierarchy Memory technology and basics 2 Page 1 Performance impact Memory references of a program typically determine the ultimate performance
More informationCS 6354: Memory Hierarchy I. 29 August 2016
1 CS 6354: Memory Hierarchy I 29 August 2016 Survey results 2 Processor/Memory Gap Figure 2.2 Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time
More informationComputer Architecture
Computer Architecture Lecture 7: Memory Hierarchy and Caches Dr. Ahmed Sallam Suez Canal University Spring 2015 Based on original slides by Prof. Onur Mutlu Memory (Programmer s View) 2 Abstraction: Virtual
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip
More informationSpring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand
Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications
More informationECE468 Computer Organization and Architecture. Memory Hierarchy
ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:
More informationMemory Hierarchy and Caches
Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline
More informationArchitecting the last-level cache for GPUs using STT-MRAM nonvolatile memory
CHAPTER Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory 20 M.H. Samavatian 1, M. Arjomand 1, R. Bashizade 1, H. Sarbazi-Azad 1,2 Sharif University of Technology, Tehran, Iran
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationRegister File Organization
Register File Organization Sudhakar Yalamanchili unless otherwise noted (1) To understand the organization of large register files used in GPUs Objective Identify the performance bottlenecks and opportunities
More informationLoadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array
Loadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array Wujie Wen, Yaojun Zhang, Lu Zhang and Yiran Chen University of Pittsburgh Loadsa: a slang language means lots of Outline Introduction
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved
More informationCaches 3/23/17. Agenda. The Dataflow Model (of a Computer)
Agenda Caches Samira Khan March 23, 2017 Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed
More information15-740/ Computer Architecture, Fall 2011 Midterm Exam II
15-740/18-740 Computer Architecture, Fall 2011 Midterm Exam II Instructor: Onur Mutlu Teaching Assistants: Justin Meza, Yoongu Kim Date: December 2, 2011 Name: Instructions: Problem I (69 points) : Problem
More informationArea, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory
Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory Youngbin Jin, Mustafa Shihab, and Myoungsoo Jung Computer Architecture and Memory Systems Laboratory Department of Electrical
More information15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling
More informationA Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers
A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea
More informationMemory systems. Memory technology. Memory technology Memory hierarchy Virtual memory
Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need
More information