VLSID KOLKATA, INDIA January 4-8, 2016
Transcription
1 VLSID 2016, KOLKATA, INDIA, January 4-8, 2016
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in Hybrid Memory Cube Architectures
Ishan Thakkar, Sudeep Pasricha
Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, U.S.A.
{ishan.thakkar, DOI /VLSID
2 Outline
- Introduction
- Background on DRAM Structure and Refresh Operation
- Related Work
- Contributions
- Evaluation Setup
- Evaluation Results
- Conclusion
4 Introduction
- Main memory is DRAM (Dynamic Random Access Memory)
- It is a critical component of all computing systems: server, desktop, mobile, embedded, sensor
- DRAM stores data in a cell capacitor: a fully charged cell capacitor is logic 1, a fully discharged cell capacitor is logic 0
- A DRAM cell loses data over time, as the cell capacitor leaks charge
- For temperatures below 85 °C, a DRAM cell loses data in 64 ms; at higher temperatures it loses data at a faster rate
- To preserve data integrity, the charge on each DRAM cell (cell capacitor) must be periodically restored, or refreshed
[Figure: DRAM cell with bit line, word line, access transistor, and cell capacitor]
6 Background on DRAM Structure
- Based on their structure, DRAMs are classified into two categories:
  1. 2D DRAMs: planar single-layer DRAMs
  2. 3D-stacked DRAMs: multiple 2D DRAM layers stacked on one another using TSVs (through-silicon vias)
- 2D DRAM structure hierarchy: Rank > Chip > Bank > Subarray > Bitcell
7 2D DRAM: Rank and Chip Structure
- 2D DRAM rank: multiple chips work in tandem
[Figure: DRAM rank made of several DRAM chips, each with an <N>-wide output, and a mux]
8 3D-Stacked DRAM Structure
- In this paper, we consider the Hybrid Memory Cube (HMC), a standard for 3D-stacked DRAMs defined by an industry consortium
- HMC structure hierarchy: Vault > Bank > Subarray > Bitcell
[Figure: Hybrid Memory Cube]
9 DRAM Bank Structure
- 3D-stacked and 2D DRAMs have similar bank structures
[Figure: DRAM bank. Bank core: subarrays of rows and columns with sense amplifiers. Bank peripherals: row address decoder, column address decoder, row buffer, column mux, data bits]
10 DRAM Subarray Structure
- 3D-stacked and 2D DRAMs have similar subarray structures
[Figure: DRAM subarray with row address, word lines, bit lines, DRAM cells (access transistor and cell capacitor), and sense amplifiers]
11 Basic DRAM Operations: PRECHARGE
- All bitlines of the bank are precharged to 0.5 VDD
[Figure, repeated on slides 11-17: DRAM bank with global address latch, subarray decoders with ID comparators (=ID?) and EN signals, global row decoder, sense amplifiers, row buffer, column address decoder, column mux]
12 Basic DRAM Operations: PRECHARGE, ACTIVATION (Subarray ID: 1, Row 4)
- The target row is opened
13 Basic DRAM Operations: PRECHARGE, ACTIVATION (Subarray ID: 1, Row 4)
- The target row is opened, then it is captured by the sense amplifiers (SAs)
14 Basic DRAM Operations: PRECHARGE, ACTIVATION (Subarray ID: 1, Row 4)
- The SAs drive each bitline fully to either VDD or 0 V, which restores the open row
15 Basic DRAM Operations: PRECHARGE, ACTIVATION (Subarray ID: 1, Row 4)
- The open row is stored in the global row buffer
16 Basic DRAM Operations: PRECHARGE, ACTIVATION, READ (Subarray ID: 1, Row 4, Column 1)
- The target data block is selected, and then multiplexed out from the row buffer
17 Basic DRAM Operations: PRECHARGE, ACTIVATION, READ (Subarray ID: 1, Row 4, Column 1)
- A duet of PRECHARGE-ACTIVATION operations restores (refreshes) the target row
- Dummy PRECHARGE-ACTIVATION operations are performed to refresh the rows
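The PRECHARGE, ACTIVATION, and READ sequence on slides 11-17 can be mimicked with a toy software model. The sketch below is only illustrative: the class `ToyBank`, its fields, and the choice to close the row again after a refresh are assumptions made for this example, not part of the presentation, and no timing or analog behavior is modeled.

```python
class ToyBank:
    """Toy model of one DRAM bank (hypothetical, for illustration only)."""

    def __init__(self, num_rows, num_cols):
        self.cells = [[0] * num_cols for _ in range(num_rows)]
        self.open_row = None      # index of the currently open row, if any
        self.row_buffer = None    # data latched by the sense amplifiers
        self.restores = [0] * num_rows  # how often each row was restored

    def precharge(self):
        # Close any open row; bitlines return to the 0.5 VDD reference.
        self.open_row = None
        self.row_buffer = None

    def activate(self, row):
        # Open the target row: the sense amplifiers capture its contents
        # and drive the bitlines to full levels, which restores the cells.
        if self.open_row is not None:
            raise RuntimeError("bank must be precharged before ACTIVATION")
        self.row_buffer = list(self.cells[row])
        self.cells[row] = list(self.row_buffer)
        self.restores[row] += 1
        self.open_row = row

    def read(self, col):
        # Select one column of the open row out of the row buffer.
        if self.open_row is None:
            raise RuntimeError("no open row: ACTIVATION must precede READ")
        return self.row_buffer[col]

    def refresh(self, row):
        # A dummy PRECHARGE-ACTIVATION pair with no READ; the final
        # precharge (leaving the bank closed) is an assumption of this sketch.
        self.precharge()
        self.activate(row)
        self.precharge()


bank = ToyBank(num_rows=8, num_cols=4)
bank.cells[4] = [1, 0, 1, 1]
bank.precharge()
bank.activate(4)
value = bank.read(1)   # column 1 of row 4
bank.refresh(2)        # refresh row 2 without reading it
```

The point of the model is that a refresh is just an activation whose data is never read out: the restore happens as a side effect of opening the row.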
18 Refresh: 2D Vs 3D-Stacked DRAMs
- 3D-stacked DRAMs have
  - higher capacity/density, so more rows need to be refreshed
  - higher power density, so a higher operating temperature (>85 °C), so a smaller retention period (time before DRAM cells lose data) of 32 ms, compared with 64 ms for 2D DRAMs
- Thus, the refresh problem is more critical for 3D-stacked DRAMs
- Therefore, in this study, we target a standardized 3D-stacked DRAM architecture: HMC
- Dummy ACTIVATION-PRECHARGE operations are performed on all rows every retention cycle (32 ms)
- To prevent long pauses, a JEDEC-standardized Distributed Refresh method is used
19 Background: Refresh Operation
- Distributed Refresh: JEDEC-standardized method
- A group of n rows is refreshed every 3.9 µs
- A group of n rows forms a Refresh Bundle (RB)
- The size of the RB increases with DRAM capacity, which increases tRFC
- tREFI: refresh interval; tRFC: refresh cycle time (time taken to refresh an entire RB); tRC: row cycle time
- Example Distributed Refresh operation for a 1Gb HMC vault: retention cycle = 32 ms, tREFI = 3.9 µs, refresh bundles RB1 through RB8192, RB size of 16
[Figure: timeline of one retention cycle divided into tREFI intervals, each starting with the tRFC of one RB; within tRFC, Row1 through Row16 are refreshed one after another, each taking tRC, with trec marked between rows]
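The numbers in the Distributed Refresh example are tied together by simple arithmetic, which the sketch below works through. The retention cycle, bundle count, and bundle size come from the slide; the row cycle time `ASSUMED_TRC_NS` is a hypothetical placeholder (the slide gives no value), and treating tRFC as bundle size times tRC assumes the rows of a bundle are refreshed strictly one after another, as the timeline suggests.

```python
RETENTION_MS = 32.0      # retention cycle from the slide
NUM_BUNDLES = 8192       # RB1 .. RB8192 from the slide
BUNDLE_SIZE = 16         # rows per refresh bundle from the slide
ASSUMED_TRC_NS = 45.0    # hypothetical row cycle time, NOT from the slides


def refresh_interval_us(retention_ms, num_bundles):
    # tREFI: every bundle must be visited once per retention cycle.
    return retention_ms * 1000.0 / num_bundles


def rows_covered(num_bundles, bundle_size):
    # Total rows refreshed in one retention cycle.
    return num_bundles * bundle_size


def sequential_trfc_ns(bundle_size, trc_ns):
    # tRFC when the rows of a bundle are refreshed back to back.
    return bundle_size * trc_ns


def refresh_busy_fraction(trfc_ns, trefi_us):
    # Share of each refresh interval spent refreshing.
    return trfc_ns / (trefi_us * 1000.0)


trefi_us = refresh_interval_us(RETENTION_MS, NUM_BUNDLES)    # 3.90625
total_rows = rows_covered(NUM_BUNDLES, BUNDLE_SIZE)          # 131072
trfc_ns = sequential_trfc_ns(BUNDLE_SIZE, ASSUMED_TRC_NS)    # 720.0
busy = refresh_busy_fraction(trfc_ns, trefi_us)
```

32 ms divided by 8192 bundles gives the 3.9 µs interval quoted on the slide, and 8192 bundles of 16 rows cover 131072 = 2^17 rows, which matches the 17-bit address counter shown on the later Bank-level Parallelism slide. The busy fraction depends entirely on the assumed tRC, so it should be read as an illustration of why a larger RB hurts, not as a measured overhead.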
20 Performance Overhead of Distributed Refresh
- The performance overhead of refresh increases with device capacity
[Figure source: J Liu+, ISCA 2012]
21 Energy Overhead of Distributed Refresh
- The energy overhead of refresh increases with device capacity
[Figure source: J Liu+, ISCA 2012]
22 Energy Overhead of Distributed Refresh
- The energy overhead of refresh increases with device capacity
- Refresh is a growing problem, which needs to be addressed to realize low-latency, low-energy DRAMs
[Figure source: J Liu+, ISCA 2012]
24 Related Work
- Scattered Refresh improves upon Per-bank Refresh and All-bank Refresh
- We improve upon Scattered Refresh
25 All-Bank Refresh Vs Per-Bank Refresh
- Distributed Refresh can be implemented at two different granularities
- All-bank Refresh: all banks are refreshed simultaneously, and none of the banks is allowed to serve any request until refresh is complete
  - Supported by all general-purpose DDRx DRAMs
  - DRAM operation is completely stalled: the number of available banks (#AB) is zero
  - Exploits bank-level parallelism (BLP) for refreshing, giving a smaller tRFC
- Per-bank Refresh: only one bank is refreshed at a time, so all other banks are allowed to serve other requests
  - Supported by LPDDRx DRAMs
  - #AB > 0
  - No BLP, giving a larger tRFC
- tRFC: refresh cycle time
26 All-Bank Refresh Vs Per-Bank Refresh
- All-bank Refresh: smaller tRFC, but the number of available banks (#AB) = 0, so DRAM operation is completely stalled
- Per-bank Refresh: #AB > 0, but no BLP, so a larger tRFC
- Both All-bank Refresh and Per-bank Refresh have drawbacks, and they can be improved
[Figure: timing of the dummy ACTIVATION-PRECHARGE operations for a refresh command under each scheme. L = Layer ID, B = Bank ID, SA = Subarray ID, R = Row ID; tRFC: refresh cycle time; tRC: row cycle time]
27 Scattered Refresh
- Source: T Kalyan+, ISCA 2012
- Improves upon Per-bank Refresh by using subarray-level parallelism (SLP) for refresh
- Each row of an RB is mapped to a different subarray
- SLP gives the opportunity to overlap PRECHARGE with the next ACTIVATE, which reduces tRFC
- How does Scattered Refresh compare to Per-bank Refresh and All-bank Refresh?
[Figure: example Scattered Refresh operation, HMC vault, refresh bundle size of 4. L = Layer ID, B = Bank ID, SA = Subarray ID, R = Row ID]
28 Scattered Refresh
- tRFC for All-bank Refresh < tRFC for Scattered Refresh < tRFC for Per-bank Refresh
- Scattered Refresh leaves room for improvement
[Figure: example Scattered Refresh operation, HMC vault, refresh bundle size of 4, comparing Per-bank, Scattered, and All-bank Refresh]
30 Contributions
- Crammed Refresh: Per-bank Refresh + All-bank Refresh
  - 2 banks are refreshed in parallel, instead of 1 bank in Per-bank Refresh and all banks in All-bank Refresh
- Massed Refresh: Crammed Refresh + Scattered Refresh
  - 2 banks are refreshed in parallel
  - Uses SLP in both banks being refreshed
- Refreshing only 2 banks in parallel is a proof of concept; more than 2 banks can also be chosen
- The idea is to keep a balance between #AB and BLP for refresh
- BLP: bank-level parallelism; SLP: subarray-level parallelism; #AB: number of banks available to serve other requests while the remaining banks are being refreshed
31 Crammed Refresh tRFC Timing
- Bank-level parallelism (BLP) for refresh
- Only 2 banks are refreshed in parallel, so #AB > 0
- tRFC for Crammed Refresh < tRFC for Scattered Refresh
[Figure: example Crammed Refresh operation, HMC vault, refresh bundle size of 4, comparing Per-bank, Scattered, and Crammed Refresh. L = Layer ID, B = Bank ID, SA = Subarray ID, R = Row ID]
32 Massed Refresh tRFC Timing
- Bank-level parallelism (BLP) + subarray-level parallelism (SLP) for refresh
- tRFC for Massed Refresh < tRFC for Crammed Refresh
- How to implement BLP and SLP together?
[Figure: example Massed Refresh operation, HMC vault, refresh bundle size of 4, comparing Per-bank, Crammed, and Massed Refresh. L = Layer ID, B = Bank ID, SA = Subarray ID, R = Row ID]
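The tRFC orderings claimed for the five schemes can be reproduced with a deliberately simplified timing model. Everything numeric below is an assumption: the functions, the values of tRAS and tRP, and the choice of 8 banks per vault are hypothetical, and the model ignores real constraints such as activation-to-activation spacing and power limits. It assumes a row refresh costs tRAS (activate and restore) plus tRP (precharge), that SLP hides each precharge behind the next activation, and that BLP splits the bundle evenly across the banks refreshed in parallel.

```python
import math


def trfc_per_bank(n_rows, t_ras, t_rp):
    # One bank, no overlap: every row pays a full row cycle (tRAS + tRP).
    return n_rows * (t_ras + t_rp)


def trfc_scattered(n_rows, t_ras, t_rp):
    # One bank with SLP: each precharge overlaps the next activation,
    # so only the last precharge is exposed.
    return n_rows * t_ras + t_rp


def trfc_parallel_banks(n_rows, t_ras, t_rp, banks):
    # BLP only (All-bank or Crammed): rows are split across banks,
    # and each bank refreshes its share sequentially.
    rounds = math.ceil(n_rows / banks)
    return rounds * (t_ras + t_rp)


def trfc_massed(n_rows, t_ras, t_rp, banks=2):
    # BLP + SLP: rows split across banks, precharges overlapped in each.
    rounds = math.ceil(n_rows / banks)
    return rounds * t_ras + t_rp


# Hypothetical parameters in arbitrary time units (not from the slides),
# with the refresh bundle size of 4 used in the slide examples.
N_ROWS, T_RAS, T_RP = 4, 30, 15

timings = {
    "per_bank": trfc_per_bank(N_ROWS, T_RAS, T_RP),
    "scattered": trfc_scattered(N_ROWS, T_RAS, T_RP),
    "all_bank": trfc_parallel_banks(N_ROWS, T_RAS, T_RP, banks=8),
    "crammed": trfc_parallel_banks(N_ROWS, T_RAS, T_RP, banks=2),
    "massed": trfc_massed(N_ROWS, T_RAS, T_RP, banks=2),
}
```

With these placeholder values the model gives 180, 135, 45, 90, and 75 units respectively, which is consistent with the orderings stated on the slides: All-bank < Scattered < Per-bank, Crammed < Scattered, and Massed < Crammed. The absolute values carry no meaning; the presentation's own figures come from CACTI-3DD and DRAMSim2 simulation.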
33 Subarray-level Parallelism (SLP)
- Source: Y Kim+, ISCA 2012
- A global row-address latch hinders SLP
[Figure: global row-address latch versus per-subarray row-address latches]
34 Bank-level Parallelism (BLP)
- BLP is implemented by masking BankID during refresh
[Figure: memory dies 1-4 stacked on the Logic Base (LoB) and connected through TSV launch pads. Die-side labels: row address latch, LayerID (LID), BankID (BID), Mask, EN, to banks. LoB-side labels: vault controller, refresh controller, refresh scheduler, control, physical address decoder, address calculator, physical address latch, 17-bit address counter with fields LayerAddr[2], BankAddr[1], RowAddr[14]]
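One way to read "masking BankID during refresh" is sketched below. The field widths (2 layer bits, 1 bank bit, 14 row bits) come from the slide's 17-bit address counter; the bit ordering (layer in the top bits, row in the bottom bits), the function names, and the comparator behavior are assumptions made for this illustration and are not confirmed by the presentation.

```python
ROW_BITS = 14    # RowAddr[14] from the slide
BANK_BITS = 1    # BankAddr[1] from the slide
LAYER_BITS = 2   # LayerAddr[2] from the slide


def decode_counter(counter):
    # Split a 17-bit refresh counter into (layer, bank, row).
    # Assumed layout, MSB to LSB: layer | bank | row.
    row = counter & ((1 << ROW_BITS) - 1)
    bank = (counter >> ROW_BITS) & ((1 << BANK_BITS) - 1)
    layer = (counter >> (ROW_BITS + BANK_BITS)) & ((1 << LAYER_BITS) - 1)
    return layer, bank, row


def banks_enabled(layer, bank, mask_bank_id):
    # Return the (layer, bank) pairs whose enable signal would fire.
    # With the mask set, the BankID comparison is ignored, so every bank
    # on the addressed layer accepts the row address.
    enabled = []
    for lid in range(1 << LAYER_BITS):
        for bid in range(1 << BANK_BITS):
            layer_match = lid == layer
            bank_match = mask_bank_id or bid == bank
            if layer_match and bank_match:
                enabled.append((lid, bid))
    return enabled


counter = (2 << (ROW_BITS + BANK_BITS)) | (1 << ROW_BITS) | 5
layer, bank, row = decode_counter(counter)
normal_access = banks_enabled(layer, bank, mask_bank_id=False)
refresh_access = banks_enabled(layer, bank, mask_bank_id=True)
```

Under these assumptions a normal access enables exactly one bank, while a masked refresh enables both banks of the addressed layer, which would give the two-banks-in-parallel behavior described for Crammed and Massed Refresh.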
36 Evaluation Setup
- Trace-driven simulation for PARSEC benchmarks
  - Memory access traces were extracted from detailed cycle-accurate simulations using gem5
  - These memory traces were then provided as inputs to the DRAM simulator DRAMSim2
- Energy, timing and area analysis
  - CACTI-3DD based simulation based on a 4Gb HMC quad model
- DRAMSim2 configuration
  - DRAMSim2 was configured using the CACTI-3DD results
38 Results I: Energy, Timing, Area
39 Results II: Throughput (PARSEC benchmarks)
- Crammed Refresh achieves 7.1% and 2.9% more throughput on average than distributed per-bank refresh and scattered refresh, respectively
- Massed Refresh achieves 8.4% and 4.3% more throughput on average than distributed per-bank refresh and scattered refresh, respectively
40 Results III: Energy Delay Product (EDP) (PARSEC benchmarks)
- Crammed Refresh achieves 6.4% and 2.7% less EDP on average than distributed per-bank refresh and scattered refresh, respectively
- Massed Refresh achieves 7.5% and 3.9% less EDP on average than distributed per-bank refresh and scattered refresh, respectively
42 Conclusions
- The proposed Massed Refresh technique exploits bank-level as well as subarray-level parallelism during refresh operations
- The proposed Crammed Refresh and Massed Refresh techniques improve the throughput and energy efficiency of DRAM
- Crammed Refresh improves upon the state of the art
  - 7.1% and 6.4% improvements in throughput and EDP, respectively, over distributed per-bank refresh
  - 2.9% and 2.7% improvements in throughput and EDP, respectively, over scattered refresh
- Massed Refresh improves upon the state of the art
  - 8.4% and 7.5% improvements in throughput and EDP, respectively, over distributed per-bank refresh
  - 4.3% and 3.9% improvements in throughput and EDP, respectively, over scattered refresh
43 Thank You
Questions / Comments?
More informationLecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)
Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache 2 Virtually Indexed Caches 24-bit virtual address, 4KB page size 12 bits offset and 12 bits
More informationBasics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS
Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS
More informationCOMP3221: Microprocessors and. and Embedded Systems. Overview. Lecture 23: Memory Systems (I)
COMP3221: Microprocessors and Embedded Systems Lecture 23: Memory Systems (I) Overview Memory System Hierarchy RAM, ROM, EPROM, EEPROM and FLASH http://www.cse.unsw.edu.au/~cs3221 Lecturer: Hui Wu Session
More informationSemiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM
More informationThermal-Aware Memory Management Unit of 3D- Stacked DRAM for 3D High Definition (HD) Video
Thermal-Aware Memory Management Unit of 3D- Stacked DRAM for 3D High Definition (HD) Video Chih-Yuan Chang, Po-Tsang Huang, Yi-Chun Chen, Tian-Sheuan Chang and Wei Hwang Department of Electronics Engineering
More informationLecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.
Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports
More informationChapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.
Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.5) Memory Technologies Dynamic Random Access Memory (DRAM) Optimized
More informationChapter 5. Internal Memory. Yonsei University
Chapter 5 Internal Memory Contents Main Memory Error Correction Advanced DRAM Organization 5-2 Memory Types Memory Type Category Erasure Write Mechanism Volatility Random-access memory(ram) Read-write
More informationThe Memory Hierarchy Part I
Chapter 6 The Memory Hierarchy Part I The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems esign and Architecture 1997. 1 Outline: Memory components: RAM memory cells
More informationLecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010
EE4800 CMOS Digital IC Design & Analysis Lecture 11 SRAM Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitryit Multiple Ports Outline Serial Access Memories 11.2 Memory Arrays
More informationDesign-Induced Latency Variation in Modern DRAM Chips:
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms Donghyuk Lee 1,2 Samira Khan 3 Lavanya Subramanian 2 Saugata Ghose 2 Rachata Ausavarungnirun
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit
More informationMemory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition
Chapter 6 Memory Circuits and Chapter rray Subsystems from CMOS VLSI Design by Weste and Harris, th Edition E E 80 Introduction to nalog and Digital VLSI Paul M. Furth New Mexico State University Static
More informationMosel Vitelic (IBM-Siemens) V53C181608K60 1Mx16 CMOS EDO DRAM
May 19, 1998 Mosel Vitelic (IBM-Siemens) V53C181608K60 1Mx16 CMOS EDO DRAM Abstract: The Mosel Vitelic V53C181608K60 is a 1Mx16 CMOS DRAM featuring EDO Page Mode Operation, self-refresh, hidden refresh
More informationLecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3)
Lecture 15: DRAM Main Memory Systems Today: DRAM basics and innovations (Section 2.3) 1 Memory Architecture Processor Memory Controller Address/Cmd Bank Row Buffer DIMM Data DIMM: a PCB with DRAM chips
More informationECE 152 Introduction to Computer Architecture
Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course
More informationSpring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand
Main Memory & DRAM Nima Honarmand Main Memory Big Picture 1) Last-level cache sends its memory requests to a Memory Controller Over a system bus of other types of interconnect 2) Memory controller translates
More informationM2 Outline. Memory Hierarchy Cache Blocking Cache Aware Programming SRAM, DRAM Virtual Memory Virtual Machines Non-volatile Memory, Persistent NVM
M2 Memory Systems M2 Outline Memory Hierarchy Cache Blocking Cache Aware Programming SRAM, DRAM Virtual Memory Virtual Machines Non-volatile Memory, Persistent NVM Memory Technology Memory MemoryArrays
More informationLow-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM
Low-Cost Inter-Linked ubarrays (LIA) Enabling Fast Inter-ubarray Data Movement in DRAM Kevin Chang rashant Nair, Donghyuk Lee, augata Ghose, Moinuddin Qureshi, and Onur Mutlu roblem: Inefficient Bulk Data
More informationABSTRACT. HIGH-PERFORMANCE DRAM SYSTEM DESIGN CONSTRAINTS AND CONSIDERATIONS Joseph G. Gross, Master of Science, 2010
ABSTRACT Title of Document: HIGH-PERFORMANCE DRAM SYSTEM DESIGN CONSTRAINTS AND CONSIDERATIONS Joseph G. Gross, Master of Science, 2010 Thesis Directed By: Dr. Bruce L. Jacob, Assistant Professor, Department
More informationAmbit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
Using Commodity DRAM Technology Vivek Seshadri,5 Donghyuk Lee,5 Thomas Mullins 3,5 Hasan Hassan 4 Amirali Boroumand 5 Jeremie Kim 4,5 Michael A. Kozuch 3 Onur Mutlu 4,5 Phillip B. Gibbons 5 Todd C. Mowry
More informationLecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)
Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache Is the cache indexed with virtual or physical address? To index with a physical address, we
More informationEE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems
EE382 (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems Mattan Erez The University of Texas at Austin EE382: Principles of Computer Architecture, Fall 2011 -- Lecture
More informationChapter 5 Internal Memory
Chapter 5 Internal Memory Memory Type Category Erasure Write Mechanism Volatility Random-access memory (RAM) Read-write memory Electrically, byte-level Electrically Volatile Read-only memory (ROM) Read-only
More informationOrganization Row Address Column Address Bank Address Auto Precharge 128Mx8 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10
GENERAL DESCRIPTION The Gigaram is ECC Registered Dual-Die DIMM with 1.25inch (30.00mm) height based on DDR2 technology. DIMMs are available as ECC modules in 256Mx72 (2GByte) organization and density,
More informationMain Memory Systems. Department of Electrical Engineering Stanford University Lecture 5-1
Lecture 5 Main Memory Systems Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee282 Lecture 5-1 Announcements If you don t have a group of 3, contact us ASAP HW-1 is
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Cache 11232011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Memory Components/Boards Two-Level Memory Hierarchy
More information