Memories: Memory Technology


Z. Jerry Shi
Assistant Professor of Computer Science and Engineering
University of Connecticut
* Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 *

Memory Hierarchy
  [Figure: memory hierarchy]

Outline
  Survey of various types of memory technologies
    SRAMs (caches, register files)
    DRAMs (main memory)
    Variants of DRAMs
  Latency vs. throughput again

Main Memory Background
  Random Access Memory (vs. Serial Access Memory)
  Different flavors at different levels
    Physical makeup (CMOS, DRAM)
    Low-level architectures (FPM, EDO, BEDO, SDRAM)
  Cache uses SRAM: Static Random Access Memory
    Fast: 8-16 times as fast as DRAM (also 8-16 times as costly...)
    Small: 1/4-1/8 the capacity of DRAM
    No refresh needed, but volatile: contents are lost when power is removed
  Main memory is DRAM: Dynamic Random Access Memory
    Slow and big (relative to SRAM)
    Dynamic: needs to be refreshed periodically (every 8 ms, 1% of the time)
    Addresses divided into 2 halves (memory as a 2D matrix):
      RAS or Row Address Strobe
      CAS or Column Address Strobe

Basic Set-Associative Cache Structure (from CACTI)
  [Figure: set-associative cache structure]

Typical SRAM Organization: 16-word x 4-bit
  [Figure: 16 x 4 SRAM array. Data inputs Din 0-3 feed write drivers and prechargers, controlled by WrEn and Precharge. Address bits A0-A3 feed an address decoder that selects one of the word lines Word 0 through Word 15. Each column ends in a sense amp that drives Dout 0-3.]

Basic Static RAM Cell
  6-transistor SRAM cell
  [Figure: cross-coupled inverter cell with a word (row select) line and complementary bit lines, bit and bitbar. Figure annotation: "replaced with pullup to save area".]
  Write:
    1. Drive bit lines (bit=1, bitbar=0)
    2. Select row
  Read:
    1. Precharge bit and bitbar to Vdd
    2. Select row
    3. Cell pulls one line low
    4. Sense amp on column detects difference between bit and bitbar

Multi-ported SRAMs
  p = total number of ports
  w = register cell width without ports
  h = register cell height without ports
  So each cell is (w+p)(h+p) in area
  How many ports are needed per register per functional unit? 2 reads, 1 write
  External port to cache = x
  How large is the register file given N functional units?
    Number of registers scales with N
    Size of each register scales with the square of the total number of ports = (3+x)N
    So area of register file scales with N^3
  [Rixner et al., "Register Organization for Media Processing," HPCA 2000]
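
The N^3 scaling argument above can be checked numerically. The following is a minimal sketch, not the model from the cited paper: it assumes w and h are measured in the same units as one port's wire pitch, that the number of registers is exactly proportional to N, and the function name and default values are hypothetical.

```python
def register_file_area(n_units, w=1.0, h=1.0, x=1, regs_per_unit=1):
    """Relative register file area for n_units functional units.

    Assumptions (not from the slides): w and h are in units of one port's
    wire pitch, and each functional unit contributes regs_per_unit registers.
    """
    ports = (3 + x) * n_units                 # 2 reads + 1 write + x external, per unit
    cell_area = (w + ports) * (h + ports)     # each cell is (w+p)(h+p)
    n_registers = regs_per_unit * n_units     # register count scales with N
    return n_registers * cell_area

# Doubling N should multiply the area by roughly 2^3 = 8 once ports dominate.
ratio = register_file_area(2000) / register_file_area(1000)
print(f"area(2N) / area(N) = {ratio:.3f}")
```

For small N the fixed cell dimensions w and h still matter, so the ratio only approaches 8 as N grows.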

Example: Toshiba
  [Figure]

Problems with SRAM
  [Figure: 6-transistor cell with Select = 1, bit = 1, bitbar = 0. P1 is off, P2 is on, N1 is on, N2 is off, and both access transistors are on.]
  Six transistors use up a lot of area
  Consider when a zero is stored in the cell:
    Transistor N1 will try to pull bit to 0
    Transistor P2 will try to pull bitbar to 1

1-Transistor Memory Cell (DRAM)
  [Figure: one access transistor and a storage capacitor, connected to the row select line and the bit line]
  Write:
    1. Drive bit line
    2. Select row
  Read:
    1. Precharge bit line to Vdd/2
    2. Select row
    3. Cell and bit line share charges
       Very small voltage changes on the bit line
    4. Sense (fancy sense amp)
       Can detect changes of ~10-100k electrons
       Amplifies and recharges cell
    5. Write: restore the value
  Refresh
    Basically a dummy read to an entire row

DRAM Logical Organization (4 Mbit)
  Square root of bits per RAS/CAS
  [Figure: address pins A0-A10 (11 bits) feed an address buffer, which drives the row decoder and the column decoder. The memory array is 2,048 x 2,048 storage cells on word lines, with sense amps and I/O between the array and the column decoder.]
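
The 4 Mbit organization above uses 11 address pins twice, once for the row and once for the column. A minimal sketch of that split follows. Putting the row in the upper address bits is an assumption made for illustration, as are the function and constant names; the slides do not say how a controller orders the bits.

```python
ROW_BITS = 11   # 2,048 rows, sent while RAS is asserted
COL_BITS = 11   # 2,048 columns, sent while CAS is asserted

def split_address(bit_addr):
    """Split a 22-bit cell address into (row, column) for a 2,048 x 2,048 array.

    Assumption: the row is taken from the upper 11 bits and the column
    from the lower 11 bits.
    """
    if not 0 <= bit_addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4 Mbit array")
    row = bit_addr >> COL_BITS
    col = bit_addr & ((1 << COL_BITS) - 1)
    return row, col

total_bits = 1 << (ROW_BITS + COL_BITS)   # 4 * 1024 * 1024 = 4 Mbit
example = split_address((5 << COL_BITS) | 7)
print(total_bits, example)
```

Because both halves go over the same 11 pins, the part needs only the square root of the cell count in address lines per strobe.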

Logic Diagram of a Typical DRAM
  [Figure: 256K x 8 DRAM with control inputs RAS_L, CAS_L, WE_L, OE_L, 9 address pins (A), and 8 data pins (D)]
  Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
  Din and Dout are combined (D is bidirectional):
    WE_L asserted (low), OE_L deasserted (high): D serves as the data input pin
    WE_L deasserted (high), OE_L asserted (low): D is the data output pin
  Row and column addresses share the same pins (A)
    RAS_L goes low: pins A are latched in as row address
    CAS_L goes low: pins A are latched in as column address
    RAS/CAS are edge-sensitive

Cycle Time vs. Access Time: Latency vs. Throughput Again
  [Figure: timeline showing the access time as the first part of the longer cycle time]
  DRAM (read/write) cycle time >> DRAM (read/write) access time
  DRAM (read/write) cycle time: how frequently can you initiate an access?
  DRAM (read/write) access time: how quickly will you get what you want once you initiate an access?
  DRAM bandwidth limitation: how much data can you get from the memory?
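
The distinction above matters because sustained bandwidth is set by the cycle time, not the access time. The sketch below illustrates this for a byte-wide part, borrowing the 60 ns access time and 110 ns cycle time quoted later in these slides. The function name is hypothetical, and the model ignores page-mode accesses and controller overhead.

```python
def sustained_bandwidth(bytes_per_access, cycle_time_ns):
    """Bytes per second when a new access can start once per cycle time."""
    return bytes_per_access / (cycle_time_ns * 1e-9)

ACCESS_TIME_NS = 60    # latency: time until the data arrives
CYCLE_TIME_NS = 110    # throughput limit: minimum spacing between accesses

# A x8 part delivers one byte per access.
actual = sustained_bandwidth(1, CYCLE_TIME_NS)
# Dividing by the access time instead overstates what the part can sustain.
optimistic = sustained_bandwidth(1, ACCESS_TIME_NS)

print(f"limited by cycle time:  {actual / 1e6:.2f} MB/s")
print(f"if access time applied: {optimistic / 1e6:.2f} MB/s")
```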

Main Memory Organizations
  Simple: CPU, cache, bus, memory same width (32 or 64 bits)
  Wide: CPU/mux 1 word; mux/cache, bus, memory N words (Alpha: 64 bits & 256 bits; UltraSPARC 512)
  Banked & interleaved: CPU, cache, bus 1 word; memory N modules (4 modules); example is word interleaved

Increasing Bandwidth: Interleaving
  Access pattern without interleaving:
    [Figure: CPU and a single memory. The access for D2 can start only after D1 is available.]
  Access pattern with 4-way interleaving:
    [Figure: CPU and memory banks 0-3. Accesses to bank 0, bank 1, bank 2, and bank 3 are started back to back, after which bank 0 can be accessed again.]
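
Word interleaving as described above can be sketched as a mapping from word address to bank. This is a minimal illustration that assumes low-order interleaving (consecutive words go to consecutive banks); the function name is hypothetical.

```python
NUM_BANKS = 4   # the slides' example uses 4 modules

def bank_and_index(word_addr, num_banks=NUM_BANKS):
    """Map a word address to (bank, index within that bank).

    Assumption: low-order interleaving, so the bank is the word address
    modulo the number of banks.
    """
    return word_addr % num_banks, word_addr // num_banks

# Sequential word addresses rotate through the banks, so each bank has
# time to finish its cycle before it is asked for another word.
banks = [bank_and_index(addr)[0] for addr in range(8)]
print(banks)
```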

Main Memory Performance
  Timing model (word size is 32 bits)
    4 to send address, 56 access time per word, 4 send time per word
    Cache block is 4 words
  Miss penalty (M.P.) for each organization:
    Simple M.P. = 4 x (4 + 56 + 4) = 256
    Wide M.P. = 4 + 56 + 4 = 64 (4-word)
    Interleaved M.P. = 4 + 56 + 4 x 4 = 76
  [Figure: 4-way interleaved memory]
  Optimized for sequential accesses

DRAM Performance
  A 60 ns (tRAC) DRAM can
    perform a row access only every 110 ns (tRC)
    perform a column access (tCAC) in 15 ns, but time between column accesses is at least 35 ns (tPC)
      In practice, external address delays and turning around buses make it 40 to 50 ns
  These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead!
  Can it be made faster?
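
The three miss penalties above follow directly from the timing model. The sketch below recomputes them. The slide does not state the time unit, so the values are left as plain numbers, and the function name is hypothetical. The interleaved case assumes at least as many banks as words in a block.

```python
def miss_penalties(addr=4, access=56, send=4, block_words=4):
    """Miss penalty for a block under the slide's timing model.

    The unit of time is not stated in the slides.
    """
    # Simple: every word pays address + access + transfer in full.
    simple = block_words * (addr + access + send)
    # Wide: memory and bus are a whole block wide, so one access suffices.
    wide = addr + access + send
    # Interleaved: bank accesses overlap, only the transfers are serialized.
    interleaved = addr + access + block_words * send
    return {"simple": simple, "wide": wide, "interleaved": interleaved}

penalties = miss_penalties()
print(penalties)
```

Interleaving recovers most of the benefit of the wide organization while keeping a one-word bus.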

Improvements on DRAM
  Fast page mode
    Send row address once for multiple column addresses
  Extended Data Out (EDO)
    Keep data available even after CAS_L is high
  Synchronous DRAM (SDRAM)
  Double Data Rate SDRAM (DDR SDRAM)
    PC2100: 2.1 Gbytes/second, 8 * 133M * 2
  Direct Rambus DRAM (DRDRAM)
    16-bit internal bus clocked at 400 MHz (1.6 Gbytes/second)

Fast Page Mode DRAM
  Page: all bits on the same row (spatial locality)
    Don't need to wait for wordline to recharge
  Toggle CAS with new column address
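
The peak bandwidth figures quoted above are the product of bus width, clock rate, and transfers per clock. The sketch below reproduces them. The factor of 2 for DRDRAM is an assumption based on the slides' later statement that Rambus DRDRAMs also use DDR signalling, and the function name is hypothetical.

```python
def peak_bandwidth(bus_bytes, clock_hz, transfers_per_clock):
    """Peak bytes per second for a bus of the given width and clock."""
    return bus_bytes * clock_hz * transfers_per_clock

# PC2100 DDR SDRAM: 8-byte bus, 133 MHz clock, data on both clock edges.
pc2100 = peak_bandwidth(8, 133e6, 2)
# DRDRAM: 16-bit (2-byte) bus at 400 MHz; both edges assumed, see lead-in.
drdram = peak_bandwidth(2, 400e6, 2)

print(f"PC2100: {pc2100 / 1e9:.2f} GB/s")
print(f"DRDRAM: {drdram / 1e9:.2f} GB/s")
```

These are peak rates only; the sustained rate also depends on bank conflicts and command bus constraints.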

Extended Data Out (EDO)
  Add a latch between sense amps and output pins
  [Figure: EDO DRAM]
  Last accessed row data still available in latch, so precharge can be started sooner
  Variant: Burst EDO (BEDO)
    Read or write cycles batched in bursts of 4
    Address is incremented internally as CAS toggles

Synchronous DRAM
  Has a clock input. Data output is in bursts, with each element clocked
  In the past: read the whole row, then select a small number of useful bits
  SDRAM: don't throw away the bits; use an arbitrary number of bits from each row
    Register holds how many bytes per request, up to entire row

Synchronous DDR DRAMs
  Double Data Rate: data is driven and received on both rising and falling edges of the clock
  This DDR signalling technique is used in both DDR DRAMs and Rambus DRDRAMs

SDRAM and Direct RDRAM (Rambus)
  DRDRAM:
    Regular interconnect: high-frequency bus
    Three-component bus: 1 data, 2 address
    Memory controller can request components of a large block in any order
    Can schedule accesses

SDRAM and Direct RDRAM
  [Figure]

DRDRAM Performance Comparison
  Bandwidth impairment for SDRAM, DDR SDRAM:
    1. Bank conflict (banks share sense amps)
       25% probability of sequential memory accesses hitting the same bank
    2. Constraints on address command bus
    3. Two-cycle addressing problem
       Caused by uneven capacitive loading of command/data bus
  RDRAM:
    1. 32 banks
       Reduces bank contention
    2. Command/data channel uniformly routed to each device (equal load)
    3. Row/column addresses on separate buses (can be sent on same cycle)
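
One way to read the 25% figure above is as the chance that two back-to-back accesses land in the same bank when banks are chosen uniformly at random. The sketch below works that out under two assumptions the slides do not state: that the SDRAM in question has 4 banks, and that consecutive accesses are independent. The function name is hypothetical.

```python
from fractions import Fraction

def same_bank_probability(num_banks):
    """Chance that the next access hits the same bank as the previous one.

    Assumption: each access picks a bank uniformly and independently.
    """
    return Fraction(1, num_banks)

sdram = same_bank_probability(4)     # assumed 4-bank SDRAM
rdram = same_bank_probability(32)    # 32 banks, as stated for RDRAM

print(f"4 banks:  {float(sdram):.2%}")
print(f"32 banks: {float(rdram):.2%}")
```

Real access streams are not uniform, so this is only a rough model of why more banks reduce contention.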

Timing Diagram: SDRAM Bank Conflicts
  [Figure: timing diagram]

DRDRAM Timing
  [Figure: timing diagram]

Independent Memory Banks
  Parallel access instead of sequential access (multi-issue vs. pipelined)
  Multiple controllers, arrays
  Scheduling accesses to multiple banks (Rixner et al., ISCA 2000)
    (bank, row, column)

Some numbers...
  [Figure]

DRAM History
  DRAMs: capacity +40-60%/yr, cost (1 MB) -40%/yr
    2.5x cells/area, 1.5x die size in 3 years
  1998 DRAM fab line costs $2B
    DRAM only: density, leakage vs. speed
  Rely on increasing number of computers & memory per computer (60% market)
    SIMM or DIMM is replaceable unit => computers use any generation DRAM
  Commodity, second-source industry => high volume, low profit, conservative
    Little organization innovation in 20 years
  Don't want to be chip foundries (bad for RDRAM)
  Order of importance: 1) Cost/bit 2) Capacity
    First RAMBUS: 10X BW, +30% cost => little initial impact