EEM 486: Computer Architecture. Lecture 9. Memory


The Big Picture
[Figure: Designing a Multiple Clock Cycle Datapath. The processor (control and datapath) connects to memory, input, and output.]

The following slides belong to Prof. Onur Mutlu.

Main Memory

Main Memory in the System
[Figure: four cores (CORE 0 to CORE 3), each with a private L2 cache (L2 CACHE 0 to L2 CACHE 3), share an L3 cache. A DRAM memory controller connects through the DRAM interface to the DRAM banks.]

Ideal Memory
- Zero access time (latency)
- Infinite capacity
- Zero cost
- Infinite bandwidth (to support multiple accesses in parallel)

The Problem
Ideal memory's requirements oppose each other:
- Bigger is slower: a bigger memory takes longer to determine the location of the data.
- Faster is more expensive: compare the memory technologies SRAM and DRAM.
- Higher bandwidth is more expensive: it needs more banks, more ports, a higher frequency, or a faster technology.

[Figures: The Memory Chip/System Abstraction; Main Memory Overview]

Memory Bank Organization and Operation
Read access sequence:
1. Decode the row address and drive the word-lines.
2. The selected bits drive the bit-lines; the entire row is read.
3. Amplify the row data.
4. Decode the column address and select a subset of the row; send it to the output.
5. Precharge the bit-lines for the next access.

Memory Technology: DRAM
Dynamic random access memory stores a bit as the charge state of a capacitor: whether the capacitor is charged or discharged indicates a stored 1 or 0. Each cell consists of 1 capacitor and 1 access transistor, controlled by the row enable line and connected to the bitline.
The capacitor leaks through the RC path, so a DRAM cell loses charge over time and needs to be refreshed. Refresh: the DRAM controller must periodically read all rows within the allowed refresh time (tens of ms) so that the charge in the cells is restored.
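The refresh requirement turns into simple arithmetic for the controller: spread the refresh work over the retention window. The sketch below assumes a 64 ms window and 8192 refresh operations per window (a common DDR3/DDR4 convention, not stated in the slides, which only say "tens of ms"); the function names are hypothetical.

```python
# Sketch: how often a DRAM controller must issue refresh operations so that
# every row is refreshed once per retention window.
# ASSUMPTION: 64 ms window and 8192 refresh operations per window (a common
# DDR3/DDR4 convention); the lecture only says "tens of ms".

def refresh_interval_us(window_ms: float, refreshes_per_window: int) -> float:
    """Average time between refresh operations, in microseconds."""
    return window_ms * 1000.0 / refreshes_per_window


def rows_per_refresh(total_rows: int, refreshes_per_window: int) -> int:
    """Rows that each refresh operation must cover (rounded up)."""
    return -(-total_rows // refreshes_per_window)  # ceiling division


if __name__ == "__main__":
    interval = refresh_interval_us(64, 8192)
    print(f"one refresh every {interval:.4f} us")  # 7.8125 us
    # A bank with 16K rows (as in the later examples) needs 2 rows per refresh.
    print(rows_per_refresh(16 * 1024, 8192))       # 2
```

Spreading refresh evenly like this keeps any single bank from being blocked for a long stretch, at the cost of a small, regular loss of availability.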

Memory Technology: SRAM
Static random access memory stores a single bit in two cross-coupled inverters; the feedback path lets the stored value persist in the cell. A cell uses 4 transistors for storage and 2 transistors for access, controlled by the row select line and connected to a complementary pair of bitlines (bitline and _bitline).

DRAM vs. SRAM
DRAM:
- Slower access (capacitor)
- Higher density (1T1C cell)
- Lower cost
- Requires refresh (power, performance, circuitry)
- Manufacturing requires putting the capacitor and logic together
SRAM:
- Faster access (no capacitor)
- Lower density (6T cell)
- Higher cost
- No need for refresh
- Manufacturing compatible with the logic process (no capacitor)

The Problem
Bigger is slower:
- SRAM, 512 bytes: sub-nanosecond
- SRAM, KBytes to MBytes: about a nanosecond
- DRAM, gigabytes: about 50 nanoseconds
- Hard disk, terabytes: about 10 milliseconds
Faster is more expensive (in dollars and chip area):
- SRAM: < $10 per megabyte
- DRAM: < $1 per megabyte
- Hard disk: < $1 per gigabyte
These sample values scale with time.

DRAM: Memory Access Protocol
[Figure: a 2^n-row × 2^m-column bit-cell array. The n-bit row address is latched with RAS and the m-bit column address with CAS; 2^m sense amplifiers and a mux select a 1-bit output. n ≈ m to minimize overall latency.]
A DRAM die is composed of multiple such arrays.
Five basic commands: Activate, Read, Write, Precharge, Refresh.
To reduce pin count, the row and column addresses share the same address pins:
- RAS: Row Address Strobe
- CAS: Column Address Strobe
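Address multiplexing can be sketched in a few lines: split a cell address into a row part (sent with RAS) and a column part (sent with CAS) over the same pins. This assumes the row bits are the high-order bits of the cell address; the helper names are hypothetical.

```python
# Sketch of DRAM address multiplexing: a cell address in a 2^n x 2^m array is
# split into an n-bit row address (latched by RAS) and an m-bit column address
# (latched by CAS), both sent over the same shared address pins.
# ASSUMPTION: row bits are the high-order bits of the cell address.

def split_address(addr: int, n: int, m: int) -> tuple[int, int]:
    """Return (row, column) for a cell address in a 2^n-row x 2^m-column array."""
    if not 0 <= addr < (1 << (n + m)):
        raise ValueError("address out of range")
    row = addr >> m               # high n bits
    col = addr & ((1 << m) - 1)   # low m bits
    return row, col


def address_pins(n: int, m: int) -> tuple[int, int]:
    """Pins needed with multiplexed vs. separate row/column addresses."""
    return max(n, m), n + m


if __name__ == "__main__":
    row, col = split_address(0b1011_0110, n=4, m=4)
    print(f"RAS <- row {row}, then CAS <- column {col}")  # row 11, column 6
    print(address_pins(14, 11))  # (14, 25)
```

The pin saving is the point of the design: with 14 row bits and 11 column bits, multiplexing needs 14 address pins instead of 25, in exchange for sending the address in two steps.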

DRAM: Basic Operation
Example: accesses to (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 85), and (Row 1, Column 0) in one bank.
- Commands: Activate 0, Read 0, Read 1, Read 85, Precharge, Activate 1, Read 0.
- The row decoder opens Row 0 into the initially empty row buffer; the column mux then selects Columns 0, 1, and 85 from it. The second and third accesses are row-buffer HITs.
- The access to Row 1 is a row-buffer CONFLICT: the bank must be precharged and Row 1 activated before Column 0 can be read.

- A DRAM bank is a 2D array of cells: rows × columns.
- A DRAM row is also called a DRAM page.
- The sense amplifiers are also called the row buffer.
- Each address is a <row, column> pair.
- Access to a closed row:
  - The Activate command opens the row (places it into the row buffer).
  - The Read/Write command reads/writes a column in the row buffer.
  - The Precharge command closes the row and prepares the bank for the next access.
- Access to an open row:
  - No need for an Activate command.
  - The Read/Write command reads/writes a column in the row buffer.
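The command sequence in the example follows mechanically from the open/closed-row rules. A minimal sketch of one bank under an open-row policy (rows stay open until a conflicting access); the function name and the "empty/hit/conflict" labels are illustrative, while the command names follow the slides.

```python
# Sketch of per-bank row-buffer behavior under an open-row policy: a row stays
# in the row buffer until an access to a different row forces a precharge.
# The command names follow the lecture (Activate, Read, Precharge); the
# function name and the "empty/hit/conflict" labels are illustrative.

def commands_for(accesses):
    """Translate (row, column) reads to one bank into DRAM commands.

    Returns (commands, outcomes) where each outcome is "empty" (no open row),
    "hit" (row already open) or "conflict" (a different row was open).
    """
    open_row = None          # row currently held in the row buffer
    commands, outcomes = [], []
    for row, col in accesses:
        if open_row is None:
            outcomes.append("empty")
            commands.append(f"Activate {row}")
        elif open_row == row:
            outcomes.append("hit")
        else:
            outcomes.append("conflict")
            commands += ["Precharge", f"Activate {row}"]
        open_row = row
        commands.append(f"Read {col}")
    return commands, outcomes


if __name__ == "__main__":
    cmds, kinds = commands_for([(0, 0), (0, 1), (0, 85), (1, 0)])
    print(cmds)   # Activate 0, Read 0, Read 1, Read 85, Precharge, Activate 1, Read 0
    print(kinds)  # empty, hit, hit, conflict
```

Hits need only a Read, which is why access patterns that stay within one open row are much cheaper than ones that bounce between rows of the same bank.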

The DRAM Chip
- Consists of multiple banks (2-16 in Synchronous DRAM)
- Banks share the command/address/data buses
- The chip itself has a narrow interface (4-16 bits per read)

[Figure: DRAM: Banks]

[Figures: 128M x 8-bit DRAM Chip; The DRAM Bank Structure]

DDR3 SDRAM
- Introduced in 2007
- SDRAM = Synchronous DRAM = clocked
- DDR = Double Data Rate: data is transferred on both clock edges, so a 400 MHz clock gives 800 MT/s
- x4, x8, and x16 datapath widths
- Minimum burst length of 8
- 8 banks
- 1Gb, 2Gb, and 4Gb capacities are common
- Relative to SDR/DDR/DDR2: more bandwidth, about the same latency

[Figure: Main Memory Overview]
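The "400 MHz = 800 MT/s" relation extends to a peak-bandwidth estimate once a bus width is fixed. This sketch assumes a 64-bit channel (the DIMM width used later in the lecture); the function names are hypothetical, and sustained bandwidth in practice is lower than this peak.

```python
# Peak-bandwidth arithmetic for a DDR interface: data moves on both clock
# edges, so transfers/s = 2 x clock rate, and bytes/s = transfers/s x bus width.
# ASSUMPTION: a 64-bit (8-byte) channel, as used by the DIMMs later in the
# lecture; real sustained bandwidth is lower than this peak.

def ddr_transfer_rate_mts(clock_mhz: float) -> float:
    """Mega-transfers per second for a double-data-rate interface."""
    return 2 * clock_mhz


def peak_bandwidth_gbs(clock_mhz: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s)."""
    return ddr_transfer_rate_mts(clock_mhz) * 1e6 * (bus_bits / 8) / 1e9


if __name__ == "__main__":
    print(ddr_transfer_rate_mts(400))   # 800 MT/s, as on the slide
    print(peak_bandwidth_gbs(400, 64))  # 6.4 GB/s for a 64-bit channel
    print(peak_bandwidth_gbs(400, 8))   # 0.8 GB/s from one x8 chip
    # A minimum burst of 8 transfers on a 64-bit bus moves 64 B: one cache block.
    print(8 * (64 // 8), "bytes per burst")
```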

DRAM Modules
DRAM chips have a narrow interface (typically x4, x8, or x16), so multiple chips are put together to form a wide interface.
- DIMM: Dual Inline Memory Module
- To get a 64-bit DIMM, we need to access 8 chips with 8-bit interfaces.
- The chips share command/address lines, but not data lines.
Advantages:
- Acts like a high-capacity DRAM chip with a wide interface
- 8x capacity, 8x bandwidth, same latency
Disadvantages:
- Granularity: accesses cannot be smaller than the interface width
- 8x power

A 64-bit Wide DIMM (physical view)
[Figure: eight DRAM chips sharing the command lines, each driving its own part of the data bus.]
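The way eight x8 chips share one 64-bit transfer can be shown as splitting a word into byte lanes. This sketch assumes bit 0 is the least significant bit and that chip i drives bits 8i to 8i+7; the function names are hypothetical.

```python
# Sketch of how a rank of eight x8 chips forms one 64-bit interface: chip i
# drives data bits <8i : 8i+7>, so every 64-bit transfer carries one byte from
# each chip. ASSUMPTION: bit 0 is the least significant bit (the lecture only
# shows the lane ranges).

CHIPS = 8
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def split_to_chips(word: int) -> list[int]:
    """Return the byte each chip drives for one 64-bit transfer."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(CHIPS)]


def merge_from_chips(lanes: list[int]) -> int:
    """Reassemble the 64-bit word from the per-chip bytes."""
    word = 0
    for i, byte in enumerate(lanes):
        word |= (byte & LANE_MASK) << (LANE_BITS * i)
    return word


if __name__ == "__main__":
    w = 0x1122334455667788
    lanes = split_to_chips(w)
    print([hex(b) for b in lanes])  # chip 0 gets 0x88, chip 7 gets 0x11
    assert merge_from_chips(lanes) == w
```

Because every chip contributes a slice of every transfer, all eight work in lockstep; this is also why the access granularity cannot drop below the full 64-bit width.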

A 64-bit Wide DIMM (logical view)
[Figure]

DRAM Ranks
- A DIMM may include multiple ranks.
- Each 64-bit group of chips is called a rank. For example, a 64-bit DIMM using 8 chips with x16 interfaces has 2 ranks, since 4 x16 chips already fill 64 bits.
- All chips in a rank respond to a single command.
- Different ranks share command/address/data lines; the Chip Select signal selects between ranks.
- Ranks provide more banks across multiple chips (but don't confuse rank and bank!)
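The rank count follows from how many chips it takes to fill the 64-bit bus. A small sketch of that arithmetic; the function name is hypothetical and it assumes the widths divide evenly.

```python
# Arithmetic behind the rank example: a rank is a group of chips whose data
# pins together fill the 64-bit bus, so
#   chips_per_rank = 64 / chip_width,   ranks = total_chips / chips_per_rank.
# The function name is illustrative; it assumes the widths divide evenly.

BUS_BITS = 64


def ranks_on_dimm(total_chips: int, chip_width: int) -> int:
    if BUS_BITS % chip_width:
        raise ValueError("chip width must divide the bus width")
    chips_per_rank = BUS_BITS // chip_width
    if total_chips % chips_per_rank:
        raise ValueError("chips do not form whole ranks")
    return total_chips // chips_per_rank


if __name__ == "__main__":
    print(ranks_on_dimm(8, 16))   # 2: the x16 example (4 chips per rank)
    print(ranks_on_dimm(8, 8))    # 1: eight x8 chips form a single rank
    print(ranks_on_dimm(16, 8))   # 2: e.g. 8 x8 chips on each side of a DIMM
```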

The DRAM Subsystem: The Top-Down View
DRAM subsystem organization: Channel → DIMM → Rank → Chip → Bank → Row/Column

The DRAM Subsystem
[Figure: a processor with two memory channels, each populated with DIMMs (Dual In-line Memory Modules).]

DRAM Channels
- Channel: a set of DIMMs in series. All DIMMs get the same command; one of the ranks replies.
- System options:
  - Single-channel system
  - Multiple dependent (lock-step) channels: a single controller with a wider interface (faster cache line refill!), sometimes called Gang Mode. This only works if the DIMMs are identical (organization, timing).
  - Multiple independent channels. Tradeoffs: requires multiple controllers; cost: pins, wires, controller; benefit: higher bandwidth, capacity, flexibility.

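DRAM Channel Options
[Figure: lock-step vs. independent channel configurations, showing a CPU and its memory controllers (MC).]

Multi-CPU (old school)
[Figure: CPUs on a front-side bus to an external memory controller (MC).]
- The external memory controller adds latency.
- Capacity does not grow with the number of CPUs.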

NUMA Topology (modern)
[Figure: each CPU has its own memory controllers (MC); the CPUs are connected by QPI links.]
- Capacity grows with the number of CPUs.
- NUMA: Non-Uniform Memory Access

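Breaking down a DIMM
[Figure: side, front, and back views of a DIMM (Dual In-line Memory Module).]
- Rank 0: a collection of 8 chips on the front of the DIMM; Rank 1 is on the back.
- Both ranks share the memory channel's address/command lines and the 64-bit data bus (Data <0:63>); the chip-select signals CS <0:1> select which rank responds.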

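Breaking down a Rank
Rank 0 consists of Chip 0, Chip 1, ..., Chip 7. Chip 0 drives data bits <0:7>, Chip 1 drives <8:15>, ..., and Chip 7 drives <56:63>; together they form Data <0:63>.

Breaking down a Chip
[Figure: Chip 0 contains multiple banks (Bank 0, ...).]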

Breaking down a Bank
Bank 0 holds rows 0 through 16k-1. Each row is 2 kB wide and is divided into 1 B columns; an activated row is held in the row buffer, from which individual 1 B columns are read.
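Multiplying down the hierarchy gives the capacity at each level. This sketch takes 16K rows of 2 kB per bank from the slide and assumes 8 banks per chip (the DDR3 figure given earlier) and 8 chips per rank; the function names are hypothetical.

```python
# Capacity arithmetic for the hierarchy above (rank -> chip -> bank -> row).
# ASSUMPTIONS: 8 banks per chip (the DDR3 figure given earlier) and 8 chips
# per rank; 16K rows of 2 kB per bank as on the slide.

KB = 1024


def bank_bytes(rows: int, row_bytes: int) -> int:
    return rows * row_bytes


def rank_bytes(chips: int, banks: int, rows: int, row_bytes: int) -> int:
    return chips * banks * bank_bytes(rows, row_bytes)


if __name__ == "__main__":
    per_bank = bank_bytes(16 * KB, 2 * KB)
    print(per_bank // (KB * KB), "MiB per bank")    # 32 MiB per bank
    per_chip = 8 * per_bank
    print(per_chip * 8 // 2**30, "Gbit per chip")   # 2 Gbit per chip
    total = rank_bytes(8, 8, 16 * KB, 2 * KB)
    print(total // 2**30, "GiB per rank")           # 2 GiB per rank
```

Under these assumptions a single rank holds 2 GB, which matches the 2 GB single-channel system used in the address-mapping examples.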

Example: Transferring a cache block
- The 64 B cache block occupying physical addresses 0x00 to 0x40 (in a physical memory space that extends up to 0xFFFF F) is mapped to Channel 0, DIMM 0, Rank 0.
- Within Rank 0, each of Chip 0 through Chip 7 drives one byte lane of the data bus (<0:7>, <8:15>, ..., <56:63>).
- The block resides in Row 0. Reading Row 0, Column 0 delivers the first 8 B of the block on Data <0:63> (one byte from each chip); reading Column 1 delivers the next 8 B, and so on.
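The column-by-column reads in this example can be sketched as follows. The layout is an assumption (byte j of each 8 B beat is held by chip j, and the block's beats occupy consecutive columns of one row); the slides show only the lane ranges, and the function names are hypothetical.

```python
# Sketch of the transfer in this example: a 64 B cache block stored in one row
# of a rank of eight x8 chips comes out as 8 column reads, each delivering 8 B
# (one byte per chip) on the 64-bit bus.
# ASSUMPTION: byte j of each 8 B beat is held by chip j, and consecutive beats
# sit in consecutive columns; the slides show only the lane ranges.

BLOCK_BYTES = 64
CHIPS = 8


def store_block(block: bytes) -> list[list[int]]:
    """Lay out a cache block as chip_arrays[chip][column] = one byte."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected a 64-byte cache block")
    columns = BLOCK_BYTES // CHIPS
    return [[block[col * CHIPS + chip] for col in range(columns)]
            for chip in range(CHIPS)]


def read_block(chip_arrays: list[list[int]]) -> list[bytes]:
    """Read columns 0, 1, ... in sequence; each read is one 8 B bus beat."""
    beats = []
    for col in range(len(chip_arrays[0])):
        beats.append(bytes(chip_arrays[chip][col] for chip in range(CHIPS)))
    return beats


if __name__ == "__main__":
    block = bytes(range(BLOCK_BYTES))
    beats = read_block(store_block(block))
    print(len(beats), "I/O cycles")      # 8 I/O cycles
    assert b"".join(beats) == block      # the block is reassembled in order
```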

A 64 B cache block takes 8 I/O cycles to transfer. During the process, 8 columns are read sequentially.

Address Mapping (Single Channel)
Page/row interleaving: consecutive rows of memory are placed in consecutive banks.
[Figure: Pages 0 to 7 distributed over Banks 0 to 3.]
Address format (high to low bits): page index (r bits) | bank (k bits) | page offset (p bits)

Address Mapping (Single Channel)
Single-channel system with an 8-byte memory bus: 2 GB memory, 8 banks, 16K rows and 2K columns per bank.
Row interleaving address format: Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits)

Address Mapping (Single Channel)
Cache block interleaving: consecutive cache block addresses are placed in consecutive banks.
[Figure: cachelines 0 and 4 in Bank 0, 1 and 5 in Bank 1, 2 and 6 in Bank 2, 3 and 7 in Bank 3.]
Address format (high to low bits): page index (r bits) | page offset (p-b bits) | bank (k bits) | page offset (b bits)

For the same single-channel system (8-byte bus, 2 GB, 8 banks, 16K rows and 2K columns per bank) with 64-byte cache blocks, cache block interleaving splits the column field around the bank bits:
Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Column (3 bits) | Byte in bus (3 bits)
The 3 low column bits select one of the 8 bus transfers within a 64-byte block (8 × 8 B), so a whole block stays in one bank and row, while the next block moves to the next bank.
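The two bit layouts can be compared by decoding the same addresses under each. The field widths are from the slides; the helper names are hypothetical. Consecutive 64 B blocks all fall in bank 0 under row interleaving but cycle through banks 0, 1, 2, ... under cache block interleaving.

```python
# Sketch: decode a physical address under the two single-channel mappings on
# these slides (2 GB, 8 banks, 16K rows, 2K columns, 8-byte bus, 64 B blocks).
#   Row interleaving:    Row(14) | Bank(3) | Column(11) | Byte(3)
#   Block interleaving:  Row(14) | HighCol(8) | Bank(3) | LowCol(3) | Byte(3)
# Field widths come from the slides; the helper names are illustrative.

def bits(value: int, lo: int, width: int) -> int:
    """Extract `width` bits of `value` starting at bit `lo`."""
    return (value >> lo) & ((1 << width) - 1)


def row_interleaved(addr: int) -> dict:
    return {"row": bits(addr, 17, 14), "bank": bits(addr, 14, 3),
            "col": bits(addr, 3, 11), "byte": bits(addr, 0, 3)}


def block_interleaved(addr: int) -> dict:
    high_col, low_col = bits(addr, 9, 8), bits(addr, 3, 3)
    return {"row": bits(addr, 17, 14), "bank": bits(addr, 6, 3),
            "col": (high_col << 3) | low_col, "byte": bits(addr, 0, 3)}


if __name__ == "__main__":
    for block in range(4):                     # four consecutive 64 B blocks
        addr = block * 64
        print(block, row_interleaved(addr)["bank"], block_interleaved(addr)["bank"])
```

Cache block interleaving spreads a sequential stream over all banks so their accesses can overlap, while row interleaving keeps a stream in one bank's open row to maximize row-buffer hits.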