ECE 152 Introduction to Computer Architecture: Main Memory


Introduction to Computer Architecture: Main Memory and Virtual Memory
Copyright 2012 Daniel J. Sorin, Duke University. Slides are derived from work by Amir Roth (Penn). Spring 2012.

Where We Are in This Course Right Now
- So far:
  - We know how to design a processor that can fetch, decode, and execute the instructions in an ISA
  - We can pipeline this processor
  - We understand how to design caches
- Now:
  - We learn how to implement main memory in DRAM
  - We learn about virtual memory
- Next:
  - We learn about the lowest level of storage (disks) and I/O

This Unit: Main Memory
- Memory hierarchy review
- DRAM technology
  - A few more transistors
  - Organization: two-level addressing
- Building a memory system
  - Bandwidth matching
  - Error correction
- Organizing a memory system
  - Virtual memory
  - Address translation and page tables
  - A virtual memory hierarchy

Readings
- Patterson and Hennessy: still in Chapter 5

Memory Hierarchy Review
- Storage: registers, memory, disk
  - Memory is the fundamental element (unlike caches or disk)
- Memory component performance: t_avg = t_hit + %miss * t_miss
  - Can't get both low t_hit and low %miss in a single structure
- Memory hierarchy
  - Upper components: small, fast, expensive
  - Lower components: big, slow, cheap
  - t_avg of the hierarchy is close to t_hit of the upper (fastest) component
  - 10/90 rule: 90% of stuff is found in the fastest component
  - Temporal/spatial locality: automatic up-down movement

Concrete Memory Hierarchy
(figure: CPU with L1 I$ and D$, then L2, main memory, disk (swap))
- 1st/2nd (/3rd) levels: caches (L1 I$, L1 D$, L2)
  - Made of SRAM, managed in hardware (previous unit of this course)
- Below the caches: main memory
  - Made of DRAM, managed in software (this unit of the course)
- Below memory: disk (swap space)
  - Made of magnetic iron oxide disks, managed in software (next unit)

RAM in General (SRAM and DRAM)
(figure: a 4x2 bit array with wordlines 0-3 and bitlines 0-1)
- RAM: large storage arrays
- Basic structure: an M x N array of bits (M N-bit words); the one shown is 4x2
  - Bits in a word are connected by a wordline
  - Bits in a position are connected by a bitline
- Operation
  - Address decodes into M wordlines
  - Assert a wordline -> word appears on the bitlines
  - Bit/bitline connection -> read/write
- Access latency: grows with #ports and #bits

SRAM
- SRAM: static RAM
  - Bits stored as cross-coupled inverters
  - Four transistors per bit; more transistors for more ports
- "Static" means
  - Inverters are connected to pwr/gnd
  - Bits are naturally/continuously refreshed; bit values never decay
- Designed for speed
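To make the t_avg formula concrete, here is a minimal sketch in C. The latencies and miss rates are illustrative assumptions (not numbers from the course), and each level's miss penalty is taken to be the t_avg of the level below it:

```c
#include <stdio.h>

/* Average access time of one level: t_avg = t_hit + %miss * t_miss.
   Applied recursively, the miss penalty of one level is the t_avg of
   the level below it. Latencies here are assumed, not measured. */
static double t_avg(double t_hit, double miss_rate, double t_miss) {
    return t_hit + miss_rate * t_miss;
}

int main(void) {
    double t_mem = 30.0;                       /* main memory hit, ns  */
    double t_l2  = t_avg(10.0, 0.20, t_mem);   /* L2: 10ns hit, 20% miss */
    double t_l1  = t_avg(2.0,  0.05, t_l2);    /* L1: 2ns hit, 5% miss */
    printf("L2 t_avg = %.2f ns\n", t_l2);      /* 10 + 0.2*30 = 16 ns  */
    printf("L1 t_avg = %.2f ns\n", t_l1);      /* 2 + 0.05*16 = 2.8 ns */
    return 0;
}
```

Note how the hierarchy's overall t_avg (2.8ns here) stays close to the L1 hit time, as the 10/90 rule suggests.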

DRAM
- DRAM: dynamic RAM
  - Bits stored as capacitors (if charged, bit = 1)
  - Pass transistors as ports; one transistor per bit/port
- "Dynamic" means
  - Capacitors are not connected to pwr/gnd
  - Stored charge decays over time; must be explicitly refreshed
- Designed for density

Moore's Law (DRAM capacity)

  Year | Capacity | $/MB   | Access time
  1980 | 64Kb     | $1500  | 250ns
  1988 | 4Mb      | $50    | 120ns
  1996 | 64Mb     | $10    | 60ns
  2004 | 1Gb      | $0.5   | 35ns
  2008 | 4Gb      | ~$0.15 | 20ns

- Commodity DRAM parameters
- 16X increase in capacity every 8 years = 2X every 2 years
  - Not quite 2X every 18 months (Moore's Law), but still close

DRAM Operation I
(figure: bit array with sense amplifiers on the bitlines)
- Read: similar to SRAM read
  - Phase I: pre-charge bitlines (to ~0.5V)
  - Phase II: decode address, enable wordline
    - Capacitor swings the bitline voltage up (down)
    - Sense amplifier interprets the swing as 1 (0)
  - Destructive read: the word's bits are now discharged (unlike SRAM)
- Write: similar to SRAM write
  - Phase I: decode address, enable wordline
  - Phase II: enable bitlines; high bitlines charge the corresponding capacitors
- What about leakage over time?

DRAM Operation II
(figure: bit array with sense amplifiers and a row of D-latches)
- Solution: add a set of D-latches (the row buffer)
- Read: two steps
  - Step I: read the selected word into the row buffer
  - Step IIA: read the row buffer out to the pins
  - Step IIB: write the row buffer back to the selected word
  + Solves the destructive read problem
- Write: two steps
  - Step IA: read the selected word into the row buffer (the destructive read deletes what was in that word)
  - Step IB: write the data into the row buffer
  - Step II: write the row buffer back to the selected word
  + Also helps to solve the leakage problem
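The two-step read with a row buffer can be seen in a toy model. This is a sketch of the behavior only; the sizes and names are invented, and real DRAM works on analog charge, not ints:

```c
#include <stdio.h>
#include <string.h>

/* Toy model of one DRAM array plus a row buffer, illustrating why the
   row buffer fixes destructive reads. Sizes are arbitrary. */
#define ROWS 4
#define COLS 8

static int array[ROWS][COLS];   /* capacitor states (1 = charged) */
static int row_buf[COLS];       /* D-latches: the row buffer      */

static void activate(int r) {              /* Step I: destructive read */
    memcpy(row_buf, array[r], sizeof row_buf);
    memset(array[r], 0, sizeof array[r]);  /* capacitors now discharged */
}

static void precharge(int r) {             /* Step II: write buffer back */
    memcpy(array[r], row_buf, sizeof array[r]);
}

int main(void) {
    array[2][5] = 1;
    activate(2);                              /* read row 2 into buffer */
    printf("read bit = %d\n", row_buf[5]);    /* value available at pins */
    row_buf[3] = 1;                           /* a write updates the latch */
    precharge(2);                             /* restore (refresh) row 2 */
    printf("array[2][3]=%d array[2][5]=%d\n", array[2][3], array[2][5]);
    return 0;
}
```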

DRAM Refresh
- DRAM periodically refreshes all of its contents
- Loops through all words
  - Reads each word into the row buffer
  - Writes the row buffer back into the DRAM array
- 1-2% of DRAM time is occupied by refresh

DRAM Parameters
(figure: square DRAM bit array with a row buffer)
- Large capacity: e.g., 1-4Gb
- Arranged as a square
  + Minimizes wire length
  + Maximizes refresh efficiency
- Narrow data interface: 1-16 bits
  - Cheap packages -> few bus pins (pins are expensive)
- Narrow address interface: N/2 address bits
  - A 16Mb DRAM had a 12-bit address bus; how does that work?

Access Time and Cycle Time
- DRAM access is slower than SRAM access
  - More bits -> longer wires
  - Buffered access with two-level addressing
  - SRAM access latency: 2-3ns; DRAM access latency: 20-35ns
- DRAM cycle time is also longer than its access time
  - Cycle time: time between the starts of consecutive accesses
  - SRAM: cycle time = access time (begin the second access as soon as the first finishes)
  - DRAM: cycle time = 2 * access time
  - Why? Can't begin a new access while the DRAM is refreshing a row

Brief History of DRAM
- DRAM (memory): a major force behind the computer industry
- Modern DRAM came with the introduction of the IC (1970)
- Preceded by magnetic core memory (1950s)
  - Core more closely resembles today's disks than today's memory
  - "Core dump" is legacy terminology
- And by mercury delay lines before that (ENIAC): re-circulating vibrations in mercury tubes
- "...the one single development that put computers on their feet was the invention of a reliable form of memory, namely the core memory... Its cost was reasonable, it was reliable, and because it was reliable it could in due course be made large." (Maurice Wilkes, Memoirs of a Computer Pioneer, 1985)
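As a rough plausibility check on the refresh-overhead claim, the arithmetic can be scripted. All three parameters below are assumptions chosen to be typical, not vendor numbers:

```c
#include <stdio.h>

/* Back-of-the-envelope check of the "1-2% refresh" claim: every row
   must be refreshed once per retention window, and one refresh ties up
   the array for one full cycle. All numbers are assumed. */
int main(void) {
    double rows         = 8192.0;   /* rows in one array (assumed) */
    double cycle_ns     = 40.0;     /* one read + write-back, ns   */
    double retention_ms = 64.0;     /* typical retention window    */

    double busy_ns = rows * cycle_ns;               /* refresh work */
    double frac    = busy_ns / (retention_ms * 1e6);
    printf("refresh overhead = %.2f%%\n", frac * 100.0);
    return 0;
}
```

With these assumptions the overhead comes out near 0.5%, the same order of magnitude as the slide's 1-2% figure.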

A Few Flavors of DRAM
- DRAM comes in several different varieties
  - Go to Dell.com and see what kinds you can get for your laptop
- SDRAM = synchronous DRAM
  - Fast, clocked DRAM technology; very common now
  - Several flavors: DDR, DDR2, DDR3
- RDRAM = Rambus DRAM: very fast, expensive DRAM
- GDRAM = graphics DRAM: GDDR

DRAM Packaging
- DIMM = dual inline memory module
  - E.g., 8 DRAM chips, each chip 4 or 8 bits wide

DRAM: A Vast Topic
- Many flavors of DRAM: DDR3 SDRAM, RDRAM, GDRAM, etc.
- Many ways to package them: SIMM, DIMM, FB-DIMM, etc.
- Many different parameters that characterize their timing: t_RC, t_RAC, t_RCD, t_RAS, etc.
- Many ways of using the row buffer for caching, etc.
- There's at least one whole textbook on this topic, and it has ~1K pages
- We could, but won't, spend the rest of the semester on DRAM

This Unit: Main Memory (outline repeated; next up: building a memory system)

Building a Memory System
(figure: CPU with I$/D$, L2, main memory, disk (swap))
- How do we build an efficient main memory out of standard DRAM chips?
  - How many DRAM chips?
  - What width/speed data bus to use? (assume a separate address bus)

An Example Memory System
- Parameters
  - 32-bit machine
  - L2 with 32B blocks (must pull 32B out of memory at a time)
  - 4M x 16b DRAMs, 20ns access time, 40ns cycle time (each chip is 4M x 16b = 8MB)
  - 100MHz (10ns period) bus
- How many DRAM chips? How wide to make the data bus?

First Memory System Design
- 1 DRAM chip + 16b (= 2B) data bus

  T (ns) | DRAM    | Data Bus
  10     | [31:30] |
  20     | [31:30] |
  30     | refresh | [31:30]
  40     | refresh |
  50     | [29:28] |
  60     | [29:28] |
  70     | refresh | [29:28]
  80     | refresh |
  ...    | ...     | ...
  610    | [1:0]   |
  620    | [1:0]   |
  630    | refresh | [1:0]
  640    | refresh |

- Access time: 630ns
- Cycle time: 640ns (when the DRAM is ready to handle another miss)
- Observation: the bus is idle 75% of the time! We have over-designed the bus
  - Can we use a cheaper bus?

Second Memory System Design
- 1 DRAM chip + 4b data bus
  - With one DRAM chip, we don't need a 16b bus
  - DRAM: 2B / 40ns = 4b / 10ns -> balanced system, matched bandwidths

  T (ns) | DRAM    | 4b Bus
  10     | [31:30] |
  20     | [31:30] |
  30     | refresh | [31H]
  40     | refresh | [31L]
  50     | [29:28] | [30H]
  60     | [29:28] | [30L]
  70     | refresh | [29H]
  80     | refresh | [29L]
  ...    | ...     | ...
  610    | [1:0]   | [2H]
  620    | [1:0]   | [2L]
  630    | refresh | [1H]
  640    | refresh | [1L]
  650    |         | [0H]
  660    |         | [0L]

- Access time: 660ns (30ns longer = +4%)
- Cycle time: 640ns (same as before)
+ Much cheaper!
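The first two designs can be checked with a few lines of C; the numbers below come straight from the example system's parameters:

```c
#include <stdio.h>

/* Checks the first two designs: one 2 B-wide DRAM chip (40 ns cycle)
   feeding either a 16-bit or a 4-bit 100 MHz bus, moving a 32 B block. */
int main(void) {
    double block   = 32.0;              /* L2 block, bytes        */
    double chip_bw = 2.0 / 40.0;        /* B per ns from the chip */
    double bus_ns  = 10.0;              /* 100 MHz bus period     */
    double bus_B[] = { 2.0, 0.5 };      /* 16-bit bus, 4-bit bus  */

    for (int i = 0; i < 2; i++) {
        double bus_bw = bus_B[i] / bus_ns;
        double bw     = chip_bw < bus_bw ? chip_bw : bus_bw;
        double cycle  = block / bw;                /* DRAM-limited: 640 ns */
        double busy   = (block / bus_bw) / cycle;  /* fraction bus is busy */
        printf("%2.0f b bus: cycle %g ns, bus busy %.0f%%\n",
               bus_B[i] * 8, cycle, busy * 100.0);
    }
    return 0;
}
```

The 16b bus shows 25% utilization (idle 75% of the time) and the 4b bus shows 100%: the balanced system delivers the same 640ns cycle time with a much cheaper bus.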

Third Memory System Design
- How fast can we go? 16 DRAM chips + a 32B data bus
- Stripe the data across the chips: byte M is in chip (M/2) % 16
  - E.g., byte 38 is in chip 3 (38/2 = 19, and 19 mod 16 = 3)

  T (ns) | DRAM0   | DRAM1   | ... | DRAM15  | Bus
  10     | [31:30] | [29:28] | ... | [1:0]   |
  20     | [31:30] | [29:28] | ... | [1:0]   |
  30     | refresh | refresh | ... | refresh | [31:0]
  40     | refresh | refresh | ... | refresh |

- Access time: 30ns; cycle time: 40ns
- But a 32B bus is very expensive

Latency and Bandwidth
- In general, given the bus parameters, find the smallest number of chips that minimizes the cycle time
- Approach: match bandwidths between the DRAMs and the bus (see the sketch after these slides)
- If they don't match, you're paying too much for the one with more bandwidth

Fourth Memory System Design
- 4 DRAM chips + a 16b (2B) data bus
  - Bus bandwidth: 2B / 10ns; DRAM bandwidth: 2B / 40ns per chip -> 4 chips match the bus
- Access time: 180ns; cycle time: 160ns

  T (ns) | DRAM0   | DRAM1   | DRAM2   | DRAM3   | Bus
  10     | [31:30] | [29:28] | [27:26] | [25:24] |
  20     | [31:30] | [29:28] | [27:26] | [25:24] |
  30     | refresh | refresh | refresh | refresh | [31:30]
  40     | refresh | refresh | refresh | refresh | [29:28]
  50     | [23:22] | [21:20] | [19:18] | [17:16] | [27:26]
  60     | [23:22] | [21:20] | [19:18] | [17:16] | [25:24]
  ...    | ...     | ...     | ...     | ...     | ...
  110    | refresh | refresh | refresh | refresh | [15:14]
  120    | refresh | refresh | refresh | refresh | [13:12]
  130    | [7:6]   | [5:4]   | [3:2]   | [1:0]   | [11:10]
  140    | [7:6]   | [5:4]   | [3:2]   | [1:0]   | [9:8]
  150    | refresh | refresh | refresh | refresh | [7:6]
  160    | refresh | refresh | refresh | refresh | [5:4]
  170    |         |         |         |         | [3:2]
  180    |         |         |         |         | [1:0]

Memory Access and Clock Frequency
- The nominal clock frequency applies to the CPU and caches
- The memory bus has its own clock, typically much slower
  - SDRAM operates on the bus clock
- Another reason why processor clock frequency isn't a perfect performance metric
  - Clock frequency increases don't reduce memory or bus latency
  - They may make misses come out faster
- At some point memory bandwidth may become a bottleneck
  - Further increases in (core) clock speed won't help at all
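The bandwidth-matching rule can be applied to all four designs at once. A minimal sketch, again using the example system's parameters; the cycle time of each design is bounded by whichever side (chips or bus) has less bandwidth:

```c
#include <stdio.h>

/* Bandwidth matching across all four designs: 2 B-wide DRAM chips with
   a 40 ns cycle time, a 100 MHz (10 ns) bus, 32 B L2 blocks. */
struct design { int chips; double bus_B; };

int main(void) {
    struct design d[] = { {1, 2.0}, {1, 0.5}, {16, 32.0}, {4, 2.0} };
    double block = 32.0, chip_bw = 2.0 / 40.0, bus_ns = 10.0;

    for (int i = 0; i < 4; i++) {
        double dram_bw = d[i].chips * chip_bw;   /* B/ns from the chips */
        double bus_bw  = d[i].bus_B / bus_ns;    /* B/ns on the bus     */
        double bw      = dram_bw < bus_bw ? dram_bw : bus_bw;
        printf("design %d: %2d chip(s), %4.1f B bus -> cycle >= %g ns\n",
               i + 1, d[i].chips, d[i].bus_B, block / bw);
    }
    return 0;
}
```

This reproduces the slides' cycle times: 640ns, 640ns, 40ns, and 160ns.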

Error Detection and Correction
- One last thing about DRAM technology: errors
- DRAM fails at a higher rate than SRAM or CPU logic
  - Capacitor wear
  - Bit flips from energetic α-particle strikes
  - Many more bits
- Modern DRAM systems have built-in error detection/correction
- Key idea: checksum-style redundancy
  - Main DRAM chips store the data x; additional chips store f(x)
  - On read: re-compute f(x) and compare it with the stored f(x); different? Error!
  - Option I (detect): kill the program
  - Option II (correct): enough information to fix the error? Fix it and go on

Error Detection and Correction
(figure: data chips 0-3 plus one f(x) chip; a comparator flags a mismatch as an error)
- Error detection/correction schemes are distinguished by
  - How many (simultaneous) errors they can detect
  - How many (simultaneous) errors they can correct

Error Detection Example: Parity
- Parity: the simplest scheme
  - f(x_{N-1}, ..., x_0) = XOR(x_{N-1}, ..., x_1, x_0)
  + Single-error detect: detects a single bit flip (the common case)
    - Will miss two simultaneous bit flips, but what are the odds of that happening?
  - Zero-error correct: no way to tell which bit flipped
- Many other schemes exist for detecting/correcting errors
  - Take ECE 254 (Fault Tolerant Computing) for more info

Memory Organization
- So the data is striped across DRAM chips, but how is the memory organized?
  - Block size? Associativity? Replacement policy?
  - Write-back vs. write-through? Write-allocate vs. write-non-allocate? Write buffer?
  - Optimizations: victim buffer, prefetching, anything else?
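Parity is easy to demonstrate in a few lines of C. The 8-bit word size and the specific flipped bits below are arbitrary choices for illustration:

```c
#include <stdio.h>

/* Parity as described on the slide: f(x) = XOR of all data bits.
   Detects any single bit flip; corrects nothing. */
static unsigned parity(unsigned x) {
    unsigned p = 0;
    while (x) { p ^= (x & 1); x >>= 1; }   /* XOR of all bits */
    return p;
}

int main(void) {
    unsigned data = 0x5A;          /* stored data x         */
    unsigned f    = parity(data);  /* stored check bit f(x) */

    unsigned read1 = data ^ 0x08;  /* one bit flipped       */
    unsigned read2 = data ^ 0x18;  /* two bits flipped      */

    printf("1 flip : %s\n", parity(read1) != f ? "error detected" : "looks ok");
    printf("2 flips: %s\n", parity(read2) != f ? "error detected" : "looks ok");
    return 0;
}
```

The double flip goes unnoticed ("looks ok"), which is exactly the limitation the slide warns about.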

Low %miss At All Costs
- For a memory component: a t_hit vs. %miss tradeoff
- Upper components (I$, D$) emphasize low t_hit
  - Frequent access -> minimal t_hit is important
  - t_miss is not bad -> minimal %miss is less important
  - Low capacity/associativity/block size; write-back or write-through
- Moving down (L2), the emphasis turns to %miss
  - Infrequent access -> minimal t_hit is less important
  - t_miss is bad -> minimal %miss is important
  - High capacity/associativity/block size; write-back
- For memory, the emphasis is entirely on %miss
  - t_miss is a disk access time (measured in ms, not ns)

Typical Memory Organization Parameters

  Parameter          | I$/D$     | L2        | Main Memory
  t_hit              | 1-2ns     | 10ns      | 30ns
  t_miss             | 10ns      | 30ns      | 10ms (10M ns)
  Capacity           | 8-64KB    | 128KB-2MB | 512MB-8GB
  Block size         | 16-32B    | 32-256B   | 8-64KB pages
  Associativity      | 1-4       | 4-16      | Full
  Replacement policy | NMRU      | NMRU      | working set
  Write-through?     | Sometimes | No        | No
  Write-allocate?    | Sometimes | Yes       | Yes
  Write buffer?      | Yes       | Yes       | No
  Victim buffer?     | Yes       | No        | No
  Prefetching?       | Sometimes | Yes       | Sometimes

One Last Gotcha
- On a 32-bit architecture, there are 2^32 byte addresses
  - That requires 4 GB of memory, but not everyone buys machines with 4 GB of memory
  - And what about 64-bit architectures?
- Let's take a step back... (a quick check of this arithmetic follows)
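The gotcha's arithmetic, scripted as a quick sketch of how much DRAM a flat address space would need if every byte address had to be backed by memory:

```c
#include <stdio.h>
#include <stdint.h>

/* 32-bit: 2^32 byte addresses; 64-bit: ~2^64 byte addresses. */
int main(void) {
    uint64_t bytes32 = 1ULL << 32;   /* 2^32 byte addresses  */
    uint64_t bytes64 = UINT64_MAX;   /* 2^64 - 1, ~= 2^64    */

    printf("32-bit: %llu bytes = %llu GB\n",
           (unsigned long long)bytes32,
           (unsigned long long)(bytes32 >> 30));   /* 4 GB   */
    printf("64-bit: ~%.1f exabytes\n",
           (double)bytes64 / (1ULL << 60));        /* ~16 EB */
    return 0;
}
```

No one backs 16 exabytes with DRAM, which is one motivation for the virtual memory mechanisms this unit turns to next.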