Storage Technologies and the Memory Hierarchy

Storage Technologies and the Memory Hierarchy
198:231 Introduction to Computer Organization, Lecture 12
Instructor: Nicole Hynes (nicole.hynes@rutgers.edu)
Credits: Slides courtesy of R. Bryant and D. O'Hallaron, CMU

Typical Computer Memory Hierarchy
L0: Registers
L1: L1 cache (SRAM)
L2: L2 cache (SRAM)
L3: L3 cache (SRAM)
L4: Main memory (DRAM)
L5: Local secondary storage (local disks)

Random-Access Memory (RAM)
Key features:
- RAM is traditionally packaged as a chip.
- The basic storage unit is a cell (one bit per cell).
- Multiple RAM chips form a memory.

Static RAM (SRAM)
- Each cell stores a bit with a four- or six-transistor circuit.
- Retains its value indefinitely, as long as it is kept powered.
- Relatively insensitive to electrical noise (EMI), radiation, etc.
- Faster and more expensive than DRAM.

Dynamic RAM (DRAM)
- Each cell stores a bit with a capacitor; one transistor is used for access.
- The value must be refreshed every 10-100 ms.
- More sensitive to disturbances (EMI, radiation, ...) than SRAM.
- Slower and cheaper than SRAM.

SRAM vs DRAM Summary

        Trans./bit  Access time  Needs refresh?  Needs ECC*?  Cost  Applications
SRAM    4 or 6      1x           No              Maybe        100x  Cache memories
DRAM    1           10x          Yes             Yes          1x    Main memories, frame buffers

*Error Correcting Code

Conventional DRAM Organization
A d x w DRAM stores dw total bits, organized as d supercells of w bits each. The supercells are arranged as a rectangular array with r rows and c columns, where r x c = d. Example: a 16 x 8 DRAM chip has 16 supercells in a 4 x 4 array; the memory controller sends a 2-bit address (addr) for the row and then the column, and 8 bits of data are transferred at a time through an internal row buffer.
[Figure: 16 x 8 DRAM chip showing the memory controller, the 4 x 4 supercell array (rows 0-3, cols 0-3) with supercell (2,1) highlighted, and the internal row buffer.]
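As a sketch (not tied to any particular chip), the controller's split of a linear supercell index into the (row, col) pair used by row/column addressing can be written as:

```python
# Sketch: mapping a linear supercell index to (row, col) coordinates in a
# d x w DRAM. Here d = 16 supercells arranged as 4 rows x 4 columns, as in
# the 16 x 8 example chip above.
D_SUPERCELLS = 16
COLS = 4  # c, with r * c = d

def supercell_coords(index):
    """Return the (row, col) coordinates of a linear supercell index."""
    assert 0 <= index < D_SUPERCELLS
    return index // COLS, index % COLS

# Linear index 9 corresponds to supercell (2, 1) -- the cell read in the
# RAS/CAS walkthrough that follows.
```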

Reading DRAM Supercell (2,1)
Step 1(a): The row access strobe (RAS) selects row 2.
Step 1(b): Row 2 is copied from the DRAM array into the internal row buffer.

Reading DRAM Supercell (2,1), continued
Step 2(a): The column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) is copied from the row buffer onto the data lines, and eventually back to the CPU.

Memory Modules
A 64 MB memory module consists of eight 8M x 8 DRAM chips (DRAM 0 through DRAM 7). To read the 64-bit doubleword at main memory address A, the memory controller broadcasts the same (row = i, col = j) address to all eight chips, and each chip supplies one byte of the doubleword: DRAM 0 provides bits 0-7, DRAM 1 bits 8-15, ..., DRAM 7 bits 56-63. The controller assembles the eight bytes into the 64-bit doubleword.
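The byte-slicing across chips can be illustrated with a small sketch (the chip-to-bit-range assignment follows the slide; the function names are just for illustration):

```python
# Sketch of how a 64-bit doubleword is sliced across the eight 8M x 8 DRAM
# chips of the module above: chip k stores bits 8k..8k+7.
def split_doubleword(value):
    """Return the byte each of the 8 DRAM chips stores (chip 0 first)."""
    return [(value >> (8 * k)) & 0xFF for k in range(8)]

def assemble_doubleword(chip_bytes):
    """The memory controller reassembles the 64-bit doubleword."""
    return sum(b << (8 * k) for k, b in enumerate(chip_bytes))
```

Splitting and reassembling round-trips the original value, which is exactly what the controller does on every 64-bit read.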

Enhanced DRAMs
The basic DRAM cell has not changed since its invention in 1966; it was commercialized by Intel in 1970. Enhanced DRAMs pair the same cores with better interface logic and faster I/O:
- Synchronous DRAM (SDRAM): uses a conventional clock signal instead of asynchronous control, allows reuse of the row address (e.g., RAS, CAS, CAS, CAS), and can output the contents of a supercell at a faster rate than asynchronous DRAMs.
- Double data-rate synchronous DRAM (DDR SDRAM): double-edge clocking sends two bits per cycle per pin. The generations are distinguished by the size of a small prefetch buffer: DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits). By 2010, DDR was standard for most server and desktop systems.
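A back-of-the-envelope consequence of double-edge clocking: peak transfer rate doubles for the same bus clock. The figures below (200 MHz bus clock, 64-bit bus) are illustrative assumptions, not from the slides:

```python
# Rough peak-bandwidth estimate for a DDR memory bus: double-edge clocking
# performs two transfers per clock cycle across the 64-bit (8-byte) bus.
# The 200 MHz clock below is an assumed example value.
def ddr_peak_bandwidth(bus_clock_hz, bus_bytes=8, transfers_per_cycle=2):
    """Peak bytes/second moved over the memory bus."""
    return bus_clock_hz * transfers_per_cycle * bus_bytes

# At a 200 MHz bus clock: 200e6 * 2 * 8 = 3.2 GB/s peak.
```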

Nonvolatile Memories
DRAM and SRAM are volatile memories: they lose their information if powered off. Nonvolatile memories retain their value even without power:
- Read-only memory (ROM): programmed during production.
- Programmable ROM (PROM): can be programmed once.
- Erasable PROM (EPROM): can be bulk erased (UV, X-ray).
- Electrically erasable PROM (EEPROM): electronic erase capability.
- Flash memory: EEPROMs with partial (sector) erase capability; wears out after about 100,000 erasures. Ubiquitous in digital cameras, mobile phones, USB flash drives, etc.

Uses for nonvolatile memories:
- Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
- Solid state disks (replacing rotating disks in thumb drives, smart phones, MP3 players, tablets, laptops, ...)
- Disk caches

Traditional Bus Structure Connecting CPU and Memory
A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices. The CPU chip (register file and ALU) connects through its bus interface to the system bus, which reaches an I/O bridge; the memory bus then connects the I/O bridge to main memory.

Memory Read Transaction (1)
Load operation: movl A, %eax
The CPU places address A on the memory bus.

Memory Read Transaction (2)
Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

Memory Read Transaction (3)
The CPU reads word x from the bus and copies it into register %eax.

Memory Write Transaction (1)
Store operation: movl %eax, A
The CPU places address A on the bus, and main memory reads it.

Memory Write Transaction (2)
The CPU copies data word y from %eax and places it on the bus.

Memory Write Transaction (3)
Main memory reads data word y from the bus and stores it at address A.

What's Inside a Disk Drive?
Platters on a spindle, an actuator arm, a SCSI connector, and electronics (including a processor and memory!).
Image courtesy of Seagate Technology

Disk Geometry
Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks, and each track consists of sectors separated by gaps.

Disk Geometry (Multiple-Platter View)
Aligned tracks across surfaces form a cylinder. For example, cylinder k consists of track k on each of surfaces 0-5 of platters 0-2, all rotating on the same spindle.

Disk Capacity
Capacity: the maximum number of bits that can be stored. Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes, or terabytes (TB), where 1 TB = 10^12 bytes.
Capacity is determined by these technology factors:
- Recording density (bits/in): the number of bits that can be squeezed into a 1-inch segment of a track.
- Track density (tracks/in): the number of tracks that can be squeezed into a 1-inch radial segment.
- Areal density (bits/sq in): the product of recording density and track density.
Modern disks partition tracks into disjoint subsets called recording zones. Each track in a zone has the same number of sectors, determined by the circumference of the zone's innermost track; each zone has a different number of sectors/track.

Multiple Zone Recording
Current disk drives use multiple zone recording, in which the number of sectors per track depends on the track number: outer tracks hold more sectors than inner tracks. This allows better utilization of the outer tracks, which have a larger circumference.
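The circumference argument can be made concrete with a toy model. The linear recording density below is an assumed, hypothetical figure chosen only to show the proportionality; real zoned drives quantize tracks into a handful of zones rather than varying per track:

```python
import math

# Toy model of zoned recording: at a fixed linear recording density, the
# number of 512-byte sectors a track can hold grows with its circumference.
BITS_PER_INCH = 500_000          # assumed linear density (illustrative only)
SECTOR_BITS = 512 * 8            # 512-byte sectors

def sectors_per_track(radius_inches):
    """Sectors that fit on a track of the given radius."""
    circumference = 2 * math.pi * radius_inches
    return int(circumference * BITS_PER_INCH // SECTOR_BITS)
```

Doubling the radius doubles the circumference, so an outer track holds roughly twice the sectors of an inner track at half its radius.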

Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
- 512 bytes/sector
- 300 sectors/track (on average)
- 20,000 tracks/surface
- 2 surfaces/platter
- 5 platters/disk
Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
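The capacity formula above translates directly into code:

```python
# The slide's disk capacity formula as a direct computation.
def disk_capacity(bytes_per_sector, sectors_per_track,
                  tracks_per_surface, surfaces_per_platter, platters):
    """Total capacity in bytes."""
    return (bytes_per_sector * sectors_per_track *
            tracks_per_surface * surfaces_per_platter * platters)

# The example drive: 512 x 300 x 20,000 x 2 x 5 bytes
capacity = disk_capacity(512, 300, 20_000, 2, 5)  # 30,720,000,000 bytes
# capacity / 1e9 = 30.72 vendor GB (1 GB = 10^9 bytes)
```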

Disk Operation (Single-Platter View)
The disk surface spins at a fixed rotational rate. The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air. By moving radially, the arm can position the read/write head over any track.

Disk Structure: Single Platter, Top View
The surface is organized into tracks, and tracks are divided into sectors.

Disk Access (step by step)
1. The head is positioned above a track; rotation is counter-clockwise.
2. Read: the blue sector is read as it passes under the head.
3. After the blue read, the red request is scheduled next.
4. Seek: the arm moves to red's track.
5. Rotational latency: wait for the red sector to rotate around under the head.
6. Read: complete the read of the red sector.
Service time components: seek + rotational latency + data transfer.

Disk Access Time
The average time to access a target sector is approximated by:
Taccess = Tavg seek + Tavg rotation + Tavg transfer
- Seek time (Tavg seek): time to position the heads over the cylinder containing the target sector. Typical Tavg seek is 3-9 ms.
- Rotational latency (Tavg rotation): time waiting for the first bit of the target sector to pass under the r/w head. Tavg rotation = 1/2 x 1/RPM x 60 sec/min. A typical rotation rate is 7200 RPM.
- Transfer time (Tavg transfer): time to read the bits in the target sector. Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 sec/min.

Disk Access Time Example
Given: rotational rate = 7,200 RPM, average seek time = 9 ms, avg # sectors/track = 400.
Derived:
- Tavg rotation = 1/2 x (60 sec / 7200 RPM) x 1000 ms/sec ≈ 4 ms
- Tavg transfer = (60 sec / 7200 RPM) x 1/(400 sectors/track) x 1000 ms/sec ≈ 0.02 ms
- Taccess ≈ 9 ms + 4 ms + 0.02 ms ≈ 13 ms
Important points:
- Access time is dominated by seek time and rotational latency.
- The first bit in a sector is the most expensive; the rest are free.
- SRAM access time is about 4 ns/doubleword, DRAM about 60 ns. Disk is about 40,000 times slower than SRAM and 2,500 times slower than DRAM.
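The same model, coded up so the example numbers can be checked:

```python
# The slide's disk access-time model: seek + rotational latency + transfer.
def disk_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    """Approximate average access time in milliseconds."""
    ms_per_rotation = 60_000 / rpm                 # 60 sec/min * 1000 ms/sec
    t_rotation = 0.5 * ms_per_rotation             # wait half a rotation on average
    t_transfer = ms_per_rotation / avg_sectors_per_track
    return avg_seek_ms + t_rotation + t_transfer

# The example drive: 7200 RPM, 9 ms avg seek, 400 sectors/track
# gives roughly 9 + 4.17 + 0.02 ≈ 13.2 ms.
```

Note that the transfer term is two orders of magnitude smaller than the seek and rotation terms, which is the slide's point that the first bit is the expensive one.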

Disk Formatting
Low-level (physical) formatting divides a disk into sectors that the disk controller can read and write. Each sector holds header information, data, and an error correction code (ECC). The legacy format uses 512 data bytes per sector; newer drives use 4096 data bytes per sector (4K Advanced Format).

Logical Blocks
The disk is divided into logical blocks, where a logical block is the smallest unit of transfer. Each logical block maps to one or more contiguous sectors on disk; typical sector sizes are 512 bytes (legacy) and 4096 bytes (4K Advanced Format). In a paging system, the logical block size is typically set to the page size (e.g., 8192 bytes).
Logical block addressing (LBA) is used for logical blocks: blocks are simply indexed 0, 1, 2, etc. The disk controller maps each logical block address to a specific physical location on disk, e.g., a CHS (Cylinder-Head-Sector) physical address. Older disk drives required the OS to explicitly specify the CHS address of a block; in newer drives, the OS need only specify the logical block address, and the disk controller does the mapping to a CHS address.
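A sketch of the classic LBA-to-CHS conversion a controller (or an old BIOS) might apply; real modern drives hide zoned, variable-sector geometry behind this interface, so the fixed geometry below is a simplifying assumption:

```python
# Classic LBA -> CHS mapping, assuming a fixed geometry (heads per
# cylinder, sectors per track). Real drives with zoned recording do a more
# complex mapping internally.
def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Return the (cylinder, head, sector) triple for a logical block."""
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cylinder
    sector = (lba % sectors_per_track) + 1   # CHS sector numbers start at 1
    return cylinder, head, sector
```

With the once-common 16-head, 63-sector geometry, block 0 is (cylinder 0, head 0, sector 1), and block 63 rolls over to the next head.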

I/O Bus
The system bus and memory bus connect the CPU chip (register file, ALU, bus interface) through the I/O bridge to main memory. The I/O bus hangs off the I/O bridge and connects a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.

Reading a Disk Sector (1)
The CPU initiates a disk read by writing a command, a logical block number, and a destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)
The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., it asserts a special interrupt pin on the CPU).

Solid State Disks (SSDs)
An SSD presents the same interface as a disk: requests to read and write logical disk blocks arrive over the I/O bus, and a flash translation layer maps them onto flash memory organized as B blocks of P pages each.
- Pages: 512 bytes to 4 KB; blocks: 32 to 128 pages.
- Data are read and written in units of pages.
- A page can be written only after its entire block has been erased.
- A block wears out after about 100,000 repeated writes.

SSD Performance Characteristics
- Sequential read throughput: 250 MB/s; sequential write throughput: 170 MB/s
- Random read throughput: 140 MB/s; random write throughput: 14 MB/s
- Random read access: 30 μs; random write access: 300 μs
Why are random writes so slow?
- Erasing a block is slow (around 1 ms).
- A write to a page triggers a copy of all useful pages in its block: find a new (erased) block, write the page into it, and copy the other pages from the old block to the new block.

SSD Tradeoffs vs Rotating Disks
Advantages: no moving parts, hence faster, lower power, and more rugged.
Disadvantages: SSDs have the potential to wear out, mitigated by wear-leveling logic in the flash translation layer (e.g., the Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before wearing out). In 2010, SSDs were about 100 times more expensive per byte.
Applications: MP3 players, smart phones, laptops; beginning to appear in desktops and servers.

Storage Trends

SRAM
Metric       1980    1985   1990  1995  2000  2005  2010  2010:1980
$/MB         19,200  2,900  320   256   100   75    60    320
access (ns)  300     150    35    15    3     2     1.5   200

DRAM
Metric             1980   1985   1990  1995  2000  2005   2010   2010:1980
$/MB               8,000  880    100   30    1     0.1    0.06   130,000
access (ns)        375    200    100   70    60    50     40     9
typical size (MB)  0.064  0.256  4     16    64    2,000  8,000  125,000

Disk
Metric             1980  1985  1990  1995   2000    2005     2010       2010:1980
$/MB               500   100   8     0.30   0.01    0.005    0.0003     1,600,000
access (ms)        87    75    28    10     8       4        3          29
typical size (MB)  1     10    160   1,000  20,000  160,000  1,500,000  1,500,000

CPU Clock Rates
                           1980   1990  1995     2000   2003  2005    2010     2010:1980
CPU                        8080   386   Pentium  P-III  P-4   Core 2  Core i7  ---
Clock rate (MHz)           1      20    150      600    3300  2000    2500     2500
Cycle time (ns)            1000   50    6        1.6    0.3   0.50    0.4      2500
Cores                      1      1     1        1      1     2       4        4
Effective cycle time (ns)  1000   50    6        1.6    0.3   0.25    0.1      10,000

The inflection point in computer history came when designers hit the Power Wall.

The CPU-Memory Gap
The gap between DRAM, disk, and CPU speeds continues to widen.
[Figure: log-scale plot of access/cycle times in ns (0.1 to 100,000,000) from 1980 to 2010, showing disk seek time, flash SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time.]

Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
- Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
- The gap between CPU and main memory speed is widening.
Implications:
- Organize memory and storage systems as a memory hierarchy.
- Use caching to bridge the CPU-memory gap: a smaller, faster storage device acts as a staging area for a subset of the data in a larger, slower device.
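The staging-area idea can be sketched in a few lines. This is a minimal illustration, not any real cache design: it is fully associative with LRU replacement, one of several policies a real cache might use:

```python
from collections import OrderedDict

# Minimal sketch of caching: a small, fast store holds a subset of the
# data in a larger, slower backing store. Fully associative, LRU eviction.
class Cache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store       # the larger, slower device
        self.lines = OrderedDict()         # key -> value, in LRU order
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.lines:
            self.hits += 1
            self.lines.move_to_end(key)    # mark as most recently used
            return self.lines[key]
        self.misses += 1
        value = self.backing[key]          # slow fetch from the next level
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False) # evict the least recently used line
        self.lines[key] = value
        return value
```

Repeated reads of a small working set hit in the cache and never touch the backing store again, which is why a tiny fast level in front of a large slow one pays off.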

Typical Computer Memory Hierarchy
Smaller, faster, and costlier per byte toward the top; larger, slower, and cheaper per byte toward the bottom.
L0: Registers — CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM) — holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM) — holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM) — holds cache lines retrieved from main memory.
L4: Main memory (DRAM) — holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks) — local disks hold files retrieved from disks on remote network servers.