CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today's Lecture. The Big Picture: Where are We Now?


CPS101 Computer Organization and Programming
Lecture 13: The Memory System
Robert Wagner

Outline of Today's Lecture
- The Memory System: the BIG Picture
- Memory Technology: SRAM
- Memory Technology: DRAM
- A Real Life Example: the SPARCstation 20's Memory System
- Summary: The Memory Hierarchy
- Direct Mapped Cache
- Reading: Chapter 7; Appendix B, pages 26-33

The Big Picture: Where are We Now?
The five classic components of a computer: Processor (Control and Datapath), Memory, Input, and Output.
Today's topic: the Memory System.

Memory Technologies

Random Access:
- "Random" is good: access time is the same for all locations.
- DRAM: Dynamic Random Access Memory
  - High density, low power, cheap, slow
  - Dynamic: content must be refreshed regularly
- SRAM: Static Random Access Memory
  - Low density, high power, expensive, fast
  - Static: content lasts "forever" (until power is lost)

"Not-so-random" Access Technology:
- Access time varies from location to location and from time to time
- Examples: disk, CD-ROM

Sequential Access Technology: access time is linear in location (e.g., tape)

Memory Hierarchy
The technologies above vary widely in speed and cost:

                  Registers   SRAM (cache)   DRAM     Disk
  Nsec/access     3           5-25           50-80    5-10 million
  Cost per MByte  $25,000     $100-250       $5-10    $0.10-0.20

Programs are known to spend most of their time accessing a small part of their address space -- the Principle of Locality. This motivates designing memory as a HIERARCHY, with small, fast levels holding recently accessed data (which you hope is likely to be accessed again soon), while larger, slower levels hold larger and larger pieces of the total memory the program needs -- with disk holding it all. The scheme gives the ILLUSION that all program memory can be accessed equally quickly: the OS and hardware move data items from level to level based on the access pattern of the program. Tape is generally so slow that it is reserved for daily or monthly backups, to protect users from disk-drive failures and viruses -- some applications may also need it to store huge databases.

Random Access (RAM) Technology
Why do computer professionals need to know about RAM technology? Because processor performance is usually limited by memory latency and bandwidth:
- Latency: the time it takes to access a word in memory.
- Bandwidth: the average speed of access to memory (words/sec).
As IC densities increase, lots of memory will fit on the processor chip, and on-chip memory can be tailored to specific needs:
- Instruction cache
- Data cache
- Write buffer
What makes RAM different from a bunch of flip-flops?
- Density: RAM is much denser.
- Speed: RAM access is slower than flip-flop (register) access.
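A quick way to feel the Principle of Locality is to time two walks over the same array: a stride-1 walk that enjoys spatial locality, and a scattered walk that defeats it. A minimal C sketch (mine, not from the lecture; N and STRIDE are arbitrary illustrative choices) -- on most machines the first loop runs several times faster even though both perform the same number of additions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)      /* 16M ints: far larger than any cache */
    #define STRIDE 4096      /* large stride: a new cache block on almost every access */

    int main(void) {
        int *a = calloc(N, sizeof(int));
        if (!a) return 1;
        long sum = 0;

        clock_t t0 = clock();
        for (int i = 0; i < N; i++)          /* stride-1: consecutive addresses */
            sum += a[i];
        clock_t t1 = clock();
        for (int j = 0; j < STRIDE; j++)     /* same N accesses, but scattered */
            for (int i = j; i < N; i += STRIDE)
                sum += a[i];
        clock_t t2 = clock();

        printf("stride-1: %ld ticks, stride-%d: %ld ticks (sum=%ld)\n",
               (long)(t1 - t0), STRIDE, (long)(t2 - t1), sum);
        free(a);
        return 0;
    }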

Technology Trends

          Capacity         Speed
  Logic   2x in 3 years    2x in 3 years
  DRAM    4x in 3 years    1.4x in 10 years
  Disk    2x in 3 years    1.4x in 10 years

DRAM generations. At 4x every 3 years, capacity grew 4^5, roughly 1000:1, over these 15 years, while cycle time improved only about 2:1:

  Year   Size     Cycle Time
  1980   64 Kb    250 ns
  1983   256 Kb   220 ns
  1986   1 Mb     190 ns
  1989   4 Mb     165 ns
  1992   16 Mb    145 ns
  1995   64 Mb    120 ns

Static RAM: the 6-Transistor Cell
One word (row select) line gates the cell onto a complementary pair of bit lines, bit and bit'.
Write:
1. Drive the bit lines (bit = 1, bit' = 0).
2. Select the row.
3. The strength of the bit-line signal overrides the latch.
Read:
1. Precharge bit and bit' to Vdd.
2. Select the row.
3. The cell pulls one of the two lines low.
4. The sense amp on the column detects the difference between bit and bit'.

Typical SRAM Organization: 16-word x 4-bit
[Figure: an address decoder on A0-A3 drives word lines 0 through 15; each data column Din 0-Din 3 has a write driver (gated by WrEn) and a precharger at the top, and a sense amp producing Dout 0-Dout 3 at the bottom.]
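As a behavioral picture of the 16-word x 4-bit organization, the C sketch below (my illustration, not the slides') models the decoder as a one-hot word-line vector: exactly one row drives the data columns on each access:

    #include <stdio.h>
    #include <stdint.h>

    #define WORDS 16

    static uint8_t cells[WORDS];          /* each word holds 4 bits (low nibble) */

    /* Address decoder: 4-bit address -> one-hot 16-bit word-line vector. */
    static uint16_t decode(uint8_t addr) { return (uint16_t)1 << (addr & 0xF); }

    static uint8_t sram_read(uint8_t addr) {
        uint16_t word_line = decode(addr);
        for (int row = 0; row < WORDS; row++)
            if (word_line & (1u << row))  /* the selected row drives the columns */
                return cells[row] & 0xF;  /* sense amps recover the 4 data bits */
        return 0;                         /* unreachable: decode always selects a row */
    }

    static void sram_write(uint8_t addr, uint8_t din) {
        uint16_t word_line = decode(addr);
        for (int row = 0; row < WORDS; row++)
            if (word_line & (1u << row))
                cells[row] = din & 0xF;   /* write drivers overpower the selected cells */
    }

    int main(void) {
        sram_write(5, 0xA);
        printf("word 5 = 0x%X\n", sram_read(5));  /* prints 0xA */
        return 0;
    }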

The 1-Transistor DRAM Cell
A row select line gates a single transistor between the storage capacitor and the bit line.
Write:
1. Drive the bit line.
2. Select the row.
Read:
1. Precharge the bit line to Vdd.
2. Select the row.
3. The cell and the bit line share charge: only a very small voltage change appears on the bit line.
4. Sense (with a fancy sense amp) -- it can detect changes of about 1 million electrons.
5. Write: restore the value (the read destroys the stored charge).
Refresh:
1. Just do a dummy read to every cell.

Introduction to DRAM
Dynamic RAM (DRAM):
- Refresh required
- Very high density
- Low power (0.1-0.5 W active, 0.25-1 mW standby)
- Low cost per bit
- Pin sensitive (few pins); control signals include:
  - Output Enable (OE_L)
  - Write Enable (WE_L)
  - Row Address Strobe (RAS)
  - Column Address Strobe (CAS)
[Figure: an N-bit cell array; the address pins carry half the address bits at a time, first the row half, then the column half; sense amps and a column selector drive the data pin D, controlled by OE_L and WE_L.]

Classical DRAM Organization (square)
[Figure: a square RAM array in which each intersection of a bit (data) line and a word (row) select line is a 1-T DRAM cell; a row decoder takes the row address, and sense amps, a column selector, and I/O circuits take the column address.]
Row and column address together select 1 bit at a time.
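Because DRAM multiplexes its few address pins, a memory controller presents the row half of the address under RAS and then the column half under CAS. A small C sketch of the split, mine rather than the slides'; the 4096 x 4096 geometry is a hypothetical example:

    #include <stdio.h>
    #include <stdint.h>

    /* A hypothetical 16 Mb (4096 x 4096) square DRAM array: 24-bit cell address,
       presented as 12 row bits under RAS, then 12 column bits under CAS. */
    #define ROW_BITS 12
    #define COL_BITS 12

    typedef struct { uint16_t row, col; } dram_addr;

    static dram_addr split(uint32_t cell) {
        dram_addr a;
        a.row = (cell >> COL_BITS) & ((1u << ROW_BITS) - 1); /* upper half: strobed with RAS */
        a.col = cell & ((1u << COL_BITS) - 1);               /* lower half: strobed with CAS */
        return a;
    }

    int main(void) {
        dram_addr a = split(0xABCDE);
        printf("row 0x%03X (RAS), col 0x%03X (CAS)\n",
               (unsigned)a.row, (unsigned)a.col);   /* row 0x0AB, col 0xCDE */
        return 0;
    }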

Typical DRAM Organization
Typical DRAMs access multiple bits in parallel.
Example: a 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits.
- The chip contains 8 planes, each one a 256 Kb (512 x 512) array.
- Row and column addresses are applied to all 8 planes in parallel; plane i supplies data bit D<i> (D<0> through D<7>).

Increasing Bandwidth: Interleaving
- Access pattern without interleaving: the CPU starts the access for D1, waits until D1 is available, and only then starts the access for D2.
- Access pattern with 4-way interleaving: consecutive words live in different banks (Bank 0 through Bank 3). The CPU starts an access in Bank 0, then Bank 1, Bank 2, and Bank 3 on successive cycles; by the time Bank 3 has been started, Bank 0 can be accessed again.

SPARCstation 20's Memory System Overview
[Figure: a Memory Controller connects the Memory Bus (SIMM Bus, 128-bit wide datapath) holding Memory Modules 0 through 7 to the Processor Bus (MBus, 64-bit wide); the Processor Module (MBus Module) contains the SuperSPARC processor with its on-chip Instruction Cache and Data Cache, the Register File, and an External Cache module.]
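Returning to interleaving: with word-interleaved banks, the bank number is just the low bits of the word address, so sequential words land in successive banks. A C sketch of the mapping (mine, not from the slides):

    #include <stdio.h>

    #define BANKS 4   /* 4-way interleaving, as in the slide */

    int main(void) {
        /* Sequential word addresses map to banks 0,1,2,3,0,1,... so a new
           access can start every cycle even if each bank is busy for
           BANKS cycles per access. */
        for (int word = 0; word < 8; word++)
            printf("word %d -> bank %d, index %d within bank\n",
                   word, word % BANKS, word / BANKS);
        return 0;
    }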

The Principle of Locality: A Summary
Two different types of locality are observed:
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
By taking advantage of the principle of locality, we can:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time.

The Motivation for Caches
[Figure: Processor -- Cache -- DRAM main memory.]
Motivation:
- Large memories (DRAM) are slow; small memories (SRAM) are fast.
- Make the average access time small by servicing most accesses from a small, fast memory.
- Reduce the bandwidth required of the large memory.

Levels of the Memory Hierarchy

  Level           Capacity     Access Time       Cost           Staging/Xfer Unit (managed by)
  CPU Registers   100s Bytes   <10s ns           --             instr. operands, 1-8 bytes (prog./compiler)
  Cache           K Bytes      10-100 ns         ~$200/MByte    blocks, 8-128 bytes (cache controller)
  Main Memory     M Bytes      100 ns-1000 ns    ~$6/MByte      pages, 512-4K bytes (OS)
  Disk            G Bytes      ~10,000,000 ns    ~$100/GByte    files, Mbytes (user/operator)
  Tape            infinite     sec-min           ~$5/GByte      --

The upper levels are faster; the lower levels are larger.

The Principle of Locality
[Figure: probability of reference plotted across the address space 0 to 2^n, with references concentrated in a few small regions.]
Programs access a relatively small portion of their address space during any period of a few hundred instructions.
Example: 90% of the time is spent in 10% of the code.
Two different types of locality:
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

Memory Hierarchy: Principles of Operation
- At any given time, data is copied between only 2 adjacent levels.
- Upper level (cache): the level closer to the processor -- smaller, faster, and built from more expensive technology.
- Lower level (memory): the level further from the processor -- bigger, slower, and built from less expensive technology.
- Block: the minimum unit of information that can either be present or not present in the upper level of the hierarchy.
[Figure: the processor exchanges words with the upper level (cache); blocks Blk X and Blk Y move between the upper level and the lower level (memory).]

Memory Hierarchy: Terminology
- Hit: the data appears in some block in the upper level (e.g., Block X).
  - Hit Rate: the fraction of memory accesses found in the upper level.
  - Hit Time: the time to access the upper level, which consists of the RAM access time plus the time to determine hit/miss.
- Miss: the data must be retrieved from a block in the lower level (e.g., Block Y).
  - Miss Rate = 1 - Hit Rate.
  - Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor.
- Hit Time << Miss Penalty.
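These definitions combine into an average memory access time. A tiny C calculator, mine, using the formula stated on the block-size-tradeoff slide at the end of this lecture and illustrative numbers drawn from the "Typical Values" table that follows:

    #include <stdio.h>

    int main(void) {
        double hit_time = 2.0;       /* cycles; typical value 1-4   */
        double miss_penalty = 50.0;  /* cycles; typical value 10-100 */
        double miss_rate = 0.02;     /* 2%; typical value 1%-20%    */

        /* Hits cost hit_time; misses cost miss_penalty. */
        double amat = hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
        printf("AMAT = %.2f cycles\n", amat);  /* 2*0.98 + 50*0.02 = 2.96 */
        return 0;
    }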

Basic Cache Terminology: Typical Values

  Block (line) size   16-128 bytes
  Hit time            1-4 cycles
  Miss penalty        10-100 cycles (and increasing)
    (access time)     (10-100 cycles)
    (transfer time)   (5-22 cycles)
  Miss rate           1% - 20%
  Cache size          8 KB - 4 MB

How Does a Cache Work?
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon -- so keep more recently accessed data items closer to the processor.
- Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon -- so move blocks consisting of contiguous words into the cache.

Cache Implementation Principles
A memory address must be examined quickly to determine whether the addressed location IS or IS NOT in the cache: the cache is a look-up table. BUT the hardware has time for only ONE reference to the memory that implements the cache. The hardware approximation:
- Use SOME bits of the address to determine WHERE in the cache the data at that address MUST be stored. These are the cache index bits.
- Since many addresses have the SAME index bits, the cache also stores the rest of the address. These are the cache tag bits.
- The lowest-order bits of the address select the BYTE within a cache BLOCK.

  31            10 | 9       5 | 4             0
         Tag          Index       Byte Select
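For the field boundaries just drawn (they match the 1 KB direct mapped cache with 32-byte blocks used as an example later in this lecture), the three fields fall out with shifts and masks. A minimal C sketch, mine rather than the slides'; the sample address is chosen to reproduce the later example's Tag = 0x50, Index = 0x01, Byte Select = 0x00:

    #include <stdio.h>
    #include <stdint.h>

    #define BYTE_SEL_BITS 5   /* 32-byte blocks         -> bits 4-0 */
    #define INDEX_BITS    5   /* 32 blocks in 1 KB      -> bits 9-5 */

    static uint32_t byte_sel(uint32_t a) { return a & ((1u << BYTE_SEL_BITS) - 1); }
    static uint32_t index_of(uint32_t a) { return (a >> BYTE_SEL_BITS) & ((1u << INDEX_BITS) - 1); }
    static uint32_t tag_of(uint32_t a)   { return a >> (BYTE_SEL_BITS + INDEX_BITS); }

    int main(void) {
        uint32_t a = 0x00014020;  /* hypothetical address for illustration */
        printf("tag=0x%X index=0x%02X byte=0x%02X\n",
               tag_of(a), index_of(a), byte_sel(a));  /* tag=0x50 index=0x01 byte=0x00 */
        return 0;
    }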

Direct Mapped Cache
- A direct mapped cache is an array of fixed-size blocks; each block holds consecutive bytes of main-memory data.
- The tag array holds the block addresses. A valid bit associated with each cache block tells whether the data is valid. (The data is not valid if it hasn't been loaded from memory.)
- Index: the location of a block (and its tag) in the cache.
- Byte Select: the byte location within the cache block.

  Cache-Index = (<Address> mod Cache_Size) / Block_Size
  Byte-Select = <Address> mod Block_Size
  Tag         = <Address> / Cache_Size

The Simplest Cache: Direct Mapped
Example: a 16-location memory (byte addresses 0x0-0xF) and a 4-byte direct mapped cache with 1-byte blocks (cache indices 0-3).
- Cache location 0 can be occupied by data from memory location 0, 4, 8, ... etc. -- in general, any memory location whose 2 LSBs are 0s: Address<1:0> => cache index.
- Which one should we place in the cache? How can we tell which one is in the cache?

Cache Tag and Index
Assume a 32-bit memory (byte) address and a 2^N-byte direct mapped cache with 1-byte blocks:
- Index: the lower N bits of the memory address.
- Tag: the upper (32 - N) bits of the memory address.
- The valid bit is stored as part of the cache state; the data array holds Byte 0 through Byte 2^N - 1 at indices 0 through 2^N - 1.
Example (N = 8): the cache holds 2^8 = 256 bytes, and address 0x503 has Tag = 0x5 and Index = 0x03.
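Since the cache size and block size are powers of two, the mod/div formulas above reduce to exactly the shift-and-mask extraction of the previous sketch. A quick C check, mine, for the 256-byte cache with 1-byte blocks and the example address 0x503:

    #include <stdio.h>
    #include <stdint.h>

    #define CACHE_SIZE 256u   /* 2^8 bytes  */
    #define BLOCK_SIZE 1u     /* 1-byte blocks */

    int main(void) {
        uint32_t addr  = 0x503;
        uint32_t index = (addr % CACHE_SIZE) / BLOCK_SIZE;  /* = lower 8 bits      */
        uint32_t byte  = addr % BLOCK_SIZE;                 /* = 0 (1-byte blocks) */
        uint32_t tag   = addr / CACHE_SIZE;                 /* = upper 24 bits     */
        printf("tag=0x%X index=0x%02X byte=%u\n", tag, index, byte);  /* 0x5, 0x03, 0 */
        return 0;
    }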

Cache Access Example (addresses shown as tag and index bits)
- Start up: all valid bits V are 0.
- Access 000 01 (miss). Miss handling: load the data, write the tag, and set V; the cache now holds M[00001] at index 01.
- Access 000 01 (HIT).
- Access 010 10 (miss). Load the data, write the tag, and set V; the cache now holds M[00001] and M[01010].
- Access 000 01 (HIT); access 010 10 (HIT).
Sad fact of life: a lot of misses at start up. These are compulsory misses:
- Cold-start misses
- Process migration

Definition of a Block
Block: the cache data that has its own cache tag.
Our previous extreme example -- a 4-byte direct mapped cache with Block Size = 1 byte:
- It takes advantage of temporal locality: if a byte is referenced, it will tend to be referenced again soon.
- It does NOT take advantage of spatial locality: if a byte is referenced, its adjacent bytes will be referenced soon.
To take advantage of spatial locality: increase the block size.

Example: 1 KB Direct Mapped Cache with 32-Byte Blocks
For a 2^N-byte cache with Block Size = 2^M bytes:
- The uppermost (32 - N) bits are always the Tag.
- The lowest M bits are the Byte Select bits.
- The Index is the N - M bits just higher-order than the Byte Select.
Example: Tag = 0x50 (bits 31-10, stored as part of the cache state alongside the valid bit), Index = 0x01 (bits 9-5), Byte Select = 0x00 (bits 4-0); the selected block holds Byte 0 through Byte 31.
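Putting the pieces together, the C sketch below (my illustration, not from the slides) simulates the 1 KB direct mapped cache with 32-byte blocks: the valid-bit check and tag compare decide hit or miss, and miss handling loads the block, writes the tag, and sets V, as in the access example above. The trace addresses are hypothetical, chosen to show temporal locality, spatial locality, and a conflict:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #define NBLOCKS 32            /* 1 KB cache / 32-byte blocks */
    #define BLOCK   32

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK];
    } line_t;

    static line_t cache[NBLOCKS];  /* start up: all valid bits are 0 */

    /* Returns true on a hit; on a miss, "loads" the block and sets V. */
    static bool access_cache(uint32_t addr) {
        uint32_t index = (addr >> 5) & 0x1F;   /* bits 9-5   */
        uint32_t tag   = addr >> 10;           /* bits 31-10 */
        line_t *l = &cache[index];
        if (l->valid && l->tag == tag)
            return true;                       /* HIT */
        memset(l->data, 0, BLOCK);             /* miss handling: fetch block (stubbed), */
        l->tag   = tag;                        /* write the tag,                        */
        l->valid = true;                       /* and set V                             */
        return false;
    }

    int main(void) {
        /* same word; same word again (temporal); same block, different byte
           (spatial); different tag, same index (conflict) */
        uint32_t trace[] = { 0x14020, 0x14020, 0x14028, 0x24020 };
        for (int i = 0; i < 4; i++)
            printf("0x%05X -> %s\n", trace[i],
                   access_cache(trace[i]) ? "HIT" : "miss");
        return 0;   /* prints: miss, HIT, HIT, miss */
    }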

Block Size Tradeoff
In general, larger block sizes take advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill up the block.
- If the block size is too big relative to the cache size, the miss rate will go up: there are too few cache blocks.
In general,

  Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate

[Figure: three curves plotted against block size. Miss rate first falls as larger blocks exploit spatial locality, then rises when too few blocks compromise temporal locality; miss penalty grows steadily with block size; average access time therefore dips and then climbs, driven by the increased miss penalty and miss rate.]
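To see the tradeoff numerically, the C sketch below sweeps the block size through the average-access-time formula. The miss-rate and miss-penalty models are toy assumptions invented only to reproduce the shape of the curves, not measured data:

    #include <stdio.h>

    int main(void) {
        double hit_time = 1.0;                       /* cycles */
        for (int bs = 4; bs <= 256; bs *= 2) {
            /* Toy model: spatial locality cuts misses as blocks grow, but a
               fixed-size cache has fewer blocks as block size grows, so very
               large blocks start to thrash. Both terms are illustrative. */
            double miss_rate    = 0.01 * (64.0 / bs + bs / 64.0);
            double miss_penalty = 20.0 + bs / 4.0;   /* latency + transfer cycles */
            double amat = hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
            printf("block %3d B: miss rate %.3f, penalty %5.1f, AMAT %.2f\n",
                   bs, miss_rate, miss_penalty, amat);
        }
        return 0;   /* AMAT dips near 32-64 B blocks, then climbs again */
    }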