Memory Hierarchy. Memory Flavors, Principle of Locality, Program Traces, Memory Hierarchies, Associativity. (Study Chapter 5)

Memory Hierarchy

[Cartoon: "It makes me look faster, don't you think?" "Are you dressed like the Easter Bunny?"]

Memory Flavors, Principle of Locality, Program Traces, Memory Hierarchies, Associativity (Study Chapter 5)

L19 Memory Hierarchy 1

What Do We Want in a Memory?

[Figure: miniMIPS datapath (PC, INST, MADDR, MDATA, Wr) connected to a MEMORY with ADDR, DATA, and R/W ports]

             Capacity          Latency   Cost
Register     1000's of bits    10 ps     $$$$
SRAM         100's of KBytes   0.2 ns    $$$
DRAM         100's of MBytes   5 ns      $
Hard disk*   100's of GBytes   10 ms
Want         4 GBytes          0.2 ns    cheap!

* non-volatile

Tricks for Increasing Throughput

[Figure: DRAM array with a multiplexed row/column address; an N-bit row address decoder drives word lines for rows 1 to 2^N, bit lines feed an N-bit column multiplexer/shifter, one memory cell per bit; timing diagram shows data clocked out at t1, t2, t3, t4]

The first thing that should pop into your mind when asked to speed up a digital design: PIPELINING.
Synchronous DRAM (SDRAM) (~$10 per GByte)
Double Data Rate Synchronous DRAM (DDR)

Hard Disk Drives

Typical high-end drive:
  Average latency = 4 ms (7200 rpm)
  Average seek time = 8.5 ms
  Transfer rate = 300 MBytes/s (SATA)
  Capacity = 1000 GBytes
  Cost = $100 (10¢/GByte)

figures from www.pctechguide.com
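The 4 ms average-latency figure above follows from the spindle speed alone: on average the head waits half a revolution for the target sector to rotate under it. A quick Python sanity check (the function name is mine):

```python
# Average rotational latency is half a revolution, converted to ms.
def avg_rotational_latency_ms(rpm):
    seconds_per_revolution = 60.0 / rpm
    return (seconds_per_revolution / 2) * 1000

print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
```

So "4 ms" on the slide is the half-revolution time at 7200 rpm, rounded down.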

Quantity vs Quality

Memory systems can be either BIG and SLOW... or SMALL and FAST.

[Figure: $/GB vs access time, from 10 ns to 100 s. Data points: SRAM ($1000/GB, 0.2 ns), DRAM ($25/GB, 5 ns), DISK ($0.10/GB, 10 ms), DVD Burner ($0.02/GB, 120 ms)]

We've explored a range of device-design trade-offs. Is there an ARCHITECTURAL solution to this DILEMMA?

Managing Memory via Programming

In reality, systems are built with a mixture of all these various memory types. How do we make the most effective use of each memory?

We could push all of these issues off to programmers:
  Keep the most frequently used variables and the stack in SRAM
  Keep large data structures (arrays, lists, etc.) in DRAM
  Keep the biggest data structures (databases) on DISK

It's harder than you think: data usage evolves over a program's execution.

Best of Both Worlds

What we REALLY want: A BIG, FAST memory! (Keep everything within instant access.)

We'd like to have a memory system that PERFORMS like 2 GBytes of SRAM, but COSTS like 512 MBytes of slow memory.

SURPRISE: We can (nearly) get our wish!
KEY: Use a hierarchy of memory technologies: CPU <-> SRAM <-> MAIN MEM

Key IDEA

Keep the most often-used data in a small, fast SRAM (often on the CPU chip itself). Refer to Main Memory only rarely, for the remaining data.

The reason this strategy works: LOCALITY.

Locality of Reference: a reference to location X at time t implies that a reference to location X+ΔX at time t+Δt becomes more probable as ΔX and Δt approach zero.

Typical Memory Reference Patterns

[Figure: memory trace, address vs. time, showing distinct stack, data, and program bands]

MEMORY TRACE: a temporal sequence of memory references (addresses) from a real program.

TEMPORAL LOCALITY: if an item is referenced, it will tend to be referenced again soon.

SPATIAL LOCALITY: if an item is referenced, nearby items will tend to be referenced soon.

Working Set

[Figure: the same address-vs-time trace, with the set S of locations touched during an interval Δt marked on the stack, data, and program bands]

S is the set of locations accessed during Δt.
Working set: a set S which changes slowly with respect to access time.
Working set size: |S| over the interval Δt.

Exploiting the Memory Hierarchy

Approach 1 (Cray, others): Expose the Hierarchy
  Registers, Main Memory, and Disk are each available as storage alternatives.
  Tell programmers: "Use them cleverly."

Approach 2: Hide the Hierarchy
  Programming model: a SINGLE kind of memory, a single address space.
  The machine AUTOMATICALLY assigns locations to fast or slow memory, depending on usage patterns.
  CPU <-> Small Static RAM <-> Dynamic RAM (MAIN MEMORY) <-> HARD DISK

Why We Care

CPU performance is dominated by memory performance. It is more significant than ISA, circuit optimization, pipelining, superscalar issue, etc.

CPU <-> Small Static RAM (CACHE) <-> Dynamic RAM (MAIN MEMORY / VIRTUAL MEMORY) <-> HARD DISK (SWAP SPACE)

TRICK #1: How to make slow MAIN MEMORY appear faster than it is. Technique: CACHING (next 2 lectures).

TRICK #2: How to make a small MAIN MEMORY appear bigger than it is. Technique: VIRTUAL MEMORY (the lecture after that).

The Cache Idea: Program-Transparent Memory Hierarchy

[Figure: CPU backed by a small "CACHE" and a DYNAMIC RAM "MAIN MEMORY"; a fraction α of references hit in the cache, the remaining (1-α) go on to main memory]

The cache contains TEMPORARY COPIES of selected main memory locations, e.g. Mem[100] = 37.

HIT RATIO α: fraction of references found in the CACHE.
MISS RATIO (1-α): the remaining references.

GOALS:
1) Improve the average access time. Challenge: make the hit ratio as high as possible.

   t_ave = α·t_c + (1-α)·(t_c + t_m) = t_c + (1-α)·t_m

2) Transparency (compatibility, programming ease).

Why, on a miss, do I incur the access penalty for both main memory and cache?
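The average-access-time formula on this slide is easy to check numerically. A minimal sketch in Python; the 0.2 ns and 5 ns values are the SRAM and DRAM latencies quoted earlier in the lecture, and the 90% hit ratio is an assumed example:

```python
def t_ave(alpha, t_c, t_m):
    # Every reference pays the cache time t_c; the (1 - alpha)
    # misses additionally pay the main-memory time t_m.
    return alpha * t_c + (1 - alpha) * (t_c + t_m)

# Assumed example: 0.2 ns SRAM cache, 5 ns DRAM, 90% hit ratio.
print(t_ave(0.90, 0.2, 5.0))  # ~0.7 ns, i.e. t_c + 0.10 * t_m
```

Note how quickly the miss term dominates: even a 10% miss ratio costs more than twice the raw cache latency.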

How High of a Hit Ratio?

Suppose we can easily build an on-chip static memory with an 800 ps access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 10 ns. How high of a hit ratio do we need to sustain an average access time of 1 ns?

Solve for α:

   t_ave = t_c + (1-α)·t_m
   α = 1 - (t_ave - t_c)/t_m = 1 - (1 - 0.8)/10 = 98%

WOW, a cache really needs to be good!
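The rearrangement above can be written directly as code; a small sketch (function name is mine, times in ns):

```python
# Rearranging t_ave = t_c + (1 - alpha) * t_m for the hit ratio alpha.
def hit_ratio_needed(t_ave, t_c, t_m):
    return 1 - (t_ave - t_c) / t_m

alpha = hit_ratio_needed(t_ave=1.0, t_c=0.8, t_m=10.0)
print(f"{alpha:.0%}")  # 98%
```

The slide's point falls out immediately: with a 12.5x gap between cache and memory speed, only 2 misses per 100 references are allowed.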

The Cache Principle

[Cartoon: "Find Hart, Lee." The desk has a 5-second access time; the filing cabinet has a 5-minute access time.]

ALGORITHM: Look on your desk for the requested information first; if it's not there, check secondary storage.

Basic Cache Algorithm

[Figure: CPU with a cache holding tags A, B and data Mem[A], Mem[B]; the (1-α) misses go on to MAIN MEMORY]

ON REFERENCE TO Mem[X]: look for X among the cache tags... (X here is a memory address.)

HIT: X = TAG(i), for some cache line i
  READ: return DATA(i)
  WRITE: change DATA(i); start write to Mem[X]

MISS: X not found in the TAG of any cache line
  REPLACEMENT SELECTION: select some line k to hold Mem[X] (allocation)
  READ: read Mem[X]; set TAG(k) = X, DATA(k) = Mem[X]
  WRITE: start write to Mem[X]; set TAG(k) = X, DATA(k) = new Mem[X]
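The algorithm above can be sketched as a small Python model. This is a sketch under assumptions, not the slide's hardware: lines hold one word each, and since the slide leaves "select some line k" open, I've assumed a FIFO replacement choice:

```python
from collections import OrderedDict

class Cache:
    """Fully associative cache: any address may sit in any line."""
    def __init__(self, memory, nlines):
        self.memory = memory        # dict: address -> value
        self.nlines = nlines
        self.lines = OrderedDict()  # TAG (address) -> copy of the data

    def _allocate(self):
        # REPLACEMENT SELECTION (assumed policy: evict oldest, FIFO)
        if len(self.lines) >= self.nlines:
            self.lines.popitem(last=False)

    def read(self, addr):
        if addr in self.lines:      # HIT: X matches some TAG(i)
            return self.lines[addr], True
        self._allocate()            # MISS: fetch Mem[X], set TAG(k)=X
        self.lines[addr] = self.memory[addr]
        return self.lines[addr], False

    def write(self, addr, value):
        self.memory[addr] = value   # start write to Mem[X]
        if addr not in self.lines:
            self._allocate()
        self.lines[addr] = value    # DATA(k) = new Mem[X]

mem = {1000: 17, 1004: 23, 1008: 11, 1040: 1}
c = Cache(mem, nlines=4)
print(c.read(1000))  # (17, False): first access misses
print(c.read(1000))  # (17, True): now a hit
```

The (value, hit) pair returned by read makes the hit/miss branches of the slide's algorithm visible in the output.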

Cache

Sits between the CPU and main memory. Very fast memory that stores TAGs and DATA. The TAG is the memory address (or part of it); the DATA is a copy of memory at the address given by the TAG.

Cache                         Memory
        Tag   Data            1000: 17    1024: 44
Line 0  1000  17              1004: 23    1028: 99
Line 1  1040  1               1008: 11    1032: 97
Line 2  1032  97              1012: 5     1036: 25
Line 3  1008  11              1016: 29    1040: 1
                              1020: 38    1044: 4

Cache Access

On a load we compare the TAG entries to the ADDRESS we're loading:
  If found: a HIT. Return the DATA.
  If not found: a MISS. Go to memory and get the data; decide where it goes in the cache; put it and its address (TAG) in the cache.

Cache
        Tag   Data
Line 0  1000  17
Line 1  1040  1
Line 2  1032  97
Line 3  1008  11

(Memory contents as on the previous slide.)

How Many Words per Tag?

Caches usually fetch more data than requested. (Why?) Each LINE typically stores more than one word; 16-64 bytes (4-16 words) per line is common.

A bigger LINE means:
  1) fewer misses, because of spatial locality
  2) fewer TAG bits per DATA bit
but a bigger LINE also means a longer time on a miss.

Cache (2 words per line)
        Tag   Data
Line 0  1000  17  23
Line 1  1040  1   4
Line 2  1032  97  25
Line 3  1008  11  5
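The "fewer TAG bits per DATA bit" saving comes from one tag covering a whole line. A minimal sketch of the address split, under an assumed geometry of 16-byte lines (4 words of 4 bytes):

```python
LINE_BYTES = 16  # assumed line size: 4 words of 4 bytes

def split(addr):
    offset = addr % LINE_BYTES   # byte within the line
    tag = addr // LINE_BYTES     # one tag covers all 16 bytes
    return tag, offset

print(split(1000))  # (62, 8): byte 8 of the line whose tag is 62
```

All four words of a line share the single stored tag, so quadrupling the line size cuts the tag storage per data bit by roughly a factor of four (minus the two offset bits the tag no longer needs).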

How Do We Search the Cache TAGs?

[Cartoon: "Find Hart, Lee." Several file clerks search drawers in parallel: "Nope, Smith." "Nope, Jones." "Nope, LeVile." "HERE IT IS!"]

Associativity: the degree of parallelism used in the lookup of TAGs.

Fully-Associative Cache

The extreme in associativity: all TAGs are searched in parallel. Data items from *any* address can be located in *any* cache line.

[Figure: the incoming address fans out to a (TAG =?, Data) comparator for every line; any match asserts HIT and selects that line's data as Data Out]

Direct-Mapped Cache (non-associative)

[Cartoon: "Find Hart, Lee." A single filing cabinet with drawers labeled Z, Y, ..., H, ..., B, A]

NO parallelism: look in JUST ONE place, determined by parameters of the incoming request (address bits)... so we can use ordinary RAM as the lookup table.

Direct-Map Example

With 8-byte lines, the 3 low-order bits determine the byte within the line. With 4 cache lines, the next 2 bits determine which line to use:

  1024 = 10000000000 (binary)  ->  line 00 (binary) = 0
  1000 = 01111101000 (binary)  ->  line 01 (binary) = 1
  1040 = 10000010000 (binary)  ->  line 10 (binary) = 2

Cache
        Tag   Data
Line 0  1024  44  99
Line 1  1000  17  23
Line 2  1040  1   4
Line 3  1016  29  38

(Memory contents as on the earlier slides.)
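The slide's index computation is just integer division and a modulus; a quick sketch reproducing the three examples:

```python
# With 8-byte lines the low 3 bits are the byte offset;
# with 4 lines the next 2 bits select the line.
def line_index(addr, line_bytes=8, nlines=4):
    return (addr // line_bytes) % nlines

print(line_index(1024), line_index(1000), line_index(1040))  # 0 1 2
```

Dividing by the line size discards the offset bits; the modulus then keeps just the index bits, exactly the bit-slicing the slide shows in binary.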

Direct Mapping Miss

What happens when we now ask for address 1008?

  1008 = 01111110000 (binary)  ->  line 10 (binary) = 2

but earlier we put 1040 there:

  1040 = 10000010000 (binary)  ->  line 10 (binary) = 2

so 1008 evicts 1040 from line 2:

Cache
        Tag   Data
Line 0  1024  44  99
Line 1  1000  17  23
Line 2  1008  11  5
Line 3  1016  29  38

Direct Mapped Cache

The LOW-COST leader: requires only a single comparator, and uses ordinary (fast) static RAM for the cache tags & data.

[Figure: the incoming address splits into T upper tag bits and a K-bit cache index; the index addresses a 2^K x (T + D)-bit static RAM holding (Tag, Data) pairs; one "=?" comparator checks the stored tag against the upper address bits, asserting HIT and gating out the D-bit data word]

DISADVANTAGE: COLLISIONS.

QUESTION: Why not use the HIGH-order bits as the cache index?
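The collision disadvantage is easy to demonstrate in a few lines. A minimal direct-mapped model under the earlier slides' assumed geometry (4 lines, 8-byte lines); it tracks tags only, not data:

```python
class DirectMapped:
    def __init__(self, nlines=4, line_bytes=8):
        self.nlines, self.line_bytes = nlines, line_bytes
        self.tags = [None] * nlines   # one stored tag per line

    def access(self, addr):
        idx = (addr // self.line_bytes) % self.nlines   # index bits
        tag = addr // (self.line_bytes * self.nlines)   # upper bits
        hit = self.tags[idx] == tag
        self.tags[idx] = tag          # on a miss, overwrite the line
        return hit

c = DirectMapped()
print(c.access(1040))  # False: cold miss
print(c.access(1008))  # False: maps to the same line, evicts 1040
print(c.access(1040))  # False: evicted again; pure contention
```

Addresses 1040 and 1008 keep evicting each other even though the other three lines sit idle; a fully associative cache of the same size would hit on both after the cold misses.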

A Problem with Collisions

[Cartoon: "Find Heel, Art." "Find Here, Al T." "Find Hart, Lee." The clerk replies: "Nope, I've got Heel under H."]

PROBLEM: contention among the H's. We CAN'T cache both Hart & Heel. Suppose H's tend to come at once? ==> BETTER IDEA: file by LAST letter!

Cache Questions = Cash Questions

What lies between Fully Associative and Direct-Mapped?
When I put something new into the cache, what data gets thrown out?
How many processor words should there be per tag?
When I write to the cache, should I also write to memory?
When a write misses the cache, should space in the cache be allocated for the written address?
What if I have INPUT/OUTPUT devices located at certain memory addresses? Do we cache them?