Lecture 22: Introduction to Memory Hierarchies

Let's go back to a course goal... At the end of the semester, you should be able to describe the fundamental components required in a single core of a modern microprocessor, and explain how they interact with each other, with main memory, and with external storage media.

Question: how much of a chip is memory? 10%? 25%? 50%? 75%? 85%?

Example: how do on-chip memory, processor logic, main memory, and disk interact?

Memory and Pipelining

In our 5-stage pipeline, we've constantly been assuming that we can access our operands from memory in 1 clock cycle. We'd like this to be the case, and it is possible... but it's also complicated. We'll discuss how this happens over the next several lectures, covering:
- Memory technology
- The memory hierarchy
- Caches
- Virtual memory

If I say "memory," what do you think of? Memory comes in many flavors:
- SRAM (Static Random Access Memory)
- DRAM (Dynamic Random Access Memory)
- ROM, Flash, etc.
- Disks, tapes, etc.

These differ in speed, price, and size: fast memory is small and/or expensive, while large memory is slow and/or cheap. The search is on for a "universal memory," one that is both fast and non-volatile; candidates include MRAM, PCRAM, etc. Let's start with DRAM, since it's generally the largest piece of RAM.

Is there a problem with DRAM?

[Figure: the processor-DRAM latency gap, 1980-2000. Processor performance improves about 60% per year (2x every 1.5 years, per Moore's Law), while DRAM improves about 9% per year (2x every 10 years); the processor-memory performance gap grows roughly 50% per year.]

More on DRAM: a DRAM access takes about 100 ns. What if the clock rate of our 5-stage pipeline were 1 GHz? Would the fetch and memory stages stall for 100 cycles? Actually, yes... if we had only DRAM. Caches come to the rescue: as transistors have gotten smaller, we've been able to put more (and faster) memory closer to the processing logic. The idea is to mask the latency of having to go off to main memory.
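To make that stall arithmetic concrete, here is a minimal C sketch; the 1 GHz clock and ~100 ns access time come straight from the slide, and the rest is just unit conversion:

    #include <stdio.h>

    int main(void) {
        double clock_hz = 1e9;             /* 1 GHz pipeline clock           */
        double dram_ns  = 100.0;           /* ~100 ns DRAM access time       */
        double cycle_ns = 1e9 / clock_hz;  /* 1 ns per clock cycle at 1 GHz  */
        double stalls   = dram_ns / cycle_ns;
        /* prints 100: every DRAM access would stall the pipe ~100 cycles */
        printf("stall cycles per DRAM access: %.0f\n", stalls);
        return 0;
    }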

The Principle of Locality

The principle of locality says that most programs don't access all code or data uniformly; e.g., in a loop, a small subset of instructions might be executed over and over again, and a block of memory addresses might be accessed sequentially. This has led to memory hierarchies. Some important things to note:
- Fast memory is expensive
- Each level of memory is usually smaller and faster than the previous one
- Levels of memory usually subset one another: all the stuff in a higher level is also in some level below it

Typical capacities, access times, and costs:

    Level          Capacity       Access time            Cost
    CPU registers  100s of bytes  <10s of ns             -
    Cache          K bytes        10-100 ns              1-0.1 cents/bit
    Main memory    M bytes        200-500 ns             0.0001-0.00001 cents/bit
    Disk           G bytes        10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit
    Tape           infinite       sec-min                10^-8 cents/bit

The Full Memory Hierarchy ("always reuse a good idea")

Moving down the hierarchy from registers (upper level: faster) to tape (lower level: larger), data is staged in progressively bigger transfer units under different management:

    Between           Staging unit     Transfer size  Managed by
    Registers/Cache   Instr. operands  1-8 bytes      programmer/compiler
    Cache/Memory      Blocks           8-128 bytes    cache controller
    Memory/Disk       Pages            4K-16K bytes   OS
    Disk/Tape         Files            Mbytes         user/operator

Our current focus is the cache level. Let's talk about caches more on the chalkboard: accessing data in different levels of the memory hierarchy, from a performance perspective.
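The loop example above can be made concrete with a minimal C sketch (the array size is an arbitrary illustration):

    #include <stdio.h>

    int main(void) {
        static int a[1024];
        long sum = 0;

        /* Temporal locality: the same few loop instructions, and the
           variables i and sum, are reused on every iteration.
           Spatial locality: a[0], a[1], a[2], ... sit at sequential
           addresses, so each cache block fetched from memory serves
           several consecutive iterations. */
        for (int i = 0; i < 1024; i++)
            a[i] = i;
        for (int i = 0; i < 1024; i++)
            sum += a[i];

        printf("sum = %ld\n", sum);
        return 0;
    }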

Terminology Summary

- Hit: the data appears in a block in the upper level (i.e., block X in the cache)
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: the time to access the upper level, which consists of the RAM access time + the time to determine hit/miss
- Miss: the data needs to be retrieved from a block in the lower level (i.e., block Y in memory)
  - Miss rate = 1 - (hit rate)
  - Miss penalty: the extra time to replace a block in the upper level + the time to deliver the block to the processor
- Hit time << miss penalty (500 instructions on the 21264)

[Figure: the processor exchanges data with upper-level memory (block X); on a miss, block Y is brought up from lower-level memory.]

Average Memory Access Time

    AMAT = Hit time + (1 - h) x Miss penalty

where:
- Hit time: the basic time of every access
- Hit rate (h): the fraction of accesses that hit
- Miss penalty: the extra time to fetch a block from the lower level, including the time to replace a block in the upper level

A brief description of a cache:
- A cache is the next level of the memory hierarchy up from the register file; all values in the register file should be in the cache
- Caches are memory between the registers and DRAM
- Cache entries are usually referred to as blocks; a block is the minimum amount of information that can be in the cache
- If we look for an item in the cache and find it, we have a cache hit; if not, a cache miss
- The cache miss rate is the fraction of accesses not found in the cache, and the miss penalty is the number of clock cycles required because of the miss:

    Mem. stall cycles = Inst. count x Mem. refs/inst. x Miss rate x Miss penalty
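Plugging illustrative numbers into the AMAT formula (the hit time, hit rate, and miss penalty below are assumptions chosen for the example, not values from the slides):

    #include <stdio.h>

    int main(void) {
        double hit_time     = 1.0;   /* cycles paid on every access      */
        double hit_rate     = 0.95;  /* h: fraction of accesses that hit */
        double miss_penalty = 100.0; /* extra cycles to fetch from below */

        /* AMAT = Hit time + (1 - h) x Miss penalty = 1 + 0.05 * 100 = 6 */
        double amat = hit_time + (1.0 - hit_rate) * miss_penalty;
        printf("AMAT = %.1f cycles\n", amat);
        return 0;
    }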

Cache Basics

A cache is a fast (but small) memory close to the processor. When data is referenced:
- If it's in the cache, use the cache instead of memory
- If it's not in the cache, bring it into the cache (actually, bring the entire block of data, too); we may have to kick something else out to do it!

Important decisions:
- Placement: where in the cache can a block go?
- Identification: how do we find a block in the cache?
- Replacement: what do we kick out to make room in the cache?
- Write policy: what do we do about writes?

A cache consists of block-sized lines. The line size is typically a power of two, usually 16 to 128 bytes. Example: suppose the block size is 128 bytes, so the lowest seven bits determine the offset within the block. A read of the data at address A = 0x7fffa3f4 then falls in the block with base address 0x7fffa380 (see the sketch below).

Some initial questions to consider:
- Where can a block be placed in an upper level of the memory hierarchy (i.e., a cache)?
- How is a block found in an upper level of the memory hierarchy?
- Which cache block should be replaced on a cache miss if the entire cache is full and we want to bring in new data?
- What happens if you want to write back to a memory location? Do you just write to the cache, or do you write somewhere else?

Where can a block be placed in a cache? There are 3 schemes for block placement:
- Direct mapped cache: a block (or data to be stored) can go to only 1 place in the cache, usually (block address) MOD (# of blocks in the cache)
- Fully associative cache: a block can be placed anywhere in the cache
- Set associative cache: a set is a group of blocks in the cache; a block is mapped onto a set, usually (block address) MOD (# of sets in the cache), and can then be placed anywhere within that set. If each set holds n blocks, we call the cache n-way set associative.
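To check the 0x7fffa3f4 example, here is a minimal C sketch of the offset/base arithmetic for a 128-byte block (the masks follow from the block size being a power of two):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const uint32_t block_size = 128;             /* 2^7: low 7 bits = offset */
        uint32_t addr   = 0x7fffa3f4;                /* address from the slide   */
        uint32_t offset = addr & (block_size - 1);   /* keep low 7 bits: 0x74    */
        uint32_t base   = addr & ~(block_size - 1);  /* clear them: 0x7fffa380   */
        printf("offset = 0x%x, block base = 0x%x\n",
               (unsigned)offset, (unsigned)base);
        return 0;
    }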

Where can a block be placed in a cache? Associativity

[Figure: an 8-block cache under three organizations, with memory block 12 highlighted. Fully associative: block 12 can go anywhere. Direct mapped: block 12 can go only into cache block 4 (12 mod 8). 2-way set associative (sets 0-3): block 12 can go anywhere in set 0 (12 mod 4).]

If you have associativity > 1, you have to have a replacement policy: FIFO, LRU, or random. Full (or full-map) associativity means you check every tag in parallel, and a memory block can go into any cache block. Virtual memory is effectively fully associative (but don't worry about virtual memory yet).

How is a block found in the cache?

- Caches have an address tag on each block frame that provides the block address; the tag of every cache block that might hold the entry is examined against the CPU address (in parallel! why?)
- Each entry usually has a valid bit that tells us whether the cache data is useful or garbage; if the bit is not set, there can't be a match

How does the address provided by the CPU relate to an entry in the cache? The address is divided between a block address and a block offset, and the block address is further divided between a tag field and an index field:

    | tag | index | block offset |
      (block address)

- The block offset field selects the data from the block (i.e., the address of the desired data within the block)
- The index field selects a specific set
- The tag field is compared against the tag of the CPU address for a hit

Could we compare on more of the address than the tag? It's not necessary:
- Checking the index would be redundant, since it is used to select the set to be checked (e.g., an address stored in set 0 must have 0 in its index field)
- The offset is unnecessary in the comparison: either the entire block is present or it isn't, so all block offsets would match anyway
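A minimal C sketch of the tag/index/offset split, assuming (for illustration only; the slides don't fix these parameters) a cache with 128-byte blocks and 64 sets:

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 128u  /* bytes -> 7 offset bits (assumed) */
    #define NUM_SETS    64u  /*       -> 6 index bits  (assumed) */

    int main(void) {
        uint32_t addr   = 0x7fffa3f4;
        uint32_t offset = addr % BLOCK_SIZE;  /* byte within the block       */
        uint32_t block  = addr / BLOCK_SIZE;  /* block address = tag + index */
        uint32_t index  = block % NUM_SETS;   /* selects the set to check    */
        uint32_t tag    = block / NUM_SETS;   /* compared for a hit          */
        printf("tag = 0x%x, index = %u, offset = %u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }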

Which block should be replaced on a cache miss?

If we look something up in the cache and the entry is not there, we generally want to get the data from memory and put it in the cache, because the principle of locality says we'll probably use it again. Direct mapped caches have only 1 choice of which block to replace; fully associative or set associative caches offer more choices. There are usually 2 strategies:
- Random: pick any possible block and replace it
- LRU (Least Recently Used): why not throw out the block not used for the longest time? LRU is usually approximated, and it is not much better than random (e.g., 5.18% vs. 5.69% miss rate for a 16KB 2-way set associative cache)

What happens on a write?

FYI, most accesses to a cache are reads: instruction fetches are reads, and most instructions don't write to memory. For MIPS, only about 7% of memory traffic involves writes, which translates to about 25% of cache data traffic. Make the common case fast: optimize the cache for reads! That's actually pretty easy to do: we can read the block while comparing/reading the tag, so the block read begins as soon as the address is available, and if it's a hit, the data is passed right on to the CPU. Writes take longer. Any idea why?

Generically, there are 2 kinds of write policies:
- Write through (or store through): information is written both to the block in the cache and to the block in lower-level memory
- Write back (or copy back): information is written only to the cache; it is written back to lower-level memory when the cache block is replaced

The dirty bit: each cache entry usually has a bit that specifies whether a write has occurred in that block. It helps reduce the frequency of writes to lower-level memory upon block replacement.

Write back versus write through:
- Write back is advantageous because writes occur at the speed of the cache and don't incur the delay of lower-level memory, and multiple writes to a cache block result in only 1 lower-level memory access
- Write through is advantageous because the lower levels of memory always have the most recent copy of the data

If the CPU has to wait for a write, we have a write stall. One way around this is a write buffer: ideally, the CPU shouldn't have to stall during a write; instead, data is written to the buffer, which sends it on to the lower levels of the memory hierarchy.

What if we want to write, and the block we want to write to isn't in the cache? There are 2 common policies:
- Write allocate (or fetch on write): the block is loaded on a write miss; the idea is that subsequent writes will be captured by the cache (ideal for a write-back cache)
- No-write allocate (or write around): the block is modified in lower-level memory and not loaded into the cache; usually used for write-through caches (subsequent writes still have to go to memory)

Memory access equations

Using what we defined on the previous slides, we can say:

    Memory stall clock cycles = Reads x Read miss rate x Read miss penalty
                              + Writes x Write miss rate x Write miss penalty

Often, reads and writes are combined/averaged:

    Memory stall cycles = Memory accesses x Miss rate x Miss penalty   (approximation)

It is also possible to factor in the instruction count to get a complete formula:

    Memory stall cycles = Inst. count x Mem. refs/inst. x Miss rate x Miss penalty

Reducing cache misses

Obviously, we want data accesses to result in cache hits, not misses; this will optimize performance. Start by looking at ways to increase the percentage of hits, but first look at the 3 kinds of misses:
- Compulsory misses: the very first access to a cache block cannot be a hit; the data's not there yet!
- Capacity misses: the cache is only so big, and it won't be able to store every block accessed in a program; something must be swapped out
- Conflict misses: these result from set-associative or direct mapped caches, where blocks are discarded and later retrieved if too many map to the same location
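A minimal C sketch of the complete stall formula, with illustrative numbers (the instruction count, references per instruction, miss rate, and miss penalty are all assumptions, not values from the slides):

    #include <stdio.h>

    int main(void) {
        double inst_count    = 1e9;   /* instructions executed       */
        double refs_per_inst = 1.33;  /* memory refs per instruction */
        double miss_rate     = 0.02;  /* fraction of refs that miss  */
        double miss_penalty  = 100.0; /* cycles per miss             */

        /* Inst. count x Mem. refs/inst. x Miss rate x Miss penalty */
        double stalls = inst_count * refs_per_inst * miss_rate * miss_penalty;
        printf("memory stall cycles = %.3g\n", stalls);
        return 0;
    }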

29 What happens on a write? 30 What happens on a write? Write back versus write through: Write back advantageous because: Writes occur at the speed of cache and don!t incur delay of lower-level memory Multiple writes to cache block result in only 1 lower-level memory access Write through advantageous because: Lower-levels of memory have most recent copy of data If CPU has to wait for a write, we have write stall 1 way around this is a write buffer Ideally, CPU shouldn!t have to stall during a write Instead, data written to buffer which sends it to lowerlevels of memory hierarchy What if we want to write and block we want to write to isn!t in cache? There are 2 common policies: Write allocate: (or fetch on write) The block is loaded on a write miss The idea behind this is that subsequent writes will be captured by the cache (ideal for a write back cache) No-write allocate: (or write around) Block modified in lower-level and not loaded into cache Usually used for write-through caches (subsequent writes still have to go to memory) 31 Memory access equations Using what we defined on previous slide, we can say: Memory stall clock cycles = Reads x Read miss rate x Read miss penalty + " Writes x Write miss rate x Write miss penalty Often, reads and writes are combined/averaged: Memory stall cycles = Memory access x Miss rate x Miss penalty (approximation) Also possible to factor in instruction count to get a complete formula: 32 Reducing cache misses Obviously, we want data accesses to result in cache hits, not misses this will optimize performance Start by looking at ways to increase % of hits. but first look at 3 kinds of misses! Compulsory misses: Very 1st access to cache block will not be a hit the data!s not there yet! Capacity misses: Cache is only so big. Won!t be able to store every block accessed in a program must swap out! Conflict misses: Result from set-associative or direct mapped caches Blocks discarded/retrieved if too many map to a location