Memory Hierarchy Design (Appendix B and Chapter 2)


CS359: Computer Architecture
Memory Hierarchy Design (Appendix B and Chapter 2)
Yanyan Shen, Department of Computer Science and Engineering

Four Memory Hierarchy Questions
Q1 (block placement): Where can a block be placed in the cache?
Q2 (block identification): How is a block found if it is in the cache?
Q3 (block replacement): Which block should be replaced on a miss?
Q4 (write strategy): What happens on a write?

Q1: Block Placement
[Figure: a cache of 8 blocks in front of a main memory of 32 blocks, showing where a memory block may be placed under fully associative, direct-mapped, and set-associative organizations.]
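To make the placement rules concrete, here is a minimal C sketch (mine, not the slides') that computes where memory block 12 of the figure may live in the 8-block cache under each organization; only the block and set counts come from the figure.

```c
#include <stdio.h>

int main(void) {
    const unsigned cache_blocks = 8;   /* cache size, from the figure    */
    const unsigned block_addr   = 12;  /* a memory block address (0..31) */

    /* Direct mapped: exactly one candidate frame. */
    printf("direct mapped:   frame %u\n", block_addr % cache_blocks);

    /* 2-way set associative: 8 blocks / 2 ways = 4 sets; the block may
       go in either way of one set. */
    const unsigned sets = cache_blocks / 2;
    printf("2-way set assoc: set %u (either way)\n", block_addr % sets);

    /* Fully associative: any of the 8 frames will do. */
    printf("fully assoc:     any frame 0..%u\n", cache_blocks - 1);
    return 0;
}
```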


Q2: Block Identification
Caches have an address tag on each block frame that gives the block address. The tag is checked against the block address coming from the processor to see whether they match.
Each cache block also has a valid bit, indicating whether the entry holds a valid address.
[Figure: a 32-bit address (bits 31..0) divided into the fields used for identification.]

Two questions to answer (in hardware):
Question 1: How do we find where a data item would be placed in the cache?
Question 2: How do we know whether the data item is actually there?

Direct mapped
Each memory block is mapped to exactly one block in the cache, so many memory blocks must share each cache block.
Address mapping (answers Q1): (block address) modulo (# of blocks in the cache).
A tag associated with each cache block holds the address information (the upper portion of the address) required to identify the block (answers Q2).

Caching: A Simple First Example
Consider a direct-mapped cache of four one-word blocks (index 00, 01, 10, 11), each with a valid bit and a tag, in front of a byte-addressable main memory of 16 words (block addresses 0000xx through 1111xx).
With 32-bit words, the two low-order address bits select the byte within the word.
Q1: How do we find it? Use the next two low-order address bits (the index) to determine which cache block to look in, i.e., (block address) modulo (# of blocks in the cache).
Q2: Is it there? Compare the cache tag to the two high-order memory address bits to tell whether the memory block is in the cache.
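A minimal sketch of how the toy example's 6-bit addresses slice into fields; the field widths are the slide's, while the helper and the sample address are illustrative.

```c
#include <stdio.h>

/* Split a 6-bit byte address from the toy example:
   bits [5:4] tag, [3:2] index, [1:0] byte within the 32-bit word. */
int main(void) {
    unsigned addr     = 0x2D;              /* sample address 101101      */
    unsigned byte_off = addr & 0x3;        /* bits 1..0                  */
    unsigned index    = (addr >> 2) & 0x3; /* bits 3..2: cache block     */
    unsigned tag      = (addr >> 4) & 0x3; /* bits 5..4: compared on hit */
    printf("addr 0x%02X -> tag %u, index %u, byte %u\n",
           addr, tag, index, byte_off);
    return 0;
}
```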

Direct Mapped Cache Example
[Figure: a direct-mapped cache with one-word blocks, cache size = 1K words (4 KB). The 32-bit address splits into a 20-bit tag (bits 31..12), a 10-bit index (bits 11..2) selecting one of the 1024 entries, and a 2-bit byte offset (bits 1..0). Each entry holds a valid bit, a 20-bit tag, and a 32-bit data word; Hit is asserted when the indexed entry is valid and its tag matches.]

Multiword Block Direct Mapped Cache
[Figure: a direct-mapped cache with four-word blocks, cache size = 1K words (256 blocks). The 32-bit address splits into a 20-bit tag (bits 31..12), an 8-bit index (bits 11..4) selecting one of the 256 entries, a 2-bit block offset (bits 3..2) selecting the word within the block, and a 2-bit byte offset (bits 1..0).]
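The two layouts above differ only in how the low-order bits are carved up. A hedged C sketch of the split (field widths from the slides; the function and sample address are mine):

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit byte address given the cache geometry: offset_bits
   covers the byte (and block) offset; the tag is whatever remains. */
static void split(uint32_t addr, unsigned index_bits, unsigned offset_bits) {
    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);
    printf("addr 0x%08X -> tag 0x%05X, index %u, offset %u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
}

int main(void) {
    uint32_t addr = 0x12345678;
    split(addr, 10, 2); /* 1K one-word blocks:   20-bit tag, 10-bit index */
    split(addr, 8, 4);  /* 256 four-word blocks: 20-bit tag, 8-bit index  */
    return 0;
}
```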

Set Associative Mapping
Allows more flexible block placement.
In a direct-mapped cache, a memory block maps to exactly one cache block. At the other extreme, a fully associative cache allows a memory block to be mapped to any cache block.
A compromise is to divide the cache into sets, each of which consists of n ways (n-way set associative). A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set, so there are n choices.
Address mapping: (block address) modulo (# of sets in the cache).

Set Associative Cache Example
[Figure: a two-way set-associative cache with two sets (0 and 1), each way holding a valid bit, a tag, and a one-word data block, in front of the same byte-addressable 16-word memory (block addresses 0000xx through 1111xx). Two low-order bits select the byte within the 32-bit word.]
Q1: How do we find it? Use the next low-order memory address bit to determine the cache set, i.e., (block address) modulo (# of sets in the cache).
Q2: Is it there? Compare all the cache tags in the set to the three high-order memory address bits to tell whether the memory block is in the cache.

Four-Way Set Associative Cache
[Figure: a four-way set-associative cache with 2^8 = 256 sets, each with four ways (each way holding one block). The 32-bit address splits into a 22-bit tag (bits 31..10), an 8-bit index (bits 9..2), and a 2-bit byte offset (bits 1..0). The tags of all four ways are compared in parallel; on a hit, a 4-to-1 multiplexer selects the data from the matching way.]
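A minimal software sketch of that lookup (geometry from the slide; the data structures and preloaded values are made up for illustration). Real hardware compares the four tags simultaneously where this code loops:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS 4
#define SETS 256  /* 2^8 sets, as in the slide */

typedef struct {
    bool     valid[WAYS];
    uint32_t tag[WAYS];
    uint32_t data[WAYS];  /* one word per way, for simplicity */
} set_t;

static set_t cache[SETS];

/* Look up a 32-bit address: 22-bit tag, 8-bit index, 2-bit byte offset. */
static bool lookup(uint32_t addr, uint32_t *word_out) {
    uint32_t index = (addr >> 2) & 0xFF;
    uint32_t tag   = addr >> 10;
    set_t *s = &cache[index];
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {  /* hit: the 4x1 select */
            *word_out = s->data[w];
            return true;
        }
    }
    return false;  /* miss */
}

int main(void) {
    uint32_t addr = 0x00001234;
    /* Pre-load way 2 of the addressed set so the lookup hits. */
    uint32_t index = (addr >> 2) & 0xFF, tag = addr >> 10;
    cache[index].valid[2] = true;
    cache[index].tag[2]   = tag;
    cache[index].data[2]  = 0xDEADBEEF;

    uint32_t w = 0;
    if (lookup(addr, &w)) printf("hit: 0x%08X\n", (unsigned)w);
    else                  printf("miss\n");
    return 0;
}
```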

Q3: Block Replacement
When a miss occurs, the cache controller must select a block to be replaced with the desired data.
[Table: data cache misses per 1000 instructions, comparing LRU, random, and FIFO replacement.]
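A sketch of how the three compared policies pick a victim within a set. The per-way bookkeeping shown here (timestamps for LRU, a round-robin pointer for FIFO) is one common software approximation, not the slides' hardware:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define WAYS 4

/* Per-set replacement state for a WAYS-way set. */
typedef struct {
    uint64_t last_used[WAYS]; /* timestamp of last access, for LRU */
    unsigned fifo_next;       /* round-robin pointer, for FIFO     */
} repl_t;

/* LRU: evict the way with the oldest last-used timestamp. */
static unsigned victim_lru(const repl_t *r) {
    unsigned v = 0;
    for (unsigned w = 1; w < WAYS; w++)
        if (r->last_used[w] < r->last_used[v]) v = w;
    return v;
}

/* FIFO: evict ways in the order they were filled. */
static unsigned victim_fifo(repl_t *r) {
    unsigned v = r->fifo_next;
    r->fifo_next = (r->fifo_next + 1) % WAYS;
    return v;
}

/* Random: evict an arbitrary way (cheap in hardware, e.g. via an LFSR). */
static unsigned victim_random(void) {
    return (unsigned)(rand() % WAYS);
}

int main(void) {
    repl_t r = { .last_used = {5, 2, 9, 7}, .fifo_next = 0 };
    printf("LRU victim:    way %u\n", victim_lru(&r));   /* way 1, oldest */
    printf("FIFO victim:   way %u\n", victim_fifo(&r));  /* way 0         */
    printf("random victim: way %u\n", victim_random());
    return 0;
}
```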

Q4: Write Strategy
Where do we write data on a cache write? Two write policies:
Write-through: the information is written to both the block in the cache and the block in the lower-level memory.
Write-back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced.

Handling Cache Hits
Read hits (I$ and D$): this is what we want!
Write hits (D$ only), two options:
- Require the cache and memory to be consistent: always write the data into both the cache block and the next level in the memory hierarchy (write-through). Writes then run at the speed of the next level in the memory hierarchy, so slow! Alternatively, use a write buffer and stall only when the write buffer is full.
- Allow the cache and memory to be inconsistent: write the data only into the cache block, and write the cache block back to the next level in the memory hierarchy when that block is evicted (write-back). This needs a dirty bit per data cache block to tell whether the block must be written back to memory on eviction; a write buffer can help buffer the write-backs of dirty blocks.
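A sketch contrasting the two write-hit paths just described; the line structure, write buffer, and memory interface are illustrative stand-ins, not a real cache controller:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     dirty;   /* needed only by the write-back policy */
    uint32_t tag;
    uint32_t data;
} line_t;

/* Illustrative next-level interface and write buffer. */
static void memory_write(uint32_t addr, uint32_t w) { (void)addr; (void)w; }
static bool write_buffer_put(uint32_t addr, uint32_t w) { (void)addr; (void)w; return true; }

/* Write-through hit: update the cache AND the next level (here via a
   write buffer, so the pipeline stalls only when the buffer is full). */
static void write_hit_through(line_t *l, uint32_t addr, uint32_t word) {
    l->data = word;
    if (!write_buffer_put(addr, word))
        memory_write(addr, word);   /* buffer full: stall and write through */
}

/* Write-back hit: update only the cache and mark the line dirty; the
   block goes to memory later, when it is evicted. */
static void write_hit_back(line_t *l, uint32_t word) {
    l->data  = word;
    l->dirty = true;
}

int main(void) {
    line_t l = { .valid = true };
    write_hit_through(&l, 0x100, 42);
    write_hit_back(&l, 43);
    return 0;
}
```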

Handling Cache Misses (Single-Word Blocks)
Read misses (I$ and D$): stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache, send the requested word to the processor, then let the pipeline resume.
Write misses (D$ only), three options:
1. Stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache (which may involve evicting a dirty block if using a write-back cache), write the word from the processor into the cache, then let the pipeline resume.
2. Write allocate: first fetch the block from memory, then write the word into the block.
3. No-write allocate: skip the cache write (but invalidate that cache block if present, since it would now hold stale data) and just write the word to the write buffer (and eventually to the next memory level); no stall is needed if the write buffer isn't full.
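And a matching sketch of the two write-miss policies. It reuses the same illustrative interfaces as above; the one-word blocks and the 20/10/2 address split are assumptions carried over from the earlier direct-mapped example:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; bool dirty; uint32_t tag; uint32_t data; } line_t;

/* Illustrative stand-ins for the next memory level and the write buffer. */
static uint32_t memory_read(uint32_t addr)              { (void)addr; return 0; }
static void     memory_write(uint32_t addr, uint32_t w) { (void)addr; (void)w;  }
static bool     write_buffer_put(uint32_t addr, uint32_t w) { (void)addr; (void)w; return true; }

/* Write allocate: fetch the block into the cache first, then write.
   In a write-back cache, a dirty victim must be written back first. */
static void write_miss_allocate(line_t *l, uint32_t addr, uint32_t word) {
    if (l->valid && l->dirty) {
        /* Rebuild the victim's address from its tag plus this index. */
        uint32_t victim = (l->tag << 12) | (addr & 0xFFCu);
        memory_write(victim, l->data);
    }
    l->data  = memory_read(addr);  /* install the missing block...           */
    l->data  = word;               /* ...then the write overwrites the word  */
    l->tag   = addr >> 12;
    l->valid = true;
    l->dirty = true;
}

/* No-write allocate: bypass the cache and send the word toward memory;
   the processor stalls only if the write buffer is full. */
static void write_miss_no_allocate(uint32_t addr, uint32_t word) {
    if (!write_buffer_put(addr, word))
        memory_write(addr, word);
}

int main(void) {
    line_t l = {0};
    write_miss_allocate(&l, 0x2000, 7);
    write_miss_no_allocate(0x3000, 8);
    return 0;
}
```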

Handling Cache Misses (Multiword Blocks)
Read misses (I$ and D$): processed the same as for single-word blocks; a miss returns the entire block from memory, so the miss penalty grows as the block size grows. Techniques to reduce the effective penalty:
- Early restart: the processor resumes execution as soon as the requested word of the block is returned.
- Requested word first: the requested word is transferred from memory to the cache (and processor) first.
- Nonblocking cache: allows the processor to continue accessing the cache while the cache is handling an earlier miss.
Write misses (D$): if using write allocate, must first fetch the block from memory and then write the word into the block.

Memory Hierarchy Design
Memory hierarchy design becomes more crucial with recent multi-core processors, since aggregate peak bandwidth grows with the number of cores:
- An Intel Core i7 can generate two references per core per clock.
- Four cores and a 3.2 GHz clock: 25.6 billion 64-bit data references/second + 12.8 billion 128-bit instruction references/second = 409.6 GB/s!
- DRAM bandwidth is only 6% of this (25 GB/s).
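The arithmetic behind those figures, checked in a small sketch (all constants are from the slide):

```c
#include <stdio.h>

int main(void) {
    const double ghz   = 3.2;  /* clock rate, GHz */
    const int    cores = 4;

    /* Two 64-bit (8-byte) data references per core per clock. */
    double data_gbs = cores * 2 * ghz * 8;   /* 25.6e9 refs/s * 8 B  */
    /* One 128-bit (16-byte) instruction reference per core per clock. */
    double inst_gbs = cores * 1 * ghz * 16;  /* 12.8e9 refs/s * 16 B */

    printf("data:  %.1f GB/s\n", data_gbs);             /* 204.8 */
    printf("inst:  %.1f GB/s\n", inst_gbs);             /* 204.8 */
    printf("total: %.1f GB/s\n", data_gbs + inst_gbs);  /* 409.6 */
    printf("DRAM (25 GB/s) covers %.0f%% of demand\n",
           100.0 * 25.0 / (data_gbs + inst_gbs));       /* ~6%   */
    return 0;
}
```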

Cache Performance
Measuring memory hierarchy performance:
Average Memory Access Time (AMAT) = Hit Time + Miss Rate * Miss Penalty
- Hit time: the time to hit in the cache.
- Miss rate: the fraction of memory accesses not found in the cache (i.e., 1 - hit ratio).
- Miss penalty: the time to get the data from main memory.
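A worked instance of the AMAT formula; the parameter values are made up but typical:

```c
#include <stdio.h>

int main(void) {
    /* Illustrative parameters, in clock cycles. */
    double hit_time     = 1.0;
    double miss_rate    = 0.05;   /* 5% of accesses miss            */
    double miss_penalty = 100.0;  /* time to fetch from main memory */

    /* AMAT = Hit Time + Miss Rate * Miss Penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* 1 + 0.05 * 100 = 6.0 */
    return 0;
}
```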

Then, Processor Performance?
Previously: CPU Time = Instruction Count * CPI * Cycle Time = CPU Execution Clock Cycles * Cycle Time
Now: CPU Time = (CPU Execution Clock Cycles + Memory Stall Clock Cycles) * Cycle Time
Thus, cache behavior has a great impact on processor performance!

CPU Time as a Function of CPI
CPU Time = (CPU Execution Clock Cycles + Memory Stall Clock Cycles) * Cycle Time
         = (IC * CPI + IC * Memory Accesses per Instruction * Miss Rate * Miss Penalty) * Cycle Time
         = IC * (CPI + Memory Accesses per Instruction * Miss Rate * Miss Penalty) * Cycle Time
The lower the base CPI, the greater the relative impact of the misses.
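A worked instance of the formula, illustrating the final point; the base CPI, access rate, miss rate, penalty, and clock are made-up values:

```c
#include <stdio.h>

int main(void) {
    double ic             = 1e9;    /* instruction count                          */
    double cpi            = 1.0;    /* base (no-miss) CPI                         */
    double accesses_per_i = 1.35;   /* memory accesses/instr: 1 fetch + 0.35 data */
    double miss_rate      = 0.02;
    double miss_penalty   = 100.0;  /* cycles            */
    double cycle_time     = 0.5e-9; /* seconds, at 2 GHz */

    double stall_cpi = accesses_per_i * miss_rate * miss_penalty;  /* 2.7 */
    double cpu_time  = ic * (cpi + stall_cpi) * cycle_time;
    printf("effective CPI = %.2f\n", cpi + stall_cpi);  /* 1.0 + 2.7 = 3.70 */
    printf("CPU time      = %.3f s\n", cpu_time);       /* 1.850 s          */

    /* The lower the base CPI, the larger the relative impact of misses:
       with cpi = 1.0 the stalls inflate CPU time 3.7x; with cpi = 0.5,
       the same stalls inflate it (0.5 + 2.7) / 0.5 = 6.4x. */
    return 0;
}
```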