ECSE 425 Lecture 21: More Cache Basics; Cache Performance

ECSE 425 Lecture 21: More Cache Basics; Cache Performance
H&P Appendix C. 2011 Gross, Hayward, Arbel, Vu, Meyer. Textbook figures.

Last Time: Two Questions
- Q1: Block placement
- Q2: Block identification

Today Two more quesrons Q3: Block replacement Q4: Write strategy Performance of CPUs with cache Appendix C.2 ECSE 425, Fall 2011, Lecture 21 3

Q3: Block Replacement
Which block should be replaced on a miss?
- Direct mapped: no choice! The index specifies the cache block for replacement (see the sketch after this list). Advantage: simple logic. Disadvantage: higher miss rates.
- Fully associative or set associative: several blocks to choose from; make good choices to reduce miss rates.
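To make the "no choice" point concrete, here is a minimal sketch; the field widths (64-byte blocks, 512 sets) are illustrative assumptions, not parameters from the lecture. The index bits of the address fully determine which block a direct-mapped cache must replace.

```python
# Field widths are assumptions for illustration, not from the lecture.
BLOCK_OFFSET_BITS = 6   # 64-byte blocks
INDEX_BITS = 9          # 512 sets

def direct_mapped_victim(addr):
    """The address alone determines the (only) candidate block."""
    return (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

# Two addresses with the same index bits always evict each other:
a, b = 0x0001_2340, 0x0009_2340
assert direct_mapped_victim(a) == direct_mapped_victim(b)
```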

Three Replacement Strategies
- Random: simple!
- Least-Recently Used (LRU): recently accessed blocks will likely be accessed again; the converse is also true: blocks not recently accessed are less likely to be accessed again. True LRU is too expensive; estimate it, e.g., by recording when items are accessed.
- First in, first out (FIFO): replace the oldest block; another approximation of LRU.
A sketch of the three policies follows.
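A minimal sketch of the three policies for a single cache set; the class and function names are mine, and real hardware only approximates the exact LRU bookkeeping shown here.

```python
import random
from collections import OrderedDict, deque

# One cache set per object; tags stand in for block addresses.

def victim_random(ways):
    return random.choice(ways)             # no bookkeeping at all

class LRUSet:
    def __init__(self):
        self.order = OrderedDict()         # least-recent first
    def access(self, tag):
        self.order.pop(tag, None)
        self.order[tag] = True             # move/insert at most-recent end
    def victim(self):
        return next(iter(self.order))      # least-recently used tag

class FIFOSet:
    def __init__(self):
        self.queue = deque()               # insertion order only
    def insert(self, tag):
        self.queue.append(tag)
    def victim(self):
        return self.queue[0]               # oldest block, ignoring reuse

s = LRUSet()
for tag in "ABAC":
    s.access(tag)
print(s.victim())  # "B": re-using A protected it; FIFO would evict A instead
```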

Q4: Write Strategy
Most cache accesses are reads: all instruction accesses are reads. In MIPS, 10% of instructions are stores and 26% are loads, so writes represent about 7% of overall memory traffic and about 28% of traffic to the data cache.
Two conflicting design principles:
- Make the common case fast: optimize for reads
- Amdahl's law: don't neglect writes

Reads vs. Writes
Reads are the easiest to make fast: read the block at the same time as the tag check, and discard the data if the tags don't match. Power is the only drawback.
Why are writes slow?
- Writes cannot begin until after the tag check
- Writes can be of variable width (reads too, but power is the only drawback of reading more)

Two Write Strategies
- Write back: write information only to the block in the cache; write the modified cache block to main memory only when it is replaced.
- Write through: write information to the block in the cache and to the block in lower-level memory.

Write Back
Advantages:
- Writes occur at the speed of the cache
- Multiple writes to a block are coalesced into one write to main memory, reducing pressure on memory bandwidth and power dissipated in the memory system
A dirty bit is kept for each block, indicating whether the block is dirty (modified while in the cache) or clean (not modified). On replacement, clean blocks are discarded; only dirty blocks trigger writing to the next level.

Write Through
- Simple, easier to implement
- The cache is always clean: read misses never result in writes to the lower level
- Main memory always has current data, which reduces the complexity of cache coherency
A sketch of both write-hit policies follows.
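A minimal sketch contrasting the two write-hit policies and the role of the dirty bit; the data structures and names are illustrative assumptions, not the lecture's notation.

```python
class Block:
    def __init__(self, data=None):
        self.data = data
        self.dirty = False                 # only meaningful for write-back

def write_hit_write_back(block, value):
    block.data = value
    block.dirty = True                     # defer memory update to eviction

def evict_write_back(block, memory, addr):
    if block.dirty:                        # one coalesced write per dirty block
        memory[addr] = block.data
    # clean blocks are simply discarded

def write_hit_write_through(block, value, memory, addr):
    block.data = value
    memory[addr] = value                   # memory always current; no dirty bit
```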

Stalling on Writes
Avoid stalling on writes by using a write buffer: once data is written to the buffer, continue. Loads must then check the write buffer for data: if the data is cached but the write is still buffered, the cache doesn't contain the current value (sketched below).
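A minimal sketch of the hazard just described, using illustrative names: stores enter a write buffer so the CPU can continue, and loads must search the buffer because a buffered store may be newer than the cached copy.

```python
from collections import deque

write_buffer = deque()    # pending (addr, value) stores
cache = {}                # addr -> value

def store(addr, value):
    write_buffer.append((addr, value))   # buffer the write; CPU continues

def load(addr):
    # The newest matching buffered store is more current than the cache.
    for a, v in reversed(write_buffer):
        if a == addr:
            return v
    return cache.get(addr)
```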

Two Write Miss Strategies
- Write allocate: make write misses act like read misses; retrieve the block, then proceed as if the access was a hit.
- No-write allocate: don't retrieve the block; modify it directly in lower-level memory instead.
Normally, WB caches write allocate (they benefit from locality) and WT caches don't write allocate (avoiding redundant writes to multiple levels of memory). The sketch below shows both.
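A minimal sketch of the two write-miss policies; the function names are mine, and blocks are modeled as single values for brevity (a real write-allocate fetch matters precisely because a store touches only part of a block).

```python
def write_miss_write_allocate(cache, memory, addr, value):
    cache[addr] = memory.get(addr, 0)  # retrieve the block, as on a read miss
    cache[addr] = value                # then proceed as if the access was a hit

def write_miss_no_write_allocate(cache, memory, addr, value):
    memory[addr] = value               # update lower-level memory only; no fill
```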

Average Memory Access Time
Miss rate (accesses that miss / total accesses) is a convenient metric, independent of the speed of the hardware. A better measure is Average Memory Access Time:
AMAT = Hit Time + Miss Rate × Miss Penalty
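The formula as a one-line helper (a trivial sketch; the units are whatever the inputs use):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (units follow the inputs)."""
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.02, 100))   # 1-cycle hit, 2% misses, 100-cycle penalty -> 3.0
```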

Example: Unified vs. Split Caches
Instruction and data streams are different: instructions are fixed size and read-only, and the instruction stream is predictable. Divide the cache capacity between two caches:
- An instruction cache (I$), accessed during IF
- A data cache (D$), accessed during MEM

Example: Unified vs. Split Caches, Cont'd
16 KB I$ and 16 KB D$ vs. a 32 KB U$:
- The 16 KB I$ has 3.83 misses per 1000 instructions
- The 16 KB D$ has 40.9 misses per 1000 instructions
- The 32 KB U$ has 43.3 misses per 1000 instructions
Assume 10% of instructions are stores, 26% are loads, a 1-cycle hit time, a 100-cycle miss penalty, and a 1-cycle extra penalty for the U$ structural hazard. What is the AMAT in each case? Does miss rate predict AMAT? (Worked below.)
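A worked sketch using the slide's numbers; my one interpretive assumption is that the unified cache's extra structural-hazard cycle is charged to data accesses, which must compete with instruction fetches.

```python
hit, penalty = 1, 100                 # cycles, from the slide
refs_per_instr = 1 + 0.26 + 0.10      # fetch + loads + stores = 1.36
f_instr = 1.0 / refs_per_instr        # fraction of accesses that are fetches
f_data = 0.36 / refs_per_instr

# misses per 1000 instructions -> miss rate per access
mr_i = 3.83 / 1000                    # one fetch per instruction
mr_d = 40.9 / 1000 / 0.36
mr_u = 43.3 / 1000 / refs_per_instr

amat_split = (f_instr * (hit + mr_i * penalty)
              + f_data * (hit + mr_d * penalty))
amat_unified = (f_instr * (hit + mr_u * penalty)
                + f_data * (hit + 1 + mr_u * penalty))  # +1: structural hazard

print(f"split:   miss rate {(3.83 + 40.9) / 1000 / refs_per_instr:.4f}, "
      f"AMAT {amat_split:.2f} cycles")      # ~0.0329, ~4.29
print(f"unified: miss rate {mr_u:.4f}, "
      f"AMAT {amat_unified:.2f} cycles")    # ~0.0318, ~4.45
# The unified cache has the LOWER miss rate but the HIGHER AMAT.
```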

CPU Performance with Imperfect Caches
When caches are perfect:
CPU Time = IC × CPI_base × Clock Cycle Time
(one cycle of hit time is included in CPI_base). When caches aren't perfect, stalls:
CPU Time = IC × (CPI_base + Memory Stall Cycles per Instruction) × Clock Cycle Time

CPU Performance Example 1
Assume an in-order processor, a miss penalty of 200 CC, CPI = 1 (when we ignore memory stalls), 1.5 memory accesses per instruction on average, and 30 misses per 1000 instructions. Compare CPU Time with and without a cache. (Worked below.)
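A worked sketch with the slide's numbers; "without cache" is taken to mean every memory access pays the full 200-cycle penalty (my reading). CPU time is reported as effective CPI, since IC and clock-cycle time cancel in the comparison.

```python
cpi_base = 1.0            # CPI ignoring memory stalls
penalty = 200             # miss penalty, cycles
refs = 1.5                # memory accesses per instruction
mpi = 30 / 1000           # misses per instruction

cpi_cache = cpi_base + mpi * penalty       # 1 + 0.03*200 = 7.0
cpi_no_cache = cpi_base + refs * penalty   # every access pays: 1 + 300 = 301.0

print(cpi_cache, cpi_no_cache)
print(f"cache speedup: {cpi_no_cache / cpi_cache:.0f}x")   # ~43x
```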

Cache Impact on Performance
As CPI decreases, the relative penalty of a cache miss increases; with faster clocks, a fixed memory delay yields more stall clock cycles! Amdahl's law states that if we decrease CPI but average memory access time is fixed, then the overall speedup is limited by the fraction of time spent computing relative to total execution time.

CPU Performance Example 2
With a perfect cache: CPI = 1.6, CC = 0.35 ns, 1.4 memory references per instruction. Compare two 128 KB caches with 64-byte blocks, a miss penalty of 65 ns, and a hit time of 1 CC:
- Direct mapped: miss rate = 2.1%
- 2-way set associative: CC_2-way = 1.35 × CC_1-way, miss rate = 1.9%
What is the CPU Time in each case? Does AMAT predict CPU Time? (Worked below.)
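A worked sketch with the slide's numbers; the 65 ns miss penalty is fixed in wall-clock time, so it does not scale with the slower 2-way clock.

```python
cpi, refs, penalty = 1.6, 1.4, 65.0            # base CPI, refs/instr, ns
cc = {"1-way": 0.35, "2-way": 0.35 * 1.35}     # clock cycle, ns
mr = {"1-way": 0.021, "2-way": 0.019}          # miss rates

for c in ("1-way", "2-way"):
    amat = cc[c] + mr[c] * penalty             # hit time = 1 clock cycle
    cpu = cpi * cc[c] + refs * mr[c] * penalty # ns per instruction
    print(f"{c}: AMAT {amat:.3f} ns, CPU time {cpu:.3f} ns/instr")
# ~ 1-way: AMAT 1.715, CPU 2.471;  2-way: AMAT 1.708, CPU 2.485
# The 2-way cache wins on AMAT yet loses on CPU time.
```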

AMAT is not CPU Time!
In the previous example, AMAT is lower for 2-way, but CPU Time is lower for 1-way. Full simulation is the best predictor of performance. The 2-way set associative cache increases the clock cycle for ALL instructions: slower ALU operations and slower hits (the common case). It degrades performance despite improving miss rate.

Out-of-Order Execution
Miss penalty does not mean the same thing: the machine does not totally stall on a cache miss, and other instructions are allowed to proceed. What is the miss penalty for OOO execution? The full latency of the memory miss? No: because of OOO execution, define the miss penalty as the non-overlapped latency, the time during which the CPU must stall.

Summary
- Block replacement: Random, LRU, FIFO
- Write strategy: write-back, write-through
- Write misses: write allocate, no-write allocate
- Performance with caches: miss rate, average memory access time, CPU time
- OOO-E vs. IO-E: some of the miss penalty can be overlapped

Next Time
Basic cache optimizations (Appendix C.3)