Pollard's Attempt to Explain Cache Memory

Pollard's Attempt to Explain Cache

Start with a (Very) Basic Block Diagram

CPU (actual work done here) <-> Memory (starting and ending data stored here, along with the program)

Organization of memory: the designer's choice.

Problem with the System: Mismatched Speed

CPU speed: as fast as technology and the application allow. A big memory must be slow to be economical.

One Solution: a Small(er), Fast(er) Memory for the Active Stuff

CPU <-> Cache <-> Main Memory

The cache is designed so interaction with the CPU happens at CPU speed; cache-to-main-store interaction is designed for larger data transfers.

Step 1: Choose the Sizes of the Memories

Cache memory: 256 KBytes. Main memory: 512 MBytes.

Step 2: Organize Memory in Lines (Blocks)

Cache memory: 256 KBytes, or 8192 lines of 32 bytes/line. Main memory: 512 MBytes, or 16 M lines of 32 bytes/line.

Observation 1: 29 Bits Are Needed for a Main Store Address

Main store address: bits 28..0 (512 MBytes = 2^29 bytes).

Observation 2: Each Memory (Cache and Main Store) Is Made Up of 32-Byte Lines

Cache memory: 8192 lines (only 128 shown on the original slide).

Observation 3: The Address Within a Line Takes 5 Bits

Main store address: bits 28..5; address within line: bits 4..0 (32 bytes = 2^5).

Step 3: Organize Lines into Groups, or Sets, of Four

Cache memory: 8192 lines organized in 2048 sets, numbered 0..2047.

Observation 4: 2048 Sets Require 11 Bits of Address

Main store address: bits 28..16; set identifier bits: 15..5 (2048 = 2^11); address within line: bits 4..0.

Observation 5: The Remaining Address Bits Form (Part of) the Tag

Tag address bits: 28..16; set identifier bits: 15..5; address within line: bits 4..0.
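Putting the geometry together, here is a minimal sketch in C of how a 29-bit main store address splits into tag, set, and offset fields. The constants follow the slides; the sample address and variable names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Geometry from the slides: 32-byte lines (2^5), 2048 sets (2^11),
   29-bit main store address, so 29 - 11 - 5 = 13 tag bits. */
#define OFFSET_BITS 5
#define SET_BITS    11

int main(void) {
    uint32_t addr   = 0x1234ABCu;  /* an arbitrary 29-bit address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);               /* bits 4..0  */
    uint32_t set    = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1); /* bits 15..5 */
    uint32_t tag    = addr >> (OFFSET_BITS + SET_BITS);               /* bits 28..16 */
    printf("tag=0x%X  set=%u  offset=%u\n",
           (unsigned)tag, (unsigned)set, (unsigned)offset);
    return 0;
}
```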

Step 4: Add a Tag Section to the Cache

Each of the 2048 sets (numbered 0..2047) holds a tag entry alongside each data line.

Read Action, Step 1: CPU Requests a Read at an Address

The CPU generates an address, which is separated into a tag, a set number, and an offset within the line.

Read Action, Step 2: Cache Checks for the Line in the Set

The cache compares the tag bits against the four lines in the set; any match results in a cache hit.

Read Action, Step 3: On a Cache Hit, Data Is Sent Immediately to the CPU

On a cache hit the CPU continues activity immediately; there is no pause in CPU activity.
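A minimal sketch of that tag comparison, assuming a simple valid-bit-plus-tag array per set (the data payload and replacement state are omitted):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 2048
#define WAYS     4

/* Bookkeeping for one line: a valid bit plus its 13-bit tag. */
struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_SETS][WAYS];

/* Compare the tag against all four lines in the set; any match is a hit. */
bool lookup(uint32_t tag, uint32_t set) {
    for (int way = 0; way < WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;   /* hit: data goes to the CPU immediately */
    return false;          /* miss: the line must come from the main store */
}
```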

Read Action, Step 4: On a Cache Miss, the Controller Requests the Line from the Main Store

Cache miss: the cache controller moves the line from the main store to the cache (four transfers if the bus width is 8 bytes, since 32 bytes / 8 bytes = 4).

Additional Cache Issues

Write protocol, i.e. how to handle CPU writes. Write back: write changes are handled in the cache. Write through: writes also modify the main store. Set size: how many lines per set. Line-in-set selection: algorithms vs. reality. The write-back sequence of events. The cache concept and I/O requirements.
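To make the two write protocols concrete, here is a hedged sketch; the line structure and the store_to_main callback are hypothetical stand-ins for the controller's real machinery:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-line state for illustrating the write protocols. */
struct wline { bool dirty; uint8_t data[32]; };

/* Write back: update only the cache and mark the line dirty; the main
   store is brought up to date later, when the dirty line is evicted. */
void write_back(struct wline *l, int offset, uint8_t value) {
    l->data[offset] = value;
    l->dirty = true;            /* main store is stale until eviction */
}

/* Write through: update the cache line and the main store together. */
void write_through(struct wline *l, int offset, uint8_t value,
                   void (*store_to_main)(int offset, uint8_t value)) {
    l->data[offset] = value;
    store_to_main(offset, value);  /* main store never goes stale */
}
```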

Higher CPU Speeds Lead to Multi-Level Caching

CPU <-> Level 1 Cache <-> Level 2 Cache <-> Main Memory

Keywords from the Text

Cache memory, virtual memory, direct mapped, set associative, fully associative, valid bit, block address, write through, instruction cache, average access time, cache hit, page, miss penalty, dirty bit, block offset, write back, data cache, hit time.

Keywords from the Text (continued)

Cache miss, page fault, miss rate, least recently used, tag field, write allocate, unified cache, misses per instruction, block/line, locality, address trace, set, random replacement, index field, no-write allocate, write buffer, stall.

Four Memory Hierarchy Questions

Q1: Where can a block be placed? Q2: How is a block found? Q3: Which block is replaced on a miss? Q4: What happens on a write?

Cache Performance

The book's version of the equation:

Average memory access time = Hit time + Miss rate × Miss penalty

Reducing Cache Miss Penalty

Technique 1: Multilevel caches. Technique 2: Critical word first and early restart. Technique 3: Giving priority to read misses over writes. Technique 4: Merging write buffers. Technique 5: Victim caches.
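A hypothetical worked example of the equation (the numbers are illustrative, not from the slides): with a 1 ns hit time, a 5% miss rate, and a 20 ns miss penalty, the average memory access time is 1 + 0.05 × 20 = 2 ns. Halving the miss rate to 2.5% would bring it to 1.5 ns, which is why miss penalty and miss rate are attacked as separate targets.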

Reducing Miss Rate

Miss types: compulsory (cold-start, first-reference) misses; capacity (won't-fit) misses; conflict (collision, interference) misses. Miss rates depend on a variety of factors.

Techniques for Reducing Miss Rate

Technique 1: Larger block size. Technique 2: Larger caches. Technique 3: Higher associativity. Technique 4: Way prediction and pseudo-associative caches. Technique 5: Compiler optimizations such as loop interchange and blocking (sketched below).
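A minimal sketch of the loop-interchange optimization on a hypothetical array update; the array size is arbitrary, and the point is purely the traversal order relative to C's row-major layout:

```c
#define N 1024
static double a[N][N];

/* Cache-unfriendly: the inner loop strides N * 8 bytes between
   accesses, touching a different 32-byte line on every iteration. */
void increment_by_column(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] += 1.0;
}

/* After loop interchange: the inner loop walks consecutive bytes,
   so each cache line is fully used before it can be evicted. */
void increment_by_row(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] += 1.0;
}
```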

Miss Rate Help via Parallelism

Nonblocking caches reduce stalls on cache misses. Hardware prefetching of instructions and data. Compiler-controlled prefetching: register prefetch vs. cache prefetch, faulting vs. non-faulting (sketched below).

Reducing Hit Time

Small and simple caches. Avoiding address translation during indexing of the cache. Pipelined cache access. Trace caches.
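One common form of compiler-controlled prefetching is a software prefetch intrinsic. This sketch uses the GCC/Clang __builtin_prefetch builtin; the prefetch distance of 32 elements is a made-up tuning parameter, not something the slides specify:

```c
/* Scale an array, prefetching 32 doubles (eight 32-byte lines) ahead.
   __builtin_prefetch(addr, 1, 0): prepare for a write, low temporal
   locality; the generated prefetch is a non-faulting hint on most ISAs. */
void scale(double *x, int n, double k) {
    for (int i = 0; i < n; i++) {
        if (i + 32 < n)
            __builtin_prefetch(&x[i + 32], 1, 0);
        x[i] *= k;
    }
}
```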

Main Memory Organizations

Wider main memory. Simple interleaved memory. Independent memory banks.

Memory Technology

SRAM, DRAM, ROM, PROM, EPROM, Flash, SSRAM, DDR DRAM.

Virtual Memory

Vocabulary of virtual memory: page / segment, page size / segment size, dirty page/segment, page replacement, address mapping, TLB, use bit / reference bit, fragmentation.

The Four Questions, Revisited

Where can a page be placed in memory? How is a block found if it is in main memory? Which block should be replaced on a virtual memory miss? What happens on a write?
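The address-mapping idea parallels the cache's tag/set/offset split. Here is a toy sketch, assuming a flat page table and a 4 KB page size (the slides do not fix one); a real system would consult the TLB before walking the table:

```c
#include <stdint.h>

#define PAGE_BITS 12   /* assumed 4 KB pages = 2^12 bytes */

/* Map a virtual address to a physical one through a flat page table;
   page_table[vpn] holds the physical page number for each virtual page. */
uint32_t translate(uint32_t vaddr, const uint32_t *page_table) {
    uint32_t vpn    = vaddr >> PAGE_BITS;               /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);  /* unchanged by mapping */
    uint32_t ppn    = page_table[vpn];                  /* the TLB caches these */
    return (ppn << PAGE_BITS) | offset;
}
```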

Page Size Selection

Arguments for a larger page size: the page table takes less space; a larger page size is conducive to larger, faster caches; transferring larger pages is more efficient; since the TLB is a fixed size, larger pages mean more address space is mapped at any time. Argument for a smaller page size: it conserves storage.

Fallacies and Pitfalls

Fallacy: predicting the cache performance of one program from another. Pitfall: simulating enough instructions to get accurate performance measures of the memory hierarchy. Pitfall: too small an address space. Pitfall: emphasizing memory bandwidth in DRAMs versus memory latency.
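A quick worked illustration of the page-table-size argument (the numbers are hypothetical, not from the slides): with a 32-bit virtual address and 4 KB pages, a flat page table needs 2^20 entries; quadrupling the page size to 16 KB cuts that to 2^18 entries, a 4x reduction.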

Fallacies and Pitfalls (continued)

Pitfall: delivering high memory bandwidth in a cache-based system. Pitfall: ignoring the impact of the operating system on the performance of the memory hierarchy. Pitfall: relying on the operating system to change the page size over time.