
CS 261 Fall 2017 Mike Lam, Professor Caching (get it??)

Topics
- Caching
- Cache policies and implementations
- Performance impact
- General strategies

Caching
- A cache is a small, fast memory that acts as a buffer or staging area for a larger, slower memory
- Fundamental CS system design concept
- Data is transferred in blocks; slower caches use larger block sizes
- Cache hit vs. cache miss
- Hit ratio: # hits / # memory accesses

Caches
- A cache always begins cold (empty); as the cache loads data, it is warmed up
- Every request will initially be a cold miss
  - This effect can cause performance measurement variation during experiments if not controlled for
- A working set is a collection of elements needed repeatedly for a particular computation
- If the working set doesn't fit in the cache, the resulting misses are called capacity misses

Cache implementations
- What data structure can we use to implement caches? Need FAST lookups and containment checks
- From CS 240: use a hash table!
  - Cache address = "real address" % CACHE_SIZE
- What if we wanted our cache to store blocks longer than a single byte?
- What if multiple "real" addresses map to the same cache slot?

Cache implementations
- A cache line is a block or sequence of bytes that is moved between memory levels in a single operation
- A cache set is a collection of one or more cache lines
- Each cache line contains a tag to identify the source address and a valid flag/bit indicating whether the value is up-to-date
- Cache parameters (S, E, B, m):
  - S = # of cache sets = 2^s
  - s = # of bits for set index
  - E = # of lines per cache set
  - B = block (cache line) size = 2^b
  - b = # of bits for block offset
  - m = # of bits for memory address
  - M = size of memory in bytes = 2^m
  - C = total cache capacity = S x E x B
  - t = # of tag bits = m - s - b

Cache implementations Direct-mapped (E = 1) caches

Cache implementations Set-associative (1 < E < C/B) caches Two-way associative

Cache implementations Fully-associative (E = C/B) caches

Cache implementations Why use the middle bits for the set index? Contiguous memory blocks should map to different cache sets

Cache architecture
- Example: Intel Core i7
- Per-core:
  - Registers
  - L1 d-cache and i-cache (data and instructions)
  - L2 unified cache
- Shared:
  - L3 unified cache
  - Main memory

Cache policies
- If a cache set is full, a cache miss in that set requires blocks to be replaced or evicted
- Policies:
  - Random replacement
  - Least recently used
  - Least frequently used
- These policies require additional overhead
  - More important for lower levels of the memory hierarchy

Cache policies
- How should we handle writes to a cached value?
  - Write-through: immediately update to lower level (typically used for higher levels of memory hierarchy)
  - Write-back: defer update until replacement/eviction (typically used for lower levels of memory hierarchy)
- How should we handle write misses?
  - Write-allocate: load then update (typically used for write-back caches)
  - No-write-allocate: update without loading (typically used for write-through caches)

Performance impact
- Metrics:
  - Hit rate/ratio: # hits / # memory accesses (= 1 - miss rate)
  - Miss rate/ratio: # misses / # memory accesses
  - Hit time: delay in accessing data for a cache hit
  - Miss penalty: delay in loading data for a cache miss
  - Read throughput (or "bandwidth"): the rate at which a program reads data from the memory system
- General observations:
  - Larger cache = higher hit rate but higher hit time
  - Lower miss rates = higher read throughput

Temporal locality Working set size vs. throughput

Spatial locality Stride vs. throughput

Memory mountain Stride and WSS vs. read throughput

Memory mountain (stu)
Output of lscpu:
    Architecture:          x86_64
    Byte Order:            Little Endian
    CPU(s):                24
    Thread(s) per core:    2
    Core(s) per socket:    6
    Socket(s):             2
    Vendor ID:             Intel
    Model name:            Intel(R) Xeon(R) CPU E5-2640
    CPU max MHz:           3000.0000
    CPU min MHz:           1200.0000
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              15360K

Case study: matrix multiply

Optimization strategies
- Focus on the common cases
- Focus on the code regions that dominate runtime
- Focus on inner loops and minimize cache misses
  - Favor repeated local accesses (temporal locality)
  - Favor stride-1 access patterns (spatial locality)

Next time Virtual memory: an OS-level memory cache