CMPSC 311 - Introduction to Systems Programming
Module: Caching
Professor Patrick McDaniel
Fall 2016

Reminder: Memory Hierarchy

Levels near the top are smaller, faster, and costlier per byte; levels near the bottom are larger, slower, and cheaper per byte:

L0: Registers - CPU registers hold words retrieved from the L1 cache
L1: L1 cache (SRAM) - holds cache lines retrieved from the L2 cache
L2: L2 cache (SRAM) - holds cache lines retrieved from main memory
L3: Main memory (DRAM) - holds disk blocks retrieved from local disks
L4: Local secondary storage (local disks) - holds files retrieved from disks on remote network servers
L5: Remote secondary storage (tapes, distributed file systems, Web servers)

Processor Caches

Most modern computers have multiple layers of caches to manage data passing into and out of the processors:

- L1: very fast and small, adjacent to the processor
- L2: a bit slower, but often much larger
- L3: larger still, possibly off chip; may be shared amongst processors in a multi-core system
- Memory: slowest, least expensive (huge and slow)

Note that instruction caches are distinct from data caches.

Caches

Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy: for each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work? Because of locality, programs tend to access the data at level k more often than they access the data at level k+1. Thus, the storage at level k+1 can be slower, and therefore larger and cheaper per bit.

Big Idea: the memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Locality

Caches exploit locality to improve performance, of which there are two types:

- Spatial locality: data you access tends to be close to data you have already accessed
- Temporal (time) locality: data that is accessed is likely to be accessed again soon

This leads to two cache design strategies:

- Spatial: cache items in blocks larger than the item actually accessed
- Temporal: keep recently used items in the cache longer
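As a concrete illustration (my sketch, not from the slides), consider summing a 2-D array in C. Both functions below compute the same result, but the row-major sweep follows the array's memory layout and so exploits spatial locality, while the column-major sweep jumps a full row's worth of bytes per access:

#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Good spatial locality: C stores arrays row-major, so sweeping
 * across each row touches consecutive addresses and each fetched
 * cache block is fully used before moving on. */
long sum_row_major(void) {
    long sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Poor spatial locality: walking down a column jumps
 * COLS * sizeof(int) bytes per access, so each access may land
 * in a different cache block. */
long sum_col_major(void) {
    long sum = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}

int main(void) {
    /* Calling either function twice also shows temporal locality:
     * a second pass finds much of the array already cached
     * (when the array fits in the cache). */
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}

On typical hardware the row-major version runs noticeably faster, even though both perform the same number of additions.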

General Cache Concepts

[Figure: a small cache holding blocks 8, 9, 14, and 3, one per cache line, above a memory of blocks 0-15.]

- The smaller, faster, more expensive memory caches a subset of the blocks.
- Data is copied between levels in block-sized transfer units.
- The larger, slower, cheaper memory is viewed as partitioned into blocks.

Cache Hit

[Figure: request for block 14; the cache holds blocks 8, 9, 14, and 3.]

Data in block b is needed, and block b is in the cache: Hit!

Cache Miss

[Figure: request for block 12, which is not among the cached blocks 8, 9, 14, and 3; block 12 is fetched from memory and placed in the cache.]

Data in block b is needed, but block b is not in the cache: Miss! Block b is fetched from memory and stored in the cache.

- Placement policy: determines where b goes
- Replacement policy: determines which block gets evicted (the victim)

Placement Policy

Q: When a new block comes in, where in the cache can you keep it?
A: It depends on the placement policy:

- Anywhere (fully associative). Why not do this all the time?
- Exactly one cache line (direct-mapped). Commonly, block i is mapped to cache line (i mod t), where t is the total number of lines.
- One of n cache lines (n-way set-associative).

[Figures: the same cache organized as fully associative, direct mapped, and n-way set associative.]

The index arithmetic is sketched in the code below.
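A quick illustration of that arithmetic (function names are mine, for illustration): a direct-mapped cache sends block i to line i mod t, while an n-way set-associative cache sends it to one of the n lines in set i mod (t/n).

#include <stdio.h>

/* Direct-mapped: block i can live in exactly one line. */
unsigned line_for_block(unsigned block, unsigned total_lines) {
    return block % total_lines;
}

/* n-way set-associative: block i can live in any of the n lines
 * of one set; there are total_lines / n sets. */
unsigned set_for_block(unsigned block, unsigned total_lines, unsigned n) {
    return block % (total_lines / n);
}

int main(void) {
    /* With 4 lines, blocks 0 and 8 collide in a direct-mapped cache
     * (both map to line 0), previewing the conflict-miss example. */
    printf("block 0 -> line %u\n", line_for_block(0, 4));
    printf("block 8 -> line %u\n", line_for_block(8, 4));
    /* Arranged 2-way, blocks 0 and 8 still share set 0, but the set
     * has two lines, so both can be resident at once. */
    printf("block 8 -> set %u (2-way)\n", set_for_block(8, 4, 2));
    return 0;
}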

Types of Cache Misses

Cold (compulsory) miss
- Cold misses occur because the cache is empty.

Capacity miss
- Occurs when the set of active cache blocks (the working set) is larger than the cache.

Conflict miss (direct mapping only)
- Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k. E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
- Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
- E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Conflict Miss

[Figure sequence over four slides: alternating references to two blocks that map to the same cache line evict each other on every access, missing every time.]

Cache Replacement Policy

When your cache is full and you acquire a new value, you must evict a previously stored value. The performance of a cache is largely determined by how intelligently values are evicted, as dictated by the cache eviction (replacement) policy.

Popular policies (sketched in code below):

- Least recently used (LRU): evict the value that has gone the longest without being accessed
- Least frequently used (LFU): evict the value that has been accessed the fewest times
- First in, first out (FIFO): evict values in the order they came in

A policy's efficiency is measured by its hit performance (how often something asked for is found) and its measured costs, which are determined by the working set and the workload.
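A minimal sketch of how each policy picks a victim, assuming per-line bookkeeping the cache would update on every access (the struct and function names are mine, not from the slides):

#include <stdio.h>
#include <stddef.h>

#define NLINES 4

/* Per-line bookkeeping updated on every access. */
typedef struct {
    unsigned long last_used;  /* time of most recent access  (LRU)  */
    unsigned long use_count;  /* number of accesses          (LFU)  */
    unsigned long loaded_at;  /* time the block was loaded   (FIFO) */
} line_meta;

/* LRU: evict the line that has gone longest without being accessed. */
size_t lru_victim(const line_meta m[NLINES]) {
    size_t v = 0;
    for (size_t i = 1; i < NLINES; i++)
        if (m[i].last_used < m[v].last_used) v = i;
    return v;
}

/* LFU: evict the line accessed the fewest times. */
size_t lfu_victim(const line_meta m[NLINES]) {
    size_t v = 0;
    for (size_t i = 1; i < NLINES; i++)
        if (m[i].use_count < m[v].use_count) v = i;
    return v;
}

/* FIFO: evict the line loaded earliest, regardless of recent use. */
size_t fifo_victim(const line_meta m[NLINES]) {
    size_t v = 0;
    for (size_t i = 1; i < NLINES; i++)
        if (m[i].loaded_at < m[v].loaded_at) v = i;
    return v;
}

int main(void) {
    /* State just before the T=7 access in the 4-line LRU example
     * later in this module: lines hold blocks 1, 4, 3, 5. */
    line_meta m[NLINES] = {
        {5, 3, 0},  /* block 1: loaded T=0, used at T=0,3,5 */
        {6, 2, 1},  /* block 4: loaded T=1, used at T=1,6   */
        {2, 1, 2},  /* block 3: loaded T=2, used at T=2     */
        {4, 1, 4},  /* block 5: loaded T=4, used at T=4     */
    };
    printf("LRU evicts line %zu, LFU line %zu, FIFO line %zu\n",
           lru_victim(m), lfu_victim(m), fifo_victim(m));
    return 0;
}

Note that the three policies can choose different victims from the same state: here LRU and LFU both evict block 3's line, while FIFO would evict block 1's.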

Cache Performance

A cache hit occurs when the referenced information is served out of the cache. A cache miss occurs when the referenced information cannot be served out of the cache.

The hit ratio is:

    hit ratio = (# cache hits) / (# total accesses)

Note that the efficiency of a cache is almost entirely determined by the hit ratio.

Cache Performance

The average memory access time can be calculated as:

    memory latency = hit cost + P(miss) × miss penalty

where:

- hit cost is the cost to serve an access out of the cache
- miss penalty is the cost to serve an access out of main memory
- P(miss) is the probability that a cache access results in a miss, i.e., the ratio of misses to total accesses (1 - hit ratio)

E.g., for a hit cost of 25 usec, a miss penalty of 250 usec, and a cache hit rate of 80%:

    25 usec + (0.2 × 250 usec) = 25 usec + 50 usec = 75 usec

This is the average access time through the cache.
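The formula is easy to check in code; this small sketch (my code, using the slide's numbers) reproduces the 75 usec example:

#include <stdio.h>

/* Average memory access time from the formula above:
 * latency = hit_cost + P(miss) * miss_penalty. */
double amat_usec(double hit_cost, double miss_penalty, double hit_rate) {
    double p_miss = 1.0 - hit_rate;
    return hit_cost + p_miss * miss_penalty;
}

int main(void) {
    /* 25 usec hit cost, 250 usec miss penalty, 80% hit rate
     * -> 25 + 0.2 * 250 = 75 usec. */
    printf("%.0f usec\n", amat_usec(25.0, 250.0, 0.80));
    return 0;
}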

Example: 4-Line LRU Cache

Access trace (time, block referenced), with the cache contents after each access:

T | Access | Result | Cache contents (after)
0 | 1      | miss   | 1
1 | 4      | miss   | 1 4
2 | 3      | miss   | 1 4 3
3 | 1      | hit    | 1 4 3
4 | 5      | miss   | 1 4 3 5
5 | 1      | hit    | 1 4 3 5
6 | 4      | hit    | 1 4 3 5
7 | 0      | miss   | 1 4 0 5   (evicts 3, the least recently used)
8 | 3      | miss   | 1 4 0 3   (evicts 5, the least recently used)
9 | 1      | hit    | 1 4 0 3

Example: 4-Line LRU Cache

Result: 6 misses, 4 hits, so Pr(miss) = 0.6.

Assume a hit cost of 100 usec and a miss penalty of 1000 usec. Then the average memory access time is:

    100 usec + (0.6 × 1000 usec) = 100 + 600 = 700 usec

Q: Why is the performance so poor?
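A minimal sketch (my code, not the course's) that replays this trace through a 4-line LRU cache and recomputes the 6-miss / 4-hit result and the 700 usec average access time:

#include <stdio.h>

#define NLINES 4

int main(void) {
    int block[NLINES];                 /* block held by each line */
    unsigned long last_used[NLINES];   /* time of most recent use */
    int valid[NLINES] = {0, 0, 0, 0};

    int trace[] = {1, 4, 3, 1, 5, 1, 4, 0, 3, 1};  /* the slide's accesses */
    int n = (int)(sizeof trace / sizeof trace[0]);
    int hits = 0, misses = 0;

    for (int t = 0; t < n; t++) {
        int b = trace[t];
        int found = -1;
        for (int i = 0; i < NLINES; i++)
            if (valid[i] && block[i] == b) { found = i; break; }

        if (found >= 0) {
            hits++;
            last_used[found] = (unsigned long)t;
        } else {
            misses++;
            /* Pick a victim: the first invalid line, else the LRU line. */
            int v = 0;
            for (int i = 0; i < NLINES; i++) {
                if (!valid[i]) { v = i; break; }
                if (last_used[i] < last_used[v]) v = i;
            }
            valid[v] = 1;
            block[v] = b;
            last_used[v] = (unsigned long)t;
        }
    }

    /* Average access time, using the slide's costs. */
    double amat = 100.0 + ((double)misses / n) * 1000.0;
    printf("%d misses, %d hits, average access time %.0f usec\n",
           misses, hits, amat);   /* expected: 6 misses, 4 hits, 700 usec */
    return 0;
}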