Cache Policies. Philipp Koehn. 6 April 2018

Memory Tradeoff

Fastest memory is on the same chip as the CPU... but it is not very big (say, 32 KB in an L1 cache).
Slowest memory is DRAM on separate chips... but it can be very large (say, 256 GB in a compute server).
Goal: create the illusion that a large memory is fast.
Idea: use the small memory as a cache for the large memory.
Note: in reality there are additional levels of cache (L1, L2, L3).

Simplified View

[Diagram: processor, small cache, large main memory]
A smaller memory mirrors some of the large memory's content.

Cache Organization

Previously: Direct Mapping

Each memory block is mapped to one specific slot in the cache. Part of the address is used as the index into the cache:

  0010 0011 1101 1100 0001 0011 1010 1111  (Tag | Index | Offset)

Since multiple memory blocks are mapped to the same slot, there is contention: newly loaded blocks discard old ones.
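
The field extraction described above can be sketched in Python. The field widths here (8-bit offset for 256-byte blocks, 12-bit index for 4096 slots) are illustrative assumptions; the slide does not fix them.

```python
# Splitting a 32-bit address for a direct-mapped cache.
# Assumed widths: 8-bit offset (256-byte blocks), 12-bit index (4096 slots).
OFFSET_BITS = 8
INDEX_BITS = 12

def split_address(addr: int) -> tuple[int, int, int]:
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # byte within block
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # which cache slot
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # remaining high bits
    return tag, index, offset

# 0x23DC13AF is the slide's bit pattern 0010 0011 1101 1100 0001 0011 1010 1111.
# Two addresses with the same index but different tags contend for one slot.
tag, index, offset = split_address(0x23DC13AF)
```
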

Concerns

Is this the best we can do? There are some benefits from locality: neighboring memory blocks are placed in different cache slots. But we may have to evict useful cached blocks, and we do not even know which ones are still useful.

Now: Associative Cache

A block can be placed anywhere in the cache. The block tag is now the full block address in main memory.

Previously, a 32-bit memory address was split into three fields:
  0010 0011 1101 1100 0001 0011 1010 1111  (Tag | Index | Offset)
Now there are only two:
  0010 0011 1101 1100 0001 0011 1010 1111  (Tag | Offset)

Cache Organization

Cache sizes:
- block size: 256 bytes (8-bit offset within the block)
- cache size: 1 MB (4096 slots)

Each slot holds:  Tag (24 bits) | Valid (1 bit) | Data (256 bytes)

Example: read the memory value at address $d0f01234
- cache miss
- load data block $d0f01200-$d0f012ff into the cache
- tag: $d0f012
- placed anywhere (say, in slot 1)
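
The read on this slide can be sketched with a dictionary keyed by tag, as a software stand-in for the parallel tag comparison the hardware performs. This is a minimal sketch; the toy memory contents are made up.

```python
BLOCK_SIZE = 256  # so the low 8 address bits are the offset within a block

cache: dict[int, bytes] = {}  # tag -> data block; any tag can occupy any slot

def read_byte(addr: int, memory: bytes) -> int:
    tag = addr >> 8        # the full block address serves as the tag
    offset = addr & 0xFF
    if tag not in cache:   # cache miss: load the whole 256-byte block
        start = tag * BLOCK_SIZE
        cache[tag] = memory[start:start + BLOCK_SIZE]
    return cache[tag][offset]

memory = bytes(range(256)) * 4      # toy 1 KB "main memory"
value = read_byte(0x123, memory)    # miss: loads block $100-$1ff, returns byte $23
```
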

Trade-Off

Direct mapping (slot determined by the address)
- disadvantage: two useful blocks may contend for the same slot, causing many cache misses

Associative (block may be placed in any slot)
- disadvantage: finding a block in the cache is expensive: slow and power-hungry

We are looking for a compromise.

Set-Associative Cache

A mix of direct and associative mapping.

From direct mapping: use part of the address to select a subset (set) of the cache:
  0010 0011 1101 1100 0001 0011 1010 1111  (Tag | Index | Offset)

From associative mapping: more than one slot for each indexed set.
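
A set-associative lookup can be sketched as follows, using the sizes from the next slide (1024 sets of 4 slots, 256-byte blocks, 14-bit tags); representing each set as a Python list of entries is an illustrative simplification.

```python
NUM_SETS = 1024  # selected by a 10-bit index

# each set holds up to 4 (tag, data) entries; only one set is ever searched
sets: list[list[tuple[int, bytes]]] = [[] for _ in range(NUM_SETS)]

def lookup(addr: int):
    offset = addr & 0xFF                  # 8-bit offset within the block
    index = (addr >> 8) & (NUM_SETS - 1)  # 10-bit set index
    tag = addr >> 18                      # remaining 14 high bits
    for slot_tag, data in sets[index]:    # at most 4 tag comparisons
        if slot_tag == tag:
            return data[offset]           # hit
    return None                           # miss

# place a block for address $d0f01234 (set index $012), then read it back
sets[0x012].append((0xD0F01234 >> 18, bytes(range(256))))
hit = lookup(0xD0F01234)
```
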

Cache Organization

Cache sizes:
- block size: 256 bytes (8-bit offset within the block)
- cache size: 1 MB (1024 sets of 4 slots)

Each slot holds:  Tag (14 bits) | Valid (1 bit) | Data (256 bytes)

Cache Read Control (4-Way Set-Associative)

[Circuit diagram: the index field drives a decoder that selects one set; each of the four slots' tags is compared (=) with the address tag and ANDed with its valid bit; an OR of the four results signals a hit, and a multiplexer selects the matching slot's data on the path between CPU and main memory.]

Caching Strategies

Read in blocks as needed. If the cache is full, discard blocks based on:
- random choice
- number of times accessed
- least recently used
- first in, first out

First In, First Out

First In, First Out (FIFO)

Consider the order in which cache blocks were loaded: the oldest block gets discarded first. This requires keeping a record of when each block was loaded.

Timestamp

Each slot requires an additional timestamp field:
  Index | Tag (14 bits) | Valid (1 bit) | Timestamp | Data (256 bytes)

Store the actual time? The time can easily be set when a slot is filled, but finding the oldest slot then requires a loop with a minimum calculation.

Maintain Order

The actual access time is not needed, only the ordering of the cache slots. For instance, in a 4-way associative set:
- 0 = newest block
- 3 = oldest block

When a new slot is needed:
- find the slot with order value 3
- use that slot for the new memory block (its order value becomes 0)
- increase all other order counters by 1
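
The replacement steps above can be sketched as follows (an illustrative software model, not the hardware implementation; the Slot structure is made up, and the tags replay the example on the next slides):

```python
WAYS = 4

class Slot:
    def __init__(self, order):
        self.valid = False
        self.tag = None
        self.order = order  # 0 = newest, WAYS-1 = oldest

def fifo_load(slots, new_tag):
    """Replace the oldest slot; all order counters advance by one step."""
    victim = max(slots, key=lambda s: s.order)  # slot with order value 3
    for s in slots:
        s.order = (s.order + 1) % WAYS          # every block ages one step
    victim.tag = new_tag
    victim.valid = True
    victim.order = 0                            # the new block is the newest

slots = [Slot(order) for order in range(WAYS)]
for tag in (0x3E12, 0x0FF0, 0x2043, 0x37AB, 0x0561):
    fifo_load(slots, tag)
# the fifth load evicts 0x3E12, the oldest block
```
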

Example: Initial

Index 0:
  Tag   Valid  Order  Data (256 bytes)
   xx     0     11    xx ...
   xx     0     10    xx ...
   xx     0     01    xx ...
   xx     0     00    xx ...

All valid bits are 0; each slot starts with a unique order value.

Example: First Block

Index 0:
  Tag   Valid  Order  Data (256 bytes)
  3e12    1     00    4f 4e 53 ff 00 01 ...
   xx     0     11    xx ...
   xx     0     10    xx ...
   xx     0     01    xx ...

Load the data, set the valid bit, give the new block order value 00.

Example: Second Block

Index 0:
  Tag   Valid  Order  Data (256 bytes)
  3e12    1     01    4f 4e 53 ff 00 01 ...
  0ff0    1     00    00 01 f0 01 02 63 ...
   xx     0     11    xx ...
   xx     0     10    xx ...

Load the data, set the valid bit, increase the order counters.

Example: Third Block

Index 0:
  Tag   Valid  Order  Data (256 bytes)
  3e12    1     10    4f 4e 53 ff 00 01 ...
  0ff0    1     01    00 01 f0 01 02 63 ...
  2043    1     00    f0 f0 f0 34 12 60 ...
   xx     0     11    xx ...

Load the data, set the valid bit, increase the order counters.

Example: Fourth Block

Index 0:
  Tag   Valid  Order  Data (256 bytes)
  3e12    1     11    4f 4e 53 ff 00 01 ...
  0ff0    1     10    00 01 f0 01 02 63 ...
  2043    1     01    f0 f0 f0 34 12 60 ...
  37ab    1     00    4a 42 43 52 4a 4a ...

Load the data, set the valid bit, increase the order counters.

Example: Fifth Block

Index 0:
  Tag   Valid  Order  Data (256 bytes)
  0561    1     00    9a 8b 7d 3d 4a 44 ...
  0ff0    1     11    00 01 f0 01 02 63 ...
  2043    1     10    f0 f0 f0 34 12 60 ...
  37ab    1     01    4a 42 43 52 4a 4a ...

Discard the oldest block (3e12), load the new data, increase the order counters.

Least Recently Used

Least Recently Used (LRU)

Base the decision on the last-used time, not the load time: this keeps frequently used blocks in the cache longer. We still need to maintain an ordering, but now it must be updated on every read (not just on misses).

Example

  Slot 0         Slot 1         Slot 2         Slot 3
  Access  Order  Access  Order  Access  Order  Access  Order
          01             11             10             00
          01             11             10     Hit     00
          10     Hit     00             11             01
  Hit     00             01             11             10
          01             10     Miss    00             11

- Miss: increase all counters.
- Hit on the most recently used block: no change.
- Hit on the least recently used block: increase all counters.
- Hit on other blocks: increase some counters.
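
The update rules in this example can be sketched with 2-bit counters (a minimal software model; 0 marks the most recently used slot, 3 the least recently used):

```python
WAYS = 4

def lru_hit(orders, slot):
    """Update the order counters after a hit on `slot`."""
    old = orders[slot]
    for i in range(WAYS):
        if orders[i] < old:   # only blocks more recent than the hit one age
            orders[i] += 1
    orders[slot] = 0          # the hit block becomes the most recent

def lru_miss(orders):
    """Pick the LRU slot as victim; every block ages by one step."""
    victim = orders.index(WAYS - 1)
    for i in range(WAYS):
        orders[i] = (orders[i] + 1) % WAYS   # the victim's counter wraps to 0
    return victim

orders = [1, 3, 2, 0]     # the example's starting state: 01 11 10 00
lru_hit(orders, 3)        # hit on the MRU slot: no change
lru_hit(orders, 1)        # hit on the LRU slot: all counters move
lru_hit(orders, 0)        # hit in between: some counters move
victim = lru_miss(orders) # slot 2 (order 11) is replaced
```
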

Quite Complicated

First look up the order value of the accessed block, then compare every other block's order value to it. This becomes increasingly costly with higher associativity. Note: this has to be done on every memory access (not just on cache misses).

Approximation: Bit Shifting

Keep an (n-1)-bit value for each slot of an n-way associative set. Each time a block in the set is accessed:
- shift every slot's bits to the right
- set the highest bit of the accessed block

A slot with value 0 is a candidate for removal.
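
A minimal sketch of this approximation for a 4-way set, with (n-1) = 3 bits per slot (the list-based representation is illustrative; the hardware keeps these bits per set):

```python
BITS = 3  # (n-1) bits per slot for a 4-way (n = 4) set

def touch(orders, slot):
    """On access: shift all values right, set the accessed slot's high bit."""
    for i in range(len(orders)):
        orders[i] >>= 1
    orders[slot] |= 1 << (BITS - 1)

def pick_victim(orders):
    """Any slot whose pattern is all zeros is a candidate for removal."""
    zeros = [i for i, o in enumerate(orders) if o == 0]
    return zeros[0]   # a real design might pick among the candidates randomly

orders = [0b010, 0b000, 0b001, 0b100]  # starting state from the example slide
touch(orders, 1)                       # hit on slot 1
victim = pick_victim(orders)           # slot 2 now holds pattern 000
touch(orders, victim)                  # miss: the new block is loaded there
```
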

Example

  Slot 0         Slot 1         Slot 2         Slot 3
  Access  Order  Access  Order  Access  Order  Access  Order
          010            000            001            100
          001    Hit     100            000            010
          000            010    Miss    100            001
          000    Hit     101            010            000
          000    Hit     110            001            000
  Miss    100            011            000            000

There may be multiple blocks with order pattern 000: pick one randomly. Perhaps make no change at all if the most recently used block is accessed again.