Caches III. CSE 351 Autumn Instructor: Justin Hsia


Caches III CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan https://what-if.xkcd.com/111/

Administrivia Midterm regrade requests due end of tonight. Lab 3 due Friday. HW 4 is released, due next Friday (11/17). No lecture on Friday, Veterans Day!

Making memory accesses fast! Cache basics: principle of locality, memory hierarchies. Cache organization: direct-mapped (sets; index + tag), associativity (ways), replacement policy, handling writes. Program optimizations that consider caches.

Associativity What if we could store data in any place in the cache? More complicated hardware = more power consumed, slower. So we combine the two ideas: each address maps to exactly one set, and each set can store a block in more than one way. For a cache with 8 blocks: 1-way: 8 sets, 1 block each (direct mapped); 2-way: 4 sets, 2 blocks each; 4-way: 2 sets, 4 blocks each; 8-way: 1 set, 8 blocks (fully associative).

Cache Organization (3) Note: the textbook uses b for offset bits. Associativity (E): # of ways for each set. Such a cache is called an E-way set associative cache. We now index into cache sets, of which there are S = C/(K×E). Use the lowest s = log2(C/(K×E)) bits of the block address. Direct mapped: E = 1, so s = log2(C/K) as we saw previously. Fully associative: E = C/K (one set), so s = 0 bits. Address fields: Tag (t) is used for tag comparison, Index (s) selects the set, Offset (k) selects the byte from the block. Decreasing associativity ends at direct mapped (only one way); increasing associativity ends at fully associative (only one set).

Example Placement Where would data from address 0x1833 be placed? Binary: 0b 0001 1000 0011 0011. Block size: 16 B; capacity: 8 blocks; address: 16 bits. Offset bits: k = log2(K) = 4. Index bits: s = log2(C/(K×E)), which depends on the associativity E. m-bit address: Tag (t) | Index (s) | Offset (k). Which set does the block land in for a direct-mapped cache (8 sets), a 2-way set associative cache (4 sets), and a 4-way set associative cache (2 sets)?

Block Replacement Any empty block in the correct set may be used to store the block. If there are no empty blocks, which one should we replace? No choice for direct-mapped caches. Caches typically use something close to least recently used (LRU); hardware usually implements "not most recently used".

Peer Instruction Question We have a cache of size 2 KiB with block size of 128 B. If our cache has 2 sets, what is its associativity? Vote at http://pollev.com/justinh A. 2 B. 4 C. 8 D. 16 E. We're lost. If addresses are 16 bits wide, how wide is the Tag field?

General Cache Organization (S, E, K) E = blocks/lines per set (a line is a block plus management bits). S = # sets = 2^s. Each line holds a valid bit V, a tag, and K = 2^k bytes of data (bytes 0, 1, 2, …, K−1). Cache size: C = S × E × K data bytes (doesn't include V or the tag).

Notation Review We just introduced a lot of new variable names! Please be mindful of block size notation when you look at past exam questions or are watching videos.

Variable            This Quarter   Formulas
Block size          K (B in book)  K = 2^k
Cache size          C              C = S × E × K
Associativity       E
Number of sets      S              S = 2^s
Address space       M              M = 2^m
Address width       m              m = t + s + k
Tag field width     t              t = m − s − k
Index field width   s              s = log2(S) = log2(C/(K×E))
Offset field width  k (b in book)  k = log2(K)

Cache Read E = blocks/lines per set; S = # sets = 2^s; K = 2^k bytes per block. 1) Locate the set. 2) Check if any line in the set is valid and has a matching tag: a hit. 3) Locate the data starting at the offset. Address of byte in memory: tag (t bits) | set index (s bits) | block offset (k bits). The data begins at this offset within the block's K bytes (valid bit v, tag, bytes 0, 1, 2, …, K−1).

Example: Direct-Mapped Cache (E = 1) Direct mapped: one line per set. Block size K = 8 B. Address of int: t bits | 01 | 100. The index bits (01) find the set among S = 2^s sets.

Example: Direct-Mapped Cache (E = 1) Direct mapped: one line per set. Block size K = 8 B. Address of int: t bits | 01 | 100. Valid? + match?: yes = hit. The block offset (100) locates the data within the block.

Example: Direct-Mapped Cache (E = 1) Direct mapped: one line per set. Block size K = 8 B. Address of int: t bits | 01 | 100. Valid? + match?: yes = hit. The int (4 B) is here; this is why we want alignment! No match? Then the old line gets evicted and replaced.

Example: Set Associative Cache (E = 2) 2-way: two lines per set. Block size K = 8 B. Address of short int: t bits | 01 | 100. The index bits find the set.

Example: Set Associative Cache (E = 2) 2-way: two lines per set. Block size K = 8 B. Address of short int: t bits | 01 | 100. Compare both lines in the set: valid? + match: yes = hit. The block offset locates the data within the block (bytes 0 1 2 3 4 5 6 7).

Example: Set Associative Cache (E = 2) 2-way: two lines per set. Block size K = 8 B. Address of short int: t bits | 01 | 100. Compare both: valid? + match: yes = hit; the short int (2 B) is here. No match? One line in the set is selected for eviction and replacement. Replacement policies: random, least recently used (LRU), …

Types of Cache Misses: the 3 C's! Compulsory (cold) miss: occurs on the first access to a block. Conflict miss: occurs when the cache is large enough, but multiple data objects all map to the same slot; e.g. referencing blocks 0, 8, 0, 8, … could miss every time. Direct-mapped caches have more conflict misses than E-way set associative caches (where E > 1). Capacity miss: occurs when the set of active cache blocks (the working set) is larger than the cache (it just won't fit, even if the cache were fully associative). Note: fully associative caches have only compulsory and capacity misses.

What about writes? Multiple copies of data exist: L1, L2, possibly L3, main memory. What to do on a write hit? Write-through: write immediately to the next level. Write-back: defer the write to the next level until the line is evicted (replaced); must track which cache lines have been modified ("dirty bit"). What to do on a write miss? Write-allocate ("fetch on write"): load into cache, update the line in cache; good if more writes or reads to the location follow. No-write-allocate ("write around"): just write immediately to memory. Typical caches: write-back + write-allocate, usually; write-through + no-write-allocate, occasionally.

Write-back, write-allocate example The cache currently holds the contents of memory at address G. Cache: tag G, data 0xBEEF, dirty bit 0 (there is only one set in this tiny cache, so the tag is the entire block address!). Memory: F = 0xCAFE, G = 0xBEEF. In this example we are sort of ignoring block offsets. Here a block holds 2 bytes (16 bits, 4 hex digits). Normally a block would be much bigger and thus there would be multiple items per block. While only one item in that block would be written at a time, the entire line would be brought into cache.

Write-back, write-allocate example mov 0xFACE, F. Cache: tag G, data 0xBEEF, dirty bit 0. Memory: F = 0xCAFE, G = 0xBEEF.

Write-back, write-allocate example mov 0xFACE, F. Step 1: bring F into the cache. Cache: tag F, data 0xCAFE, dirty bit 0. Memory: F = 0xCAFE, G = 0xBEEF.

Write-back, write-allocate example mov 0xFACE, F. Step 2: write 0xFACE to the cache only and set the dirty bit. Cache: tag F, data 0xFACE, dirty bit 1. Memory: F = 0xCAFE, G = 0xBEEF.

Write-back, write-allocate example mov 0xFACE, F; mov 0xFEED, F. Write hit! Write 0xFEED to the cache only. Cache: tag F, data 0xFACE → 0xFEED, dirty bit 1. Memory: F = 0xCAFE, G = 0xBEEF.

Write-back, write-allocate example mov 0xFACE, F; mov 0xFEED, F; mov G, %rax. Cache: tag F, data 0xFEED, dirty bit 1. Memory: F = 0xCAFE, G = 0xBEEF.

Write-back, write-allocate example mov 0xFACE, F; mov 0xFEED, F; mov G, %rax. 1. Write F back to memory since it is dirty. 2. Bring G into the cache so we can copy it into %rax. Cache: tag G, data 0xBEEF, dirty bit 0. Memory: F = 0xFEED, G = 0xBEEF.

Peer Instruction Question Which of the following cache statements is FALSE? Vote at http://pollev.com/justinh A. We can reduce compulsory misses by decreasing our block size. B. We can reduce conflict misses by increasing associativity. C. A write-back cache will save time for code with good temporal locality on writes. D. A write-through cache will always match data with the memory hierarchy level below it. E. We're lost.

Example Cache Parameters Problem 1 MiB address space, 125 cycles to go to memory. Fill in the following table:

Cache size          4 KiB
Block size          16 B
Associativity       4-way
Hit time            3 cycles
Miss rate           20%
Write policy        Write-through
Replacement policy  LRU
Tag bits            10
Index bits          6
Offset bits         4
AMAT                AMAT = 3 + 0.2 × 125 = 28 cycles

Example Code Analysis Problem Assuming the cache starts cold (all blocks invalid), calculate the miss rate for the following loop: = 20 bits, = 4 KiB, = 16 B, = 4 #define AR_SIZE 2048 int int_ar[ar_size], sum=0; // &int_ar=0x80000 for (int i=0; i<ar_size; i++) sum += int_ar[i]; for (int j=ar_size-1; j>=0; j--) sum += int_ar[i]; 29