Recitation 7 Caches and Blocking. 9 October 2017

15-213 Recitation 7 Caches and Blocking 9 October 2017

Agenda
- Reminders
- Revisiting Cache Lab
- Caching Review
- Blocking to reduce cache misses
- Cache alignment

Reminders
- Cache Lab is due Thursday!
- Exam 1 is just a week away! Start doing practice problems. Come to the review session.

Reminders: Cache Lab
- Two parts:
  - Write a cache simulator. Hopefully you've started this part by now.
  - Optimize some code to minimize cache misses.
- Programming style will be graded starting now.
  - Worth about a letter grade on this assignment.
  - See the appendix. But that's incomplete, so also see the style guide on the course website.
- Other details are in the writeup.

Cache Lab: Parsing Input with fscanf
- fscanf() is exactly like scanf() except that you specify a stream to use (i.e. an open file) instead of always reading from standard input.
- The parameters to fscanf are:
  1: a stream pointer of type FILE*, e.g. from fopen()
  2: a format string specifying how to parse the input
  3+: pointers to each of the variables that will store the parsed data
- fscanf() returns the number of items it successfully assigned. That count is smaller than the number requested if the data does not match the format string, and the return value is EOF (a negative value, typically -1) if there is no more input before the first conversion.
- Use it to parse the trace files.

fscanf() Example

    FILE *pfile;                        /* pointer to FILE object */
    pfile = fopen("trace.txt", "r");    /* open trace file for reading */
    /* verify that pfile is non-null! */
    char access_type;
    unsigned long address;
    int size;
    /* line format is S 2f,1 or L 7d0,3 */
    /* so we need to read a character, a hex number, and a decimal number */
    /* put those in the format string along with the fixed formatting */
    /* the leading space in the format skips whitespace left over from the previous line */
    while (fscanf(pfile, " %c %lx,%d", &access_type, &address, &size) == 3) {
        /* do stuff */
    }
    fclose(pfile);                      /* always close file when done */

Cache Lab: Cache Simulator Hints
- You are only counting hits, misses, and evictions.
- Use the LRU (Least Recently Used) replacement policy.
- Structs are a great way to bundle up the different parts of a cache line (valid bit, tag, LRU counter, etc.).
- A cache is like a 2D array of cache lines. One dimension is the number of sets S, the other the set associativity E: struct cache_line cache[S][E];
- Your simulator needs to handle different values of S, E, and b (the number of block offset bits) given at run time, so allocate your space dynamically.
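The hint about allocating the cache dynamically can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the type name cache_line, its fields, and the helper names cache_create and cache_destroy are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* One cache line. Field names are illustrative, not from the handout. */
typedef struct {
    int valid;            /* 1 if the line currently holds a block */
    unsigned long tag;    /* tag bits of the cached block */
    unsigned long lru;    /* larger value = used more recently */
} cache_line;

/* Allocate S sets of E lines each, so that cache[set][line] works even
 * though S and E are only known at run time. Returns NULL on failure. */
cache_line **cache_create(size_t S, size_t E)
{
    assert(S > 0 && E > 0);
    cache_line **sets = malloc(S * sizeof *sets);
    if (sets == NULL)
        return NULL;
    for (size_t i = 0; i < S; i++) {
        sets[i] = calloc(E, sizeof **sets);   /* zeroed, so valid == 0 */
        if (sets[i] == NULL) {
            while (i > 0)                     /* undo the partial allocation */
                free(sets[--i]);
            free(sets);
            return NULL;
        }
    }
    return sets;
}

/* Release everything cache_create allocated (no memory leaks). */
void cache_destroy(cache_line **sets, size_t S)
{
    if (sets == NULL)
        return;
    for (size_t i = 0; i < S; i++)
        free(sets[i]);
    free(sets);
}
```

A caller would compute S and E from the command-line arguments, call cache_create(S, E), index the result as cache[set][line], and call cache_destroy before exiting.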

Class Question / Discussions
- We'll work through a series of questions.
- Write down your answer for each question.
- You can discuss with your classmates.

What Type of Locality?
The following function exhibits which type of locality? Consider only array accesses.

    void who(int *arr, int size) {
        for (int i = 0; i < size-1; ++i)
            arr[i] = arr[i+1];
    }

A. Spatial
B. Temporal
C. Both A and B
D. Neither A nor B

What Type of Locality?
The following function exhibits which type of locality? Consider only array accesses.

    void coo(int *arr, int size) {
        for (int i = size-2; i >= 0; --i)
            arr[i] = arr[i+1];
    }

A. Spatial
B. Temporal
C. Both A and B
D. Neither A nor B

Calculating Cache Parameters
Given the following address partition, how many int values will fit in a single data block?

Address (bit 31 down to bit 0): Tag (18 bits) | Set index (10 bits) | Block offset (4 bits)

# of int in block:
A. 0
B. 1
C. 2
D. 4
E. Unknown: We need more info
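The relationship between block offset bits and block size can be checked with a short sketch. The function names block_bytes and elems_per_block are made up for this example, and the worked numbers assume a 4-byte int, as on the course machines.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in a block addressed by b block-offset bits: B = 2^b. */
size_t block_bytes(unsigned b)
{
    assert(b < 8 * sizeof(size_t));
    return (size_t)1 << b;
}

/* How many whole values of elem_size bytes fit in one such block. */
size_t elems_per_block(unsigned b, size_t elem_size)
{
    assert(elem_size > 0);
    return block_bytes(b) / elem_size;
}
```

With 4 offset bits, block_bytes(4) is 16, so elems_per_block(4, 4) gives 4 for a 4-byte int. The tag and set index widths do not enter the calculation.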

Interlude: terminology
A direct-mapped cache only contains one line per set. This means E = 2^e = 1.
[Figure: memory addresses 000 through 111 mapped onto a cache, drawn three ways: as bytes (B0, B1), as lines (L0), and as sets (S0)]

Interlude: terminology
A fully associative cache has 1 set, and many lines for that one set. This means S = 2^s = 1.
[Figure: memory addresses 000 through 111 mapped onto a cache, drawn three ways: as bytes (B0, B1), as lines (L0), and as sets (S0)]

Direct-Mapped Cache Example
Assuming a 32-bit address (i.e. m = 32), how many bits are used for the tag (t), set index (s), and block offset (b)?
[Figure: four sets (Set 0 through Set 3), each with E = 1 line per set holding a valid bit, a tag, and a cache block of 8 bytes; the address, from bit 31 down to bit 0, is split into t tag bits, s set index bits, and b block offset bits]

      t    s    b
A.    1    2    3
B.   27    2    3
C.   25    4    3
D.    1    4    8
E.   20    4    8

Which Set Is It?
Which set may the address 0xFA1C be located in?
[Figure: the same cache: four sets, E = 1 line per set, 8 bytes per data block; address split into 27 tag bits, 2 set index bits, and 3 block offset bits]

Set # for 0xFA1C:
A. 0
B. 1
C. 2
D. 3
E. More than one of the above

Cache Block Range
What range of addresses will be in the same block as address 0xFA1C?
[Figure: the same cache: four sets, E = 1 line per set, 8 bytes per data block; address split into 27 tag bits, 2 set index bits, and 3 block offset bits]

Addr. Range:
A. 0xFA1C
B. 0xFA1C - 0xFA23
C. 0xFA1C - 0xFA1F
D. 0xFA18 - 0xFA1F
E. It depends on the access size (byte, word, etc.)
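The three questions above all come down to slicing an address into tag, set index, and block offset. A small sketch of that slicing, using the 27/2/3 split from the slides, is shown here; the struct addr_parts and the function split_address are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The pieces of a 32-bit address, plus the address range of its block. */
typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t offset;
    uint32_t block_first;   /* lowest address in the same block */
    uint32_t block_last;    /* highest address in the same block */
} addr_parts;

/* Split addr using s set-index bits and b block-offset bits. */
addr_parts split_address(uint32_t addr, unsigned s, unsigned b)
{
    assert(s + b < 32);
    addr_parts p;
    uint32_t block_mask = ((uint32_t)1 << b) - 1;     /* low b bits set */

    p.offset = addr & block_mask;
    p.set = (addr >> b) & (((uint32_t)1 << s) - 1);
    p.tag = addr >> (s + b);
    p.block_first = addr & ~block_mask;               /* clear the offset */
    p.block_last = addr | block_mask;                 /* set the offset */
    return p;
}
```

For split_address(0xFA1C, 2, 3), the low bits of the address are ...0001 1100, so the offset is 4 (binary 100), the set index is 3 (binary 11), and the block spans 0xFA18 through 0xFA1F. The block range depends only on the block size, not on how many bytes a single access reads.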

Cache Misses
If N = 16, how many bytes of a does the loop access?

    int foo(int* a, int N) {
        int i, sum = 0;
        for (i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }

Accessed bytes:
A. 4
B. 16
C. 64
D. 256

Cache Misses
If there is a 48B cache with 8 bytes per block and 3 cache lines per set, how many misses occur if foo is called twice? N still equals 16.

    int foo(int* a, int N) {
        int i, sum = 0;
        for (i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }

Misses:
A. 0
B. 8
C. 12
D. 14
E. 16
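One way to check an answer to the question above is to simulate the cache. The sketch below makes several assumptions that the slide does not state: LRU replacement (as in Cache Lab), a cold cache at the start, a 4-byte int, the array a starting at a block-aligned address (address 0 here), and only accesses to a being counted. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SETS  8
#define MAX_LINES 8
#define INT_BYTES 4     /* assumed size of int */

typedef struct {
    int valid;
    unsigned long tag;
    unsigned long last_used;   /* timestamp of the most recent access */
} line_t;

typedef struct {
    line_t lines[MAX_SETS][MAX_LINES];
    size_t S, E, B;            /* sets, lines per set, bytes per block */
    unsigned long clock;
    unsigned long misses;
} sim_t;

void sim_init(sim_t *c, size_t S, size_t E, size_t B)
{
    assert(S > 0 && S <= MAX_SETS && E > 0 && E <= MAX_LINES && B > 0);
    c->S = S; c->E = E; c->B = B;
    c->clock = 0; c->misses = 0;
    for (size_t i = 0; i < S; i++)
        for (size_t j = 0; j < E; j++)
            c->lines[i][j].valid = 0;
}

/* Simulate one access to byte address addr with LRU replacement. */
void sim_access(sim_t *c, unsigned long addr)
{
    unsigned long block = addr / c->B;
    line_t *row = c->lines[block % c->S];
    unsigned long tag = block / c->S;
    size_t victim = 0;

    c->clock++;
    for (size_t j = 0; j < c->E; j++) {
        if (row[j].valid && row[j].tag == tag) {   /* hit */
            row[j].last_used = c->clock;
            return;
        }
    }
    c->misses++;
    for (size_t j = 0; j < c->E; j++) {            /* pick a victim */
        if (!row[j].valid) { victim = j; break; }  /* empty line first */
        if (row[j].last_used < row[victim].last_used)
            victim = j;                            /* else least recent */
    }
    row[victim].valid = 1;
    row[victim].tag = tag;
    row[victim].last_used = c->clock;
}

/* Misses for `calls` back-to-back runs of foo over n_ints elements,
 * on the 48B cache from the slide: 2 sets, 3 lines per set, 8B blocks. */
unsigned long misses_for_foo(size_t n_ints, int calls)
{
    sim_t c;
    sim_init(&c, 2, 3, 8);
    for (int k = 0; k < calls; k++)
        for (size_t i = 0; i < n_ints; i++)
            sim_access(&c, i * INT_BYTES);
    return c.misses;
}
```

Under exactly these assumptions, misses_for_foo(16, 1) is 8 (one cold miss per 8-byte block) and misses_for_foo(16, 2) is 16: each set receives four blocks in a cycle but holds only three, so LRU always evicts the block that is needed next. A different replacement policy or array alignment can change the count.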

Cache-Friendly Code
- Keep memory accesses bunched together, in both time and space (address).
- The working set at any time should be smaller than the cache.
- Avoid access patterns that cause conflict misses.
- Align accesses to use fewer cache sets (this often means dividing data structures into pieces whose sizes are powers of 2).

Blocking Example
We have a 2D array int A[4][4];
The cache is fully associative and can hold two lines. Each line can hold two int values.
Discuss the following questions with your neighbor:
- What is the best miss rate for traversing A once?
- What order of traversal did you use?
- What other traversal orders can achieve this miss rate?

Class Discussion
- What did the optimal traversal orders have in common?
- How does the pattern generalize to int A[8][8] and a cache that holds 4 lines, each of 4 ints?
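To experiment with the discussion questions, traversal orders can be fed through a tiny model of the cache from the blocking example. This is a sketch under assumptions not stated on the slide: LRU replacement, A aligned to a line boundary, and a cold cache. The function names tiled_order, column_order, and count_misses are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N 4   /* the array is int A[N][N], as in the example */

/* Fill order[] with the flat indices (i*N + j) visited when A is
 * traversed in tile x tile sub-blocks. Returns the number of accesses.
 * tile == 1 or tile == N both give plain row-major order. */
size_t tiled_order(size_t tile, size_t order[N * N])
{
    assert(tile > 0 && N % tile == 0);
    size_t k = 0;
    for (size_t bi = 0; bi < N; bi += tile)
        for (size_t bj = 0; bj < N; bj += tile)
            for (size_t i = bi; i < bi + tile; i++)
                for (size_t j = bj; j < bj + tile; j++)
                    order[k++] = i * N + j;
    return k;
}

/* Column-major order, for comparison. */
size_t column_order(size_t order[N * N])
{
    size_t k = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            order[k++] = i * N + j;
    return k;
}

/* Count misses in a fully associative LRU cache that holds n_lines
 * lines of ints_per_line ints each. */
size_t count_misses(const size_t *order, size_t n,
                    size_t n_lines, size_t ints_per_line)
{
    enum { MAX_LINES = 16 };
    size_t held[MAX_LINES];   /* held[0] is the most recently used line */
    size_t used = 0, misses = 0;

    assert(n_lines > 0 && n_lines <= MAX_LINES && ints_per_line > 0);
    for (size_t a = 0; a < n; a++) {
        size_t line = order[a] / ints_per_line;
        size_t pos = used;
        for (size_t p = 0; p < used; p++)
            if (held[p] == line) { pos = p; break; }
        if (pos == used) {            /* miss */
            misses++;
            if (used < n_lines)
                used++;               /* fill an empty slot */
            pos = used - 1;           /* otherwise overwrite the LRU slot */
        }
        for (size_t p = pos; p > 0; p--)   /* move this line to the front */
            held[p] = held[p - 1];
        held[0] = line;
    }
    return misses;
}
```

With two lines of two ints, the 2x2 tiled order and the row-major order both give 8 misses in 16 accesses, while the column-major order misses on all 16. The orders that do well finish both ints of a line before that line is evicted.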

Cache Alignment
Suppose you have arrays int A[8], B[8], temp[8];
A[0], B[0] and temp[0] all correspond to byte 0 of set 0 in the cache. We say that all three arrays are cache-aligned.
For example, suppose we use a direct-mapped cache. If we request first A[0] then B[0], the cache will evict the line containing A[0].

Very Hard Cache Problem
We will use a direct-mapped cache with 2 sets, each of which can hold up to 4 ints.
How can we copy A into B, shifted over by 1 position? What is the most efficient way? (Use temp!)
[Figure: arrays A and B, each with elements 0 through 7]

[Figure: arrays temp, A, and B, each with elements 0 through 7, annotated with a miss count that is not preserved in this transcript]
It could have been 16 misses otherwise! We would save even more if the block size were larger, or if temp were already cached.

If You Get Stuck
- Please read the writeup. Read it again after doing ~25% of the lab.
- CS:APP Chapter 6
- View lecture notes and course FAQ at http://www.cs.cmu.edu/~213
- Office hours Sunday through Thursday 5:00-9:00pm in WeH 5207
- Post a private question on Piazza
- man malloc, man gdb, gdb's help command
- http://csapp.cs.cmu.edu/public/waside/waside-blocking.pdf

Appendix: C Programming Style
- Properly document your code: header comments, overall operation of large blocks, any tricky bits.
- Write robust code: check error and failure conditions.
- Write modular code: use interfaces for data structures, e.g. create/insert/remove/free functions for a linked list.
- No magic numbers: use #define.
- Formatting: 80 characters per line, consistent braces and whitespace.
- No memory or file descriptor leaks.