EE 352 Lab 5 Cache Me If You Can

1 Introduction

In this lab you will use your straightforward triple-nested loop implementation of a matrix multiply, implement a second, blocked version of matrix multiply, and compare the two with regard to cache behavior. You will explore the interaction of algorithms and cache architecture. (Make sure you have both versions of your matrix multiply code working before exploring the caching effects.) You will use the MARS Cache Simulator tool to perform your experiments.

2 What you will learn

This lab is intended to give you insight and intuition for how to write cache-conscious code and to understand the benefits of certain cache architecture parameters.

3 Background Information and Notes

Blocked Matrix Multiply: A traditional NxN matrix multiply can be implemented with a triple-nested for loop. However, when N is large, the number of data points accessed before significant reuse is proportional to N^2. This can lead to poor cache performance and, in turn, poor overall performance. The advantage of cache memory is predicated on our ability to fit the working data set into the cache. Thus, we can break the large matrix (e.g. 8x8) into smaller matrices (e.g. 2x2) and define the matrix multiply operation recursively (i.e. calculate the overall matrix product by calculating the products of smaller blocks of the matrix). The code for doing this is straightforward, though it may require some examination to see exactly which values are being accessed. We have attempted to provide a visualization below in addition to the code.

    // b = block size, N = matrix dimension
    for(i=0; i < N; i=i+b)
      for(j=0; j < N; j=j+b)
        for(k=0; k < N; k=k+b)
          for(ii=i; ii < i+b; ii++)
            for(jj=j; jj < j+b; jj++)
              for(kk=k; kk < k+b; kk++)
                C[ii][jj] = C[ii][jj] + A[ii][kk] * B[kk][jj];

Last Revised: 6/7/204
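As a rough illustration of the loop nest above, the following C sketch places the plain triple-nested loop next to the blocked version so the two can be checked against each other. It is only a sketch under stated assumptions: the function names matmul_naive and matmul_blocked, the fixed dimension N = 8, and the use of int elements are choices made for this example, and the lab itself is written for the MARS MIPS simulator rather than in C.

```c
#define N 8   /* matrix dimension (assumed; taken from the 8x8 example) */

/* Straightforward triple-nested loop: C = C + A * B. */
void matmul_naive(int A[N][N], int B[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

/* Blocked version: the same multiply-adds, visited one b x b tile at a time.
 * Assumes b divides N evenly, otherwise the inner loops would run past N. */
void matmul_blocked(int A[N][N], int B[N][N], int C[N][N], int b)
{
    for (int i = 0; i < N; i += b)
        for (int j = 0; j < N; j += b)
            for (int k = 0; k < N; k += b)
                /* Multiply the tile of A at (i,k) by the tile of B at (k,j)
                 * and accumulate into the tile of C at (i,j). */
                for (int ii = i; ii < i + b; ii++)
                    for (int jj = j; jj < j + b; jj++)
                        for (int kk = k; kk < k + b; kk++)
                            C[ii][jj] = C[ii][jj] + A[ii][kk] * B[kk][jj];
}
```

Both functions accumulate into C, so C should be zeroed before the call. Because the blocked version only reorders the same additions, the two results should match exactly for integer data.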

[Figure 1: Matrix accesses made in the blocked version. Each timestep shown is a new value of k. First panel: accesses performed when outer i,j = 0,0. Second panel: accesses performed when outer i,j = 0,block_size. Black border shows the C[ii][jj] values being calculated, yellow = A matrix accesses, blue = B matrix accesses. Total number of accesses made to each matrix A & B = 2b^2, where b = block size.]

Notice that in the case of a large matrix (e.g. N = 1024) we would only be accessing small blocks of data in any one iteration of the inner loops. Presumably this data could fit in the cache and we would see better performance. We will explore this later in the lab.

MARS Cache Simulator: MARS includes a data cache simulator that allows you to provide a cache configuration for your program and have the simulator determine the hit rate and other statistics. The tool is available by clicking Tools -> Cache Simulator. You will need to click the Connect to MIPS button to tell the cache simulator to attach to and monitor activity from the MARS code simulator.
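To make concrete what a hit-rate statistic measures, here is a minimal sketch of a set-associative cache model with LRU replacement, written in C. It is not the MARS implementation and makes no claim about how MARS works internally; the names cache_t, cache_init, cache_access and the MAX_BLOCKS bound are hypothetical, and the model assumes 4-byte words, a number of ways that divides the number of blocks, and at most MAX_BLOCKS blocks.

```c
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 128  /* illustrative bound for this sketch only */

typedef struct {
    unsigned num_blocks;                  /* total blocks in the cache */
    unsigned words_per_block;             /* cache block size in 4-byte words */
    unsigned ways;                        /* set size k (1 = direct mapped) */
    uint32_t tag[MAX_BLOCKS];             /* block address held in each slot */
    unsigned long last_used[MAX_BLOCKS];  /* 0 means the slot is empty */
    unsigned long clock;                  /* counts accesses, used for LRU */
    unsigned long hits, accesses;
} cache_t;

void cache_init(cache_t *c, unsigned num_blocks,
                unsigned words_per_block, unsigned ways)
{
    memset(c, 0, sizeof *c);
    c->num_blocks = num_blocks;
    c->words_per_block = words_per_block;
    c->ways = ways;
}

/* Simulate one word access. Returns 1 on a hit and 0 on a miss. */
int cache_access(cache_t *c, uint32_t byte_addr)
{
    uint32_t block_addr = byte_addr / (4u * c->words_per_block);
    unsigned num_sets = c->num_blocks / c->ways;
    unsigned base = (block_addr % num_sets) * c->ways; /* first slot of the set */
    unsigned victim = base;

    c->accesses++;
    c->clock++;
    for (unsigned w = base; w < base + c->ways; w++) {
        if (c->last_used[w] != 0 && c->tag[w] == block_addr) {
            c->last_used[w] = c->clock;   /* refresh LRU position */
            c->hits++;
            return 1;
        }
        /* Track the least recently used (or empty) slot as the victim. */
        if (c->last_used[w] < c->last_used[victim])
            victim = w;
    }
    c->tag[victim] = block_addr;          /* miss: fill or replace */
    c->last_used[victim] = c->clock;
    return 0;
}
```

The hit rate is then hits divided by accesses, expressed as a percentage. With two one-word blocks, the byte-address sequence 0, 8, 0 misses three times in the direct-mapped configuration (both addresses map to the same block) but hits on the third access when the two blocks form a single 2-way set, which is the kind of conflict-miss effect the associativity experiments look for.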

The configurable parameters for the cache include Placement Policy (a.k.a. mapping scheme), Block Replacement Policy (always leave at LRU), Set size (a.k.a. K-Ways), Number of blocks in the cache, and Cache block size. The total cache size (in bytes) is simply calculated as: (# of Blocks) * (Cache block size in words) * (4 bytes per word).

Note: There is an oddity that statistics are not kept for any cache with 256 blocks or more, even though those options exist in the drop-down box. Thus, we will have to work around this.

Every time you run your program (you can just click the reset MIPS mem. and register button in the simulator) you should also be sure to RESET the cache simulator by clicking the Reset button.

4 Procedure

Run the following experiments on the non-blocked (original) version of the matrix multiply.

Notes: Hit rates = percentages (not absolute counts). Use Excel or another tool to make all plots (no hand plots).

a. Cache misses can be categorized as compulsory, capacity, or conflict misses. Determine the number of compulsory misses. To do this, set your block size to 8 words per block. Think about what mapping scheme should be used if you do not want any conflict misses. Then continue to increase the size of the cache (i.e. the number of blocks) until you can correctly determine the number of compulsory misses. (Think about how you would know that the misses counted are all compulsory and not capacity misses.) Also determine the smallest cache size that produced only compulsory misses.

b. Given the smallest cache size that produced only compulsory misses, decrease the block size to 4 words per block and double the number of blocks (to keep the cache size the same). What happens to the number of cache misses? Are these still all compulsory misses?

c. Using a total cache size of half of what you found in the previous two parts, vary the cache block size and the number of blocks so that the cache size stays constant, and determine which block size produces the best hit rate. Create a table and an Excel X-Y plot with all possible block sizes and the corresponding hit rates. Make the block size the horizontal axis and plot it on a log scale.

d. Use four words per block and start with the previous cache size. Repeatedly reduce the cache size (i.e. the number of blocks) by half until the cache size is 128 bytes. Determine the hit rate for each cache size.

e. For a 256-byte cache with 4 words per block, plot the hit rate as a function of associativity (k = 1, 2, 4, 8, 16). What is the optimal associativity setting?

Run the following experiments on the blocked version of the matrix multiply using a 256-byte cache with 4 words per block.

f. Create a table listing k (where k is the number of ways) = {1, 2, 4, 8, 16} down the vertical axis and the matrix blocking size = 2, 3, 4 and 6 along the horizontal axis. Fill in the hit rate for each table entry and create an Excel X-Y plot with a curve for each k value. Remember that 1-way set associative is a direct mapping and, in this case, 16-ways is a fully associative mapping.

g. Discuss (compare and contrast) the results with those of the unblocked version from part e.

5 Review

1. Based on your plot from part f, what is the optimal blocking size for the 12x12 matrix multiply? Why do you think this might be?

2. Based on your plot from part f, and assuming that increasing k (i.e. the associativity) is more and more expensive, what value of k would you suggest for the cache design (i.e. at what point do we start to see diminishing hit rate benefits from increasing k)?

6 Submission

1. Submit your blocked matrix multiply file on Blackboard and a PDF or Word document with answers to the questions posed in Part 2 along with the Review questions. Embed your Excel graphs into this document and provide appropriate.
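The cache-size formula from the Background section and the cache configurations used in the Procedure can be checked with a few lines of arithmetic. The C helpers below are a sketch for that purpose only; the function names cache_size_bytes, blocks_for_size and num_sets are made up for this example, and they assume 4-byte words and a number of ways that divides the number of blocks.

```c
/* Total cache size in bytes:
 * (# of blocks) * (cache block size in words) * (4 bytes per word). */
unsigned cache_size_bytes(unsigned num_blocks, unsigned words_per_block)
{
    return num_blocks * words_per_block * 4u;
}

/* Number of blocks needed to hold cache_bytes at a given block size. */
unsigned blocks_for_size(unsigned cache_bytes, unsigned words_per_block)
{
    return cache_bytes / (words_per_block * 4u);
}

/* Number of sets in a k-way set-associative cache. */
unsigned num_sets(unsigned num_blocks, unsigned ways)
{
    return num_blocks / ways;
}
```

For example, a 256-byte cache with 4 words per block holds blocks_for_size(256, 4) = 16 blocks. With k = 1 that gives 16 sets (direct mapped), and with k = 16 it gives a single set, which is why the largest k in parts e and f corresponds to a fully associative cache. Halving the number of blocks while doubling the block size leaves cache_size_bytes unchanged, which is the constraint used in parts b and c.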

7 Lab Report

Name: ____________ Score: ____________

(Detach and turn in this sheet along with any other requested work or printouts.)

1. Turn in the data in well-formatted tables and Excel plots from each part of the procedure.

a. Number of compulsory misses: ____________
   Smallest cache size (in bytes) that only had compulsory misses: ____________ Bytes

b. When you double the number of blocks but decrease the block size to 4 words, what happens to the # of misses? Are these still all compulsory misses?

c. Table with hit rates for all possible block sizes for the given cache size from the previous parts.

d. Table listing hit rates for progressively smaller caches.

e. Table listing hit rates for different associativities (k-values) and the corresponding plot. What is the optimal associativity (k-value)?

f. Table listing k (where k is the number of ways) = {1, 2, 4, 8, 16} down the vertical axis and the matrix blocking size = 2, 3, 4 and 6 along the horizontal axis, and the corresponding plot.

g. A few sentences comparing the benefits or costs of a blocked matrix multiply vs. a non-blocked implementation (i.e. compare the results in e and f).

2. Turn in answers to the review questions.

8 Grading Rubric

Name: ____________ Score: ____________

Student Name: ____________

Columns: Item, Outcome, Score, Max.

Correctness
  A: Number of compulsory misses
  A: Cache size
  B: Answer
  C: (Max. 2)
  C: Plot/formatting
  D: Hit rate
  E: Data (Max. 2)
  E: Plot
  E: Associativity
  F: (Max. 2)
  F: Plot

Discussion (3 = correct & obvious demonstration of clear understanding/reasoning, 2 = generally correct with some lack of reasoning/understanding, 1 = little demonstration of mastery, 0 = no demonstration of mastery)
  Requirement G
  Review Problem 1
  Review Problem 2

Late Deductions
Open Ended Comments:
SubTotal: 25
Total

Scoring guideline

Columns: Req., Mult, Score: 4 (Excellent), 3 (Good), 2 (Poor), 1 (Deficient), 0 (Failure)

Blocked Matrix Multiply
  4: Works correctly all the time
  3: Works usually but fails in 1-2 cases
  2: Fails in several cases but not always
  1: Does not work; major errors
  0: Not implemented

Remaining rows: Req. a, b, c, d, e, f and g, Review Problem 1, Review Problem 2, TOTAL. Only fragments of their guideline text survive in the transcription: a multiplier of 0.5; "Correct Solution"; "Plot and Correctly Formatted"; "Well-formatted plot and insightful"; "is correct but plot is not formatted as specified"; "is only adequate".


More information

Caches III. CSE 351 Winter Instructor: Mark Wyse

Caches III. CSE 351 Winter Instructor: Mark Wyse Caches III CSE 351 Winter 2018 Instructor: Mark Wyse Teaching Assistants: Kevin Bi Parker DeWilde Emily Furst Sarah House Waylon Huang Vinny Palaniappan https://what-if.xkcd.com/111/ Administrative Midterm

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

Computer Architecture and Engineering. CS152 Quiz #2. March 3rd, Professor Krste Asanovic. Name:

Computer Architecture and Engineering. CS152 Quiz #2. March 3rd, Professor Krste Asanovic. Name: Computer Architecture and Engineering CS152 Quiz #2 March 3rd, 2008 Professor Krste Asanovic Name: Notes: This is a closed book, closed notes exam. 80 Minutes 10 Pages Not all questions are of equal difficulty,

More information

Memory Hierarchy. Announcement. Computer system model. Reference

Memory Hierarchy. Announcement. Computer system model. Reference Announcement Memory Hierarchy Computer Organization and Assembly Languages Yung-Yu Chuang 26//5 Grade for hw#4 is online Please DO submit homework if you haen t Please sign up a demo time on /6 or /7 at

More information

Cache Memories October 8, 2007

Cache Memories October 8, 2007 15-213 Topics Cache Memories October 8, 27 Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance The memory mountain class12.ppt Cache Memories Cache

More information

Submission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich

Submission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich 263-2300-00: How To Write Fast Numerical Code Assignment 4: 120 points Due Date: Th, April 13th, 17:00 http://www.inf.ethz.ch/personal/markusp/teaching/263-2300-eth-spring17/course.html Questions: fastcode@lists.inf.ethz.ch

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 08: Caches III Shuai Wang Department of Computer Science and Technology Nanjing University Improve Cache Performance Average memory access time (AMAT): AMAT =

More information

Improving Cache Performance. Dr. Yitzhak Birk Electrical Engineering Department, Technion

Improving Cache Performance. Dr. Yitzhak Birk Electrical Engineering Department, Technion Improving Cache Performance Dr. Yitzhak Birk Electrical Engineering Department, Technion 1 Cache Performance CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time Memory

More information

ECE 2300 Digital Logic & Computer Organization. More Caches

ECE 2300 Digital Logic & Computer Organization. More Caches ECE 23 Digital Logic & Computer Organization Spring 218 More Caches 1 Announcements Prelim 2 stats High: 79.5 (out of 8), Mean: 65.9, Median: 68 Prelab 5(C) deadline extended to Saturday 3pm No further

More information

GRADE CENTRE BEST PRACTICE FOR A4L

GRADE CENTRE BEST PRACTICE FOR A4L GRADE CENTRE BEST PRACTICE FOR A4L Overview A large number of reports use information from the Grade Centre to draw correlations between activity and student success (see appendix). This document serves

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 13 Memory Part 2

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 13 Memory Part 2 ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 13 Memory Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

CS422 Computer Architecture

CS422 Computer Architecture CS422 Computer Architecture Spring 2004 Lecture 19, 04 Mar 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html Topics for Today Cache Performance Cache Misses:

More information

Computer Architecture, EIT090 exam

Computer Architecture, EIT090 exam Department of Information Technology Lund University Computer Architecture, EIT090 exam 15-12-2004 I. Problem 1 (15 points) Briefly (1-2 sentences) describe the following items/concepts concerning computer

More information

Make sure that your exam is not missing any sheets, then write your full name and login ID on the front.

Make sure that your exam is not missing any sheets, then write your full name and login ID on the front. ETH login ID: (Please print in capital letters) Full name: 63-300: How to Write Fast Numerical Code ETH Computer Science, Spring 015 Midterm Exam Wednesday, April 15, 015 Instructions Make sure that your

More information

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Systems Programming and Computer Architecture ( ) Timothy Roscoe Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture

More information

Rubrics. Creating a Rubric

Rubrics. Creating a Rubric Rubrics A rubric is a set of specific evaluation criteria used to assess an assignment. Instructors use rubrics to carefully outline their assignment requirements and expectations for students. Students

More information

CMSC 611: Advanced Computer Architecture. Cache and Memory

CMSC 611: Advanced Computer Architecture. Cache and Memory CMSC 611: Advanced Computer Architecture Cache and Memory Classification of Cache Misses Compulsory The first access to a block is never in the cache. Also called cold start misses or first reference misses.

More information

CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III

CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Lesson 21: Solution Sets to Inequalities with Two Variables

Lesson 21: Solution Sets to Inequalities with Two Variables Student Outcomes Students recognize and identify solutions to two variable inequalities. They represent the solution set graphically. They create two variable inequalities to represent a situation. Students

More information

Cache Lab Implementation and Blocking

Cache Lab Implementation and Blocking Cache Lab Implementation and Blocking Lou Clark February 24 th, 2014 1 Welcome to the World of Pointers! 2 Class Schedule Cache Lab Due Thursday. Start soon if you haven t yet! Exam Soon! Start doing practice

More information

CSE141 Problem Set #4 Solutions

CSE141 Problem Set #4 Solutions CSE141 Problem Set #4 Solutions March 5, 2002 1 Simple Caches For this first problem, we have a 32 Byte cache with a line length of 8 bytes. This means that we have a total of 4 cache blocks (cache lines)

More information

Lecture 2: Memory Systems

Lecture 2: Memory Systems Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer

More information

Graphing with Microsoft Excel

Graphing with Microsoft Excel Graphing with Microsoft Excel As an AP Physics 1 student, you must be prepared to interpret and construct relationships found in physical laws and experimental data. This exercise is meant to familiarize

More information

CS516 Programming Languages and Compilers II

CS516 Programming Languages and Compilers II CS516 Programming Languages and Compilers II Zheng Zhang Spring 2015 Mar 12 Parallelism and Shared Memory Hierarchy I Rutgers University Review: Classical Three-pass Compiler Front End IR Middle End IR

More information

GPU programming: Code optimization part 1. Sylvain Collange Inria Rennes Bretagne Atlantique

GPU programming: Code optimization part 1. Sylvain Collange Inria Rennes Bretagne Atlantique GPU programming: Code optimization part 1 Sylvain Collange Inria Rennes Bretagne Atlantique sylvain.collange@inria.fr Outline Analytical performance modeling Optimizing host-device data transfers Optimizing

More information

CS 135, Fall 2010 Project 4: Code Optimization Assigned: November 30th, 2010 Due: December 12,, 2010, 12noon

CS 135, Fall 2010 Project 4: Code Optimization Assigned: November 30th, 2010 Due: December 12,, 2010, 12noon CS 135, Fall 2010 Project 4: Code Optimization Assigned: November 30th, 2010 Due: December 12,, 2010, 12noon 1 Introduction This assignment deals with optimizing memory intensive code. Image processing

More information

CSE 240A Midterm Exam

CSE 240A Midterm Exam Student ID Page 1 of 7 2011 Fall Professor Steven Swanson CSE 240A Midterm Exam Please write your name at the top of each page This is a close book, closed notes exam. No outside material may be used.

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

CS/ECE 250 Computer Architecture

CS/ECE 250 Computer Architecture Computer Architecture Caches and Memory Hierarchies Benjamin Lee Duke University Some slides derived from work by Amir Roth (Penn), Alvin Lebeck (Duke), Dan Sorin (Duke) 2013 Alvin R. Lebeck from Roth

More information

Iterative Compilation with Kernel Exploration

Iterative Compilation with Kernel Exploration Iterative Compilation with Kernel Exploration Denis Barthou 1 Sébastien Donadio 12 Alexandre Duchateau 1 William Jalby 1 Eric Courtois 3 1 Université de Versailles, France 2 Bull SA Company, France 3 CAPS

More information