EE 352 Lab 5 Cache Me If You Can
- Samson Davidson
- 6 years ago
Last Revised: 6/7/204

1 Introduction

In this lab you will take your straightforward triple-nested-loop implementation of matrix multiply, implement a second, blocked version, and compare the two with regard to cache behavior. You will explore the interaction of algorithms and cache architecture. (Make sure both versions of your matrix multiply code work before exploring the caching effects.) You will use the MARS Cache Simulator tool to perform your experiments.

2 What you will learn

This lab is intended to give you insight and intuition for how to write cache-conscious code and to understand the benefits of certain cache architecture parameters.

3 Background Information and Notes

Blocked Matrix Multiply: A traditional NxN matrix multiply can be implemented with a triple-nested for loop. However, when N is large, the number of data points accessed before significant reuse is proportional to N^2. This can lead to poor cache performance and, in turn, poor overall performance. The advantage of cache memory is predicated on our ability to fit the working data set into the cache. Thus, we can break the large matrix (e.g. 8x8) into smaller matrices (e.g. 2x2) and define the matrix multiply operation recursively (i.e. calculate the overall matrix product by calculating the products of smaller blocks of the matrix). The code for doing this is straightforward, though it may require some examination to see exactly which values are being accessed. We have attempted to provide a visualization below in addition to the code.

    // b = block size, N = matrix dimension
    for (i = 0; i < N; i = i + b)
      for (j = 0; j < N; j = j + b)
        for (k = 0; k < N; k = k + b)
          for (ii = i; ii < i + b; ii++)
            for (jj = j; jj < j + b; jj++)
              for (kk = k; kk < k + b; kk++)
                C[ii][jj] = C[ii][jj] + A[ii][kk] * B[kk][jj];
[Figure panels: accesses performed when outer i,j = 0,0 and when outer i,j = 0,block_size. Black border = C[ii][jj] values being calculated, yellow = A matrix accesses, blue = B matrix accesses.]

Figure: Matrix accesses made in the blocked version. Each timestep shown is a new value of k. Total number of accesses made to each matrix A & B = 2b^2, where b = block size.

Notice now that in the case of a large matrix (e.g. N = 1024) we would only be accessing small blocks of data in any one iteration of the inner loops. Presumably this data could fit in the cache and we would see better performance. We will explore this later in the lab.

MARS Cache Simulator: MARS includes a data cache simulator that lets you provide a cache configuration for your program and have the simulator determine the hit rate and other statistics. The tool is available by clicking Tools > Cache Simulator. You will need to click the "Connect to MIPS" button to tell the cache simulator to attach to and monitor activity from the MARS code simulator.
The configurable parameters for the cache include Placement Policy (a.k.a. mapping scheme), Block Replacement Policy (always leave at LRU), Set size (a.k.a. K-Ways), Number of blocks in the cache, and Cache block size. The total cache size (in bytes) is simply calculated as:

(# of Blocks) * (Cache block size in words) * (4 bytes per word)

Note: There is an oddity that statistics are not kept for any cache with 256 blocks or more, even though those options exist in the drop-down box. Thus, we will have to work around this. Every time you run your program (you can just click the reset MIPS memory and register button in the simulator), be sure to also reset the cache simulator by clicking its Reset button.

4 Procedure

Run the following experiments on the non-blocked (original) version of the matrix multiply.

Notes: Hit rates = percentages (not absolute counts). Use Excel or another tool to make all plots (no hand plots).

a. Cache misses can be categorized as compulsory, capacity, or conflict misses. Determine the number of compulsory misses. To do this, set your block size to 8 words per block. Think about which mapping scheme should be used if you do not want any conflict misses. Then keep increasing the size of the cache (i.e. the number of blocks) until you can correctly determine the number of compulsory misses. (Think about how you would know that the misses counted are all compulsory and not capacity misses.) Also determine the smallest cache size that produced only compulsory misses.

b. Given the smallest cache size that produced only compulsory misses, decrease the block size to 4 words per block and double the number of blocks (to keep the cache size the same). What happens to the number of cache misses? Are these still all compulsory misses?

c. Using a total cache size of half of what you found in the previous two parts, vary the cache block size and number of blocks so that the cache size stays constant, and determine which block size produces the best hit rate. Create a table and an Excel X-Y plot with all possible block sizes and the corresponding hit rates. Make the block size the horizontal axis and plot it on a log scale.

d. Use four words per block and start with the previous cache size. Repeatedly reduce the cache size (i.e. the number of blocks) by half until the cache size is 128 Bytes. Determine the hit rate for each cache size.

e. For a 256-byte cache with 4 words per block, plot the hit rate as a function of associativity (k = 1, 2, 4, 8, 16). What is the optimal associativity setting?

Run the following experiments on the blocked version of the matrix multiply using a 256-byte cache with 4 words per block.

f. Create a table listing k (where k is the number of ways) = {1, 2, 4, 8, 16} down the vertical axis and the matrix blocking size = 2, 3, 4 and 6 along the horizontal axis. Fill in the hit rate for each table entry and create an Excel X-Y plot with a curve for each k value. Remember that 1-way set associative is a direct mapping and, in this case, 16-way is a fully associative mapping.

g. Discuss (compare and contrast) the results with those of the unblocked version from part e.

5 Review

1. Based on your plot from part f, what is the optimal blocking size for the 12x12 matrix multiply? Why do you think this might be?

2. Based on your plot from part f, and assuming that increasing k (i.e. the associativity) is more and more expensive, what value of k would you suggest for the cache design (i.e. at what point do we start to see diminishing hit rate benefits from increasing k)?

6 Submission

Submit your blocked matrix multiply file on Blackboard and a PDF or Word document with answers to the questions posed in Part 2 along with the Review questions. Embed your Excel graphs into this document and provide appropriate.
7 Lab Report

Name: Score:

(Detach and turn in this sheet along with any other requested work or printouts.)

1. Turn in the data in well-formatted tables and Excel plots from each part of the procedure.

a. Number of compulsory misses: ______
   Smallest cache size (in bytes) that only had compulsory misses: ______ Bytes

b. When you double the number of blocks but decrease the block size to 4 words, what happens to the # of misses?
   Are these still all compulsory misses?

c. Table with hit rates for all possible block sizes for the given cache size from the previous parts.

d. Table listing hit rates for progressively smaller caches.

e. Table listing hit rates for different associativity (k-values) and corresponding plot. What is the optimal associativity (k-value)?

f. Table listing k (where k is the number of ways) = {1, 2, 4, 8, 16} down the vertical axis and the matrix blocking size = 2, 3, 4 and 6 along the horizontal axis, and corresponding plot.

g. A few sentences comparing the benefits or costs of a blocked matrix multiply vs. a non-blocked implementation (i.e. compare the results in e and f).

2. Turn in answers to the review questions.
8 Grading Rubric

Name: Score:

Student Name:

Item, Outcome, Score, Max.

ness A: Number of compulsory misses A: cache size B: answer & C: 2 C: Plot/formatting D: hit rate E: data 2 E: plot E: Associativity F: 2 F: Plot

Discussion (3 = correct & obvious demonstration of clear understanding/reasoning, 2 = generally correct with some lack of reasoning/understanding, 1 = little demonstration of mastery, 0 = no demonstration of mastery)

Requirement G
Review Problem 1
Review Problem 2

Late Deductions
Open Ended Comments:
SubTotal 25
Total

Req. / Mult / Score: 4 (Excellent), 3 (Good), 2 (Poor), 1 (Deficient), 0 (Failure)

Guideline

Blocked Matrix Multiply: Works correctly all the time / Works usually but fails in 1-2 cases / Fails in several cases but not always / Does not work / Not implemented

Req. a. Solution Major Errors b. 0.5 Solution Solution c d e Answer f and g Review Problem 1 Review Problem 2 TOTAL 0.5 Plot and ly Formatted 0.5 Plot and ly Formatted 0.5 Plot and ly Formatted 2, Wellformatted plot and insightful good good is correct but plot is not formatted as specified is correct but plot is not formatted as specified is correct but plot is not formatted as specified data, but formatting or is only adequate adequate adequate or formatting or answer or answer or
EE 352 Lab 4 Cache Me If You Can
EE 352 Lab 4 Cache Me If You Can 1 Introduction In this lab you use your straightforward triple-nested loop implementation of a matrix multiply while implementing a second blocked version of matrix multiply
More informationEE 352 Lab 3 The Search Is On
EE 352 Lab 3 The Search Is On Introduction In this lab you will write a program to find a pathway through a maze using a simple (brute-force) recursive (depth-first) search algorithm. 2 What you will learn
More informationWhat is Cache Memory? EE 352 Unit 11. Motivation for Cache Memory. Memory Hierarchy. Cache Definitions Cache Address Mapping Cache Performance
What is EE 352 Unit 11 Definitions Address Mapping Performance memory is a small, fast memory used to hold of data that the processor will likely need to access in the near future sits between the processor
More informationDECstation 5000 Miss Rates. Cache Performance Measures. Example. Cache Performance Improvements. Types of Cache Misses. Cache Performance Equations
DECstation 5 Miss Rates Cache Performance Measures % 3 5 5 5 KB KB KB 8 KB 6 KB 3 KB KB 8 KB Cache size Direct-mapped cache with 3-byte blocks Percentage of instruction references is 75% Instr. Cache Data
More informationEE 101 Lab 5 Fast Adders
EE 0 Lab 5 Fast Adders Introduction In this lab you will compare the performance of a 6-bit ripple-carry adder (RCA) with a 6-bit carry-lookahead adder (CLA). The 6-bit CLA will be implemented hierarchically
More informationLecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses. Professor Randy H. Katz Computer Science 252 Fall 1995
Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Fall 1995 Review: Who Cares About the Memory Hierarchy? Processor Only Thus Far in Course:
More informationLecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: Who Cares About the Memory Hierarchy? Processor Only Thus
More informationComputer Architecture and Engineering. CS152 Quiz #2. March 3rd, Professor Krste Asanovic. Name:
Computer Architecture and Engineering CS152 Quiz #2 March 3rd, 2009 Professor Krste Asanovic Name: Notes: This is a closed book, closed notes exam. 80 Minutes 10 Pages Not all questions are of equal difficulty,
More informationAgenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!)
7/4/ CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches II Instructor: Michael Greenbaum New-School Machine Structures (It s a bit more complicated!) Parallel Requests Assigned to
More informationCache Memory: Instruction Cache, HW/SW Interaction. Admin
Cache Memory Instruction Cache, HW/SW Interaction Computer Science 104 Admin Project Due Dec 7 Homework #5 Due November 19, in class What s Ahead Finish Caches Virtual Memory Input/Output (1 homework)
More informationCS1100: Excel Lab 1. Problem 1 (25 Points) Filtering and Summarizing Data
CS1100: Excel Lab 1 Filtering and Summarizing Data To complete this assignment you must submit an electronic copy to BlackBoard by the due date. Use the data in the starter file. In this lab you are asked
More informationMemory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache
Memory Cache Memory Locality cpu cache memory Memory hierarchies take advantage of memory locality. Memory locality is the principle that future memory accesses are near past accesses. Memory hierarchies
More informationCS 33. Caches. CS33 Intro to Computer Systems XVIII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 33 Caches CS33 Intro to Computer Systems XVIII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Cache Performance Metrics Miss rate fraction of memory references not found in cache (misses
More informationECE 2300 Digital Logic & Computer Organization. More Caches
ECE 23 Digital Logic & Computer Organization Spring 217 More Caches 1 Prelim 2 stats High: 9 (out of 9) Mean: 7.2, Median: 73 Announcements Prelab 5(C) due tomorrow 2 Example: Direct Mapped (DM) Cache
More informationEE 109 Lab 8a Conversion Experience
EE 109 Lab 8a Conversion Experience 1 Introduction In this lab you will write a small program to convert a string of digits representing a number in some other base (between 2 and 10) to decimal. The user
More informationClassification Steady-State Cache Misses: Techniques To Improve Cache Performance:
#1 Lec # 9 Winter 2003 1-21-2004 Classification Steady-State Cache Misses: The Three C s of cache Misses: Compulsory Misses Capacity Misses Conflict Misses Techniques To Improve Cache Performance: Reduce
More informationInside out of your computer memories (III) Hung-Wei Tseng
Inside out of your computer memories (III) Hung-Wei Tseng Why memory hierarchy? CPU main memory lw $t2, 0($a0) add $t3, $t2, $a1 addi $a0, $a0, 4 subi $a1, $a1, 1 bne $a1, LOOP lw $t2, 0($a0) add $t3,
More informationCS161 Design and Architecture of Computer Systems. Cache $$$$$
CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache performance 4 Cache
More informationCS1100: Excel Lab 1. Problem 1 (25 Points) Filtering and Summarizing Data
CS1100: Excel Lab 1 Filtering and Summarizing Data To complete this assignment you must submit an electronic copy to Blackboard by the due date. Use the data in the starter file. In this lab you are asked
More informationCaches III. CSE 351 Spring Instructor: Ruth Anderson
Caches III CSE 351 Spring 2017 Instructor: Ruth Anderson Teaching Assistants: Dylan Johnson Kevin Bi Linxing Preston Jiang Cody Ohlsen Yufang Sun Joshua Curtis Administrivia Office Hours Changes check
More informationTypes of Cache Misses: The Three C s
Types of Cache Misses: The Three C s 1 Compulsory: On the first access to a block; the block must be brought into the cache; also called cold start misses, or first reference misses. 2 Capacity: Occur
More informationLoops. Lather, Rinse, Repeat. CS4410: Spring 2013
Loops or Lather, Rinse, Repeat CS4410: Spring 2013 Program Loops Reading: Appel Ch. 18 Loop = a computation repeatedly executed until a terminating condition is reached High-level loop constructs: While
More informationMemory Hierarchy 3 Cs and 6 Ways to Reduce Misses
Memory Hierarchy 3 Cs and 6 Ways to Reduce Misses Soner Onder Michigan Technological University Randy Katz & David A. Patterson University of California, Berkeley Four Questions for Memory Hierarchy Designers
More informationLecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw
More informationImproving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Misses Classifying Misses: 3 Cs! Compulsory The first access to a block is
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache
More informationLec 12 How to improve cache performance (cont.)
Lec 12 How to improve cache performance (cont.) Homework assignment Review: June.15, 9:30am, 2000word. Memory home: June 8, 9:30am June 22: Q&A ComputerArchitecture_CachePerf. 2/34 1.2 How to Improve Cache
More information: How to Write Fast Numerical Code ETH Computer Science, Spring 2016 Midterm Exam Wednesday, April 20, 2016
ETH login ID: (Please print in capital letters) Full name: 263-2300: How to Write Fast Numerical Code ETH Computer Science, Spring 2016 Midterm Exam Wednesday, April 20, 2016 Instructions Make sure that
More informationCS101 Homework 4: Social Network
CS101 Homework 4: Social Network Prof Tejada Program and report due: 11:59pm Wednesday, March 13 Design document due: 11:59pm Wednesday, March 6 1 Introduction For this assignment create user accounts
More informationIt is academic misconduct to share your work with others in any form including posting it on publicly accessible web sites, such as GitHub.
p4: Cache Simulator 1. Logistics 1. This project must be done individually. It is academic misconduct to share your work with others in any form including posting it on publicly accessible web sites, such
More informationDenison University. Cache Memories. CS-281: Introduction to Computer Systems. Instructor: Thomas C. Bressoud
Cache Memories CS-281: Introduction to Computer Systems Instructor: Thomas C. Bressoud 1 Random-Access Memory (RAM) Key features RAM is traditionally packaged as a chip. Basic storage unit is normally
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationToday Cache memory organization and operation Performance impact of caches
Cache Memories 1 Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging loops to improve spatial locality Using blocking to improve temporal locality
More informationLecture 10. Daily Puzzle
Lecture 10 Daily Puzzle Imagine there is a ditch, 10 feet wide, which is far too wide to jump. Using only eight narrow planks, each no more than 9 feet long, construct a bridge across the ditch. Daily
More informationRoadmap. Java: Assembly language: OS: Machine code: Computer system:
Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Assembly language: Machine code: get_mpg: pushq movq... popq ret %rbp %rsp, %rbp %rbp 0111010000011000
More informationwrite-through v. write-back write-through v. write-back write-through v. write-back option 1: write-through write 10 to 0xABCD CPU RAM Cache ABCD: FF
write-through v. write-back option 1: write-through 1 write 10 to 0xABCD CPU Cache ABCD: FF RAM 11CD: 42 ABCD: FF 1 2 write-through v. write-back option 1: write-through write-through v. write-back option
More informationCSC D70: Compiler Optimization Memory Optimizations
CSC D70: Compiler Optimization Memory Optimizations Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry, Greg Steffan, and
More informationBIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26
Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations
More informationECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 13 Memory Part 2
ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 13 Memory Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 23: Associative Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Last time: Write-Back Alternative: On data-write hit, just
More informationCache Performance II 1
Cache Performance II 1 cache operation (associative) 111001 index offset valid tag valid tag data data 1 10 1 00 00 11 AA BB tag 1 11 1 01 B4 B5 33 44 = data (B5) AND = AND OR is hit? (1) 2 cache operation
More informationUniversity of Toronto Faculty of Applied Science and Engineering
Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science
More informationLast class. Caches. Direct mapped
Memory Hierarchy II Last class Caches Direct mapped E=1 (One cache line per set) Each main memory address can be placed in exactly one place in the cache Conflict misses if two addresses map to same place
More informationLecture 9: Improving Cache Performance: Reduce miss rate Reduce miss penalty Reduce hit time
Lecture 9: Improving Cache Performance: Reduce miss rate Reduce miss penalty Reduce hit time Review ABC of Cache: Associativity Block size Capacity Cache organization Direct-mapped cache : A =, S = C/B
More information211: Computer Architecture Summer 2016
211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University
More informationEE 660: Computer Architecture Advanced Caches
EE 660: Computer Architecture Advanced Caches Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David Wentzlaff Agenda Review Three C s Basic Cache
More informationCarnegie Mellon. Cache Lab. Recitation 7: Oct 11 th, 2016
1 Cache Lab Recitation 7: Oct 11 th, 2016 2 Outline Memory organization Caching Different types of locality Cache organization Cache lab Part (a) Building Cache Simulator Part (b) Efficient Matrix Transpose
More informationImage Manipulation in MATLAB Due Monday, July 17 at 5:00 PM
Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM 1 Instructions Labs may be done in groups of 2 or 3 (i.e., not alone). You may use any programming language you wish but MATLAB is highly suggested.
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 23: Associative Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Direct mapped cache Pretty simple to
More informationMS Office for Engineers
MS Office for Engineers Lesson 4 Excel 2 Pre-reqs/Technical Skills Basic knowledge of Excel Completion of Excel 1 tutorial Basic computer use Expectations Read lesson material Implement steps in software
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science CPUtime = IC CPI Execution + Memory accesses Instruction
More informationComputer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College
Computer Systems C S 0 7 Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College 2 Today s Topics TODAY S LECTURE: Caching ANNOUNCEMENTS: Assign6 & Assign7 due Friday! 6 & 7 NO late
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationControl flow graphs and loop optimizations. Thursday, October 24, 13
Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange
More informationhigh-speed-high-capacity memory
Sanjay Rajopadhye Colorado State University n Transparently provide the illusion of a high-speed-high-capacity memory n Built out of caches: small memory devices that exploit the principle of locality
More informationMemory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Memory Hierarchy Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Time (ns) The CPU-Memory Gap The gap widens between DRAM, disk, and CPU speeds
More informationThe course that gives CMU its Zip! Memory System Performance. March 22, 2001
15-213 The course that gives CMU its Zip! Memory System Performance March 22, 2001 Topics Impact of cache parameters Impact of memory reference patterns memory mountain range matrix multiply Basic Cache
More informationAnnouncements. ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy. Edward Suh Computer Systems Laboratory
ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab 1 due today Reading: Chapter 5.1 5.3 2 1 Overview How to
More informationAnnouncements. ! Previous lecture. Caches. Inf3 Computer Architecture
Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationUniversity of Waterloo Midterm Examination Sample Solution
1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationCache memories are small, fast SRAM based memories managed automatically in hardware.
Cache Memories Cache memories are small, fast SRAM based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and
More informationKathryn Chan, Kevin Bi, Ryan Wong, Waylon Huang, Xinyu Sui
Caches III CSE 410 Winter 2017 Instructor: Justin Hsia Teaching Assistants: Kathryn Chan, Kevin Bi, Ryan Wong, Waylon Huang, Xinyu Sui Please stop charging your phone in public ports "Just by plugging
More informationCS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours
CS433 Final Exam Prof Josep Torrellas December 12, 2006 Time: 2 hours Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 6 Questions. Please budget your time. 3. Calculators
More informationCaches III. CSE 351 Winter Instructor: Mark Wyse
Caches III CSE 351 Winter 2018 Instructor: Mark Wyse Teaching Assistants: Kevin Bi Parker DeWilde Emily Furst Sarah House Waylon Huang Vinny Palaniappan https://what-if.xkcd.com/111/ Administrative Midterm
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationComputer Architecture and Engineering. CS152 Quiz #2. March 3rd, Professor Krste Asanovic. Name:
Computer Architecture and Engineering CS152 Quiz #2 March 3rd, 2008 Professor Krste Asanovic Name: Notes: This is a closed book, closed notes exam. 80 Minutes 10 Pages Not all questions are of equal difficulty,
More informationMemory Hierarchy. Announcement. Computer system model. Reference
Announcement Memory Hierarchy Computer Organization and Assembly Languages Yung-Yu Chuang 26//5 Grade for hw#4 is online Please DO submit homework if you haen t Please sign up a demo time on /6 or /7 at
More informationCache Memories October 8, 2007
15-213 Topics Cache Memories October 8, 27 Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance The memory mountain class12.ppt Cache Memories Cache
More informationSubmission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich
263-2300-00: How To Write Fast Numerical Code Assignment 4: 120 points Due Date: Th, April 13th, 17:00 http://www.inf.ethz.ch/personal/markusp/teaching/263-2300-eth-spring17/course.html Questions: fastcode@lists.inf.ethz.ch
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 08: Caches III Shuai Wang Department of Computer Science and Technology Nanjing University Improve Cache Performance Average memory access time (AMAT): AMAT =
More informationImproving Cache Performance. Dr. Yitzhak Birk Electrical Engineering Department, Technion
Improving Cache Performance Dr. Yitzhak Birk Electrical Engineering Department, Technion 1 Cache Performance CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time Memory
More informationECE 2300 Digital Logic & Computer Organization. More Caches
ECE 23 Digital Logic & Computer Organization Spring 218 More Caches 1 Announcements Prelim 2 stats High: 79.5 (out of 8), Mean: 65.9, Median: 68 Prelab 5(C) deadline extended to Saturday 3pm No further
More informationGRADE CENTRE BEST PRACTICE FOR A4L
GRADE CENTRE BEST PRACTICE FOR A4L Overview A large number of reports use information from the Grade Centre to draw correlations between activity and student success (see appendix). This document serves
ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 13 Memory Part 2
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 13 Memory Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
CS422 Computer Architecture
CS422 Computer Architecture Spring 2004 Lecture 19, 04 Mar 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html Topics for Today Cache Performance Cache Misses:
Computer Architecture, EIT090 exam
Department of Information Technology Lund University Computer Architecture, EIT090 exam 15-12-2004 I. Problem 1 (15 points) Briefly (1-2 sentences) describe the following items/concepts concerning computer
Make sure that your exam is not missing any sheets, then write your full name and login ID on the front.
ETH login ID: (Please print in capital letters) Full name: 263-2300: How to Write Fast Numerical Code ETH Computer Science, Spring 2015 Midterm Exam Wednesday, April 15, 2015 Instructions Make sure that your
Systems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture
Rubrics. Creating a Rubric
Rubrics A rubric is a set of specific evaluation criteria used to assess an assignment. Instructors use rubrics to carefully outline their assignment requirements and expectations for students. Students
CMSC 611: Advanced Computer Architecture. Cache and Memory
CMSC 611: Advanced Computer Architecture Cache and Memory Classification of Cache Misses Compulsory The first access to a block is never in the cache. Also called cold start misses or first reference misses.
Types of Cache Misses. The Hardware/Software Interface CSE351 Winter Cache Read. General Cache Organization (S, E, B) Memory and Caches II
Types of Cache Misses The Hardware/So
CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
Lesson 21: Solution Sets to Inequalities with Two Variables
Student Outcomes Students recognize and identify solutions to two variable inequalities. They represent the solution set graphically. They create two variable inequalities to represent a situation. Students
Cache Lab Implementation and Blocking
Cache Lab Implementation and Blocking Lou Clark February 24th, 2014 1 Welcome to the World of Pointers! 2 Class Schedule Cache Lab Due Thursday. Start soon if you haven't yet! Exam Soon! Start doing practice
CSE141 Problem Set #4 Solutions
CSE141 Problem Set #4 Solutions March 5, 2002 1 Simple Caches For this first problem, we have a 32 Byte cache with a line length of 8 bytes. This means that we have a total of 4 cache blocks (cache lines)
Lecture 2: Memory Systems
Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Data transfer
Graphing with Microsoft Excel
Graphing with Microsoft Excel As an AP Physics 1 student, you must be prepared to interpret and construct relationships found in physical laws and experimental data. This exercise is meant to familiarize
CS516 Programming Languages and Compilers II
CS516 Programming Languages and Compilers II Zheng Zhang Spring 2015 Mar 12 Parallelism and Shared Memory Hierarchy I Rutgers University Review: Classical Three-pass Compiler Front End IR Middle End IR
GPU programming: Code optimization part 1. Sylvain Collange Inria Rennes Bretagne Atlantique
GPU programming: Code optimization part 1 Sylvain Collange Inria Rennes Bretagne Atlantique sylvain.collange@inria.fr Outline Analytical performance modeling Optimizing host-device data transfers Optimizing
CS 135, Fall 2010 Project 4: Code Optimization Assigned: November 30th, 2010 Due: December 12, 2010, 12 noon
CS 135, Fall 2010 Project 4: Code Optimization Assigned: November 30th, 2010 Due: December 12, 2010, 12 noon 1 Introduction This assignment deals with optimizing memory intensive code. Image processing
CSE 240A Midterm Exam
Student ID Page 1 of 7 2011 Fall Professor Steven Swanson CSE 240A Midterm Exam Please write your name at the top of each page This is a closed book, closed notes exam. No outside material may be used.
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013): Ch. 5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
CS/ECE 250 Computer Architecture
Computer Architecture Caches and Memory Hierarchies Benjamin Lee Duke University Some slides derived from work by Amir Roth (Penn), Alvin Lebeck (Duke), Dan Sorin (Duke) 2013 Alvin R. Lebeck from Roth
Iterative Compilation with Kernel Exploration
Iterative Compilation with Kernel Exploration Denis Barthou 1 Sébastien Donadio 12 Alexandre Duchateau 1 William Jalby 1 Eric Courtois 3 1 Université de Versailles, France 2 Bull SA Company, France 3 CAPS