Computer Architecture and Engineering
CS152 Quiz #2
March 3rd, 2009
Professor Krste Asanovic

Name:

Notes:
- This is a closed book, closed notes exam. 80 minutes, 10 pages.
- Not all questions are of equal difficulty, so look over the entire exam and budget your time carefully.
- Please carefully state any assumptions you make.
- Please write your name on every page of the quiz.
- You must not discuss the quiz's contents with students who have not yet taken the quiz. If you have inadvertently been exposed to the quiz prior to taking it, you must tell the instructor or TA.
- You will get no credit for selecting multiple-choice answers without giving explanations if the instructions ask you to explain your choice.

Writing name on each sheet: 1 point
Question 1: 30 points
Question 2: 26 points
Question 3: 23 points
TOTAL: 80 points

Problem Q2.1: Way-Predicting Cache Evaluation [30 Points]

To improve the hit rate for our data cache, we made it 2-way set-associative (it was formerly direct-mapped). Sadly, as a consequence, the hit time has gone up, so we are going to use way prediction to improve it. Each cache set will have a way prediction indicating which way is likely to be accessed. When doing a cache access, the prediction is used to route the data. If it is incorrect, there will be a delay as the correct way is used. If the desired data is not resident in the cache, it is like a normal cache miss. After a cache miss, the prediction is not used, since the correct block is already known. Figure Q2.1-A summarizes this process.

Figure Q2.1-A: Way-prediction FSM (figure omitted)

Since there are two ways, only one bit is used per prediction, and its value directly corresponds to the way. How the predictions are generated or maintained is beyond the scope of this problem. You can assume that at the beginning of a cycle, the selected prediction is available, and determining the prediction is not on the critical path.
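A minimal C sketch of the three access outcomes described above (fast hit on a correct prediction, delayed hit on an incorrect prediction, ordinary miss when the block is absent). The access_outcome() helper and its encoding are illustrative assumptions, not something the quiz defines:

#include <stdio.h>

typedef enum { FAST_HIT, SLOW_HIT, MISS } outcome_t;

/* hit_way: 0 or 1 if the requested block is resident in that way, -1 if absent.
 * predicted_way: the single prediction bit stored for the selected set.         */
static outcome_t access_outcome(int hit_way, int predicted_way)
{
    if (hit_way < 0)
        return MISS;             /* normal miss; the prediction is not used       */
    if (hit_way == predicted_way)
        return FAST_HIT;         /* data routed straight from the predicted way   */
    return SLOW_HIT;             /* extra delay while the correct way is selected */
}

int main(void)
{
    printf("%d %d %d\n",
           access_outcome(0, 0),    /* fast hit */
           access_outcome(1, 0),    /* slow hit */
           access_outcome(-1, 1));  /* miss     */
    return 0;
}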

Problem Q2.1.A: Baseline Cache Design [10 points]

The diagram below shows the data portion of our cache. Our cache has 16-byte lines, is 2-way set-associative, and has a total capacity of 4 KB.

Please complete Table Q2.1-1 below with the delays across each element of the cache. Using the data you compute in Table Q2.1-1, calculate the critical path delay through this cache (from when the Input Address is set to when the correct data is on the Data Bus).

Table Q2.1-1:

| Component           | Delay equation (ns)                                         | Total (ns) |
| Decoder             | 0.2 × (# of index bits) + 1                                 | 2.4        |
| Memory Array        | 0.2 × log2(# of rows) + 0.2 × log2(# of bits in a row) + 1  | 4.0        |
| N-to-1 MUX          | 0.5 × log2(N) + 1                                           | 2.0        |
| Buffer driver       | 2                                                           | 2.0        |
| Data output driver  | 0.5 × (associativity) + 1                                   | 2.0        |
| Critical Path Delay |                                                             | 10.4       |

You may assume that the prediction register is correctly loaded at the start of the cycle, and the clk-to-q delay is 100 ps. The inverting and non-inverting buffer drivers both have the same delay. You only need to worry about the case of a fast hit (cache hit with correct prediction).

Show your work:

# Blocks = (Capacity) / (Block size) = 4 KB / 16 B = 2^(12-4) = 2^8 = 256 blocks
# Sets = (# Blocks) / (Associativity) = 256 / 2 = 128 sets = 2^7 => 7 index bits
# of rows = # Sets = 128 => log2(# of rows) = 7
# of bits in a row = (# of blocks per row) × (# of bytes per block) × (# of bits per byte) = (2)(16)(8) = 2^(1+4+3) = 2^8 => log2(# of bits in a row) = 8

Note that this is only the data part of the cache (tags are not part of this problem).
N = 4 (shown in the diagram; a 16 B line holds 4 words)

Critical Path = Decoder + Memory Array + N-to-1 MUX + Data output driver = 2.4 + 4.0 + 2.0 + 2.0 = 10.4 ns

Note that the prediction and buffer drivers were not on the critical path.

Scoring:
- Decoder & Memory Array: 2 pts each
- N-to-1 MUX, Buffer driver, Data output driver: 1 pt each
- Critical Path Delay: 3 pts (2 pts if correct path but wrong total)
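As a cross-check on the arithmetic above, here is a small C sketch (not part of the quiz) that plugs the derived parameters (7 index bits, 128 rows, 256 bits per row, N = 4, associativity 2) into the delay equations and sums the fast-hit critical path:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double index_bits = 7, rows = 128, bits_per_row = 256, N = 4, assoc = 2;

    double decoder = 0.2 * index_bits + 1.0;                            /* 2.4 ns */
    double array   = 0.2 * log2(rows) + 0.2 * log2(bits_per_row) + 1.0; /* 4.0 ns */
    double mux     = 0.5 * log2(N) + 1.0;                               /* 2.0 ns */
    double outdrv  = 0.5 * assoc + 1.0;                                 /* 2.0 ns */

    /* Fast-hit path: decoder -> memory array -> N-to-1 MUX -> data output driver.
     * The buffer driver and the prediction register are not on this path.        */
    printf("critical path = %.1f ns\n", decoder + array + mux + outdrv); /* 10.4 ns */
    return 0;
}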

Problem Q2.1.B: Way-Prediction Behavior [10 points]

Now we will study the impact of way prediction on cache hit rate. For this problem, the cache is a 128-byte, 2-way set-associative cache with 16 bytes per cache line. The cache is byte-addressable and uses a least recently used (LRU) replacement policy.

Please complete Table Q2.1-2 below showing a trace of memory accesses. In the table, each entry contains the {tag, index} contents of that line, or "-" if no data is present. You should only fill in elements in the table when a value changes. For simplicity, the addresses are only 8 bits. The first 3 lines of the table have been filled in for you. The initial values marked with a * are the least recently used ways in that set. For your convenience, the address breakdown for accesses to the main cache is depicted below.

| Bits 7-6 | Bits 5-4 | Bits 3-2    | Bits 1-0    |
| TAG      | INDEX    | WORD SELECT | BYTE SELECT |

Problem Q2.1.C: Average Memory Access Time [10 points]

Assume the cache has a 90% hit rate, and that the way predictor is right 75% of the time there is a cache hit. A cache hit with a correct way prediction takes 2 cycles, a cache hit with an incorrect prediction takes 4 cycles, and a cache miss (way prediction irrelevant) takes 60 cycles. Compute the average memory access time of the cache with way prediction. Without way prediction, the original 2-way set-associative cache has the same hit rate and miss time, but a 3-cycle hit time. By how many cycles does way prediction improve the average memory access time?

It is important to notice that 60 cycles is the miss time, not the miss penalty (miss time = miss penalty + hit time).

AMAT (with WP) = (Hit %) × ((% WP correct) × (WP correct time) + (% WP incorrect) × (WP incorrect time)) + (1 - Hit %) × (Miss time)
= (0.9)((0.75)(2) + (0.25)(4)) + (0.1)(60)
= (0.9)(1.5 + 1) + 6
= (0.9)(2.5) + 6
= 8.25 cycles (worth 6 pts)

AMAT (without WP) = (Hit %) × (Hit time) + (1 - Hit %) × (Miss time)
= (0.9)(3) + (0.1)(60)
= 2.7 + 6
= 8.7 cycles (worth 4 pts together with the improvement)

Improvement: 8.7 - 8.25 = 0.45 cycles (fractional cycles are fine, since this is an average).
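A short C sketch (illustrative, not part of the quiz) that reproduces the AMAT arithmetic above from the given hit rate, prediction accuracy, and latencies:

#include <stdio.h>

int main(void)
{
    double hit_rate   = 0.90;            /* cache hit rate                                    */
    double wp_correct = 0.75;            /* way-prediction accuracy on a hit                  */
    double t_fast = 2.0, t_slow = 4.0;   /* hit latency with a correct / incorrect prediction */
    double t_hit  = 3.0, t_miss = 60.0;  /* plain 2-way hit time and miss TIME (not penalty)  */

    double amat_wp = hit_rate * (wp_correct * t_fast + (1.0 - wp_correct) * t_slow)
                   + (1.0 - hit_rate) * t_miss;                           /* 8.25 cycles */
    double amat_no = hit_rate * t_hit + (1.0 - hit_rate) * t_miss;        /* 8.70 cycles */

    printf("with WP = %.2f, without WP = %.2f, improvement = %.2f cycles\n",
           amat_wp, amat_no, amat_no - amat_wp);                          /* 0.45 cycles */
    return 0;
}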

Table Q2.1-2 (each entry shows the {tag, index} nibble held in that way; entries are filled in only when a value changes; * marks the way that is initially LRU):

| Read Address | Set0 Way0 | Set0 Way1 | Set1 Way0 | Set1 Way1 | Set2 Way0 | Set2 Way1 | Set3 Way0 | Set3 Way1 | Cache Hit? | Way Prediction | Prediction Correct? |
| (initial)    | 0         | -*        | 9         | -*        | A*        | 6         | 7         | B*        |            |                |                     |
| 04           |           |           |           |           |           |           |           |           | Y          | 0              | Y                   |
| 68           |           |           |           |           |           |           |           |           | Y          | 1              | Y                   |
| 2C           |           |           |           |           | 2         |           |           |           | N          | 1              | -                   |
| B4           |           |           |           |           |           |           |           |           | Y          | 0              | N                   |
| 54           |           |           |           | 5         |           |           |           |           | N          | 0              | -                   |
| 3C           |           |           |           |           |           |           | 3         |           | N          | 0              | -                   |
| 94           |           |           |           |           |           |           |           |           | Y          | 1              | N                   |
| 28           |           |           |           |           |           |           |           |           | Y          | 0              | Y                   |
| 64           |           |           |           |           |           |           |           |           | Y          | 0              | N                   |
| 80           |           | 8         |           |           |           |           |           |           | N          | 0              | -                   |
| 64           |           |           |           |           |           |           |           |           | Y          | 1              | Y                   |
| B4           |           |           |           |           |           |           |           |           | Y          | 1              | Y                   |

Scoring: 1 pt per line (1 pt is free if the row is not blank).
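The bookkeeping behind the table can be checked with a small C sketch (an illustration under the stated cache parameters, not part of the quiz) that replays the trace through a 2-way LRU cache seeded with the initial contents above (the starred ways start out LRU):

#include <stdio.h>

int main(void)
{
    /* Per set: the {tag, index} nibble held in each way (-1 = empty) and the LRU way. */
    int line[4][2] = { {0x0, -1}, {0x9, -1}, {0xA, 0x6}, {0x7, 0xB} };
    int lru[4]     = { 1, 1, 0, 1 };
    int trace[]    = { 0x04, 0x68, 0x2C, 0xB4, 0x54, 0x3C,
                       0x94, 0x28, 0x64, 0x80, 0x64, 0xB4 };

    for (int t = 0; t < 12; t++) {
        int nibble = trace[t] >> 4;        /* {tag, index} = high nibble of the address */
        int set    = (trace[t] >> 4) & 3;  /* index = address bits 5:4                  */
        int way    = -1;
        if (line[set][0] == nibble) way = 0;
        if (line[set][1] == nibble) way = 1;
        if (way < 0) {                     /* miss: fill the LRU way of the set */
            way = lru[set];
            line[set][way] = nibble;
            printf("%02X: miss, fill set %d way %d\n", trace[t], set, way);
        } else {
            printf("%02X: hit in set %d way %d\n", trace[t], set, way);
        }
        lru[set] = 1 - way;                /* the other way becomes LRU */
    }
    return 0;
}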

Problem Q2.2: Cache Code Co-Design [26 Points]

Problem Q2.2.A: Cache from SW [8 points]

Examine the code given below, which computes the average of an array:

total = 0;
for (j = 0; j < k; j++) {
    sub_total = 0;  /* Nested loops to avoid overflow */
    for (i = 0; i < N; i++) {
        sub_total += A[j*N + i];
    }
    total += sub_total/N;
}
average = total/k;

When designing a cache to run this application, given a constant cache capacity and associativity, will you want a larger or smaller block size? Why?

Examining the code, note that the program accesses consecutive addresses and never reuses any of them. You therefore want to exploit its spatial locality by using a larger block size to reduce compulsory misses.

Problem Q2.2.B: Adversarial Cache Patterns [8 points]

Given two fully associative caches that differ only in replacement policy (FIFO or LRU), can you come up with a memory access stream for which the cache with FIFO replacement has fewer misses than the cache with LRU replacement? To make it easier to describe, assume the caches have 4 entries and the line size is only 1 unit of the address space.

Any pattern where FIFO does better than LRU at some point works. A simple sequence (checked by the sketch below):

| Access | 0    | 1    | 2    | 3    | 0   | 4    | 1    |
| FIFO   | Miss | Miss | Miss | Miss | Hit | Miss | Hit  |
| LRU    | Miss | Miss | Miss | Miss | Hit | Miss | Miss |
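A small C sketch (not part of the quiz) that runs this access stream through 4-entry fully associative caches with FIFO and LRU replacement; the run() helper and its age-based bookkeeping are illustrative choices:

#include <stdio.h>
#include <string.h>

#define WAYS 4

/* age[] records insertion order for FIFO, or last-use order for LRU. */
static int run(const int *seq, int n, int use_lru)
{
    int tags[WAYS], age[WAYS], hits = 0;
    memset(tags, -1, sizeof(tags));   /* all entries start empty */
    memset(age, 0, sizeof(age));
    for (int t = 0; t < n; t++) {
        int hit = -1, victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (tags[w] == seq[t]) hit = w;
            if (age[w] < age[victim]) victim = w;   /* oldest entry */
        }
        if (hit >= 0) {
            hits++;
            if (use_lru) age[hit] = t + 1;  /* LRU refreshes on a hit; FIFO does not */
        } else {
            tags[victim] = seq[t];          /* miss: evict the oldest entry */
            age[victim] = t + 1;
        }
    }
    return hits;
}

int main(void)
{
    int seq[] = { 0, 1, 2, 3, 0, 4, 1 };
    printf("FIFO hits: %d, LRU hits: %d\n", run(seq, 7, 0), run(seq, 7, 1)); /* 2 vs 1 */
    return 0;
}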

Problem Q2.2.C: Cache Tradeoffs [10 points]

Examine the code given below (note it is slightly different from Q2.2.A):

total = 0;
for (i = 0; i < N; i++) {
    sub_total = 0;  /* Nested loops to avoid overflow */
    for (j = 0; j < k; j++) {
        sub_total += A[j*N + i];
    }
    total += sub_total/k;
}
average = total/N;

Generally, how will the size of the array and the cache capacity impact the choice of line size for good performance? Why?

This problem asks how to pick a line size given a cache capacity and array size; changing the size of the array or the cache is not an option. Basically, you want at least one iteration of the outer loop to fit in the cache with a block size of at least 2 words. To do this, you want at least two columns of A to fit in the cache at a time, so that you get some hits from spatial locality; without this, the program will suffer 100% misses. If the block size is two words and an entire column can fit, one iteration of the outer loop results in all misses, but the next iteration will be all hits.

As the size of the array gets smaller relative to the cache capacity, the columns get shorter, so you will want a larger line size to reduce compulsory misses. As the size of the array gets larger relative to the cache capacity, the columns get taller, so you will need to make the line size smaller to fit more lines in the cache, reducing conflict misses so that the cache can still hold an entire iteration of the outer loop.
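As a rough illustration of this tradeoff, the sketch below estimates the cache footprint of one outer-loop iteration of the Q2.2.C code; the array dimensions, element size, cache capacity, and line size are made-up assumptions, and it assumes the stride N × sizeof(element) is at least one line, so each inner-loop access touches a distinct line:

#include <stdio.h>

int main(void)
{
    long k          = 1024;        /* inner-loop trip count (elements per column)   */
    long elem_bytes = 4;           /* bytes per array element                       */
    long capacity   = 16 * 1024;   /* hypothetical cache capacity in bytes          */
    long line       = 64;          /* candidate line size in bytes                  */

    long lines_touched  = k;                     /* one distinct line per inner access    */
    long footprint      = lines_touched * line;  /* bytes brought in per outer iteration  */
    long words_per_line = line / elem_bytes;

    /* The next outer iteration reuses these lines only if they all stay resident
     * and each line holds at least two consecutive elements.                      */
    printf("footprint = %ld bytes, reuse across outer iterations: %s\n",
           footprint,
           (footprint <= capacity && words_per_line >= 2) ? "yes" : "no");
    return 0;
}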

Problem Q2.3: Three C's of Cache Misses (Short Answer) [23 points]

Mark whether the following modifications will cause each of the miss categories to increase, decrease, or have no effect. You can assume the baseline cache is set-associative. Explain your reasoning to receive credit.

Double the associativity (capacity and line size constant; halves the number of sets)
- Compulsory misses: No effect. If the data was never in the cache, increasing the associativity under these constraints won't change that.
- Conflict misses: Decrease. Higher associativity typically reduces conflict misses because there are more places to put the same element.
- Capacity misses: No effect. Capacity is held constant.

Halving the line size (associativity and number of sets constant; halves the capacity)
- Compulsory misses: Increase. Shorter lines mean less implicit prefetching; the cache is less able to exploit spatial locality.
- Conflict misses: No effect. The number of sets and the associativity are unchanged.
- Capacity misses: Increase. Capacity has been cut in half.

Doubling the number of sets (capacity and line size constant; halves the associativity)
- Compulsory misses: No effect. If the data was never in the cache, increasing the number of sets under these constraints won't change that.
- Conflict misses: Increase. There is less associativity.
- Capacity misses: No effect. Capacity is still constant.

Adding a victim cache
- Compulsory misses: No effect. The victim cache only holds lines previously held by the CPU.
- Conflict misses: Decrease. The victim cache holds the victim of a conflict, so the line can be used again later.
- Capacity misses: Decrease, since the victim cache slightly increases the effective capacity. ("No effect, since the victim cache doesn't count towards the capacity total" lost only 0.5 points.)

Adding prefetching
- Compulsory misses: Decrease. Hopefully the prefetched data is there when it is needed.
- Conflict misses: No effect (prefetching doesn't affect placement), or possibly increase (prefetched data could pollute the cache).
- Capacity misses: No effect (prefetching doesn't affect capacity), or possibly increase (prefetched data could pollute the cache).

Each box is worth 1.5 pts; 0.5 pts were given for free. See slide 12 of Lecture 7 for reference on the miss types.

END OF QUIZ