CSE 30321 Computer Architecture I, Fall 2009, Homework 08: Pipelined Processors and Multi-core Programming


Assigned: November 17, 2009. Due: December 1, 2009.
This assignment can be done in groups of 1, 2, or 3. The assignment is worth 85 points.

Problem 1: (10 points)
Memory hierarchies can improve performance for more than one reason. Often, the reason is that a cache exploits either temporal locality (i.e. data or an instruction that was referenced once will be referenced again soon) or spatial locality (i.e. data or an instruction near recently referenced data or instructions will probably be needed soon). Describe the general characteristics of a program that would exhibit very high amounts of temporal locality but very little spatial locality with regard to data accesses. Provide an example program (pseudocode is fine).
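To make the two definitions concrete, here is an illustrative sketch (not the requested answer to this problem; the function names and array contents are invented) contrasting an access pattern that benefits from spatial locality with one that benefits from temporal locality:

```python
def sequential_sum(a):
    # Spatial locality: consecutive elements are touched one after another,
    # so a cache block fetched for a[i] also serves a[i+1], a[i+2], ...
    s = 0
    for i in range(len(a)):
        s += a[i]
    return s

def repeated_sum(a, n):
    # Temporal locality: the same element is referenced over and over,
    # so it stays cache-resident after the first access.
    s = 0
    for _ in range(n):
        s += a[0]
    return s
```

The first loop streams through memory and reuses nothing; the second reuses one location and streams through nothing.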

Problem 2: (30 points)
One possible organization of the memory hierarchy on an Intel Pentium 4 chip is:
- An 8 KB L1 instruction cache (i.e. the L1 instruction cache contains only instruction encodings)
- An 8 KB L1 data cache (i.e. the L1 data cache contains only data words)
- A 256 KB unified L2 cache (i.e. the L2 cache can contain both data and instruction encodings)
- An 8 MB unified L3 cache (i.e. the L3 cache can contain both data and instruction encodings)

Question A: (5 points) What might be a benefit of splitting the L1 cache into separate instruction and data caches?

Question B: (5 points) The L1 data cache in the P4 holds 8 KBytes of data. Each block holds 64 bytes of data and the cache is 4-way set associative. How many blocks are in the cache? How many sets are in the cache?

Question C: (5 points) The L2 cache in the P4 holds 256 KBytes of data. The cache is 8-way set associative. Each block holds 128 bytes of data. If physical addresses are 32 bits long, each data word is 32 bits, and entries are word addressable, which bits of the 32-bit physical address comprise the tag, index, and offset?

Question D: (5 points) The L3 cache in the P4 holds 8 MBytes of data. It too is 8-way set associative and each block holds 128 bytes of data. If physical addresses are 32 bits long, each data word is 32 bits, and entries are word addressable, which bits of the 32-bit physical address comprise the tag, index, and offset?
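A generic way to derive such field widths is to compute the set count from the cache geometry and take base-2 logarithms. The sketch below is not tied to the P4 parameters above; the function name, parameters, and the example geometry are invented for illustration:

```python
import math

def address_fields(cache_bytes, block_bytes, ways, addr_bits,
                   addressable_unit_bytes=1):
    # Returns (tag_bits, index_bits, offset_bits) for a set-associative
    # cache; addressable_unit_bytes=1 models byte addressing, =4 models
    # word addressing with 32-bit words.
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = int(math.log2(block_bytes // addressable_unit_bytes))
    index_bits = int(math.log2(sets))
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits
```

For a hypothetical byte-addressable 4 KB, 2-way cache with 32-byte blocks and 32-bit addresses, this gives a 21-bit tag, 6-bit index, and 5-bit offset; passing `addressable_unit_bytes=4` shrinks the offset to cover words instead of bytes.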
Question E: (10 points) For the P4 architecture discussed above, we know the following:
- The hit time for either L1 cache is 2 clock cycles (CCs).
- The time required to find data in the L2 cache is 7 more CCs. (Thus, the time to get data from the L2 cache is really 9 CCs: 2 CCs to look in L1 and 7 more CCs to get the data from L2.)
- The time required to find data in the L3 cache is 10 more CCs. (Thus, the time to get data from the L3 cache is 9 CCs + 10 CCs, because we look in L1 and L2 first.)
- If data is not found in the L3 cache, we will need to get data from main memory.
  - The bus between main memory and the L3 cache can transfer 64 bytes of data at a time. Thus, 2 main-memory references are required to fill up the 128-byte cache block.
  - It takes 50 CCs to access main memory once, and accesses cannot overlap.

For the benchmark that we are running:
- The L1 cache hit rate is 90%
- The L2 cache hit rate is 95%
- The L3 cache hit rate is 99%

What is the average memory access time?
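One common way to set up such a calculation treats each hit rate as local to its level and weights each level's cumulative access time by the probability that the access is first satisfied there. The sketch below follows that formulation only; the function name and the sample numbers in the usage note are invented, and it is not presented as the unique reading of the problem:

```python
def amat(levels, memory_penalty):
    # levels: list of (cumulative_access_time_cc, local_hit_rate), L1..Ln.
    # memory_penalty: extra CCs paid when every cache level misses.
    total = 0.0
    p_reach = 1.0  # probability the access reaches this level
    for time_cc, hit_rate in levels:
        total += p_reach * hit_rate * time_cc
        p_reach *= (1.0 - hit_rate)
    # All levels missed: pay the last level's lookup time plus memory time.
    total += p_reach * (levels[-1][0] + memory_penalty)
    return total
```

For a hypothetical single-level cache with a 1 CC hit time, a 50% hit rate, and a 10 CC memory penalty, this gives 0.5 * 1 + 0.5 * 11 = 6.0 CCs.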

Problem 3: (5 points)
A certain computer has the following characteristics:
- A physical address that is 20 bits long (not all addresses are/have to be 32 bits!)
- A cache with each block holding 8 data words
- Addresses that are to the word
- The cache has 1024 blocks
- The cache is direct mapped

How many bits make up the index, tag, and offset of this physical address?

Problem 4: (10 points)
You are to design a cache:
- that holds 2 KB (note: BYTES) of data
- where each data word is just 8 bits long (and addresses are to the word)
- that is 4-way set associative
- that has 64 total blocks
- that uses a write-back policy

Draw the cache. (Note: you don't have to draw every last thing. Just provide enough information so that we know you got the problem right. I.e., if there are 19 words in a cache block, write "word 0, word 1, ..., word 18".)

Problem 5: (20 points)
Consider a cache with the following specs:
- It is 4-way set associative
- It holds 64 KB of data
- Data words are 32 bits each
- Data words are byte addressed
- Physical addresses are 32 bits
- There are 64 words per cache block
- A first-in, first-out (FIFO) replacement policy is used for each set
- All cache entries are initially empty (i.e. their valid bits are not set)

At startup the following addresses (in hex) are supplied to this cache in the order below:
1. AA AB BB BC
2. AA AB BB BF
3. FC CB BB BD
4. AA AB BB BF
5. FF FB BB BC
6. FF FB BB BB
7. FC BD BB AC
8. AA AB CC CF
9. AA AB BB FF
10. FF FF AA BC

Question A: (5 points) How many sets of the cache has this pattern of accesses touched?
Question B: (5 points) How many compulsory misses are there for this pattern of accesses?
Question C: (5 points) How many conflict misses are there for this pattern of accesses?
Question D: (5 points) What is the overall miss rate for this pattern of accesses?
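Access traces like this can be checked mechanically with a small simulator. The sketch below models a FIFO set-associative cache; the function name and parameters are invented, and the tiny example geometry in the usage note is deliberately not the cache specified above:

```python
from collections import deque

def simulate(trace, block_bytes, num_sets, ways):
    # Each set holds resident tags in FIFO arrival order.
    sets = [deque() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        s = sets[index]
        if tag in s:
            hits += 1            # hit: FIFO order is NOT updated on a hit
        else:
            misses += 1
            if len(s) == ways:   # set full: evict the oldest resident block
                s.popleft()
            s.append(tag)
    return hits, misses
```

For a hypothetical 2-way cache with 64-byte blocks and 4 sets, the trace 0x00, 0x04, 0x40, 0x00 produces 2 hits and 2 misses (0x00 and 0x04 fall in the same block).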

Problem 6: (10 points)
There are many ways to reduce cache misses by changing processor hardware, e.g. making cache blocks larger, making caches larger, or increasing associativity. However, cache performance can be improved by optimizing software too. If the arrays in the for loop below are stored in memory, there could be many cache misses. Explain why.

for (j = 0; j < 100; j = j + 1) {
    for (i = 0; i < 5000; i = i + 1) {
        x[i][j] = 2 * x[i][j];
    }
}

Extra Credit: Another technique that improves cache performance at the software level is called blocking. By doing your own research, explain how blocking improves cache performance.
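As a purely illustrative sketch of the kind of software transformation the extra credit refers to (it is not the requested explanation), a blocked traversal visits a 2-D array tile by tile so that each tile can stay cache-resident while it is in use. The function name and tile-size parameter below are invented for illustration:

```python
def blocked_sum(x, b):
    # x: 2-D list; b: tile (block) size.
    # Visit the array one b-by-b tile at a time instead of row by row,
    # so the working set at any moment is one small tile.
    n, m = len(x), len(x[0])
    total = 0
    for ii in range(0, n, b):
        for jj in range(0, m, b):
            for i in range(ii, min(ii + b, n)):
                for j in range(jj, min(jj + b, m)):
                    total += x[i][j]
    return total
```

The result is the same for any tile size; only the order of memory accesses, and therefore the cache behavior, changes.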