CSE 30321 Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100


Problem 1: (30 points)

Background: One possible organization of the memory hierarchy on an Intel Pentium 4 chip is:
- An 8 KB L1 instruction cache (i.e. the L1 instruction cache contains only instruction encodings)
- An 8 KB L1 data cache (i.e. the L1 data cache contains only data words)
- A 256 KB unified L2 cache (i.e. the L2 cache can contain both data and instruction encodings)
- An 8 MB unified L3 cache (i.e. the L3 cache can contain both data and instruction encodings)

Question A: (5 points) What might be a benefit of splitting the L1 cache into separate instruction and data caches?

Question B: (5 points) The L1 data cache in the P4 holds 8 KBytes of data. Each block holds 64 bytes of data and the cache is 4-way set associative. How many blocks are in the cache? How many sets are in the cache?

Question C: (5 points) The L2 cache in the P4 holds 256 KBytes of data. The cache is 8-way set associative and each block holds 128 bytes of data. If physical addresses are 32 bits long, each data word is 32 bits, and entries are word addressable, which bits of the 32-bit physical address comprise the tag, index, and offset?

Question D: (5 points) The L3 cache in the P4 holds 8 MBytes of data. It too is 8-way set associative and each block holds 128 bytes of data. If physical addresses are 32 bits long, each data word is 32 bits, and entries are word addressable, which bits of the 32-bit physical address comprise the tag, index, and offset?
Question E: (10 points) For the P4 architecture discussed above, we know the following:
- The hit time for either L1 cache is 2 CCs.
- The time required to find data in the L2 cache is 7 CCs.
  - Thus, the time to find data in the L2 cache is really 9 CCs: 2 CCs to look in L1 and 7 more CCs to get the data from L2.
- The time required to find data in the L3 cache is 10 CCs.
  - Thus, the time to find data in the L3 cache is 9 CCs + 10 CCs, because we look in L1 and L2 first.
- If data is not found in the L3 cache, we will need to get data from main memory.
  - The bus between main memory and the L3 cache can transfer 64 bytes of data at a time; thus, 2 main memory references are required to fill up the 128-byte cache block.
  - It takes 50 CCs to access main memory once, and accesses cannot overlap.

For the benchmark that we are running:
- The L1 cache hit rate is 90%
- The L2 cache hit rate is 95%
- The L3 cache hit rate is 99%

What is the average memory access time?

Problem 2: (10 points)

Background: You are to design a cache:
- that holds 4 KB (note: BYTES) of data,
- where each data word is just 16 bits long (and addresses are to the word),
- that is 2-way set associative,
- that has 128 total blocks,
- that uses a write-back policy.

Question: Draw the cache. (Note: you don't have to draw every last thing. Just provide enough information so that we know you got the problem right. I.e., if there are 19 words in a cache block, write word 0, word 1, ..., word 18.)

Problem 3: (20 points)

Consider a cache with the following specs:
- It is 4-way set associative
- It holds 64 KB of data
- Data words are 32 bits each
- Data words are byte addressed
- Physical addresses are 32 bits
- There are 64 words per cache block
- A First-In, First-Out (FIFO) replacement policy is used for each set
- All cache entries are initially empty (i.e. their valid bits are not set)

At startup, the following addresses (in hex) are supplied to this cache in the order below:
1. AA AB 10 11-B BC - compulsory miss (block 1)
2. AA AB 10 11-B BF - hit (block 1)
3. FC CB 10 11-B BD - compulsory miss (block 2)
4. AA AB 10 11-B BF - hit (block 2)
5. FF FB 10 11-B BC - compulsory miss (block 3)
6. FF FB 10 11-B BB - hit (block 3)
7. FC BD 10 11-B AC - compulsory miss (block 4)
8. AA AB 11 00-C CF - compulsory miss (block 1)
9. AA AB 10 11-B FF - hit (block 1)
10. FF FF 10 10-A BC - compulsory miss (block 1)

Question A: (5 points) How many sets of the cache has this pattern of accesses touched?
Question B: (5 points) How many compulsory misses are there for this pattern of accesses?
Question C: (5 points) How many conflict misses are there for this pattern of accesses?
Question D: (5 points) What is the overall miss rate for this pattern of accesses?

Problem 4: (10 points)

Background: There are many ways to reduce cache misses by changing processor hardware, e.g. making cache blocks larger, making caches larger, increasing associativity, etc. However, cache performance can be improved by optimizing software too.

Question: If the arrays in the for loop below are stored in memory, there could be many cache misses. Explain why.

for (j = 0; j < 100; j = j + 1) {
    for (i = 0; i < 5000; i = i + 1) {
        x[i][j] = 2 * x[i][j];
    }
}

Problem 5: (30 points)

As discussed in class, virtual memory uses a page table and TLB to track the mapping of virtual addresses to physical addresses. This exercise shows how the TLB must be updated as addresses are accessed. For this problem, you should assume the following:
1. Pages have 4 KB of addressable locations
2. There is a 4-entry, fully-associative TLB
3. The TLB uses a true least-recently-used replacement policy

For the pattern of virtual addresses shown below, comment on whether the entry is:
a. Found in the TLB
b. Not found in the TLB, but found in the Page Table
c. Not found in the TLB or the Page Table (thus, there is a page fault)

The initial TLB and page table state is shown below. If there is a page fault and you need to bring in a page from disk, assume its page number is one higher than the current highest page in the page table. (This is currently 1100 in binary, 12 in decimal. Thus, the next page would be 1101 / 13, the next would be 1110 / 14, etc.)

Stream of Virtual Addresses (MSB ... LSB):
1. 0000 1111 1111 1111
2. 0111 1010 0010 1000
3. 0011 1101 1010 1101
4. 0011 1010 1001 1000
5. 0001 1100 0001 1001
6. 0001 0000 0000 0000
7. 0010 0010 1101 0000

Initial TLB State: (Note that 1 = Most Recently Used and 4 = Least Recently Used)

Valid  LRU  Tag   Physical Page #
  1     3   1011  1100
  1     2   0111  0100
  1     1   1001  0110
  0     4   0000  -----

Initial Page Table State:

Virtual Page #  Valid  Physical Page #
0000            1      0101
0001            0      Disk
0010            0      Disk
0011            1      0110
0100            1      1001
0101            1      1011
0110            0      Disk
0111            1      0100
1000            0      Disk
1001            0      Disk
1010            1      0011
1011            1      1100

Question A: (15 points) You must use these tables to answer this question! Given the virtual addresses above, show how the state of the TLB changes by filling in the tables:

Address 1: 0000 1111 1111 1111
Address 2: 0111 1010 0010 1000
Address 3: 0011 1101 1010 1101
Address 4: 0011 1010 1001 1000

Address 5: 0001 1100 0001 1001
Address 6: 0001 0000 0000 0000
Address 7: 0010 0010 1101 0000

Question B: (10 points) What would some of the advantages of having a larger page size be? What are some of the disadvantages?

Question C: (5 points) Given the parameters in the table below, calculate the total page table size in a system running 5 applications.

Virtual Address Size   Page Size   Page Table Entry Size
64 bits                16 KB       8 bytes