ELEC3441: Computer Architecture Second Semester, Homework 3 (r1.1) SOLUTION
- Margery Miller
- 5 years ago
Part A: Problem Set

A.1 Cache Access

Consider the following sequence of memory accesses to the main memory in a 16-bit processor:

Address (hex)   Type
A000            R
B000            R
A380            R
A004            W
580C            W
A108            R
5800            R
A10C            W
A39C            W
A3AC            R
A1AC            R
A006            R
5804            R

A.1.1 Assume the following data cache organization:
Capacity: 4 KiB
Line size: 8 words
Organization: direct map
Policy: write back, write allocate

Trace through the above memory accesses and answer the following:
(i) For each access, is it a hit or a miss?
(ii) Show the final content in the cache, including the tag. For the sake of simplicity, assume the content of a memory address is the same as its address, i.e., mem[x] = x.

Word size = 16 bits = 2 B. Since a line holds 8 words (16 B), 3 more bits are needed to address the words within a line, making the offset 4 bits. For the direct-map cache, 4 KiB / 16 B = $2^{8}$ = 256 lines. Therefore, 8 bits are used for the index, and the remaining $16 - 8 - 4 = 4$ bits are for the tag.

A.1.2 Repeat A.1.1 but with a different cache:
Capacity: 4 KiB
Line size: 8 words
Organization: 2-way set associative
Policy: LRU, write back, write allocate
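The hit/miss traces asked for in A.1.1 to A.1.4 can be cross-checked with a small simulator. This is a minimal sketch, not the official solution: it assumes true LRU in which every hit (read or write) makes the line most recently used, and it models only the two policy pairs used in A.1 (write back with write allocate, and write through with no write allocate). The names `TRACE`, `split`, and `simulate` are illustrative.

```python
# Access trace from A.1 as (16-bit address, "R" or "W").
TRACE = [
    (0xA000, "R"), (0xB000, "R"), (0xA380, "R"), (0xA004, "W"),
    (0x580C, "W"), (0xA108, "R"), (0x5800, "R"), (0xA10C, "W"),
    (0xA39C, "W"), (0xA3AC, "R"), (0xA1AC, "R"), (0xA006, "R"),
    (0x5804, "R"),
]

LINE_BYTES = 16       # 8 words x 2 B per word
CAPACITY = 4 * 1024   # 4 KiB


def split(addr, ways):
    """Split a 16-bit address into (tag, index, offset)."""
    num_sets = CAPACITY // LINE_BYTES // ways
    line = addr // LINE_BYTES
    return line // num_sets, line % num_sets, addr % LINE_BYTES


def simulate(trace, ways, write_back):
    """Return ("H"/"M" per access, final sets).

    write_back=True  models write back + write allocate (A.1.1, A.1.2).
    write_back=False models write through + no write allocate (A.1.3, A.1.4).
    Each set is a list of [tag, dirty] entries ordered LRU -> MRU.
    """
    sets = {}
    results = []
    for addr, kind in trace:
        tag, index, _ = split(addr, ways)
        lines = sets.setdefault(index, [])
        entry = next((e for e in lines if e[0] == tag), None)
        if entry is not None:
            results.append("H")
            lines.remove(entry)
            lines.append(entry)           # hit -> most recently used
            if kind == "W" and write_back:
                entry[1] = True           # write back: line becomes dirty
            continue
        results.append("M")
        if kind == "W" and not write_back:
            continue                      # no write allocate: write goes to memory only
        if len(lines) == ways:
            lines.pop(0)                  # evict LRU (a dirty victim would be written back)
        lines.append([tag, kind == "W"])  # write allocate: a written line starts dirty
    return results, sets


if __name__ == "__main__":
    for ways, wb, name in [(1, True, "A.1.1"), (2, True, "A.1.2"),
                           (1, False, "A.1.3"), (2, False, "A.1.4")]:
        hm, final = simulate(TRACE, ways, wb)
        print(name, " ".join(hm))
        for index in sorted(final):
            print(f"  set {index:#04x}:", [(hex(t), d) for t, d in final[index]])
```

Under these assumptions it reports 4, 5, 2 and 4 hits for A.1.1 to A.1.4 respectively. The A.1.4 count depends on the write hit to A004 refreshing the LRU state; a cache that only updated LRU on reads would evict a different line in set 0.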
A.1.3 Repeat A.1.1 but with a different cache:
Capacity: 4 KiB
Line size: 8 words
Organization: direct map
Policy: write through, no write allocate

A.1.4 Repeat A.1.1 but with a different cache:
Capacity: 4 KiB
Line size: 8 words
Organization: 2-way set associative
Policy: LRU, write through, no write allocate

A.2 Cache Performance

A.2.1, A.2.2
$\text{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty} = \ldots \times 300 = 15$ cycles
$\text{AMAT}_D = \text{hit time} + \text{miss rate} \times \text{miss penalty} = 1 + 20\% \times 300 = 61$ cycles

A.2.3 Since the I-cache always hits, the average CPI of B is:
$\text{CPI}_{\text{hit}} = 50\% \times \ldots \times \text{AMAT}_D = 50\% \times \ldots \times 61 = \ldots$

A.2.4, A.2.5
$\text{CPI} = \text{CPI}_{\text{hit}} + 5\% \times 300 = \ldots$
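The AMAT relation used in A.2 is simple enough to check numerically. A minimal sketch: the 1-cycle hit time and 20% data-cache miss rate are not stated in the surviving text; they are inferred as the only pair consistent with AMAT_D = 61 at a 300-cycle penalty here and AMAT_D = 63 at the 310-cycle penalty in A.2.6. Exact fractions avoid floating-point rounding. The helper names are illustrative.

```python
from fractions import Fraction


def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate x miss penalty (all in cycles)."""
    return hit_time + miss_rate * miss_penalty


def infer_hit_and_rate(point_a, point_b):
    """Solve hit + rate * penalty = amat from two (penalty, amat) observations."""
    (p1, a1), (p2, a2) = point_a, point_b
    rate = Fraction(a2 - a1, p2 - p1)
    return a1 - rate * p1, rate


hit, rate = infer_hit_and_rate((300, 61), (310, 63))
print("hit time:", hit, "cycle(s); miss rate:", rate)   # 1 and 1/5
print("AMAT_D @ 300:", amat(hit, rate, 300))             # 61
print("AMAT_D @ 310:", amat(hit, rate, 310))             # 63
# Term used in A.2.5: 5% of instructions pay a 300-cycle I-cache miss.
print("I-miss cycles/instr:", Fraction(5, 100) * 300)    # 15
```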
A.2.6 The new memory incurs a larger miss penalty. Therefore, the new data cache AMAT becomes:
$\text{AMAT}_D = \text{hit time} + \text{miss rate} \times \text{miss penalty} = 1 + 20\% \times 310 = 63$ cycles

Now, with a perfect I-cache, the CPI becomes:
$\text{CPI}_{\text{hit}} = 50\% \times \ldots \times \text{AMAT}_D = 50\% \times \ldots \times 63 = \ldots$

Because of the dual-port memory, an I-cache miss that coincides with a D-cache miss no longer incurs a penalty on top of the one already incurred by the D-cache miss. In terms of performance, such instructions behave as if they were I-cache hits. Therefore,
$\text{CPI} = \text{CPI}_{\text{hit}} + 5\% \times (1 - \ldots) \times 300 = \ldots$

So the new DRAM actually results in lower performance. The larger miss penalty of the new DRAM degrades the CPI in all cases, regardless of whether I-cache and D-cache misses overlap. As the chance of an I-cache miss and a D-cache miss happening at the same time is low, the savings from the dual-port memory are small, and overall performance still degrades.

A.3 Instruction Cache (adapted from final exam 2015)

You are designing the instruction cache for a new 32-bit processor. Because of other hardware constraints, your cache design must meet the following criteria:
Virtual address space is 32 bits wide.
Word size is 32 bits.
The cache must be indexed and tagged with virtual addresses.
Line size must be 4 words.
The number of bits to index the cache must be 8 bits.
Page size is 8 KiB.
Translation from virtual to physical address is controlled by the OS and the page mapping is pseudo-random.
Process ID is 4 bits wide.
All compiled programs start running from address 0x
The OS performs a context switch every 256 instructions.

A.3.1 In order to differentiate data from different processes, you have decided to concatenate the process ID to the virtual address tag in the cache. Assuming you are using a direct-map cache, what is the minimum width of the combined tag?

24 bits. A 4-word line is 16 B, giving a 4-bit offset; with an 8-bit index, the virtual tag is $32 - 8 - 4 = 20$ bits, and concatenating the 4-bit process ID gives 24 bits.
A.3.2 Assuming you are using a direct-map cache with the maximum possible size given the above constraints, what is the total number of bits required for tag storage in the cache?
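The A.3.1 tag width and the A.3.2 tag storage follow directly from the field widths. A minimal sketch, counting tag bits only (valid bits and other metadata are left out, an assumption). With an 8-bit index and 16-byte lines, the largest direct-map cache has $2^8 = 256$ lines (4 KiB), so the constraints fix its size. The function names are illustrative.

```python
def field_widths(va_bits=32, word_bytes=4, words_per_line=4, index_bits=8):
    """Return (tag, index, offset) widths for a virtually indexed, virtually tagged cache."""
    line_bytes = word_bytes * words_per_line
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power-of-two line size
    return va_bits - index_bits - offset_bits, index_bits, offset_bits


def combined_tag_bits(pid_bits=4, **widths):
    """Virtual tag plus the process ID concatenated to it (A.3.1)."""
    tag_bits, _, _ = field_widths(**widths)
    return tag_bits + pid_bits


def total_tag_storage_bits(pid_bits=4, **widths):
    """Direct map: one tag per line and 2**index_bits lines (A.3.2)."""
    _, index_bits, _ = field_widths(**widths)
    return (2 ** index_bits) * combined_tag_bits(pid_bits, **widths)


print(field_widths())            # (20, 8, 4)
print(combined_tag_bits())       # 24
print(total_tag_storage_bits())  # 6144 = 256 lines x 24 bits
```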
A.3.3 Your machine starts with an empty instruction cache. The following program is run as Process 1 (with process ID = 1):

    0x          _start:  addi a1, a1, 1        # instruction 0
    0x                   addi a3, zero, …
    0x000401FC           bne  a1, a3, _start   # instruction 127: NOT taken on 4th time

Process 1 begins execution for 256 instructions. Then it is stopped. There is no branch instruction in the first 128 instructions except for the last bne. The bne instruction takes the branch the first 3 times it is encountered. The branch condition is false on the 4th time it is executed.

After Process 1 has stopped (after it has executed 256 instructions), how many hits and misses have occurred in the instruction cache? Briefly explain your answer.

Starting from a cold cache, with no branches inside the first 128 instructions and no capacity misses (128 instructions occupy only 32 of the 256 lines), each 4-instruction line produces the sequence M H H H, repeated 128/4 = 32 times during the first 128 instructions. Afterward, since there are no conflict misses, all instructions are cached, so the second pass through the loop contributes another 128 H.
Total: Misses = 32; Hits = 32 × 3 + 128 = 224.

A.3.4 Following up from the previous part, after Process 1 is stopped, the OS switches in Process 2. Process 2 is the exact same program as Process 1 except it is started by a different user. Therefore, it executes the exact same code as the above part except with process ID = 2. Process 2 is again stopped after 256 instructions. How many hits and misses will have occurred in the instruction cache due to running Process 2? Briefly explain your answer.
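The counts from A.3.3 onward, including the longer schedules in A.3.5 to A.3.7, can be cross-checked by simulating the instruction fetches. A minimal sketch under stated assumptions: instruction 0 is taken to sit at 0x00040000, inferred from instruction 127 being at 0x000401FC (only line alignment actually matters), every instruction is 4 bytes, and replacement is LRU. `ICache`, `fetch_addresses`, and `run_schedule` are hypothetical helpers, not part of the assignment.

```python
LINE_BYTES = 16      # 4 words x 4 B
PROGRAM_LEN = 128    # instructions 0..127; the bne at 127 loops back to 0
SLICE = 256          # instructions per context switch
BASE = 0x00040000    # ASSUMPTION: address of instruction 0 (instruction 127 at 0x000401FC)


def fetch_addresses(first, count):
    """PCs of `count` dynamic instructions starting at dynamic instruction `first`.

    Every pass runs instructions 0..127 in order, so dynamic instruction k
    fetches static instruction k % 128.
    """
    return [BASE + 4 * (k % PROGRAM_LEN) for k in range(first, first + count)]


class ICache:
    """Set-associative cache with LRU replacement (ways=1 gives a direct map)."""

    def __init__(self, num_lines, ways, pid_in_tag):
        self.num_sets = num_lines // ways
        self.ways = ways
        self.pid_in_tag = pid_in_tag
        self.flush()

    def flush(self):
        self.sets = [[] for _ in range(self.num_sets)]  # each set: tags, LRU first

    def access(self, pc, pid):
        """Return True on a hit; fill the line on a miss."""
        line = pc // LINE_BYTES
        index = line % self.num_sets
        tag = line // self.num_sets
        if self.pid_in_tag:
            tag = (pid, tag)
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)        # hit -> most recently used
            return True
        if len(ways) == self.ways:
            ways.pop(0)             # evict the least recently used line
        ways.append(tag)
        return False


def run_schedule(cache, flush_on_switch):
    """Run P1, P2, P1, P2 for one slice each; return (misses, hits) per slice."""
    done = {1: 0, 2: 0}
    per_slice = []
    for pid in (1, 2, 1, 2):
        if flush_on_switch:
            cache.flush()
        hits = sum(cache.access(pc, pid) for pc in fetch_addresses(done[pid], SLICE))
        done[pid] += SLICE
        per_slice.append((SLICE - hits, hits))
    return per_slice


if __name__ == "__main__":
    print("direct map + PID:      ", run_schedule(ICache(256, 1, True), False))
    print("2-way, flush:          ", run_schedule(ICache(256, 2, False), True))
    print("2-way, half size + PID:", run_schedule(ICache(128, 2, True), False))
```

The three configurations correspond to the original direct-map cache with the process ID in the tag, the 2-way cache flushed on every switch (A.3.6), and the half-size 2-way cache that keeps the process ID (A.3.7). Only the last one stops re-missing on each process's 32 lines after its first time slice.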
The result is exactly the same as for Process 1 (32 misses, 224 hits). Process 2 uses the same cache locations as P1, but because its tags carry a different process ID, none of P1's lines hit; P2 overwrites them as if it started from a cold cache.

A.3.5 Since there are only 2 processes running, the process switching between Process 1 and Process 2 continues: Process 1 is again switched in after Process 2. It continues its execution for another 256 instructions. By now, Process 1 has completed 512 instructions since it started. Process 2 is then switched in place of Process 1 and runs for 256 instructions. When the two processes run in the processor for the second time, is the instruction hit rate changed compared to your answers in A.3.3 and A.3.4? If the hit rate has changed, explain how it is different. If the hit rate remains the same, explain why.

The hit rate is the same as above. Because of the ping-pong effect, P1 resumes and sees what is effectively an empty cache, because P2 has evicted its lines. The same holds for P2.

A.3.6 As an attempt to improve the performance of the instruction cache, you are given the option to change the instruction cache organization into 2-way set associative while keeping the cache capacity constant. However, as the tag capacity is limited, you can no longer store the process IDs in the cache. As a result, you need to flush the cache on every context switch. Considering the 2-process scenario above, how would the change to a 2-way set associative cache with cache flushing affect the overall instruction hit rate after both Process 1 and 2 have finished running 512 instructions? That is, the processor has just finished executing Process 1 (256 instructions), Process 2 (256 instructions), Process 1 (256 instructions), Process 2 (256 instructions).

Because of flushing, every time a process comes back the cache is cold, so the answer is the same as above.

A.3.7 Your teammate suggests keeping the process ID in the cache tag to avoid flushing the cache on a context switch. However, since hardware resources are limited, your teammate suggests halving the cache capacity as a tradeoff. Briefly discuss whether this scheme, a 2-way set associative cache with reduced capacity and no flushing, may improve performance over the original direct-map cache.

Yes, it works. In this particular case, the code from both processes is stored in the 2 ways of the same sets, so it is preserved across context switches. Each process needs only 32 lines, which still fit in the halved cache.

A.4 Streaming Cache Performance

You are investigating the cache performance of your processor regarding the following code segment:

    // int i, n, a;
    // int y[], x[];
    for (i = 0; i < n; i++) {
        y[i] = a*x[i] + y[i];
    }
A.4.1 Cache Hit or Miss

Assume that a, i, and n are stored in registers. Array x is stored at memory address starting at 0xA while the array y is stored at memory address 0xB. The processor data cache is initially empty. Below are the details of the data cache:
Capacity: 1 MiB
Organization: 2-way set associative
Line size: 4 words
Replacement policy: true LRU
Write policy: write through, no write allocate

A write buffer is available such that the processor can resume running immediately after writing the data to the buffer.

Let $n = 2^{20}$. Trace through the above code, then show and explain the sequence of cache hits and misses that will occur in the space below. Use the notation RH for read hit, WH for write hit, RM for read miss, and WM for write miss.

The first access (read) to each line is always a miss, while subsequent accesses to the same line hit because of spatial locality. Specifically, reading x[0] generates a miss (RM), and reading y[0] generates a miss (RM). Since the cache is 2-way set associative, x[0] and y[0] map to the same set but are stored in different ways. Therefore, writing y[0] is a hit (WH). Then (R) x[1], (R) y[1], (W) y[1], (R) x[2], (R) y[2], (W) y[2], (R) x[3], (R) y[3], (W) y[3] are all hits (9 hits in total).

This pattern repeats until the accesses reach the end of the cache, which happens after 32k lines: $2^{20}$ bytes are split across 2 ways, so each way contains $2^{19}$ bytes = $2^{17}$ words = $2^{15}$ lines. At that point, x[128k] maps to set 0 again. Its line has never been loaded, so the read misses, and true LRU evicts the line holding x[0] (y[3] was accessed more recently). As a result, x[128k] is a miss and is stored over the block of x[0]. Similarly, y[128k] is a miss and is stored over y[0].

Therefore, the overall sequence is: RM RM WH RH RH WH RH RH WH RH RH WH, then repeat.

A.4.2 Based on your result above, what is the overall data cache miss rate of the above code when executed in the main CPU? Consider BOTH read and write accesses. Assume the write and read miss penalties are both 300 cycles. Hit time is 1 cycle.
What is the average memory access time (AMAT) for this code?

The overall miss rate is 2/12 = 1/6 (2 read misses in every 12 accesses). Therefore,
$\text{AMAT} = 1 + \frac{1}{6} \times 300 = 51$ cycles.

A.4.3 Now the above C code is compiled into the following RISC-V instructions:
    # a0 = 2^20
    # a1 is base address of x[]
    # a2 is base address of y[]
    # constant a is stored in register a3
    00: loop: addi a0, a0, -1
    04:       lw   t1, 0(a1)
    08:       lw   t2, 0(a2)
    0C:       mul  t0, t1, a3
    10:       add  t0, t0, t2
    14:       sw   t0, 0(a2)
    18:       addi a1, a1, 4
    1C:       addi a2, a2, 4
    20:       bne  a0, zero, loop

Let $n = 2^{20}$. Assume add, addi and bne take 1 cycle; mul takes 4 cycles; and the performance of lw and sw depends on the cache performance. What is the total run time of the above code in cycles? You may leave the variable n in your answer.

Each iteration executes five single-cycle instructions (addi, add, addi, addi, bne), one 4-cycle mul, and three memory accesses at the 51-cycle AMAT:
$(5 \times 1 + 4)\,n + 3 \times 51\,n = 162n$ cycles

A.4.4 Write Back Cache

If the write policy of the cache is changed to write back, how would the following be affected?
(i) Read miss penalty
(ii) Overall performance of the above code
Explain your answer by tracing through the execution of the above code, highlighting any differences in the required cache content handling with the use of a write-back cache.
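The A.4.1 hit/miss pattern, the A.4.2 AMAT, and the per-iteration cycle count of the A.4.3 loop can be checked with a short simulation. A minimal sketch under stated assumptions: x[] and y[] start at the hypothetical addresses 0xA0000000 and 0xB0000000 (the text gives only the leading hex digit), ints are 4 bytes, true LRU is refreshed on every hit, and each lw/sw is charged the average AMAT. To run quickly it uses n = 2^17 + 8 iterations, just past the point where x[128k] wraps back to set 0, rather than 2^20; the miss rate comes out the same for any n that is a multiple of 4.

```python
from fractions import Fraction

LINE_BYTES = 16                              # 4 words x 4 B
WAYS = 2
NUM_SETS = (1 << 20) // LINE_BYTES // WAYS   # 1 MiB, 2-way -> 2**15 sets
X_BASE = 0xA0000000  # HYPOTHETICAL: the text only gives the prefix "0xA"
Y_BASE = 0xB0000000  # HYPOTHETICAL: the text only gives the prefix "0xB"


def loop_accesses(n):
    """(kind, address) stream of y[i] = a*x[i] + y[i] for 4-byte ints."""
    for i in range(n):
        yield "R", X_BASE + 4 * i
        yield "R", Y_BASE + 4 * i
        yield "W", Y_BASE + 4 * i


def simulate(n):
    """2-way true-LRU cache, write through + no write allocate.

    Returns one label per access: RH, RM, WH or WM.
    """
    sets = [[] for _ in range(NUM_SETS)]     # each set: tags ordered LRU -> MRU
    labels = []
    for kind, addr in loop_accesses(n):
        line = addr // LINE_BYTES
        ways, tag = sets[line % NUM_SETS], line // NUM_SETS
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)                 # every hit refreshes LRU
            labels.append(kind + "H")
            continue
        labels.append(kind + "M")
        if kind == "R":                      # no write allocate: only read misses fill
            if len(ways) == WAYS:
                ways.pop(0)                  # evict the least recently used line
            ways.append(tag)
    return labels


def amat(labels, hit_time=1, miss_penalty=300):
    misses = sum(label.endswith("M") for label in labels)
    return hit_time + Fraction(misses, len(labels)) * miss_penalty


def cycles_per_iteration(mem_amat, mul_cycles=4):
    """A.4.3 loop body: addi, lw, lw, mul, add, sw, addi, addi, bne."""
    single_cycle = 5                         # addi, add, addi, addi, bne
    memory_ops = 3                           # lw, lw, sw, each charged the AMAT
    return single_cycle + mul_cycles + memory_ops * mem_amat


if __name__ == "__main__":
    labels = simulate(2**17 + 8)             # just past the 128k wrap-around point
    print(" ".join(labels[:12]))
    print("wrap:", labels[3 * 2**17: 3 * 2**17 + 3])
    print("AMAT:", amat(labels))
    print("cycles per iteration:", cycles_per_iteration(amat(labels)))
```

Because each write of y[i] immediately follows the read that brought its line in, no write ever misses in this loop, so the no-write-allocate policy never actually comes into play here.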
Because of the change in write policy, a read miss may now have to write back a dirty victim line before the new line is filled. In particular, when the accesses in the above code wrap around the cache, the line holding y[i - 128k] is dirty and must be written back. But since there is a write buffer, the additional time to write back before reading new data is minimal. As a result, there won't be a major difference in performance.

A.5 Page Table & TLB

A.5.1

A.5.2
Some advantages of a larger page size:
As the page size gets larger, the number of pages is smaller, so the page table can be smaller.
If the number of TLB entries is unchanged, the PPNs stored in the TLB cover a larger memory space, so the TLB hit rate could be higher.

Some disadvantages of a larger page size:
Whenever there is a page fault, a whole page needs to be loaded into memory. If the data in that page are not accessed much before the page is swapped out, memory bandwidth is wasted.
If the physical memory allocated to a program is limited, fewer pages fit in memory. When the program accesses data across different pages, page faults may occur frequently.

A.5.3 As TLB access is in the processor's critical path, you are experimenting with whether a direct-map TLB could improve performance. Assume the capacity of the TLB remains unchanged, and consider the impact on TLB performance for the instructions of a program P. The instructions of P begin at memory location 0x; during run time, P allocates and accesses only 1 data page, starting at address 0xD. In the following scenarios, explain how well the TLB would perform with a direct-map organization compared to the original fully associative organization.
(i) Only 1 copy of P is running in the system.
(ii) 2 users are both running P at the same time. The OS flushes the TLB when swapping between processes.
(iii) 2 users are both running P at the same time. The OS does not flush the TLB. Instead, the process ID is prepended to the virtual page number, i.e., the VPN tag becomes [process ID | VPN].
(iv) 2 users are both running P at the same time. The OS does not flush the TLB. Instead, the process ID is appended to the virtual page number, i.e., the VPN tag becomes [VPN | process ID].
(i) Both the instruction page and the data page map to index 0. In a fully associative TLB, they occupy different entries, so both page translations stay in the TLB throughout the lifetime of the process. In a direct-map TLB, however, the two collide at index 0 and keep evicting each other, causing repeated TLB misses at run time.
(ii) As the TLB is flushed when swapping processes, each process faces the same situation as in (i), starting from a cold TLB.
(iii) With the process ID prepended, a fully associative TLB holds all 4 pages. With direct map, the index still comes from the low VPN bits, so the pages from both processes, as well as the instruction and data pages, all collide.
(iv) With the process ID appended, pages from different processes can map to different locations in a direct-map TLB. However, the instruction and data pages of the same process still collide with each other.
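Scenarios (iii) and (iv) hinge on which bits of the lookup key become the direct-map index. A minimal sketch in which every size and VPN is hypothetical (the text gives only truncated addresses and no TLB size): a 16-entry direct-map TLB indexed by the low bits of the key, a 4-bit process ID borrowed from A.3, and instruction and data VPNs chosen so that both map to index 0, as in (i).

```python
ENTRIES = 16        # HYPOTHETICAL direct-map TLB size (index = low 4 bits of the key)
PID_BITS = 4        # HYPOTHETICAL process-ID width (borrowed from A.3)
VPN_BITS = 20       # HYPOTHETICAL VPN width

INSTR_VPN = 0x00040   # HYPOTHETICAL instruction-page VPN
DATA_VPN = 0xD0000    # HYPOTHETICAL data-page VPN


def tlb_index(vpn, pid, mode):
    """Direct-map TLB index for a lookup key built according to `mode`."""
    if mode == "prepend":        # key = [process ID | VPN]
        key = (pid << VPN_BITS) | vpn
    elif mode == "append":       # key = [VPN | process ID]
        key = (vpn << PID_BITS) | pid
    else:                        # "flush": no process ID in the key
        key = vpn
    return key % ENTRIES


if __name__ == "__main__":
    for mode in ("flush", "prepend", "append"):
        slots = {(pid, page): tlb_index(vpn, pid, mode)
                 for pid in (1, 2)
                 for page, vpn in (("instr", INSTR_VPN), ("data", DATA_VPN))}
        print(mode, slots)
```

Prepended, the process ID sits above the index bits and changes nothing, so all four translations share entry 0. Appended, it falls into the index and separates the two processes, while the instruction and data pages of one process still share a slot. In this toy configuration the index happens to consist entirely of process-ID bits; with a larger TLB, some low VPN bits would also contribute.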
Homework 3 (r1.1) Due: Part (A) -- Apr 29, 2016, 11:55pm Part (B) -- Apr 29, 2016, 11:55pm Part (C) -- Apr 29, 2016, 11:55pm
More informationWrite only as much as necessary. Be brief!
1 CIS371 Computer Organization and Design Final Exam Prof. Martin Wednesday, May 2nd, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached (with
More informationCS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.
CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in
More informationVirtual Memory. Stefanos Kaxiras. Credits: Some material and/or diagrams adapted from Hennessy & Patterson, Hill, online sources.
Virtual Memory Stefanos Kaxiras Credits: Some material and/or diagrams adapted from Hennessy & Patterson, Hill, online sources. Caches Review & Intro Intended to make the slow main memory look fast by
More informationCSE 351. Virtual Memory
CSE 351 Virtual Memory Virtual Memory Very powerful layer of indirection on top of physical memory addressing We never actually use physical addresses when writing programs Every address, pointer, etc
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More informationLogical Diagram of a Set-associative Cache Accessing a Cache
Introduction Memory Hierarchy Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year Levels of memory with different sizes & speeds close to
More informationCS/CoE 1541 Mid Term Exam (Fall 2018).
CS/CoE 1541 Mid Term Exam (Fall 2018). Name: Question 1: (6+3+3+4+4=20 points) For this question, refer to the following pipeline architecture. a) Consider the execution of the following code (5 instructions)
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches, Set Associative Caches, Cache Performance
CS 6C: Great Ideas in Computer Architecture Direct Mapped Caches, Set Associative Caches, Cache Performance Instructor: Justin Hsia 7//23 Summer 23 Lecture # Great Idea #3: Principle of Locality/ Memory
More informationCS 61C: Great Ideas in Computer Architecture Caches Part 2
CS 61C: Great Ideas in Computer Architecture Caches Part 2 Instructors: Nicholas Weaver & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/fa15 Software Parallel Requests Assigned to computer eg,
More informationAdministrivia. Caches III. Making memory accesses fast! Associativity. Cache Organization (3) Example Placement
s III CSE Autumn Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam ehman Sam Wolfson Savanna Yee Vinny Palaniappan Administrivia Midterm regrade requests
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache performance 4 Cache
More informationc. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?
Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined
More informationECE 411 Exam 1. This exam has 5 problems. Make sure you have a complete exam before you begin.
This exam has 5 problems. Make sure you have a complete exam before you begin. Write your name on every page in case pages become separated during grading. You will have three hours to complete this exam.
More informationVirtual Memory. Motivation:
Virtual Memory Motivation:! Each process would like to see its own, full, address space! Clearly impossible to provide full physical memory for all processes! Processes may define a large address space
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationThe levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms
The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested
More informationAlexandria University
Alexandria University Faculty of Engineering Division of Communications & Electronics CC322 Computer Architecture Sheet 3 1. A cache has the following parameters: b, block size given in numbers of words;
More information198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14
98:23 Intro to Computer Organization Lecture 4 Virtual Memory 98:23 Introduction to Computer Organization Lecture 4 Instructor: Nicole Hynes nicole.hynes@rutgers.edu Credits: Several slides courtesy of
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 208 Lecture 23 LAST TIME: VIRTUAL MEMORY Began to focus on how to virtualize memory Instead of directly addressing physical memory, introduce a level of indirection
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/16/17 Fall 2017 - Lecture #15 1 Outline
More informationFinal Exam Fall 2008
COE 308 Computer Architecture Final Exam Fall 2008 page 1 of 8 Saturday, February 7, 2009 7:30 10:00 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of
More informationECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU
More information10/16/17. Outline. Outline. Typical Memory Hierarchy. Adding Cache to Computer. Key Cache Concepts
// CS C: Great Ideas in Computer Architecture (Machine Structures) s Part Instructors: Krste Asanović & Randy H Katz http://insteecsberkeleyedu/~csc/ Organization and Principles Write Back vs Write Through
More informationECE 30, Lab #8 Spring 2014
ECE 30, Lab #8 Spring 20 Shown above is a multi-cycle CPU. There are six special registers in this datapath: PC, IR, MDR, A, B, and ALUOut. Of these, PC and IR are enabled to change when PCWr and IRWr
More informationThe University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011
1. Performance Principles [5 pts] The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011 For each of the following comparisons,
More informationPortland State University ECE 587/687. Caches and Memory-Level Parallelism
Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each
More informationexercise 4 byte blocks, 4 sets index valid tag value 00
Caching (part 2) 1 exercise 3 address (hex) result 00000000 (00) 00000001 (01) 01100011 (63) 01100001 (61) 01100010 (62) 00000000 (00) 01100100 (64) 4 byte blocks, 4 sets index valid tag value 00 01 10
More informationHY225 Lecture 12: DRAM and Virtual Memory
HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access
More informationCS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page.
CS 433 Homework 4 Assigned on 10/17/2017 Due in class on 11/7/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.
More information