ADMIN. SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7) Down the home stretch. Split Caches. Final Exam Monday May 1 (first exam day)

Similar documents
Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

Virtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James

Lecture 11 Cache. Peng Liu.

Caching Basics. Memory Hierarchies

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan

Pipelined processors and Hazards

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

CPSC 330 Computer Organization

CS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page.

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Pollard s Attempt to Explain Cache Memory

Modern Computer Architecture

lecture 18 cache 2 TLB miss TLB - TLB (hit and miss) - instruction or data cache - cache (hit and miss)

Virtual Memory. Main memory is a CACHE for disk Advantages: illusion of having more physical memory program relocation protection.

1. Creates the illusion of an address space much larger than the physical memory

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668

LECTURE 12. Virtual Memory

CS422 Computer Architecture

Agenda. CS 61C: Great Ideas in Computer Architecture. Virtual Memory II. Goals of Virtual Memory. Memory Hierarchy Requirements

Virtual to physical address translation

Lec 11 How to improve cache performance

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Chapter Seven Morgan Kaufmann Publishers

EE 4683/5683: COMPUTER ARCHITECTURE

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

Topic 18: Virtual Memory

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Improving Memory Access The Cache and Virtual Memory

Virtual Memory - Objectives

CS 61C: Great Ideas in Computer Architecture. Virtual Memory

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

Page 1. Memory Hierarchies (Part 2)

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

ECE331: Hardware Organization and Design

Virtual Memory Virtual memory first used to relive programmers from the burden of managing overlays.

Lecture 21: Virtual Memory. Spring 2018 Jason Tang

CISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

Question 13 1: (Solution, p 4) Describe the inputs and outputs of a (1-way) demultiplexer, and how they relate.

ECE468 Computer Organization and Architecture. Virtual Memory

Improving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs

ECE4680 Computer Organization and Architecture. Virtual Memory

V. Primary & Secondary Memory!

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

Memory Hierarchy Requirements. Three Advantages of Virtual Memory

Main Memory (Fig. 7.13) Main Memory

Virtual Memory: From Address Translation to Demand Paging

Virtual Memory. Virtual Memory

Lecture 11 Cache and Virtual Memory

Page 1. Multilevel Memories (Improving performance using a little cash )

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 2)

CSE 141 Spring 2016 Homework 5 PID: Name: 1. Consider the following matrix transpose code int i, j,k; double *A, *B, *C; A = (double

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

CPE 631 Lecture 04: CPU Caches

DECstation 5000 Miss Rates. Cache Performance Measures. Example. Cache Performance Improvements. Types of Cache Misses. Cache Performance Equations

ECE331: Hardware Organization and Design

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

CS61C : Machine Structures

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

Computer Architecture. Lecture 8: Virtual Memory

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Computer Architecture Spring 2016

Advanced Memory Organizations

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

Chapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neuman Architecture

Topics. Digital Systems Architecture EECE EECE Need More Cache?

CS61C Review of Cache/VM/TLB. Lecture 26. April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson)

CPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

Page 1. The Problem. Caches. Balancing Act. The Solution. Widening memory gap. As always. Deepen memory hierarchy

Virtual Memory, Address Translation

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

CPS 104 Computer Organization and Programming Lecture 20: Virtual Memory

Virtual Memory Review. Page faults. Paging system summary (so far)

Virtual Memory. CS 3410 Computer System Organization & Programming

Handout 4 Memory Hierarchy

Memory Hierarchies 2009 DAT105

Topic 18 (updated): Virtual Memory

Virtual Memory: From Address Translation to Demand Paging

Virtual Memory. User memory model so far:! In reality they share the same memory space! Separate Instruction and Data memory!!

New-School Machine Structures. Overarching Theme for Today. Agenda. Review: Memory Management. The Problem 8/1/2011

Memory Hierarchy. Mehran Rezaei

Transcription:

ADMIN SI232 Set #8: Caching Finale and Virtual Reality (Chapter 7) Ethics Discussion & Reading Quiz Wed April 2 Reading posted online Reading finish Chapter 7 Sections 7.4 (skip 53-536), 7.5, 7.7, 7.8 2 Down the home stretch Split Caches Monday Wednesday Friday Instructions and data have different properties May benefit from different cache organizations (block size, assoc ) 2-Apr Review Exam 9-Apr Ethics Discussion. Reading Quiz. Virtual. I/O ICache (L) DCache (L) L 6-Apr I/O. Pipelining. HW (Ch 7) Due. Pipelining, hazards. L2 Cache L2 Cache 23-Apr ILP and multiple issue. Course paper due. Improving multiple issue Last class. Advanced topics/review. Main memory Main memory Final Exam Monday May (first exam day) Why else might we want to do this? 3 4

Cache Performance Performance Example # Unified Cache Simplified model: execution time = (execution cycles + stall cycles) cycle time = exectime + stalltime stall cycles = (or) = Two typical ways of improving performance: decreasing the miss rate decreasing the miss penalty What happens if we increase block size? Accesses MissRate MissPenalty Pr ogram Instructions Misses MissPenalty Pr ogram Instruction Suppose processor has a CPI of.5 given a perfect cache. If there are.2 memory accesses per instruction, a miss penalty of 2 cycles, and a miss rate of %, what is the effective CPI with the real cache? Add associativity? 5 6 Performance Example #2 Split Cache Exercise # Suppose processor has a CPI of. given a perfect cache. If the instruction cache miss rate is 3% and the data cache miss rate is %, what is the effective CPI with the real cache? Assume a miss penalty of cycles and that 4% of instructions access data. Suppose processor has a CPI of 2. given a perfect cache. If there are.5 memory accesses per instruction, a miss penalty of 4 cycles, and a miss rate of 5%, what is the effective CPI with the real cache? 7 8

Exercise #2 Exercise #3 You are given a processor with a 64KB, direct-mapped instruction cache and a 64 KB, 4-way associative data cache. For a certain program, the instruction cache miss rate is 4% and the data cache miss rate is 5%. The miss penalty is cycles for the I-cache and 2 cycles for the D-cache. If the CPI is.5 with a perfect cache, and 3% of instructions access data, what is the effective CPI? Suppose a processor has a base CPI of. (no cache misses) but currently an effective of CPI of 2. once misses are considered. There are.5 memory accesses per instruction. If the processor has a unified cache with a miss rate of 2%, how low must the miss penalty be in order to improve the effective CPI to.3? 9 Exercise #4 Stretch Cache Complexities A certain processor has a CPI of. with a perfect cache and a CPI of.2 when memory stalls (due to misses) are included. We wish to speed up the performance of this processor by 2x, which we will do by increasing the clock rate. This, however, will not improve the memory system, so misses will take just as long in absolute terms. How much faster must the clock rate be in to meet our goal? t always easy to understand implications of caches: 2 2 Radix sort Radix sort 6 8 2 6 8 4 2 Quicksort 4 Quicksort 4 8 6 32 64 28 256 52 24 248 496 4 8 6 32 64 28 256 52 24 248 496 Size (K items to sort) Size (K items to sort) Theoretical behavior of Radix sort vs. Quicksort Observed behavior of Radix sort vs. Quicksort 2

Cache Complexities Program Design for Caches Example Here is why: 5 4 3 2 Radix sort Option # for (j = ; j < 2; j++) for (i = ; i < 2; i++) x[i][j] = x[i][j] + ; Quicksort 4 8 6 32 64 28 256 52 24 248 496 Size (K items to sort) system performance is often critical factor multilevel caches, pipelined processors, make it harder to predict outcomes Compiler optimizations to increase locality sometimes hurt ILP Option #2 for (i = ; i < 2; i++) for (j = ; j < 2; j++) x[i][j] = x[i][j] + ; Difficult to predict best algorithm: need experimental data 3 4 Program Design for Caches Example 2 Why might this code be problematic? int A[24][24]; int B[24][24]; for (i = ; i < 24; i++) for (j = ; j < 24; j++) A[i][j] += B[i][j]; VIRTUAL MEMORY How to fix it? 5 6

Virtual memory summary (part ) Virtual memory summary (part 2) Data access without virtual memory: 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number address Data access with virtual memory: 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number Translation all problems in Computer Science can be solved by another level of indirection -- Butler Lampson Translation 29 28 27 5 4 3 2 9 8 3 2 29 28 27 5 4 3 2 9 8 3 2 Physical page number Physical page number Physical address Cache Cache Disk 7 Disk 8 Virtual Address Translation Main memory can act as a cache for the secondary storage (disk) es Physical addresses Address translation Terminology: Cache block Cache miss Cache tag Byte offset 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number Disk addresses Advantages: Illusion of having more physical memory Program relocation Protection te that main point is caching of disk in main memory but will affect all our memory references! 9 Translation 29 28 27 5 4 3 2 9 8 3 2 Physical page number Physical address 2

Pages: virtual memory blocks Page Tables Page faults: the data is not in memory, retrieve it from disk huge miss penalty (slow disk), thus pages should be fairly Virtual page number Page table Physical page or Valid disk address Physical memory Replacement strategy: can handle the faults in software instead of hardware Writeback or write-through? Disk storage 2 22 Example Address Translation Part Example Address Translation Part 2 Our virtual memory system has: 32 bit virtual addresses 28 bit physical addresses 496 byte page sizes How to split a virtual address? Virtual page # What will the physical address look like? Physical page # Translate the following addresses:. C56 2. C623 3. C245 C C C2 C3 C4 C5 C6 Valid? Page Table Physical Page or Disk Block # A24 A2 FB 83 729 56 F5C How many entries in the page table? 23 24

Exercise # Exercise #2 (new problem not related to #) Given system with 2 bit virtual addresses 6 bit physical addresses 256 byte page sizes How to split a virtual address? Virtual page # What will the physical address look like? Physical page # Translate the following addresses:. B489 2. B223 3. B6 B B B2 B3 B4 B5 B6 Valid? Page Table Physical Page or Disk Block # B4 A2 AB 83 759 58 F4C How many entries in the page table? 25 26 Exercise #3 Exercise #4 Given the fragment of a page table on the right, answer the following questions assuming a page size of 24 bytes B. What is the virtual address size (# bits) 2. What is the physical address size (# bits) 3. Number of entries in page table? B B2 B3 B4 B5 B6 Valid? Page Table Physical Page # B A AB 8 9 58 F4 Is it possible to have the physical address be wider (more bits) than the virtual address? If so, when would this ever make sense? 27 28

Making Address Translation Fast Protection and Address Spaces A cache for address translations: translation lookaside buffer TLB Virtual page Physical page number Valid Dirty Ref Tag address Physical memory Every program has its own address space Program A s address xc 2 not same as program B s OS maps every virtual address to distinct physical addresses How do we make this work? Page tables Page table Typical values: Valid Dirty Ref Physical page or disk address 6-52 entries, miss-rate:.% - % miss-penalty: cycles Disk storage 29 TLB Can program A access data from program B?, if. OS can map different virtual page # s to same physical page # s So A s xc 2 = B s xb32 2 2. Program A has read or write access to the page 3. OS uses supervisor/kernel protection to prevent user programs from modifying page table/tlb 3 Integrating Virtual, TLBs, and Caches TLBs and Caches TLB access (Figure 7.25) What happens after translation? 3 3 29 28 27 5 4 3 2 9 8 3 2 TLB miss exception TLB hit? Physical address Virtual page number Write? Try to read data from cache Write access bit on? Translation Cache miss stall while read block Cache hit? Write protection exception Deliver data to the CPU Cache miss stall while read block Try to write data to cache Cache hit? 29 28 27 5 4 3 2 9 8 3 2 Physical page number Write data into cache, update the dirty bit, and put the data and the address into the write buffer 3 Cache 32

Modern Systems Some Issues Things are getting complicated! Processor speeds continue to increase very fast much faster than either DRAM or disk access times,,, Performance CPU Design challenge: dealing with this growing disparity Prefetching? 3 rd level caches and more? design? Year 33 34