ADMIN SI232 Set #8: Caching Finale and Virtual Reality (Chapter 7) Ethics Discussion & Reading Quiz Wed April 2 Reading posted online Reading finish Chapter 7 Sections 7.4 (skip 53-536), 7.5, 7.7, 7.8 2 Down the home stretch Split Caches Monday Wednesday Friday Instructions and data have different properties May benefit from different cache organizations (block size, assoc ) 2-Apr Review Exam 9-Apr Ethics Discussion. Reading Quiz. Virtual. I/O ICache (L) DCache (L) L 6-Apr I/O. Pipelining. HW (Ch 7) Due. Pipelining, hazards. L2 Cache L2 Cache 23-Apr ILP and multiple issue. Course paper due. Improving multiple issue Last class. Advanced topics/review. Main memory Main memory Final Exam Monday May (first exam day) Why else might we want to do this? 3 4
Cache Performance Performance Example # Unified Cache Simplified model: execution time = (execution cycles + stall cycles) cycle time = exectime + stalltime stall cycles = (or) = Two typical ways of improving performance: decreasing the miss rate decreasing the miss penalty What happens if we increase block size? Accesses MissRate MissPenalty Pr ogram Instructions Misses MissPenalty Pr ogram Instruction Suppose processor has a CPI of.5 given a perfect cache. If there are.2 memory accesses per instruction, a miss penalty of 2 cycles, and a miss rate of %, what is the effective CPI with the real cache? Add associativity? 5 6 Performance Example #2 Split Cache Exercise # Suppose processor has a CPI of. given a perfect cache. If the instruction cache miss rate is 3% and the data cache miss rate is %, what is the effective CPI with the real cache? Assume a miss penalty of cycles and that 4% of instructions access data. Suppose processor has a CPI of 2. given a perfect cache. If there are.5 memory accesses per instruction, a miss penalty of 4 cycles, and a miss rate of 5%, what is the effective CPI with the real cache? 7 8
Exercise #2 Exercise #3 You are given a processor with a 64KB, direct-mapped instruction cache and a 64 KB, 4-way associative data cache. For a certain program, the instruction cache miss rate is 4% and the data cache miss rate is 5%. The miss penalty is cycles for the I-cache and 2 cycles for the D-cache. If the CPI is.5 with a perfect cache, and 3% of instructions access data, what is the effective CPI? Suppose a processor has a base CPI of. (no cache misses) but currently an effective of CPI of 2. once misses are considered. There are.5 memory accesses per instruction. If the processor has a unified cache with a miss rate of 2%, how low must the miss penalty be in order to improve the effective CPI to.3? 9 Exercise #4 Stretch Cache Complexities A certain processor has a CPI of. with a perfect cache and a CPI of.2 when memory stalls (due to misses) are included. We wish to speed up the performance of this processor by 2x, which we will do by increasing the clock rate. This, however, will not improve the memory system, so misses will take just as long in absolute terms. How much faster must the clock rate be in to meet our goal? t always easy to understand implications of caches: 2 2 Radix sort Radix sort 6 8 2 6 8 4 2 Quicksort 4 Quicksort 4 8 6 32 64 28 256 52 24 248 496 4 8 6 32 64 28 256 52 24 248 496 Size (K items to sort) Size (K items to sort) Theoretical behavior of Radix sort vs. Quicksort Observed behavior of Radix sort vs. Quicksort 2
Cache Complexities Program Design for Caches Example Here is why: 5 4 3 2 Radix sort Option # for (j = ; j < 2; j++) for (i = ; i < 2; i++) x[i][j] = x[i][j] + ; Quicksort 4 8 6 32 64 28 256 52 24 248 496 Size (K items to sort) system performance is often critical factor multilevel caches, pipelined processors, make it harder to predict outcomes Compiler optimizations to increase locality sometimes hurt ILP Option #2 for (i = ; i < 2; i++) for (j = ; j < 2; j++) x[i][j] = x[i][j] + ; Difficult to predict best algorithm: need experimental data 3 4 Program Design for Caches Example 2 Why might this code be problematic? int A[24][24]; int B[24][24]; for (i = ; i < 24; i++) for (j = ; j < 24; j++) A[i][j] += B[i][j]; VIRTUAL MEMORY How to fix it? 5 6
Virtual memory summary (part ) Virtual memory summary (part 2) Data access without virtual memory: 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number address Data access with virtual memory: 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number Translation all problems in Computer Science can be solved by another level of indirection -- Butler Lampson Translation 29 28 27 5 4 3 2 9 8 3 2 29 28 27 5 4 3 2 9 8 3 2 Physical page number Physical page number Physical address Cache Cache Disk 7 Disk 8 Virtual Address Translation Main memory can act as a cache for the secondary storage (disk) es Physical addresses Address translation Terminology: Cache block Cache miss Cache tag Byte offset 3 3 29 28 27 5 4 3 2 9 8 3 2 Virtual page number Disk addresses Advantages: Illusion of having more physical memory Program relocation Protection te that main point is caching of disk in main memory but will affect all our memory references! 9 Translation 29 28 27 5 4 3 2 9 8 3 2 Physical page number Physical address 2
Pages: virtual memory blocks Page Tables Page faults: the data is not in memory, retrieve it from disk huge miss penalty (slow disk), thus pages should be fairly Virtual page number Page table Physical page or Valid disk address Physical memory Replacement strategy: can handle the faults in software instead of hardware Writeback or write-through? Disk storage 2 22 Example Address Translation Part Example Address Translation Part 2 Our virtual memory system has: 32 bit virtual addresses 28 bit physical addresses 496 byte page sizes How to split a virtual address? Virtual page # What will the physical address look like? Physical page # Translate the following addresses:. C56 2. C623 3. C245 C C C2 C3 C4 C5 C6 Valid? Page Table Physical Page or Disk Block # A24 A2 FB 83 729 56 F5C How many entries in the page table? 23 24
Exercise # Exercise #2 (new problem not related to #) Given system with 2 bit virtual addresses 6 bit physical addresses 256 byte page sizes How to split a virtual address? Virtual page # What will the physical address look like? Physical page # Translate the following addresses:. B489 2. B223 3. B6 B B B2 B3 B4 B5 B6 Valid? Page Table Physical Page or Disk Block # B4 A2 AB 83 759 58 F4C How many entries in the page table? 25 26 Exercise #3 Exercise #4 Given the fragment of a page table on the right, answer the following questions assuming a page size of 24 bytes B. What is the virtual address size (# bits) 2. What is the physical address size (# bits) 3. Number of entries in page table? B B2 B3 B4 B5 B6 Valid? Page Table Physical Page # B A AB 8 9 58 F4 Is it possible to have the physical address be wider (more bits) than the virtual address? If so, when would this ever make sense? 27 28
Making Address Translation Fast Protection and Address Spaces A cache for address translations: translation lookaside buffer TLB Virtual page Physical page number Valid Dirty Ref Tag address Physical memory Every program has its own address space Program A s address xc 2 not same as program B s OS maps every virtual address to distinct physical addresses How do we make this work? Page tables Page table Typical values: Valid Dirty Ref Physical page or disk address 6-52 entries, miss-rate:.% - % miss-penalty: cycles Disk storage 29 TLB Can program A access data from program B?, if. OS can map different virtual page # s to same physical page # s So A s xc 2 = B s xb32 2 2. Program A has read or write access to the page 3. OS uses supervisor/kernel protection to prevent user programs from modifying page table/tlb 3 Integrating Virtual, TLBs, and Caches TLBs and Caches TLB access (Figure 7.25) What happens after translation? 3 3 29 28 27 5 4 3 2 9 8 3 2 TLB miss exception TLB hit? Physical address Virtual page number Write? Try to read data from cache Write access bit on? Translation Cache miss stall while read block Cache hit? Write protection exception Deliver data to the CPU Cache miss stall while read block Try to write data to cache Cache hit? 29 28 27 5 4 3 2 9 8 3 2 Physical page number Write data into cache, update the dirty bit, and put the data and the address into the write buffer 3 Cache 32
Modern Systems Some Issues Things are getting complicated! Processor speeds continue to increase very fast much faster than either DRAM or disk access times,,, Performance CPU Design challenge: dealing with this growing disparity Prefetching? 3 rd level caches and more? design? Year 33 34