Cache Performance (H&P 5.3; 5.5; 5.6)
- Gervais Booth
1 Cache Performance (H&P 5.3; 5.5; 5.6)
Memory system and processor performance:
- CPU time = IC x CPI x Clock time (CPU performance eqn.)
- CPI = CPI_ld/st x (IC_ld/st / IC) + CPI_others x (IC_others / IC)
- CPI_ld/st = Pipeline time + Average memory access time
- Avg. mem. time = Hit time + Miss rate x Miss penalty (Memory performance eqn.)
Improving memory hierarchy performance:
- Decrease hit time
- Decrease miss rate
- Decrease miss penalty
Inf3 Computer Architecture
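
The equations above can be evaluated directly. The following is a minimal sketch in C, assuming all times are expressed in clock cycles; the function names and the example numbers in the usage note are illustrative and do not come from the slides.

```c
#include <assert.h>

/* Avg. mem. time = Hit time + Miss rate x Miss penalty (all in cycles). */
double avg_mem_time(double hit_time, double miss_rate, double miss_penalty) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}

/* CPI_ld/st = Pipeline time + Average memory access time. */
double cpi_load_store(double pipeline_time, double amat) {
    return pipeline_time + amat;
}

/* CPI = CPI_ld/st x (IC_ld/st / IC) + CPI_others x (IC_others / IC),
   where frac_ldst is IC_ld/st / IC and the remaining instructions are "others". */
double overall_cpi(double cpi_ldst, double frac_ldst, double cpi_others) {
    assert(frac_ldst >= 0.0 && frac_ldst <= 1.0);
    return cpi_ldst * frac_ldst + cpi_others * (1.0 - frac_ldst);
}
```

For instance, with a hypothetical 1-cycle hit time, a 6.25% miss rate and a 64-cycle miss penalty, `avg_mem_time(1.0, 0.0625, 64.0)` gives 5 cycles, which shows how strongly the miss term can dominate the hit time.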
2 Reducing Cache Miss Rates
Cache miss classification: the three Cs
- Compulsory misses (or cold misses): when a block is accessed for the first time
- Capacity misses: when a block is not in the cache because it was evicted because the cache was full
- Conflict misses: when a block is not in the cache because it was evicted because the cache set was full
3 Cache Misses vs. Cache Size
[Figure (H&P Fig): miss rate vs. cache size (4KB to 512KB), split into conflict, capacity and cold misses; two panels, direct mapped and set associative]
- Miss rates are very small in practice
- Miss rates decrease significantly with cache size
- Miss rates decrease with set-associativity because of the reduction in conflict misses
4 Reducing Cold Miss Rates
Technique 1: Large block size
- Principle of locality: other data in the block are likely to be used soon
- Reduces cold miss rate
- May increase conflict and capacity miss rate for the same cache size (fewer blocks in cache)
- Increases miss penalty because more data has to be brought in each time
- Uses more memory bandwidth
5 Cache Misses vs. Block Size
[Figure (H&P Fig): miss rate vs. block size (up to 256B), one curve per cache size]
- Small caches are very sensitive to block size
- In all cases very large blocks (> 128B) have worse miss rate
6 Reducing Cold Miss Rates
Technique 2: Prefetching
- Idea: bring into the cache (or a special buffer), ahead of time, data or instructions that are likely to be used soon
- Reduces cold misses
- Uses more memory bandwidth
- May increase conflict and capacity miss rates (unless a prefetch buffer is used)
- Does not increase miss penalty (prefetch is handled after the main cache access is completed)
7 Prefetching
Hardware prefetching: hardware automatically prefetches cache blocks on a cache miss
- No need for extra prefetching instructions in the program
- Effective for regular accesses, such as instructions
- E.g., next-block prefetching, stride prefetching
8 Prefetching
Software prefetching: the compiler inserts instructions at proper places in the code to prefetch
- Requires new instructions in the instruction set for prefetching (nonbinding prefetch)
- Adds instructions to compute the prefetching addresses and to perform the prefetch itself (prefetch overhead)
- E.g., data prefetching in loops, linked-list prefetching
9 Software Prefetching
E.g., prefetching in loops: brings the next required block, two iterations ahead of time (assuming each element of x is 4 bytes long and the block has 64 bytes).
Original loop:
    for (i=0; i<=999; i++) {
      x[i] = x[i] + s;
    }
With prefetching:
    for (i=0; i<=999; i++) {
      if (i%16 == 14)
        prefetch(x[i+16]);
      x[i] = x[i] + s;
    }
E.g., linked-list prefetching: brings the next object in the list.
Original loop:
    while (student) {
      student->mark = rand();
      student = student->next;
    }
With prefetching:
    while (student) {
      prefetch(student->next);
      student->mark = rand();
      student = student->next;
    }
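
As a concrete variant of the loop example, the sketch below uses `__builtin_prefetch`, a compiler builtin available in GCC and Clang, in place of the slide's abstract `prefetch()`. It is a sketch under stated assumptions: 64-byte blocks and 4-byte `int` elements as on the slide, and the function name `add_scalar_prefetch` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Elements per cache block, assuming 64-byte blocks and 4-byte elements. */
enum { ELEMS_PER_BLOCK = 16 };

/* Adds s to every element of x, issuing a prefetch for the next block
   two iterations before the loop reaches it. */
void add_scalar_prefetch(int *x, size_t n, int s) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i % ELEMS_PER_BLOCK == ELEMS_PER_BLOCK - 2 && i + ELEMS_PER_BLOCK < n) {
            /* Non-binding hint: second argument 1 = prefetch for write,
               third argument 3 = high temporal locality. */
            __builtin_prefetch(&x[i + ELEMS_PER_BLOCK], 1, 3);
        }
        x[i] = x[i] + s;
    }
}
```

The extra bounds check keeps the prefetch address inside the array; it is part of the prefetch overhead mentioned on the previous slide. The result of the loop is the same with or without the prefetch, since the prefetch is only a hint.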
10 Reducing Conflict Miss Rates
Technique 3: High associativity caches
- More options for block placement: fewer conflicts
- Reduces conflict miss rate
- May increase hit access time because tag match takes longer
- May increase miss penalty because replacement policy is more involved
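
The effect of associativity on conflict misses can be illustrated with a small trace-driven model. This is a minimal sketch, not a full cache simulator: it assumes the trace holds block addresses (byte address divided by block size), uses LRU replacement within each set, and caps the cache at a hypothetical `MAX_BLOCKS` blocks.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_BLOCKS = 64 };  /* illustrative upper bound on cache size in blocks */

/* Counts misses for a trace of block addresses in a cache of num_blocks blocks
   organised as num_blocks/ways sets, with LRU replacement inside each set. */
size_t count_misses(const unsigned *trace, size_t n, unsigned num_blocks, unsigned ways) {
    assert(num_blocks > 0 && num_blocks <= MAX_BLOCKS);
    assert(ways > 0 && num_blocks % ways == 0);

    unsigned tag[MAX_BLOCKS];
    size_t last_use[MAX_BLOCKS];  /* 0 means the way is still empty */
    for (unsigned b = 0; b < num_blocks; b++) last_use[b] = 0;

    unsigned sets = num_blocks / ways;
    size_t misses = 0;
    for (size_t t = 0; t < n; t++) {
        unsigned base = (trace[t] % sets) * ways;  /* first way of the selected set */
        unsigned victim = base;
        int hit = 0;
        for (unsigned w = base; w < base + ways; w++) {
            if (last_use[w] != 0 && tag[w] == trace[t]) {
                last_use[w] = t + 1;  /* refresh LRU timestamp on a hit */
                hit = 1;
                break;
            }
            if (last_use[w] < last_use[victim]) victim = w;  /* oldest or empty way */
        }
        if (!hit) {
            misses++;
            tag[victim] = trace[t];
            last_use[victim] = t + 1;
        }
    }
    return misses;
}
```

With an 8-block cache and the trace 0, 8, 0, 8, 0, 8, the two blocks map to the same set when the cache is direct mapped and evict each other on every access, whereas a 2-way cache of the same size holds both.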
11 Cache Misses vs. Associativity
[Figure: miss rate vs. associativity (1-way, 2-way, 4-way, fully associative), one curve per cache size from a few KB to 512KB]
- Small caches are very sensitive to associativity
- In all cases more associativity decreases miss rate, but there is little difference between 4-way and fully associative
12 Reducing Conflict Miss Rates
Technique 4: Compiler optimizations
E.g., merging arrays: improves spatial locality if the fields are used together for the same index
Before:
    int val[size];
    int key[size];
After:
    struct merge {
      int val;
      int key;
    };
    struct merge merged_array[size];
E.g., loop fusion: improves temporal locality
Before:
    for (i=0; i<1000; i++)
      A[i] = A[i]+1;
    for (i=0; i<1000; i++)
      B[i] = B[i]+A[i];
After:
    for (i=0; i<1000; i++) {
      A[i] = A[i]+1;
      B[i] = B[i]+A[i];
    }
13 Reducing Conflict Miss Rates
E.g., blocking: change row-major and column-major array distributions to block distribution to improve spatial and temporal locality
Matrix multiplication x = y*z:
    for (i=0; i<5; i++)
      for (j=0; j<5; j++) {
        r = 0;
        for (k=0; k<5; k++)
          r = r + y[i][k]*z[k][j];
        x[i][j] = r;
      }
[Figure: elements of x, y and z touched for (i=0, j=0), (i=0, j=1) and (i=1, j=0), each with 0<=k<5; annotations: "Poor temporal locality", "Poor spatial and temporal locality"]
14 Reducing Conflict Miss Rates
Loop Blocking or Tiling (2x2 tiles; x must be initialised to zero beforehand)
    for (jj = 0; jj < 5; jj = jj+2)
      for (kk = 0; kk < 5; kk = kk+2)
        for (i = 0; i < 5; i++)
          for (j = jj; j < min(jj+2,5); j++) {
            r = 0;
            for (k = kk; k < min(kk+2,5); k++)
              r = r + y[i][k]*z[k][j];
            x[i][j] = x[i][j] + r;
          }
[Figure: elements of x, y and z touched for (jj=0, kk=0, i=0, j=0), (jj=0, kk=0, i=0, j=1) and (jj=0, kk=0, i=1, j=0), each with 0<=k<=1; annotation: "Better temporal locality"]
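
A runnable version of the tiled loop nest can be checked against the plain triple loop. The sketch below assumes 5x5 integer matrices as on the slides and a caller-chosen tile size `b`; it uses the bound `min(jj + b, N)` so that each tile covers `b` columns, and it zeroes `x` first because the tiled loop accumulates partial sums. The function names are illustrative.

```c
#include <assert.h>

enum { N = 5 };  /* matrix dimension used on the slides */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference version: x = y * z with the plain triple loop. */
void matmul_naive(int x[N][N], int y[N][N], int z[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int r = 0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Tiled version: works on b-wide strips of z and b-wide column strips of x. */
void matmul_tiled(int x[N][N], int y[N][N], int z[N][N], int b) {
    assert(b > 0);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0;  /* partial sums are accumulated below */

    for (int jj = 0; jj < N; jj += b)
        for (int kk = 0; kk < N; kk += b)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + b, N); j++) {
                    int r = 0;
                    for (int k = kk; k < min_int(kk + b, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}
```

Both functions compute the same product; only the order in which elements of `y` and `z` are visited changes, which is what improves reuse while a tile is still in the cache.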
15 Cache Performance II
Memory system and processor performance:
- CPU time = IC x CPI x Clock time (CPU performance eqn.)
- Avg. mem. time = Hit time + Miss rate x Miss penalty (Memory performance eqn.)
Improving memory hierarchy performance:
- Decrease hit time
- Decrease miss rate
- Decrease miss penalty
16 Reducing Cache Miss Penalty
Technique 1: Victim caches (can also be considered a way to reduce miss rate)
- Very small cache used to capture lines evicted from the cache
- In case of a cache miss the data may be found quickly in the victim cache (cache hit time < VC hit time < cache miss time)
- Replacement policy is much more involved
[Figure: CPU sends the memory address to the L1 cache and, on a miss, to the victim cache (each with tag and data) before going to main memory]
17 Reducing Cache Miss Penalty
Technique 2: giving priority to reads over writes
- The value of a read (load instruction) is likely to be used soon, while a write does not affect the processor
- Idea: place write misses in a write buffer, and let read misses overtake writes
- Reads to the same memory address as a pending write in the buffer now become hits in the buffer:
    sw 512(r0), r3    1. write miss goes into the write buffer
    lw r2, 512(r0)    2. read hits in the write buffer and gets the value from the previous write
[Figure: write buffer holding one entry, memory address 512 with value R[r3]]
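
The read-after-write check against the write buffer can be sketched as follows. This is an illustrative model only: the buffer size, the type names and the functions `wb_push` and `wb_lookup` are assumptions, and draining entries to memory is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WB_ENTRIES = 4 };  /* illustrative buffer depth */

typedef struct {
    uint32_t addr;
    uint32_t value;
} wb_entry;

typedef struct {
    wb_entry e[WB_ENTRIES];
    int count;  /* number of pending writes, oldest first */
} write_buffer;

/* Queues a pending write. Returns 0 if the buffer is full,
   in which case a real processor would have to stall. */
int wb_push(write_buffer *wb, uint32_t addr, uint32_t value) {
    assert(wb != NULL);
    if (wb->count == WB_ENTRIES) return 0;
    wb->e[wb->count].addr = addr;
    wb->e[wb->count].value = value;
    wb->count++;
    return 1;
}

/* Checks a read address against the pending writes, newest first.
   Returns 1 and stores the forwarded value on a buffer hit, 0 otherwise. */
int wb_lookup(const write_buffer *wb, uint32_t addr, uint32_t *value) {
    assert(wb != NULL && value != NULL);
    for (int i = wb->count - 1; i >= 0; i--) {
        if (wb->e[i].addr == addr) {
            *value = wb->e[i].value;
            return 1;
        }
    }
    return 0;
}
```

Searching newest first matters when two pending writes target the same address: the read must see the later value. A read that misses in the buffer is free to go to memory ahead of the queued writes.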
18 Reducing Cache Miss Penalty
Technique 3: early restart and critical word first
- On a read miss the processor will need just the loaded word (or byte) very soon, but it has to wait until the whole block is brought into the cache
- Early restart: as soon as the requested word arrives in the cache, send it to the processor and then continue reading the rest of the block into the cache
[Figure: CPU executes lw r2, 3(0) and sends address 0x0003 to the L1 cache (tag, data); the cache block is fetched from main memory]
19 Reducing Cache Miss Penalty
Technique 3: early restart and critical word first
- Critical word first: get the requested word first from memory, send it as soon as possible to the processor and then continue reading the rest of the block into the cache
[Figure: CPU executes lw r2, 3(0) and sends address 0x0003 to the L1 cache (tag, data); word 03 of the cache block arrives first from main memory]
20 Reducing Cache Miss Penalty
Technique 4: non-blocking (or lockup-free) caches
- Dynamic scheduling (Tomasulo's): ALU instructions can overtake a cache miss instruction
- Non-blocking caches: other memory instructions can also overtake a cache miss instruction
- Cache can service multiple hits while waiting on a miss: "hit under miss"
- More aggressive: cache can service multiple hits while waiting on multiple misses: "miss under miss" or "hit under multiple misses"
- Cache and memory must be able to service multiple requests concurrently
- Must keep track of multiple outstanding memory operations
- Increased hardware complexity
21 Non-blocking Caches
[Figure (H&P Fig): benefit of allowing different numbers of outstanding memory operations]
- Significant improvement from a small degree of outstanding memory operations
- Some applications benefit from large degrees
22 Reducing Cache Miss Penalty
Technique 5: second level caches (L2)
- Gap between main memory and L1 cache speeds is increasing
- L2 makes main memory appear to be faster if it captures most of the L1 cache misses
- L1 miss penalty becomes the L2 hit access time if the access hits in L2
- L1 miss penalty is higher if the access misses in L2
L2 considerations:
- Misses will be more frequent
- Higher associativity is possible
- On-chip (512KB - 1MB) or off-chip (1MB - 4MB)
- Multi-cycle access time
23 Second Level Caches
Memory subsystem performance:
- Avg. mem. time = Hit time_L1 + Miss rate_L1 x Miss penalty_L1
- Miss penalty_L1 = Hit time_L2 + Miss rate_L2 x Miss penalty_L2
- Avg. mem. time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
Miss rates:
- Local: the number of misses divided by the number of requests to the cache
  - E.g., Miss rate_L1 and Miss rate_L2 in the equations above
  - Usually not so small for lower level caches
- Global: the number of misses divided by the total number of requests from the CPU
  - E.g., L2 global miss rate = Miss rate_L1 x Miss rate_L2
  - Represents the aggregate effectiveness of the caches combined
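
The two-level equations can be worked through numerically. The sketch below is only an illustration: the function names are hypothetical, times are in cycles, and the miss rates passed in are local miss rates as defined above.

```c
#include <assert.h>

/* Miss penalty_L1 = Hit time_L2 + Miss rate_L2 x Miss penalty_L2 */
double l1_miss_penalty(double hit_l2, double miss_rate_l2, double miss_penalty_l2) {
    assert(miss_rate_l2 >= 0.0 && miss_rate_l2 <= 1.0);
    return hit_l2 + miss_rate_l2 * miss_penalty_l2;
}

/* Avg. mem. time = Hit time_L1 + Miss rate_L1 x Miss penalty_L1 */
double amat_two_level(double hit_l1, double miss_rate_l1,
                      double hit_l2, double miss_rate_l2, double miss_penalty_l2) {
    assert(miss_rate_l1 >= 0.0 && miss_rate_l1 <= 1.0);
    return hit_l1 + miss_rate_l1 * l1_miss_penalty(hit_l2, miss_rate_l2, miss_penalty_l2);
}

/* L2 global miss rate = Miss rate_L1 x Miss rate_L2 (both local). */
double l2_global_miss_rate(double miss_rate_l1, double miss_rate_l2) {
    return miss_rate_l1 * miss_rate_l2;
}
```

With made-up values of a 1-cycle L1 hit, 6.25% L1 miss rate, 8-cycle L2 hit, 25% L2 local miss rate and 128-cycle L2 miss penalty, the L1 miss penalty is 40 cycles and the average memory access time is 3.5 cycles. The L2 local miss rate of 25% looks poor, yet the global miss rate is only about 1.6%, which is why the local rate alone is a misleading measure.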
24 Cache Misses vs. L2 size
[Figure (H&P Fig): miss rate (%) vs. L2 size (up to 4MB), showing the global miss rate and the L2 local miss rate]
- L2 caches must be much bigger than L1
- Local miss rates for L2 are larger than for L1 and are not a good measure of overall performance
25 Reducing Cache Hit Time
Technique 1: small and simple caches
- Small caches can be placed on-chip: signals take a long time to go off-chip
- Low associativity caches have few tags to compare against the requested address
- Direct mapped caches have only one tag to compare, and the comparison can be done in parallel with the fetch of the data
26 Reducing Cache Hit Time
Technique 2: virtual address caches
- Programs use virtual addresses for data, while main memory uses physical addresses: addresses from the processor must be translated at some point
- Option 1: physical address caches: perform address translation before cache access
  - Hit time is increased to accommodate translation
[Figure: CPU issues 0x0003, address translation produces 0x2103, which accesses the L1 cache (tag, data) and then main memory]
27 Reducing Cache Hit Time
Technique 2: virtual address caches
- Option 2: virtual address caches: perform address translation after cache access, if it misses
  - Hit time does not include translation
[Figure: CPU issues 0x0003 directly to the L1 cache (tag, data); on a miss, address translation produces 0x2103 for main memory]
28 Reducing Cache Hit Time
Problems of virtual address caches
- Programs may use the same virtual addresses, but different physical addresses
  - Either the cache contents must be flushed on every context switch (increases miss rate)
  - Or the cache tag must be extended with a process identifier (PID)
- User programs and OS may use different virtual addresses for the same data: aliasing problem
  - Same data structure may end up with two copies in the cache
29 Virtual Memory
- Each process would like to see its own, full, address space
- Clearly impossible to provide full physical memory for all processes
- Processes may define a large address space but use only a small part of it at any one time
- Processes would like their memory to be protected from access and modification by other processes
- The operating system needs to be protected from applications
- Each process has its own Virtual Address Space, divided into fixed-sized pages
- Virtual pages that are in use get mapped to pages of physical memory
- Virtual pages not recently used may be stored on disk
- Extends the memory hierarchy out to the swap partition of a disk
30 Virtual and Physical Memory
Example: 4K page size
- Process 1 has pages A, B, C and D; page B is held on disk
- Process 2 has pages X, Y, Z; page Z is held on disk
- Process 1 cannot access pages X, Y, Z
- Process 2 cannot access pages A, B, C, D
- O/S can access any page (full privileges)
[Figure: virtual memory of process 1 and of process 2 (addresses 0 to 36K) mapped onto physical memory, which holds pages Y, D, A, C and X; pages B and Z are on the swap disk (page swapping)]
31 Sharing memory using Virtual Aliases
- Process 1 and Process 2 want to share a page of memory
- Process 1 maps virtual page A to physical page P
- Process 2 maps virtual page Z to physical page P
- Permissions can vary between the sharing processes
- O/S can still access any page (full privileges)
- Note: Process 1 can also map the same physical page at multiple virtual addresses!
[Figure: virtual memory of process 1 (pages A, B, C) and of process 2 (page Z) mapped onto physical memory; P is the shared page and Q is aliased within one process; swap disk used for page swapping]
32 Typical Virtual Memory Parameters
(H&P Fig)
parameter          | L1 cache     | virtual memory
block/page size    | bytes        | 4KB-64KB
hit time           | 1-3 cycles   | cycles
miss penalty       | cycles       | 1M-10M cycles
  access time      | cycles       | 800K-8M cycles
  transfer time    | 2-20 cycles  | 200K-2M cycles
miss rate          | %            | %
size               | 256KB-1MB    | 64MB-16GB
- A virtual memory miss is called a page fault
- Page size is usually fixed, but some systems use variable size segments
33 Virtual Memory Policies
Block replacement: choosing a page frame to reuse
- Minimize misses (page faults): LRU policy
- Minimize write backs to disk: give priority to non-modified pages
Write strategy: policy adopted on a write
- Write-through would mean writing the cache block back to disk whenever the page is updated in main memory: not practical
- Write-back policy is always used (with a Dirty or modified bit in the page table)
- Some systems use one Dirty bit per block in the page to minimize data written back to disk
Inclusivity:
- Inclusive would mean having a copy of all used pages on disk: too expensive
- Memory and disk are non-inclusive in all systems
34 Virtual Memory Policies
Block placement: location of page in memory
- More freedom in placement gives lower miss rates, but higher hit time and miss penalty
- Memory access time is already high and the memory miss penalty (disk access time) is huge, so low miss rates are essential
- Full associativity: a virtual page can be located in any page frame
- Important to reduce the time to find a page in memory (hit time)
- To place new pages in memory, the OS maintains a list of free frames
- Block placement may be constrained by the use of translated virtual address bits when indexing the cache (see later)
35 Virtual Memory Policies
Block identification: finding the correct page frame
- Assigning tags to memory page frames and comparing tags is impractical
- OS maintains a table that maps all virtual pages to page frames: the Page Table
  - Table is updated with a new mapping every time a virtual page is allocated a page frame
  - Table is accessed on a memory request to translate the virtual to the physical address (inefficient)
  - The number of entries in the table is the number of virtual pages, which is very large (e.g., with 4 Kbyte pages, it has 2^20 = 1M entries for a 32 bit address space and 2^52 entries for a 64 bit address space)
  - Page frame number is used to generate the physical address during address translation
- One Page Table per process
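
The entry counts quoted above follow from subtracting the page-offset bits from the address width. A minimal sketch of that arithmetic, assuming a flat single-level table (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries in a flat page table: one per virtual page,
   i.e. 2^(virtual address bits - page offset bits). */
uint64_t page_table_entries(unsigned va_bits, unsigned page_offset_bits) {
    assert(va_bits >= page_offset_bits);
    assert(va_bits - page_offset_bits < 64);
    return (uint64_t)1 << (va_bits - page_offset_bits);
}
```

A 4 Kbyte page has 12 offset bits, so `page_table_entries(32, 12)` is 2^20 and `page_table_entries(64, 12)` is 2^52, matching the figures on the slide.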
36 Page Tables and Address Translation
- Page table contains a translation for all virtual pages
- One page table for each process, and one for the system
- Each page has specific access permissions (read, write and execute permission)
- A bit indicates if the page is on disk, in which case the Physical Page Number indicates the location within the swap file
- Page table can be very large, so it is often itself stored in virtual memory of the operating system, and large parts may be swapped out
- CPU needs a cache of recently used Page Table Entries (PTEs)
[Figure: the Page Table Address Register locates the page table; the Virtual Page Number of the virtual address selects an entry holding Disk?, Physical Page Number, Permissions and Valid fields (example entry: 0, r-w-x, 1); the Physical Page Number is combined with the Page Offset to form the physical address]
37 Translation Look-aside Buffers
- Typically a small, fully-associative cache of Page Table Entries (PTEs)
- Tag given by the VPN for that PTE
- PPN taken from the PTE
- Valid bit required
- D bit (dirty) indicates whether the page has been modified
- R, W, X bits indicate Read, Write and Execute permission
- Permissions are checked on every memory access
- Physical address formed from PPN and Page Offset
- TLB exceptions:
  - TLB miss (no matching entry)
  - Privilege violation
- Often separate TLBs for Instruction and Data references
[Figure: the Virtual Page Number of the virtual address is compared in parallel against the tag of every TLB entry (fields V, D, R, W, X, Tag, Physical Page Number); on a TLB hit the Physical Page Number is joined with the Page Offset to give the physical address]
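
The lookup described above can be modelled in software as a linear search over the entries. This is a sketch under stated assumptions: 4KB pages (12 offset bits), 32-bit addresses, and hypothetical type and function names; real hardware compares all tags in parallel rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAGE_OFFSET_BITS = 12 };  /* assumed 4KB pages */

typedef struct {
    int valid;     /* V bit */
    int dirty;     /* D bit */
    int r, w, x;   /* Read, Write, Execute permission bits */
    uint32_t vpn;  /* tag: virtual page number */
    uint32_t ppn;  /* physical page number taken from the PTE */
} tlb_entry;

typedef enum { TLB_HIT, TLB_MISS, TLB_PRIVILEGE_VIOLATION } tlb_result;
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE } access_kind;

/* Fully-associative lookup: every valid entry is a candidate for any VPN. */
tlb_result tlb_translate(tlb_entry *tlb, size_t entries, uint32_t vaddr,
                         access_kind kind, uint32_t *paddr) {
    assert(tlb != NULL && paddr != NULL);
    uint32_t vpn = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & (((uint32_t)1 << PAGE_OFFSET_BITS) - 1);

    for (size_t i = 0; i < entries; i++) {
        tlb_entry *e = &tlb[i];
        if (!e->valid || e->vpn != vpn) continue;

        /* Permissions are checked on every access. */
        int allowed = (kind == ACCESS_READ) ? e->r
                    : (kind == ACCESS_WRITE) ? e->w
                    : e->x;
        if (!allowed) return TLB_PRIVILEGE_VIOLATION;

        if (kind == ACCESS_WRITE) e->dirty = 1;  /* page has been modified */
        *paddr = (e->ppn << PAGE_OFFSET_BITS) | offset;
        return TLB_HIT;
    }
    return TLB_MISS;  /* no matching entry: the page table must be consulted */
}
```

On a `TLB_MISS` the caller would load the entry from the page table and retry, which corresponds to the TLB miss exception listed above.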
38 Problems with virtual aliases and caches
Virtually tagged data cache problems:
- Page aliases appear to be at different addresses
- Two copies could exist in the same data cache
- Writing to copy 1 would not be reflected in copy 2
- Reading copy 2 would get stale data
- Does not provide a coherent view of memory
Solution: use physical address tags
- Aliases have the same physical address, therefore the same tag
- Only one copy exists in each cache
Implications for CPU-cache interactions:
- Must translate addresses before the cache tag check
- May still be able to index the cache using non-translated low-order address bits under certain circumstances
39 VI-PT: translating in parallel with L1-$ access
- If translation takes place before L1-$ access, then hit time will increase
- TLB and L1-$ are often arranged to allow parallel TLB and L1-$ access
- Requires that the L1-$ index can be obtained from the non-translated bits of the virtual address
- This places a limit of one page on the capacity of one way of the cache
- IMPORTANT: if the cache Index extends beyond bit 11, into the translated part of the address, then translation must take place before the cache can be indexed
[Figure: the virtual address splits into Virtual Page Number and Page Offset; the Page Offset supplies the cache Index and Offset for a 4 KB direct-mapped L1-$ with 32-byte lines while the TLB translates the Virtual Page Number; the tag comparison then produces TLB hit and cache hit]
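
The one-page-per-way limit can be expressed as a simple check on the cache geometry. The sketch below is illustrative: the function name is hypothetical and sizes are given in bytes.

```c
#include <assert.h>

/* Returns 1 if the cache index and block offset bits fit entirely within
   the page offset, i.e. one way of the cache is no larger than a page,
   so the cache can be indexed before translation completes. */
int vipt_index_untranslated(unsigned cache_bytes, unsigned ways, unsigned page_bytes) {
    assert(ways > 0 && cache_bytes % ways == 0);
    assert(page_bytes > 0);
    return cache_bytes / ways <= page_bytes;
}
```

For the 4 KB direct-mapped cache on the slide with 4 KB pages the check passes. An 8 KB direct-mapped cache fails it, while a 32 KB 8-way cache passes, which shows why raising associativity is one way to grow a VI-PT cache.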
40 Coping with large VI-PT caches
- Rely on the page allocator in the O/S to allocate pages such that the translation of index bits is always an identity relation
- Hence, if virtual address A translates to physical address P, then the Page Allocator must guarantee that: V[11] == P[11]
- Any translated bit used to index the cache must be identical in both the Virtual and Physical addresses
[Figure: cache addressing (Cache tag bits, Index, Offset) aligned against virtual addressing (Virtual Page Number, Page offset), with the Index overlapping the Virtual Page Number]
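
The allocator's guarantee amounts to comparing the index bits that lie above the page offset in the virtual and physical addresses. A minimal sketch, assuming 32-bit addresses, a page of 2^page_offset_bits bytes and one cache way of 2^way_bits bytes; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every translated bit used to index the cache is identical in
   the virtual and physical address. The bits compared are those from
   page_offset_bits up to way_bits - 1. */
int index_bits_match(uint32_t vaddr, uint32_t paddr,
                     unsigned page_offset_bits, unsigned way_bits) {
    assert(page_offset_bits <= way_bits && way_bits < 32);
    uint32_t below_way = ((uint32_t)1 << way_bits) - 1;
    uint32_t below_page = ((uint32_t)1 << page_offset_bits) - 1;
    uint32_t mask = below_way & ~below_page;  /* translated index bits only */
    return ((vaddr ^ paddr) & mask) == 0;
}
```

When one way is no larger than a page the mask is empty and every mapping is acceptable; otherwise the allocator can only pick physical frames for which this check holds.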
41 Putting it together: TLBs in the pipeline
- Two TLBs, one for Instructions and one for Data, located in IF and MEM respectively
- Each may generate TLB exceptions (effectively interrupts)
- TLB exceptions must be re-startable (kill instruction, load TLB entry, restart instruction)
- Tag check now involves the translated address, and can be delayed to the next stage
[Figure: five-stage pipeline (IF, DEC, EX, MEM, WB). In IF the PC goes as a virtual address to the Instruction TLB and the L1 I-cache, followed by the I-Tag check (I-TLB exception, I-cache hit/miss). In MEM the ALU result goes as a virtual address to the Data TLB and the L1 D-cache, followed by the D-Tag check (D-TLB exception, D-cache hit/miss). The Register File sits between them with two read ports and one write port.]
42 When to perform address translation?
VI-VT: Virtually indexed, virtually tagged
- L1-$ indexed with virtual address, before translation; tag contains virtual address
- Con: cannot distinguish virtual aliases or synonyms in the cache
- Pro: only perform a TLB lookup on an L1-$ miss
VI-PT: Virtually indexed, physically tagged
- L1-$ indexed with virtual address, or often just the un-translated bits
- Translation must take place before the tag can be checked
- Con: translation must take place on every L1-$ access
- Pro: no aliases in the cache; works with cache-coherent shared memory
PI-PT: Physically indexed, physically tagged
- Translation first; then cache access
- Con: translation occurs in sequence with L1-$ access: high latency
PI-VT: Physically indexed, virtually tagged
- Not interesting
43 Cache Performance Techniques
Summary table. Columns: technique, miss rate, miss penalty, hit time, complexity.
Techniques (rows):
- large block size
- high associativity
- victim cache
- hardware prefetch
- compiler prefetch
- compiler optimizations
- prioritisation of reads
- critical word first
- nonblocking caches
- L2 caches
- small and simple caches
- virtual caches
More informationChapter-5 Memory Hierarchy Design
Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or
More informationLec 11 How to improve cache performance
Lec 11 How to improve cache performance How to Improve Cache Performance? AMAT = HitTime + MissRate MissPenalty 1. Reduce the time to hit in the cache.--4 small and simple caches, avoiding address translation,
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationImproving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches
Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging 6.823, L8--1 Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Highly-Associative
More information10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache
Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is
More informationChapter 2 (cont) Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,
Chapter 2 (cont) Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Improving Cache Performance Average mem access time = hit time + miss rate * miss penalty speed up
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationReducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Miss Penalty: Read Priority over Write on Miss Write buffers may offer RAW
More informationCache Memory: Instruction Cache, HW/SW Interaction. Admin
Cache Memory Instruction Cache, HW/SW Interaction Computer Science 104 Admin Project Due Dec 7 Homework #5 Due November 19, in class What s Ahead Finish Caches Virtual Memory Input/Output (1 homework)
More informationVirtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili
Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCray XE6 Performance Workshop
Cray XE6 Performance Workshop Mark Bull David Henty EPCC, University of Edinburgh Overview Why caches are needed How caches work Cache design and performance. 2 1 The memory speed gap Moore s Law: processors
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationVirtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationSE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan govind@serc Reality Check Question 1: Are real caches built to work on virtual addresses or physical addresses? Question 2: What about
More informationAdvanced Caching Techniques
Advanced Caching Approaches to improving memory system performance eliminate memory accesses/operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationMemory Hierarchy Design. Chapter 5
Memory Hierarchy Design Chapter 5 1 Outline Review of the ABCs of Caches (5.2) Cache Performance Reducing Cache Miss Penalty 2 Problem CPU vs Memory performance imbalance Solution Driven by temporal and
More informationMEMORY HIERARCHY DESIGN. B649 Parallel Architectures and Programming
MEMORY HIERARCHY DESIGN B649 Parallel Architectures and Programming Basic Optimizations Average memory access time = Hit time + Miss rate Miss penalty Larger block size to reduce miss rate Larger caches
More informationChapter 8. Virtual Memory
Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationModern Computer Architecture
Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap
More informationAssignment 1 due Mon (Feb 4pm
Announcements Assignment 1 due Mon (Feb 19) @ 4pm Next week: no classes Inf3 Computer Architecture - 2017-2018 1 The Memory Gap 1.2x-1.5x 1.07x H&P 5/e, Fig. 2.2 Memory subsystem design increasingly important!
More informationLECTURE 12. Virtual Memory
LECTURE 12 Virtual Memory VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a cache for magnetic disk. The mechanism by which this is accomplished
More informationLecture 11 Reducing Cache Misses. Computer Architectures S
Lecture 11 Reducing Cache Misses Computer Architectures 521480S Reducing Misses Classifying Misses: 3 Cs Compulsory The first access to a block is not in the cache, so the block must be brought into the
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationAdvanced Caching Techniques
Advanced Caching Approaches to improving memory system performance eliminate memory operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide memory
More informationMain Memory (Fig. 7.13) Main Memory
Main Memory (Fig. 7.13) CPU CPU CPU Cache Multiplexor Cache Cache Bus Bus Bus Memory Memory bank 0 Memory bank 1 Memory bank 2 Memory bank 3 Memory b. Wide memory organization c. Interleaved memory organization
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCOSC4201. Chapter 4 Cache. Prof. Mokhtar Aboelaze York University Based on Notes By Prof. L. Bhuyan UCR And Prof. M. Shaaban RIT
COSC4201 Chapter 4 Cache Prof. Mokhtar Aboelaze York University Based on Notes By Prof. L. Bhuyan UCR And Prof. M. Shaaban RIT 1 Memory Hierarchy The gap between CPU performance and main memory has been
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationWhy memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho
Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationComputer Systems Architecture I. CSE 560M Lecture 17 Guest Lecturer: Shakir James
Computer Systems Architecture I CSE 560M Lecture 17 Guest Lecturer: Shakir James Plan for Today Announcements and Reminders Project demos in three weeks (Nov. 23 rd ) Questions Today s discussion: Improving
More informationQ3: Block Replacement. Replacement Algorithms. ECE473 Computer Architecture and Organization. Memory Hierarchy: Set Associative Cache
Fundamental Questions Computer Architecture and Organization Hierarchy: Set Associative Q: Where can a block be placed in the upper level? (Block placement) Q: How is a block found if it is in the upper
More informationMemory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple
Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss
More informationMemory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy
Memory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy Soner Onder Michigan Technological University Randy Katz & David A. Patterson University of California, Berkeley Levels in
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationLec 12 How to improve cache performance (cont.)
Lec 12 How to improve cache performance (cont.) Homework assignment Review: June.15, 9:30am, 2000word. Memory home: June 8, 9:30am June 22: Q&A ComputerArchitecture_CachePerf. 2/34 1.2 How to Improve Cache
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCS/ECE 3330 Computer Architecture. Chapter 5 Memory
CS/ECE 3330 Computer Architecture Chapter 5 Memory Last Chapter n Focused exclusively on processor itself n Made a lot of simplifying assumptions IF ID EX MEM WB n Reality: The Memory Wall 10 6 Relative
More information3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationLogical Diagram of a Set-associative Cache Accessing a Cache
Introduction Memory Hierarchy Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year Levels of memory with different sizes & speeds close to
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCS422 Computer Architecture
CS422 Computer Architecture Spring 2004 Lecture 19, 04 Mar 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html Topics for Today Cache Performance Cache Misses:
More information