Cache Performance (H&P 5.3; 5.5; 5.6)


1 Cache Performance (H&P 5.3; 5.5; 5.6)

Memory system and processor performance:

CPU performance eqn.:
CPU time = IC x CPI x Clock time
CPI = CPI ld/st x (IC ld/st / IC) + CPI others x (IC others / IC)
CPI ld/st = Pipeline time + Average memory access time

Memory performance eqn.:
Avg. mem. time = Hit time + Miss rate x Miss penalty

Improving memory hierarchy performance:
- Decrease hit time
- Decrease miss rate
- Decrease miss penalty
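To make the decomposition concrete, a small worked example in C (the instruction mix and latencies below are invented for illustration, not taken from the slides):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative numbers: 30% loads/stores, 1-cycle pipeline CPI,
           AMAT of 1.5 cycles for memory instructions, CPI of 1.0 otherwise. */
        double frac_ldst = 0.30, frac_other = 0.70;
        double cpi_ldst  = 1.0 + 1.5;   /* pipeline time + avg. memory access time */
        double cpi_other = 1.0;

        double cpi = cpi_ldst * frac_ldst + cpi_other * frac_other;
        printf("CPI = %.2f\n", cpi);    /* 0.3 x 2.5 + 0.7 x 1.0 = 1.45 */
        return 0;
    }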

2 Reducing Cache Miss Rates

Cache miss classification: the three Cs
- Compulsory misses (or cold misses): the block is accessed for the first time
- Capacity misses: the block is not in the cache because it was evicted when the cache was full
- Conflict misses: the block is not in the cache because it was evicted when its cache set was full

3 Cache Misses vs. Cache Size

[Figure (H&P): miss rate vs. cache size (4KB to 512KB), broken down into conflict, capacity, and cold misses, for a direct-mapped and a 2-way set-associative cache]

- Miss rates are very small in practice
- Miss rates decrease significantly with cache size
- Miss rates decrease with set-associativity because of the reduction in conflict misses

4 Reducing Cold Miss Rates

Technique 1: large block size
- Principle of locality: other data in the block are likely to be used soon
- Reduces the cold miss rate
- May increase conflict and capacity miss rates for the same cache size (fewer blocks in the cache)
- Increases the miss penalty because more data has to be brought in each time
- Uses more memory bandwidth

5 Cache Misses vs. Block Size

[Figure (H&P): miss rate vs. block size (16B to 256B) for several cache sizes (4KB to 256KB)]

- Small caches are very sensitive to block size
- In all cases very large blocks (> 128B) have a worse miss rate

6 Reducing Cold Miss Rates

Technique 2: prefetching
- Idea: bring data or instructions that are likely to be used soon into the cache (or a special buffer) ahead of time
- Reduces cold misses
- Uses more memory bandwidth
- May increase conflict and capacity miss rates (unless a prefetch buffer is used)
- Does not increase the miss penalty (the prefetch is handled after the main cache access is completed)

7 Prefetching

Hardware prefetching: hardware automatically prefetches cache blocks on a cache miss
- No need for extra prefetch instructions in the program
- Effective for regular accesses, such as instruction fetches
- E.g., next-block prefetching, stride prefetching

8 Prefetching

Software prefetching: the compiler inserts instructions at the proper places in the code to prefetch
- Requires new ISA instructions for prefetching (non-binding prefetch)
- Adds instructions to compute the prefetch addresses and to perform the prefetch itself (prefetch overhead)
- E.g., data prefetching in loops, linked-list prefetching

9 Software Prefetching

E.g., prefetching in loops: brings the next required block two iterations ahead of time (assuming each element of x is 4 bytes long and the block has 64 bytes, i.e., 16 elements per block).

Before:
    for (i=0; i<=999; i++) {
        x[i] = x[i] + s;
    }

After:
    for (i=0; i<=999; i++) {
        if (i%16 == 14)
            prefetch(x[i+16]);
        x[i] = x[i] + s;
    }

E.g., linked-list prefetching: brings the next object in the list.

Before:
    while (student) {
        student->mark = rand();
        student = student->next;
    }

After:
    while (student) {
        prefetch(student->next);
        student->mark = rand();
        student = student->next;
    }
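On real hardware, compilers expose non-binding prefetches through intrinsics. A minimal sketch of the loop example above using GCC/Clang's __builtin_prefetch (the intrinsic is real; the 16-element stride assumes the same 4-byte elements and 64-byte blocks as the slide):

    #define N 1000
    #define ELEMS_PER_BLOCK 16   /* 64-byte line / 4-byte element */

    void add_scalar(float *x, float s)
    {
        for (int i = 0; i < N; i++) {
            /* Two iterations before crossing into the next cache line,
               issue a non-binding prefetch for it. Like the slide's code,
               this may prefetch past the end of x; prefetches do not fault. */
            if (i % ELEMS_PER_BLOCK == ELEMS_PER_BLOCK - 2)
                __builtin_prefetch(&x[i + ELEMS_PER_BLOCK], 0 /* read */, 3);
            x[i] = x[i] + s;
        }
    }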

10 Reducing Conflict Miss Rates

Technique 3: high-associativity caches
- More options for block placement, hence fewer conflicts
- Reduces the conflict miss rate
- May increase hit time because the tag match takes longer
- May increase the miss penalty because the replacement policy is more involved

11 Cache Misses vs. Associativity

[Figure: miss rate vs. associativity (1-way, 2-way, 4-way, fully associative) for several cache sizes (16KB, 64KB, 512KB)]

- Small caches are very sensitive to associativity
- In all cases more associativity decreases the miss rate, but there is little difference between 4-way and fully associative

12 Reducing Conflict Miss Rates

Technique 4: compiler optimizations

E.g., merging arrays: improves spatial locality if the fields are used together for the same index

Before:
    int val[size];
    int key[size];

After:
    struct merge {
        int val;
        int key;
    };
    struct merge merged_array[size];

E.g., loop fusion: improves temporal locality

Before:
    for (i=0; i<1000; i++)
        A[i] = A[i]+1;
    for (i=0; i<1000; i++)
        B[i] = B[i]+A[i];

After:
    for (i=0; i<1000; i++) {
        A[i] = A[i]+1;
        B[i] = B[i]+A[i];
    }

13 Reducing Conflict Miss Rates

E.g., blocking: change row-major and column-major array access patterns to a block pattern to improve spatial and temporal locality.

Matrix multiplication x = y*z:

    for (i=0; i<5; i++)
        for (j=0; j<5; j++) {
            r = 0;
            for (k=0; k<5; k++)
                r = r + y[i][k]*z[k][j];
            x[i][j] = r;
        }

[Figure: elements of x, y, and z touched in the first iterations (i=0,j=0; i=0,j=1; i=1,j=0; in each case k runs over the full range)]

y: poor temporal locality; z: poor spatial and temporal locality

14 Reducing Conflict Miss Rates

Loop blocking or tiling (block factor 2; x must be zero-initialized, since partial products are accumulated across kk blocks):

    for (jj = 0; jj < 5; jj = jj+2)
        for (kk = 0; kk < 5; kk = kk+2)
            for (i = 0; i < 5; i++)
                for (j = jj; j < min(jj+2, 5); j++) {
                    r = 0;
                    for (k = kk; k < min(kk+2, 5); k++)
                        r = r + y[i][k]*z[k][j];
                    x[i][j] = x[i][j] + r;
                }

[Figure: elements of x, y, and z touched in the first blocked iterations (jj=0, kk=0)]

Better temporal locality

15 Cache Performance II

Memory system and processor performance:

CPU performance eqn.:
CPU time = IC x CPI x Clock time

Memory performance eqn.:
Avg. mem. time = Hit time + Miss rate x Miss penalty

Improving memory hierarchy performance:
- Decrease hit time
- Decrease miss rate
- Decrease miss penalty

16 Reducing Cache Miss Penalty

Technique 1: victim caches (can also be considered a miss-rate reduction)
- A very small cache used to capture lines evicted from the cache
- On a cache miss the data may still be found quickly in the victim cache (cache hit time < victim cache hit time < miss penalty)
- The replacement policy is much more involved

[Diagram: the CPU's memory address goes to the L1 cache and the victim cache in parallel; both hold tag/data arrays in front of main memory]

17 Reducing Cache Miss Penalty

Technique 2: giving priority to reads over writes
- The value of a read (load instruction) is likely to be needed soon, while the processor does not wait on a write
- Idea: place write misses in a write buffer, and let read misses overtake the buffered writes
- A read to the same memory address as a pending write in the buffer becomes a hit in the buffer:

    sw r3, 512(r0)    ; 1. write miss goes into the write buffer
    lw r2, 512(r0)    ; 2. read hits in the write buffer and gets the value of the previous write

[Diagram: write buffer holding the entry (address 512, value R[r3])]
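The forwarding check amounts to an associative search of the buffer by address. A toy sketch in C (the data structure and names are illustrative, not from the slides):

    #include <stdint.h>
    #include <stdbool.h>

    #define WB_ENTRIES 4

    struct wb_entry { uint32_t addr; uint32_t data; bool valid; };
    static struct wb_entry write_buffer[WB_ENTRIES];

    /* On a read miss, search the write buffer before going to memory;
       a matching pending write supplies the data directly. */
    bool wb_forward(uint32_t addr, uint32_t *data)
    {
        for (int i = 0; i < WB_ENTRIES; i++) {
            if (write_buffer[i].valid && write_buffer[i].addr == addr) {
                *data = write_buffer[i].data;   /* read overtakes the write */
                return true;
            }
        }
        return false;                           /* go to main memory */
    }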

18 Reducing Cache Miss Penalty

Technique 3: early restart and critical word first
- On a read miss the processor needs just the loaded word (or byte) very soon, but normally it has to wait until the whole block has been brought into the cache
- Early restart: as soon as the requested word arrives in the cache, send it to the processor, then continue reading the rest of the block into the cache

[Diagram: lw r2, 3(r0); the block containing address 0x0003 is filled in order, and the requested word is forwarded to the CPU as soon as it arrives]

19 Reducing Cache Miss Penalty

Technique 3: early restart and critical word first (cont.)
- Critical word first: fetch the requested word from memory first, send it to the processor as soon as possible, then continue reading the rest of the block into the cache

[Diagram: lw r2, 3(r0); memory returns the word at address 0x0003 first, followed by the remainder of the cache block]

20 Reducing Cache Miss Penalty

Technique 4: non-blocking (or lockup-free) caches
- Dynamic scheduling (Tomasulo's): ALU instructions can overtake a cache-miss instruction
- Non-blocking caches: other memory instructions can also overtake a cache-miss instruction
- The cache can service multiple hits while waiting on a miss: hit under miss
- More aggressive: the cache can service hits while waiting on multiple misses: miss under miss, or hit under multiple misses
- The cache and memory must be able to service multiple requests concurrently
- Must keep track of multiple outstanding memory operations (see the sketch below): increased hardware complexity
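Outstanding misses are conventionally tracked in miss status holding registers (MSHRs). A minimal sketch of the bookkeeping (the structure and field names are illustrative; real MSHRs also merge secondary misses to the same block):

    #include <stdint.h>
    #include <stdbool.h>

    #define MSHRS 8   /* max outstanding misses */

    struct mshr {
        bool     valid;       /* entry tracks an in-flight miss      */
        uint32_t block_addr;  /* which block is being fetched        */
        uint8_t  dest_reg;    /* where to deliver the data on return */
    };
    static struct mshr mshr_file[MSHRS];

    /* Allocate an MSHR for a new miss; if none is free, the cache
       must stall this request. */
    int mshr_allocate(uint32_t block_addr, uint8_t dest_reg)
    {
        for (int i = 0; i < MSHRS; i++) {
            if (!mshr_file[i].valid) {
                mshr_file[i] = (struct mshr){ true, block_addr, dest_reg };
                return i;     /* miss proceeds; cache keeps servicing hits     */
            }
        }
        return -1;            /* structural stall: too many outstanding misses */
    }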

21 Non-blocking Caches

[Figure (H&P): performance of non-blocking caches as the number of outstanding memory operations grows]

- Significant improvement already from a small number of outstanding memory operations
- Some applications benefit from larger numbers

22 Reducing Cache Miss Penalty

Technique 5: second-level caches (L2)
- The gap between main memory and L1 cache speeds is increasing
- L2 makes main memory appear faster if it captures most of the L1 cache misses
- The L1 miss penalty becomes the L2 hit access time on an L2 hit
- The L1 miss penalty is higher on an L2 miss
- L2 considerations:
  - Misses will be more frequent (L2 only sees accesses that missed in L1)
  - Higher associativity is possible
  - On-chip (512KB - 1MB) or off-chip (1MB - 4MB), with correspondingly longer access times

23 Second Level Caches

Memory subsystem performance:

Avg. mem. time = Hit time L1 + Miss rate L1 x Miss penalty L1
Miss penalty L1 = Hit time L2 + Miss rate L2 x Miss penalty L2
Avg. mem. time = Hit time L1 + Miss rate L1 x (Hit time L2 + Miss rate L2 x Miss penalty L2)

Miss rates:
- Local: the number of misses divided by the number of requests to that cache
  - E.g., Miss rate L1 and Miss rate L2 in the equations above
  - Usually not so small for lower-level caches
- Global: the number of misses divided by the total number of requests from the CPU
  - E.g., L2 global miss rate = Miss rate L1 x Miss rate L2
  - Represents the aggregate effectiveness of the caches combined
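A quick numeric sanity check of the two-level equation (the latency and miss-rate values below are made-up illustrative numbers, not from the slides):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative values: 1-cycle L1 hit, 10-cycle L2 hit,
           100-cycle memory access; 5% L1 and 40% local L2 miss rates. */
        double hit_l1 = 1.0, hit_l2 = 10.0, mem = 100.0;
        double miss_l1 = 0.05, miss_l2_local = 0.40;

        double miss_penalty_l1 = hit_l2 + miss_l2_local * mem;
        double amat = hit_l1 + miss_l1 * miss_penalty_l1;
        double global_l2 = miss_l1 * miss_l2_local;  /* fraction of all CPU accesses */

        printf("L1 miss penalty = %.2f cycles\n", miss_penalty_l1); /* 50.00 */
        printf("AMAT            = %.2f cycles\n", amat);            /* 3.50  */
        printf("L2 global miss  = %.2f%%\n", 100.0 * global_l2);    /* 2.00% */
        return 0;
    }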

24 Cache Misses vs. L2 Size

[Figure (H&P): L2 global miss rate and L2 local miss rate vs. L2 size (4KB to 4MB)]

- L2 caches must be much bigger than L1
- Local miss rates for L2 are larger than for L1 and are not a good measure of overall performance

25 Reducing Cache Hit Time

Technique 1: small and simple caches
- Small caches can be placed on-chip; signals take a long time to go off-chip
- Low-associativity caches have few tags to compare against the requested address
- Direct-mapped caches have only one tag to compare, and the comparison can be done in parallel with the fetch of the data

26 Reducing Cache Hit Time

Technique 2: virtual address caches
- Programs use virtual addresses for data, while main memory uses physical addresses, so addresses from the processor must be translated at some point
- Option 1: physical address caches perform address translation before the cache access
  - Hit time is increased to accommodate the translation

[Diagram: the CPU issues virtual address 0x0003; address translation produces physical address 0x2103, which is then used to access the L1 cache and, on a miss, main memory]

27 Reducing Cache Hit Time

Technique 2: virtual address caches (cont.)
- Option 2: virtual address caches perform address translation after the cache access, only on a miss
  - Hit time does not include the translation

[Diagram: the CPU issues virtual address 0x0003, which indexes the L1 cache directly; on a miss it is translated to physical address 0x2103 before accessing main memory]

28 Reducing Cache Hit Time

Problems of virtual address caches:
- Different programs may use the same virtual addresses for different physical addresses
  - Cache contents must be flushed on every context switch, increasing the miss rate
  - Alternatively, the cache tag must be extended with a process identifier (PID)
- User programs and the OS may use different virtual addresses for the same data: the aliasing problem
  - The same data structure may end up with two copies in the cache

29 Virtual Memory

- Each process would like to see its own, full, address space
- Clearly impossible to provide full physical memory for all processes
- Processes may define a large address space but use only a small part of it at any one time
- Processes would like their memory to be protected from access and modification by other processes
- The operating system needs to be protected from applications
- Each process has its own virtual address space, divided into fixed-size pages
- Virtual pages that are in use get mapped to pages of physical memory; virtual pages not recently used may be stored on disk
- Extends the memory hierarchy out to the swap partition of a disk

30 Virtual and Physical Memory Example

- 4K page size
- Process 1 has pages A, B, C and D; page B is held on disk
- Process 2 has pages X, Y and Z; page Z is held on disk
- Process 1 cannot access pages X, Y, Z; process 2 cannot access pages A, B, C, D
- The O/S can access any page (full privileges)

[Diagram: the virtual address spaces of processes 1 and 2 mapped onto physical memory, with pages B and Z swapped out to the swap disk]

31 Sharing Memory Using Virtual Aliases

- Process 1 and process 2 want to share a page of memory
- Process 1 maps virtual page A to physical page P; process 2 maps virtual page Z to physical page P
- Permissions can vary between the sharing processes
- The O/S can still access any page (full privileges)
- Note: process 1 can also map the same physical page at multiple virtual addresses, i.e., a page aliased within one process

[Diagram: both processes' virtual pages mapping to the same shared physical page P; page Q aliased at two virtual addresses within one process]

32 Typical Virtual Memory Parameters

    parameter        L1 cache          virtual memory
    block/page       16 - 128 bytes    4KB - 64KB
    hit time         1 - 3 cycles      50 - 150 cycles
    miss penalty     8 - 150 cycles    1M - 10M cycles
      access time    6 - 130 cycles    800K - 8M cycles
      transfer time  2 - 20 cycles     200K - 2M cycles
    miss rate        0.1 - 10%         0.00001 - 0.001%
    size             256KB - 1MB       64MB - 16GB

(H&P)

- A virtual memory miss is called a page fault
- Page size is usually fixed, but some systems use variable-size segments

33 Virtual Memory Policies

Block replacement: choosing a page frame to reuse
- Minimize misses (page faults): LRU policy
- Minimize write-backs to disk: give priority to non-modified pages

Write strategy: policy adopted on a write
- Write-through would mean writing the block back to disk whenever the page is updated in main memory: not practical
- A write-back policy is always used (with a dirty/modified bit in the page table)
- Some systems use one dirty bit per block in the page to minimize data written back to disk

Inclusivity:
- Inclusive would mean keeping a copy of all in-use pages on disk as well: too expensive
- Memory and disk are non-inclusive in all systems

34 Virtual Memory Policies

Block placement: location of a page in memory
- More freedom means lower miss rates but higher hit and miss penalties
- Memory access time is already high and the memory miss penalty (disk access time) is huge, so low miss rates matter most
- Full associativity: a virtual page can be located in any page frame
- Important to reduce the time to find a page in memory (hit time)
- To place new pages in memory, the OS maintains a list of free frames
- Block placement may be constrained by the use of translated virtual address bits when indexing the cache (see later)

35 Virtual Memory Policies

Block identification: finding the correct page frame
- Assigning tags to memory page frames and comparing tags is impractical
- The OS maintains a table that maps all virtual pages to page frames: the page table
  - The table is updated with a new mapping every time a virtual page is allocated a page frame
  - The table is accessed on a memory request to translate the virtual to the physical address: inefficient
  - The number of entries in the table is the number of virtual pages: very large (e.g., with 4KB pages, 2^20 = 1M entries for a 32-bit address space and 2^52 entries for a 64-bit address space)
- The page frame number is used to generate the physical address during address translation
- One page table per process
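The entry counts follow directly from (address-space bits minus page-offset bits); a quick check in C:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        unsigned page_bits = 12;                        /* 4KB pages       */
        uint64_t entries32 = 1ULL << (32 - page_bits);  /* 2^20 = 1M       */
        uint64_t entries64 = 1ULL << (64 - page_bits);  /* 2^52 ~ 4.5e15   */

        printf("32-bit VA: %llu entries\n", (unsigned long long)entries32);
        printf("64-bit VA: %llu entries\n", (unsigned long long)entries64);
        return 0;
    }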

36 Page Tables and Address Translation

- The page table contains a translation for all virtual pages
- One page table for each process, and one for the system; a page table address register points to the current table
- Each entry holds a physical page number, a valid bit, and permissions (read, write, execute)
- A bit indicates whether the page is on disk, in which case the physical page number field instead gives its location within the swap file
- Translation: the virtual address is split into a virtual page number and a page offset; the virtual page number indexes the page table, and the physical page number from the entry is combined with the page offset to form the physical address
- The page table can be very large, so it is often itself stored in the virtual memory of the operating system, and large parts may be swapped out
- The CPU needs a cache of recently used page table entries (PTEs)
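A minimal single-level translation sketch following the slide's scheme (a flat table indexed by VPN; the struct and field names are illustrative, not from the slides):

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS 12                 /* 4KB pages                 */
    #define VPN_BITS  20                 /* 32-bit virtual addresses  */

    struct pte {
        uint32_t ppn;      /* physical page number (or swap slot if !valid) */
        bool     valid;    /* page resident in memory?                      */
        bool     r, w, x;  /* access permissions                            */
    };
    static struct pte page_table[1u << VPN_BITS];

    /* Translate a virtual address; returns false on a page fault. */
    bool translate(uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t vpn    = vaddr >> PAGE_BITS;
        uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
        struct pte e    = page_table[vpn];

        if (!e.valid)
            return false;                        /* page fault: OS loads page  */
        *paddr = (e.ppn << PAGE_BITS) | offset;  /* combine PPN and page offset */
        return true;
    }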

37 Translation Look-aside Buffers

- Typically a small, fully-associative cache of page table entries (PTEs)
- The tag is given by the VPN for that PTE; the PPN is taken from the PTE
- A valid bit is required; a D (dirty) bit indicates whether the page has been modified
- R, W, X bits indicate read, write and execute permission; permissions are checked on every memory access
- The physical address is formed from the PPN and the page offset
- TLB exceptions: TLB miss (no matching entry), privilege violation
- Often separate TLBs for instruction and data references

[Diagram: the VPN is compared against all TLB tags in parallel; on a hit the PPN is combined with the page offset to form the physical address]
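A sketch of the fully-associative lookup with permission checking (the entry format follows the slide; the sizes and names are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64
    #define PAGE_BITS   12

    struct tlb_entry {
        uint32_t vpn, ppn;
        bool valid, dirty, r, w, x;
    };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Returns true on a TLB hit with sufficient permission; a miss or
       a privilege violation raises an exception in real hardware. */
    bool tlb_lookup(uint32_t vaddr, bool is_write, uint32_t *paddr)
    {
        uint32_t vpn = vaddr >> PAGE_BITS;
        for (int i = 0; i < TLB_ENTRIES; i++) {      /* parallel in hardware */
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                if (is_write && !tlb[i].w)
                    return false;                    /* privilege violation  */
                if (is_write)
                    tlb[i].dirty = true;             /* page now modified    */
                *paddr = (tlb[i].ppn << PAGE_BITS)
                       | (vaddr & ((1u << PAGE_BITS) - 1));
                return true;
            }
        }
        return false;                                /* TLB miss             */
    }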

38 Problems with Virtual Aliases and Caches

Virtually-tagged data cache problems:
- Page aliases appear to be at different addresses
- Two copies could exist in the same data cache
- Writing to copy 1 would not be reflected in copy 2; reading copy 2 would get stale data
- Does not provide a coherent view of memory

Solution: use physical address tags
- Aliases have the same physical address, therefore the same tag
- Only one copy exists in each cache

Implications for CPU-cache interactions:
- Must translate addresses before the cache tag check
- May still be able to index the cache using non-translated low-order address bits under certain circumstances

39 VI-PT: Translating in Parallel with L1-$ Access

- If translation takes place before the L1-$ access, hit time will increase
- The TLB and L1-$ are often arranged to allow parallel TLB and L1-$ access
- Requires that the L1-$ index can be obtained from the non-translated bits of the virtual address; this places a limit of one page on the capacity of one way of the cache

[Diagram: the page offset supplies the cache index and block offset while the TLB translates the VPN; the tag comparison then uses the translated physical page number. Example: 4KB direct-mapped cache with 32-byte lines]

IMPORTANT: if the cache index extends beyond bit 11, into the translated part of the address, then translation must take place before the cache can be indexed
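For the 4KB direct-mapped, 32-byte-line example, the index and offset fall entirely within the 12 untranslated page-offset bits; a sketch of the bit slicing:

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BITS  5    /* 32-byte lines        */
    #define INDEX_BITS 7    /* 4KB / 32B = 128 sets */
    #define PAGE_BITS  12   /* 4KB pages            */

    int main(void)
    {
        uint32_t vaddr = 0x00001ABCu;

        /* Index and block offset come from bits [11:0], the page offset,
           so the cache can be indexed before (or during) translation. */
        uint32_t offset = vaddr & ((1u << LINE_BITS) - 1);
        uint32_t index  = (vaddr >> LINE_BITS) & ((1u << INDEX_BITS) - 1);

        /* The tag must come from the *physical* address, i.e. the PPN
           delivered by the TLB in parallel with the cache read. */
        printf("index=%u offset=%u (both within the %d page-offset bits)\n",
               index, offset, PAGE_BITS);
        return 0;
    }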

40 Coping with Large VI-PT Caches

- Rely on the page allocator in the O/S to allocate pages such that the translation of the index bits is always an identity relation
- Hence, if virtual address A translates to physical address P, the page allocator must guarantee that: V[11] == P[11]
- Any translated bit used to index the cache must be identical in both the virtual and physical addresses

[Diagram: cache addressing (tag | index | offset) aligned against virtual addressing (virtual page number | page offset), showing the index bits that overlap the virtual page number]
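This constraint is commonly called page coloring: the allocator only maps a virtual page to a physical frame of the same "color", i.e., with matching values in the overlapping index bits. A hedged sketch of the allocator's check (the bit position is an assumption for illustration: a single index bit, bit 12, spilling into the translated region):

    #include <stdint.h>
    #include <stdbool.h>

    /* Index bits that fall inside the translated part of the address.
       Assumption for illustration: one overlapping bit, bit 12. */
    #define COLOR_MASK (1u << 12)

    /* The O/S page allocator may only pair a virtual page with a
       physical frame whose overlapping index bits match. */
    bool color_ok(uint32_t vaddr, uint32_t paddr)
    {
        return (vaddr & COLOR_MASK) == (paddr & COLOR_MASK);
    }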

41 Putting It Together: TLBs in the Pipeline

- Two TLBs, one for instructions and one for data, located in the IF and MEM stages respectively
- Each may generate TLB exceptions (effectively interrupts)
- TLB exceptions must be restartable (kill the instruction, load the TLB entry, restart the instruction)
- The tag check now involves the translated address, and can be delayed to the next stage

[Diagram: the five-stage pipeline (IF, DEC, EX, MEM, WB) with the instruction TLB between the PC and the L1 I-cache, and the data TLB between the ALU and the L1 D-cache; each cache's tag check happens one stage after the cache access]

42 When to Perform Address Translation?

VI-VT: virtually indexed, virtually tagged
- L1-$ indexed with the virtual address, before translation; the tag contains the virtual address
- Con: cannot distinguish virtual aliases (synonyms) in the cache
- Pro: only performs a TLB lookup on an L1-$ miss

VI-PT: virtually indexed, physically tagged
- L1-$ indexed with the virtual address, or often just the un-translated bits
- Translation must take place before the tag can be checked
- Con: translation must take place on every L1-$ access
- Pro: no aliases in the cache; works with cache-coherent shared memory

PI-PT: physically indexed, physically tagged
- Translation first; then cache access
- Con: translation occurs in sequence with the L1-$ access: high latency

PI-VT: physically indexed, virtually tagged
- Not interesting

43 Cache Performance Techniques

    technique                 miss rate   miss penalty   hit time
    large block size              +            -
    high associativity            +            -             -
    victim cache                  +            +
    hardware prefetch             +
    compiler prefetch             +
    compiler optimizations        +
    prioritisation of reads                    +
    critical word first                        +
    nonblocking caches                         +
    L2 caches                                  +
    small and simple caches       -                          +
    virtual caches                                           +

    (+ = improves, - = may hurt, per the techniques discussed above; all add some hardware or software complexity)
