Cache Impact on Program Performance. T. Yang. UCSB CS240A. 2017
2 Multi-level cache in computer systems. Topics: performance analysis for multi-level caches; cache performance optimization through program transformation. [Figure: memory hierarchy from the processor (datapath, control) through the L1, L2, and L3 caches to main memory and disk.]
4 Cache misses and data access time. $D_0$: total number of memory data accesses. $D_1$: accesses that miss in L1; local miss ratio of L1: $m_1 = D_1/D_0$. $D_2$: accesses that miss in L2; local miss ratio of L2: $m_2 = D_2/D_1$. $D_3$: accesses that miss in L3; local miss ratio of L3: $m_3 = D_3/D_2$. Memory and cache access times: $\delta_i$ is the access time at cache level $i$, and $\delta_{mem}$ is the access time of main memory. Average access time $= \text{total time}/D_0 = \delta_1 + m_1 \cdot \text{penalty}$, where the penalty is the extra time spent on an L1 miss.
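5 Average memory access time (AMAT): $\text{AMAT} = \delta_1 + m_1 \cdot \text{penalty}$. Data found in L1 costs $\delta_1$ (about 2 cycles); the miss term $m_1 \cdot \text{penalty}$ accounts for data found in L2, L3, or memory.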
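6 Total data access time. Average time $= \delta_1 + m_1[\delta_2 + m_2 \cdot \text{penalty}]$, with $\delta_1 \approx 2$ cycles. On an L1 miss, data found in L2 costs $\delta_2$; the remaining penalty term covers data found in L3 or memory.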
8 Total data access time with no L3: data that misses in L1 and L2 is found in memory. Average time $= \delta_1 + m_1[\delta_2 + m_2\,\delta_{mem}]$, with $\delta_1 \approx 2$ cycles and $\delta_2 \approx 10$ cycles.
9 Total data access time with L3: data that misses in L1 and L2 is found in L3 or in memory. Average memory access time $\text{AMAT} = \delta_1 + m_1[\delta_2 + m_2[\delta_3 + m_3\,\delta_{mem}]]$.
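The nested AMAT formula can be evaluated from the innermost bracket outward, which generalizes to any number of cache levels. The sketch below is a minimal illustration; the function name `amat` and its array-based parameters are hypothetical, and latencies are in cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: AMAT for a hierarchy with `levels` caches,
 *   AMAT = d[0] + m[0]*(d[1] + m[1]*( ... + m[levels-1]*d_mem))
 * d[i]: access time of cache level i+1 (cycles)
 * m[i]: local miss ratio of cache level i+1 (0..1)
 * d_mem: main-memory access time (cycles) */
double amat(size_t levels, const double d[], const double m[], double d_mem)
{
    assert(levels >= 1);
    double t = d_mem;                 /* time paid after missing in the last cache */
    for (size_t i = levels; i-- > 0;) {
        assert(m[i] >= 0.0 && m[i] <= 1.0);
        t = d[i] + m[i] * t;          /* hit time plus miss-weighted penalty */
    }
    return t;
}
```

For example, with two levels, `d = {2, 10}`, `m = {0.1, 0.5}` and an assumed `d_mem = 100`, the call `amat(2, d, m, 100.0)` evaluates $2 + 0.1(10 + 0.5 \cdot 100) = 8$ cycles.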
10 Local vs. global miss rates. Local miss rate: the fraction of references to one level of a cache that miss, for example $m_2 = D_2/D_1$. Notice that the total number of L2 accesses equals the number of L1 misses, $D_1$. Global miss rate: the fraction of all references that miss in every level of a multilevel cache up to the given one, for example global L2 miss rate $= D_2/D_0$. Since $D_2/D_0 = (D_1/D_0)(D_2/D_1) = m_1 m_2$ and $m_1 < 1$, the L2 local miss rate is typically much larger than the global L2 miss rate.
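Local and global miss rates can both be derived from the same miss counts $D_0, D_1, \dots$. The sketch below assumes the counts come from some profiler or simulator; the function `miss_rates` and its layout (`D[0]` = all accesses, `D[i]` = misses at level i) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. D[0]: all data accesses; D[i]: accesses that missed
 * in cache level i (i = 1..levels). Fills, for each level i:
 *   local[i-1]  = D[i] / D[i-1]   (misses among the accesses reaching level i)
 *   global[i-1] = D[i] / D[0]     (misses among all accesses)            */
void miss_rates(size_t levels, const unsigned long D[],
                double local[], double global[])
{
    assert(levels >= 1 && D[0] > 0);
    for (size_t i = 1; i <= levels; i++) {
        /* level i only sees the misses of level i-1 */
        assert(D[i] <= D[i - 1]);
        local[i - 1]  = D[i - 1] ? (double)D[i] / (double)D[i - 1] : 0.0;
        global[i - 1] = (double)D[i] / (double)D[0];
    }
}
```

With hypothetical counts `{1000, 100, 50, 10}`, the L2 local miss rate is 0.5 while the global L2 miss rate is $0.1 \times 0.5 = 0.05$.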
11 Global miss rate. [Figure: global miss rates for a system with L1 cache: 32 KB I$ and 32 KB D$; L2 cache: 256 KB; L3 cache: 4 MB.]
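12 Average memory access time with no L3 cache: $\text{AMAT} = \delta_1 + m_1[\delta_2 + m_2\,\delta_{mem}] = \delta_1 + m_1\delta_2 + m_1 m_2\,\delta_{mem} = \delta_1 + m_1\delta_2 + \text{GMiss}_2\,\delta_{mem}$, where $\text{GMiss}_2 = m_1 m_2$ is the global L2 miss rate.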
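A quick numeric check shows that the nested and expanded forms agree. The latencies $\delta_1 \approx 2$ and $\delta_2 \approx 10$ cycles follow the earlier slides; the miss ratios $m_1 = 0.1$, $m_2 = 0.5$ and $\delta_{mem} = 100$ cycles are hypothetical values chosen only for illustration.

```latex
% Assumed: \delta_1 = 2, \delta_2 = 10, m_1 = 0.1, m_2 = 0.5, \delta_{mem} = 100 (cycles)
\begin{aligned}
\text{AMAT} &= \delta_1 + m_1\,[\delta_2 + m_2\,\delta_{mem}]
             = 2 + 0.1\,(10 + 0.5 \cdot 100) = 2 + 0.1 \cdot 60 = 8 \text{ cycles}\\
            &= \delta_1 + m_1\,\delta_2 + \text{GMiss}_2\,\delta_{mem}
             = 2 + 0.1 \cdot 10 + 0.05 \cdot 100 = 2 + 1 + 5 = 8 \text{ cycles}
\end{aligned}
```

The expanded form makes each level's contribution visible: here main memory contributes 5 of the 8 cycles even though only 5% of accesses reach it.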
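13 Average memory access time with L3 cache: $\text{AMAT} = \delta_1 + m_1[\delta_2 + m_2[\delta_3 + m_3\,\delta_{mem}]] = \delta_1 + m_1\delta_2 + m_1 m_2\,\delta_3 + m_1 m_2 m_3\,\delta_{mem} = \delta_1 + m_1\delta_2 + \text{GMiss}_2\,\delta_3 + \text{GMiss}_3\,\delta_{mem}$, where $\text{GMiss}_3 = m_1 m_2 m_3$ is the global L3 miss rate.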
14 Example: What is the average memory access time?
15 Example: What is the average memory access time with L1, L2, and L3?
16 Example
17 Cache-aware programming: reuse values in the cache as much as possible, i.e., exploit temporal locality in the program.
Example 1: y[2] is revisited continuously:
    For i = 1 to n: y[2] = y[2] + 3
Example 2 (an access sequence): y[2] is revisited a few instructions later.
18 Cache-aware programming: take advantage of memory bandwidth by fetching a whole chunk of memory into the cache and using the entire chunk, i.e., exploit spatial locality in the program. Example:
    For i = 1 to n: y[i] = y[i] + 3
Visiting y[1] brings its cache block into the cache, which benefits the next access, y[2]. [Figure: a 32-byte cache block with its tag, holding Y[0] through Y[31], starting at memory address 4000.]
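Both kinds of locality can be checked by counting misses in a small cache model. The sketch below is hypothetical: it assumes a fully associative cache with LRU replacement (the slides do not specify the mapping) and 1-byte elements for y, as the figure's 32 elements per 32-byte block suggests.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINES 64

/* Hypothetical fully associative cache with LRU replacement. */
typedef struct {
    size_t block_size;                 /* bytes per cache block */
    size_t num_lines;                  /* capacity in blocks (<= MAX_LINES) */
    size_t used;                       /* lines currently valid */
    unsigned long tag[MAX_LINES];      /* block number held by each line */
    unsigned long last_use[MAX_LINES]; /* timestamp for LRU */
    unsigned long clock;
    unsigned long misses;
} LruCache;

void cache_init(LruCache *c, size_t block_size, size_t num_lines)
{
    assert(block_size > 0 && num_lines > 0 && num_lines <= MAX_LINES);
    c->block_size = block_size;
    c->num_lines = num_lines;
    c->used = 0;
    c->clock = 0;
    c->misses = 0;
}

/* Simulate one access to byte address `addr`; returns 1 on hit, 0 on miss. */
int cache_access(LruCache *c, unsigned long addr)
{
    unsigned long block = addr / c->block_size;
    c->clock++;
    for (size_t i = 0; i < c->used; i++) {
        if (c->tag[i] == block) {      /* hit: refresh its LRU timestamp */
            c->last_use[i] = c->clock;
            return 1;
        }
    }
    c->misses++;
    size_t victim = c->used;
    if (c->used < c->num_lines) {
        c->used++;                     /* fill an empty line first */
    } else {
        victim = 0;                    /* evict the least recently used line */
        for (size_t i = 1; i < c->used; i++)
            if (c->last_use[i] < c->last_use[victim])
                victim = i;
    }
    c->tag[victim] = block;
    c->last_use[victim] = c->clock;
    return 0;
}
```

Accessing y[2] (address 4002) 100 times costs a single miss (temporal locality); scanning y[0..31] from address 4000 with 32-byte blocks also costs a single miss, because 4000 is block-aligned and all 32 bytes share one block (spatial locality).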
19 2D array layout in memory (just like a 1D array):
    for (x = 0; x < 3; x++) {
        for (y = 0; y < 3; y++) {
            a[y][x] = 0;   // implemented as array[3*y + x] = 0
        }
    }
Access order: a[0][0], a[1][0], a[2][0], a[0][1], ...
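The row-major mapping `a[y][x]` → offset `3*y + x` can be confirmed directly in C. This is a minimal sketch; `row_major_offset` and `slide_access_order` are hypothetical names, and the byte-wise copy into a 1D array is used only to expose the actual storage order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of a[y][x] in an array declared as int a[rows][cols]:
 * C is row-major, so row y is stored contiguously starting at y*cols. */
size_t row_major_offset(size_t y, size_t x, size_t cols)
{
    return y * cols + x;
}

/* Run the slide's loop (x outer, y inner) on a 3x3 array and record the
 * flat offset of each access in visit order; out needs 9 entries. */
void slide_access_order(size_t out[9])
{
    int a[3][3];
    size_t n = 0;
    for (size_t x = 0; x < 3; x++)
        for (size_t y = 0; y < 3; y++) {
            a[y][x] = (int)row_major_offset(y, x, 3); /* tag cell with its offset */
            out[n++] = row_major_offset(y, x, 3);
        }
    /* Copying the 2D array byte-for-byte into a 1D array shows that
     * a[y][x] really lives at flat[3*y + x]. */
    int flat[9];
    memcpy(flat, a, sizeof a);
    for (size_t k = 0; k < 9; k++)
        assert(flat[k] == (int)k);
}
```

The recorded offsets are 0, 3, 6, 1, ...: the loop jumps by a whole row (3 elements) between consecutive accesses, which is why this x-outer order has poor spatial locality for large arrays.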
20 Exploit spatial data locality via program rewriting: Example 1. Each cache block has 64 bytes; the cache has 128 bytes (2 blocks). Program structure (data access pattern): char D[64][64]; each row is stored in one cache block.
Program 1:
    for (j = 0; j < 64; j++)
        for (i = 0; i < 64; i++)
            D[i][j] = 0;
This performs 64*64 one-byte data accesses. What is the cache miss rate?
Program 2:
    for (i = 0; i < 64; i++)
        for (j = 0; j < 64; j++)
            D[i][j] = 0;
What is the cache miss rate?
21 Data access pattern and cache misses (Program 2):
    for (i = 0; i < 64; i++)
        for (j = 0; j < 64; j++)
            D[i][j] = 0;
There is 1 cache miss per execution of the inner loop: D[i][0] misses, and D[i][1] through D[i][63] hit in the same block. In total there are 64 cache misses out of 64*64 accesses. There is spatial locality: each fetched cache block is used 64 times before it is swapped out (consecutive data accesses within the inner loop).
22 Memory layout and data access by block. The memory layout of char D[64][64] is the same as that of char D[64*64]. [Figure: the data access order of the 2D loop compared with the memory layout; row D[i][0..63] occupies one cache block, so the row-wise access order walks through memory block by block.] Result: 64 cache misses out of 64*64 accesses.
23 Data locality and cache misses (Program 1):
    for (j = 0; j < 64; j++)
        for (i = 0; i < 64; i++)
            D[i][j] = 0;
There are 64 cache misses per execution of the inner loop: D[0][j], D[1][j], ..., D[63][j] lie in 64 different blocks, while the cache holds only 2. The result is a 100% cache miss rate. There is no spatial locality: each fetched block is used only once before it is swapped out.
24 Memory layout and data access by block. [Figure: the access order D[0][0], D[1][0], ..., D[63][0], D[0][1], D[1][1], ... compared with the row-by-row memory layout; consecutive accesses fall in different cache blocks.] Result: 100% cache misses.
25 Summary of Example 1: loop interchange alters the execution order and the data access pattern; in this case it exploits more spatial locality.
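The miss counts for both programs can be reproduced with a tiny cache model. This sketch assumes a direct-mapped cache with 2 lines of 64 bytes and that D starts on a block boundary (so row i is block i); the type `Dm` and the functions below are hypothetical. With only two lines, a 2-line fully associative LRU cache would give the same counts.

```c
#include <assert.h>
#include <stddef.h>

#define N 64            /* char D[64][64] */
#define BLOCK 64        /* bytes per cache block */
#define LINES 2         /* 128-byte cache / 64-byte blocks */

/* Hypothetical direct-mapped cache: block b goes to line b % LINES. */
typedef struct {
    long tag[LINES];
    unsigned long misses, accesses;
} Dm;

static void dm_init(Dm *c)
{
    for (size_t l = 0; l < LINES; l++) c->tag[l] = -1;  /* all lines empty */
    c->misses = c->accesses = 0;
}

static void dm_access(Dm *c, size_t offset)
{
    long block = (long)(offset / BLOCK);
    size_t line = (size_t)block % LINES;
    c->accesses++;
    if (c->tag[line] != block) {   /* miss: fetch block into its line */
        c->tag[line] = block;
        c->misses++;
    }
}

/* Program 1: j outer, i inner: walks down a column (stride 64 bytes). */
unsigned long program1_misses(void)
{
    static char D[N][N];
    Dm c; dm_init(&c);
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++) {
            D[i][j] = 0;
            dm_access(&c, i * N + j);   /* row-major byte offset of D[i][j] */
        }
    return c.misses;
}

/* Program 2: i outer, j inner: walks along a row (stride 1 byte). */
unsigned long program2_misses(void)
{
    static char D[N][N];
    Dm c; dm_init(&c);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            D[i][j] = 0;
            dm_access(&c, i * N + j);
        }
    return c.misses;
}
```

Under these assumptions Program 1 misses on all 4096 accesses and Program 2 misses only 64 times, matching the 100% and 64/(64*64) figures on the slides.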
26 Program rewriting example 2: cache blocking for better temporal locality. Cache size = 8 blocks = 128 bytes; cache block size = 16 bytes, holding 4 integers. Program structure:
    int A[64];   // sizeof(int) = 4 bytes
    for (k = 0; k < repcount; k++)
        for (i = 0; i < 64; i += stepsize)
            A[i] = A[i] + 1;
Analyze the cache hit ratio when varying the cache block size or the step size (stride distance).
27 Example 2: focus on the inner loop.
    for (i = 0; i < 64; i += stepsize)
        A[i] = A[i] + 1;
[Figure: data access order over memory and cache blocks for step sizes S = 1, 2, 4, 8.] The step size is also called the stride distance.
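28 Step size = 2:
    for (i = 0; i < 64; i += stepsize)
        A[i] = A[i] + 1;   // read, then write A[i]
[Figure: memory, cache blocks, and data access order for S = 2.] Per accessed element (read/write), the pattern is M/H, H/H, M/H, H/H, M/H, H/H, ... (M = miss, H = hit): the read of the first accessed element in a block misses, and the other three accesses to that block hit.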
29 Repeat many times:
    for (k = 0; k < repcount; k++)
        for (i = 0; i < 64; i += stepsize)
            A[i] = A[i] + 1;   // read, then write A[i]
With S = 2 integers the per-element pattern is again M/H, H/H, M/H, H/H, ... The array occupies 16 blocks. The inner loop accesses 32 elements and fetches all 16 blocks, and each block is used as R/W/R/W. The cache holds only 8 blocks and cannot keep all 16 fetched blocks, so each repetition of the outer loop misses again on every block.
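The effect of the step size can be measured by simulating the loop nest. This sketch assumes a fully associative LRU cache of 8 blocks of 16 bytes (the slides do not state the associativity; a direct-mapped cache would give different results for S = 8), and `stride_hit_ratio` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INTS 64      /* int A[64] */
#define BLOCK_BYTES 16   /* 4 ints per block */
#define CACHE_LINES 8    /* 128-byte cache */

/* Hypothetical fully associative LRU cache holding block numbers. */
typedef struct {
    long tag[CACHE_LINES];
    unsigned long stamp[CACHE_LINES];
    unsigned long now, hits, accesses;
} Cache;

static void access_block(Cache *c, long block)
{
    size_t victim = 0;
    c->now++;
    c->accesses++;
    for (size_t l = 0; l < CACHE_LINES; l++) {
        if (c->tag[l] == block) {           /* hit: refresh LRU stamp */
            c->stamp[l] = c->now;
            c->hits++;
            return;
        }
        if (c->stamp[l] < c->stamp[victim]) /* track least recently used */
            victim = l;
    }
    c->tag[victim] = block;                 /* miss: replace LRU (or empty) line */
    c->stamp[victim] = c->now;
}

/* Run the slide's loop nest and return the hit ratio:
 *   for (k = 0; k < repcount; k++)
 *     for (i = 0; i < 64; i += stepsize) A[i] = A[i] + 1;  // read, then write */
double stride_hit_ratio(int stepsize, int repcount)
{
    int A[NUM_INTS] = {0};
    Cache c = {0};
    assert(stepsize > 0 && repcount > 0);
    for (size_t l = 0; l < CACHE_LINES; l++) c.tag[l] = -1;   /* empty lines */
    for (int k = 0; k < repcount; k++)
        for (int i = 0; i < NUM_INTS; i += stepsize) {
            long block = (long)((size_t)i * sizeof(int) / BLOCK_BYTES);
            access_block(&c, block);   /* read A[i] */
            access_block(&c, block);   /* write A[i] */
            A[i] = A[i] + 1;
        }
    return (double)c.hits / (double)c.accesses;
}
```

Under these assumptions S = 1, 2, 4 give hit ratios of 7/8, 3/4, and 1/2 no matter how many repetitions run, because the 16 touched blocks never fit. With S = 8 only 8 blocks are touched, so after the first pass every access hits.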
30 Cache blocking to exploit temporal locality.
    for (k = 0; k < 100; k++)
        for (i = 0; i < 64; i += S)
            A[i] = f(A[i]);
[Figure: iterations k = 0, 1, 2, 3, ... each sweep the whole array; the highlighted (pink) code block can be executed with its data fitting into the cache.] Rewrite the program with cache blocking.
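31 Rewrite a program loop for better cache usage: loop blocking (cache blocking), for example with blocksize = 2. More generally, given
    for (i = 0; i < 64; i += s)
        A[i] = f(A[i]);
rewrite it as:
    for (bi = 0; bi < 64; bi = bi + blocksize)
        for (i = bi; i < bi + blocksize; i += s)
            A[i] = f(A[i]);
This visits the same elements in the same order provided blocksize is a multiple of s and 64 is a multiple of blocksize.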
32 Example 2: cache blocking for better performance.
    for (k = 0; k < 100; k++)
        for (i = 0; i < 64; i = i + s)
            A[i] = f(A[i]);
Rewrite as:
    for (k = 0; k < 100; k++)
        for (bi = 0; bi < 64; bi = bi + blocksize)
            for (i = bi; i < bi + blocksize; i += s)
                A[i] = f(A[i]);
Loop interchange:
    for (bi = 0; bi < 64; bi = bi + blocksize)
        for (k = 0; k < 100; k++)
            for (i = bi; i < bi + blocksize; i += s)
                A[i] = f(A[i]);
The highlighted (pink) code block, i.e., the k and i loops for one value of bi, can now be executed with its data fitting into the cache.
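The benefit of blocking plus loop interchange can be quantified with the same kind of cache model. This sketch assumes a fully associative LRU cache of 8 blocks of 16 bytes, f(x) = x + 1 (as on slide 26), s = 2, and blocksize = 32 integers so that one block of A (128 bytes) fits exactly in the cache; `run_naive`, `run_blocked`, and `touch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEN 64           /* int A[64] */
#define BLOCK_BYTES 16   /* cache block = 4 ints */
#define CACHE_LINES 8    /* 128-byte cache */

typedef struct {
    long tag[CACHE_LINES];
    unsigned long stamp[CACHE_LINES], now, misses;
} Cache;

static void cache_reset(Cache *c)
{
    memset(c, 0, sizeof *c);
    for (size_t l = 0; l < CACHE_LINES; l++) c->tag[l] = -1;  /* empty */
}

/* Hypothetical fully associative LRU cache; counts misses for A[i]. */
static void touch(Cache *c, int i)
{
    long block = (long)((size_t)i * sizeof(int) / BLOCK_BYTES);
    size_t victim = 0;
    c->now++;
    for (size_t l = 0; l < CACHE_LINES; l++) {
        if (c->tag[l] == block) { c->stamp[l] = c->now; return; }
        if (c->stamp[l] < c->stamp[victim]) victim = l;
    }
    c->misses++;
    c->tag[victim] = block;
    c->stamp[victim] = c->now;
}

static int f(int x) { return x + 1; }   /* stand-in for the slide's f() */

/* Original nest: every k sweeps the whole array. */
unsigned long run_naive(int A[LEN], int reps, int s)
{
    Cache c; cache_reset(&c);
    for (int k = 0; k < reps; k++)
        for (int i = 0; i < LEN; i += s) {
            touch(&c, i); touch(&c, i);      /* read and write A[i] */
            A[i] = f(A[i]);
        }
    return c.misses;
}

/* Blocked + interchanged nest: all reps run on one block of A first.
 * Assumes blocksize is a multiple of s and divides LEN. */
unsigned long run_blocked(int A[LEN], int reps, int s, int blocksize)
{
    Cache c; cache_reset(&c);
    assert(blocksize % s == 0 && LEN % blocksize == 0);
    for (int bi = 0; bi < LEN; bi += blocksize)
        for (int k = 0; k < reps; k++)
            for (int i = bi; i < bi + blocksize; i += s) {
                touch(&c, i); touch(&c, i);  /* read and write A[i] */
                A[i] = f(A[i]);
            }
    return c.misses;
}
```

With 100 repetitions the original nest misses 16 blocks on every sweep (1600 misses), while the blocked version misses each block only once (16 misses), and both leave A in the same final state.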
33 Example 3: matrix multiplication C = A*B, where $C_{ij}$ = (row i of A) · (column j of B).
    For i = 0 to n-1
        For j = 0 to n-1
            For k = 0 to n-1
                C[i][j] += A[i][k] * B[k][j]
35 Example 3: matrix multiplication code, with each 2D array implemented using a 1D layout (element (r, c) stored at r + c*n):
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i+j*n] += A[i+k*n] * B[k+j*n];
The three loops can be interchanged in any order, because each C element is modified independently with no dependence on other C elements. Which code has better cache performance (is faster)? Study the impact of the stride in the innermost loop, which does most of the computation. For example, compare with:
    for (j = 0; j < n; j++)
        for (k = 0; k < n; k++)
            for (i = 0; i < n; i++)
                C[i+j*n] += A[i+k*n] * B[k+j*n];
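The two loop orders differ only in the strides of the innermost loop. With the slide's indexing, the i-j-k order's inner k loop reads A with stride n and B with stride 1, while the j-k-i order's inner i loop reads A and C with stride 1 and keeps B fixed, so the j-k-i order would be expected to have better spatial locality (this sketch does not measure timing). The function names are hypothetical; the test only checks that both orders compute the same product.

```c
#include <assert.h>
#include <stddef.h>

/* n x n matrices in 1D arrays using the slide's indexing:
 * element (r, c) lives at r + c*n (column-major). C must be zeroed first. */

/* i-j-k order: inner loop over k reads A with stride n, B with stride 1. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i + j * n] += A[i + k * n] * B[k + j * n];
}

/* j-k-i order: inner loop over i walks C and A with stride 1 and keeps
 * B[k + j*n] fixed, so consecutive iterations share cache blocks. */
void matmul_jki(size_t n, const double *A, const double *B, double *C)
{
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++) {
            double b = B[k + j * n];   /* invariant in the i loop */
            for (size_t i = 0; i < n; i++)
                C[i + j * n] += A[i + k * n] * b;
        }
}
```

For every C element, both versions add the products in increasing k order, so the results are identical, not merely close.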
37 Example 4: cache blocking for matrix transpose.
    for (x = 0; x < n; x++) {
        for (y = 0; y < n; y++) {
            dst[y + x * n] = src[x + y * n];
        }
    }
Rewritten with cache blocking (assuming n is a multiple of blocksize). Both block loops must be outermost so that each blocksize-by-blocksize tile is finished before the next one starts:
    for (i = 0; i < n; i += blocksize)
        for (j = 0; j < n; j += blocksize)
            for (x = i; x < i+blocksize; ++x)
                for (y = j; y < j+blocksize; ++y)
                    dst[y + x * n] = src[x + y * n];
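A runnable version of the blocked transpose can be checked against the naive one. This sketch places both block loops (i, j) outside the element loops (x, y), so each tile of src and dst is completed before the next tile starts, and clamps the tile bounds so n need not be a multiple of the block size; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive transpose from the slide: dst[y + x*n] = src[x + y*n]. */
void transpose_naive(size_t n, const int *src, int *dst)
{
    for (size_t x = 0; x < n; x++)
        for (size_t y = 0; y < n; y++)
            dst[y + x * n] = src[x + y * n];
}

/* Blocked transpose: tile loops outermost, element loops inside a tile.
 * Bounds are clamped so the last tile may be partial. */
void transpose_blocked(size_t n, size_t bs, const int *src, int *dst)
{
    assert(bs > 0);
    for (size_t i = 0; i < n; i += bs)
        for (size_t j = 0; j < n; j += bs) {
            size_t xend = i + bs < n ? i + bs : n;
            size_t yend = j + bs < n ? j + bs : n;
            for (size_t x = i; x < xend; x++)
                for (size_t y = j; y < yend; y++)
                    dst[y + x * n] = src[x + y * n];
        }
}
```

A block size is typically chosen so that one tile of src plus one tile of dst fit in the cache together; the best value depends on the machine and is not given on the slides.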
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationIntroduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem
Introduction Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: Increase computation power Make the best use of available bandwidth We study the bandwidth
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationLecture 11 Reducing Cache Misses. Computer Architectures S
Lecture 11 Reducing Cache Misses Computer Architectures 521480S Reducing Misses Classifying Misses: 3 Cs Compulsory The first access to a block is not in the cache, so the block must be brought into the
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.15; Also, 5.13 & 5.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.15; Also, 5.13 & 5.17 Writing to caches: policies, performance Cache tradeoffs and
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationCACHE ARCHITECTURE. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
CACHE ARCHITECTURE Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Mar. 14 th : Homework 4 release (due on Mar. 27
More informationAnnouncements. ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy. Edward Suh Computer Systems Laboratory
ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab 1 due today Reading: Chapter 5.1 5.3 2 1 Overview How to
More informationCSC266 Introduction to Parallel Computing using GPUs Optimizing for Caches
CSC266 Introduction to Parallel Computing using GPUs Optimizing for Caches Sreepathi Pai October 4, 2017 URCS Outline Cache Performance Recap Data Layout Reuse Distance Besides the Cache Outline Cache
More informationLecture notes for CS Chapter 2, part 1 10/23/18
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationMemory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache
Memory Cache Memory Locality cpu cache memory Memory hierarchies take advantage of memory locality. Memory locality is the principle that future memory accesses are near past accesses. Memory hierarchies
More informationOutline. Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis
Memory Optimization Outline Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis Memory Hierarchy 1-2 ns Registers 32 512 B 3-10 ns 8-30 ns 60-250 ns 5-20
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationMemory Systems and Performance Engineering. Fall 2009
Memory Systems and Performance Engineering Fall 2009 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide illusion of fast larger memory
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More information