INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA, INSTITUTO SUPERIOR TÉCNICO, Departamento de Engenharia Informática
Architectures for Embedded Computing, MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English
Lecture 14 Title: Cache Memory - Cache Performance Optimization
Summary: Miss penalty reduction (multi-level caches, greater priority to reads than to writes, victim caches); Miss rate reduction (analysis of the misses, increase of the block size, increase of the cache capacity, increase of the associativity level and way prediction).
2010/2011 Nuno.Roma@ist.utl.pt
Architectures for Embedded Computing
Cache Memory: Cache Performance Optimization
Prof. Nuno Roma

Previous Class
In the previous class: memory systems; program access patterns; cache memories (operation principles, internal organization, cache management policies).
Road Map

Summary
Today:
Miss penalty reduction: Multi-level caches; Greater priority to reads than to writes; Victim caches.
Miss rate reduction: Analysis of the misses; Increase the block size; Increase the cache capacity; Increase of the associativity level; Way prediction.
Bibliography: Computer Architecture: a Quantitative Approach, Sections 5.2 and C.3
Caches: Objective
Objective: minimize the mean memory access time, from the processor point of view:
t_access = t_hit + p_miss × t_penalty
Hit Time (t_hit): hardware designers make every effort so that the cache responds in a single clock cycle;
Miss Rate (p_miss): maximize the probability of finding the requested data in the cache;
Miss Penalty (t_penalty): upon a miss, minimize the time required to resolve it.

Example
Consider a load-store computer architecture where CPI = 1.0 (when all cache accesses are successfully satisfied). The load and store instructions correspond to 50% of the whole set of executed instructions. If the miss penalty is 25T and the miss rate is 10%, how much faster would the processor be if the miss rate were reduced to one half?

Solution:
CPI_A = 50% × 1T + 50% × (t_hit + p_miss × t_penalty) = 0.5 × 1T + 0.5 × (1T + 0.10 × 25T) = 0.5T + 1.75T = 2.25T
CPI_B = 50% × 1T + 50% × (t_hit + p_miss × t_penalty) = 0.5 × 1T + 0.5 × (1T + 0.05 × 25T) = 0.5T + 1.125T = 1.625T
Speedup = CPI_A / CPI_B = 2.25T / 1.625T ≈ 1.38
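A quick Python check of the arithmetic above (a minimal sketch; the function and variable names are ours, not from the slides):

# Average CPI with 50% memory instructions, hit time 1T, miss penalty 25T.
def avg_cpi(miss_rate, hit_time=1.0, miss_penalty=25.0, mem_fraction=0.5):
    # Non-memory instructions take 1T; memory instructions pay the cache access time.
    return (1 - mem_fraction) * 1.0 + mem_fraction * (hit_time + miss_rate * miss_penalty)

cpi_a = avg_cpi(0.10)   # 2.25 T
cpi_b = avg_cpi(0.05)   # 1.625 T
print(cpi_a, cpi_b, cpi_a / cpi_b)   # speedup ~= 1.38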
Multi-Level Caches
µP → L1 Cache → L2 Cache → Primary Memory
t_access = t_hit,L1 + p_miss,L1 × t_penalty,L1
t_penalty,L1 = t_hit,L2 + p_miss,L2 × t_penalty,L2
t_access = t_hit,L1 + p_miss,L1 × (t_hit,L2 + p_miss,L2 × t_penalty,L2)
Local and Global Miss Rates
Local Miss Rate: fraction of the accesses made to a given cache that miss in that cache.
Global Miss Rate: fraction of all processor accesses that miss in the whole cache system.
p_miss,global,L2 = p_miss,local,L1 × p_miss,local,L2
The local and the global miss rates are the same for the L1 cache;
The local miss rate of L2 is usually high - the global miss rate is a better measure.
Example
Consider that 1000 memory accesses give rise to 40 misses in the level 1 cache (L1) and 20 misses in the level 2 cache (L2). On average, there are 1.5 memory accesses per instruction. Also assume that the hit access times of the two caches are 1 and 10 clock cycles, respectively, and that a memory access is accomplished within 200 clock cycles. Compute the several miss rates and the mean memory access time.

Solution:
local miss rate L1 = global miss rate L1 = 40/1000 = 4%
local miss rate L2 = 20/40 = 50%
global miss rate L2 = 20/1000 = 2%
Mean Memory Access Time = hit time L1 + miss rate L1 × (hit time L2 + miss rate L2 × miss penalty L2) = 1 + 4% × (10 + 50% × 200) = 1 + 4% × 110 = 5.4 clock cycles
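The same two-level computation, written out as a short Python check (a sketch with our own variable names; the figures are the ones from the example above):

accesses = 1000
l1_misses, l2_misses = 40, 20
l1_hit, l2_hit, mem_time = 1, 10, 200          # clock cycles

l1_miss_rate = l1_misses / accesses             # local == global for L1: 4%
l2_local_miss_rate = l2_misses / l1_misses      # 50%
l2_global_miss_rate = l2_misses / accesses      # 2%

amat = l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_time)
print(amat)                                     # 5.4 clock cycles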
L2 Cache Configuration
Variation of the miss rate with the L2 cache capacity (L1 with 64 kB):
Capacity of L2 greater than that of L1;
For larger L2 capacities, the global miss rate is similar to the one that would be obtained with a single (and a lot more expensive!) L1 cache of the same size.
L2 Cache Configuration
Variation of the relative execution time with the L2 cache capacity and the L2 hit time:
The L2 hit time is not critical;
A more complex cache may be used, so as to minimize the miss rate.

Coherency Between Memory and Cache
Sources of incoherence: other devices that may also change memory positions (DMA, I/O controllers, other processors, etc.). Typically, they only change the primary memory, so as not to interfere with the processor accesses to the cache.
Coherency Solutions
Selective caches:
Only operate over restricted areas of the addressing space (defined by configuration);
Non-cached areas: input/output buffers; communication buffers between processors.
Shared caches (between all agents):
The several agents do not directly access the main memory - instead, they access the cache:
Greater contention to access the cache;
Increase of the cache miss rate.
Caches with coherency protocols:
Bus Snooping:
The cache controller checks all writes to primary memory and invalidates the cache positions that were modified in memory;
Reads from primary memory positions corresponding to cache blocks with updated values imply copying those values back into memory.
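A minimal sketch of the invalidate-on-write side of bus snooping (our own simplified model in Python; real controllers also handle the write-back of dirty blocks mentioned above):

class SnoopingCache:
    def __init__(self, block_size=64):
        self.block_size = block_size
        self.valid = {}          # block address -> cached data (simplified)

    def block_addr(self, addr):
        return addr // self.block_size

    def snoop_write(self, addr):
        # Another agent (DMA, I/O controller, another CPU) wrote this memory
        # position: invalidate our copy, if any, so the next read misses and refetches.
        self.valid.pop(self.block_addr(addr), None)

cache = SnoopingCache()
cache.valid[cache.block_addr(0x1000)] = "old data"
cache.snoop_write(0x1008)                        # observed bus write into the same block
print(cache.block_addr(0x1000) in cache.valid)   # False: the block was invalidated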
Data Coherency Between Caches
Inclusion: L2 always holds all the data stored in L1:
It is only necessary to check L2 in order to invalidate both caches;
Implies adopting blocks of the same size, or extra hardware to search for sub-blocks.
Exclusion: a block is never simultaneously stored in both caches:
Optimizes the occupation of the cache memory;
A miss in L1 leads to a swap of the block between L1 and L2.

Loading Policy
Blocking: may have a significant impact on the miss penalty.
Non-Blocking: reduces the current miss penalty, but may have a serious impact on subsequent misses:
Early Restart;
Critical Word First.
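A sketch of the difference between a plain block fill and a critical-word-first fill (our own illustration in Python; it only shows the word ordering, not the timing):

def block_fill_order(block_words, critical_index, critical_word_first=True):
    # Return the order in which the words of a block are fetched from memory.
    order = list(range(block_words))
    if critical_word_first:
        # Fetch the requested (critical) word first, then wrap around the block,
        # so the processor can restart as soon as the first word arrives.
        order = order[critical_index:] + order[:critical_index]
    return order

print(block_fill_order(8, critical_index=5, critical_word_first=False))  # [0, 1, ..., 7]
print(block_fill_order(8, critical_index=5))                             # [5, 6, 7, 0, 1, 2, 3, 4]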
Giving Priority to Read Misses Over Writes
After a read miss, instead of stalling the read operation until the write buffer writes its whole content into memory, the read is sent before the next word of the write buffer.
Complications: the read has to check whether the accessed position is about to be updated by the write buffer.
Example:
SW R5,384(R0)  ; M[384] ← R5
SW R3,512(R0)  ; M[512] ← R3
LW R1,1024(R0) ; R1 ← M[1024]
LW R2,512(R0)  ; R2 ← M[512]
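A minimal sketch of the check named above: before a read miss bypasses the pending writes, the write buffer is searched for the same address (our own simplified Python model, word granularity only):

class WriteBuffer:
    def __init__(self):
        self.pending = {}        # address -> value still waiting to be written to memory

    def add_write(self, addr, value):
        self.pending[addr] = value

    def read_with_priority(self, addr, memory):
        # The read miss is serviced before the buffer is drained, but it must first
        # check whether the buffer holds a newer value for this address
        # (e.g. the LW R2,512(R0) after SW R3,512(R0) in the example above).
        if addr in self.pending:
            return self.pending[addr]
        return memory.get(addr, 0)

memory = {1024: 7}
wb = WriteBuffer()
wb.add_write(384, 1)
wb.add_write(512, 2)
print(wb.read_with_priority(1024, memory))  # 7: the read bypasses the pending writes
print(wb.read_with_priority(512, memory))   # 2: forwarded from the write buffer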
Victim Cache
Instead of completely discarding a block when it has to be replaced, temporarily keep it in a victim buffer. Rather than stalling on a subsequent cache miss, the contents of this buffer are checked to see if they hold the desired data before going to the next lower memory level.
Small cache (e.g., 4 to 16 positions);
Fully associative;
Particularly efficient for small direct-mapped caches (more than 25% reduction of the miss rate in a 4 kB cache).
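A compact sketch of a direct-mapped cache backed by a small fully associative victim buffer (our own illustrative Python model; it tracks block addresses only, with no data array):

from collections import OrderedDict

class DirectMappedWithVictim:
    def __init__(self, num_sets=64, victim_entries=4):
        self.num_sets = num_sets
        self.victim_entries = victim_entries
        self.lines = {}              # set index -> block address (direct mapped)
        self.victim = OrderedDict()  # small fully associative FIFO of evicted blocks

    def access(self, block_addr):
        index = block_addr % self.num_sets
        if self.lines.get(index) == block_addr:
            return "hit"
        # Check the victim buffer before going to the next lower memory level.
        result = "victim hit" if self.victim.pop(block_addr, None) else "miss"
        # The block replaced in the main cache is kept in the victim buffer.
        evicted = self.lines.get(index)
        if evicted is not None:
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        self.lines[index] = block_addr
        return result

cache = DirectMappedWithVictim(num_sets=4)
print([cache.access(a) for a in (0, 4, 0)])  # ['miss', 'miss', 'victim hit']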
Analysis of the Miss Rate
Miss classification:
Compulsory: occur at the beginning of the program (cannot be avoided).
Capacity: the cache cannot contain all the blocks needed during the execution of the program.
Conflict: occur due to the adopted placement strategy (direct mapped or n-way set associative).

Distribution of the Miss Rate
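To make the conflict category concrete, here is a tiny sketch (our own Python example) of two addresses that collide in a direct-mapped cache even though the cache is far from full:

num_sets, block_size = 64, 64          # 4 kB direct-mapped cache with 64-byte blocks

def set_index(addr):
    return (addr // block_size) % num_sets

a, b = 0x0000, 0x1000                  # 4 KiB apart: same index, different tags
print(set_index(a), set_index(b))      # 0 0 -> the two blocks evict each other
# Alternating accesses to a and b miss every time (conflict misses); a 2-way
# set-associative cache of the same capacity would hold both blocks at once.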
Minimizing each Type of Miss
Compulsory: Solution: increase the size of the block.
Capacity: Solution: increase the size of the cache.
Conflict: Solution: increase the associativity level.
Common objective: try not to increase t_penalty but, more importantly, not to increase t_hit!
Increase the Block Size
Takes advantage of spatial locality.
But:
The loading of the block may increase the miss penalty;
It may also increase the capacity and conflict miss rates.
Increase the Cache Capacity
Obviously, it decreases the miss rate: mainly the capacity misses, but also the conflict misses.
But:
Slower caches;
More expensive caches.
Solution: use greater caches in the upper levels. Current L2 caches have the same capacity as those that were used about 10 years ago!
Increase the Associativity Level
Reduction of the conflict misses.
But:
Slower caches;
More expensive caches.

Way Prediction and Pseudo-Associative Caches
Way Prediction: extra bits are kept in the cache to predict the way of the next cache access, so that only the tag of the predicted way needs to be compared.
Pseudo-Associative Caches: upon a miss, certain direct-mapped caches try a second block to find the desired address. Typically, this second block is obtained by inverting one bit of the index field.
Several possible values for the hit time:
Hit time for a correct prediction, t_hit,correct;
Hit time for an incorrect prediction, t_hit,incorrect;
Miss penalty time, t_penalty.
Objective: t_hit,correct < t_hit,incorrect < t_penalty
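A small sketch of the pseudo-associative lookup described above: on a miss in the primary block, a second block whose index differs in one bit is probed before going to the next memory level (our own illustrative Python model, tags only):

num_sets = 64                          # power of two, so an index bit can be inverted

def lookup(cache, block_addr):
    index = block_addr % num_sets
    if cache.get(index) == block_addr:
        return "fast hit"                          # t_hit,correct
    alt_index = index ^ (num_sets >> 1)            # invert the top bit of the index
    if cache.get(alt_index) == block_addr:
        return "slow hit"                          # t_hit,incorrect: extra probe needed
    return "miss"                                  # t_penalty: go to the next level

cache = {0: 0, 32: 64}                 # block 64 was placed in the alternative set 32
print(lookup(cache, 0), lookup(cache, 64), lookup(cache, 128))
# fast hit slow hit miss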
Next Class
Code optimization:
Data access;
Program access.
Reduction of the miss penalty with parallel techniques:
Pre-Fetching;
Non-blocking caches.