INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

UNIVERSIDADE TÉCNICA DE LISBOA, INSTITUTO SUPERIOR TÉCNICO, Departamento de Engenharia Informática. Architectures for Embedded Computing, MEIC-A, MEIC-T, MERC. Lecture Slides, Version 3.0 - English. Lecture 14. Title: Cache Memory - Cache Performance Optimization. Summary: Miss penalty reduction (multi-level caches, greater priority to reads than to writes, victim caches); Miss rate reduction (analysis of the misses, increase of the block size, increase of the cache capacity, increase of the associativity level and way prediction). 2010/2011 Nuno.Roma@ist.utl.pt

Architectures for Embedded Computing. Cache Memory: Cache Performance Optimization. Prof. Nuno Roma, ACE 2010/11, DEI-IST.

Previous Class: In the previous class... Memory systems; program access patterns; cache memories: operation principles, internal organization, cache management policies.

Road Map. Summary of today's class: Miss penalty reduction: multi-level caches; greater priority to reads than to writes; victim caches. Miss rate reduction: analysis of the misses; increase of the block size; increase of the cache capacity; increase of the associativity level; way prediction. Bibliography: Computer Architecture: a Quantitative Approach, Sections 5.2 and C.3.

Caches: Objective. Objective: minimize the memory mean access time, from the processor point of view:

t_access = t_hit + p_miss × t_penalty

Hit Time (t_hit): hardware designers make every effort so that the cache responds in a single clock cycle. Miss Rate (p_miss): maximize the probability of finding the requested data in the cache. Miss Penalty (t_penalty): upon a miss, minimize the time required to resolve it.

Example: Consider a load-store computer architecture where CPI = 1.0 (when all cache accesses are successfully satisfied). The load and store instructions correspond to 50% of the whole set of executed instructions. If the miss penalty is 25T and the miss rate is 10%, how much faster would the processor be if the miss rate were reduced to one half?

Solution:

CPI_A = 50% × 1T + 50% × (t_hit + p_miss × t_penalty) = 0.5 × 1T + 0.5 × (1T + 0.10 × 25T) = 0.5T + 1.75T = 2.25T

CPI_B = 50% × 1T + 50% × (t_hit + p_miss × t_penalty) = 0.5 × 1T + 0.5 × (1T + 0.05 × 25T) = 0.5T + 1.125T = 1.625T

Speedup = CPI_A / CPI_B = 2.25T / 1.625T ≈ 1.38
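The arithmetic of this example can be reproduced with a short C sketch (purely illustrative; the function name cpi and its parameters are ours, not part of the original slides):

```c
#include <stdio.h>

/* CPI model used in the example: non-memory instructions take 1 cycle (1T);
 * load/store instructions take t_hit + p_miss * t_penalty cycles.          */
static double cpi(double mem_fraction, double t_hit,
                  double p_miss, double t_penalty)
{
    return (1.0 - mem_fraction) * 1.0 +
           mem_fraction * (t_hit + p_miss * t_penalty);
}

int main(void)
{
    double cpi_a = cpi(0.5, 1.0, 0.10, 25.0);  /* original miss rate (10%)  */
    double cpi_b = cpi(0.5, 1.0, 0.05, 25.0);  /* miss rate reduced to half */

    printf("CPI_A = %.3fT, CPI_B = %.3fT, speedup = %.2f\n",
           cpi_a, cpi_b, cpi_a / cpi_b);       /* 2.250T, 1.625T, 1.38      */
    return 0;
}
```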

Multi-Level Caches: the processor (µP) accesses the L1 cache, which is backed by the L2 cache and, finally, by the primary memory:

t_access = t_hit_L1 + p_miss_L1 × t_penalty_L1
t_penalty_L1 = t_hit_L2 + p_miss_L2 × t_penalty_L2
t_access = t_hit_L1 + p_miss_L1 × (t_hit_L2 + p_miss_L2 × t_penalty_L2)

Local and Global Miss Rates. Local Miss Rate: fraction of the accesses made to a given cache that miss in that cache. Global Miss Rate: fraction of all processor accesses that miss in the whole cache system:

p_miss_global_L2 = p_miss_local_L1 × p_miss_local_L2

The local and global miss rates coincide for the L1 cache; the local miss rate of L2 is usually high, so the global miss rate is a better measure.

Example: Consider that 1000 memory accesses give rise to 40 misses in the level 1 cache (L1) and 20 misses in the level 2 cache (L2). On average, each instruction performs 1.5 memory accesses. Also assume that the hit access times of the two caches are 1 and 10 clock cycles, respectively, and that a memory access is accomplished within 200 clock cycles. Compute the several miss rates and the memory mean access time.

Solution:
local miss rate L1 = global miss rate L1 = 40/1000 = 4%
local miss rate L2 = 20/40 = 50%
global miss rate L2 = 20/1000 = 2%

Memory Mean Access Time = hit time L1 + miss rate L1 × (hit time L2 + miss rate L2 × miss penalty L2) = 1 + 4% × (10 + 50% × 200) = 1 + 4% × 110 = 5.4 clock cycles
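As a minimal sketch (illustrative only), the same miss rates and mean access time can be derived directly from the raw counts of the example:

```c
#include <stdio.h>

int main(void)
{
    /* Example data: per 1000 processor accesses */
    double accesses  = 1000.0;
    double misses_l1 = 40.0;
    double misses_l2 = 20.0;
    double t_hit_l1  = 1.0;      /* clock cycles */
    double t_hit_l2  = 10.0;
    double t_mem     = 200.0;

    double local_l1  = misses_l1 / accesses;    /* = global L1 = 4% */
    double local_l2  = misses_l2 / misses_l1;   /* = 50%            */
    double global_l2 = misses_l2 / accesses;    /* = 2%             */

    /* t_access = t_hit_L1 + p_miss_L1 * (t_hit_L2 + p_miss_L2 * t_mem) */
    double amat = t_hit_l1 + local_l1 * (t_hit_l2 + local_l2 * t_mem);

    printf("local L1 = %.0f%%, local L2 = %.0f%%, global L2 = %.0f%%\n",
           100 * local_l1, 100 * local_l2, 100 * global_l2);
    printf("mean access time = %.1f clock cycles\n", amat);   /* 5.4 */
    return 0;
}
```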

L2 Cache Configuration. Variation of the miss rate with the L2 cache capacity (L1 with 64 kB): the capacity of L2 should be greater than that of L1; for larger L2 capacities, the global miss rate is similar to the one that would be obtained with a single (and much more expensive!) L1 cache of the same size.

Variation of the relative execution time with the L2 cache capacity and the L2 hit time: the L2 hit time is not critical, so a more complex cache can be used in order to minimize the miss rate.

Coherency Between Memory and Cache. Sources of incoherence: other devices that may also change memory positions (DMA, I/O controllers, other processors, etc.). Typically, they only change the primary memory, in order not to interfere with the processor accesses to the cache.

Coherency Solutions: selective caches; shared caches (between all agents); caches with coherency protocols.

Selective caches: only operate over restricted areas of the addressing space (defined by configuration); non-stored (non-cached) areas include input/output buffers and communication buffers between processors.

Shared caches (between all agents): the several agents do not access the main memory directly; instead, they access the cache. This causes greater contention in the access to the cache and an increase of the cache miss rate.

Caches with coherency protocols (bus snooping): the cache controller checks all writes to the primary memory and invalidates the cache positions that were modified in memory; reads from primary memory positions corresponding to cache blocks with updated values imply copying those values back to memory.
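As an illustration only, and assuming a simple direct-mapped cache with hypothetical structures (not the actual controller described in the slides), a bus-snooping controller could invalidate a line on an observed external write roughly as follows:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  256u
#define BLOCK_BITS 5u             /* 32-byte blocks */

struct line {
    bool     valid;
    uint32_t tag;
};

static struct line cache[NUM_LINES];   /* direct-mapped, for simplicity */

/* Bus-snooping write invalidation: when another agent writes to primary
 * memory, invalidate the cache line that holds that address (if any).   */
void snoop_write(uint32_t addr)
{
    uint32_t block = addr >> BLOCK_BITS;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;

    if (cache[index].valid && cache[index].tag == tag)
        cache[index].valid = false;   /* the memory copy is now the newest */
}
```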

Data Coherency Between Caches. Inclusion: L2 always holds all the data stored in L1: it is only necessary to check L2 in order to invalidate both caches; implies the adoption of blocks with the same size, or extra hardware to search for sub-blocks. Exclusion: each block is never simultaneously stored in both caches: optimizes the occupation of the cache memory; a miss in L1 leads to a swap of the block between L1 and L2.

Loading Policy. Blocking: may have a significant impact on the miss penalty. Non-blocking: reduces the current miss penalty, but may have a serious impact on subsequent misses: Early Restart; Critical Word First.
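Early restart and critical word first only change the order in which the words of the missing block are returned to the processor. A minimal sketch of the critical-word-first fill order (the word granularity and names are assumptions):

```c
#include <stdio.h>

#define WORDS_PER_BLOCK 8u

/* Critical word first: the word that caused the miss is fetched (and
 * forwarded to the processor) first; the remaining words of the block
 * follow, wrapping around. order[] receives the resulting fill sequence. */
void critical_word_first(unsigned miss_word, unsigned order[WORDS_PER_BLOCK])
{
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++)
        order[i] = (miss_word + i) % WORDS_PER_BLOCK;
}

int main(void)
{
    unsigned order[WORDS_PER_BLOCK];
    critical_word_first(5, order);           /* miss on word 5 of the block */
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++)
        printf("%u ", order[i]);             /* prints: 5 6 7 0 1 2 3 4 */
    printf("\n");
    return 0;
}
```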

Giving Priority to Read Misses Over Writes. After a read miss, instead of stalling the read operation until the write buffer has written its whole content into memory, the read is sent to memory before the next word of the write buffer. Complication: the read has to check whether the accessed position is about to be updated by the write buffer.

Example:
SW R5,384(R0)   ; M[384] ← R5
SW R3,512(R0)   ; M[512] ← R3
LW R1,1024(R0)  ; R1 ← M[1024]
LW R2,512(R0)   ; R2 ← M[512]

The last load reads address 512, which is still pending in the write buffer, so it must receive the value written by the second store (R3) rather than the stale value in memory.
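A hedged sketch of that check, with a hypothetical write-buffer layout: before the read bypasses the pending writes, the buffer is searched for the requested address and, on a match, the buffered value is forwarded:

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4u

struct wb_entry {
    bool     valid;
    uint32_t addr;
    uint32_t data;
};

static struct wb_entry write_buffer[WB_ENTRIES];

/* On a read miss that is given priority over pending writes, first check
 * whether the write buffer holds a newer value for the requested address
 * (a RAW hazard through memory). Returns true and forwards the value on a
 * match; otherwise the read may safely bypass the buffered writes.
 * Assumes at most one buffered write per address.                         */
bool read_miss_check_write_buffer(uint32_t addr, uint32_t *data)
{
    for (unsigned i = 0; i < WB_ENTRIES; i++) {
        if (write_buffer[i].valid && write_buffer[i].addr == addr) {
            *data = write_buffer[i].data;
            return true;
        }
    }
    return false;
}
```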

Victim Cache. Instead of completely discarding each block when it has to be replaced, temporarily keep it in a victim buffer. Rather than stalling on a subsequent cache miss, the contents of the buffer are checked on that miss to see if they hold the desired data before going to the next lower memory level. It is a small cache (e.g., 4 to 16 positions), fully associative, and particularly efficient for small direct-mapped caches (more than 25% reduction of the miss rate in a 4 kB cache).
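A minimal sketch of such a victim buffer (the sizes, names and round-robin replacement are illustrative assumptions, not the original design):

```c
#include <stdbool.h>
#include <stdint.h>

#define VICTIM_ENTRIES 8u   /* small, fully associative (e.g. 4 to 16 blocks) */

struct victim_entry {
    bool     valid;
    uint32_t block_addr;    /* address of the evicted block */
    /* the block data would be stored here as well */
};

static struct victim_entry victim[VICTIM_ENTRIES];

/* Store a block evicted from L1 into the victim buffer (round-robin). */
void victim_insert(uint32_t block_addr)
{
    static unsigned next = 0;
    victim[next].valid = true;
    victim[next].block_addr = block_addr;
    next = (next + 1) % VICTIM_ENTRIES;
}

/* Probe the victim buffer on an L1 miss; a hit avoids the full miss
 * penalty of going to the next lower memory level.                   */
bool victim_lookup(uint32_t block_addr)
{
    for (unsigned i = 0; i < VICTIM_ENTRIES; i++)
        if (victim[i].valid && victim[i].block_addr == block_addr)
            return true;    /* block is swapped back into L1 */
    return false;
}
```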

Analysis of the Miss Rate. Miss classification:

Compulsory: occur at the beginning of the program (cannot be avoided). Capacity: the cache cannot contain all the blocks needed during the execution of a program.

Conflict: occur due to the adopted placement strategy (direct mapped or n-way set associative).

Distribution of the Miss Rate.

Minimizing each Type of Miss. Compulsory misses. Solution: increase the size of the block.

Capacity misses. Solution: increase the size of the cache.

Conflict misses. Solution: increase the associativity level.

Common objective: try not to increase t_penalty and, more importantly, not to increase t_hit!

Increase the Block Size. Takes advantage of spatial locality. But: the loading of a larger block may increase the miss penalty; it may also increase the capacity and conflict miss rates.
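A small back-of-the-envelope sketch (illustrative numbers only) shows why larger blocks help sequential, spatially local accesses: a streaming walk through memory misses only once per block touched:

```c
#include <stdio.h>

/* Misses generated by a sequential walk over `bytes` bytes of data that
 * is not yet in the cache: one (compulsory) miss per block touched.     */
static unsigned long sequential_misses(unsigned long bytes,
                                       unsigned long block_size)
{
    return (bytes + block_size - 1) / block_size;
}

int main(void)
{
    unsigned long bytes = 64 * 1024;   /* 64 KiB of streaming data */
    for (unsigned long block = 16; block <= 128; block *= 2)
        printf("block = %3lu B -> %lu misses\n",
               block, sequential_misses(bytes, block));
    /* Doubling the block size halves the misses for this access pattern,
     * but each miss now loads more data, which may raise the miss penalty. */
    return 0;
}
```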

Increase the Cache Capacity. Obviously, it decreases the miss rate: mainly the capacity misses, but also the conflict misses.

But: slower caches; more expensive caches. Solution: use greater caches in the upper levels. Current L2 caches have the same capacity as those that were used about 10 years ago!

Increase the Associativity Level. Reduction of the conflict misses.

But: slower caches; more expensive caches.

Way Prediction and Pseudo-Associative Caches. Way Prediction: extra bits are kept in the cache to predict the way of the next cache access, so that it is only necessary to compare the tag field of that way. Pseudo-Associative Caches: upon a miss, certain direct-mapped caches try a second block to find the desired address; typically, this second block is obtained by inverting one bit of the index field. Several possible values for the access time: hit time with a correct prediction, t_hit_correct; hit time with an incorrect prediction, t_hit_incorrect; miss penalty time, t_penalty. Objective: t_hit_correct < t_hit_incorrect < t_penalty.
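A hedged sketch of a way-predicted lookup in a 2-way set-associative cache (all structures, sizes and names are illustrative assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

#define SETS       128u
#define BLOCK_BITS 5u

struct way { bool valid; uint32_t tag; };

static struct way cache[SETS][2];          /* 2-way set associative       */
static uint8_t    predicted_way[SETS];     /* the extra prediction bits   */

/* Returns 0 on a correct prediction (fast hit, only one tag compared),
 * 1 on a hit in the other way (slower hit, prediction updated),
 * -1 on a miss (the miss penalty must be paid).                          */
int way_predicted_lookup(uint32_t addr)
{
    uint32_t block = addr >> BLOCK_BITS;
    uint32_t set   = block % SETS;
    uint32_t tag   = block / SETS;
    uint8_t  w     = predicted_way[set];

    if (cache[set][w].valid && cache[set][w].tag == tag)
        return 0;                          /* t_hit_correct   */

    uint8_t other = 1 - w;
    if (cache[set][other].valid && cache[set][other].tag == tag) {
        predicted_way[set] = other;        /* learn for the next access */
        return 1;                          /* t_hit_incorrect */
    }
    return -1;                             /* miss: pay t_penalty */
}
```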

Code optimization: data access; program access. Reduction of the miss penalty with parallel techniques: pre-fetching; non-blocking caches.
