UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English

Lecture 14: Cache Memory - Cache Performance Optimization

Summary: Miss penalty reduction (multi-level caches, greater priority to reads than to writes, victim caches); miss rate reduction (analysis of the misses, increase of the block size, increase of the cache capacity, increase of the associativity level, and way prediction).

2010/2011 Nuno.Roma@ist.utl.pt

Previous Class

In the previous class:
- Memory systems;
- Program access patterns;
- Cache memories:
  - Operation principles;
  - Internal organization;
  - Cache management policies.

Road Map

Summary

Today:
- Miss penalty reduction:
  - Multi-level caches;
  - Greater priority to reads than to writes;
  - Victim caches;
- Miss rate reduction:
  - Analysis of the misses;
  - Increase of the block size;
  - Increase of the cache capacity;
  - Increase of the associativity level;
  - Way prediction.

Bibliography: Computer Architecture: A Quantitative Approach, Sections 5.2 and C.3

Caches: Objective

Objective: minimize the mean memory access time, from the processor's point of view:

    t_access = t_hit + p_miss × t_penalty

- Hit Time (t_hit): hardware designers make every effort so that the cache responds in a single clock cycle;
- Miss Rate (p_miss): maximize the probability of finding the requested data in the cache;
- Miss Penalty (t_penalty): upon a miss, minimize the time required to resolve it.

Example

Consider a load-store computer architecture where CPI = 1.0 (when all cache accesses are successfully satisfied). Load and store instructions correspond to 50% of all executed instructions. If the miss penalty is 25T and the miss rate is 10%, how much faster would the processor be if the miss rate were reduced to one half?

Solution:

    CPI_A = 50% × 1T + 50% × (t_hit + p_miss × t_penalty)
          = 0.5 × 1T + 0.5 × (1T + 0.1 × 25T)
          = 0.5T + 0.5 × 3.5T = 2.25T

    CPI_B = 50% × 1T + 50% × (t_hit + p_miss × t_penalty)
          = 0.5 × 1T + 0.5 × (1T + 0.05 × 25T)
          = 0.5T + 0.5 × 2.25T = 1.625T

    Speedup = CPI_A / CPI_B = 2.25T / 1.625T ≈ 1.385
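The same arithmetic can be checked with a short Python sketch (an illustration, not part of the slides; the CPI model and all numbers are the example's givens):

    def cpi(mem_fraction, t_hit, p_miss, t_penalty):
        # Average CPI in cycles (T): non-memory instructions cost 1T,
        # loads/stores cost the mean access time t_hit + p_miss * t_penalty.
        return (1 - mem_fraction) * 1.0 + mem_fraction * (t_hit + p_miss * t_penalty)

    cpi_a = cpi(0.5, 1.0, 0.10, 25.0)   # 2.25 T
    cpi_b = cpi(0.5, 1.0, 0.05, 25.0)   # 1.625 T
    print(cpi_a / cpi_b)                # speedup: ~1.385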

Multi-Level Caches

µP ↔ L1 Cache ↔ L2 Cache ↔ Primary Memory

    t_access = t_hit_L1 + p_miss_L1 × t_penalty_L1
    t_penalty_L1 = t_hit_L2 + p_miss_L2 × t_penalty_L2

    t_access = t_hit_L1 + p_miss_L1 × (t_hit_L2 + p_miss_L2 × t_penalty_L2)

Local and Global Miss Rates

- Local Miss Rate: fraction of the accesses made to a given cache that miss in that cache;
- Global Miss Rate: fraction of all processor accesses that miss in the whole cache system:

    p_miss_global_L2 = p_miss_local_L1 × p_miss_local_L2

- For the L1 cache, the local miss rate is the same as the global miss rate;
- The local miss rate of L2 is usually high, so the global miss rate is a better measure.

Example

Consider that 1000 memory accesses give rise to 40 misses in the level 1 cache (L1) and 20 misses in the level 2 cache (L2). On average, each instruction performs 1.5 memory accesses. Also assume that the hit times of L1 and L2 are 1 and 10 clock cycles, respectively, and that a memory access is accomplished within 200 clock cycles. Compute the several miss rates and the mean memory access time.

Solution:

    local miss rate L1 = global miss rate L1 = 40 / 1000 = 4%
    local miss rate L2 = 20 / 40 = 50%
    global miss rate L2 = 20 / 1000 = 2%

    Mean Memory Access Time
      = hit time L1 + miss rate L1 × (hit time L2 + miss rate L2 × miss penalty L2)
      = 1 + 4% × (10 + 50% × 200)
      = 1 + 4% × 110 = 5.4 clock cycles
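In Python, the same miss rates and two-level access time follow directly from the raw counts (a minimal sketch, using only the example's givens):

    accesses, l1_misses, l2_misses = 1000, 40, 20
    t_hit_l1, t_hit_l2, t_mem = 1, 10, 200      # clock cycles

    local_l1 = l1_misses / accesses             # 0.04 (== global L1 miss rate)
    local_l2 = l2_misses / l1_misses            # 0.50
    global_l2 = l2_misses / accesses            # 0.02 == local_l1 * local_l2

    amat = t_hit_l1 + local_l1 * (t_hit_l2 + local_l2 * t_mem)
    print(amat)                                 # 5.4 clock cycles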

L2 Cache Configuration

Variation of the miss rate with the L2 cache capacity (L1 with 64 kB):
- The capacity of L2 should be greater than that of L1;
- For larger L2 capacities, the global miss rate is similar to the one that would be obtained with a single (and a lot more expensive!) L1 cache of the same size.

L2 Cache Configuration

Variation of the relative execution time with the L2 cache capacity and the L2 hit time:
- The L2 hit time is not critical;
- A more complex cache may be adopted, in order to minimize the miss rate.

Coherency Between Memory and Cache

Sources of incoherence: other devices that may also change memory locations (DMA, I/O controllers, other processors, etc.). Typically, such devices only change the primary memory, so as not to interfere with the processor's accesses to the cache.

Coherency Solutions

Three approaches: selective caches; shared caches (between all agents); caches with coherency protocols.

Selective caches:
- Only operate over restricted areas of the address space (defined by configuration);
- Non-cached areas:
  - Input/output buffers;
  - Communication buffers between processors.

Shared caches (between all agents):
- The several agents do not directly access the main memory; instead, they access the cache:
  - Greater contention to access the cache;
  - Increase of the cache miss rate.

Caches with coherency protocols (bus snooping):
- The cache controller checks all writes to primary memory and invalidates the cache positions that were modified in memory;
- Reads from primary memory positions corresponding to cache blocks with updated values imply copying those values back to memory.
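A toy sketch of the two snooping rules above (illustrative only; a real protocol involves block states and bus arbitration):

    class SnoopingCache:
        def __init__(self):
            self.lines = {}      # block address -> data
            self.dirty = set()   # blocks modified locally, not yet in memory

        def snoop_bus_write(self, address):
            # Another agent updated this block in primary memory:
            # invalidate our now-stale copy.
            self.lines.pop(address, None)
            self.dirty.discard(address)

        def snoop_bus_read(self, address, memory):
            # Another agent reads a block we hold with an updated value:
            # copy that value back to memory first.
            if address in self.dirty:
                memory[address] = self.lines[address]
                self.dirty.discard(address)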

Data Coherency Between Caches

- Inclusion: L2 always holds all the data stored in L1:
  - It is only necessary to check L2 in order to invalidate both caches;
  - Implies adopting blocks of the same size, or extra hardware to search for sub-blocks.
- Exclusion: a block is never simultaneously stored in both caches:
  - Optimizes the occupation of the cache memory;
  - A miss in L1 leads to a swap of the block between L1 and L2.

Loading Policy

- Blocking: may have a significant impact on the miss penalty.
- Non-blocking: reduces the current miss penalty, but may have a serious impact on subsequent misses:
  - Early Restart;
  - Critical Word First.
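Critical Word First can be illustrated with a one-line Python function (a sketch, assuming a block of consecutive words; not from the slides): the memory returns the requested word first and then wraps around the rest of the block, so the processor restarts as soon as the first word arrives.

    def critical_word_first(block, requested_offset):
        # Delivery order of the words of a block (wrap-around fill).
        n = len(block)
        return [block[(requested_offset + i) % n] for i in range(n)]

    print(critical_word_first(['w0', 'w1', 'w2', 'w3'], 2))
    # ['w2', 'w3', 'w0', 'w1'] -> execution restarts once 'w2' arrives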

Giving Priority to Read Misses Over Writes

After a read miss, instead of stalling the read operation until the write buffer has written its whole content into memory, the read is sent ahead of the next word of the write buffer.

Complication: the read has to check whether the accessed position is pending an update in the write buffer.

Example:

    SW R5,384(R0)   ; M[384] ← R5
    SW R3,512(R0)   ; M[512] ← R3
    LW R1,1024(R0)  ; R1 ← M[1024]
    LW R2,512(R0)   ; R2 ← M[512]
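A minimal sketch of that check (names and values are illustrative, mirroring the example above): before a read miss bypasses the buffered writes, the write buffer is searched for a pending write to the same address.

    write_buffer = [(384, 5), (512, 3)]   # pending (address, value) writes
    memory = {384: 0, 512: 7, 1024: 42}   # values still in primary memory

    def read_with_priority(address):
        # Forward the newest buffered value for this address, if any;
        # otherwise the read may safely bypass the buffered writes.
        for addr, value in reversed(write_buffer):
            if addr == address:
                return value
        return memory[address]

    print(read_with_priority(1024))   # 42: read sent ahead of the writes
    print(read_with_priority(512))    # 3: forwarded from the write buffer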

Victim Cache

Instead of completely discarding each block when it has to be replaced, temporarily keep it in a victim buffer. Rather than stalling, on a subsequent cache miss the contents of this buffer are checked for the desired data before going to the next lower memory level.
- Small cache (e.g., 4 to 16 positions);
- Fully associative;
- Particularly efficient for small direct-mapped caches (more than 25% reduction of the miss rate in a 4 kB cache).
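A minimal sketch of such a victim buffer (an illustration, not the slides' design), keeping evicted blocks in a small fully associative FIFO that is probed on a miss:

    from collections import OrderedDict

    class VictimBuffer:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.blocks = OrderedDict()          # tag -> block data, FIFO order

        def insert(self, tag, data):
            # Called when the main cache evicts a block.
            self.blocks[tag] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # discard the oldest victim

        def probe(self, tag):
            # Called on a main-cache miss, before going to the next level;
            # on a hit, the block is removed here and reloaded into the cache.
            return self.blocks.pop(tag, None)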

Analysis of the Miss Rate

Miss classification:
- Compulsory: occur on the first access to each block, at the beginning of the program (cannot be avoided);
- Capacity: the cache cannot contain all the blocks needed during the execution of the program;
- Conflict: occur due to the adopted placement strategy (direct mapped or n-way set associative).

Distribution of the Miss Rate
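Such distributions are usually obtained by replaying an address trace and classifying each miss. A compact sketch of the standard procedure (an assumption beyond the slides, following the usual 3C methodology): first references are compulsory; misses that would also occur in a fully associative LRU cache of the same capacity are capacity misses; the rest are conflict misses.

    from collections import OrderedDict

    def classify_misses(trace, real_miss, capacity_blocks):
        # trace: sequence of block addresses; real_miss[i]: whether access i
        # missed in the cache under study; capacity_blocks: its size in blocks.
        seen, lru = set(), OrderedDict()
        counts = {'compulsory': 0, 'capacity': 0, 'conflict': 0}
        for i, block in enumerate(trace):
            fa_hit = block in lru            # hit in the fully associative model?
            if fa_hit:
                lru.move_to_end(block)
            else:
                lru[block] = True
                if len(lru) > capacity_blocks:
                    lru.popitem(last=False)  # LRU eviction
            if real_miss[i]:
                if block not in seen:
                    counts['compulsory'] += 1
                elif not fa_hit:
                    counts['capacity'] += 1
                else:
                    counts['conflict'] += 1
            seen.add(block)
        return counts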

Minimizing each Type of Miss

- Compulsory: increase the size of the block;
- Capacity: increase the size of the cache;
- Conflict: increase the associativity level.

Common objective: try not to increase t_penalty but, more importantly, not to increase t_hit!

Increase the Block Size

Takes advantage of spatial locality. But:
- The loading of a larger block may increase the miss penalty;
- It may also increase the capacity and conflict miss rates.

Increase the Cache Capacity

Obviously, it decreases the miss rate: mainly the capacity misses, but also the conflict misses. But:
- Slower caches;
- More expensive caches.

Solution: use greater caches in the upper levels. Current L2 caches have the same capacity as those that were used about 10 years ago!

Increase the Associativity Level

Reduction of the conflict misses. But:
- Slower caches;
- More expensive caches.

Way Prediction and Pseudo-Associative Caches

- Way Prediction: extra bits are kept in the cache to predict the way of the next cache access, so that only that way's tag field needs to be compared.
- Pseudo-Associative Caches: upon a miss, certain direct-mapped caches try a second block to find the desired address. Typically, this second block is obtained by inverting one bit of the index field.

Several possible values for the access time:
- Hit time with a correct prediction: t_hit_correct;
- Hit time with an incorrect prediction: t_hit_incorrect;
- Miss penalty: t_penalty.

Objective: t_hit_correct < t_hit_incorrect < t_penalty
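The second probe of a pseudo-associative cache fits in a few lines (a sketch; the tag-array layout and the choice of inverted bit are illustrative assumptions):

    def pseudo_associative_lookup(cache, index, tag, index_bits):
        # cache: dict mapping set index -> stored tag (direct-mapped tag array).
        if cache.get(index) == tag:
            return 'fast hit'                  # pays t_hit_correct
        alt = index ^ (1 << (index_bits - 1))  # invert one bit of the index
        if cache.get(alt) == tag:
            return 'slow hit'                  # pays t_hit_incorrect
        return 'miss'                          # pays t_penalty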

Next Class

In the next class:
- Code optimization:
  - Data access;
  - Program access;
- Reduction of the miss penalty with parallel techniques:
  - Pre-fetching;
  - Non-blocking caches.