CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás


1 CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011 ADVANCED COMPUTER ARCHITECTURES ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)

2 Outline
- Cache memories:
  - Revision of basic cache memories
  - Direct mapping caches
  - N-way associative and fully associative caches
  - Data replacement policies
  - Write policies
  - Hierarchical cache systems
- Optimizing caches: software and hardware techniques

3 Reducing memory access times
Locality principles:
- Temporal locality: if a given address is accessed, it is likely to be accessed again in the near future (e.g., program loops)
- Spatial locality: if a given address is accessed, its adjacent addresses are likely to be needed in the near future (e.g., sequential code, or accessing the elements of an n-dimensional array)
90/10 RULE: a program typically spends around 90% of its time executing the same 10% of its instructions!

4 Cache memory organization
A cache memory is usually composed of:
- A set of n ways (n parallel cache memories)
- Each way is composed of multiple lines of data, each line including:
  - A TAG value
  - A VALID bit that specifies:
    - Value 0: the block of data is invalid and corresponds to uninitialized data
    - Value 1: the block of data corresponds to valid data
  - A data BLOCK, typically composed of several data elements
  - Some additional CTRL fields for data replacement and write policies
[Figure: n cache ways (WAY 0 to WAY N-1); each line holds a TAG, a VALID bit, CTRL fields, and a data block of several words, checked against the address]

5 Mapping a memory in a small cache
Whenever a memory access is performed, the processor first checks the cache memory:
- The address is divided into TAG, INDEX and WORD SELECT fields
- The validity bit states whether the cache entry stores valid data (1) or values corresponding to an uninitialized cache (0)
- If the data is in the cache memory, retrieve it immediately
- If the data is not in the cache memory, fetch it from memory (or from a lower-level cache memory); once the data is received from the lower memory hierarchy, send the fetched data to the processor and also place it in the cache
[Figure: the INDEX selects a cache line; a cache hit requires the line to be valid and the stored TAG to match the address TAG]

6 Mapping a memory in a small cache
To check if data is in the cache, perform the following steps:
1. Divide the memory address into:
   - TAG (most significant bits)
   - OFFSET, or word/byte select (least significant bits)
   - INDEX (remaining bits)
2. Use the INDEX to select a row of the cache memory and retrieve the following fields:
   - TAG (tag of the data stored in cache)
   - VALID (validity of the cache entry)
   - DATA (actual data)
3. Check the cache entry for data presence:
   - If V=0, the entry is invalid: generate a cache miss
   - If V=1 AND entry TAG != address TAG, the data is not in the cache (it corresponds to some other memory position): generate a cache miss
   - If V=1 AND entry TAG = address TAG, fetch the data from the cache: generate a cache hit
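As an illustration of these three steps, the following is a minimal C sketch of a direct-mapped lookup. The cache geometry (8 lines of 8 bytes, 32-bit byte-addressed machine) and all names are assumptions chosen for the example, not a real API:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical parameters: direct-mapped cache, 8 lines of 8 bytes. */
    #define NUM_LINES   8
    #define LINE_BYTES  8
    #define OFFSET_BITS 3                 /* log2(LINE_BYTES) */
    #define INDEX_BITS  3                 /* log2(NUM_LINES)  */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_BYTES];
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* Step 1: split the address; step 2: select the line; step 3: compare. */
    bool cache_lookup(uint32_t addr, uint8_t *byte_out)
    {
        uint32_t offset = addr & (LINE_BYTES - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

        cache_line_t *line = &cache[index];
        if (line->valid && line->tag == tag) {   /* V=1 AND tags match */
            *byte_out = line->data[offset];
            return true;                         /* cache hit  */
        }
        return false;                            /* cache miss */
    }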

7 Example
Consider the following code segment, where initially: R1=10F8h, R2=11F8h, R4=12F8h, R5=00F8h.
Assume instruction and data caches with the following characteristics:
- Direct mapping (1-way) caches
- 8 lines each, where each line has: instruction cache, 8 bytes; data cache, 32 bytes

120h Loop: L.D    F0,0(R1)
124h       L.D    F2,0(R2)
128h       MUL.D  F2,F0,F2
12Ch       DADD.D F2,F2,F4
130h       S.D    0(R4),F2
134h       DADD   R1,R1,#-8
138h       DADD   R2,R2,#-8
13Ch       DADD   R4,R4,#-8
140h       BNE    R1,R5,Loop

1. Divide the memory address word between the TAG, INDEX and OFFSET fields
2. Make a diagram of the cache verification process when accessing memory address 120h
3. Determine the cache size (in bytes)
4. Compute the data and instruction hit rates for the example program (assume that the cache is initially empty)
P.S.: assume that the memory is byte addressed

8 Associative caches
To decrease cache misses, multiple caching tables (ways) can be combined:
- An n-way associative cache looks for the target address in each of the n tables
- If the address is matched in one of the ways, the data is retrieved from that way
- If the value is not found, the data is fetched from memory (or from a lower-level cache memory) and placed in one of the ways (see replacement policies)
[Figure: the address TAG is compared in parallel against the selected line of each way; the per-way hit signals select which way supplies the data]

9 Associative caches
Addressing is made as if only one way existed. Thus, the number of bits of each addressing field is computed as:

  WORD SELECT = log2(LINE SIZE / MEM. WORD SIZE)
  INDEX       = log2(#CACHE LINES)
  TAG         = ADDR SIZE - (WORD SELECT + INDEX)

NOTE: the word select is often referred to as line offset or simply offset.
[Figure: the INDEX selects a line in every way and the TAG comparison in each way produces the per-way hit signals]
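A small C sketch of these field-width formulas, assuming illustrative values (32-bit addresses, 32-byte lines, 4-byte memory words, 8 lines per way); as in the slide's formula, any byte-within-word bits are folded into the word select:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int addr_bits  = 32;   /* address size (assumption) */
        int line_bytes = 32;   /* line (block) size         */
        int word_bytes = 4;    /* memory word size          */
        int num_lines  = 8;    /* #cache lines per way      */

        int word_select = (int)log2(line_bytes / word_bytes);
        int index       = (int)log2(num_lines);
        int tag         = addr_bits - (word_select + index);

        printf("WORD SELECT = %d bits, INDEX = %d bits, TAG = %d bits\n",
               word_select, index, tag);
        return 0;
    }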

10 Associative caches
A fully associative cache is a particular case of associative caches, where each table (way) is composed of only one entry (line):
- The index field has 0 bits
In general, increased associativity (more ways) decreases the number of cache misses:
- Particular cases can be found where this is not true
Increased associativity also increases the total amount of hardware:
- Each way requires one comparator
- An N:1 multiplexer is required to select the output data
- It also increases the propagation delay
An advantage of direct mapping caches is that the actual data can be retrieved before data validation: if the data turns out to be invalid, it is ignored at a later pipeline stage.
[Figure: in a fully associative cache the address splits into TAG and WORD SELECT only]

11 Data replacement policies
- Direct mapping cache: there is only one possible location
- Associative caches may place the new data into any of the multiple ways:
  - Random: randomly select the way where the new data is inserted (thus discarding the old one)
  - Least Recently Used (LRU): select the line which hasn't been used for the longest time; implementing a true LRU algorithm is prohibitive with a high number of ways (typically > 4); pseudo-LRU uses only one bit to mark the most recently used line
  - Least Frequently Used (LFU): counts how many times each line was accessed
  - First-In First-Out (FIFO): replace the oldest line in the cache
[Figure: in a direct mapping cache (1 way) the target line is fixed; in a 2-way cache the line in either way is a possible target]
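To make the LRU idea concrete, here is a toy C sketch that tracks access order with a global counter; real hardware keeps compact ordering (or pseudo-LRU) bits instead, and all names here are illustrative:

    #include <stdint.h>

    #define NUM_WAYS 4

    typedef struct {
        uint64_t last_used;   /* pseudo-timestamp of the most recent access */
    } way_state_t;

    static uint64_t access_clock = 0;

    void touch(way_state_t set[NUM_WAYS], int way)
    {
        set[way].last_used = ++access_clock;   /* record the access order */
    }

    int lru_victim(const way_state_t set[NUM_WAYS])
    {
        int victim = 0;
        for (int w = 1; w < NUM_WAYS; w++)     /* pick the oldest way */
            if (set[w].last_used < set[victim].last_used)
                victim = w;
        return victim;
    }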

12 Data replacement policies: victim cache
A substituted line is likely to be needed in the near future (temporal locality principle). To overcome situations where the removed line is required again:
- Use a victim cache to store the last lines removed (substituted) from the cache
- When a cache miss occurs, check the victim cache before looking in the lower-hierarchy memory for the required block
- If the line is found in the victim cache, put it back in the (main) cache
Victim caches are typically fully associative caches:
- They work as a FIFO memory
- The AMD Athlon uses an 8-entry victim cache
Victim caches are especially useful when associated with small direct mapping caches: a victim cache with 4 entries can reduce the cache misses of a 4 KB associative cache by 25%.
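A minimal C sketch of the miss path with a victim cache, assuming a hypothetical 4-entry buffer indexed by the full line address (insertion/eviction of the buffer itself, which works as a FIFO, is omitted):

    #include <stdbool.h>
    #include <stdint.h>

    #define VICTIM_ENTRIES 4

    typedef struct { bool valid; uint32_t line_addr; } victim_entry_t;
    static victim_entry_t victim[VICTIM_ENTRIES];

    /* On a main-cache miss, probe every victim entry (a fully associative
     * search). A hit means the line is swapped back into the main cache
     * instead of being fetched from the lower memory level. */
    bool victim_probe(uint32_t line_addr)
    {
        for (int i = 0; i < VICTIM_ENTRIES; i++)
            if (victim[i].valid && victim[i].line_addr == line_addr)
                return true;   /* found: restore the line to the main cache */
        return false;          /* not found: access the lower-level memory  */
    }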

13 Data replacement policies: reducing the cache miss overhead
When the cache line includes more than one word (which is the most typical case), the processor does not need to wait for the whole line to be fetched:
- Early Restart: as soon as the required word is fetched from the lower-level memory, it is given to the processor, so that it can continue executing
- Critical Word First: start by fetching the required word from the lower-level memory and immediately give the value to the CPU; then load the remaining words (generally in a circular manner)
[Figure: within the cache line, (1) the critical word is fetched first, then (2) the remaining words]
This does not guarantee that the cache miss overhead is reduced: if an adjacent word is required immediately after (spatial locality principle), the processor may still stall.

14 Example 2
Consider the following code segment, where initially: R1=10F8h, R2=11F8h, R4=12F8h, R5=00F8h.
Assume an instruction cache with the following characteristics:
- Cache size of 32 B
- Lines of 4 bytes

120h Loop: L.D    F0,0(R1)
124h       L.D    F2,0(R2)
128h       MUL.D  F2,F0,F2
12Ch       DADD.D F2,F2,F4
130h       S.D    0(R4),F2
134h       DADD   R1,R1,#-8
138h       DADD   R2,R2,#-8
13Ch       DADD   R4,R4,#-8
140h       BNE    R1,R5,Loop

Determine the cache hit and miss rates considering:
1. A direct mapping cache
2. A fully associative cache with a Least Recently Used (LRU) substitution policy
P.S.: assume that the memory is byte addressed

15 Write policies
Write through:
- If the data is in the cache:
  - Write the new value to the cache
  - Write the new value to memory
- If the data is NOT in the cache:
  - Write the new value to memory
- Summary: data can be written both to the cache and to memory; the memory is always updated
Write back:
- If the data is in the cache:
  - Write the new value to the cache
- If the data is NOT in the cache:
  - Load the corresponding line into the cache and update it with the new word value
- Summary: data can only be written to the cache; data is only written to memory when the line is replaced
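A hypothetical C sketch contrasting the two policies on a write hit; the line structure and mem_write() are placeholder stubs for this illustration, not a real interface:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { bool valid, dirty; uint32_t tag; uint32_t data[8]; } line_t;

    /* Stub standing in for a write to the lower memory level. */
    static void mem_write(uint32_t addr, uint32_t value) { (void)addr; (void)value; }

    void write_through_hit(line_t *line, uint32_t addr, int word, uint32_t value)
    {
        line->data[word] = value;    /* update the cache ...         */
        mem_write(addr, value);      /* ... and always update memory */
    }

    void write_back_hit(line_t *line, int word, uint32_t value)
    {
        line->data[word] = value;    /* update only the cache          */
        line->dirty = true;          /* memory is updated on eviction  */
    }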

16 Write policies: write back implementation
- Use an additional dirty bit to indicate whether any of the words in the line has been modified
- If the dirty bit is one, before replacing a line with new data, the cache controller must write the line back to memory (or to a lower cache hierarchy level)
[Figure: each line now holds TAG, V (valid bit), P (replacement policy bits), D (dirty bit) and the line data]

17 Write policies: write back implementation
A cache miss imposes a large overhead if the block to be replaced is dirty. Prioritize the accesses:
- Use a buffer to temporarily store the data that is going to be replaced
- First load the data from the lower-level memory
- Then store the dirty data to memory
[Figure: same cache structure as before, with P (replacement policy bits) and D (dirty bit) per line]

18 Write policies: write through implementation
A write through policy imposes a large latency on memory writes. Use a write buffer to decrease the latency:
- The processor writes to the buffer
- The cache controller writes the data to the cache and to the lower-level memory
[Figure: CPU -> write buffer -> cache -> memory (or lower-level cache)]
The write buffer works as a FIFO (First In, First Out) queue. It works well if:

  write frequency << 1 / (time to write to the lower-level cache)

Otherwise the write buffer will saturate! Saturation of the write buffer can be prevented by:
- Increasing its size
- Increasing the bandwidth to the lower cache level (e.g., by introducing more caching levels)
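A toy C sketch of such a write buffer as a fixed-depth ring FIFO; the depth and entry format are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_DEPTH 4

    typedef struct { uint32_t addr, value; } wb_entry_t;

    static wb_entry_t wb[WB_DEPTH];
    static int wb_head = 0, wb_count = 0;

    bool wb_push(uint32_t addr, uint32_t value)   /* CPU side */
    {
        if (wb_count == WB_DEPTH)
            return false;                         /* saturated: CPU must stall */
        wb[(wb_head + wb_count) % WB_DEPTH] = (wb_entry_t){ addr, value };
        wb_count++;
        return true;
    }

    bool wb_pop(wb_entry_t *out)                  /* memory side: drains FIFO */
    {
        if (wb_count == 0)
            return false;
        *out = wb[wb_head];
        wb_head = (wb_head + 1) % WB_DEPTH;
        wb_count--;
        return true;
    }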

19 Write policies: write through implementation
To decrease the latency of memory writes, group writes to the lower-level memory in the write buffer:
- Data with adjacent addresses are written together (to allow bursting)
- Grouping is typically performed when a new value is added to the buffer
The write buffer may generate RAW conflicts if the data is in the write buffer but not in the cache:
- Solution 1: drain the write buffer before loading the value
- Solution 2: also check whether the data is in the write buffer

20 Write allocation policies
- Write allocate: if the write data is not in the cache, allocate it (i.e., generate a cache miss to force the line to be brought into the cache)
- Write not allocate: if the write data is not in the cache, do not attempt to fetch the line from memory
Naturally, only the combined policies make sense:
- Write back + write allocate
- Write through + write not allocate

21 Example 3
Consider the following code segment, where initially: R1=10F8h, R2=11F8h, R4=12F8h, R5=00F8h.
Assume a data cache with the following characteristics:
- 2-way associative cache
- 8 lines, each line with 32 bytes

120h Loop: L.D    F0,0(R1)
124h       L.D    F2,0(R2)
128h       MUL.D  F2,F0,F2
12Ch       DADD.D F2,F2,F4
130h       S.D    0(R4),F2
134h       DADD   R1,R1,#-8
138h       DADD   R2,R2,#-8
13Ch       DADD   R4,R4,#-8
140h       BNE    R1,R5,Loop

Determine the cache hit and miss rates considering:
1. A write through, write not allocate policy
2. A write back, write allocate policy
3. A write back, write allocate policy using a 4-entry victim cache
P.S.: assume that the memory is byte addressed

22 Example 4
Consider the execution of a segment of code on the MIPS 32 processor:
- Assume that, by tracing the execution of the program, you obtain the memory accesses listed below
- Assume that all memory accesses correspond to 32-bit word loads/stores
- Assume a 256 B, direct mapping data cache with lines of 16 B

LOAD  ( D4h)
LOAD  ( E4h)
STORE ( D8h)
LOAD  ( E8h)
LOAD  ( E0h)
STORE ( h)
LOAD  ( Ch)
LOAD  ( Dh)

Assuming that the cache is initially empty, indicate the data in the cache after the execution of the program, for the following policies:
1. A write through, write not allocate policy
2. A write back, write allocate policy
P.S.: assume that the memory is byte addressed

23 Causes of misses
- Compulsory: first reference to a block (line)
- Capacity: blocks discarded and later retrieved
- Conflict: the program makes repeated references to multiple addresses from different blocks that map to the same location in the cache

24 Cache miss impact on memory access time: using a single cache
The average memory access time depends on:
- The cache access latency: the time to check if the required data is in the cache
- The cache miss rate (MR)
- The cache miss penalty: the time to retrieve the data from a lower-level memory

  T_MEM_ACCESS = T_CACHE_LATENCY + MR x T_MISS_PENALTY

Example: what is the average memory access time considering:
- A processor running at 2 GHz
- A cache access latency of 5 clock cycles, with an average hit rate of 90%
- An average memory access latency of 100 clock cycles
Solution: (5 + 0.10 x 100) x 0.5 ns = 15 x 0.5 ns = 7.5 ns
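A worked version of this example as a short C program (the numbers are the slide's; the variable names are illustrative):

    #include <stdio.h>

    int main(void)
    {
        double t_cache   = 5.0;     /* cache access latency, cycles     */
        double miss_rate = 0.10;    /* 90% hit rate -> 10% miss rate    */
        double t_penalty = 100.0;   /* miss penalty, cycles             */
        double cycle_ns  = 0.5;     /* 1 / 2 GHz                        */

        double amat = t_cache + miss_rate * t_penalty;   /* = 15 cycles */
        printf("AMAT = %.1f cycles = %.2f ns\n", amat, amat * cycle_ns);
        return 0;
    }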

25 Cache miss impact on memory access time: using a multi-level cache
Adding additional cache levels decreases the average access time. For an n-level cache hierarchy:

  T_MEM_ACCESS = T_L1_LATENCY + MR_L1 x T_L2_ACCESS
  T_L2_ACCESS  = T_L2_LATENCY + MR_L2 x T_L3_ACCESS
  ...
  T_Ln_ACCESS  = T_Ln_LATENCY + MR_Ln x T_MEM

For a 3-level cache hierarchy:

  T_ACCESS = T_L1_LAT + MR_L1 x (T_L2_LAT + MR_L2 x (T_L3_LAT + MR_L3 x T_MEM))

Example: what is the average memory access time considering:
- A processor running at 2 GHz
- An L1 cache with a latency of 3 clock cycles and an average hit rate of 80%
- An L2 cache with a latency of 10 clock cycles and an average hit rate of 60%
- An L3 cache with a latency of 20 clock cycles and an average hit rate of 50%
- An average memory access latency of 100 clock cycles
Solution: (3 + 0.2 x (10 + 0.4 x (20 + 0.5 x 100))) x 0.5 ns = 10.6 x 0.5 ns = 5.3 ns
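The same 3-level example computed bottom-up in C, following the recurrence above (each level's access time is its latency plus its miss rate times the next level's access time):

    #include <stdio.h>

    int main(void)
    {
        double t_mem = 100.0;
        double t_l3  = 20.0 + 0.5 * t_mem;   /* 50% L3 hit rate -> 70.0  */
        double t_l2  = 10.0 + 0.4 * t_l3;    /* 60% L2 hit rate -> 38.0  */
        double t_l1  =  3.0 + 0.2 * t_l2;    /* 80% L1 hit rate -> 10.6  */

        printf("AMAT = %.1f cycles = %.2f ns\n", t_l1, t_l1 * 0.5);
        return 0;
    }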

26 Cache miss impact on memory access time: using a multi-level cache
To reduce the memory access time:
- Decrease the L1 cache size to decrease the initial latency (even though it increases the L1 miss rate)
- Add a second-level (L2) cache, where:
  - L2 memory size >> L1 memory size
  - L2 associativity (#ways) > L1 associativity
  - L2 access time > L1 access time (typically around 3 times higher)
  - Global L2 miss rate < L1 miss rate

  Global L2 Miss Rate = #L2 misses / #memory accesses
                      = (#memory accesses x MR_L1 x MR_L2) / #memory accesses
                      = MR_L1 x MR_L2

27 Multi-level cache organization
If a block of data is in L1, should it also be in L2?
- Multi-level inclusion: a block in L1 is always in L2
  - Whenever an L1 miss occurs, the line (block) is loaded from L2 into L1
  - Worthwhile when the L2 cache is much larger than the L1 cache
- Multi-level exclusion: a block in L1 is never in L2
  - Whenever an L1 miss occurs, the block is swapped between L2 and L1; the swapped-out block is placed in L2
  - Requires the L1 and L2 line (block) sizes to be equal
  - Worthwhile when the L2 cache is not much larger than the L1 cache

28 Multi-level cache organization: instruction and data caches
Separating the data and instruction caches:
- The processor can simultaneously access the L1 instruction cache to fetch an instruction and the L1 data cache for a load/store operation
  - Avoids structural conflicts
- The data and instruction caches can be individually optimized
  - Capacity (memory size), block size, associativity (#ways)
- Solves cache miss conflicts: an instruction word can never occupy a data line
- Increases capacity conflicts: each cache has a fixed size

29 Hardware optimizations

30 Cache optimizations
- Increased block size
  - Decreases compulsory misses
  - Increases capacity and conflict misses; increases the miss penalty
- Increased associativity
  - Reduces conflict misses
  - Increases the hit time and power consumption
- Increased total cache size
  - Increases the cache access latency and power consumption
- Higher number of cache levels
  - Reduces the average memory access time

31 Cache optimizations
- Give priority to reads over writes
  - Reduces the miss penalty
- Pipelined cache access
  - Increases the overall cache bandwidth
  - Makes it easier to increase associativity
- Multi-banked caches
  - Use multiple (interleaved) memory banks to store the data
  - Allow multiple simultaneous cache accesses

32 Cache optimizations
- Way prediction
  - Use extra control bits to predict which way holds the required address
  - Reduces the cache latency, but requires data invalidation whenever the prediction is wrong
  - Used in the ARM Cortex-A8
- Data prefetching
  - Detect patterns in the memory accesses to predict the address of the next load instruction
  - Useful in stream-like applications
  - Allows hiding the memory access latency
  - Current Intel architectures use multiple data prefetchers

33 Software optimizations

34 Software optimizations: instruction cache
Techniques for reducing instruction cache misses: instruction re-ordering to reduce the miss rate [McFarling89]:
- For a direct mapping instruction cache of 2 kB with lines of 4 B, reduces the miss rate by 50%
- For a direct mapping instruction cache of 8 kB with lines of 4 B, reduces the miss rate by 75%
Simple examples:
- Align sub-routines to the beginning of a cache line
- Apply loop unrolling only as long as the instructions fit in the cache
Modern processors include special registers (typically referred to as performance counters) that can count the number of cache misses; other events can also be measured (e.g., the number of FP operations).

35 Software optimizations: data cache
- Array composition
  - Join arrays in memory such that memory is accessed sequentially
  - Exploits spatial locality
- Loop interchange (see the sketch below)
  - Swap nested loops to access memory in sequential order
  - Typically used when performing vector/matrix operations
  - Exploits spatial locality
- Blocking (see the sketch below)
  - Instead of accessing entire rows or columns, subdivide the matrices into blocks
  - Requires more memory accesses, but improves the locality of the accesses
  - Exploits both spatial and temporal locality
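A C sketch of the last two techniques; the matrix size N and block size B are illustrative assumptions (N is chosen divisible by B to keep the tiling loops simple):

    #define N 1024
    #define B 64

    static double a[N][N];

    /* Loop interchange: C stores arrays row-major, so the second version
     * walks memory with unit stride and exploits spatial locality. */
    double column_order_sum(void)     /* poor locality: stride-N accesses   */
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    double row_order_sum(void)        /* good locality: unit-stride accesses */
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Blocking: operate on B x B tiles so each tile stays cache-resident
     * while it is being reused. */
    void blocked_transpose(double dst[N][N], double src[N][N])
    {
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        dst[j][i] = src[i][j];
    }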

36 Software optimizations: data cache
Data pre-fetching:
- Register prefetching: insert load instructions before the data is actually used (requires extra registers)
- Cache prefetching: use special instructions to load data into the cache (see the sketch below)
These techniques are usually combined with loop unrolling and software pipelining.
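As an example of cache prefetching, the GCC/Clang __builtin_prefetch intrinsic emits such a special prefetch instruction where the target supports one; the prefetch distance (16 elements ahead) is a tuning assumption for this sketch:

    void scale(const double *x, double *y, long n, double k)
    {
        for (long i = 0; i < n; i++) {
            if (i + 16 < n)
                /* rw=0 (prefetch for read), locality=1 (low temporal reuse) */
                __builtin_prefetch(&x[i + 16], 0, 1);
            y[i] = k * x[i];
        }
    }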

37 Next lesson
More on memory systems: virtual memory
