Locality. Cache. Direct Mapped Cache.

Locality

- A principle that makes having a memory hierarchy a good idea.
- If an item is referenced:
  - temporal locality: it will tend to be referenced again soon
  - spatial locality: nearby items will tend to be referenced soon
- Why does code have locality? (See the traversal sketch below for one answer.)
- Our initial focus: two levels (upper, lower)
  - block: the minimum unit of data transferred between levels
  - hit: the data requested is in the upper level
  - miss: the data requested is not in the upper level

Cache

- Two issues:
  - How do we know if a data item is in the cache?
  - If it is, how do we find it?
- Our first example: block size is one word of data, "direct mapped"
- For each item of data at the lower level, there is exactly one location in the cache where it might be; i.e., lots of items at the lower level share locations in the upper level.

Direct Mapped Cache

- Mapping: the address is modulo the number of blocks in the cache
- For MIPS: [Figure: a 32-bit address (showing bit positions) split into a tag, a cache index, and a 2-bit byte offset; each entry holds a valid bit, a tag, and one word of data, and comparing the stored tag against the address tag produces the hit signal]
- What kind of locality are we taking advantage of?
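To make the mapping concrete, here is a minimal C sketch of how such a cache splits an address. The 1024-block geometry and the function name decompose are illustrative assumptions, not from the slides:

    #include <stdio.h>

    /* Illustrative geometry (not from the slides): 1024 one-word blocks. */
    #define NUM_BLOCKS 1024u

    /* Split a 32-bit byte address the way a direct-mapped cache with
       one-word (4-byte) blocks would: byte offset, index, tag. */
    static void decompose(unsigned addr)
    {
        unsigned byte_offset = addr & 0x3u;        /* byte within the word */
        unsigned block_addr  = addr >> 2;          /* word (= block) address */
        unsigned index = block_addr % NUM_BLOCKS;  /* "address modulo #blocks" */
        unsigned tag   = block_addr / NUM_BLOCKS;  /* remaining upper bits */
        printf("addr=0x%08x  tag=0x%x  index=%u  offset=%u\n",
               addr, tag, index, byte_offset);
    }

    int main(void)
    {
        decompose(0x00000000);
        decompose(0x00001000);   /* same index, different tag: would evict */
        return 0;
    }

Because the two example addresses share an index but differ in tag, only one can be cached at a time; that is the conflict behavior direct mapping accepts in exchange for a single-location lookup.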

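As for why code has locality: loops reuse the same few variables over and over (temporal locality) and typically step through arrays one element at a time (spatial locality). A small C sketch; the array size is arbitrary:

    #include <stdio.h>

    #define N 256
    static int a[N][N];   /* rows are contiguous in memory in C */

    int main(void)
    {
        long sum = 0;
        /* Row-major walk: consecutive addresses, so each fetched cache
           block is fully used (spatial locality); sum, i, and j are
           reused on every iteration (temporal locality). */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        /* Column-major walk: N * sizeof(int) bytes between accesses,
           so most of each fetched block is wasted. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        printf("%ld\n", sum);
        return 0;
    }

Timing the two loops on a real machine (with a large enough N) typically shows the column-major walk running several times slower, purely because of the memory hierarchy.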
Direct Mapped Cache

- Taking advantage of spatial locality: [Figure: a direct-mapped cache with four-word (128-bit) blocks and 256 entries; the address (showing bit positions) splits into a tag, an index, a block offset, and a byte offset, and a multiplexor selects the requested word from the block]

Hits vs. Misses

- Read hits: this is what we want!
- Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart
- Write hits:
  - can replace the data in cache and memory (write-through)
  - write the data only into the cache (write-back the cache later)
- Write misses: read the entire block into the cache, then write the word

Hardware Issues

- Make reading multiple words easier by using banks of memory: [Figure: a. one-word-wide memory organization, b. wide memory organization, c. interleaved memory organization] (see the bank-mapping sketch below)
- It can get a lot more complicated...

Performance

- Increasing the block size tends to decrease the miss rate: [Figure: miss rate (0% to 40%) versus block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB]
- Use split caches because there is more spatial locality in code:

    Program   Block size     Instruction   Data        Effective combined
              (words)        miss rate     miss rate   miss rate
    gcc       1              6.1%          2.1%        5.4%
    gcc       4              2.0%          1.7%        1.9%
    spice     1              1.2%          1.3%        1.2%
    spice     4              0.3%          0.6%        0.4%
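The block-size trend can be reproduced with a toy model. This sketch (the 1 KB cache, the trace, and the function name are invented for the demo) counts read misses for a sequential word-by-word scan under two block sizes:

    #include <string.h>
    #include <stdio.h>

    /* Toy direct-mapped cache: count read misses for a sequential scan.
       Illustrative geometry only. */
    #define CACHE_BYTES 1024u

    static unsigned misses(unsigned block_bytes, unsigned trace_bytes)
    {
        unsigned num_blocks = CACHE_BYTES / block_bytes;
        unsigned tags[256];               /* big enough for 4-byte blocks */
        unsigned char valid[256];
        unsigned miss = 0;
        memset(valid, 0, sizeof valid);
        for (unsigned addr = 0; addr < trace_bytes; addr += 4) {  /* word loads */
            unsigned block = addr / block_bytes;
            unsigned index = block % num_blocks;
            unsigned tag   = block / num_blocks;
            if (!valid[index] || tags[index] != tag) {   /* miss: fill the block */
                valid[index] = 1;
                tags[index]  = tag;
                miss++;
            }
        }
        return miss;
    }

    int main(void)
    {
        printf("4-byte blocks:  %u misses\n", misses(4, 4096));   /* 1024 */
        printf("16-byte blocks: %u misses\n", misses(16, 4096));  /*  256 */
        return 0;
    }

With one-word blocks every load misses; with four-word blocks only one load in four does, because the spatial locality of the scan pays for the larger fill.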

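Returning to the interleaved organization under Hardware Issues: a sketch of the bank mapping, assuming four banks. Consecutive words fall in different banks, so the words of one block can be fetched in parallel:

    #include <stdio.h>

    /* Interleaved memory sketch: bank = word address mod #banks, so a
       four-word block fill can overlap the latency of all four banks.
       The bank count and block address are illustrative. */
    #define NUM_BANKS 4u

    int main(void)
    {
        unsigned block_start = 8;   /* word address of some block */
        for (unsigned w = 0; w < 4; w++) {
            unsigned word = block_start + w;
            printf("word %2u -> bank %u, row %u\n",
                   word, word % NUM_BANKS, word / NUM_BANKS);
        }
        return 0;
    }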
Performance

- Simplified model:
  execution time = (execution cycles + stall cycles) x cycle time
  stall cycles = # of instructions x miss ratio x miss penalty
- Two ways of improving performance:
  - decreasing the miss ratio
  - decreasing the miss penalty
- What happens if we increase block size?

Decreasing miss ratio with associativity

- [Figure: an eight-block cache organized as one-way set associative (direct mapped), two-way set associative, four-way set associative, and eight-way set associative (fully associative); blocks 0-7 grouped into sets]
- Compared to direct mapped, give a series of references that:
  - results in a lower miss ratio using a 2-way set associative cache
  - results in a higher miss ratio using a 2-way set associative cache
  assuming we use the "least recently used" replacement strategy (one possible answer is sketched below)

An implementation

- [Figure: a four-way set associative cache with 256 sets; the index selects a set, the four tags are compared in parallel, and a 4-to-1 multiplexor selects the data from the matching way]

Performance

- [Figure: miss rate (0% to 15%) versus associativity (one-way to eight-way) for cache sizes from 1 KB to 128 KB]
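One possible answer to the exercise, checked with a toy 4-block simulator. The traces and the word-address granularity are illustrative choices, not from the slides:

    #include <stdio.h>

    /* Count misses for a word-address trace on a 4-block cache organized
       as `ways`-way set associative with LRU replacement; ways = 1 gives
       the direct-mapped case. A sketch, sized only for this exercise. */
    static int misses(const int *trace, int n, int ways)
    {
        int sets = 4 / ways;
        int tag[4][4];   /* [set][way]; -1 = invalid */
        int age[4][4];   /* larger = more recently used */
        int clock = 0, miss = 0;
        for (int s = 0; s < sets; s++)
            for (int w = 0; w < ways; w++)
                tag[s][w] = -1, age[s][w] = 0;
        for (int i = 0; i < n; i++) {
            int set = trace[i] % sets, hit = 0, victim = 0;
            for (int w = 0; w < ways; w++)
                if (tag[set][w] == trace[i]) { hit = 1; age[set][w] = ++clock; }
            if (!hit) {
                for (int w = 1; w < ways; w++)      /* find least recently used */
                    if (age[set][w] < age[set][victim]) victim = w;
                tag[set][victim] = trace[i];
                age[set][victim] = ++clock;
                miss++;
            }
        }
        return miss;
    }

    int main(void)
    {
        int a[] = {0, 4, 0, 4, 0, 4, 0, 4};            /* 2-way wins */
        int b[] = {0, 2, 4, 0, 2, 4, 0, 2, 4};         /* direct mapped wins */
        printf("trace a: DM=%d misses, 2-way=%d misses\n",
               misses(a, 8, 1), misses(a, 8, 2));      /* 8 vs 2 */
        printf("trace b: DM=%d misses, 2-way=%d misses\n",
               misses(b, 9, 1), misses(b, 9, 2));      /* 7 vs 9 */
        return 0;
    }

Trace a (0, 4, 0, 4, ...) thrashes the single direct-mapped block both words map to, but the two words fit together in one 2-way set. In trace b, three words share one 2-way set and LRU always evicts the word needed next, while direct mapping keeps word 2 resident.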

Decreasing miss penalty with multilevel caches

- Add a second level cache:
  - often the primary cache is on the same chip as the processor
  - use SRAMs to add another cache above primary memory (DRAM)
  - the miss penalty goes down if the data is in the 2nd level cache
- Example: CPI of 1.0 on a 5 GHz machine with a 5% miss rate and 100 ns DRAM access. Adding a 2nd level cache with a 5 ns access time decreases the miss rate to 0.5%. (Worked out below.)
- Using multilevel caches:
  - try to optimize the hit time on the 1st level cache
  - try to optimize the miss rate on the 2nd level cache

Complexities

- Not always easy to understand the implications of caches. Here is why: [Figure: theoretical versus observed behavior of radix sort vs. quicksort, for 4 to 4096 K items to sort]
- Memory system performance is often the critical factor:
  - multilevel caches and pipelined processors make it harder to predict outcomes
  - compiler optimizations to increase locality sometimes hurt ILP
- Difficult to predict the best algorithm: need experimental data

Virtual Memory

- Main memory can act as a cache for the secondary storage (disk): [Figure: virtual addresses pass through address translation to physical addresses in main memory or to disk addresses]
- Advantages:
  - illusion of having more physical memory
  - program relocation
  - protection
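One way to work the example (a sketch: it assumes the 5% and 0.5% are misses per instruction, and that the 5 GHz clock gives a 0.2 ns cycle):

    \[
    \text{miss penalty}_{\text{DRAM}} = \frac{100\,\text{ns}}{0.2\,\text{ns/cycle}} = 500\ \text{cycles}
    \]
    \[
    \text{CPI}_{\text{no L2}} = 1.0 + 5\% \times 500 = 26
    \]
    \[
    \text{CPI}_{\text{with L2}} = 1.0 + 5\% \times \frac{5\,\text{ns}}{0.2\,\text{ns}} + 0.5\% \times 500
                               = 1.0 + 1.25 + 2.5 = 4.75
    \]

So under these assumptions the second-level cache speeds the machine up by a factor of about 26 / 4.75, or roughly 5.5.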

Pages: virtual memory blocks

- Page faults: the data is not in memory, so retrieve it from disk
  - huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
  - reducing page faults is important (LRU is worth the price)
  - can handle the faults in software instead of hardware
  - using write-through is too expensive, so we use write-back
- [Figure: a 32-bit virtual address (showing bit positions) splits into a virtual page number (bits 31-12) and a page offset (bits 11-0); translation replaces the virtual page number with a physical page number, and the page offset passes through unchanged into the physical address]
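A minimal C sketch of the translation in the figure, assuming 4 KB pages; the specific VPN-to-PPN mapping is made up, standing in for a real page-table lookup:

    #include <stdio.h>

    /* 4 KB pages: the low 12 bits pass through untranslated. The mapping
       below (VPN 0x00403 -> PPN 0x0002A) is a made-up stand-in for a
       page-table lookup. */
    #define PAGE_BITS 12u
    #define PAGE_MASK ((1u << PAGE_BITS) - 1u)

    int main(void)
    {
        unsigned vaddr  = 0x00403A70;           /* example virtual address */
        unsigned vpn    = vaddr >> PAGE_BITS;   /* virtual page number: 0x00403 */
        unsigned offset = vaddr & PAGE_MASK;    /* page offset: 0xA70 */

        unsigned ppn   = 0x0002A;               /* pretend translation result */
        unsigned paddr = (ppn << PAGE_BITS) | offset;

        printf("vaddr=0x%08x -> vpn=0x%05x, offset=0x%03x -> paddr=0x%08x\n",
               vaddr, vpn, offset, paddr);
        return 0;
    }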