ECE260: Fundamentals of Computer Engineering

Basics of Cache Memory

James Moscola
Dept. of Engineering & Computer Science, York College of Pennsylvania

Based on Computer Organization and Design, 5th Edition, by Patterson & Hennessy

Cache Memory

Cache memory: a hardware (or software) component that stores data so that future requests for that data can be served faster [from Wikipedia].
- Data stored in a cache is typically the result of an earlier access or computation
- May be a duplicate of data stored elsewhere

Caches exploit the principle of locality:
- Keep often-used data in quickly accessible memory
- Keep data that may be required soon in quickly accessible memory

In hardware, cache memory is the level (or levels) of the memory hierarchy closest to the CPU:
- Sits between the processor and the main system memory (DRAM)
- Made from fast SRAM: faster, but with less capacity, than main system memory

Cache Analogy (used by the author of your textbook)

You are writing a paper for a history course while sitting at a table in the library. As you work, you realize you need a reference book from the library stacks. You stop writing, fetch the book from the stacks, and continue writing your paper. You don't immediately return the book; you might need it again soon (temporal locality). You might even need multiple chapters from the same book (spatial locality). After a short while, you have a few books on your table and can work smoothly without needing to fetch more books from the shelves. The library table is a cache for the rest of the library.

Now you switch to doing your biology homework. You need to fetch a new reference book for biology from the shelf. If your table is full, you must return a history book to the stacks to make room for the biology book.

Cache Memory on Processor Die (Quad-Core CPU)

[Figure: processor die for Intel's Sandy Bridge processor (circa 2011). Four cores, each with its own L1 caches (separate instruction and data memories) and a private L2 cache; an L3 cache shared between the cores; Intel's System Agent containing the memory controller, PCIe controller, media interface, and power control unit / thermal management; and integrated graphics. Main memory sits off-chip behind dual-channel memory I/O. The memory hierarchy on this die has 3 levels of cache.]

Cache in the Memory Hierarchy

We want to avoid reading data from slower main memory. When the processor needs to read data from a memory address:
- First, check cache memory
- If the data is present in cache memory (i.e., a cache hit), then done
- If the data is not present in cache (i.e., a cache miss), then retrieve it from main memory and copy it into the cache for use by the processor

If multiple levels of cache exist, check them in order: L1 -> L2 -> L3 -> Main Memory
- Only check L2 if the data is not in L1
- Only check L3 if the data is not in L2
- Only check main memory if the data is not in L3
- When the data is found, copy it to higher levels of the memory hierarchy (i.e., copy from L3 to L2 and L1)
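The multi-level lookup order above can be sketched with toy dictionaries standing in for cache levels (an illustrative sketch, not the slides' implementation; `read` and the dict-based caches are hypothetical):

```python
def read(addr, levels, main_memory):
    """Check cache levels in order (L1 -> L2 -> L3); on a hit at
    level i, refill all higher (faster) levels with the data.
    'levels' is a list of dicts acting as toy caches."""
    for i, cache in enumerate(levels):
        if addr in cache:                 # cache hit at level i
            data = cache[addr]
            for higher in levels[:i]:     # copy up to L1..L(i-1)
                higher[addr] = data
            return data
    data = main_memory[addr]              # miss in every level
    for cache in levels:                  # fill all levels on the way back
        cache[addr] = data
    return data

l1, l2, l3 = {}, {}, {}
mem = {100: "value"}
print(read(100, [l1, l2, l3], mem))  # miss everywhere, fetched from memory
print(100 in l1)                     # True: now cached in L1
```

Real caches have fixed capacity and eviction, which this sketch omits; it shows only the check-in-order / copy-upward flow.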

Main Memory in a Computer

To understand caches, first see how main memory is handled. Given a processor where memory is addressed with a d-bit address:
- Main memory ranges from address 0 to 2^d - 1
- Main memory is divided into blocks; blocks are all equally sized
- Each block contains one or more data words (i.e., each block consists of one or more memory addresses)
- Just as each memory address has a number, each block can also be numbered

How many blocks total?

  # Blocks = Total Memory Size / Block Size

Example: 1024 total memory words with 8 words per block:

  # Blocks = 1024 Words / (8 Words/Block) = 128 Blocks

Finding the Right Block in Main Memory

Suppose a memory address is generated (the address is between 0 and 2^d - 1). In which block is that memory address located?
- To calculate the block address, divide the memory address by the block size; the result is the block address
- With a block size that is a power of 2, simply shift the memory address to find the block address:
  - Assume the block size is 2^b (and b < d)
  - Remove the least significant b bits (these are called the offset)
  - The remaining d - b bits are the block address

Calculating a block address:

  Block Address = Memory Address / Block Size

Example (word address 31, 8 words per block):

  Block # = 31 / 8 = 3

Example using binary and shifting: 31 = 11111 in binary, and 8 Words/Block = 2^3, so to divide by 8, shift right by 3:

  SRL(11111) = 11 in binary = Block #3
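The divide-by-shifting trick can be sketched in Python (a hypothetical helper, not from the slides):

```python
def block_address(addr: int, block_size: int) -> int:
    """Compute the block address of a memory address.

    block_size must be a power of two (2**b), so dividing by it
    is the same as shifting right by b bits (dropping the offset).
    """
    assert block_size & (block_size - 1) == 0, "block size must be a power of 2"
    b = block_size.bit_length() - 1   # b = log2(block_size)
    return addr >> b                  # remove the b least significant bits

# Slide's example: word address 31 with 8 words per block
print(block_address(31, 8))  # 3  (0b11111 >> 3 == 0b11)
```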

Cache Memory in a Computer

- Cache memory is divided into blocks that are exactly the same size as the main memory blocks
- The number of blocks in the cache is much less than the total number of blocks in main memory
- The cache contains only a subset of the blocks from main memory; it is not large enough to store all blocks
- Many different caching techniques exist; we will focus on direct-mapped caching
- To calculate the cache block index: (Block address) modulo (# Cache blocks)

Direct-Mapped Cache

- Direct-mapped cache: a cache structure in which each memory location is mapped to exactly one location in the cache
- The number of cache blocks is a power of 2, and the block size is a power of 2
- The location in the cache is determined by the block address:

  Cache Index = (Block address) modulo (# Cache blocks)

- Example: with a 2-block cache, main memory block 5 maps to Cache Index = 5 modulo 2 = 1
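Direct-mapped placement is just a modulo (equivalently, grabbing the low-order index bits). A one-line sketch (hypothetical helper name):

```python
def cache_index(block_addr: int, num_cache_blocks: int) -> int:
    """Direct-mapped placement: each block maps to exactly one cache slot.
    With a power-of-two block count, the modulo is just the low-order bits."""
    return block_addr % num_cache_blocks

# Slide's example: block 5 in a 2-block cache maps to index 1
print(cache_index(5, 2))  # 1
```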

Address Subdivision

To utilize the cache, the memory byte address requested by the processor is subdivided into multiple fields (from bit d-1 down to bit 0): | Tag | Index | Word Offset | Byte Offset |
- Byte offset: specifies a byte within a word; the least significant two bits, typically ignored when caching
- Word offset (if block size > 1 word): specifies a word within a block
- Cache block index: specifies which cache block a memory address should be stored in
- Tag: many blocks map to the same index of cache memory; the tag bits uniquely identify each block (really just the high-order bits of the memory address)

Cache Structure

Each cache entry contains:
- One or more data words: determined by the cache designer (the example shows only a single 32-bit word)
- A tag field: differentiates blocks that map to the same block index in cache memory
- A valid bit: initialized to 0, set to 1 when the cache entry is set (cache blocks are initially empty and invalid)

Address Subdivision / Cache Structure Example

For a 32-bit memory address requested by the processor, with one 32-bit word per cache entry:
- Byte offset => bits 1:0 (2 bits): used to address bytes within a word
- Word offset => not used in this example, since there is only one 32-bit word per cache entry
- Cache index => bits 11:2 (10 bits): the location in the cache where this memory address should be stored
- Tag => bits 31:12 (20 bits): the remaining bits, used to differentiate blocks
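The field extraction for this layout can be sketched with shifts and masks (a hypothetical helper using the slide's bit positions: 20 tag bits, 10 index bits, 2 byte-offset bits):

```python
def split_address(addr: int):
    """Split a 32-bit byte address per the example layout:
    tag = bits 31:12, index = bits 11:2, byte offset = bits 1:0."""
    byte_offset = addr & 0x3          # bits 1:0
    index = (addr >> 2) & 0x3FF       # bits 11:2 (10 bits)
    tag = addr >> 12                  # bits 31:12 (20 bits)
    return tag, index, byte_offset

# e.g. address 0x3004: tag 0x3, index 1, byte offset 0
print(split_address(0x3004))  # (3, 1, 0)
```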

Handling Cache Misses

- On a cache hit, the CPU proceeds normally
- On a cache miss, the CPU cannot proceed normally because the data is not available:
  - Stall the CPU pipeline
  - Fetch the desired block from the next level of the memory hierarchy
  - If the cache miss was in the instruction cache, restart the instruction fetch
  - If the cache miss was in the data cache, complete the data access

Direct-Mapped Cache Example: Parameters

Example of a sequence of memory references to a direct-mapped cache using word addressing:
- The cache has 8 blocks with only 1 word per block
- The cache is initially empty: all valid bits are initialized to 0 to indicate that each cache block is invalid
- Memory addresses are 8-bit word addresses (no byte offset bits)
- Total memory space is 2^8 words of memory; we cannot fit them all in the cache

First, compute the various field sizes:

  # Word Offset Bits = log2(# Words/Block) = log2(1 Word/Block) = 0 bits
  # Cache Index Bits = log2(# Cache Blocks) = log2(8 Blocks) = 3 bits
  # Tag Bits = # Addr Bits - # Index Bits - # Word Offset Bits = 8 - 3 - 0 = 5 bits
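The field-size arithmetic above can be captured in a small helper (hypothetical, for checking the slides' parameters):

```python
from math import log2

def field_sizes(addr_bits, num_blocks, words_per_block, byte_offset_bits=0):
    """Return (tag, index, word offset) bit counts for a direct-mapped cache.
    Block count and words/block are assumed to be powers of two."""
    word_offset = int(log2(words_per_block))
    index = int(log2(num_blocks))
    tag = addr_bits - index - word_offset - byte_offset_bits
    return tag, index, word_offset

# This example: 8-bit word addresses, 8 blocks, 1 word/block
print(field_sizes(8, 8, 1))        # (5, 3, 0)
# Example #2 later: 8-bit byte addresses, 4 blocks, 4 words/block, 2 byte-offset bits
print(field_sizes(8, 4, 4, 2))     # (2, 2, 2)
```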

Cache Example: Cache Structure

(# Word Offset Bits = 0, # Cache Index Bits = 3, # Tag Bits = 5)

- Block index numbers range from 0 to (# Blocks - 1)
- Valid bits are all initially 0 to indicate invalid entries
- Tag bits are used to differentiate blocks that map to the same index
- The Data column holds the data from main memory that is being cached
- The cache is structured as a table indexed by block index number; for this example, block index numbers are shown in both binary and base-10

  Index    V  Tag    Data
  000 (0)  0
  001 (1)  0
  010 (2)  0
  011 (3)  0
  100 (4)  0
  101 (5)  0
  110 (6)  0
  111 (7)  0

Cache Example: First Memory Access

Access memory address 22 (a word address):

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  22         00010110            00010  110 (6)        Miss

Block index 6 is invalid before the access, so this is a cache miss. Copy the data from main system memory into the cache, set the tag and valid bit, and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index    V  Tag    Data
  000 (0)  0
  001 (1)  0
  010 (2)  0
  011 (3)  0
  100 (4)  0
  101 (5)  0
  110 (6)  1  00010  mem[22]
  111 (7)  0

Cache Example: Second Memory Access

Access memory address 26:

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  26         00011010            00011  010 (2)        Miss

Block index 2 is invalid before the access, so this is a cache miss. Copy the data from main system memory into the cache, set the tag and valid bit, and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index    V  Tag    Data
  000 (0)  0
  001 (1)  0
  010 (2)  1  00011  mem[26]
  011 (3)  0
  100 (4)  0
  101 (5)  0
  110 (6)  1  00010  mem[22]
  111 (7)  0

Cache Example: Third Memory Access

Access memory address 22 (again):

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  22         00010110            00010  110 (6)        Hit

Block index 6 is valid AND the tag in the cache matches the tag field of the requested memory address, so this is a cache hit. No need to access main system memory; provide the requested data to the processor. The cache contents are unchanged.

Cache Example: Fourth Memory Access

Access memory address 26 (again):

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  26         00011010            00011  010 (2)        Hit

Block index 2 is valid AND the tag in the cache matches the tag field of the requested memory address, so this is a cache hit. No need to access main system memory; provide the requested data to the processor. The cache contents are unchanged.

Cache Example: Fifth Memory Access

Access memory address 16:

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  16         00010000            00010  000 (0)        Miss

Block index 0 is invalid before the access, so this is a cache miss. Copy the data from main system memory into the cache, set the tag and valid bit, and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index    V  Tag    Data
  000 (0)  1  00010  mem[16]
  001 (1)  0
  010 (2)  1  00011  mem[26]
  011 (3)  0
  100 (4)  0
  101 (5)  0
  110 (6)  1  00010  mem[22]
  111 (7)  0

Cache Example: Sixth Memory Access

Access memory address 18:

  Word Addr  Word Addr (binary)  Tag    Cache Index #  Hit/Miss
  18         00010010            00010  010 (2)        Miss

Block index 2 is valid; HOWEVER, the tag in the cache DOES NOT MATCH the tag field of the requested memory address, so this is a cache miss. Copy the data from main system memory into the cache, OVERWRITING the previous contents, and set the new tag value. Resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index    V  Tag    Data
  000 (0)  1  00010  mem[16]
  001 (1)  0
  010 (2)  1  00010  mem[18]
  011 (3)  0
  100 (4)  0
  101 (5)  0
  110 (6)  1  00010  mem[22]
  111 (7)  0
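The whole six-access sequence can be replayed with a few lines of Python (a sketch of the example's cache, not production code; hit/miss results match the walkthrough above):

```python
def simulate(addresses, num_blocks=8):
    """Direct-mapped cache simulator matching the word-addressed example:
    8 one-word blocks, so 3 index bits and the rest of the address is tag."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks     # low-order 3 bits
        tag = addr // num_blocks      # remaining high-order bits
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:                         # miss: fill the entry
            results.append("miss")
            valid[index], tags[index] = True, tag
    return results

print(simulate([22, 26, 22, 26, 16, 18]))
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss']
```

The final miss (address 18) is the tag-mismatch case: index 2 is valid but holds the block for address 26.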

Direct-Mapped Cache Example #2: Parameters

Example of a sequence of memory references to a direct-mapped cache using byte addressing:
- The cache has 4 blocks with 16 bytes per block (i.e., 4 words per block)
- The cache is initially empty: all valid bits are initialized to 0 to indicate that each cache block is invalid
- Memory addresses are 8-bit byte addresses; the two least significant bits specify the byte offset within a word (remove these bits)
- Total memory space is 2^8 bytes of memory; we cannot fit them all in the cache

First, compute the various field sizes:

  # Word Offset Bits = log2(# Words/Block) = log2(4 Words/Block) = 2 bits
  # Cache Index Bits = log2(# Cache Blocks) = log2(4 Blocks) = 2 bits
  # Tag Bits = # Addr Bits - # Index Bits - # Word Offset Bits - # Byte Offset Bits = 8 - 2 - 2 - 2 = 2 bits

Cache Example #2: Cache Structure

(# Word Offset Bits = 2, # Cache Index Bits = 2, # Tag Bits = 2)

This cache structure stores multiple data words per block. When one of the data words is accessed, the entire block is brought into the cache from main system memory.

  Index   V  Tag  Data (4 words): Offset 3 | Offset 2 | Offset 1 | Offset 0
  00 (0)  0
  01 (1)  0
  10 (2)  0
  11 (3)  0

Cache Example #2: First Memory Access

Access byte address 12 (a byte address, i.e., word address 3):

  Byte Addr  Byte Addr (binary)  Tag  Cache Index #  Word Offset  Hit/Miss
  12         00001100            00   00 (0)         11 (3)       Miss

Block index 0 is invalid before the access, so this is a cache miss. On a cache miss, find the block in which address 12 resides and copy the entire block from main system memory into the cache. Also set the tag and valid bit, and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index   V  Tag  Offset 3  Offset 2  Offset 1  Offset 0
  00 (0)  1  00   mem[12]   mem[8]    mem[4]    mem[0]
  01 (1)  0
  10 (2)  0
  11 (3)  0

Cache Example #2: Second Memory Access

Access byte address 104 (a byte address, i.e., word address 26):

  Byte Addr  Byte Addr (binary)  Tag  Cache Index #  Word Offset  Hit/Miss
  104        01101000            01   10 (2)         10 (2)       Miss

Block index 2 is invalid before the access, so this is a cache miss. On a cache miss, find the block in which address 104 resides and copy the entire block from main system memory into the cache. Also set the tag and valid bit, and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index   V  Tag  Offset 3  Offset 2  Offset 1  Offset 0
  00 (0)  1  00   mem[12]   mem[8]    mem[4]    mem[0]
  01 (1)  0
  10 (2)  1  01   mem[108]  mem[104]  mem[100]  mem[96]
  11 (3)  0

Cache Example #2: Third Memory Access

Access byte address 96 (a byte address, i.e., word address 24):

  Byte Addr  Byte Addr (binary)  Tag  Cache Index #  Word Offset  Hit/Miss
  96         01100000            01   10 (2)         00 (0)       Hit

Block index 2 is valid AND the tag in the cache matches the tag field of the requested memory address, so this is a cache hit. No need to access main system memory; provide the requested data (at word offset 0 of the block) to the processor. The cache contents are unchanged.

Cache Example #2: Fourth Memory Access

Access byte address 172 (a byte address, i.e., word address 43):

  Byte Addr  Byte Addr (binary)  Tag  Cache Index #  Word Offset  Hit/Miss
  172        10101100            10   10 (2)         11 (3)       Miss

Block index 2 is valid; HOWEVER, the tag in the cache DOES NOT MATCH the tag field of the requested memory address, so this is a cache miss. On the miss, find the block in which address 172 resides and copy the entire block from main system memory into the cache, OVERWRITING the previous contents. Set the new tag value and resume the original instruction that needed the data.

  CACHE MEMORY (after access)
  Index   V  Tag  Offset 3  Offset 2  Offset 1  Offset 0
  00 (0)  1  00   mem[12]   mem[8]    mem[4]    mem[0]
  01 (1)  0
  10 (2)  1  10   mem[172]  mem[168]  mem[164]  mem[160]
  11 (3)  0
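Example #2 can be replayed the same way, this time dividing byte addresses by the block size first (a sketch under the example's parameters: 4 blocks of 16 bytes):

```python
def simulate_bytes(addresses, num_blocks=4, block_size=16):
    """Direct-mapped simulator for byte addressing with multi-word blocks:
    block address = byte address // block size, then index = block % #blocks."""
    tags = [None] * num_blocks        # None doubles as the invalid state
    results = []
    for addr in addresses:
        block = addr // block_size    # which main-memory block
        index = block % num_blocks    # which cache slot
        tag = block // num_blocks     # remaining high-order bits
        if tags[index] == tag:
            results.append("hit")
        else:                         # miss: the whole block is brought in
            results.append("miss")
            tags[index] = tag
    return results

print(simulate_bytes([12, 104, 96, 172]))
# ['miss', 'miss', 'hit', 'miss']
```

Address 96 hits because the earlier miss on 104 loaded the whole 16-byte block (bytes 96 to 111); address 172 then evicts that block on a tag mismatch.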

Calculating Block Address and Block Index Numbers

Using the cache parameters from the previous example (4 blocks with 16 bytes per block) and example byte address 104:

The block address references the address of a block in main system memory:

  Block Address = Memory Byte Address / Bytes per Block = 104 / 16 = 6.5 -> 6 (integer division)

(Or JUST SHIFT: remove the low-order bits.)

The block index number references the index location into which the block should be stored:

  Block Index Number = Block Address modulo # Cache Blocks = 6 modulo 4 = 2

(Or JUST grab the low-order bits.)

Block Size Considerations

Larger block sizes should reduce the miss rate, due to spatial locality. However, in a fixed-size cache:
- Larger blocks mean there are fewer of them
- More competition for blocks means an increased miss rate: a block may be overwritten in the cache before the program is done with it
- A larger block size also means a larger miss penalty: when a miss occurs, more data must be copied from main system memory
- This can outweigh the benefit of the reduced miss rate

Block Size Considerations (continued)

[Figure: miss rate vs. block size. Each line represents a different cache size. Larger caches have a smaller miss rate; making blocks too large can increase the miss rate.]

Handling Writes: Write-Through

On a data-write hit, we could just update the block in the cache and not the main system memory; however, the cache and main system memory would then be inconsistent. In a write-through scheme, when the cache is written, the data is also written to main system memory. This may negatively impact performance:
- Each write to main system memory takes time
- It doesn't take advantage of burst write mode
- Data may be updated repeatedly; there is no need to write transient data to main memory (e.g., a loop variable)

Various methods exist to help alleviate the performance issues caused by the write-through scheme; however, better schemes exist.

Handling Writes: Write-Back

An alternative to the write-through scheme:
- On a data-write hit, just update the block in the cache, not the main system memory
- Keep track of whether each block is dirty: dirty blocks are blocks that have been modified but not yet written back to main system memory
- Any time a dirty block is replaced, write it back to main system memory before overwriting it with the new block
- A write buffer can be used to allow the replacing block to be read first
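The dirty-bit bookkeeping can be sketched as follows (a toy write-back cache under simplifying assumptions: one word per block, direct-mapped, writes only; the class and its names are hypothetical):

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache (one word per block).
    Writes stay in the cache until a dirty block is evicted."""

    def __init__(self, num_blocks, memory):
        self.n = num_blocks
        self.memory = memory                # backing store (a list of words)
        self.lines = [None] * num_blocks    # each entry: [tag, data, dirty]

    def write(self, addr, value):
        index, tag = addr % self.n, addr // self.n
        line = self.lines[index]
        if line is not None and line[0] != tag and line[2]:
            # Evicting a dirty block: flush it to memory first
            old_tag, old_data, _ = line
            self.memory[old_tag * self.n + index] = old_data
        self.lines[index] = [tag, value, True]   # update in cache, mark dirty

mem = list(range(16))
cache = WriteBackCache(4, mem)
cache.write(5, 99)     # held in cache only; mem[5] still stale
cache.write(13, 77)    # maps to the same index (1): flushes 99 to mem[5]
print(mem[5])          # 99
```

The sketch omits the read path and the write buffer mentioned above; it shows only the core rule that a dirty block is written back to memory at eviction time, not at write time.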