ECE232: Hardware Organization and Design


ECE232: Hardware Organization and Design
Lecture 22: Introduction to Caches
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Overview
Caches hold a subset of the data in main memory. There are three types of caches:
- Direct mapped
- Set associative
- Fully associative
Today: direct mapped. Each memory value can be in only one place in the cache. Either it is there (a hit) or it is not (a miss).

Direct Mapped Cache
The cache location is determined by the address. Direct mapped: there is only one choice:
  index = (block address) modulo (#blocks in cache)
Because #blocks is a power of 2, the index is simply the low-order address bits.
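
As a concrete illustration, here is a minimal C sketch, assuming a hypothetical 4-block cache (matching the figure on the next slide), showing that for a power-of-2 block count the modulo reduces to masking the low-order bits:

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_BLOCKS 4u   /* must be a power of 2 */

int main(void) {
    for (uint32_t block_addr = 0; block_addr < 16; block_addr++) {
        uint32_t index  = block_addr % NUM_BLOCKS;        /* textbook formula     */
        uint32_t masked = block_addr & (NUM_BLOCKS - 1);  /* low-order bits: same */
        printf("memory block %2u -> cache index %u (mask: %u)\n",
               block_addr, index, masked);
    }
    return 0;
}
```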

Direct Mapped Cache (assume 1 byte/block)
In a 4-block direct-mapped cache:
- Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12
- Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
- Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
- Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15
[Figure: memory blocks 0-15 mapped onto cache indices 0-3; addresses 0000, 0100, 1000, 1100 (binary) all map to cache index 0.]

Direct Mapped Cache: Index and Tag
A memory block address divides into a tag and an index:
- The index determines the block in the cache: index = (address) mod (#blocks).
- Because the number of cache blocks is a power of 2, the cache index is the lower n bits of the memory address.
[Figure: 16 one-byte memory locations mapped onto cache indices 0-3; addresses 00 00, 01 00, 10 00, 11 00 (tag | index, binary) all share index 00.]

Direct Mapped Cache with Tag
The tag determines which memory block occupies a cache block:
- Hit: the cache tag field = the tag bits of the address.
- Miss: the cache tag field ≠ the tag bits of the address.
[Figure: memory block addresses 00 10, 01 10, 10 10, 11 10 (tag | index, binary) all map to cache index 10; the stored tag, here 11, identifies which of them is present.]

Direct Mapped Cache
The simplest mapping is a direct-mapped cache: each memory address is associated with one possible block within the cache. Therefore, we only need to look in a single location in the cache to see whether the data is there.

Finding an Item within a Block
In reality, a cache block consists of a number of bytes or words, to (1) increase cache hits, thanks to the locality property, and (2) reduce the cache miss time. Given the address of an item, the index tells which block of the cache to look in. Then how do we find the requested item within the cache block? Equivalently: what is the byte offset of the item within the cache block?

Selecting Part of a Block (block size > 1 byte)
If the block size is greater than 1, the rightmost bits of what looked like the index are really the offset within the indexed block:
  TAG | INDEX | OFFSET
- Tag: used to check whether we have the correct block
- Index: selects a block in the cache
- Offset: selects the byte within the block
Example: with a block size of 8 bytes, select byte 4 (the 2nd word). Memory address 11 01 100 (binary): tag = 11, index = 01, byte offset = 100.
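
A small C sketch of this decomposition, using the slide's assumed geometry (4 blocks of 8 bytes, so 2 index bits and 3 offset bits) and its example address 11 01 100 in binary, which is 0x6C:

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 3u  /* 8-byte blocks         */
#define INDEX_BITS  2u  /* 4 blocks in the cache */

int main(void) {
    uint32_t addr = 0x6C;  /* 11 01 100 binary: tag 11, index 01, offset 100 */

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("tag=%u index=%u offset=%u\n", tag, index, offset);  /* tag=3 index=1 offset=4 */
    return 0;
}
```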

Accessing Data in a Direct Mapped Cache
Three types of events:
- Cache hit: the cache block is valid and contains the proper address, so read the desired word.
- Cache miss: nothing is in the cache at the appropriate block, so fetch the data from memory.
- Cache miss with block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.
Cache access procedure:
(1) Use the index bits to select the cache block.
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits.
(3) If they match, use the offset to read out the word or byte.
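
The three-step procedure can be written directly as a lookup function. This is a sketch under the same assumed geometry as above; handling the miss and replacement cases is left to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS  4u
#define BLOCK_SIZE  8u
#define OFFSET_BITS 3u
#define INDEX_BITS  2u

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct cache_line cache[NUM_BLOCKS];

/* Returns true on a hit and writes the byte to *out; on a miss
   (including the block-replacement case) it returns false. */
bool cache_read_byte(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    struct cache_line *line = &cache[index];   /* (1) select block by index */
    if (line->valid && line->tag == tag) {     /* (2) valid bit + tag match */
        *out = line->data[offset];             /* (3) read out via offset   */
        return true;
    }
    return false;
}

int main(void) {
    uint8_t b;
    /* Cold cache: every access misses until blocks are filled in. */
    return cache_read_byte(0x6C, &b) ? 0 : 1;
}
```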

Tags and Valid Bits
How do we know which particular block is stored in a cache location? Store the block address as well as the data. Actually, only the high-order bits are needed; they are called the tag.
What if there is no data in a location? A valid bit records this: 1 = present, 0 = not present. Initially 0.

Cache Example
8 blocks, 1 byte/block, direct mapped. Initial state:

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)
Address 18 maps to the same block as address 26, so the old contents are replaced:

Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
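
The whole access sequence above can be reproduced in a few lines of C; a sketch assuming the slide's geometry (8 blocks, 1 byte/block, 5-bit addresses):

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 8u   /* index = low 3 bits, tag = high 2 bits */

int main(void) {
    bool     valid[NUM_BLOCKS] = { false };
    uint32_t tag[NUM_BLOCKS]   = { 0 };
    uint32_t trace[] = { 22, 26, 22, 26, 16, 3, 16, 18 };

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        uint32_t addr  = trace[i];
        uint32_t index = addr & (NUM_BLOCKS - 1);
        uint32_t t     = addr >> 3;
        bool hit = valid[index] && tag[index] == t;
        printf("addr %2u -> block %u: %s\n", addr, index, hit ? "hit" : "miss");
        valid[index] = true;  /* on a miss, the block is fetched and the tag updated */
        tag[index]   = t;
    }
    return 0;
}
```

Running this prints miss, miss, hit, hit, miss, miss, hit, miss, matching the tables above.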

Example: Larger Block Size
64 blocks, 16 bytes/block. To what block number does address 1200 map?
- Block address = 1200 / 16 = 75
- Block number = 75 modulo 64 = 11
Address fields:
  Tag: bits 31-10 (22 bits) | Index: bits 9-4 (6 bits) | Offset: bits 3-0 (4 bits)
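
The same arithmetic as a C sketch:

```c
#include <stdio.h>

int main(void) {
    unsigned addr       = 1200;
    unsigned block_size = 16;   /* bytes */
    unsigned num_blocks = 64;

    unsigned block_addr = addr / block_size;       /* 1200 / 16 = 75 */
    unsigned index      = block_addr % num_blocks; /* 75 mod 64 = 11 */

    printf("address %u -> block address %u -> cache index %u\n",
           addr, block_addr, index);
    return 0;
}
```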

Block Size Considerations
Larger blocks should reduce the miss rate, due to spatial locality. But in a fixed-sized cache:
- Larger blocks → fewer of them → more competition → increased miss rate
- Larger blocks → pollution
Larger blocks also mean a larger miss penalty, which can override the benefit of the reduced miss rate. Early restart and critical-word-first can help.

Cache Misses
On a cache hit, the CPU proceeds normally. On a cache miss:
- Stall the CPU pipeline
- Fetch the block from the next level of the hierarchy
- On an instruction cache miss: restart the instruction fetch
- On a data cache miss: complete the data access

Write-Through
On a data-write hit, we could just update the block in the cache, but then the cache and memory would be inconsistent.
Write-through: also update memory. But this makes writes take longer. For example, if the base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
  Effective CPI = 1 + 0.1 × 100 = 11
Solution: a write buffer, which holds data waiting to be written to memory. The CPU continues immediately and only stalls on a write if the write buffer is already full.
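
The effective-CPI calculation written out in C (the 10% store mix and 100-cycle write are the slide's assumed numbers):

```c
#include <stdio.h>

int main(void) {
    double base_cpi       = 1.0;
    double store_fraction = 0.10;   /* 10% of instructions are stores */
    double write_cycles   = 100.0;  /* cycles per write to memory     */

    /* Without a write buffer, every store stalls for the full write. */
    double effective_cpi = base_cpi + store_fraction * write_cycles;
    printf("Effective CPI = %.1f\n", effective_cpi);  /* 11.0 */
    return 0;
}
```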

Write-Back
Alternative: on a data-write hit, just update the block in the cache and keep track of whether each block is dirty. When a dirty block is replaced, write it back to memory. A write buffer can be used so that the replacing block can be read first.
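
A minimal sketch of write-back replacement logic; the memory_read_block/memory_write_block helpers are hypothetical stand-ins, not from the slides:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 8u

struct cache_line {
    bool     valid;
    bool     dirty;  /* set on a write hit: block differs from memory */
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

/* Hypothetical helpers, assumed to exist elsewhere. */
void memory_write_block(uint32_t tag, uint32_t index, const uint8_t *data);
void memory_read_block(uint32_t tag, uint32_t index, uint8_t *data);

/* Replace the block at 'index' with the block tagged 'new_tag'. */
void replace_block(struct cache_line *line, uint32_t index, uint32_t new_tag) {
    if (line->valid && line->dirty)                        /* write back only */
        memory_write_block(line->tag, index, line->data);  /* dirty victims   */
    memory_read_block(new_tag, index, line->data);
    line->tag   = new_tag;
    line->valid = true;
    line->dirty = false;  /* freshly fetched block matches memory */
}
```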

Measuring Cache Performance
Components of CPU time:
- Program execution cycles (includes the cache hit time)
- Memory stall cycles (mainly from cache misses)
With simplifying assumptions:
  Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                      = (Instructions / Program) × (Misses / Instruction) × Miss penalty
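
A sketch plugging numbers into the formula; the instruction count, 1.3 accesses per instruction, 2% miss rate, and 100-cycle penalty below are illustrative assumptions, not values from the slide:

```c
#include <stdio.h>

int main(void) {
    /* All values are illustrative assumptions. */
    double instructions      = 1e6;
    double accesses_per_inst = 1.3;   /* 1 instruction fetch + 0.3 data accesses */
    double miss_rate         = 0.02;
    double miss_penalty      = 100.0; /* cycles */

    /* Memory stall cycles =
       (Memory accesses / Program) x Miss rate x Miss penalty */
    double stall_cycles =
        instructions * accesses_per_inst * miss_rate * miss_penalty;
    printf("Memory stall cycles = %.0f\n", stall_cycles);
    return 0;
}
```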

Average Access Time
Hit time is also important for performance. The average memory access time (AMAT) is:
  AMAT = Hit time + Miss rate × Miss penalty
Example: a CPU with a 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, and an I-cache miss rate of 5%:
  AMAT = 1 + 0.05 × 20 = 2 ns, i.e., 2 cycles per instruction
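
The slide's AMAT example as code:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;   /* ns (1 cycle at a 1 ns clock) */
    double miss_rate    = 0.05;  /* 5% I-cache miss rate         */
    double miss_penalty = 20.0;  /* cycles, i.e., ns here        */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f ns\n", amat);  /* 2.0 ns, i.e., 2 cycles */
    return 0;
}
```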

Summary
Today: direct mapped caches.
- Performance is tied to whether values are located in the cache; a cache miss means bad performance.
- You need to understand how to numerically determine system performance from the cache hit rate.
Why might direct mapped caches be bad? Lots of data can map to the same location in the cache. Idea: maybe we should have multiple locations for each data value.
Next time: set associative caches.