ECE331: Hardware Organization and Design

Ethan Chambers
6 years ago

1 ECE331: Hardware Organization and Design Lecture 23: Associative Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

2 Last time: Write-Back
- Alternative: on a data-write hit, just update the block in the cache
  - Keep track of whether each block is dirty
- When a dirty block is replaced
  - Write it back to memory
  - Can use a write buffer to allow the replacing block to be read first
ECE331: Associative Caches 2

3 Measuring Cache Performance
- Components of CPU time
  - Program execution cycles (includes cache hit time)
  - Memory stall cycles (mainly from cache misses)
- With simplifying assumptions:
  $$\text{Memory stall cycles} = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$$
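
The two forms of the stall-cycle relation can be checked against each other numerically. The following is a minimal sketch: the function name and every workload figure (instruction count, accesses per instruction, miss rate, miss penalty) are illustrative assumptions, not values from the lecture.

```python
def memory_stall_cycles(instructions, accesses_per_instruction, miss_rate, miss_penalty):
    """Memory stall cycles = memory accesses x miss rate x miss penalty."""
    memory_accesses = instructions * accesses_per_instruction
    return memory_accesses * miss_rate * miss_penalty


# Hypothetical workload (assumed numbers): 1,000,000 instructions,
# 1.5 memory accesses per instruction, 4% miss rate, 100-cycle miss penalty.
stalls = memory_stall_cycles(1_000_000, 1.5, 0.04, 100)

# Second form: instructions x (misses per instruction) x miss penalty.
misses_per_instruction = 1.5 * 0.04
same_stalls = 1_000_000 * misses_per_instruction * 100

print(stalls, same_stalls)
```

Both expressions count the same misses, so they agree up to floating-point rounding.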

4 Average Access Time
- Hit time is also important for performance
- Average memory access time (AMAT)
  $$\text{AMAT} = \text{Hit rate} \times \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$
  $$\text{AMAT} \approx \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$
  since the hit rate is approximately equal to one.
- Example: CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  $$\text{AMAT} = 1 + 0.05 \times 20 = 2 \text{ cycles} = 2\,\text{ns}$$
  - 2 cycles per instruction
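
The AMAT example on this slide can be reproduced in a few lines. This is a small sketch using the approximate form (hit rate taken as one); the function name `amat_cycles` is an illustrative choice, while the numbers are the ones given in the example.

```python
def amat_cycles(hit_time, miss_rate, miss_penalty):
    """Approximate AMAT in cycles: hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


clock_ns = 1.0  # 1 ns clock, as in the slide's example
cycles = amat_cycles(hit_time=1, miss_rate=0.05, miss_penalty=20)
amat_ns = cycles * clock_ns

print(f"AMAT = {cycles:.1f} cycles = {amat_ns:.1f} ns")
```

Halving the miss rate to 2.5% in this sketch gives 1.5 cycles, which shows how strongly the miss term dominates once the miss penalty is large.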

5 Overview
- Last time: direct-mapped cache
  - Pretty simple to understand
  - Every memory block goes in only one place in the cache
  - Somewhat limiting: may cause a lot of the cache to be unused
- Idea: why not be more flexible? Data can go into more than one place
  - Associative caches

6 Cache addressing
- How do you know if something is in the cache? (Q1)
- If it is in the cache, how do you find it? (Q2)
- Traditional memory: given an address, provide the data (has an address decoder)
- Associative memory, also known as content-addressable memory
  - Each line contains the address (or part of it) and the data
[Figure: the CPU exchanges data with the cache, which holds copies of memory blocks (Block X, Block Y); each cache line stores a tag (the full address or its MSBs) alongside the data.]

7 Cache Organization
- Fully associative: any memory location can be stored anywhere in the cache
  - Cache location and memory address are unrelated
- Direct-mapped: each memory location maps onto exactly one cache entry
  - Some of the memory address bits are used to index the cache
- N-way set-associative: each memory location can go into one set of N blocks
[Figure: memory address split into MSBs and LSBs; each cache line holds a tag and data.]

8 Spectrum of Associativity
- For a cache with 8 entries:
  - one block of data per set (direct mapped)
  - two blocks of data per set (4 sets)
  - four blocks of data per set (2 sets)
  - eight blocks of data per set (fully associative)
- Annotation on the figure: convention calls the marked entry a block. Maybe "set" would be a better term, but sometimes we are stuck with convention.

9 Associativity example
- Main memory: 64 bytes (6-bit addresses, starting at 000000), arranged in 16 blocks of 4 bytes each
- Cache: 16 bytes, arranged in 4 blocks of 4 bytes each
- Organizations compared:
  - Direct mapped (1-way set associative): the index is the cache memory block number
  - 2-way set associative: a cache memory set number, plus a cache memory block number within the set
  - Fully associative (4-way set associative): a block can go in any of the 4 cache blocks
[Figure: main memory drawn byte by byte with main memory block numbers, next to the cache in each organization.]

10 Associativity example continued
- Example: memory location 13 (001101)
- To find which main memory block holds it: 13 / (4 bytes per block) = 3 remainder 1
  - The data is in main memory block #3; the remainder 1 is the location of the byte within the block
- Direct-mapped cache: (memory block #3) / (4 cache blocks) = 0 remainder 3
  - Cache block #3 (same as the mod operation: 3 modulo 4 = 3)
- 2-way set-associative cache: (memory block #3) / (2 cache sets) = 1 remainder 1
  - Cache set #1; the block can occupy either block within that set
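
The divisions in this example can be written as a short sketch. The constants mirror the example's 4-byte blocks and 4-block cache; the function name `locate` and the returned tuple layout are illustrative assumptions.

```python
BYTES_PER_BLOCK = 4   # block size from the example
CACHE_BLOCKS = 4      # the 16-byte cache holds 4 blocks


def locate(address, ways):
    """Return (memory block number, byte offset in block, cache set number)."""
    block_number, byte_offset = divmod(address, BYTES_PER_BLOCK)
    num_sets = CACHE_BLOCKS // ways
    set_number = block_number % num_sets
    return block_number, byte_offset, set_number


for ways in (1, 2, 4):
    block, offset, cache_set = locate(13, ways)
    print(f"{ways}-way: memory block {block}, byte {offset}, cache set {cache_set}")
```

With `ways=1` each set is a single block, so the set number is the cache block number; with `ways=4` there is only one set, so every block maps to set 0.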

11 Set Associative Cache - addressing
- The main memory address is split into three fields: TAG | INDEX (set #) | OFFSET
  - Tag: checks whether the correct block is anywhere in the set
  - Index: selects a set in the cache
  - Offset: selects the byte within the block
- Example: main memory address 13 (001101) with 16 bytes of cache arranged in 4 blocks of 4 bytes each
    Organization        Tag    Index   Offset
    Direct mapped       00     11      01
    2-way associative   001    1       01
    4-way associative   0011   -       01
- Notice: the size of the tag grows as associativity increases
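
The tag/index/offset split for address 13 can be reproduced with shifts and masks. This is a sketch that assumes power-of-two sizes and the example's 6-bit addresses; the function name `split_address` and its defaults are illustrative.

```python
def split_address(address, ways, address_bits=6, cache_blocks=4, block_bytes=4):
    """Split an address into (tag, index, offset) bit strings.

    Assumes block_bytes and the number of sets are powers of two.
    An empty field (no index bits) is shown as "-".
    """
    num_sets = cache_blocks // ways
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits

    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)

    def bits(value, width):
        return format(value, f"0{width}b") if width else "-"

    return bits(tag, tag_bits), bits(index, index_bits), bits(offset, offset_bits)


for ways in (1, 2, 4):
    print(f"{ways}-way:", split_address(13, ways))
```

Each doubling of the associativity halves the number of sets, which moves one bit from the index field into the tag.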

12 Two-way Set Associative Cache
- Two direct-mapped caches operate in parallel
- The cache index selects a set from the cache (a set includes 2 blocks)
- The two tags in the set are compared in parallel
- Data is selected based on the tag result
[Figure: two banks, each with Valid, Cache Tag, and Cache Data columns, addressed by the Cache Index; each bank's tag is compared with the address tag, the compare outputs (Sel0, Sel1) drive a mux that selects the cache block, and their OR produces Hit.]

13 4-way Set Associative Cache Organization
- Allow a block anywhere in a set
- Advantage: better hit rate
- Disadvantages: more tag bits, more hardware, higher access time
[Figure: A Four-Way Set-Associative Cache, block size = 4 bytes, cache size = 4096 bytes.]
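
The geometry of the pictured cache follows from its block size, capacity, and associativity. The sketch below derives it; the 32-bit address width is an assumption (the slide text does not state it), and the function name `cache_geometry` is illustrative.

```python
def cache_geometry(cache_bytes, block_bytes, ways, address_bits):
    """Derive set count and address field widths (power-of-two sizes assumed)."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "blocks": num_blocks,
        "sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "comparators": ways,  # one tag comparator per way
    }


# 4096-byte cache, 4-byte blocks, 4-way; address_bits=32 is an assumed width.
geometry = cache_geometry(cache_bytes=4096, block_bytes=4, ways=4, address_bits=32)
print(geometry)
```

Under the same assumed address width, a direct-mapped cache of this size would need 10 index bits and a 20-bit tag, which illustrates the "more tag bits" cost listed on the slide.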

14 Associative Caches
- Fully associative
  - Allow a given block to go in any cache entry
  - Requires all entries to be searched at once
  - Comparator per entry (expensive)
- n-way set associative
  - Each set contains n entries
  - Block number determines which set: (Block number) modulo (#Sets in cache)
  - Search all entries in a given set at once
  - n comparators (less expensive)

15 How Much Associativity
- Increased associativity decreases miss rate, but with diminishing returns
- Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000:
    Associativity   Miss rate
    1-way           10.3%
    2-way           8.6%
    4-way           8.3%
    8-way           8.1%

16 Types of Cache Misses (for 3 organizations)
- Compulsory (cold start): the location has never been accessed; this is the first access to a block that is not in the cache
- Capacity: since the cache cannot contain all the blocks of a program, some blocks will be replaced and later retrieved
- Conflict: when too many blocks try to load into the same set, some blocks will be replaced and later retrieved
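
One common way to separate the three miss types in a simulation is to run a fully associative LRU cache of the same size alongside the real cache. A miss on a never-seen block is compulsory; a miss that the fully associative cache would also take is a capacity miss; a miss that it would have avoided is a conflict miss. The sketch below follows that convention, which is an assumption and not something the lecture prescribes; the function name and the short trace of block addresses are illustrative.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks, ways):
    """Count compulsory, capacity, and conflict misses for an LRU cache."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # cache under study
    fully = OrderedDict()   # reference: same-size fully associative LRU cache
    seen = set()            # blocks referenced at least once
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Update the fully associative reference cache.
        fully_hit = block in fully
        if fully_hit:
            fully.move_to_end(block)
        else:
            if len(fully) == num_blocks:
                fully.popitem(last=False)  # evict least recently used
            fully[block] = True

        # Update the cache under study.
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
        else:
            if len(lines) == ways:
                lines.popitem(last=False)
            lines[block] = True
            if block not in seen:
                counts["compulsory"] += 1
            elif fully_hit:
                counts["conflict"] += 1
            else:
                counts["capacity"] += 1
        seen.add(block)
    return counts


trace = [0, 8, 0, 6, 8]  # illustrative block-address trace
print(classify_misses(trace, num_blocks=4, ways=1))
print(classify_misses(trace, num_blocks=4, ways=4))
```

For this trace the direct-mapped cache takes two conflict misses on top of the three compulsory ones, while the fully associative cache is left with only the compulsory misses.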

17 Cache Design Decisions
- For a given cache size
  - Block (line) size
  - Number of blocks (lines)
  - How the cache is organized
  - Write policy
  - Replacement strategy
- Increase cache size
  - More blocks (lines)
  - More lines == higher hit rate
  - Slower memory
  - As many as practical

18 Recall: Some Examples of Cache Memory
- From 7-cpu.com, a website for comparing processors with one another

19 Another example
- Direct-mapped cache (1-way set associative)
- Cache capacity is 4 one-word blocks
- Memory read sequence (block addresses): 0, 8, 0, 6, 8
- Cache index for each block address:
    Block address   Cache index
    0               (0 modulo 4) = 0
    6               (6 modulo 4) = 2
    8               (8 modulo 4) = 0
- Cache content after each access:
    Time step   Block address   Hit/miss   Block 0   Block 1   Block 2   Block 3
    1           0               miss       Mem[0]
    2           8               miss       Mem[8]
    3           0               miss       Mem[0]
    4           6               miss       Mem[0]              Mem[6]
    5           8               miss       Mem[8]              Mem[6]
- 5 misses

20 Another example
- 2-way set associative, using the principle of Least Recently Used (LRU) replacement
- Cache capacity is 4 one-word blocks
- Memory read sequence (block addresses): 0, 8, 0, 6, 8
- Cache set for each block address:
    Block address   Cache set
    0               (0 modulo 2) = 0
    6               (6 modulo 2) = 0
    8               (8 modulo 2) = 0
- Cache content after each access:
    Time step   Block address   Hit/miss   Set 0              Set 1
    1           0               miss       Mem[0]
    2           8               miss       Mem[0]   Mem[8]
    3           0               hit        Mem[0]   Mem[8]
    4           6               miss       Mem[0]   Mem[6]
    5           8               miss       Mem[8]   Mem[6]
- 4 misses

21 Another example
- Fully associative (4-way set associative), using the principle of Least Recently Used (LRU) replacement
- Cache capacity is 4 one-word blocks
- Memory read sequence (block addresses): 0, 8, 0, 6, 8
- A single set holds all 4 blocks, so a block can go in any entry
- Cache content after each access:
    Time step   Block address   Hit/miss   Cache content
    1           0               miss       Mem[0]
    2           8               miss       Mem[0]   Mem[8]
    3           0               hit        Mem[0]   Mem[8]
    4           6               miss       Mem[0]   Mem[8]   Mem[6]
    5           8               hit        Mem[0]   Mem[8]   Mem[6]
- 3 misses (all necessary)
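
The three walkthroughs (direct-mapped, 2-way, and fully associative) can be replayed with one small simulator. This is a minimal sketch assuming LRU replacement within each set and a set number of (block address) modulo (number of sets); the function name `simulate` is illustrative, and the trace is the read sequence 0, 8, 0, 6, 8.

```python
from collections import OrderedDict


def simulate(trace, num_blocks, ways):
    """Return a list of 'hit'/'miss' outcomes for an LRU set-associative cache."""
    num_sets = num_blocks // ways
    # Each set is an OrderedDict ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
            outcomes.append("miss")
    return outcomes


trace = [0, 8, 0, 6, 8]
for ways in (1, 2, 4):
    outcomes = simulate(trace, num_blocks=4, ways=ways)
    print(f"{ways}-way:", outcomes, "misses =", outcomes.count("miss"))
```

Setting `ways=1` gives the direct-mapped case and `ways=4` the fully associative case, so the same loop covers all three organizations of the 4-block cache.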

22 Summary
- Today: associative caches
  - Provide more choices for block storage
  - More expensive in terms of hardware: require comparators for tags
- Many caches are set associative
- Remember:
  - Direct mapped = 1-way set associative
  - Fully associative = N-way set associative (N is the total number of blocks in the cache)
