ECE 2300 Digital Logic & Computer Organization. More Caches

1 ECE 2300 Digital Logic & Computer Organization, Spring 2017: More Caches

2 Announcements
- Prelim 2 stats: High: 9 (out of 9), Mean: 7.2, Median: 73
- Prelab 5(C) due tomorrow

3 Example: Direct Mapped (DM) Cache
- 32-bit memory address breakdown: 20 tag bits, 10 index bits, 2 byte offset bits
- Block size: $2^2 = 4$ bytes
- Number of cache blocks: $2^{10} = 1024$
- Total cache capacity: 1024 x 4 B = 4 KB
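As a minimal sketch of this breakdown, the Python snippet below splits a 32-bit address into tag, index, and byte offset with shifts and masks, assuming the 20/10/2 split that the 4 KB capacity implies. The helper name `split_address` and the sample address are illustrative assumptions, not part of the lecture.

```python
# Field widths assumed from the example: 20 tag, 10 index, 2 byte offset bits.
ADDRESS_BITS = 32
OFFSET_BITS = 2
INDEX_BITS = 10
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS

def split_address(addr):
    """Return (tag, index, offset) for a 32-bit address (hypothetical helper)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Capacity follows from the field widths: blocks x bytes per block.
CAPACITY_BYTES = (1 << INDEX_BITS) * (1 << OFFSET_BITS)

# Arbitrary sample address, chosen only for illustration.
tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), offset)
```

The index selects one of the 1024 cache blocks, and the tag is what the cache stores to tell apart the many memory blocks that share that index.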

4 Example: DM Cache Address Breakdown
Assuming 16-bit memory addresses, how many bits are associated with the tag, index, and offset of the following configurations for a direct mapped cache?
(a) 32 blocks, 8 bytes per block: byte offset 3 bits; index 5 bits; tag 8 bits
(b) 16 blocks, 16 bytes per block: byte offset 4 bits; index 4 bits; tag 8 bits
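The two answers can be checked mechanically. The sketch below computes the field widths from the address width, block count, and block size, assuming both counts are powers of two; `dm_field_widths` is a hypothetical helper name.

```python
def dm_field_widths(addr_bits, num_blocks, block_bytes):
    """Return (tag, index, offset) bit counts for a direct mapped cache.

    Assumes num_blocks and block_bytes are powers of two.
    """
    offset = block_bytes.bit_length() - 1  # log2(block_bytes)
    index = num_blocks.bit_length() - 1    # log2(num_blocks)
    tag = addr_bits - index - offset       # whatever is left over
    return tag, index, offset

config_a = dm_field_widths(16, 32, 8)    # configuration (a)
config_b = dm_field_widths(16, 16, 16)   # configuration (b)
print(config_a, config_b)
```

Both configurations hold 256 bytes of data, which is why the tag width comes out the same in each case.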

5 Block Placement in DM Cache
Direct mapped cache:
- Each memory block maps to one cache block
- Mapping conflicts may increase miss rate
[Figure: memory with 8 blocks (Block 0 to Block 7) mapped onto a direct mapped cache with 4 blocks.]
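A small sketch of the placement in the figure, assuming the usual convention that the cache block is the memory block number modulo the number of cache blocks (the slide text does not spell out the mapping function). The names below are illustrative.

```python
NUM_CACHE_BLOCKS = 4
NUM_MEMORY_BLOCKS = 8

def cache_block_for(memory_block):
    """Direct mapped placement, assuming modulo mapping on the block number."""
    return memory_block % NUM_CACHE_BLOCKS

# Group memory blocks by the single cache block each one can occupy.
placement = {}
for memory_block in range(NUM_MEMORY_BLOCKS):
    placement.setdefault(cache_block_for(memory_block), []).append(memory_block)

for cache_block, memory_blocks in sorted(placement.items()):
    print(f"cache block {cache_block} <- memory blocks {memory_blocks}")
```

Two memory blocks compete for each cache block here, so alternating accesses to such a pair evict each other even when the rest of the cache is empty.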

6 More Flexible Block Placement
K-way Set Associative Cache:
- Each memory block maps to one set, which contains K blocks
- A block can be stored anywhere in the set
[Figure: memory with 8 blocks (Block 0 to Block 7) mapped onto a 2-way set associative cache with 4 blocks (Set 0 and Set 1, each with Way 0 and Way 1).]

7 4-way Set Associative Cache
256 sets (4 ways per set, 1024 blocks)

8 Associative Caches
K-way set associative
- Index bits determine which set to address
- Each set contains K entries (ways)
- All ways in the selected set are searched in parallel
- K comparators (more expensive than direct mapped)
An extreme case: fully associative
- Block can go in any cache location
- No need for index bits
- All entries are searched in parallel
- Comparator per entry (most expensive)
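The lookup described above can be sketched in software as follows. Hardware compares all ways at once with K comparators; the loop here models that search sequentially. The data layout (a list of sets, each a list of stored tags or `None` for an invalid entry) and the function name `lookup` are assumptions for illustration.

```python
def lookup(cache_sets, block_address):
    """Return the way that holds block_address, or None on a miss.

    cache_sets[s][w] is the tag stored in way w of set s, or None if invalid.
    """
    num_sets = len(cache_sets)
    index = block_address % num_sets   # index bits select the set
    tag = block_address // num_sets    # remaining upper bits form the tag
    for way, stored_tag in enumerate(cache_sets[index]):
        if stored_tag is not None and stored_tag == tag:
            return way                 # one comparator per way in hardware
    return None

# 2-way set associative, 2 sets: set 0 holds blocks 0 and 8, set 1 holds block 3.
two_way = [[0, 4], [None, 1]]
# Fully associative: a single set, so the whole block address is the tag.
fully_associative = [[0, 8, 6, None]]

print(lookup(two_way, 8), lookup(two_way, 2), lookup(fully_associative, 6))
```

With a single set, `index` is always 0, which mirrors the point that a fully associative cache needs no index bits.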

9 Address Translation for Associative Caches
Breakdown of an n-bit memory address for cache use: $n - i - b$ tag bits, $i$ index bits, $b$ byte offset bits
Parameters for a K-way set associative cache:
- Size of each cache block is $2^b$ bytes
- Number of sets is $2^i$
- Number of blocks is $K \times 2^i$
- Total cache size is $K \times 2^{b+i}$ bytes
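These relations are easy to tabulate. The sketch below evaluates them for a given address width n, index width i, offset width b, and associativity K; the function name and dictionary keys are illustrative choices. The sample values describe a 2-way cache with 2 sets and 4-byte blocks on 6-bit addresses.

```python
def cache_parameters(n, i, b, k):
    """Derive cache parameters from n address bits, i index bits,
    b byte offset bits, and associativity k (hypothetical helper)."""
    return {
        "tag_bits": n - i - b,
        "block_bytes": 2 ** b,
        "num_sets": 2 ** i,
        "num_blocks": k * 2 ** i,
        "total_bytes": k * 2 ** (b + i),
    }

params = cache_parameters(n=6, i=1, b=2, k=2)
print(params)
```

Note that the total size always equals the number of blocks times the block size, which is a quick sanity check on any configuration.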

10 Spectrum of Associativity
A K-way set associative cache with N blocks
- Number of cache sets: $S = N / K$
- Number of index bits: $\log_2(S)$
- When K = N: fully associative cache, with one cache set and therefore zero index bits
- When K = 1 (one-way): direct mapped cache, with N cache sets
Increasing the associativity
- Typically improves the hit rate (fewer conflicts)
- But increases the hit time (takes longer to search)

11 Spectrum of Associativity
[Figure: organizations of a cache with 8 blocks at different associativities.]
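For the 8-block case, the number of sets and index bits at each associativity follows directly from S = N / K. The short sketch below enumerates the power-of-two choices of K; the variable names are illustrative.

```python
N = 8  # total number of cache blocks

rows = []
for k in (1, 2, 4, 8):
    num_sets = N // k                       # S = N / K
    index_bits = num_sets.bit_length() - 1  # log2(S) for a power of two
    rows.append((k, num_sets, index_bits))

for k, num_sets, index_bits in rows:
    print(f"{k}-way: {num_sets} sets, {index_bits} index bits")
```

K = 1 is the direct mapped end of the spectrum and K = 8 the fully associative end, where no index bits remain.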

12 2-way Set Associative Example
- Size of each block is 4 bytes
- Cache holds 4 blocks, 2-way set associative (2 sets, 2 ways)
- Memory holds 16 blocks
- Memory address: 3 tag bits, 1 index bit, 2 byte offset bits

13-23 2-way Set Associative Example
[Animation frames: the processor issues six loads (into R1, R2, R3, R2, R1, R1) to a 2-way set associative cache backed by memory. The earlier frames are labeled miss and the final frames hit. The load addresses and the cache and memory contents shown on the slides did not survive extraction.]

24 Miss Classification
Compulsory (cold) misses
- Caused by the first access to a memory block
Capacity misses
- Occur because the cache might not be big enough to hold the active set of memory blocks needed during program execution
Conflict misses
- Occur with a direct mapped or set-associative cache when multiple memory blocks compete in the same set due to the inflexibility of block placement
- Would not occur in a fully associative cache
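One way to apply these definitions mechanically is sketched below: run the cache under test side by side with a fully associative cache of the same capacity, and label each miss by whether the block is new and whether the fully associative cache would have hit. Using LRU replacement in both caches and modulo set mapping are assumptions of this sketch, and the names `classify` and `_access_lru` are hypothetical.

```python
from collections import OrderedDict

def _access_lru(lru, block, capacity):
    """Access block in an LRU-ordered dict; return True on a hit."""
    if block in lru:
        lru.move_to_end(block)      # mark as most recently used
        return True
    if len(lru) >= capacity:
        lru.popitem(last=False)     # evict the least recently used block
    lru[block] = True
    return False

def classify(sequence, num_blocks, num_sets):
    """Label each access as hit, compulsory, conflict, or capacity."""
    ways = num_blocks // num_sets
    sets = [OrderedDict() for _ in range(num_sets)]
    fully_associative = OrderedDict()   # reference cache, same capacity
    seen = set()
    labels = []
    for block in sequence:
        hit = _access_lru(sets[block % num_sets], block, ways)
        reference_hit = _access_lru(fully_associative, block, num_blocks)
        if hit:
            labels.append("hit")
        elif block not in seen:
            labels.append("compulsory")  # first access to this block
        elif reference_hit:
            labels.append("conflict")    # placement, not size, caused the miss
        else:
            labels.append("capacity")    # even full associativity would miss
        seen.add(block)
    return labels

print(classify([0, 8, 0, 6, 8], num_blocks=4, num_sets=4))
```

Passing `num_sets=1` turns the cache under test into the fully associative reference itself, in which case no conflict misses can be reported.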

25-26 Misses vs. Associativity Example
Compare different caches
- Capacity: 4 blocks (1 byte / block)
- Direct mapped, 2-way set associative, fully associative
- Block address sequence: 0, 8, 0, 6, 8 (in decimal)

Direct mapped (cache contents after access, "-" = empty)
Block address  Cache index  Hit/miss         Block 0  Block 1  Block 2  Block 3
0              0            miss (cold)      Mem[0]   -        -        -
8              0            miss (cold)      Mem[8]   -        -        -
0              0            miss (conflict)  Mem[0]   -        -        -
6              2            miss (cold)      Mem[0]   -        Mem[6]   -
8              0            miss (conflict)  Mem[8]   -        Mem[6]   -
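The direct mapped trace can be reproduced with a few lines of Python. This is a sketch that assumes the cache index is the block address modulo the number of blocks; `simulate_direct_mapped` is a hypothetical name.

```python
def simulate_direct_mapped(sequence, num_blocks):
    """Return (block, index, outcome, contents) for each access."""
    cache = [None] * num_blocks          # None marks an empty entry
    trace = []
    for block in sequence:
        index = block % num_blocks       # the only place this block can go
        outcome = "hit" if cache[index] == block else "miss"
        cache[index] = block             # on a miss, the old block is replaced
        trace.append((block, index, outcome, list(cache)))
    return trace

dm_trace = simulate_direct_mapped([0, 8, 0, 6, 8], num_blocks=4)
for block, index, outcome, contents in dm_trace:
    print(block, index, outcome, contents)
```

Blocks 0 and 8 both map to index 0, so they keep evicting each other while entries 1 and 3 stay empty for the whole sequence.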

27-28 Misses vs. Associativity Example

2-way set associative (cache contents after access, "-" = empty)
Block address  Cache index  Hit/miss         Set 0           Set 1
0              0            miss (cold)      Mem[0]          -
8              0            miss (cold)      Mem[0], Mem[8]  -
0              0            hit              Mem[0], Mem[8]  -
6              0            miss (cold)      Mem[0], Mem[6]  -
8              0            miss (conflict)  Mem[8], Mem[6]  -

Fully associative
Block address  Hit/miss     Cache contents after access
0              miss (cold)  Mem[0]
8              miss (cold)  Mem[0], Mem[8]
0              hit          Mem[0], Mem[8]
6              miss (cold)  Mem[0], Mem[8], Mem[6]
8              hit          Mem[0], Mem[8], Mem[6]
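A general sketch that covers all three organizations for this sequence: each set is kept as a list ordered from least to most recently used, and a full set evicts its first element. LRU replacement and modulo set mapping are assumed; the function name is illustrative.

```python
def simulate_lru(sequence, num_blocks, ways):
    """Return the hit/miss outcome of each access for a given associativity."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each list: LRU first, MRU last
    outcomes = []
    for block in sequence:
        entries = sets[block % num_sets]
        if block in entries:
            entries.remove(block)          # will be re-appended as MRU
            outcomes.append("hit")
        else:
            if len(entries) == ways:
                entries.pop(0)             # evict the least recently used
            outcomes.append("miss")
        entries.append(block)
    return outcomes

sequence = [0, 8, 0, 6, 8]
direct_mapped = simulate_lru(sequence, num_blocks=4, ways=1)
two_way = simulate_lru(sequence, num_blocks=4, ways=2)
fully_associative = simulate_lru(sequence, num_blocks=4, ways=4)
print(direct_mapped, two_way, fully_associative, sep="\n")
```

On this sequence the hit count rises from zero (direct mapped) to one (2-way) to two (fully associative) as placement becomes more flexible.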

29 Block Replacement Policy
Direct mapped: no choice
Set associative and fully associative
- Pick a non-valid entry, if there is one
- Otherwise, choose among entries in the set
Least recently used (LRU)
- Choose the one unused for the longest time
- Requires extra bits to order the blocks
- High overhead beyond 4-way set associative
Random
- Similar performance to LRU for high associativity
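The selection order above can be sketched as a victim-selection function. Representing recency as a per-way timestamp is an assumption made for readability (real hardware uses a few ordering bits instead), and `choose_victim` is a hypothetical name.

```python
import random

def choose_victim(valid, last_used, policy="lru", rng=random):
    """Pick the way to replace within one set.

    valid[w] is True if way w holds a block; last_used[w] is the time of its
    most recent access (an illustrative stand-in for hardware LRU bits).
    """
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way                          # prefer a non-valid entry
    if policy == "lru":
        return min(range(len(valid)), key=lambda w: last_used[w])
    if policy == "random":
        return rng.randrange(len(valid))
    raise ValueError(f"unknown policy: {policy}")

print(choose_victim([True, False, True, True], [5, 0, 3, 9]))
print(choose_victim([True, True, True, True], [5, 7, 3, 9]))
```

Random replacement needs no per-block bookkeeping at all, which is the reason it becomes attractive once the ordering state for LRU grows too costly.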

30 LRU Replacement Example
Fully associative, (X) = LRU age
Block address  Hit/miss         Cache contents after access
0              miss (cold)      Mem[0] (0)
4              miss (cold)      Mem[0] (1)  Mem[4] (0)
2              miss (cold)      Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6              miss (cold)      Mem[0] (3)  Mem[4] (2)  Mem[2] (1)  Mem[6] (0)
8              miss (cold)      Mem[8] (0)  Mem[4] (3)  Mem[2] (2)  Mem[6] (1)
0              miss (capacity)  Mem[8] (1)  Mem[0] (0)  Mem[2] (3)  Mem[6] (2)
4              miss (capacity)  Mem[8] (2)  Mem[0] (1)  Mem[4] (0)  Mem[6] (3)
2              miss (capacity)  Mem[8] (3)  Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6              miss (capacity)  Mem[6] (0)  Mem[0] (3)  Mem[4] (2)  Mem[2] (1)
8              miss (capacity)  Mem[6] (1)  Mem[8] (0)  Mem[4] (3)  Mem[2] (2)
2              hit              Mem[6] (2)  Mem[8] (1)  Mem[4] (3)  Mem[2] (0)
6              hit              Mem[6] (0)  Mem[8] (2)  Mem[4] (3)  Mem[2] (1)
2              hit              Mem[6] (1)  Mem[8] (2)  Mem[4] (3)  Mem[2] (0)
0              miss (capacity)  Mem[6] (2)  Mem[8] (3)  Mem[0] (0)  Mem[2] (1)
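The age bookkeeping in this table can be sketched with one counter per entry: the accessed entry gets age 0, and every valid entry that was younger than it ages by one. This is one possible counter scheme that yields the ages shown, not necessarily the one intended in the lecture; `lru_age_trace` is a hypothetical name.

```python
def lru_age_trace(sequence, num_blocks):
    """Return (block, outcome, [(stored_block, age), ...]) for each access."""
    blocks = [None] * num_blocks   # None marks a non-valid entry
    ages = [None] * num_blocks
    trace = []
    for block in sequence:
        if block in blocks:
            way, outcome = blocks.index(block), "hit"
            old_age = ages[way]
        else:
            outcome = "miss"
            if None in blocks:
                way = blocks.index(None)   # fill a non-valid entry first
            else:
                way = max(range(num_blocks), key=lambda w: ages[w])  # oldest
            old_age = num_blocks           # every valid entry ages on a fill
        for w in range(num_blocks):
            if ages[w] is not None and ages[w] < old_age:
                ages[w] += 1
        blocks[way], ages[way] = block, 0
        trace.append((block, outcome, list(zip(blocks, ages))))
    return trace

age_trace = lru_age_trace([0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0], 4)
for block, outcome, contents in age_trace:
    print(block, outcome, contents)
```

The first ten accesses cycle through five blocks in a four-entry cache, so LRU always evicts exactly the block that is needed next; only the later re-references to blocks 2 and 6 hit.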

31 LRU Replacement Example
2-way set associative, (*) = LRU block; Set 1 stays empty throughout
Block address  Cache index  Hit/miss         Set 0 contents after access
0              0            miss (cold)      Mem[0]
4              0            miss (cold)      Mem[0] (*)  Mem[4]
2              0            miss (cold)      Mem[2]      Mem[4] (*)
6              0            miss (cold)      Mem[2] (*)  Mem[6]
8              0            miss (cold)      Mem[8]      Mem[6] (*)
0              0            miss (capacity)  Mem[8] (*)  Mem[0]
4              0            miss (capacity)  Mem[4]      Mem[0] (*)
2              0            miss (capacity)  Mem[4] (*)  Mem[2]
6              0            miss (capacity)  Mem[6]      Mem[2] (*)
8              0            miss (capacity)  Mem[6] (*)  Mem[8]
2              0            miss (conflict)  Mem[2]      Mem[8] (*)
6              0            miss (conflict)  Mem[2] (*)  Mem[6]
2              0            hit              Mem[2]      Mem[6] (*)
0              0            miss (capacity)  Mem[2] (*)  Mem[0]
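With only two ways per set, LRU needs just one bit per set that names the least recently used way. The sketch below models that bit directly, assuming modulo set mapping; the dictionary layout and the name `simulate_two_way` are illustrative.

```python
def simulate_two_way(sequence, num_sets):
    """Return (block, set_index, outcome, ways, lru_way) for each access."""
    sets = [{"blocks": [None, None], "lru": 0} for _ in range(num_sets)]
    trace = []
    for block in sequence:
        index = block % num_sets
        entry = sets[index]
        if block in entry["blocks"]:
            way, outcome = entry["blocks"].index(block), "hit"
        else:
            way, outcome = entry["lru"], "miss"   # replace the LRU way
            entry["blocks"][way] = block
        entry["lru"] = 1 - way                    # the other way is now LRU
        trace.append((block, index, outcome, list(entry["blocks"]), entry["lru"]))
    return trace

two_way_trace = simulate_two_way([0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0], 2)
for row in two_way_trace:
    print(row)
```

Every block in the sequence is even, so all accesses land in set 0 and half of the cache goes unused, which is why this organization gets only one hit on a sequence where the fully associative cache gets three.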

32 Before Next Class
- H&H 7.5.5, 8.2
Next Time
- More Caches
- Measuring Performance
