Caches. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)

1 Caches. Hakim Weatherspoon, CS 3410, Spring 2012, Computer Science, Cornell University. See P&H 5.1, 5.2 (except writes).

2 Big Picture: Memory: big & slow vs Caches: small & fast. [Figure: five-stage pipelined datapath (Instruction Fetch, Instruction Decode, Execute, Memory, Write Back) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, register file, ALU, forward unit, and hazard detection; instruction and data memory are the slow parts.]

3 Goals for Today: caches. Caches vs memory vs tertiary storage. Tradeoffs: big & slow vs small & fast. Best of both worlds. Working set: 90/10 rule. How to predict the future: temporal & spatial locality. Examples of caches: Direct Mapped, Fully Associative, N-way set associative.

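4 Performance. CPU clock periods range from under a nanosecond to nanoseconds (GHz to MHz clock rates).

    Technology        Capacity   Latency
    Tape              TB         seconds
    Disk              TB         millions of cycles (ms)
    SSD (Flash)       GB         thousands of cycles (us)
    DRAM              GB         cycles (ns)
    SRAM (off chip)   MB         cycles (few ns)
    SRAM (on chip)    KB         cycles (ns)

Cost per GB rises going down the table: tape and disk cost less than a dollar per GB, SSD and DRAM cost dollars per GB, and on-chip SRAM is listed as "???". Others: eDRAM (aka 1T SRAM), FeRAM, CD, DVD. Q: Can we create the illusion of cheap + large + fast?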

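5 Memory Pyramid. From top to bottom: RegFile (bytes), L1 Cache (several KB), L2 Cache (½ MB and up), Memory (MB to a few GB), Disk (many GB to a few TB). Access cost in cycles grows at every step down the pyramid, from about a cycle at the register file to the largest cost at disk. L3 is becoming more common (eDRAM?). These are rough numbers: mileage may vary for latest/greatest. Caches are usually made of SRAM (or eDRAM).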

6 Memory Hierarchy. Memory closer to the processor: small & fast, stores active data. Memory farther from the processor: big & slow, stores inactive data.

7 Memory Hierarchy. Insight for Caches: if Mem[x] was accessed recently...
... then Mem[x] is likely to be accessed soon. Exploit temporal locality: put recently accessed Mem[x] higher in the memory hierarchy, since it will likely be accessed again soon.
... then Mem[x ± ε] is likely to be accessed soon. Exploit spatial locality: put the entire block containing Mem[x] and surrounding addresses higher in the memory hierarchy, since nearby addresses will likely be accessed.
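Spatial locality can be made concrete with a small sketch. It assumes 64-byte blocks, 4-byte ints, and a 64 x 64 row-major array starting at byte offset 0; none of these figures come from the slides, and `block_of` and `block_changes` are hypothetical helper names. The function counts how often two consecutive accesses land in different blocks, which is a rough proxy for how many block transfers a traversal order would trigger.

```c
#include <stddef.h>

/* Assumed sizes for illustration only (not taken from the slides). */
enum { ROWS = 64, COLS = 64, INT_BYTES = 4, BLOCK_BYTES = 64 };

/* Block number holding element (r, c) of a row-major array at byte offset 0. */
static size_t block_of(size_t r, size_t c) {
    return ((r * COLS + c) * INT_BYTES) / BLOCK_BYTES;
}

/* Count accesses whose block differs from the block of the previous access.
   column_major == 0 walks row by row; nonzero walks column by column. */
size_t block_changes(int column_major) {
    size_t changes = 0;
    size_t prev = (size_t)-1;   /* sentinel: no block touched yet */
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            size_t b = column_major ? block_of(j, i) : block_of(i, j);
            if (b != prev) changes++;
            prev = b;
        }
    }
    return changes;
}
```

Under these assumptions the row-by-row walk changes block once per 16 elements, while the column-by-column walk changes block on every access, even though both touch the same 4096 elements.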

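8 Memory Hierarchy. Memory closer to the processor is fast but small; it usually stores a subset of the memory farther away ("strictly inclusive"). Transfer whole blocks (cache lines) between adjacent levels: kilobyte-sized blocks between disk and RAM, smaller byte-sized blocks between RAM and the outer cache level and between cache levels.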

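9 Memory Hierarchy. [Figure: memory trace, a column of hexadecimal addresses touched while the program below runs.]

    #include <stdio.h>

    /* printi and prints are not defined on the slide; these are stand-ins. */
    static void printi(int v) { printf("%d", v); }
    static void prints(const char *s) { fputs(s, stdout); }

    int n = 4;                 /* assumed to equal the number of entries in k */
    int k[] = { 1, 2, 3, 4 };  /* placeholder inputs: the slide's four values are unknown */

    int fib(int i) {
        if (i <= 1) return i;  /* base-case bound assumed; the slide's constant is unknown */
        else return fib(i - 1) + fib(i - 2);
    }

    int main(int ac, char **av) {
        for (int i = 0; i < n; i++) {
            printi(fib(k[i]));
            prints("\n");
        }
    }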

10 Cache Lookups (Read). The processor tries to access Mem[x]. Check: is the block containing Mem[x] in the cache?
Yes: cache hit. Return the requested data from the cache line.
No: cache miss. Read the block from memory (or a lower level cache), evict an existing cache line to make room, place the new block in the cache, and return the requested data. Stall the pipeline while all of this happens.
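The read path above can be sketched in a few lines of C. This is a minimal illustration, not the course's implementation: the geometry (4 lines, 2-byte blocks), the `Line` and `Cache` types, and the placement rule `block % NUM_LINES` are all assumptions made so the example is complete, and the pipeline stall is not modeled.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_LINES = 4, BLOCK_BYTES = 2 };  /* assumed geometry, not from the slides */

typedef struct {
    int valid;                   /* does this line hold a block? */
    uint32_t block;              /* block number currently held  */
    uint8_t data[BLOCK_BYTES];   /* copy of that block's bytes   */
} Line;

typedef struct {
    Line lines[NUM_LINES];
    unsigned hits, misses;
} Cache;

/* Read one byte of Mem[addr] through the cache; mem is the backing store. */
uint8_t cache_read(Cache *c, const uint8_t *mem, uint32_t addr) {
    uint32_t block = addr / BLOCK_BYTES;
    Line *line = &c->lines[block % NUM_LINES];   /* assumed placement rule */
    if (line->valid && line->block == block) {
        c->hits++;                               /* hit: serve from the line */
    } else {
        c->misses++;                             /* miss: overwrite (evict) the old block */
        memcpy(line->data, mem + block * BLOCK_BYTES, BLOCK_BYTES);
        line->block = block;
        line->valid = 1;
    }
    return line->data[addr % BLOCK_BYTES];
}
```

A zero-initialized `Cache` starts with every line invalid, so the first access to any block is always a miss.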

11 Three common designs. A given data block can be placed... in exactly one cache line: Direct Mapped; in any cache line: Fully Associative; in a small set of cache lines: Set Associative.

12-13 Direct Mapped Cache. Each block number is mapped to a single cache line index. Simplest hardware. [Figure: a column of memory block addresses, each mapped onto one cache line; drawn first for a cache with two lines, then for a cache with four lines.]

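14 Direct Mapped Cache.

15 Direct Mapped Cache (Reading). [Figure: the address is split into Tag, Index, and Offset. The Index selects one line (V, Tag, Block); the stored tag is compared (=) with the address tag and combined with V to produce "hit?"; the Offset drives word select to pick the data out of the block.]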
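The Tag / Index / Offset split in the reading diagram is plain bit slicing. The sketch below assumes the field widths are passed in by the caller and that their sum is less than 32; `AddrFields` and `split_address` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

typedef struct { uint32_t tag, index, offset; } AddrFields;

/* Split addr into tag | index | offset, given the widths of the two low fields.
   Assumes index_bits + offset_bits < 32. */
AddrFields split_address(uint32_t addr, unsigned index_bits, unsigned offset_bits) {
    AddrFields f;
    f.offset = addr & ((1u << offset_bits) - 1u);                  /* lowest bits   */
    f.index  = (addr >> offset_bits) & ((1u << index_bits) - 1u);  /* middle bits   */
    f.tag    = addr >> (offset_bits + index_bits);                 /* whatever is left */
    return f;
}
```

For example, with a 2-bit index and a 1-bit offset, the 5-bit address 11011 splits into tag 11, index 01, offset 1.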

16-33 Example: A Simple Direct Mapped Cache. Using byte addresses in this example! The address is split into a tag field, an index field, and a block offset. [Animation: a sequence of LB (load byte) instructions is stepped through one access at a time, starting with the 1st access. Each pair of frames shows the registers, memory, and the cache lines (valid bit V, tag, data), first before the access and then after it with the decoded address (Addr:) and the running counts of Misses and Hits.]

34 Three types of misses. Cold (aka Compulsory) Misses: the line is being referenced for the first time. Capacity Misses: the line was evicted because the cache was not large enough. Conflict Misses: the line was evicted because of another access whose index conflicted.

35 Q: How to avoid misses? Cold Misses: unavoidable? The data was never in the cache... Prefetching! Capacity Misses: buy more SRAM. Conflict Misses: use a more flexible cache design.

36-43 Direct Mapped Example: Pathological example. Using byte addresses in this example! [Animation: the LB trace on the direct mapped cache continues with further loads, shown one or two accesses per frame. In the final frames the Misses count grows with each of these accesses, because the loads keep evicting one another's blocks from the same cache line.]

44 Cache Organization. How to avoid Conflict Misses. Three common designs. Fully associative: a block can be anywhere in the cache. Direct mapped: a block can only be in one line in the cache. Set associative: a block can be in a few places in the cache.

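45 Fully Associative Cache (Reading). [Figure: the address is split into Tag and Offset only, with no index field. Every line (V, Tag, Block) has its own comparator (=); the comparator outputs produce "hit?" and drive line select, and the Offset drives word select to pick the data out of the chosen block.]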
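A fully associative read has to check every line's tag. The sketch below models that search, assuming 4 lines and a 1-bit block offset (illustrative values only); `FaLine` and `fa_find` are hypothetical names. Hardware performs all comparisons in parallel, whereas this loop checks them one at a time.

```c
#include <stdint.h>

enum { FA_LINES = 4, FA_OFFSET_BITS = 1 };  /* assumed geometry, not from the slides */

typedef struct { int valid; uint32_t tag; } FaLine;

/* Return the index of the line holding addr's block, or -1 on a miss. */
int fa_find(const FaLine lines[FA_LINES], uint32_t addr) {
    uint32_t tag = addr >> FA_OFFSET_BITS;   /* no index field: the tag is the block number */
    for (int i = 0; i < FA_LINES; i++) {
        if (lines[i].valid && lines[i].tag == tag) return i;   /* valid bit and tag must both match */
    }
    return -1;
}
```

The valid bit matters: a line whose stale tag happens to match must not count as a hit.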

46-64 Example: Simple Fully Associative Cache. Using byte addresses in this example! The address is split into a tag field and a block offset; there is no index field, and every line has its own valid bit. [Animation: an LB trace is stepped through one access at a time on a fully associative cache with LRU replacement. Each pair of frames shows the registers, memory, and cache lines, with the LRU line marked, first before the access and then after it with the decoded address (Addr:) and the running counts of Misses and Hits. In the final frames the Hits count grows with each access.]

65 Eviction. Which cache line should be evicted from the cache to make room for a new line? Direct mapped: no choice, must evict the line selected by the index. Associative caches: random: select one of the lines at random; round robin: similar to random; FIFO: replace the oldest line; LRU: replace the line that has not been used in the longest time.
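The difference between FIFO and LRU comes down to which timestamp is compared. The sketch below is one way to express that, assuming each line records a load time and a last-use time from some monotonically increasing counter; `LineMeta` and `pick_victim` are hypothetical names, and real hardware typically approximates LRU rather than storing full timestamps.

```c
#include <stdint.h>

typedef struct {
    int valid;
    uint64_t loaded_at;   /* time the block was brought in  (used by FIFO) */
    uint64_t used_at;     /* time of the most recent access (used by LRU)  */
} LineMeta;

/* Pick a victim among n lines: an invalid line if there is one,
   otherwise the line with the smallest timestamp for the chosen policy. */
int pick_victim(const LineMeta *lines, int n, int use_lru) {
    int victim = 0;
    for (int i = 0; i < n; i++) {
        if (!lines[i].valid) return i;   /* free line: nothing to evict */
        uint64_t t  = use_lru ? lines[i].used_at      : lines[i].loaded_at;
        uint64_t tv = use_lru ? lines[victim].used_at : lines[victim].loaded_at;
        if (t < tv) victim = i;
    }
    return victim;
}
```

A line that was loaded first but touched recently is the FIFO victim and not the LRU victim, which is exactly the case where the two policies disagree.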

66 Cache Tradeoffs ("+" marks the better side).

                           Direct Mapped   Fully Associative
    Tag Size               + Smaller       Larger
    SRAM Overhead          + Less          More
    Controller Logic       + Less          More
    Speed                  + Faster        Slower
    Price                  + Less          More
    Scalability            + Very          Not Very
    # of conflict misses   Lots            + Zero
    Hit rate               Low             + High
    Pathological Cases?    Common          ?

67 Set associative cache: a compromise. Like a direct mapped cache: index into a location; fast. Like a fully associative cache: can store multiple entries (decreases thrashing in the cache); search in each element.

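68 N-Way Set Associative Cache (Reading). [Figure: the address is split into Tag, Index, and Offset. The Index selects a set; the tag of every line in that set is compared (=) with the address tag (three comparators are drawn); the results produce "hit?" and drive line select, and the Offset drives word select to pick the data out of the chosen block.]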
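The three designs differ only in how many lines share a set, so one small simulator can cover all of them. The sketch below assumes a 4-line cache with a 1-bit block offset and LRU replacement inside each set; these values and the name `count_misses` are illustrative, and `ways` must divide the line count. With `ways == 1` it behaves as a direct mapped cache, and with `ways == TOTAL_LINES` as a fully associative one.

```c
#include <stdint.h>

enum { TOTAL_LINES = 4, OFFSET_BITS = 1 };  /* assumed geometry, not from the slides */

typedef struct { int valid; uint32_t tag; uint64_t used_at; } Way;

/* Count misses for a trace of n byte addresses on a cache of TOTAL_LINES lines
   grouped into sets of `ways` lines, with LRU replacement inside each set. */
unsigned count_misses(const uint32_t *trace, int n, int ways) {
    Way cache[TOTAL_LINES] = {0};
    uint32_t num_sets = (uint32_t)(TOTAL_LINES / ways);
    unsigned misses = 0;
    for (int t = 0; t < n; t++) {
        uint32_t block = trace[t] >> OFFSET_BITS;
        Way *set = &cache[(block % num_sets) * (uint32_t)ways];  /* index selects the set */
        uint32_t tag = block / num_sets;
        int hit = -1, victim = 0;
        for (int w = 0; w < ways; w++) {
            if (set[w].valid && set[w].tag == tag) hit = w;
            if (set[w].used_at < set[victim].used_at) victim = w;  /* least recently used */
        }
        if (hit < 0) {                     /* miss: replace the LRU (or never-used) way */
            misses++;
            set[victim].valid = 1;
            set[victim].tag = tag;
            hit = victim;
        }
        set[hit].used_at = (uint64_t)t + 1;  /* 0 marks a never-used way */
    }
    return misses;
}
```

On a trace that alternates between two addresses with the same direct mapped index (for instance 0, 8, 0, 8, 0, 8 under this geometry), the one-way configuration misses on every access, while two or more ways miss only on the first touch of each block.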

69-74 Comparison: Direct Mapped, Fully Associative, and N-Way Set Associative. Using byte addresses in this example! [Figure pairs: an LB trace is run on each design in turn, with the running Misses and Hits totals shown. Direct mapped: cache lines, with tag, index, and block offset fields. Fully associative: cache lines, with tag and block offset fields. Set associative: sets, with tag, set index, and block offset fields.]

75 Remaining Issues. To Do: evicting cache lines; picking cache parameters; writing using the cache.

76 Administrivia. Homework due today. Project due next Monday. Prelim Thursday evening in Phillips. Review session today in Phillips. Survey and improvements.

77 Administrivia: next six weeks. Week by week: prelim and homework due; project due, lab and homework handout; lab and homework due, project handout; project design doc due, homework handout; project and homework due, lab handout; project handout. Final project for class: project design doc one week, project due a later week.

78 Summary. Caching assumptions: small working set (90/10 rule); can predict the future (spatial & temporal locality). Benefits: big & fast memory built from (big & slow) + (small & fast). Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate. Fully associative: higher hit cost, higher hit rate. Larger block size: lower hit cost, higher miss penalty. Next up: other designs; writing to caches.
