Caches. See P&H 5.1, 5.2 (except writes) Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University


1 Caches. See P&H 5.1, 5.2 (except writes). Hakim Weatherspoon, CS 3410, Spring 2013, Computer Science, Cornell University

2 What will you do over Spring Break? A) Relax B) Head home C) Head to a warm destination D) Stay in (frigid) Ithaca E) Study

3 Big Picture: Memory. Code, data, and stack are stored in memory. [Diagram: the five-stage pipeline datapath (Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, with IF/ID, ID/EX, EX/MEM, MEM/WB registers), showing the instruction memory in the fetch stage and the data memory in the memory stage.]

4 Big Picture: Memory. Memory: big & slow vs Caches: small & fast. Code, data, and stack are stored in memory. [Same pipeline diagram as the previous slide, emphasizing the instruction and data memories.]

5 Big Picture: Memory. Memory: big & slow vs Caches: small & fast. [Same pipeline diagram, now with a cache ($$) inserted between the processor and the instruction memory and between the processor and the data memory.]

6 Goals for Today: caches. Caches vs memory vs tertiary storage. Tradeoffs: big & slow vs small & fast. Working set: 90/10 rule. How to predict the future: temporal & spatial locality. Examples of caches: Direct Mapped, Fully Associative, N-way Set Associative. Caching questions: How does a cache work? How effective is the cache (hit rate / miss rate)? How large is the cache? How fast is the cache (AMAT = average memory access time)?
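
For reference (not spelled out on the slide itself), the usual definition of average memory access time is:

    AMAT = hit time + miss rate x miss penalty

For example, with a hypothetical 1-cycle hit time, a 10% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.10 x 100 = 11 cycles.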

7 Big Picture. How do we make the processor fast, given that memory is VEEERRRYYYY SLLOOOWWW!!

8 Performance. CPU clock cycles are on the order of a nanosecond or less (GHz range). Roughly, from slowest/cheapest to fastest/most expensive (the specific capacities and prices per GB were lost in transcription): Tape: TBs, latency of seconds. Disk: TBs, millions of cycles (ms). SSD (Flash): GBs, thousands of cycles (us). DRAM: GBs, tens of ns. SRAM off-chip: MBs, a few ns. SRAM on-chip: KBs, a few cycles (ns). Others: eDRAM aka 1T SRAM, FeRAM, CD, DVD, ... Q: Can we create the illusion of cheap + large + fast?

9 Memory Pyramid. L3 becoming more common (eDRAM?). RegFile: 100s of bytes, less than 1 cycle to access. L1 cache: several KB, a few cycles to access. L2 cache: up to a few MB, slower. Memory: up to a few GB, slower still. Disk: many GB to a few TB, vastly slower. These are rough numbers: mileage may vary for latest/greatest. Caches are usually made of SRAM (or eDRAM).

10 Memory Hierarchy. Memory closer to the processor: small & fast, stores active data. Memory farther from the processor: big & slow, stores inactive data. L1: SRAM on-chip. L2/L3: SRAM. Memory: DRAM.

11 Memory Hierarchy. Memory closer to the processor: small & fast, stores active data. Memory farther from the processor: big & slow, stores inactive data. Most data is inactive (not accessed); a small fraction of the data is active; an even smaller fraction is accessed the most. L1: SRAM on-chip. L2/L3: SRAM. Memory: DRAM. [Diagram: a register is loaded from memory, e.g. LW $R, Mem.]

12 Memory Hierarchy. Insight for caches: If Mem[x] was accessed recently... then Mem[x] is likely to be accessed soon. Exploit temporal locality: put recently accessed Mem[x] higher in the memory hierarchy since it will likely be accessed again soon. Also, Mem[x ± ε] is likely to be accessed soon. Exploit spatial locality: put the entire block containing Mem[x] and surrounding addresses higher in the memory hierarchy since nearby addresses will likely be accessed.

13 Memory Hierarchy. Memory closer to the processor is fast but small: it usually stores a subset of the memory farther away (strictly inclusive). Transfer whole blocks (cache lines): on the order of KB between disk and RAM, and smaller blocks of bytes between RAM and L2 and between L2 and L1.

14 Memory trace and Memory Hierarchy. [Memory trace: a column of nearby addresses showing repeated and sequential accesses; the exact values were lost in transcription.] The program below is the source of the trace; the recursive fib calls exhibit temporal locality, and the sequential walk over k[] exhibits spatial locality.

int n = 4;                      /* constants restored illustratively; the originals were elided */
int k[] = { 3, 14, 0, 10 };
int fib(int i) {
  if (i <= 2) return i;         /* temporal locality: fib and its data are reused repeatedly */
  else return fib(i-1) + fib(i-2);
}
int main(int ac, char **av) {
  for (int i = 0; i < n; i++) { /* spatial locality: k[] is walked sequentially */
    printi(fib(k[i]));
    prints("\n");
  }
}

15 Cache Lookups (Read). The processor tries to access Mem[x]. Check: is the block containing Mem[x] in the cache? Yes: cache hit. Return the requested data from the cache line. No: cache miss. Read the block from memory (or a lower-level cache), evict an existing cache line to make room if needed, place the new block in the cache, return the requested data, and stall the pipeline while all of this happens.
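
The flow above can be written out in a few lines of C. This is only a minimal sketch of a direct-mapped lookup; the structure names, sizes, and the fetch_block_from_memory helper are of my choosing, not from the slides.

#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES   64          /* 2^n lines (assumed size)  */
#define BLOCK_BYTES 16          /* 2^m bytes per block       */

struct line { bool valid; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
struct line cache[NUM_LINES];

extern void fetch_block_from_memory(uint32_t addr, uint8_t *dst);  /* hypothetical helper */

uint8_t cache_read_byte(uint32_t addr) {
    uint32_t offset = addr % BLOCK_BYTES;
    uint32_t index  = (addr / BLOCK_BYTES) % NUM_LINES;
    uint32_t tag    = addr / (BLOCK_BYTES * NUM_LINES);
    struct line *l  = &cache[index];
    if (!l->valid || l->tag != tag) {            /* miss: evict whatever is in this line */
        fetch_block_from_memory(addr - offset, l->data);
        l->valid = true;
        l->tag   = tag;
    }
    return l->data[offset];                      /* hit (or just-filled) path */
}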

16 Three common designs. A given data block can be placed... in exactly one cache line: Direct Mapped. In any cache line: Fully Associative. In a small set of cache lines: Set Associative.

17 Direct Mapped Cache. Each block number is mapped to a single cache line index. Simplest hardware. [Diagram: a direct-mapped cache of a few multi-word lines, and a memory whose consecutive blocks map onto lines 0, 1, ..., wrapping around.]

18 Direct Mapped Cache. Each block number is mapped to a single cache line index. Simplest hardware. [Diagram: a memory address split into tag, index, and offset fields (the field widths were lost in transcription); the index selects the cache line.]

19 Direct Mapped Cache. Each block number is mapped to a single cache line index. Simplest hardware. [Same diagram, now highlighting which memory blocks map to the first cache lines.]

20 Direct Mapped Cache. Each block number is mapped to a single cache line index. Simplest hardware. [Same diagram, with the mapping from every memory block to its cache line shown.]
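
The mapping sketched on these slides is just modular arithmetic. Assuming a block size of 2^m bytes and 2^n cache lines (my notation, matching the slides that follow):

    block number = address / 2^m
    line index   = block number mod 2^n
    tag          = block number / 2^n

so every block whose block number differs by a multiple of 2^n lands in the same line.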

21 Direct Mapped Cache

22 Direct Mapped Cache (Reading). [Diagram: the address is split into Tag, Index, and Offset. The index selects one cache line (V, Tag, Block); the stored tag is compared with the address tag to produce hit?; the offset (word selector plus byte offset within the word) selects the requested data bits from the block.]
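
In C, extracting those address fields is a shift and a mask. A small sketch, assuming n index bits and m offset bits (the function names are mine, for illustration only):

#include <stdint.h>

static inline uint32_t offset_bits(uint32_t addr, int m)        { return addr & ((1u << m) - 1); }
static inline uint32_t index_bits (uint32_t addr, int n, int m) { return (addr >> m) & ((1u << n) - 1); }
static inline uint32_t tag_bits   (uint32_t addr, int n, int m) { return addr >> (n + m); }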

23 Direct Mapped Cache (Reading). Tag, Index, Offset. V, Tag, Block. n-bit index, m-bit offset. Q: How big is the cache (data only)?

24 Direct Mapped Cache (Reading). Tag, Index, Offset. 2^m bytes per block, 2^n blocks. n-bit index, m-bit offset. Q: How big is the cache (data only)? Cache of size 2^n blocks, block size of 2^m bytes. Size: 2^m bytes per block x 2^n blocks = 2^(n+m) bytes.

25 Direct Mapped Cache (Reading). Tag, Index, Offset. V, Tag, Block. n-bit index, m-bit offset. Q: How much SRAM is needed (data + overhead)?

26 Direct Mapped Cache (Reading). Tag, Index, Offset. 2^m bytes per block, 2^n blocks. n-bit index, m-bit offset. Q: How much SRAM is needed (data + overhead)? Cache of size 2^n blocks, block size of 2^m bytes. Tag field: 32 - (n + m) bits (for 32-bit addresses), valid bit: 1 bit. SRAM size: 2^n x (block size + tag size + valid bit size) = 2^n x (2^m bytes x 8 bits-per-byte + (32 - (n + m)) + 1).
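
As a worked example (numbers of my choosing, not the slide's): with 32-bit addresses, n = 10 (1024 lines) and m = 4 (16-byte blocks), the data capacity is 2^14 = 16 KB, while the SRAM needed is

    2^10 x (16 x 8 + (32 - 14) + 1) = 1024 x 147 bits = 150,528 bits, about 18.4 KB,

so the tag and valid bits add roughly 15% overhead on top of the 16 KB of data.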

27 Example: A Simple Direct Mapped Cache. Using byte addresses in this example! [Setup diagram: a short address bus, a small direct-mapped cache with a few lines of word-sized blocks, a sequence of LB (load byte) instructions into registers, and the backing memory contents. The specific bus width, addresses, and values were lost in transcription.]

28 Example: A Simple Direct Mapped Cache. Using byte addresses in this example! [Same setup, with the address now split into its tag, index, and block-offset fields.]

29-44 1st through 8th Access. [A step-by-step walkthrough of the load sequence against the direct-mapped cache: each pair of slides shows the address being accessed, whether it hits or misses, the updated cache contents (V, tag, data), and the running Misses/Hits counts. The specific addresses, contents, and counts were lost in transcription.]

45 Misses. Three types of misses: Cold (aka Compulsory): the line is being referenced for the first time. Capacity: the line was evicted because the cache was not large enough. Conflict: the line was evicted because of another access whose index conflicted.
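
To make the conflict case concrete, here is a small, illustrative access pattern (not from the slides) that thrashes a direct-mapped cache: two addresses exactly one cache-size apart share the same index, so each load evicts the line the other one needs.

#include <stdint.h>

#define CACHE_BYTES (16 * 1024)          /* assumed cache size */

int conflict_sum(volatile uint8_t *buf, int iters) {
    int sum = 0;
    for (int i = 0; i < iters; i++) {
        sum += buf[0];                   /* maps to some line L  */
        sum += buf[CACHE_BYTES];         /* same index: evicts L */
    }
    return sum;                          /* every access misses in a direct-mapped cache */
}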

46 Misses. Q: How to avoid them? Cold Misses: unavoidable? The data was never in the cache. Prefetching! Capacity Misses: buy more SRAM. Conflict Misses: use a more flexible cache design.
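
Software prefetching is one way to hide cold misses. A minimal sketch using GCC/Clang's __builtin_prefetch; the lookahead distance of 8 elements is an arbitrary illustrative choice.

long sum_with_prefetch(const long *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8]);   /* start fetching a later element early */
        sum += a[i];
    }
    return sum;
}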

47-54 Direct Mapped Example, continued (a pathological case). Using byte addresses in this example! The accesses in this part of the sequence keep mapping onto the same cache line, so lines are repeatedly evicted and the miss count climbs while the hit count stays low. [Step-by-step cache-state diagrams with the running Misses/Hits counts; the specific addresses and counts were lost in transcription.]

55 Cache Organization. How to avoid conflict misses: three common designs. Fully associative: a block can be anywhere in the cache. Direct mapped: a block can only be in one line in the cache. Set-associative: a block can be in a small number of places in the cache.

56 Fully Associative Cache (Reading). [Diagram: the address is split into Tag and Offset only (no index). The address tag is compared against the tags of every line in parallel; the matching valid line is selected (hit?), and the offset selects the word and bytes from its block.]

57 Fully Associative Cache (Reading). Tag, Offset. V, Tag, Block. m-bit offset, 2^n blocks (cache lines). Q: How big is the cache (data only)?

58 Fully Associative Cache (Reading). Tag, Offset. m-bit offset, 2^n blocks (cache lines). Q: How big is the cache (data only)? Cache of size 2^n blocks, block size of 2^m bytes. Size: number-of-blocks x block size = 2^n x 2^m bytes = 2^(n+m) bytes.

59 Fully Associative Cache (Reading). Tag, Offset. V, Tag, Block. m-bit offset, 2^n blocks (cache lines). Q: How much SRAM is needed (data + overhead)?

60 Fully Associative Cache (Reading). Tag, Offset. m-bit offset, 2^n blocks (cache lines). Q: How much SRAM is needed (data + overhead)? Cache of size 2^n blocks, block size of 2^m bytes. Tag field: 32 - m bits (for 32-bit addresses), valid bit: 1 bit. SRAM size: 2^n x (block size + tag size + valid bit size) = 2^n x (2^m bytes x 8 bits-per-byte + (32 - m) + 1).
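
Continuing the earlier worked example (again, numbers of my choosing): the same 16 KB of data held fully associatively, with 1024 lines of 16-byte blocks and 32-bit addresses, needs

    2^10 x (16 x 8 + (32 - 4) + 1) = 1024 x 157 bits = 160,768 bits, about 19.6 KB,

so the tags alone cost noticeably more than in the direct-mapped case (28-bit tags instead of 18-bit), before even counting the comparator needed for every line.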

61 Example: A Simple Fully Associative Cache. Using byte addresses in this example! [Setup diagram: the same short address bus and load sequence as before, but now a fully associative cache of a few word-sized lines; the address splits into only a tag field and a block-offset field, and every line has its own valid bit. Specific widths, addresses, and values were lost in transcription.]

62-79 Fully Associative Example: 1st Access onward. [A step-by-step walkthrough of the same load sequence against the fully associative cache: each pair of slides shows the address, whether it hits or misses, the updated cache contents (tag, data), the LRU status used to pick a victim line, and the running Misses/Hits counts. Because any block can go in any line, the conflict behavior seen in the direct-mapped walkthrough does not recur. The specific addresses, contents, and counts were lost in transcription.]

80 Eviction. Which cache line should be evicted from the cache to make room for a new line? Direct-mapped: no choice, must evict the line selected by the index. Associative caches: random: select one of the lines at random. Round-robin: similar to random. FIFO: replace the oldest line. LRU: replace the line that has not been used in the longest time.
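
A simple way to express the LRU policy in software (a sketch of the policy only, not the course's hardware design; names and sizes are mine) is to stamp each line with the time of its last use and evict the oldest stamp:

#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 8

struct line { bool valid; uint32_t tag; uint64_t last_used; };
struct line lines[NUM_LINES];
uint64_t now;                                   /* incremented on every access */

int pick_victim_lru(void) {
    int victim = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        if (!lines[i].valid) return i;          /* an empty line is always the best victim */
        if (lines[i].last_used < lines[victim].last_used)
            victim = i;                         /* otherwise take the least recently used */
    }
    return victim;
}

/* on every hit or fill: lines[i].last_used = ++now; */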

81 Tradeoffs: Direct Mapped vs Fully Associative.
Tag size: smaller (+) vs larger.
SRAM overhead: less (+) vs more.
Controller logic: less (+) vs more.
Speed: faster (+) vs slower.
Price: less (+) vs more.
Scalability: very (+) vs not very.
Number of conflict misses: lots vs zero (+).
Hit rate: low vs high (+).
Pathological cases: common vs ?

82 Set-associative cache. A compromise. Like a direct-mapped cache: index into a location; fast. Like a fully-associative cache: can store multiple entries, which decreases thrashing in the cache; search within each set's elements.

83 N-Way Set Associative Cache (Reading). [Diagram: the address is split into Tag, Index, and Offset. The index selects a set; the address tag is compared against the tags of each way in the set in parallel (hit?); the matching way's line is selected, and the offset picks the word and bytes from its block.]
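
Putting the pieces together, a set-associative lookup is the direct-mapped lookup with a short search inside the chosen set. A minimal sketch; as in the earlier sketches, the structure, sizes, and helper names are mine, not from the slides.

#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS    64
#define NUM_WAYS     2
#define BLOCK_BYTES 16

struct way { bool valid; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
struct way sets[NUM_SETS][NUM_WAYS];

extern void fetch_block_from_memory(uint32_t addr, uint8_t *dst);  /* hypothetical helper */
extern int  pick_victim(int set);                                  /* e.g. LRU, as sketched above */

uint8_t setassoc_read_byte(uint32_t addr) {
    uint32_t offset = addr % BLOCK_BYTES;
    uint32_t set    = (addr / BLOCK_BYTES) % NUM_SETS;
    uint32_t tag    = addr / (BLOCK_BYTES * NUM_SETS);
    for (int w = 0; w < NUM_WAYS; w++)                 /* compare all ways in the set */
        if (sets[set][w].valid && sets[set][w].tag == tag)
            return sets[set][w].data[offset];          /* hit */
    int v = pick_victim(set);                          /* miss: choose a way to evict */
    fetch_block_from_memory(addr - offset, sets[set][v].data);
    sets[set][v].valid = true;
    sets[set][v].tag   = tag;
    return sets[set][v].data[offset];
}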

84-89 Comparison: Direct Mapped vs Fully Associative vs Set Associative. Using byte addresses in this example! [The same load sequence is replayed against a direct-mapped cache, a fully associative cache, and a set-associative cache, tracking the Misses/Hits counts for each organization so they can be compared. The specific addresses and counts were lost in transcription.]

90 Remaining Issues To Do: Evicting cache lines Picking cache parameters Writing using the cache

91 Summary. Caching assumptions: small working set (90/10 rule); we can predict the future (spatial & temporal locality). Benefits: big & fast memory built from (big & slow) + (small & fast). Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate. Fully associative: higher hit cost, higher hit rate. Larger block size: lower hit cost, higher miss penalty. Next up: other designs; writing to caches.

92 Administrivia. Upcoming agenda: HW was due yesterday (Wednesday, March). PA Work-in-Progress circuit due before spring break. Spring break: Saturday to Sunday, March. Prelim: Thursday, right after spring break. PA due Thursday, April.

93 Have a great Spring Break!!!
