Caches & Memory. CS 3410 Computer System Organization & Programming


1 Caches & Memory CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer.

2 Programs C Code int main (int argc, char* argv[ ]) { int i; int m = n; int sum = 0; for (i = 1; i <= m; i++) { sum += i; } printf (..., n, sum); } Load/Store Architectures: Read data from memory (put in registers) Manipulate it Store it back to memory MIPS Assembly main: addiu $sp,$sp,-48 sw $31,44($sp) sw $fp,40($sp) move $fp,$sp sw $4,48($fp) sw $5,52($fp) la $2,n lw $2,0($2) sw $2,28($fp) sw $0,32($fp) li $2,1 sw $2,24($fp) $L2: lw $2,24($fp) lw $3,28($fp) slt $2,$3,$2 bne $2,$0,$L3... Instructions that read from or write to memory 2

3 1 Cycle Per Stage: the Biggest Lie (So Far) Code Stored in Memory (also, data and stack) compute jump/branch targets memory PC +4 new pc Instruction Fetch inst detect hazard register file control extend Instruction Decode imm B A ctrl Execute ALU forward unit d in addr d out memory Memory IF/ID ID/EX EX/MEM MEM/WB B D ctrl Stack, Data, Code Stored in Memory D M ctrl Write-Back 3

4 What's the problem? CPU Main Memory + big slow far away SandyBridge Motherboard, 2011 4

5 The Need for Speed CPU Pipeline 5

6 The Need for Speed CPU Pipeline Instruction speeds: add, sub, shift: 1 cycle mult: 3 cycles load/store: 100 cycles off-chip 50(-70) ns 2(-3) GHz processor 0.5 ns clock 6

7 The Need for Speed CPU Pipeline 7

8 What's the solution? Caches! Level 2 $ Level 1 Data $ Level 1 Insn $ Intel Pentium 3, 1999 8

9 Aside Go back to 4-state and 5-memory and look at how registers, SRAM and DRAM are built. 9

10 What's the solution? Caches! Level 2 $ Level 1 Data $ Level 1 Insn $ What lucky data gets to go here? Intel Pentium 3, 1999

11 Locality Locality Locality If you ask for something, you're likely to ask for: the same thing again soon Temporal Locality something near that thing, soon Spatial Locality total = 0; for (i = 0; i < n; i++) total += a[i]; return total;
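A minimal C sketch (not from the slides) of the loop above, with comments marking which accesses give temporal and which give spatial locality; the function name and signature are illustrative assumptions.

    /* sum_array: same computation as the slide's loop */
    int sum_array(const int *a, int n) {
        int total = 0;             /* total and i are touched every iteration:
                                      temporal locality                         */
        for (int i = 0; i < n; i++) {
            total += a[i];         /* a[0], a[1], a[2], ... sit next to each
                                      other in memory: spatial locality          */
        }
        return total;
    }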

12 Clicker Questions This highlights the temporal and spatial locality of data. Q1: Which line of code exhibits good temporal locality? 1 total = 0; 2 for (i = 0; i < n; i++) { 3 n--; 4 total += a[i]; 5 return total; A) 1 B) 2 C) 3 D) 4 E) 5 12

13 Clicker Questions This highlights the temporal and spatial locality of data. Q1: Which line of code exhibits good temporal locality? 1 total = 0; 2 for (i = 0; i < n; i++) { 3 n--; 4 total += a[i]; 5 return total; A) 1 B) 2 C) 3 D) 4 E) 5 13

14 Clicker Questions This highlights the temporal and spatial locality of data. 1 total = 0; 2 for (i = 0; i < n; i++) { 3 n--; 4 total += a[i]; 5 return total; Q1: Which line of code exhibits good temporal locality? Q2: Which line of code exhibits good spatial locality with the line after it? A) 1 B) 2 C) 3 D) 4 E) 5 14

15 Clicker Questions This highlights the temporal and spatial locality of data. 1 total = 0; 2 for (i = 0; i < n; i++) { 3 n--; 4 total += a[i]; 5 return total; Q1: Which line of code exhibits good temporal locality? Q2: Which line of code exhibits good spatial locality with the line after it? A) 1 B) 2 C) 3 D) 4 E) 5 15

16 Your life is full of Locality Last Called Speed Dial Favorites Contacts Google/Facebook/ 6

17 Your life is full of Locality 7

18 The Memory Hierarchy 1 cycle, Registers 128 bytes L1 Caches 4 cycles, 64 KB Small, Fast Intel Haswell Processor, 2013 L2 Cache L3 Cache Main Memory Disk 12 cycles, 256 KB 36 cycles, 2-20 MB 50-70 ns, 512 MB - 4 GB 5-20 ms, 16 GB - 4 TB, Big, Slow 18

19 Cache hit Some Terminology data is in the Cache t_hit: time it takes to access the cache Hit rate (%_hit): # cache hits / # cache accesses Cache miss data is not in the Cache t_miss: time it takes to get the data from below the $ Miss rate (%_miss): # cache misses / # cache accesses Cacheline or cacheblock or simply line or block Minimum unit of info that is present (or not) in the cache 19

20 The Memory Hierarchy 1 cycle, Registers 128 bytes L1 Caches L2 Cache L3 Cache Main Memory Disk 4 cycles, 64 KB 12 cycles, 256 KB 36 cycles, 2-20 MB average access time t_avg = t_hit + %_miss * t_miss = 4 + 5% x 100 = 9 cycles 50-70 ns, 512 MB - 4 GB 5-20 ms, 16 GB - 4 TB, Intel Haswell Processor, 2013 20
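A minimal sketch of the average-access-time formula above, using the slide's example numbers (4-cycle hit time, 5% miss rate, 100-cycle miss penalty); the function and program are illustrative, not course code.

    #include <stdio.h>

    /* amat: t_avg = t_hit + %_miss * t_miss */
    static double amat(double t_hit, double miss_rate, double t_miss) {
        return t_hit + miss_rate * t_miss;
    }

    int main(void) {
        printf("t_avg = %.1f cycles\n", amat(4.0, 0.05, 100.0));  /* 9.0 */
        return 0;
    }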

21 Single Core Memory Hierarchy ON CHIP Registers L Caches L2 Cache L3 Cache Main Memory Disk Processor Regs I$ D$ L2 Main Memory Disk 2

22 Multi-Core Memory Hierarchy ON CHIP Processor Processor Processor Processor Regs Regs Regs Regs I$ D$ I$ D$ I$ D$ I$ D$ L2 L2 L2 L2 L3 Main Memory Disk 22

23 Memory Hierarchy by the Numbers CPU clock rates ~0.33ns - 2ns (3GHz - 500MHz) *Registers, D-Flip Flops: 10s-100s of registers
Memory technology | Transistor count* | Access time | Access time in cycles | $ per GiB in 2012 | Capacity
SRAM (on chip) | 6-8 transistors | | 1-3 cycles | $4k | 256 KB
SRAM (off chip) | 6-8 transistors | 1.5-3 ns | 5-15 cycles | $4k | 32 MB
DRAM | 1 transistor (needs refresh) | 50-70 ns | 150-200 cycles | $10-$20 | 8 GB
SSD (Flash) | | 5k-50k ns | Tens of thousands | $0.75-$1 | 512 GB
Disk | | 5M-20M ns | Millions | $0.05-$0.1 | 4 TB
23

24 Basic Cache Design Direct Mapped Caches 24

25 MEMORY 16 Byte Memory load r Byte-addressable memory 4 address bits 16 bytes total b addr bits 2^b bytes in memory addr data A B C D E F G H J K L M N O P Q 25

26 MEMORY 4-Byte, Direct Mapped Cache addr data index XXXX index CACHE data A B C D Cache entry = row = (cache) line = (cache) block Block Size: 1 byte Direct mapped: Each address maps to 1 cache block 4 entries 2 index bits (2^n entries n bits) Index with LSB: Supports spatial locality A B C D E F G H J K L M N O P Q 26
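A sketch of the address split this slide implies, assuming the 4-bit addresses and 4-entry, 1-byte-block cache shown; the helper names are assumptions.

    /* 2 index bits (4 entries), 0 offset bits (1-byte blocks), 2 tag bits */
    unsigned dm_index(unsigned addr) { return addr & 0x3; }   /* low 2 bits  */
    unsigned dm_tag(unsigned addr)   { return addr >> 2;  }   /* high 2 bits */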

27 Analogy to a Spice Rack Spice Rack (Cache) index spice Spice Wall (Memory) A B C D E F Z Compared to your spice wall Smaller Faster More costly (per oz.) 27

28 Analogy to a Spice Rack Spice Rack (Cache) index tag spice Spice Wall (Memory) A B C D E F Z Cinnamon How do you know what s in the jar? Need labels Tag = Ultra-minimalist label 28

29 4-Byte, Direct Mapped tag index XXXX index Cache tag CACHE data Tag: minimalist label/address address = tag + index A B C D MEMORY addr data A B C D E F G H J K L M N O P Q

30 4-Byte, Direct Mapped Cache CACHE index V tag data X X X X One last tweak: valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 3

31 Simulation # of a 4-byte, DM Cache tag index XXXX index load Miss CACHE V tag data X X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 3

32 Simulation # of a 4-byte, DM Cache tag index XXXX index CACHE V tag data N xx X xx X xx X load Miss Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 32

33 Simulation # of a 4-byte, DM Cache tag index XXXX load... load Awesome! index Miss Hit! CACHE V tag data N X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 33

34 Block Diagram 4-entry, direct mapped Cache tag index 2 2 CACHE V tag data 2 8 Great! Are we done? = Hit! data 36

35 Clicker: A) Hit B) Miss load load load load Simulation #2: 4-byte, DM Cache index Miss CACHE V tag data X X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 37

36 load load load load Simulation #2: 4-byte, DM Cache index Miss CACHE V tag data N X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 38

37 Clicker: A) Hit B) Miss load load load load Simulation #2: 4-byte, DM Cache index Miss Miss CACHE V tag data N X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 39

38 tag index XXXX load load load load Simulation #2: 4-byte, DM Cache index Miss Miss CACHE V tag data N O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 4

39 Clicker: A) Hit B) Miss load load load load Simulation #2: 4-byte, DM Cache index Miss Miss Miss CACHE V tag data N O xx X xx X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 4

40 load load load load Simulation #2: 4-byte, DM Cache index Miss Miss Miss CACHE V tag data E O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 42

41 Clicker: A) Hit B) Miss load load load load Simulation #2: 4-byte, DM Cache index Miss Miss Miss Miss CACHE V tag data E O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 43

42 tag index XXXX load load load load Simulation #2: 4-byte, DM Cache index Miss Miss Miss Miss CACHE V tag data N O X X cold cold cold Disappointed! MEMORY addr data A B C D E F G H J K L M N O P Q 44

43 Reducing Cold Misses by Increasing Block Size Leveraging Spatial Locality 45

44 MEMORY Increasing Block Size addr data A offset index XXXX CACHE V tag data x A B x C D x E F x G H Block Size: 2 bytes Block Offset: least significant bits indicate where you live in the block Which bits are the index? tag? B C D E F G H J K L M N O P Q 46
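A sketch of the split the slide asks about, assuming the 4-bit addresses, 2-byte blocks (1 offset bit), and 4 cache entries (2 index bits) shown, leaving 1 tag bit; the helper names are assumptions.

    unsigned blk_offset(unsigned addr) { return addr & 0x1;        }  /* bit 0    */
    unsigned blk_index(unsigned addr)  { return (addr >> 1) & 0x3; }  /* bits 2:1 */
    unsigned blk_tag(unsigned addr)    { return addr >> 3;         }  /* bit 3    */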

45 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss CACHE V tag data x X X x X X x X X x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 47

46 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 48

47 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss Hit! CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 49

48 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss Hit! Miss CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 5

49 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss Hit! Miss CACHE V tag data x X X x X X E F x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 5

50 index tag offset XXXX load load load load Simulation #3: 8-byte, DM Cache index Miss Hit! Miss Miss CACHE V tag data x X X x X X E F x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 52

51 load load load load Simulation #3: 8-byte, DM Cache index Miss Hit! Miss Miss CACHE V tag data x X X x X X E F x X X cold cold conflict 1 hit, 3 misses 3 bytes don't fit in a 4 entry cache? MEMORY addr data A B C D E F G H J K L M N O P Q 53

52 Removing Conflict Misses with Fully-Associative Caches 54

53 8 byte, fully-associative XXXX tag offset V tag data xxx X X V tag data xxx X X Cache CACHE V tag data xxx X X What should the offset be? What should the index be? What should the tag be? V tag data xxx X X Clicker: A) xxxx B) xxxx C) xxxx D) xxxx E) None MEMORY addr data A B C D E F G H J K L M N O P Q 55

54 XXXX tag offset V tag data xxx X X load load load load LRU Pointer Simulation #4: 8-byte, FA Cache V tag data xxx X X Miss CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 56

55 XXXX tag offset V tag data N O load load load load Simulation #4: 8-byte, FA Cache V tag data xxx X X Miss Hit! CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 57

56 XXXX tag offset V tag data N O load load load load LRU Pointer Simulation #4: 8-byte, FA Cache V tag data xxx X X Miss Hit! Miss CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 58

57 XXXX tag offset V tag data N O load load load load LRU Pointer Simulation #4: 8-byte, FA Cache V tag data E F Miss Hit! Miss Hit! CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 59

58 Pros and Cons of Full Associativity + No more conflicts! + Excellent utilization! But either: Parallel Reads lots of reading! Serial Reads lots of waiting t_avg = t_hit + %_miss * t_miss = 4 + 5% x 100 = 9 cycles = 6 + 3% x 100 = 9 cycles 60
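A sketch of why a serial fully associative lookup means "lots of waiting": every line is a candidate, so every tag may have to be checked. The struct and function are illustrative assumptions; hardware instead compares all tags in parallel, with one comparator per line.

    struct fa_line { int valid; unsigned tag; unsigned char data[2]; };

    int fa_lookup(struct fa_line *cache, int nlines, unsigned tag) {
        for (int i = 0; i < nlines; i++)          /* serial: one tag per step */
            if (cache[i].valid && cache[i].tag == tag)
                return i;                         /* hit: matching line       */
        return -1;                                /* miss                     */
    }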

59 Pros & Cons
                      Direct Mapped   Fully Associative
Tag Size              Smaller         Larger
SRAM Overhead         Less            More
Controller Logic      Less            More
Speed                 Faster          Slower
Price                 Less            More
Scalability           Very            Not Very
# of conflict misses  Lots            Zero
Hit Rate              Low             High
Pathological Cases    Common          ?

60 Reducing Conflict Misses with Set-Associative Caches Not too conflict-y. Not too slow. Just Right! 62

61 8 byte, 2-way set associative Cache XXXX tag offset index CACHE index V tag data V tag data xx E F xx N O xx C D xx P Q What should the offset be? What should the index be? What should the tag be? MEMORY addr data A B C D E F G H J K L M N O P Q 63

62 Clicker Question XXXXX 5 bit address 2 byte block size 24 byte, 3-Way Set Associative CACHE index V tag data V tag data V tag data? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y How many tag bits? A) 0 B) 1 C) 2 D) 3 E) 4 64

63 Clicker Question XXXXX 5 bit address 2 byte block size 24 byte, 3-Way Set Associative CACHE index V tag data V tag data V tag data? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y? X Y How many tag bits? A) 0 B) 1 C) 2 D) 3 E) 4 65

64 8 byte, 2-way set associative Cache XXXX tag offset index index load load load load LRU Pointer V tag data CACHE xx X X xx X X Miss V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 66

65 8 byte, 2-way set associative Cache XXXX tag offset index index load load load load LRU Pointer V tag data CACHE N O xx X X Miss Hit! V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 67

66 8 byte, 2-way set associative Cache XXXX tag offset index index load load load load LRU Pointer V tag data CACHE N O xx X X Miss Hit! Miss V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 68

67 8 byte, 2-way set associative Cache XXXX tag offset index index load load load load LRU Pointer V tag data CACHE N O xx X X Miss Hit! Miss Hit! V tag data E F xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 69

68 Eviction Policies Which cache line should be evicted from the cache to make room for a new line? Direct-mapped: no choice, must evict line selected by index Associative caches Random: select one of the lines at random Round-Robin: similar to random FIFO: replace oldest line LRU: replace line that has not been used in the longest time 7
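A minimal sketch of LRU for a 2-way set-associative cache, tracked with one bit per set; the array sizes and names are assumptions, and real designs track recency in many different ways.

    enum { NUM_SETS = 4 };
    static int lru_way[NUM_SETS];                  /* way to evict next in each set */

    int  pick_victim(int set)        { return lru_way[set]; }
    void mark_used(int set, int way) { lru_way[set] = 1 - way; }  /* the other way is now LRU */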

69 Misses: the Three C s Cold (compulsory) Miss: never seen this address before Conflict Miss: cache associativity is too low Capacity Miss: cache is too small 7

70 Miss Rate vs. Block Size 72

71 Block Size Tradeoffs For a given total cache size, larger block sizes mean: fewer lines, so fewer tags, less overhead, and fewer cold misses (within-block prefetching) But also fewer blocks available (for scattered accesses!), so more conflicts, which can decrease performance if the working set can't fit in the $, and a larger miss penalty (time to fetch a block)

72 Miss Rate vs. Associativity 74

73 Clicker Question What does NOT happen when you increase the associativity of the cache? A) Conflict misses decrease B) Tag overhead decreases C) Hit time increases D) Cache stays the same size 75

74 Clicker Question What does NOT happen when you increase the associativity of the cache? A) Conflict misses decrease B) Tag overhead decreases C) Hit time increases D) Cache stays the same size 76

75 ABCs of Caches t_avg = t_hit + %_miss * t_miss + Associativity: ↓ conflict misses, ↑ hit time + Block Size: ↓ cold misses, ↑ conflict misses + Capacity: ↓ capacity misses, ↑ hit time 77

76 Which caches get what properties? t_avg = t_hit + %_miss * t_miss L1 Caches L2 Cache L3 Cache Fast Big Design with speed in mind More Associative Bigger Block Sizes Larger Capacity Design with miss rate in mind 78

77 Roadmap Things we have covered: The Need for Speed Locality to the Rescue! Calculating average memory access time $ Misses: Cold, Conflict, Capacity $ Characteristics: Associativity, Block Size, Capacity Things we will now cover: Cache Figures Cache Performance Examples Writes 79

78 2-Way Set Associative Cache (Reading) Tag Index Offset V Tag Data V Tag Data = = hit? line select word select data 64bytes 32bits 8

79 3-Way Set Associative Cache (Reading) Tag Index Offset = = = hit? line select word select data 64bytes 32bits 8

80 How Big is the Cache? Tag Index Offset n bit index, m bit offset, N-way Set Associative Question: How big is cache? Data only? (what we usually mean when we ask how big is the cache) Data + overhead? 82

81 How Big is the Cache? Tag Index Offset n bit index, m bit offset, N-way Set Associative Question: How big is cache? How big is the cache (Data only)? Cache of size 2^n sets Block size of 2^m bytes, N-way set associative Cache Size: 2^m bytes-per-block x (2^n sets x N ways-per-set) = N x 2^(n+m) bytes 83

82 How Big is the Cache? Tag Index Offset n bit index, m bit offset, N-way Set Associative Question: How big is cache? How big is the cache (Data + overhead)? Cache of size 2^n sets Block size of 2^m bytes, N-way set associative Tag field: 32 - (n + m) bits, Valid bit: 1 bit SRAM Size: 2^n sets x N ways-per-set x (block size + tag size + valid bit size) = 2^n x N x (2^m bytes x 8 bits-per-byte + (32 - n - m) + 1) 84
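A sketch that evaluates both formulas above for an n-bit index, m-bit offset, N-way cache, assuming a 32-bit address and a 1-bit valid flag per line; the function names are assumptions.

    /* data capacity in bytes: 2^m bytes/block x 2^n sets x N ways */
    long cache_data_bytes(int n, int m, int N) {
        return (long)N << (n + m);
    }

    /* total SRAM in bits: per line, 8*2^m data bits + (32-n-m) tag bits + 1 valid bit */
    long cache_sram_bits(int n, int m, int N) {
        long tag_bits = 32 - (n + m);
        return ((long)1 << n) * N * ((8L << m) + tag_bits + 1);
    }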

83 Performance Calculation with $ Hierarchy t_avg = t_hit + %_miss * t_miss Parameters Reference stream: all loads D$: t_hit = 1ns, %_miss = 5% L2: t_hit = 10ns, %_miss = 20% (local miss rate) Main memory: t_hit = 50ns What is t_avgD$ without an L2? t_missD$ = t_avgD$ = What is t_avgD$ with an L2? t_missD$ = t_avgL2 = t_avgD$ = 85

84 Performance Calculation with $ Hierarchy t_avg = t_hit + %_miss * t_miss Parameters Reference stream: all loads D$: t_hit = 1ns, %_miss = 5% L2: t_hit = 10ns, %_miss = 20% (local miss rate) Main memory: t_hit = 50ns What is t_avgD$ without an L2? t_missD$ = t_hitM t_avgD$ = t_hitD$ + %_missD$ * t_hitM = 1ns + (0.05 * 50ns) = 3.5ns What is t_avgD$ with an L2? t_missD$ = t_avgL2 t_avgL2 = t_hitL2 + %_missL2 * t_hitM = 10ns + (0.2 * 50ns) = 20ns t_avgD$ = t_hitD$ + %_missD$ * t_avgL2 = 1ns + (0.05 * 20ns) = 2ns 86
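A sketch of the same two-level calculation, using the parameters shown above (1 ns D$ hit, 5% miss; 10 ns L2 hit, 20% local miss; 50 ns main memory); the variable names are assumptions.

    #include <stdio.h>

    int main(void) {
        double t_d = 1.0,   m_d  = 0.05;   /* D$            */
        double t_l2 = 10.0, m_l2 = 0.20;   /* L2 (local)    */
        double t_mem = 50.0;               /* main memory   */

        double without_l2 = t_d + m_d * t_mem;     /* 3.5 ns */
        double avg_l2     = t_l2 + m_l2 * t_mem;   /* 20 ns  */
        double with_l2    = t_d + m_d * avg_l2;    /* 2 ns   */
        printf("%.1f ns without L2, %.1f ns with L2\n", without_l2, with_l2);
        return 0;
    }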

85 Performance Summary Average memory access time (AMAT) depends on: cache architecture and size Hit and miss rates Access times and miss penalty Cache design a very complex problem: Cache size, block size (aka line size) Number of ways of set-associativity (, N, ) Eviction policy Number of levels of caching, parameters for each Separate I-cache from D-cache, or Unified cache Prefetching policies / instructions Write policy 87

86 Takeaway Direct Mapped fast, but low hit rate Fully Associative higher hit cost, higher hit rate Set Associative middleground Line size matters. Larger cache lines can increase performance due to prefetching. BUT, can also decrease performance if the working set size cannot fit in the cache. Cache performance is measured by the average memory access time (AMAT), which depends on cache architecture and size, but also the access time for a hit, the miss penalty, and the hit rate. 88

87 What about Stores? We want to write to the cache. If the data is not in the cache? Bring it in. (Write allocate policy) Should we also update memory? Yes: write-through policy No: write-back policy
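A toy sketch of the write-allocate, write-through store path the slide describes, for a single 2-byte line over a 16-byte memory; all names and sizes are assumptions, not the course's simulator.

    #include <string.h>

    static unsigned char memory[16];                 /* backing memory        */
    static unsigned char line[2];                    /* one 2-byte cache line */
    static int valid = 0;
    static unsigned tag = 0;

    void store_byte_wt(unsigned addr, unsigned char val) {
        unsigned t = addr >> 1, off = addr & 1;
        if (!valid || tag != t) {                    /* miss: write-allocate  */
            memcpy(line, &memory[t << 1], 2);        /* fetch the whole block */
            valid = 1; tag = t;
        }
        line[off] = val;                             /* update the cache      */
        memory[addr] = val;                          /* write-through to mem  */
    }

A write-back version would skip the final memory write and set a dirty bit instead; that case is sketched after the valid/dirty-bit slide below.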

88 Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Write-Through Cache Register File $ $ $2 $3 16 byte, byte-addressed memory 4 byte, fully-associative cache: 2-byte blocks, write-allocate 4 bit addresses: 3 bit tag, 1 bit offset lru V tag data Cache Misses: Hits: Reads: Writes: Memory

89 Write-Through (REF ) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 lru V tag data Cache Misses: Hits: Reads: Writes: Memory

90 Write-Through (REF ) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M lru V tag data Cache 78 Misses: Hits: Reads: 2 Writes: Memory

91 Write-Through (REF 2) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M lru V tag data Cache 78 Misses: Hits: Reads: 2 Writes: Memory

92 Write-Through (REF 2) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M 73 lru V tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

93 Write-Through (REF 3) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M 73 CLICKER: (A) HIT (B) MISS lru V tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

94 Write-Through (REF 3) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit 73 lru V tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

95 Write-Through (REF 4) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M 73 write-allocate lru V tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

96 Write-Through (REF 4) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M 73 lru V tag data Cache Misses: 3 Hits: Reads: 6 Writes: Memory

97 Write-Through (REF 5) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M 73 CLICKER: (A) HIT (B) MISS lru V tag data Cache 73 7 Misses: 3 Hits: Reads: 6 Writes: Memory

98 Write-Through (REF 5) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M 33 lru V tag data Cache Misses: 4 Hits: Reads: 8 Writes: Memory

99 Write-Through (REF 6) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M 33 lru V tag data Cache Misses: 4 Hits: Reads: 8 Writes: Memory

100 Write-Through (REF 6) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] Register File $ $ $2 $3 M M Hit SB $ M[ 5 ] M LB $2 M[ ] M SB $ M[ 5 ] Hit SB $ M[ ] 33 lru V tag data Cache Misses: 4 Hits: 2 Reads: 8 Writes: Memory

101 Write-Through (REF 7) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] Register File $ $ $2 $3 M M Hit SB $ M[ 5 ] M LB $2 M[ ] M SB $ M[ 5 ] Hit SB $ M[ ] 33 lru V tag data Cache Misses: 4 Hits: 2 Reads: 8 Writes: Memory

102 Write-Through (REF 7) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] Register File $ $ $2 $3 M M Hit SB $ M[ 5 ] M LB $2 M[ ] M SB $ M[ 5 ] Hit SB $ M[ ] Hit 33 lru V tag data Cache Misses: 4 Hits: 3 Reads: 8 Writes: Memory

103 Summary: Write Through Write-through policy with write allocate Cache miss: read entire block from memory Write: write only updated item to memory Eviction: no need to write to memory

104 Next Goal: Write-Through vs. Write-Back What if we DON'T write stores immediately to memory? Keep the current copy in cache, and update memory when data is evicted (write-back policy) Write-back all evicted lines? No, only written-to blocks

105 Write-Back Meta-Data (Valid, Dirty Bits) V D Tag Byte 1 Byte 2 Byte N V = 1 means the line has valid data D = 1 means the bytes are newer than main memory When allocating line: Set V = 1, D = 0, fill in Tag and Data When writing line: Set D = 1 When evicting line: If D = 0: just set V = 0 If D = 1: write-back Data, then set D = 0, V = 0
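A sketch of the eviction rule above: write the block back only when the dirty bit is set, then clear the metadata. The struct and function names are assumptions.

    #include <string.h>

    struct wb_line { int valid, dirty; unsigned tag; unsigned char data[2]; };

    void evict_line(struct wb_line *ln, unsigned char *memory) {
        if (ln->valid && ln->dirty)                       /* D = 1: newer than memory */
            memcpy(&memory[ln->tag << 1], ln->data, 2);   /* write-back the block     */
        ln->valid = 0;                                    /* line is now free         */
        ln->dirty = 0;
    }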

106 Write-back Example Example: How does a write-back cache work? Assume write-allocate

107 Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Handling Stores (Write-Back) Register File $ $ $2 $3 16 byte, byte-addressed memory 4 byte, fully-associative cache: 2-byte blocks, write-allocate 4 bit addresses: 3 bit tag, 1 bit offset lru V d tag data Cache Misses: Hits: Reads: Writes: Memory

108 Write-Back (REF ) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 lru V d tag data Cache Misses: Hits: Reads: Writes: Memory

109 Write-Back (REF ) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M lru V d tag data Cache 78 Misses: Hits: Reads: 2 Writes: Memory

110 Write-Back (REF 2) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M lru V d tag data Cache 78 Misses: Hits: Reads: 2 Writes: Memory

111 Write-Back (REF 2) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M 73 lru V d tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

112 Write-Back (REF 3) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M 73 lru V d tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

113 Write-Back (REF 3) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 lru V d tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

114 Write-Back (REF 4) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 lru V d tag data Cache Misses: 2 Hits: Reads: 4 Writes: Memory

115 Write-Back (REF 4) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 lru V d tag data Cache Misses: 3 Hits: Reads: 6 Writes: Memory

116 Write-Back (REF 4) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] M LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 lru V d tag data Cache Misses: 3 Hits: Reads: 6 Writes: Memory

117 Write-Back (REF 5) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] M LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 lru V d tag data Cache 73 7 Misses: 3 Hits: Reads: 6 Writes: Memory

118 Write-Back (REF 5) Instructions: LB $ M[ ] M LB $2 M[ 7 ] M SB $2 M[ ] Hit SB $ M[ 5 ] M LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 73 Eviction, WB dirty block lru V d tag data Cache 73 7 Misses: 3 Hits: Reads: 6 Writes: Memory

119 Write-Back (REF 5) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M 33 lru V d tag data Cache Misses: 4 Hits: Reads: 8 Writes: Memory

120 Write-Back (REF 6) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M 33 CLICKER: (A) HIT (B) MISS lru V d tag data Cache Misses: 4 Hits: Reads: 8 Writes: Memory

121 Write-Back (REF 6) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M Hit 33 lru V d tag data Cache Misses: 4 Hits: 2 Reads: 8 Writes: Memory

122 Write-Back (REF 7) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M Hit 33 lru V d tag data Cache Misses: 4 Hits: 2 Reads: 8 Writes: Memory

123 Write-Back (REF 7) Instructions: LB $ M[ ] LB $2 M[ 7 ] SB $2 M[ ] SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M Hit Hit 33 lru V d tag data Cache 28 7 Misses: 4 Hits: 3 Reads: 8 Writes: Memory

124 Write-Back (REF 8,9) Instructions:... SB $ M[ 5 ] LB $2 M[ ] SB $ M[ 5 ] SB $ M[ ] SB $ M[ 5 ] SB $ M[ ] Register File $ $ $2 $3 M M Hit M M Hit Hit 33 Cheap subsequent updates! lru V d tag data Cache 28 7 Misses: 4 Hits: 3 Reads: 8 Writes: Memory

125 Write-Back (REF 8,9) Instructions:... SB $ M[ 5 ] Register File $ $ $2 $3 M M Hit M LB $2 M[ ] M SB $ M[ 5 ] Hit SB $ M[ ] Hit SB $ M[ 5 ] Hit SB $ M[ ] Hit 33 lru V d tag data Cache 28 7 Misses: 4 Hits: 3 Reads: 8 Writes: Memory

126 How Many Memory References? Write-back performance How many reads? Each miss (read or write) reads a block from mem 4 misses → 8 mem reads How many writes? Some evictions write a block to mem 1 dirty eviction → 2 mem writes (+ 2 dirty evictions later → +4 mem writes)

127 Write-back vs. Write-through Example Assume: large associative cache, 16-byte lines N 4-byte words for (i=0; i<N; i++) A[0] += A[i]; for (i=0; i<N; i++) B[i] = A[i]; Write-thru: N/4 reads, N writes Write-back: N/4 reads, 1 write Write-thru: 2 x N/4 reads, N writes Write-back: 2 x N/4 reads, N/4 writes

128 So is write back just better? Short Answer: Yes (fewer writes is a good thing) Long Answer: It s complicated. Evictions require entire line be written back to memory (vs. just the data that was written) Write-back can lead to incoherent caches on multi-core processors (later lecture)

129 Optimization: Write Buffering Q: Writes to main memory are slow! A: Use a write-back buffer A small queue holding dirty lines Add to end upon eviction Remove from front upon completion Q: When does it help? A: short bursts of writes (but not sustained writes) A: fast eviction reduces miss penalty
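A sketch of the write-back buffer described above as a small FIFO: dirty evictions are pushed on the back and memory drains the front, so a short burst of evictions does not stall the processor. Sizes and names are assumptions.

    #define WB_ENTRIES 4
    struct wb_entry { unsigned addr; unsigned char data[2]; };

    static struct wb_entry wbuf[WB_ENTRIES];
    static int head = 0, tail = 0, count = 0;

    int wbuf_push(struct wb_entry e) {            /* called on a dirty eviction */
        if (count == WB_ENTRIES) return 0;        /* buffer full: must stall    */
        wbuf[tail] = e;
        tail = (tail + 1) % WB_ENTRIES;
        count++;
        return 1;
    }

    int wbuf_pop(struct wb_entry *e) {            /* memory finished one write  */
        if (count == 0) return 0;
        *e = wbuf[head];
        head = (head + 1) % WB_ENTRIES;
        count--;
        return 1;
    }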

130 Write-through vs. Write-back Write-through is slower But simpler (memory always consistent) Write-back is almost always faster write-back buffer hides large eviction cost But what about multiple cores with separate caches but sharing memory? Write-back requires a cache coherency protocol Inconsistent views of memory Need to snoop in each other s caches Extremely complex protocols, very hard to get right

131 Cache-coherency Q: Multiple readers and writers? A: Potentially inconsistent views of memory [figure: four CPUs, each with private L1 and L2 caches, all caching the same data A, sharing main memory and disk over an interconnect] Cache coherency protocol May need to snoop on other CPU's cache activity Invalidate cache line when other CPU writes Flush write-back caches before other CPU reads Or the reverse: Before writing/reading Extremely complex protocols, very hard to get right

132 Write-through policy with write allocate Cache miss: read entire block from memory Write: write only updated item to memory Eviction: no need to write to memory Slower, but cleaner Write-back policy with write allocate Cache miss: read entire block from memory **But may need to write dirty cacheline first** Write: nothing to memory Eviction: have to write the entire cacheline to memory because we don't know which bytes are dirty (only one dirty bit) Faster, but more complicated, especially with multicore

133 Cache Conscious Programming H // H = 6, W = int A[H][W]; for(x=0; x < W; x++) for(y=0; y < H; y++) sum += A[y][x]; W YOUR MIND CACHE 6 Every access a cache miss! (unless entire matrix fits in cache) MEMORY

134 Cache Conscious Programming // H = 6, W = int A[H][W]; for(x=0; x < H; x++) for(y=0; y < W; y++) sum += A[x][y]; W H YOUR MIND CACHE Block size = 4 75% hit rate Block size = % hit rate Block size = % hit rate And you can easily prefetch to warm the cache MEMORY

135 Clicker Question Choose the best block size for your cache among the choices given. Assume that integers and pointers are all 4 bytes each and that the scores array is 4-byte aligned. (a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes int scores[NUM_STUDENTS] = ...; int sum = 0; for (i = 0; i < NUM_STUDENTS; i++) { sum += scores[i]; } 137

136 Clicker Question Choose the best block size for your cache among the choices given. Assume that integers and pointers are all 4 bytes each and that the scores array is 4-byte aligned. (a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes int scores[NUM_STUDENTS] = ...; int sum = 0; for (i = 0; i < NUM_STUDENTS; i++) { sum += scores[i]; } 138

137 Clicker Question Choose the best block size for your cache among the choices given. Assume integers and pointers are 4 bytes. (a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes typedef struct item_t { int value; struct item_t *next; char *name; } item_t; int sum = 0; item_t *curr = list_head; while (curr != NULL) { sum += curr->value; curr = curr->next; } 139

138 Clicker Question Choose the best block size for your cache among the choices given. Assume integers and pointers are 4 bytes. (a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes typedef struct item_t { int value; struct item_t *next; char *name; } item_t; int sum = 0; item_t *curr = list_head; while (curr != NULL) { sum += curr->value; curr = curr->next; } 140

139 By the end of the cache lectures

140 A Real Example > dmidecode -t cache Cache Information Socket Designation: L1 Cache Configuration: Enabled, Not Socketed, Level 1 Operational Mode: Write Back Location: Internal Installed Size: 128 kb Maximum Size: 128 kb Supported SRAM Types: Synchronous Installed SRAM Type: Synchronous Speed: Unknown Error Correction Type: Parity System Type: Unified Associativity: 8-way Set-associative Cache Information Socket Designation: L2 Cache Configuration: Enabled, Not Socketed, Level 2 Operational Mode: Write Back Location: Internal Installed Size: 512 kb Maximum Size: 512 kb Supported SRAM Types: Synchronous Installed SRAM Type: Synchronous Speed: Unknown Error Correction Type: Single-bit ECC System Type: Unified Associativity: 4-way Set-associative Microsoft Surfacebook Dual core Intel i GHz (purchased in 2016) Cache Information Socket Designation: L3 Cache Configuration: Enabled, Not Socketed, Level 3 Operational Mode: Write Back Location: Internal Installed Size: 4096 kb Maximum Size: 4096 kb Supported SRAM Types: Synchronous Installed SRAM Type: Synchronous Speed: Unknown Error Correction Type: Multi-bit ECC System Type: Unified Associativity: 16-way Set-associative

141 A Real Example > sudo dmidecode -t cache Cache Information Configuration: Enabled, Not Socketed, Level 1 Operational Mode: Write Back Installed Size: 128 KB Error Correction Type: None Cache Information Configuration: Enabled, Not Socketed, Level 2 Operational Mode: Varies With Memory Address Installed Size: 6144 KB Error Correction Type: Single-bit ECC > cd /sys/devices/system/cpu/cpu0; grep cache/*/* cache/index0/level:1 cache/index0/type:data cache/index0/ways_of_associativity:8 cache/index0/number_of_sets:64 cache/index0/coherency_line_size:64 cache/index0/size:32K cache/index1/level:1 cache/index1/type:instruction cache/index1/ways_of_associativity:8 cache/index1/number_of_sets:64 cache/index1/coherency_line_size:64 cache/index1/size:32K cache/index2/level:2 cache/index2/type:unified cache/index2/shared_cpu_list:0-1 cache/index2/ways_of_associativity:24 cache/index2/number_of_sets:4096 cache/index2/coherency_line_size:64 cache/index2/size:6144K Dual-core 3.6GHz Intel (purchased in 2)

142 A Real Example Dual 32K L1 Instruction caches 8-way set associative 64 sets 64 byte line size Dual 32K L1 Data caches Same as above Single 6M L2 Unified cache 24-way set associative (!!!) 4096 sets 64 byte line size 4GB Main memory 1 TB Disk Dual-core 3.6GHz Intel (purchased in 2009)

143

144 Summary Memory performance matters! often more than CPU performance because it is the bottleneck, and not improving much because most programs move a LOT of data Design space is huge Gambling against program behavior Cuts across all layers: users programs os hardware NEXT: Multi-core processors are complicated Inconsistent views of memory Extremely complex protocols, very hard to get right


More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L22 Caches II (1) CPS today! Lecture #22 Caches II 2005-11-16 There is one handout today at the front and back of the room! Lecturer PSOE,

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 32 Caches III 2008-04-16 Lecturer SOE Dan Garcia Hi to Chin Han from U Penn! Prem Kumar of Northwestern has created a quantum inverter

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/19/17 Fall 2017 - Lecture #16 1 Parallel

More information

Performance metrics for caches

Performance metrics for caches Performance metrics for caches Basic performance metric: hit ratio h h = Number of memory references that hit in the cache / total number of memory references Typically h = 0.90 to 0.97 Equivalent metric:

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 33 Caches III 2007-04-11 Future of movies is 3D? Dreamworks says they may exclusively release movies in this format. It s based

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Memory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"

Memory Management! How the hardware and OS give application pgms: The illusion of a large contiguous address space Protection against each other Memory Management! Goals of this Lecture! Help you learn about:" The memory hierarchy" Spatial and temporal locality of reference" Caching, at multiple levels" Virtual memory" and thereby " How the hardware

More information

Key Point. What are Cache lines

Key Point. What are Cache lines Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible

More information

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Roadmap. Java: Assembly language: OS: Machine code: Computer system: Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Assembly language: Machine code: get_mpg: pushq movq... popq ret %rbp %rsp, %rbp %rbp 0111010000011000

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

CS 61C: Great Ideas in Computer Architecture Caches Part 2

CS 61C: Great Ideas in Computer Architecture Caches Part 2 CS 61C: Great Ideas in Computer Architecture Caches Part 2 Instructors: Nicholas Weaver & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/fa15 Software Parallel Requests Assigned to computer eg,

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size CS61C L33 Caches III (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C Machine Structures Lecture 33 Caches III 27-4-11 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Future of movies is 3D? Dreamworks

More information

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Memory Hierarchy: Caches, Virtual Memory

Memory Hierarchy: Caches, Virtual Memory Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

Lecture 7 - Memory Hierarchy-II

Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Memory hierarchy review. ECE 154B Dmitri Strukov

Memory hierarchy review. ECE 154B Dmitri Strukov Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal

More information

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015 CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable

More information

Show Me the $... Performance And Caches

Show Me the $... Performance And Caches Show Me the $... Performance And Caches 1 CPU-Cache Interaction (5-stage pipeline) PCen 0x4 Add bubble PC addr inst hit? Primary Instruction Cache IR D To Memory Control Decode, Register Fetch E A B MD1

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Memory Hierarchy. 2/18/2016 CS 152 Sec6on 5 Colin Schmidt

Memory Hierarchy. 2/18/2016 CS 152 Sec6on 5 Colin Schmidt Memory Hierarchy 2/18/2016 CS 152 Sec6on 5 Colin Schmidt Agenda Review Memory Hierarchy Lab 2 Ques6ons Return Quiz 1 Latencies Comparison Numbers L1 Cache 0.5 ns L2 Cache 7 ns 14x L1 cache Main Memory

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Reducing Conflict Misses with Set Associative Caches

Reducing Conflict Misses with Set Associative Caches /6/7 Reducing Conflict es with Set Associative Caches Not too conflict y. Not too slow. Just Right! 8 byte, way xx E F xx C D What should the offset be? What should the be? What should the tag be? xx N

More information