Caches and Memory. Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17
1 Caches and Memory Anne Bracy CS 3410 Computer Science Cornell University Slides by Anne Bracy with 3410 slides by Professors Weatherspoon, Bala, McKee, and Sirer. See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17
2 Programs C Code int main (int argc, char* argv[ ]) { int i; int m = n; int sum = 0; for (i = 1; i <= m; i++) { sum += i; } printf (..., n, sum); } Load/Store Architectures: Read data from memory (put in registers) Manipulate it Store it back to memory main: addiu $sp,$sp,-48 sw $31,44($sp) sw $fp,40($sp) move $fp,$sp sw $4,48($fp) sw $5,52($fp) la $2,n lw $2,0($2) sw $2,28($fp) sw $0,32($fp) li $2,0 sw $2,24($fp) $L2: lw $2,24($fp) lw $3,28($fp) slt $2,$3,$2 bne $2,$0,$L3... MIPS Assembly Instructions that read from or write to memory 2
3 1 Cycle Per Stage: the Biggest Lie (So Far) Code Stored in Memory (also, data and stack) compute jump/branch targets memory PC +4 new pc Instruction Fetch inst detect hazard register file control extend Instruction Decode imm B A ctrl Execute ALU forward unit d in addr d out memory Memory IF/ID ID/EX EX/MEM MEM/WB B D ctrl Stack, Data, Code Stored in Memory D ctrl Write-Back 3
4 What's the problem? CPU Main Memory + big slow far away SandyBridge Motherboard, 2011 4
5 The Need for Speed CPU Pipeline 5
6 The Need for Speed CPU Pipeline Instruction speeds: add,sub,shift: 1 cycle mult: 3 cycles load/store: 100 cycles off-chip 50(-70)ns 2(-3) GHz processor → 0.5 ns clock 6
7 The Need for Speed CPU Pipeline 7
8 What's the solution? Caches! Level 2 $ Level 1 Data $ Level 1 Insn $ What lucky data gets to go here? Intel Pentium 3, 1999 8
9 Locality Locality Locality If you ask for something, you're likely to ask for: the same thing again soon → Temporal Locality something near that thing, soon → Spatial Locality total = 0; for (i = 0; i < n; i++) total += a[i]; return total; 9
10 Your life is full of Locality Last Called Speed Dial Favorites Contacts Google/Facebook/
11 Your life is full of Locality
12 The Memory Hierarchy Registers: 1 cycle, 128 bytes L1 Caches: 4 cycles, 64 KB L2 Cache: 12 cycles, 256 KB L3 Cache: 36 cycles, 2-20 MB Main Memory: 50-70 ns, 512 MB-4 GB Disk: 5-20 ms, 16 GB-4 TB Small, Fast at the top; Big, Slow at the bottom. Intel Haswell Processor, 2013 12
13 Some Terminology Cache hit data is in the Cache t_hit: time it takes to access the cache %hit: Hit rate. # cache hits / # cache accesses Cache miss data is not in the Cache t_miss: time it takes to get the data from below the $ Miss rate (%miss): # cache misses / # cache accesses 13
14 The Memory Hierarchy Registers: 1 cycle, 128 bytes L1 Caches: 4 cycles, 64 KB L2 Cache: 12 cycles, 256 KB L3 Cache: 36 cycles, 2-20 MB Main Memory: 50-70 ns, 512 MB-4 GB Disk: 5-20 ms, 16 GB-4 TB average access time t_avg = t_hit + %miss * t_miss = 4 + 5% x 100 = 9 cycles Intel Haswell Processor, 2013 14
15 Single Core Memory Hierarchy ON CHIP Registers L1 Caches L2 Cache L3 Cache Main Memory Disk Processor Regs I$ D$ L2 Main Memory Disk 16
16 Multi-core Memory Hierarchy ON CHIP Processor Processor Processor Processor Regs Regs Regs Regs I$ D$ I$ D$ I$ D$ I$ D$ Registers L2 L2 L2 L2 L1 Caches L2 Cache L3 Cache Main Memory Disk L3 Main Memory Disk 17
17 Memory Hierarchy by the Numbers CPU clock rates ~0.33ns-2ns (3GHz-0.5GHz) Registers, D-Flip Flops: 10s-100s of registers, access time in cycles. SRAM (on chip): 6-8 transistors, ~1 ns access (1-3 cycles), ~$4k per GiB in 2012, 256 KB capacity. SRAM (off chip): 1.5-3 ns (5-15 cycles), ~$4k per GiB, 32 MB. DRAM: 1 transistor (needs refresh), 50-70 ns (50-200 cycles), $10-$20 per GiB, 8 GB. SSD (Flash): 5k-50k ns (tens of thousands of cycles), $0.5-$1 per GiB, 512 GB. Disk: 5M-20M ns (millions of cycles), $0.75-$1 per GiB, 4 TB. 18
18 Basic Cache Design Direct Mapped Caches 19
19 16 Byte Memory load x → r1 Byte-addressable memory 4 address bits → 16 bytes total b addr bits → 2^b bytes in memory MEMORY addr data A B C D E F G H J K L M N O P Q 20
20 4-Byte, Direct Mapped Cache addr index XXXX index CACHE data A B C D ← Cache entry = row = (cache) line = (cache) block Block Size: 1 byte Direct mapped: Each address maps to 1 cache block 4 entries → 2 index bits (2^n → n bits) Index with LSB: Supports spatial locality MEMORY addr data A B C D E F G H J K L M N O P Q 21
21 Analogy to a Spice Rack Spice Rack (Cache) index spice Spice Wall (Memory) A B C D E F Z Compared to your spice wall: Smaller, Faster, More costly (per oz.) 22
22 Analogy to a Spice Rack Spice Rack (Cache) index tag spice Spice Wall (Memory) A B C D E F Z Cinnamon How do you know what's in the jar? Need labels Tag = Ultra-minimalist label 23
23 4-Byte, Direct Mapped Cache tag index XXXX index CACHE tag data Tag: minimalist label/address address = tag + index A B C D MEMORY addr data A B C D E F G H J K L M N O P Q 24
24 4-Byte, Direct Mapped Cache index V tag CACHE data X X X X One last tweak: valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 25
25 Simulation #1 of a 4-byte, DM Cache tag index XXXX load x index Miss V tag CACHE data X X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 26
26 Simulation #1 of a 4-byte, DM Cache tag index XXXX index V tag CACHE data N xx X xx X xx X load x Miss Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 27
27 Simulation #1 of a 4-byte, DM Cache tag index XXXX load x... load x Awesome! index Miss Hit! V tag CACHE data N X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 28
28 Block Diagram 4-entry, direct mapped Cache tag index 2 2 V tag CACHE data 2 8 Great! Are we done? = Hit! data
29 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss V tag CACHE data X X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 30
30 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss V tag CACHE data N xx X xx X xx X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 31
31 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss V tag CACHE data N X X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 32
32 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss V tag CACHE data N O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 33
33 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss Miss V tag CACHE data N O xx X xx X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 34
34 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss Miss V tag CACHE data E O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 35
35 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss Miss Miss V tag CACHE data E O X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 36
36 tag index XXXX load x load x load x load x Simulation #2: 4-byte, DM Cache index Miss Miss Miss Miss V tag CACHE data N O X X cold cold cold Disappointed! ☹ MEMORY addr data A B C D E F G H J K L M N O P Q 37
37 Reducing Cold Misses by Increasing Block Size Leveraging Spatial Locality 38
38 Increasing Block Size addr data A offset index XXXX CACHE V tag data x A B x C D x E F x G H Block Size: 2 bytes Block Offset: the least significant bits indicate where you live in the block Which bits are the index? tag? MEMORY addr data A B C D E F G H J K L M N O P Q 39
39 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss CACHE V tag data x X X x X X x X X x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 40
40 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 41
41 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss Hit! CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 42
42 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss Hit! Miss CACHE V tag data x X X x X X N O x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 43
43 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss Hit! Miss CACHE V tag data x X X x X X E F x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 44
44 index tag offset XXXX load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss Hit! Miss Miss CACHE V tag data x X X x X X E F x X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 45
45 load x load x load x load x Simulation #3: 8-byte, DM Cache index Miss Hit! Miss Miss CACHE V tag data x X X x X X E F x X X cold cold conflict 1 hit, 3 misses! 13 bytes don't fit in an 8 byte cache? MEMORY addr data A B C D E F G H J K L M N O P Q 46
46 Removing Conflict Misses with Fully-Associative Caches 47
47 8 byte, fully-associative Cache XXXX tag offset CACHE V tag data V tag data V tag data V tag data xxx X X xxx X X xxx X X xxx X X What should the offset be? What should the index be? What should the tag be? MEMORY addr data A B C D E F G H J K L M N O P Q 48
48 XXXX tag offset V tag data xxx X X load x load x load x load x LRU Pointer Simulation #4: 8-byte, FA Cache V tag data xxx X X Miss CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 49
49 V tag XXXX tag offset data N O Simulation #4: 8-byte, FA Cache V tag load x load x load x load x data xxx X X Miss Hit! CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 50
50 V tag XXXX tag offset data N O V tag load x load x load x load x LRU Pointer Simulation #4: 8-byte, FA Cache data xxx X X Miss Hit! Miss CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 51
51 V tag XXXX tag offset data N O Simulation #4: 8-byte, FA Cache V tag load x load x load x load x LRU Pointer data E F Miss Hit! Miss Hit! CACHE V tag data xxx X X V tag data xxx X X Lookup: Index into $ Check tags Check valid bits MEMORY addr data A B C D E F G H J K L M N O P Q 52
52 Pros and Cons of Full Associativity + No more conflicts! + Excellent utilization! But either: Parallel Reads lots of reading! Serial Reads lots of waiting t_avg = t_hit + %miss * t_miss: direct mapped = 4 + 5% x 100 = 9 cycles; fully associative = 6 + 3% x 100 = 9 cycles 53
53 Pros & Cons: Direct Mapped vs. Fully Associative. Tag Size: Smaller / Larger. SRAM Overhead: Less / More. Controller Logic: Less / More. Speed: Faster / Slower. Price: Less / More. Scalability: Very / Not Very. # of conflict misses: Lots / Zero. Hit Rate: Low / High. Pathological Cases: Common / ?
54 Reducing Conflict Misses with Set-Associative Caches Not too conflict-y. Not too slow. Just Right! 55
55 8 byte, 2-way set associative Cache XXXX tag offset index CACHE index V tag data V tag data xx E F xx N O xx C D xx P Q What should the offset be? What should the index be? What should the tag be? MEMORY addr data A B C D E F G H J K L M N O P Q 56
56 8 byte, 2-way set associative Cache XXXX tag offset index index load x load x load x load x LRU Pointer V tag data CACHE xx X X xx X X Miss V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 58
57 8 byte, 2-way set associative Cache XXXX tag offset index index load x load x load x load x LRU Pointer V tag data CACHE N O xx X X Miss Hit! V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 59
58 8 byte, 2-way set associative Cache XXXX tag offset index index load x load x load x load x LRU Pointer V tag data CACHE N O xx X X Miss Hit! Miss V tag data xx X X xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 60
59 8 byte, 2-way set associative Cache XXXX tag offset index index load x load x load x load x LRU Pointer V tag data CACHE N O xx X X Miss Hit! Miss Hit! V tag data E F xx X X Lookup: Index into $ Check tag Check valid bit MEMORY addr data A B C D E F G H J K L M N O P Q 61
60 Eviction Policies Which cache line should be evicted from the cache to make room for a new line? Direct-mapped: no choice, must evict line selected by index Associative caches Random: select one of the lines at random Round-Robin: similar to random FIFO: replace oldest line LRU: replace line that has not been used in the longest time 62
61 Misses: the Three C's Cold (compulsory) Miss: never seen this address before Conflict Miss: cache associativity is too low Capacity Miss: cache is too small 63
62 Miss Rate vs. Block Size 64
63 Block Size Tradeoffs For a given total cache size, larger block sizes mean fewer lines, so fewer tags, less overhead, and fewer cold misses (within-block prefetching ) But also fewer blocks available (for scattered accesses!), so more conflicts, which can decrease performance if the working set can't fit in the $ and a larger miss penalty (time to fetch the block)
64 Miss Rate vs. Associativity 66
65 ABCs of Caches t_avg = t_hit + %miss * t_miss + Associativity: fewer conflict misses ☺, higher hit time ☹ + Block Size: fewer cold misses ☺, more conflict misses ☹ + Capacity: fewer capacity misses ☺, higher hit time ☹ 67
66 Which caches get what properties? t_avg = t_hit + %miss * t_miss L1 Caches: Fast; design with speed in mind. L2/L3 Cache: Big; more associative, bigger block sizes, larger capacity; design with miss rate in mind. 68
67 Roadmap Things we have covered: The Need for Speed Locality to the Rescue! Calculating average memory access time $ Misses: Cold, Conflict, Capacity $ Characteristics: Associativity, Block Size, Capacity Things we will now cover: Cache Figures Cache Performance Examples Writes 69
68 More Slides Coming
69 2-Way Set Associative Cache (Reading) Tag Index Offset = = line select word select 64bytes hit? data 32bits 7
70 3-Way Set Associative Cache (Reading) Tag Index Offset = = = line select hit? word select data 64bytes 32bits 72
71 How Big is the Cache? Tag Index Offset n bit index, m bit offset, N-way Set Associative Question: How big is cache? Data only? (what we usually mean when we ask how big is the cache) Data + overhead? 73
72 Cache Performance Example t_avg = t_hit + %miss * t_miss t_avg = ? for accessing 16 words? Memory Parameters (very simplified): Main Memory: 4GB Data cost: 15 cycles for first word, plus 3 cycles per subsequent word L1: 512 x 64-byte cache lines, direct mapped Data cost: 3 cycles per word access Lookup cost: 2 cycles Performance if %hit = 90%? Performance if %hit = 95%? Note: here t_hit splits up lookup vs. data cost. Why are there two ways? 75
73 Performance Calculation with $ Hierarchy t_avg = t_hit + %miss * t_miss Parameters Reference stream: all loads D$: t_hit = 1ns, %miss = 5% L2: t_hit = 10ns, %miss = 2% (local miss rate) Main memory: t_hit = 50ns What is t_avg,D$ without an L2? t_miss,D$ = t_avg,D$ = What is t_avg,D$ with an L2? t_miss,D$ = t_avg,L2 = t_avg,D$ = 77
74 Performance Summary Average memory access time (AMAT) depends on: cache architecture and size Hit and miss rates Access times and miss penalty Cache design is a very complex problem: Cache size, block size (aka line size) Number of ways of set-associativity (1, N, ...) Eviction policy Number of levels of caching, parameters for each Separate I-cache from D-cache, or Unified cache Prefetching policies / instructions Write policy 79
75 Takeaway Direct Mapped → fast, but low hit rate Fully Associative → higher hit cost, higher hit rate Set Associative → middle ground Line size matters. Larger cache lines can increase performance due to prefetching. BUT, can also decrease performance if working set size cannot fit in cache. Cache performance is measured by the average memory access time (AMAT), which depends on cache architecture and size, but also the access time for a hit, the miss penalty, and the hit rate. 81
76 What about Stores? Where should you write the result of a store? If that memory location is in the cache? Send it to the cache Should we also send it to memory right away? (write-through policy) Wait until we evict the block (write-back policy) If it is not in the cache? Allocate the line (put it in the cache)? (write allocate policy) Write it directly to memory without allocation? (no write allocate policy)
77 Q: How to write data? Cache Write Policies CPU addr data Cache (SRAM) Memory (DRAM) If data is already in the cache No-Write: writes invalidate the cache and go directly to memory Write-Through: writes go to main memory and cache Write-Back: CPU writes only to cache; cache writes to main memory later (when block is evicted)
78 Write Allocation Policies Q: How to write data? CPU addr data Cache (SRAM) Memory (DRAM) If data is not in the cache Write-Allocate: allocate a cache line for new data (and maybe write-through) No-Write-Allocate: ignore cache, just go to main memory
79 Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Write-Through Stores Register File $0 $1 $2 $3 16 byte, byte-addressed memory 4 byte, fully-associative cache: 2-byte blocks, write-allocate 4 bit addresses: 3 bit tag, 1 bit offset lru V tag data Cache Misses: 0 Hits: 0 Memory
80 Write-Through (REF 1) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 lru V tag data Cache Misses: 0 Hits: 0 Memory
81 Write-Through (REF 1) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Addr: 0001 lru V tag data Cache 78 Misses: 1 Hits: 0 Memory
82 Write-Through (REF 2) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 lru V tag data Cache 78 Misses: 1 Hits: 0 Memory
83 Write-Through (REF 2) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 0111 lru V tag data Cache Misses: 2 Hits: 0 Memory
84 Write-Through (REF 3) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 lru V tag data Cache Misses: 2 Hits: 0 Memory
85 Write-Through (REF 3) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] Register File $0 $1 $2 $3 SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] 73 Addr: 0000 lru V tag data Cache Misses: 2 Hits: 1 Memory
86 Write-Through (REF 4) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 73 Addr: 0101 lru V tag data Cache Misses: 2 Hits: 1 Memory
87 Write-Through (REF 4) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 73 lru V tag data Cache Misses: 3 Hits: 1 Memory
88 Write-Through (REF 5) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 73 Addr: 1010 lru V tag data Cache 73 7 Misses: 3 Hits: 1 Memory
89 Write-Through (REF 5) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 33 lru V tag data Cache Misses: 4 Hits: 1 Memory
90 Write-Through (REF 6) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 33 Addr: 0101 lru V tag data Cache Misses: 4 Hits: 1 Memory
91 Write-Through (REF 6) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Register File $0 $1 $2 $3 33 lru V tag data Cache Misses: 4 Hits: 2 Memory
92 Write-Through (REF 7) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Register File $0 $1 $2 $3 33 Addr: 1010 lru V tag data Cache Misses: 4 Hits: 2 Memory
93 Write-Through (REF 7) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Hit Register File $0 $1 $2 $3 33 lru V tag data Cache Misses: 4 Hits: 3 Memory
94 How Many Memory References? Write-through performance How many memory reads? How many memory writes? Overhead? Do we need a dirty bit?
95 Write-Through (REF 8,9) Instructions:... SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit Hit Hit 33 lru V tag data Cache 28 7 Misses: 4 Hits: 3 Memory
96 Write-Through (REF 8,9) Instructions:... Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Hit SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Hit Register File $0 $1 $2 $3 33 lru V tag data Cache 28 7 Misses: 4 Hits: 5 Memory
97 Summary: Write Through Write-through policy with write allocate Cache miss: read entire block from memory Write: write only updated item to memory Eviction: no need to write to memory
98 Next Goal: Write-Through vs. Write-Back Can we also design the cache NOT to write all stores immediately to memory? Keep the current copy in cache, and update memory when data is evicted (write-back policy) Write-back all evicted lines? No, only written-to blocks
99 Write-Back Meta-data (Valid, Dirty Bits) V D Tag Byte 1 Byte 2 Byte N V = 1 means the line has valid data D = 1 means the bytes are newer than main memory When allocating line: Set V = 1, D = 0, fill in Tag and Data When writing line: Set D = 1 When evicting line: If D = 0: just set V = 0 If D = 1: write-back Data, then set D = 0, V = 0
100 Write-back Example Example: How does a write-back cache work? Assume write-allocate
101 Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Handling Stores (Write-Back) Register File $0 $1 $2 $3 16 byte, byte-addressed memory 4 byte, fully-associative cache: 2-byte blocks, write-allocate 4 bit addresses: 3 bit tag, 1 bit offset lru V d tag data Cache Misses: 0 Hits: 0 Memory
102 Write-Back (REF 1) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 lru V d tag data Cache Misses: 0 Hits: 0 Memory
103 Write-Back (REF 1) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Addr: 0001 lru V d tag data Cache 78 Misses: 1 Hits: 0 Memory
104 Write-Back (REF 1) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 lru V d tag data Cache 78 Misses: 1 Hits: 0 Memory
105 Write-Back (REF 2) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 lru V d tag data Cache 78 Misses: 1 Hits: 0 Memory
106 Write-Back (REF 2) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 0111 lru V d tag data Cache Misses: 2 Hits: 0 Memory
107 Write-Back (REF 3) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 lru V d tag data Cache Misses: 2 Hits: 0 Memory
108 Write-Back (REF 3) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 0000 lru V d tag data Cache Misses: 2 Hits: 1 Memory
109 Write-Back (REF 4) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 lru V d tag data Cache Misses: 2 Hits: 1 Memory
110 Write-Back (REF 4) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 0101 lru V d tag data Cache Misses: 3 Hits: 1 Memory
111 Write-Back (REF 5) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 1010 lru V d tag data Cache 73 7 Misses: 3 Hits: 1 Memory
112 Write-Back (REF 5) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] Hit SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 73 Addr: 1010 lru V d tag data Cache 73 7 Misses: 3 Hits: 1 Memory
113 Write-Back (REF 5) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 33 Addr: 1010 lru V d tag data Cache Misses: 4 Hits: 1 Memory
114 Write-Back (REF 6) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit 33 Addr: 0101 lru V d tag data Cache Misses: 4 Hits: 1 Memory
115 Write-Back (REF 6) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit Hit 33 lru V d tag data Cache Misses: 4 Hits: 2 Memory
116 Write-Back (REF 7) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit Hit 33 lru V d tag data Cache Misses: 4 Hits: 2 Memory
117 Write-Back (REF 7) Instructions: LB $1 ← [ 1 ] LB $2 ← [ 7 ] SB $2 → [ 0 ] SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit Hit Hit 33 lru V d tag data Cache 28 7 Misses: 4 Hits: 3 Memory
118 Write-Back (REF 8,9) Instructions:... SB $1 → [ 5 ] LB $2 ← [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] SB $1 → [ 5 ] SB $1 → [ 10 ] Register File $0 $1 $2 $3 Hit Hit Hit 33 lru V d tag data Cache 28 7 Misses: 4 Hits: 3 Memory
119 Write-Back (REF 8,9) Instructions:... SB $1 → [ 5 ] Register File $0 $1 $2 $3 Hit LB $2 ← [ 10 ] SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Hit SB $1 → [ 5 ] Hit SB $1 → [ 10 ] Hit 33 lru V d tag data Cache 28 7 Misses: 4 Hits: 5 Memory
120 How Many Memory References? Write-back performance How many reads? How many writes?
121 Write-back vs. Write-through Example Assume: large associative cache, 16-byte lines for (i=0; i<N; i++) A[0] += A[i]; for (i=0; i<N; i++) B[i] = A[i] N words Write-through: N/16 reads, N writes Write-back: N/16 reads, 1 write Write-through: 2 x N/16 reads, N writes Write-back: 2 x N/16 reads, N writes
122 So is write back just better? Short Answer: Yes (fewer writes is a good thing) Long Answer: It's complicated. Evictions require the entire line be written back to memory (vs. just the data that was written) Write-back can lead to incoherent caches on multi-core processors (later lecture)
123 Cache Conscious Programming // H = 12, W = 10 int A[H][W]; for(x=0; x < W; x++) for(y=0; y < H; y++) sum += A[y][x]; Every access a cache miss! (unless entire matrix fits in cache) 2
124 Cache Conscious Programming // H = 12, W = 10 int A[H][W]; for(y=0; y < H; y++) for(x=0; x < W; x++) sum += A[y][x]; Block size = 4 → 75% hit rate Block size = 8 → 87.5% hit rate Block size = 16 → 93.75% hit rate And you can easily prefetch to warm the cache
125 By the end of the cache lectures
127 Summary Memory performance matters! often more than CPU performance because it is the bottleneck, and not improving much because most programs move a LOT of data Design space is huge Gambling against program behavior Cuts across all layers: users → programs → OS → hardware NEXT: Multi-core processors are complicated Inconsistent views of memory Extremely complex protocols, very hard to get right
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationSee also cache study guide. Contents Memory and Caches. Supplement to material in section 5.2. Includes notation presented in class.
13 1 Memory and Caches 13 1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13 1 EE 4720 Lecture Transparency. Formatted 9:11, 22 April
More informationKey Point. What are Cache lines
Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible
More informationInside out of your computer memories (III) Hung-Wei Tseng
Inside out of your computer memories (III) Hung-Wei Tseng Why memory hierarchy? CPU main memory lw $t2, 0($a0) add $t3, $t2, $a1 addi $a0, $a0, 4 subi $a1, $a1, 1 bne $a1, LOOP lw $t2, 0($a0) add $t3,
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More informationLecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University
Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27
More information10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache
Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is
More informationEECS 322 Computer Architecture Improving Memory Access: the Cache
EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationECE ECE4680
ECE468. -4-7 The otivation for s System ECE468 Computer Organization and Architecture DRA Hierarchy System otivation Large memories (DRA) are slow Small memories (SRA) are fast ake the average access time
More informationECE 571 Advanced Microprocessor-Based Design Lecture 10
ECE 571 Advanced Microprocessor-Based Design Lecture 10 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 2 October 2014 Performance Concerns Caches Almost all programming can be
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationVirtual Memory. CS 3410 Computer System Organization & Programming
Virtual Memory CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. Where are we now and
More informationMo Money, No Problems: Caches #2...
Mo Money, No Problems: Caches #2... 1 Reminder: Cache Terms... Cache: A small and fast memory used to increase the performance of accessing a big and slow memory Uses temporal locality: The tendency to
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationLec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements
Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct
More information10/19/17. You Are Here! Review: Direct-Mapped Cache. Typical Memory Hierarchy
CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H Katz http://insteecsberkeleyedu/~cs6c/ Parallel Requests Assigned to computer eg, Search
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II 27-8-6 Scott Beamer, Instructor New Flow Based Routers CS61C L24 Cache II (1) www.anagran.com Caching Terminology When we try
More informationReducing Conflict Misses with Set Associative Caches
/6/7 Reducing Conflict es with Set Associative Caches Not too conflict y. Not too slow. Just Right! 8 byte, way xx E F xx C D What should the offset be? What should the be? What should the tag be? xx N
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationCache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory
Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and
More information13-1 Memory and Caches
13-1 Memory and Caches 13-1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13-1 EE 4720 Lecture Transparency. Formatted 13:15, 9 December
More informationwww-inst.eecs.berkeley.edu/~cs61c/
CS61C Machine Structures Lecture 34 - Caches II 11/16/2007 John Wawrzynek (www.cs.berkeley.edu/~johnw) www-inst.eecs.berkeley.edu/~cs61c/ 1 What to do on a write hit? Two Options: Write-through update
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationSystems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture
More informationPrelim 3 Review. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University
Prelim 3 Review Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University Administrivia Pizza party: PA3 Games Night Tomorrow, Friday, April 27 th, 5:00-7:00pm Location: Upson B17 Prelim
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/19/17 Fall 2017 - Lecture #16 1 Parallel
More informationRoadmap. Java: Assembly language: OS: Machine code: Computer system:
Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Assembly language: Machine code: get_mpg: pushq movq... popq ret %rbp %rsp, %rbp %rbp 0111010000011000
More informationSee also cache study guide. Contents Memory and Caches. Supplement to material in section 5.2. Includes notation presented in class.
13 1 Memory and Caches 13 1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13 1 LSU EE 4720 Lecture Transparency. Formatted 14:51, 28
More informationSpring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand
Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983
More informationShow Me the $... Performance And Caches
Show Me the $... Performance And Caches 1 CPU-Cache Interaction (5-stage pipeline) PCen 0x4 Add bubble PC addr inst hit? Primary Instruction Cache IR D To Memory Control Decode, Register Fetch E A B MD1
More informationMemory Hierarchy. Mehran Rezaei
Memory Hierarchy Mehran Rezaei What types of memory do we have? Registers Cache (Static RAM) Main Memory (Dynamic RAM) Disk (Magnetic Disk) Option : Build It Out of Fast SRAM About 5- ns access Decoders
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationVirtual Memory 3. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. P & H Chapter 5.4
Virtual Memory 3 Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 5.4 Project3 available now Administrivia Design Doc due next week, Monday, April 16 th Schedule
More informationCaches! Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)
Caches! Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H 5.1, 5.2 (except writes) Announcements! HW3 available due next Tuesday Work with alone partner Be responsible
More informationLecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches
CS 61C: Great Ideas in Computer Architecture Cache Performance, Set Associative Caches Instructor: Justin Hsia 7/09/2012 Summer 2012 Lecture #12 1 Great Idea #3: Principle of Locality/ Memory Hierarchy
More informationChapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative
Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory
More information1" 0" d" c" b" a" ...! ! Benefits of Larger Block Size. " Very applicable with Stored-Program Concept " Works well for sequential array accesses
Asst. Proflecturer SOE Miki Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 Caches III WHEN FIBER OPTICS IS TOO SLOW 07/16/2014: Wall Street Buys NATO Microwave Towers in
More informationProf. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University. See P&H 2.8 and 2.12, and A.
Prof. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University See P&H 2.8 and 2.12, and A.5 6 compute jump/branch targets memory PC +4 new pc Instruction Fetch
More informationSlide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng
Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01
More informationCS 31: Intro to Systems Caching. Martin Gagne Swarthmore College March 23, 2017
CS 1: Intro to Systems Caching Martin Gagne Swarthmore College March 2, 2017 Recall A cache is a smaller, faster memory, that holds a subset of a larger (slower) memory We take advantage of locality to
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationMemory Hierarchies &
Memory Hierarchies & Cache Memory CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 4/26/2009 cse410-13-cache 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and
More informationComputer Architecture CS372 Exam 3
Name: Computer Architecture CS372 Exam 3 This exam has 7 pages. Please make sure you have all of them. Write your name on this page and initials on every other page now. You may only use the green card
More informationChapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST
Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationCS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015
CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable
More informationCache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 Caches III Asst. Proflecturer SOE Miki Garcia WHEN FIBER OPTICS IS TOO SLOW 07/16/2014: Wall Street Buys NATO Microwave Towers in
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationPrelim 3 Review. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University
Prelim 3 Review Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Administrivia Pizza party: Project3 Games Night Cache Race Tomorrow, Friday, April 26 th, 5:00-7:00pm Location:
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/24/16 Fall 2016 - Lecture #16 1 Software
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationIntroduction to OpenMP. Lecture 10: Caches
Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 Caches III Lecturer SOE Dan Garcia Google Glass may be one vision of the future of post-pc interfaces augmented reality with video
More informationComputer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg
Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 33 Caches III 2007-04-11 Future of movies is 3D? Dreamworks says they may exclusively release movies in this format. It s based
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More information211: Computer Architecture Summer 2016
211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University
More informationCaches. Samira Khan March 23, 2017
Caches Samira Khan March 23, 2017 Agenda Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationComprehensive Exams COMPUTER ARCHITECTURE. Spring April 3, 2006
Comprehensive Exams COMPUTER ARCHITECTURE Spring 2006 April 3, 2006 ID Number 1 /15 2 /20 3 /20 4 /20 Total /75 Problem 1. ( 15 points) Logic Design: A three-input switching function is expressed as f(a,
More informationMemory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"
Memory Management! Goals of this Lecture! Help you learn about:" The memory hierarchy" Spatial and temporal locality of reference" Caching, at multiple levels" Virtual memory" and thereby " How the hardware
More information