Computer Architecture: Memory hierarchies and caches


1 Computer Architecture: Memory hierarchies and caches. S Coudert and R Pacalet, January 23, 2019

2 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

4 The memory latency problem
- The latency of external memory accesses tends to increase (tens to hundreds of CPU clock cycles)
- Clock cycles between a CPU load and the instruction/data returning from memory are wasted
- Clocks Per Instruction (CPI) increases
- On the other hand, on average, 90% of execution time corresponds to 10% of code instructions
- Caches exploit this to mitigate the memory latency problem

5 Principles of caches
- All CPU memory accesses go to a small, fast memory; the larger, slow memory is accessed only when needed
- Along the hierarchy: size from smallest to largest, speed from fastest to slowest, cost ($/bit) from highest to lowest
- Keep the most frequently accessed data in the small (expensive), fast (close) memory
- Performance depends on hit and miss times and on the hit rate
- Technologies, by increasing latency and decreasing cost per byte: registers, Static RAM (SRAM), Dynamic RAM (DRAM), magnetic disk
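The point that performance depends on hit time, miss time and hit rate is often summarised as the average memory access time (AMAT). A minimal sketch; the formula is the standard one, but the numbers below are illustrative assumptions, not from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time;
    a fraction miss_rate of accesses additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 2-cycle cache hit, 5% miss rate,
# 100-cycle memory access on a miss.
print(amat(2, 0.05, 100))  # 7.0 cycles on average
```

Halving the miss rate or the miss penalty has the same effect on AMAT, which is why both larger blocks and faster memory-cache transfers are discussed later.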

6 Memory hierarchy
- Levels 1 to n between the CPU and main memory, with increasing size and increasing latency
- Goal: minimize the miss rate
- Ideally, the full hierarchy is as fast as level 1 (miss rate = 0) and its size is that of level n (GBytes)

7 Cache miss & cache hit
[Figure: CPU, cache and memory connected by address and data buses. On a hit the cache returns the data directly; on a miss the request goes on to memory and the data comes back through the cache.]
- Cache miss => CPU wait states

8 Most frequently accessed
- Best possible choice for cached data: the one I will need next
- "The future's not ours to see" (Que sera sera)
- Second best choice: the most frequently accessed data
- How to identify them? Approximations, heuristics based on two locality principles:
  - Spatial: in a given short period of time a program frequently accesses a small memory area (example: working on an array)
  - Temporal: a program often accesses the same memory cell several times in a short period (example: instructions in a loop)

9 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

10 Locality principles: example
Sub-program example:

    for (i = 0; i < 1000; i++) {
        C[i] = A[i] + B[i];
    }

Variable addresses (as used in the assembly listing that follows): array A at 24000, array B at 28000, array C at 32000, constants at 36000 and 36004

11 Locality principles: example
- Temporal locality: an accessed memory location is likely to be accessed again soon
- Spatial locality: a memory location near an accessed one is likely to be accessed soon

    Initialization:
    8000  lw   $1,36000($0)   # $r1 <- 0
    8004  lw   $2,36004($0)   # $r2 <- 3996
    Loop body (1000 times):
    8008  lw   $3,24000($1)   # $r3 <- A[i]
    8012  lw   $4,28000($1)   # $r4 <- B[i]
    8016  add  $3,$3,$4       # $r3 <- $r3 + $r4
    8020  sw   $3,32000($1)   # C[i] <- $r3
    8024  beq  $1,$2,8036     # jump to 8036 (sequel) if $r1 = $r2
    8028  addi $1,$1,4        # increment $r1
    8032  j    8008           # jump to 8008

13 Locality principles: example
[Figure: accessed addresses plotted over time for the loop above, across iterations 1 to 5. The instruction fetches (8000 to 8032) illustrate temporal locality: the same loop instructions are fetched at every iteration. The data fetches (A[i], B[i], C[i]) illustrate spatial locality: each iteration accesses the locations next to those of the previous one.]

16 Most frequently accessed
- Selection of data to cache is usually based on locality heuristics
- When data is fetched from lower levels:
  - it is loaded in the cache and kept there for later re-use (temporal locality), and
  - data in its neighbourhood are also loaded, just in case they are needed too (spatial locality)

17 Cache miss / cache hit
[Figure, steps 1-4: the CPU accesses address x; x is not in the cache (miss); handling the cache fault loads from memory the data at x and around it; subsequent accesses to x and to its neighbour x+1 hit; an access to an absent address y misses.]

18 Cache miss / cache hit
[Figure, steps 5-8: handling the cache fault on y loads y and its neighbourhood from memory; later accesses to y hit, until an access to yet another absent address misses again.]

19 Cache management strategies
- Where in the cache shall we store the incoming data when handling cache faults?
- Upon CPU accesses, how do we know whether a data item is in the cache, and where?
- In case a data item must be replaced, which one to choose?
- How do we handle write accesses?
- Various kinds of caches and associated strategies

20 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

21 Direct-mapped caches
- Smallest cacheable unit: the smallest Addressing Unit (AU, e.g. one byte or one word)
- For any AU there is one unique possible location in the cache
- Cache capacity: 2^k cache lines (stores at most 2^k AUs)
- Where in the cache is the AU whose memory address is a? In line a mod 2^k
- How do we know it is the right AU? The cache line also stores the tag: a / 2^k
- How do we know it is a valid AU (e.g. after reset)? The cache line also stores a validity bit
[Figure: the address is split into a tag and a k-bit line index; each of the 2^k cache lines stores a validity bit V, a tag and the data.]
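The mapping above (line index = a mod 2^k, tag = a / 2^k, plus a validity bit) can be sketched as a minimal direct-mapped cache model; the class and method names below are ours, not from the course:

```python
class DirectMappedCache:
    """Minimal direct-mapped cache: 2**k lines, one AU per line.
    Each line stores (valid, tag, data)."""
    def __init__(self, k):
        self.k = k
        self.lines = [(False, None, None)] * 2**k

    def lookup(self, a):
        """Return (hit, data) for address a."""
        index = a % 2**self.k          # the unique line the AU can be in
        tag = a // 2**self.k           # identifies which AU is stored there
        valid, stored_tag, data = self.lines[index]
        return (valid and stored_tag == tag, data)

    def fill(self, a, data):
        """Load the AU at address a into its unique line (after a miss)."""
        self.lines[a % 2**self.k] = (True, a // 2**self.k, data)

cache = DirectMappedCache(k=3)     # 8 lines
cache.fill(11, "m")                # address 11 -> line 3, tag 1
print(cache.lookup(11))            # (True, 'm')
hit, _ = cache.lookup(3)           # same line 3, but tag 0: miss
print(hit)                         # False
```

Note that addresses 3 and 11 compete for the same line: in a direct-mapped cache, conflicting addresses evict each other even when the rest of the cache is empty.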

31 Direct-mapped cache architecture
- Example: 4 GB memory with a 1 KB cache and one-byte cache lines
[Figure: the memory address is split into a tag and a 10-bit line index; the indexed line's validity bit V and stored tag are read, the stored tag is compared with the address tag, and the hit signal is the AND of the comparison result and the validity bit.]

32 Direct-mapped cache running
[Figure: four snapshots of a small direct-mapped cache in front of a 16-entry memory (a..p). An access misses when the indexed line is invalid or holds a different tag; handling the cache fault overwrites the line (validity bit, tag, data). Accesses to already-cached addresses hit; an address mapping to the same line as a cached one but with a different tag misses again.]

33 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

34 Larger addressing unit and cache line
- Addressing unit: the CPU word (e.g. 32 bits)
- Locality: a cache line stores several consecutive words (a block)
- Blocks are aligned in memory and in the cache
[Figure: the memory address is split into a tag, an index selecting one of the n cache lines, a unit offset selecting a word within the block, and a byte offset; each line stores a validity bit V, a tag and the block, and a multiplexer selects the requested word.]
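The address split above can be sketched in code. The field widths below are illustrative assumptions (4-byte words, 4-word blocks, 2^10 lines), not fixed by the slide:

```python
def split_address(a, byte_bits=2, word_bits=2, index_bits=10):
    """Split address a into (tag, index, word offset, byte offset).
    Fields occupy the address from least significant (byte offset)
    to most significant (tag)."""
    byte = a & ((1 << byte_bits) - 1)
    a >>= byte_bits
    word = a & ((1 << word_bits) - 1)
    a >>= word_bits
    index = a & ((1 << index_bits) - 1)
    tag = a >> index_bits
    return tag, index, word, byte

# With the assumed widths, 0x12345678 decomposes into a 2-bit byte
# offset, a 2-bit word offset, a 10-bit index and an 18-bit tag.
print(split_address(0x12345678))
```

Because blocks are aligned, the tag and index alone identify the block: all words of one block share them and differ only in the offsets.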

35 Direct-mapped cache running
[Figure: four snapshots of a direct-mapped cache with multi-word blocks in front of a 16-entry memory (a..p). A miss on one word loads the whole aligned block (e.g. i, j, k, l); accesses to the other words of that block then hit; a miss on a word of another block mapping to the same line (e.g. m, n, o, p) replaces the whole block.]

36 Exercise #1: Cache size and architecture
- 4-byte words
- 4-word blocks (cache lines)
- 64-bit addresses
- Direct-mapped cache
- Cache capacity: 2^16 bytes of data
- Questions: addresses breakdown? Cache architecture? Total cache size (with tags and valid bits)?

37 Limits of block size
- Fewer cache lines for the same cache capacity
- Favours spatial locality over temporal locality
- More data loaded from memory when handling a cache miss
- Potentially increases the cache miss cost
- Example (simplified): access three consecutive words in memory
  - Cache access time: 2 cycles; memory access time (1 word): 20 cycles
  - 1 word per cache line, 3 cache misses: 3 x (2 + 20) = 66 cycles
  - 4 words per cache line, 1 cache miss: 3 x 2 + 1 x (4 x 20) = 86 cycles
- Requires efficient data transfer between memory and cache
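The arithmetic above can be checked directly with the slide's simplified cost model: every access pays the cache access time, and each miss additionally fetches a whole block from memory, one word at a time:

```python
def access_cost(n_accesses, n_misses, block_words,
                cache_cycles=2, mem_cycles_per_word=20):
    """Total cycles: each access pays the cache access time; each miss
    additionally loads a full block from memory, word by word."""
    return (n_accesses * cache_cycles
            + n_misses * block_words * mem_cycles_per_word)

# Three consecutive words:
print(access_cost(3, 3, 1))  # 1-word lines, 3 misses -> 66
print(access_cost(3, 1, 4))  # 4-word lines, 1 miss  -> 86
```

With a naive word-by-word transfer, the larger block loses despite its better hit rate; this is exactly why the next slide looks at faster memory-cache transfers.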

38 Improving memory-cache transfers
- Wider data bus: expensive, bounded efficiency
- Higher bus frequency: expensive, limited by Printed Circuit Board (PCB) constraints; Double Data Rate (DDR)
- DRAM organisation: banks, rows, columns; row latency > column latency
- Wrapping bursts: requested word first
- Multi-banking: mask row latencies
[Figure: DDR memory organisation, with bank decoder, control logic, row and column decoding, multiplexer and read logic between the address and data buses.]

39 Cache efficiency vs block size
[Figure: miss rate as a function of block size, measured on a VAX machine with a direct-mapped cache. From Computer Organization and Design, the Hardware/Software Interface, Patterson and Hennessy, second edition, 1998.]

40 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

41 Set-associative caches
- Other possible improvement: several blocks per index (a set)
- N-way set-associative cache: N blocks per set
(Computer Organization and Design, the Hardware/Software Interface, Patterson and Hennessy, second edition, 1998)

42 Set-associative cache architecture
(Computer Organization and Design, the Hardware/Software Interface, Patterson and Hennessy, second edition, 1998)

43 Set-associative: replacement policy
- Several blocks per set: which block to replace when a set is full?
- First In First Out (FIFO), same as Least Recently Cached: not that good, the first in can be re-referenced frequently
- Random: simple hardware, sub-optimal but satisfactory
- Least Recently Used (LRU): complex hardware if more than 2 ways
- LRU approximations: combinations of FIFO and LRU (Not Most Recently Used)
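Not a solution to the exercise that follows, but a sketch of the machinery it uses: a set-associative cache with FIFO replacement, modelled in Python (class and parameter names are ours). Each set is a queue of tags, and the oldest tag is evicted when the set is full:

```python
from collections import deque

class SetAssociativeCache:
    """N-way set-associative cache with FIFO replacement.
    Each set holds up to `ways` tags; the oldest is evicted first."""
    def __init__(self, n_sets, ways):
        self.n_sets = n_sets
        self.ways = ways
        self.sets = [deque() for _ in range(n_sets)]

    def access(self, a):
        """Return 'hit' or 'miss' for address a, updating the cache."""
        s = self.sets[a % self.n_sets]
        tag = a // self.n_sets
        if tag in s:
            return "hit"         # FIFO: a hit does not reorder the queue
        if len(s) == self.ways:
            s.popleft()          # set full: evict the oldest block
        s.append(tag)
        return "miss"

cache = SetAssociativeCache(n_sets=2, ways=2)
for a in (0, 2, 0, 4, 0):
    print(a, cache.access(a))
```

In this trace all five addresses map to set 0; the fourth access (address 4) fills the set and evicts address 0, which was cached first, so the final access to 0 misses again. An LRU policy would have kept 0, since it was re-referenced.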

44 Exercise #2: Set-associative cache running
- 2-set, 2-way set-associative cache
- FIFO replacement policy
- Find a sequence of accesses (addresses) illustrating various situations: hits, misses

45 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

46 Write policies
- Goals: avoid memory corruption, improve performance
- Write-through: write in the cache and in memory
  - Performance issues when the write rate exceeds the memory throughput
- Write-back: memory written only when replacing a dirty block
  - A dirty flag is added to cache lines
  - Helpful when the write rate exceeds the memory throughput
- On a cache miss:
  - write only in memory (No Write Allocate), or
  - fetch the block from memory and write the data (Write Allocate)
- All combinations are possible, some are more frequent (guess which)
- Write buffers can be added to smooth the write rate to memory
[Figure: a read of x misses and x is cached; a write to x hits in the cache; a later read of y misses and replaces x, whose modified copy must not be lost.]
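The write-back, write-allocate combination described above can be sketched as follows (Python, names ours; a single cache line over a dict standing in for memory). The dirty flag is what defers the memory write until eviction:

```python
class WriteBackCache:
    """One-line write-back, write-allocate cache over a dict 'memory'.
    The line is (valid, addr, data, dirty); dirty data reaches memory
    only when the line is replaced."""
    def __init__(self, memory):
        self.memory = memory
        self.line = (False, None, None, False)

    def _evict_and_fill(self, addr):
        valid, old, data, dirty = self.line
        if valid and dirty:
            self.memory[old] = data      # write back the dirty block
        self.line = (True, addr, self.memory[addr], False)

    def read(self, addr):
        valid, a, _, _ = self.line
        if not (valid and a == addr):
            self._evict_and_fill(addr)   # miss: allocate the block
        return self.line[2]

    def write(self, addr, value):
        valid, a, _, _ = self.line
        if not (valid and a == addr):
            self._evict_and_fill(addr)   # write-allocate on a write miss
        self.line = (True, addr, value, True)  # dirty; memory untouched

mem = {0: "x", 1: "y"}
c = WriteBackCache(mem)
c.write(0, "X")
print(mem[0])      # still 'x': the write stayed in the cache
c.read(1)          # replaces the dirty block -> write-back
print(mem[0])      # now 'X'
```

A write-through cache would instead update `memory` inside `write` and need no dirty flag; the price is one memory write per store instruction.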

47 Outline
- Introduction
- Locality principles
- Direct-mapped caches
- Increasing block size
- Set-associative caches
- Write strategies
- Cache coherence in multiprocessor systems

48 Cache coherence problem
- Multiprocessor system: CPUs 1 to n, each with its own cache, sharing a memory over a bus
- Multiple copies of a block may exist in the caches and in memory: how to ensure coherence?
[Figure: x = 123 is initially cached by CPU 1 and CPU 2; CPU 2 then writes x = 237. CPU 1 still sees x = 123 while CPU 2 sees x = 237: incoherent.]

49 Cache coherence
- Note: the problem exists even in mono-processor systems (Direct Memory Access (DMA) peripherals)
- Snooping:
  - Caches inform the others about what they do
  - Each cache continuously monitors the others' activity (snooping) and takes appropriate actions when needed
  - "Appropriate" depends on the cache coherence protocol
- Directory-based:
  - Central or distributed directories track blocks and caches
  - Not studied in this course
- Cache coherence protocols require support from the bus protocol, to exchange state changes and cached data

50 Cache coherence protocols
- Define what action is taken in which circumstance (state)
- Examples:
  - Write invalidate: the writing cache sends a write-invalidate message; all snooping caches invalidate their copy of the written block. What if write-through? What if write-back?
  - Write update (broadcast): the writing cache broadcasts the new block; all snooping caches update their copy of the written block. What if write-through? What if write-back?
- Operations of one cache can:
  - be delayed upon request from another cache
  - wait for acknowledgements from other caches
  - expect a response from other caches (use a delay before acting unless responded to)
  - be served by another cache or by memory
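The write-invalidate scheme above can be modelled as a toy in Python (names ours): every cache snoops a shared "bus", and drops its copy when another cache announces a write. This sketches only the message flow, not timing, arbitration, or memory updates:

```python
class SnoopingCache:
    """Write-invalidate snooping: on a local write, broadcast an
    invalidate to the other caches; on a snooped invalidate, drop
    the local copy of the block."""
    def __init__(self, bus):
        self.data = {}            # locally cached blocks
        self.bus = bus
        bus.append(self)

    def write(self, addr, value):
        for other in self.bus:    # broadcast the invalidate on the bus
            if other is not self:
                other.data.pop(addr, None)
        self.data[addr] = value

    def read(self, addr, memory):
        if addr not in self.data: # miss: fetch from memory
            self.data[addr] = memory[addr]
        return self.data[addr]

bus, memory = [], {0: 123}
c1, c2 = SnoopingCache(bus), SnoopingCache(bus)
c1.read(0, memory)        # both caches now hold x = 123
c2.read(0, memory)
c2.write(0, 237)          # c1's stale copy is invalidated
print(0 in c1.data)       # False
```

A write-update protocol would replace the `pop` with an assignment of the new value into every snooping cache, trading invalidation misses for bus bandwidth.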

51 Cache coherence protocols
- Each cache maintains a state for each block: Invalid, Clean, Dirty, Exclusive, Owned
- Reacts on events from its own processor: processor read, processor write
- Reacts on messages from other caches (snooped on the bus): read, write, flush
- Emits messages to other caches (on the bus)
- Exchanges data with other caches (on the bus)

52 Cache coherence protocols
- Write-through caches
- Exercise #3: Imagine a protocol (messages, actions)

53 Example of coherence protocol: MSI
- MSI: Modified, Shared, Invalid
- Each block is in one of 3 states (M, S, I) for each cache; a block can be in different states in different caches
- State definitions:
  - Invalid: not in the cache (or not valid)
  - Shared: in the cache and valid, but read-only
  - Modified: in the cache, valid and read-write
- A block not in a cache is considered in the I (Invalid) state
- If a block is in the M (Modified) state in one cache, it is the only copy
- The 3 states can be encoded using the Valid and Dirty flags
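The encoding claim can be made concrete. The particular assignment below (I = line not valid; S = valid and clean; M = valid and dirty) is the natural one, though the slides do not fix it explicitly:

```python
def msi_state(valid, dirty):
    """Encode the MSI state of a cache line from its two flags.
    Assumed assignment: I = invalid line; S = valid & clean
    (read-only copy); M = valid & dirty (the only, writable copy)."""
    if not valid:
        return "I"
    return "M" if dirty else "S"

print(msi_state(False, False))  # I
print(msi_state(True, False))   # S
print(msi_state(True, True))    # M
```

Only three of the four (valid, dirty) combinations are used here, which is exactly the spare encoding that MESI spends on its fourth state on the next slide.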

54 Example of coherence protocol: MSI
- Write-back, write-allocate caches
- Exercise #4: Imagine the MSI protocol (events, actions)
[State diagram to complete: M, S, I]

55 Example of coherence protocol: MESI
- The 3 states of MSI can be encoded using the Valid and Dirty flags; we could add one more state for free
- Let us reduce bandwidth with a fourth, Exclusive (E), state
- Exclusive: block in the cache, valid, read-only, and it is the only copy
- Write-back, write-allocate caches
- Exercise #5: Imagine the MESI protocol (events, actions)
[State diagram to complete: M, E, S, I]

56 Cache coherence protocols
- Homework: imagine further improvements with one more state
- Example: MOESI protocol
  - O (Owner): modified in one cache, shared (S) in others
  - The owner is responsible for the write-back
  - The owner is responsible for providing the block to other caches
[State diagram to complete: M, O, E, S, I]

57 Vocabulary
- Cache hit (miss): the requested data is (not) in the cache
- Block: smallest cacheable unit (2^n words, aligned)
- Cache line: where the cache stores a block, its tag and flags
- Set: group of blocks with the same cache index
- N-way cache: cache with N blocks per set
- Direct-mapped cache: 1-way cache (number of lines = number of sets)
- Fully-associative cache: 1-set cache (number of lines = number of ways)
- Index: part of the address used to designate a set
- Tag: part of the address stored in the cache line and compared with the requested address to decide hit or miss
- Valid flag: flag stored in the cache line, indicating whether its content is valid or not
- Dirty flag: flag stored in the cache line, indicating whether its content has been modified or not
- Write-through: writing in the cache and in memory
- Write-back: writing in the cache but not in memory
- Eviction (replacement): replacing the block in a cache line by another block, after writing the replaced block to memory if the cache is write-back and the block was dirty
- Write-allocate: cache that, upon a write miss, reads the block from memory, stores it in the cache (evicting another block if needed) and writes in the cache (and in memory if it is a write-through cache)
- Write-no-allocate: cache that, upon a write miss, writes in memory but not in the cache
- Coherence: property guaranteeing that all CPUs see the same memory content despite their local caches


More information

Chapter 6 Objectives

Chapter 6 Objectives Chapter 6 Memory Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Introduction to cache memories

Introduction to cache memories Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal

More information

Structure of Computer Systems

Structure of Computer Systems 222 Structure of Computer Systems Figure 4.64 shows how a page directory can be used to map linear addresses to 4-MB pages. The entries in the page directory point to page tables, and the entries in a

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0; How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory

More information

Memory Organization MEMORY ORGANIZATION. Memory Hierarchy. Main Memory. Auxiliary Memory. Associative Memory. Cache Memory.

Memory Organization MEMORY ORGANIZATION. Memory Hierarchy. Main Memory. Auxiliary Memory. Associative Memory. Cache Memory. MEMORY ORGANIZATION Memory Hierarchy Main Memory Auxiliary Memory Associative Memory Cache Memory Virtual Memory MEMORY HIERARCHY Memory Hierarchy Memory Hierarchy is to obtain the highest possible access

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Chapter 8. Virtual Memory

Chapter 8. Virtual Memory Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

Welcome to Part 3: Memory Systems and I/O

Welcome to Part 3: Memory Systems and I/O Welcome to Part 3: Memory Systems and I/O We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? We will now focus on memory issues, which are frequently

More information

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay!

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay! Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1

More information

Cache introduction. April 16, Howard Huang 1

Cache introduction. April 16, Howard Huang 1 Cache introduction We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? The rest of CS232 focuses on memory and input/output issues, which are frequently

More information

Cache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris

More information

Recap: Machine Organization

Recap: Machine Organization ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,

More information

Cycle Time for Non-pipelined & Pipelined processors

Cycle Time for Non-pipelined & Pipelined processors Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

More information

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Cache 11232011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Memory Components/Boards Two-Level Memory Hierarchy

More information

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array

More information

EE 457 Unit 7a. Cache and Memory Hierarchy

EE 457 Unit 7a. Cache and Memory Hierarchy EE 457 Unit 7a Cache and Memory Hierarchy 2 Memory Hierarchy & Caching Use several levels of faster and faster memory to hide delay of upper levels Registers Unit of Transfer:, Half, or Byte (LW, LH, LB

More information

CS 433 Homework 5. Assigned on 11/7/2017 Due in class on 11/30/2017

CS 433 Homework 5. Assigned on 11/7/2017 Due in class on 11/30/2017 CS 433 Homework 5 Assigned on 11/7/2017 Due in class on 11/30/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Introduction to OpenMP. Lecture 10: Caches

Introduction to OpenMP. Lecture 10: Caches Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor)

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) Chapter 7-1 Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授 V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) 臺大電機吳安宇教授 - 計算機結構 1 Outline 7.1 Introduction 7.2 The Basics of Caches

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University Computer Architecture Memory Hierarchy Lynn Choi Korea University Memory Hierarchy Motivated by Principles of Locality Speed vs. Size vs. Cost tradeoff Locality principle Temporal Locality: reference to

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

Cache and Memory. CS230 Tutorial 07

Cache and Memory. CS230 Tutorial 07 Cache and Memory CS230 Tutorial 07 Cache Overview Memory Hierarchy Blocks rom fastest and most expensive to slowest and least expensive astest memory is smallest and closest to CPU Slowest memory is largest

More information

MIPS) ( MUX

MIPS) ( MUX Memory What do we use for accessing small amounts of data quickly? Registers (32 in MIPS) Why not store all data and instructions in registers? Too much overhead for addressing; lose speed advantage Register

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

Portland State University ECE 587/687. Caches and Memory-Level Parallelism Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12a Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1> Chapter 8 Digital Design and Computer Architecture: ARM Edition Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 Chapter 8 :: Topics Introduction

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

Key Point. What are Cache lines

Key Point. What are Cache lines Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12 Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

Memory Hierarchy Y. K. Malaiya

Memory Hierarchy Y. K. Malaiya Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath

More information

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program

More information

Assignment 1 due Mon (Feb 4pm

Assignment 1 due Mon (Feb 4pm Announcements Assignment 1 due Mon (Feb 19) @ 4pm Next week: no classes Inf3 Computer Architecture - 2017-2018 1 The Memory Gap 1.2x-1.5x 1.07x H&P 5/e, Fig. 2.2 Memory subsystem design increasingly important!

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Logical Diagram of a Set-associative Cache Accessing a Cache

Logical Diagram of a Set-associative Cache Accessing a Cache Introduction Memory Hierarchy Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year Levels of memory with different sizes & speeds close to

More information

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site: Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331

More information

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial

More information

Digital Logic & Computer Design CS Professor Dan Moldovan Spring Copyright 2007 Elsevier 8-<1>

Digital Logic & Computer Design CS Professor Dan Moldovan Spring Copyright 2007 Elsevier 8-<1> Digital Logic & Computer Design CS 4341 Professor Dan Moldovan Spring 21 Copyright 27 Elsevier 8- Chapter 8 :: Memory Systems Digital Design and Computer Architecture David Money Harris and Sarah L.

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Introduction. Memory Hierarchy

Introduction. Memory Hierarchy Introduction Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year 1 Memory Hierarchy Levels of memory with different sizes & speeds close to

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

Chapter 8 :: Topics. Chapter 8 :: Memory Systems. Introduction Memory System Performance Analysis Caches Virtual Memory Memory-Mapped I/O Summary

Chapter 8 :: Topics. Chapter 8 :: Memory Systems. Introduction Memory System Performance Analysis Caches Virtual Memory Memory-Mapped I/O Summary Chapter 8 :: Systems Chapter 8 :: Topics Digital Design and Computer Architecture David Money Harris and Sarah L. Harris Introduction System Performance Analysis Caches Virtual -Mapped I/O Summary Copyright

More information

Memory System Design Part II. Bharadwaj Amrutur ECE Dept. IISc Bangalore.

Memory System Design Part II. Bharadwaj Amrutur ECE Dept. IISc Bangalore. Memory System Design Part II Bharadwaj Amrutur ECE Dept. IISc Bangalore. References: Outline Computer Architecture a Quantitative Approach, Hennessy & Patterson Topics Memory hierarchy Cache Multi-core

More information