Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor)


1 Chapter 7-1 Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授 V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) 臺大電機吳安宇教授 - 計算機結構 1

2 Outline 7.1 Introduction 7.2 The Basics of Caches 7.3 Measuring and Improving Cache Performance 7.4 Virtual Memory 7.5 A Common Framework for Memory Hierarchies

3 Five Classic Components of a Computer

4 Principle of Locality Programs access a relatively small portion of their address space at any instant of time (like books in a library): Temporal locality: if an item (instruction/data) is referenced, it will tend to be referenced again soon. Spatial locality: if an item (instruction/data) is referenced, items whose addresses are close by will tend to be referenced soon. Example: Temporal locality: in programs, instructions and data within loops are likely to be accessed repeatedly. Spatial locality: instructions are normally accessed sequentially, and elements of an array are accessed sequentially. We take advantage of the principle of locality by implementing the memory of a computer as a memory hierarchy.
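The loop behavior described above can be illustrated with a small sketch (illustrative Python, not from the slides):

```python
# Illustrative sketch of locality in a simple summation loop.
data = list(range(1024))

total = 0
for i in range(len(data)):   # 'total' and 'i' are reused every iteration: temporal locality
    total += data[i]         # data[0], data[1], ... are touched in order: spatial locality

print(total)
```

Both kinds of locality are what let a cache hold the loop's working set close to the processor.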

5 Memory Hierarchy A memory hierarchy consists of multiple levels of memory with different speeds and sizes. Guideline: build memory as a hierarchy of levels, with the fastest memory close to the processor and the slower, less expensive memory below that. Goal: present the user with as much memory as is available in the cheapest technology, while providing access at the speed offered by the fastest memory. Three major technologies are used to construct a memory hierarchy:
SRAM: typical access time 0.5-5 ns; $4,000-$10,000 per GB in 2004.
DRAM: typical access time 50-70 ns; $100-$200 per GB in 2004.
Magnetic disk: typical access time 5,000,000-20,000,000 ns; $0.50-$2 per GB in 2004.

6 Basic Structure of a Memory Hierarchy

7 Basic Structure of a Memory Hierarchy Register File

8 Memory Hierarchy The memory system is organized as a hierarchy: a level closer to the processor is a subset of any level further away, and all the data are stored at the lowest level. Data are copied between only two adjacent levels at a time. The minimum unit of information transferred between two levels of the hierarchy is called a block.

9 Memory Hierarchy Pyramid

10 Terminologies If the data requested by the processor appears in the upper level, it's called a hit. If the data isn't found in the upper level, it's a miss; the lower level in the hierarchy is then accessed to retrieve the block containing the requested data. Hit ratio (hit rate): the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. Miss rate (= 1 - hit rate): the fraction of memory accesses not found in the upper level.

11 Terminologies Hit time: the time to access the upper level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss. Miss penalty: the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver the block to the processor. Note: in general, miss penalty > hit time. Because all programs spend much of their time accessing memory, the memory system is necessarily a major factor in determining performance.
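Combining these terms, the average access time of a two-level hierarchy can be sketched as follows (the numeric values here are assumptions for illustration, not from the slides):

```python
# Illustrative sketch: average access time = hit time + miss rate * miss penalty.
# The values below are assumed for illustration only.
hit_time = 1        # cycles to access the upper level (includes hit/miss check)
miss_rate = 0.05    # fraction of accesses not found in the upper level
miss_penalty = 100  # cycles to replace the block and deliver it to the processor

avg_access_time = hit_time + miss_rate * miss_penalty
print(avg_access_time)   # 6.0 cycles on average
```

Even a 5% miss rate dominates the average here, which is why miss penalty matters so much.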

12 Outline 7.1 Introduction 7.2 The Basics of Caches 7.3 Measuring and Improving Cache Performance 7.4 Virtual Memory 7.5 A Common Framework for Memory Hierarchies

13 The Basics of Caches Cache: a safe place for hiding or storing things. Example: before the request, the cache contains a collection of recent references X1, X2, ..., Xn-1, and the processor requests a word Xn that is not in the cache. This request results in a miss, and the word Xn is brought from memory into the cache. (Figure: the cache contents before the reference to Xn, and after, with Xn added.)

14 Tags of the Cache How do we know whether a requested word is in the cache or not? Add a set of tags to the cache.

15 Direct-mapped Cache

16 Valid Bit of a Cache Add a valid bit to indicate whether an entry contains a valid address. Replacement policy: recently accessed words replace less recently referenced words (exploiting temporal locality).

17 Accessing a Cache Example: the action for each reference

18 Action for each cache reference

19 Action for each cache reference

20 Address of Cache The address is divided into a tag field, an index that selects the cache block, and a byte offset. Each cache entry contains: 1. a valid bit; 2. a tag field, which is compared with the tag field of the address; 3. a data field holding the cached data. For a 1K-entry direct-mapped cache with one-word blocks and 32-bit byte addresses, the tag is 32 - 10 - 2 = 20 bits, so the size of the cache is 1K x (1 valid bit + 20 tag bits + 32 data bits) = 53 Kbits.
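The address breakdown can be sketched as follows, assuming the 1K-entry direct-mapped cache with one-word (4-byte) blocks described above (the sample address is an arbitrary illustration):

```python
# Sketch: split a 32-bit byte address into tag / index / byte offset
# for a 1K-entry direct-mapped cache with one-word (4-byte) blocks.
INDEX_BITS = 10    # 1K entries
OFFSET_BITS = 2    # 4 bytes per word

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x1234))   # (1, 141, 0): tag 1, cache entry 141, byte 0
```

The index picks the entry; the stored tag must equal the address tag (and the valid bit must be on) for a hit.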

21 Address of Cache (cont'd) Size of cache: 16K x (1 valid bit + tag bits + data bits) bits

22 Cache of Intrinsity FastMATH processor 16KB caches: 256 blocks with 16 words per block (spatial locality)

23 Total Number of Bits in a Cache Assumptions: n bits are used for the index, m bits for the word within the block, and 2 bits for the byte part of the address. With a 32-bit byte address, a direct-mapped cache of 2^n blocks with 2^m-word (2^(m+2)-byte) blocks requires a tag field of 32 - (n + m + 2) bits. Since the block size is 2^m words (2^(m+5) bits) and the address size is 32 bits, the total number of bits in the direct-mapped cache is 2^n x (block size + tag size + valid field size) = 2^n x (2^m x 32 + (32 - n - m - 2) + 1) = 2^n x (2^m x 32 + 31 - n - m).

24 Total Number of Bits in a Cache (Question) How many total bits are required for a direct-mapped cache with 16KB of data and 4-word blocks, assuming a 32-bit address? (Answer) 16KB of cache = 4K words. Each block = 4 words, so the cache has 2^10 blocks. Block size: 32 x 4 = 128 bits. Tag size: 32 - 10 - 2 - 2 = 18 bits. Valid field size: 1 bit. Total number of bits: 2^10 x (128 + 18 + 1) = 2^10 x 147 = 147 Kbits.
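The formula and the worked example can be checked with a short sketch:

```python
# Sketch of the bit-count formula above: n index bits, m bits for the word
# within the block, 32-bit byte addresses.
def cache_total_bits(n, m):
    block_bits = (2 ** m) * 32       # data bits per block
    tag_bits = 32 - n - m - 2        # tag bits per block
    return (2 ** n) * (block_bits + tag_bits + 1)   # +1 for the valid bit

# 16KB of data with 4-word blocks -> n = 10, m = 2
print(cache_total_bits(10, 2) // 1024)   # 147 (Kbits)
```

Note that the overhead (tag + valid bits) is 19/147, about 13% of the cache storage in this configuration.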

25 Computing a Cache Block Address Example: mapping a byte address to a multi-word cache block. (Question) Consider a cache with 64 blocks and a block size of 16 bytes. What block number does byte address 1200 map to? (Answer) Block address: floor(1200/16) = 75. Cache block number: 75 modulo 64 = 11. In fact, this block maps all byte addresses between 1200 and 1215.
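The mapping can be sketched as:

```python
# Sketch of the mapping above: 64 blocks, 16-byte blocks.
BLOCK_SIZE = 16
NUM_BLOCKS = 64

def cache_block(byte_addr):
    block_address = byte_addr // BLOCK_SIZE    # which memory block
    return block_address % NUM_BLOCKS          # which cache block it maps to

print(cache_block(1200))   # 11
```

Every byte address in 1200..1215 lands in the same memory block, hence the same cache block.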

26 Miss Rate vs. Block Size In general, the miss rate falls as we increase the block size (taking advantage of spatial locality). The miss rate may go up if the block size becomes very large relative to the cache size (the cache holds fewer blocks, so blocks compete for space). Miss penalty: the time required to fetch the data from the next lower level of memory and load it into the cache.

27 Handling a Cache Miss The control unit must detect a miss and process it by fetching the requested data from memory (or a lower-level cache). If the cache reports a hit, the CPU continues to use the data as if nothing happened. If an instruction fetch results in a miss, the contents of the IR are not valid, and the next action (reading the registers) will be useless (though harmless). To perform the actions needed for a cache miss on an instruction read, we instruct the lower-level memory to perform a read, wait for the memory to respond (since the access will take multiple cycles), and then write the words into the cache.

28 Steps for an Instruction Cache Miss 1. Send the original PC value (current PC - 4) to the memory. 2. Instruct main memory to perform a read and wait for the memory to complete its access. 3. Write the cache entry: put the data from memory in the data portion of the entry, write the upper bits of the address (from the ALU) into the tag field, and turn the valid bit on. 4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache (a hit).

29 Steps for a Cache Read 1. Send the address to the appropriate cache. The address comes either from the PC (for an instruction) or from the ALU (for data). 2. If the cache signals a hit, the requested word is available on the data lines. Since there are 16 words in the desired block, we need to select the right one: a block-offset field controls a multiplexer, which selects the requested word from the 16 words in the indexed block. 3. If the cache signals a miss, we send the address to main memory. When the memory returns with the data, we write it into the cache and then read it to fulfill the request. Approximate instruction and data miss rates for the Intrinsity FastMATH processor on SPEC2000 benchmarks (the effective miss rate weights each by the frequency of the corresponding accesses): instruction miss rate 0.4%; data miss rate 11.4%; effective combined miss rate 3.2%.

30 Cache Consistency when Writing Data Inconsistency: suppose on a store (sw) instruction we wrote the data into only the data cache, without changing main memory. After the write into the cache, memory would have a different value from that in the cache. Write through: a scheme in which writes always update both the cache and the memory, ensuring that data is always consistent between the two. Write buffer: a queue that holds data while the data are waiting to be written to memory. Write back: a scheme that handles writes by updating values only in the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced.

31 Cache Write Miss With multi-word blocks: Read misses: similar to the single-word-block case; a miss simply brings back the entire block. Write misses: because the block contains more than a single word, we cannot just write the tag and data. If the tag of the address and the tag in the cache entry are equal, we have a write hit and can continue; if the tags are unequal, we have a write miss and must fetch the block from memory.

32 Different Organizations of Memory The primary method of achieving higher memory bandwidth is to increase the physical or logical width of the memory system. (Figure: one-word-wide, wide, and interleaved memory organizations.)

33 Memory Organization To understand the impact of different memory organizations, we define a set of hypothetical memory access times. Assume: 1 memory bus clock cycle to send the address; 15 memory bus clock cycles for each DRAM access initiated; 1 memory bus clock cycle to send a word of data. Three methods: (1) For a cache block of four words and a one-word-wide bank of DRAMs: miss penalty = 1 + 4 x 15 + 4 x 1 = 65 clock cycles. The number of bytes transferred per bus clock cycle for a single miss = 4 x 4 / 65 = 0.25 bytes/cycle (effective bandwidth per miss).

34 Memory Organization (2) Parallel access: with a main memory bus width of two words, miss penalty = 1 + 2 x 15 + 2 x 1 = 33 clock cycles (bandwidth 0.48 bytes/cycle). With a main memory bus width of four words, miss penalty = 1 + 1 x 15 + 1 x 1 = 17 clock cycles (0.94 bytes/cycle). Cost: wider bus (wires) plus increased access time (due to the multiplexer and control logic). (3) Interleaving: instead of making the entire path between the memory and the cache wider, the memory chips can be organized in banks to read or write multiple words in one access time, rather than reading or writing a single word each time. Each bank can be one word wide, so the width of the bus and the cache need not change, but sending an address to several banks permits them all to read simultaneously. With four banks, miss penalty = 1 + 15 + 4 x 1 = 20 clock cycles; effective bandwidth per miss = 4 x 4 / 20 = 0.8 bytes/cycle.
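The miss-penalty calculations above can be checked with a short sketch:

```python
# Sketch of the miss-penalty arithmetic above (cycles: 1 to send the address,
# 15 per DRAM access initiated, 1 per word transferred; 4-word blocks).
one_word   = 1 + 4 * 15 + 4 * 1   # one-word-wide memory: 4 accesses, 4 transfers
two_word   = 1 + 2 * 15 + 2 * 1   # two-word-wide bus: 2 accesses, 2 transfers
four_word  = 1 + 1 * 15 + 1 * 1   # four-word-wide bus: 1 access, 1 transfer
interleave = 1 + 15 + 4 * 1       # 4 one-word banks, accesses overlapped

print(one_word, two_word, four_word, interleave)        # 65 33 17 20
print(round(16 / one_word, 2), round(16 / interleave, 2))  # 0.25 vs 0.8 bytes/cycle
```

Interleaving gets most of the wide-bus benefit without widening the bus or the cache.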

35 Outline 7.1 Introduction 7.2 The Basics of Caches 7.3 Measuring and Improving Cache Performance 7.4 Virtual Memory 7.5 A Common Framework for Memory Hierarchies

36 Measuring Cache Performance CPU time = (CPU execution clock cycles + Memory-stall clock cycles) x Clock cycle time. Memory-stall clock cycles come primarily from cache misses: Memory-stall clock cycles = Read-stall cycles + Write-stall cycles. Read-stall cycles = (reads/program) x read miss rate x read miss penalty. Write-stall cycles = ((writes/program) x write miss rate x write miss penalty) + write buffer stalls. For a write-back scheme, there are potential additional stalls arising from the need to write a cache block back to memory when the block is replaced. For a write-through scheme, a write miss requires that we fetch the block before continuing the write, and write buffer stalls occur when the write buffer is full at the time of a write.

37 Measuring Cache Performance In most write-through schemes, we assume that the read and write miss penalties are the same and that write buffer stalls are negligible. Then: (1) Memory-stall clock cycles = (Memory accesses/program) x Miss rate x Miss penalty. (2) Memory-stall clock cycles = (Instructions/program) x (Misses/instruction) x Miss penalty.

38 Calculating Cache Performance (Question) How much faster would a processor run with a perfect cache that never missed? (Answer) Assumptions: instruction cache miss rate = 2%; data cache miss rate = 4%; CPI = 2 without any memory stalls; miss penalty = 100 cycles for all misses. Use the instruction frequencies for SPECint2000 from Chapter 3, Fig. 3.26 on page 228 (loads and stores are 36% of instructions). Instruction count = I.

39 Calculating Cache Performance Instruction miss cycles = I x 2% x 100 = 2.00 I. Data miss cycles = I x 36% x 4% x 100 = 1.44 I. Total number of memory-stall cycles = 2.00 I + 1.44 I = 3.44 I. The CPI with memory stalls is 2 + 3.44 = 5.44. Since there is no change in instruction count or clock rate, the ratio of the CPU execution times is (CPU time with stalls) / (CPU time with perfect cache) = (I x CPI_stall x clock cycle) / (I x CPI_perfect x clock cycle) = CPI_stall / CPI_perfect = 5.44 / 2 = 2.72.
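The calculation can be checked with a short sketch (per-instruction quantities):

```python
# Sketch of the worked example above.
cpi_base = 2.0
i_miss_stalls = 0.02 * 100          # instruction-miss stall cycles per instruction
d_miss_stalls = 0.36 * 0.04 * 100   # data-miss stall cycles per instruction (36% loads/stores)
stalls = i_miss_stalls + d_miss_stalls   # 3.44

cpi_stall = cpi_base + stalls            # 5.44
print(round(cpi_stall / cpi_base, 2))    # 2.72: slowdown vs. a perfect cache
```

More than half the cycles here are memory stalls, not useful execution.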

40 Calculating Cache Performance Example: suppose we speed up the computer in the previous example by reducing the CPI from 2 to 1. The system with cache misses then has a CPI of 1 + 3.44 = 4.44, so the system with the perfect cache is 4.44 / 1 = 4.44 times faster. The amount of execution time spent on memory stalls rises from 3.44 / 5.44 = 63% to 3.44 / 4.44 = 77%.

41 Example: Cache Performance with Increased Clock Rate (Question) Suppose we double the clock rate of the computer in the previous example, while main memory speed is unchanged. How much faster will the computer be with the faster clock, assuming the same miss rate as the previous example? (Answer) The new miss penalty = 200 clock cycles. Total miss cycles per instruction = (2% x 200) + 36% x (4% x 200) = 6.88. CPI = 2 + 6.88 = 8.88. Relative performance = (performance with fast clock) / (performance with slow clock) = (IC x CPI_slow x clock cycle) / (IC x CPI_fast x (clock cycle / 2)) = 5.44 / (8.88 / 2) = 1.23. The computer with the faster clock is about 1.2 times faster, rather than 2 times faster, which it would have been if we ignored cache misses.
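This calculation can also be checked with a short sketch:

```python
# Sketch: doubling the clock doubles the miss penalty in cycles,
# since main memory speed is unchanged.
miss_cycles = 0.02 * 200 + 0.36 * (0.04 * 200)   # 6.88 stall cycles per instruction
cpi_fast = 2 + miss_cycles                        # 8.88
cpi_slow = 5.44                                   # from the previous example

# The fast clock cycle is half as long, so compare CPI_slow to CPI_fast / 2.
speedup = cpi_slow / (cpi_fast / 2)
print(round(speedup, 2))   # about 1.23, far short of the ideal 2x
```

The faster the clock, the larger the fraction of time lost to the (fixed-latency) memory.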

42 Reducing Cache Misses by More Flexible Placement of Blocks (1) Direct-mapped cache: a block can go in exactly one place in the cache. (2) Fully associative cache: a cache structure in which a block can be placed in any location in the cache. (3) Set-associative cache: a cache that has a fixed number of locations (at least two) where each block can be placed. In a direct-mapped cache, the position of a memory block is given by (block number) modulo (number of cache blocks). In a set-associative cache, the set containing a memory block is given by (block number) modulo (number of sets in the cache).

43 Associativity in Caches Direct-mapped cache

44 Associativity in Caches Set-associative cache; fully associative cache. A fully associative cache with m entries is simply an m-way set-associative cache: it has one set with m blocks, and an entry can reside in any block within that set.

45 Associative structures

46 Example: Misses and Associativity in Caches Assume there are three small caches, each consisting of four one-word blocks: one fully associative, one two-way set associative, and one direct-mapped. Find the number of misses for each organization given the following sequence of block addresses: 0, 8, 0, 6, 8.

47 Misses in a Direct-Mapped Cache Sequence of block addresses: 0, 8, 0, 6, 8. Mapping: block 0 -> cache block (0 modulo 4) = 0; block 6 -> (6 modulo 4) = 2; block 8 -> (8 modulo 4) = 0.
Access 0: miss; cache holds Memory[0].
Access 8: miss; Memory[8] replaces Memory[0].
Access 0: miss; Memory[0] replaces Memory[8].
Access 6: miss; cache holds Memory[0], Memory[6].
Access 8: miss; Memory[8] replaces Memory[0]; cache holds Memory[8], Memory[6].
The direct-mapped cache generates 5 misses for the five accesses.

48 Misses and Associativity in Caches (2) Two-way set-associative cache. Mapping: block 0 -> set (0 modulo 2) = 0; block 6 -> set 0; block 8 -> set 0.
Access 0: miss; set 0 holds Memory[0].
Access 8: miss; set 0 holds Memory[0], Memory[8].
Access 0: hit.
Access 6: miss; Memory[6] replaces Memory[8] (the least recently used block); set 0 holds Memory[0], Memory[6].
Access 8: miss; Memory[8] replaces Memory[0]; set 0 holds Memory[8], Memory[6].
The two-way set-associative cache has 4 misses.

49 Misses and Associativity in Caches (3) Fully associative cache: any memory block can be stored in any cache block.
Access 0: miss; cache holds Memory[0].
Access 8: miss; cache holds Memory[0], Memory[8].
Access 0: hit.
Access 6: miss; cache holds Memory[0], Memory[8], Memory[6].
Access 8: hit.
The fully associative cache has only 3 misses: the best of the three organizations.
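All three cases can be checked with a small LRU cache simulator (an illustrative sketch, not from the slides):

```python
# Sketch: count misses for an LRU cache of `num_blocks` blocks with
# associativity `ways`, given a sequence of block addresses.
def count_misses(refs, num_blocks, ways):
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each set: block list, LRU first
    misses = 0
    for block in refs:
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)                # hit: move to most-recently-used position
        else:
            misses += 1
            if len(s) == ways:             # set full: evict the LRU block
                s.pop(0)
        s.append(block)
    return misses

refs = [0, 8, 0, 6, 8]
print(count_misses(refs, 4, 1))   # 5: direct-mapped
print(count_misses(refs, 4, 2))   # 4: two-way set associative
print(count_misses(refs, 4, 4))   # 3: fully associative
```

Direct-mapped is a 1-way set-associative cache and fully associative is 4-way here, so one routine covers all three organizations.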

50 Four-way set-associative cache

51 Size of Tags vs. Set Associativity (Question) Assume a cache of 4K blocks, a four-word block size, and a 32-bit address. Find the total number of sets and the total number of tag bits for caches that are direct-mapped, two-way and four-way set associative, and fully associative. (Answer) Direct-mapped: the bits for index and tag = 32 - 4 = 28 (4 = block offset plus byte offset). The number of sets = the number of blocks = 4K. The bits for the index = log2(4K) = 12. The total number of tag bits = (28 - 12) x 4K = 64 Kbits.

52 Size of Tags vs. Set Associativity Two-way set associative: the number of sets = (number of blocks) / 2 = 2K; the total number of tag bits = (28 - 11) x 2 x 2K = 68 Kbits. Four-way set associative: the number of sets = (number of blocks) / 4 = 1K; the total number of tag bits = (28 - 10) x 4 x 1K = 72 Kbits. Fully associative: the number of sets = 1; the total number of tag bits = 28 x 4K x 1 = 112 Kbits. Least recently used (LRU): a replacement scheme in which the block replaced is the one that has been unused for the longest time.
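The four tag-storage figures can be checked with a short sketch:

```python
import math

# Sketch of the tag-storage calculation above: 4K blocks, four-word blocks,
# 32-bit addresses, so 28 bits remain for tag + index.
TAG_PLUS_INDEX = 28
NUM_BLOCKS = 4 * 1024

def tag_kbits(ways):
    sets = NUM_BLOCKS // ways
    index_bits = int(math.log2(sets))
    tag_bits = TAG_PLUS_INDEX - index_bits
    return tag_bits * ways * sets // 1024    # total tag storage, in Kbits

for ways in (1, 2, 4, NUM_BLOCKS):           # fully associative = 4K-way
    print(ways, tag_kbits(ways))             # 64, 68, 72, 112 Kbits
```

Each doubling of associativity halves the index, lengthening every tag by one bit: tag storage grows with associativity.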

53 Multilevel Cache Multilevel cache: a memory hierarchy with multiple levels of caches, rather than just a cache and main memory. Example: suppose we have a processor with a base CPI of 1.0 (assuming all references hit in the primary cache) and a clock rate of 5 GHz. Assume a main memory access time of 100 ns, including all the miss handling, and a miss rate per instruction at the primary cache of 2%. How much faster will the processor be if we add a secondary cache that has a 5 ns access time for either a hit or a miss and is large enough to reduce the miss rate to main memory to 0.5%?

54 Multilevel Cache (cont'd) For the processor with one level of cache: the miss penalty to main memory = 100 ns / 0.2 ns (one cycle at 5 GHz) = 500 clock cycles. Total CPI = base CPI + memory-stall cycles per instruction = 1.0 + 2% x 500 = 11.0. For the processor with two levels of cache: the miss penalty for an access to the second-level cache = 5 / 0.2 = 25 clock cycles. Total CPI = base CPI + primary stalls per instruction + secondary stalls per instruction = 1.0 + 2% x 25 + 0.5% x 500 = 4.0. The processor with the secondary cache is faster by 11.0 / 4.0 = 2.8.
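The two-level example can be checked with a short sketch:

```python
# Sketch of the two-level-cache example above.
cycles_per_ns = 5                        # 5 GHz clock -> 5 cycles per ns
main_penalty = 100 * cycles_per_ns       # 500 cycles to reach main memory
l2_penalty = 5 * cycles_per_ns           # 25 cycles to reach the L2 cache

cpi_one_level = 1.0 + 0.02 * main_penalty                        # 11.0
cpi_two_level = 1.0 + 0.02 * l2_penalty + 0.005 * main_penalty   # 4.0
print(round(cpi_one_level / cpi_two_level, 2))   # speedup of about 2.75
```

The L2 cache converts most 500-cycle main-memory penalties into 25-cycle L2 hits, which is where the speedup comes from.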


Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site: Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331

More information

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. M6 Memory Hierarchy Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. Events on a Cache Miss Events on a Cache Miss Stall the pipeline.

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

Memory Hierarchy: Caches, Virtual Memory

Memory Hierarchy: Caches, Virtual Memory Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Computer Systems Laboratory Sungkyunkwan University

Computer Systems Laboratory Sungkyunkwan University Caches Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns

More information

Memory Hierarchy Design (Appendix B and Chapter 2)

Memory Hierarchy Design (Appendix B and Chapter 2) CS359: Computer Architecture Memory Hierarchy Design (Appendix B and Chapter 2) Yanyan Shen Department of Computer Science and Engineering 1 Four Memory Hierarchy Questions Q1 (block placement): where

More information

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)

Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!) Basic Memory Hierarchy Principles Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!) Cache memory idea Use a small faster memory, a cache memory, to store recently

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

Why memory hierarchy

Why memory hierarchy Why memory hierarchy (3 rd Ed: p.468-487, 4 th Ed: p. 452-470) users want unlimited fast memory fast memory expensive, slow memory cheap cache: small, fast memory near CPU large, slow memory (main memory,

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

ECE468 Computer Organization and Architecture. Memory Hierarchy

ECE468 Computer Organization and Architecture. Memory Hierarchy ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:

More information

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality Administrative EEC 7 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

Memory Hierarchy: The motivation

Memory Hierarchy: The motivation Memory Hierarchy: The motivation The gap between CPU performance and main memory has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Cache Memory and Performance

Cache Memory and Performance Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)

More information

Course Administration

Course Administration Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570

More information

Improving Cache Performance

Improving Cache Performance Improving Cache Performance Tuesday 27 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Memory hierarchy

More information

Improving Cache Performance

Improving Cache Performance Improving Cache Performance Computer Organization Architectures for Embedded Computing Tuesday 28 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

COSC3330 Computer Architecture Lecture 19. Cache

COSC3330 Computer Architecture Lecture 19. Cache COSC3330 Computer Architecture Lecture 19 Cache Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Cache Topics 3 Cache Hardware Cost How many total bits are required

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

Chapter 5. Memory Technology

Chapter 5. Memory Technology Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Memory Hierarchy: Motivation

Memory Hierarchy: Motivation Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds. Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum

More information

ECE ECE4680

ECE ECE4680 ECE468. -4-7 The otivation for s System ECE468 Computer Organization and Architecture DRA Hierarchy System otivation Large memories (DRA) are slow Small memories (SRA) are fast ake the average access time

More information

Lecture 12: Memory hierarchy & caches

Lecture 12: Memory hierarchy & caches Lecture 12: Memory hierarchy & caches A modern memory subsystem combines fast small memory, slower larger memories This lecture looks at why and how Focus today mostly on electronic memories. Next lecture

More information

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S.

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S. Memory Hierarchy Lecture notes from MKP, H. H. Lee and S. Yalamanchili Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 Reading (2) 1 SRAM: Value is stored on a pair of inerting gates Very fast but

More information

Chapter 7: Large and Fast: Exploiting Memory Hierarchy

Chapter 7: Large and Fast: Exploiting Memory Hierarchy Chapter 7: Large and Fast: Exploiting Memory Hierarchy Basic Memory Requirements Users/Programmers Demand: Large computer memory ery Fast access memory Technology Limitations Large Computer memory relatively

More information

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12a Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality Caching Chapter 7 Basics (7.,7.2) Cache Writes (7.2 - p 483-485) configurations (7.2 p 487-49) Performance (7.3) Associative caches (7.3 p 496-54) Multilevel caches (7.3 p 55-5) Tech SRAM (logic) SRAM

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Memory Technologies. Technology Trends

Memory Technologies. Technology Trends . 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12 Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 6: Memory Organization Part I

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 6: Memory Organization Part I ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 6: Memory Organization Part I Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Large and Fast: Exploiting Memory Hierarchy

Large and Fast: Exploiting Memory Hierarchy 5 Ideally one would desire an indefinitely large memory capacity such that any particular word would be immediately available. We are forced to recognize the possibility of constructing a hierarchy of

More information

CMPT 300 Introduction to Operating Systems

CMPT 300 Introduction to Operating Systems CMPT 300 Introduction to Operating Systems Cache 0 Acknowledgement: some slides are taken from CS61C course material at UC Berkeley Agenda Memory Hierarchy Direct Mapped Caches Cache Performance Set Associative

More information

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics ,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information