Memory. Lecture 22 CS301


1 Memory Lecture 22 CS301

2 Administrative Daily Review of today's lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday

3 Pipelined Machine Fetch Decode Execute Memory PC 4 Read Addr Out Data Instruction Memory << 2 op/fun rs rt rd imm src1 src1data src2 src2data Register File destreg destdata << 2 Addr Out Data Data Memory In Data 16 Sign Ext 32 Pipeline Register (Writeback)

4 The Challenge Be able to randomly access gigabytes (or more) of data at processor speeds

5 How Do We Access Data?

6 Program Characteristics Temporal Locality Spatial Locality

7 Program Characteristics Temporal Locality w If you use one item, you are likely to use it again soon Spatial Locality

8 Program Characteristics Temporal Locality w If you use one item, you are likely to use it again Spatial Locality w If you use one item, you are likely to use its neighbors soon

9 Examples of Each Type of Locality? Temporal locality w Good? w Bad? Spatial locality w Good? w Bad?

10 Locality Programs tend to exhibit spatial & temporal locality. Just a fact of life. How can we use this knowledge of program behavior to solve our problem?
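Both kinds of locality show up in even the simplest loops. A tiny illustrative sketch (hypothetical code, not from the slides):

```python
# A running sum over an array exhibits both kinds of locality.
data = list(range(1024))

total = 0
for i in range(len(data)):
    total += data[i]  # 'total' is touched every iteration  -> temporal locality
                      # data[0], data[1], ... are neighbors -> spatial locality
```

A cache exploits exactly these two patterns: keep `total` (and recently used elements) close by, and fetch `data[i]`'s neighbors along with it.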

11 Predicting Data Accesses Can we predict what data we will use?

12 Predicting Data Accesses Can we predict what data we will use? w Instead of predicting branch direction, predict next memory address request

13 Predicting Data Accesses Can we predict what data we will use? w Instead of predicting branch direction, predict next memory address request w Like branch prediction, use previous behavior

14 Predicting Data Accesses Can we predict what data we will use? w Instead of predicting branch direction, predict next memory address request w Like branch prediction, use previous behavior Keep a prediction for every load? w Fetch stage for load is *TOO LATE* Keep a prediction per-memory address?

15 Predicting Data Accesses Can we predict what data we will use? w Instead of predicting branch direction, predict next memory address request w Like branch prediction, use previous behavior Keep a prediction for every load? w Fetch stage for load is *TOO LATE* Keep a prediction per-memory address? w Given address, guess next likely address

16 Predicting Data Accesses Can we predict what data we will use? w Instead of predicting branch direction, predict next memory address request w Like branch prediction, use previous behavior Keep a prediction for every load? w Fetch stage for load is *TOO LATE* Keep a prediction per-memory address? w Given address, guess next likely address w Too many choices: the table becomes too large to fit

17 Memory Hierarchy Tech Speed Size Cost/bit SRAM (logic) Fastest CPU L1 Smallest Highest SRAM (logic) L2 Cache DRAM (capacitors) Slowest DRAM Largest Lowest

18 Using Caches To Improve Performance Caches make the large gap between processor speed and memory speed appear much smaller Caches give the appearance of having lots and lots of quickly accessible memory w Achieved by exploiting spatial and temporal locality

19 SRAM Static Random Access Memory w Volatile memory array, 4-6 transistors per bit w Fast accesses ( ns) Dimensions: w Height: # addressable locations w Width: # of bits per addressable unit Usually 1 or 4 2M x 16 SRAM Height: 2M Width: 16

20 SRAM: Selection Logic Need to choose which addressable unit goes to output lines 2M multiplexor infeasible Single shared output line: bit line w Tri-state buffer used to allow multiple sources to drive bit line w When enabled, the buffer's output follows its input; when disabled, the output floats (high impedance) (Figure: tri-state buffer truth table, and four data inputs, each behind a tri-state buffer with its own select/enable line, driving one shared output)

21 SRAM: Using Bit Lines input lines address or word lines bit (output) lines

22 SRAM: For Large Arrays Large arrays (4Mx8 SRAM) require HUGE decoders and word lines Instead, 2-stage decoding w Selects addresses for eight 4Kx1024 arrays w Multiplexors select 1 bit from each 1024-b wide array

23 DRAM Dynamic RAM Value stored as charge in capacitor (1T) w Must be refreshed w Refresh by reading and writing back cells w Only uses 1-2% of active DRAM cycles 2 level decoder w Row access Row access strobe (RAS) w Column access Column access strobe (CAS) Access times w ns (typical) 4M x 1 DRAM 2048 x 2048 array

24 Memory Hierarchy Tech Speed Size Cost/bit SRAM (logic) Fastest CPU L1 Smallest Highest SRAM (logic) L2 Cache DRAM (capacitors) Slowest DRAM Largest Lowest

25 What Do We Need to Think About? 1. Design cache that takes advantage of spatial & temporal locality

26 What does that mean?!? 1. Design cache that takes advantage of spatial & temporal locality 2. When you program, place data together that is used together to increase spatial & temporal locality

27 What does that mean?!? 1. Design cache that takes advantage of spatial & temporal locality 2. When you program, place data together that is used together to increase locality w Java - difficult to do w C - more control over data placement

28 What does that mean?!? 1. Design cache that takes advantage of spatial & temporal locality 2. When you program, place data together that is used together to increase locality w Java - difficult to do w C - more control over data placement Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!

29 Cache Design Temporal Locality Spatial Locality

30 Cache Design Temporal Locality w When we obtain the data, store it in the cache. Spatial Locality

31 Cache Design Temporal Locality w When we obtain the data, store it in the cache. Spatial Locality w Transfer large block of contiguous data to get item's neighbors. w Block (Line): Amount of data transferred for a single miss (data plus neighbors)

32 Where do we put data? Searching whole cache takes time & power Direct-mapped w Limit each piece of data to one possible position Search is quick and simple

33 Memory Direct-Mapped Index Cache

34 Memory Direct-Mapped One block (line) Index Cache

35 Direct-Mapped cache Block (Line) size = 2 words (8B) Index Data Byte Address 0b Where do we look in the cache? How do we know if it is there?

36 Direct-Mapped cache Block (Line) size = 2 words (8B) Index Data Byte Address 0b Block Address Where is it within the block? Where do we look in the cache? BlockAddress mod #slots BlockAddress & (#slots-1) How do we know if it is there?

37 Direct-Mapped cache Block (Line) size = 2 words (8B) Valid Tag Data M[ ] M[ ] Byte Address 0b Tag Index Where is it within the block? Where do we look in the cache? BlockAddress mod #slots BlockAddress & (#slots-1) How do we know if it is there? We need a tag & valid bit
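The slide's index/tag/offset arithmetic can be sketched directly. The 8-byte block size and 4-slot cache below are assumptions inferred from the figure:

```python
# Assumed geometry from the slide's figure: 8-byte blocks (2 words), 4 slots.
# Both are powers of two, so the mod can also be done with a bitmask.
BLOCK_SIZE = 8  # bytes
NUM_SLOTS = 4

def split_address(byte_addr):
    block_addr = byte_addr // BLOCK_SIZE            # which block of memory
    index = block_addr % NUM_SLOTS                  # where to look in the cache
    assert index == (block_addr & (NUM_SLOTS - 1))  # same thing, as a bitmask
    tag = block_addr // NUM_SLOTS                   # identifies which block is resident
    offset = byte_addr % BLOCK_SIZE                 # where within the block
    return tag, index, offset
```

For byte address 72 this gives block 9, so index 1 and tag 2, matching the figure's placement of M[72-75]/M[76-79].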

38 Splitting the Address Direct-Mapped Cache Valid Tag Data b Tag Index Block Offset Byte Offset

39 Definitions Byte Offset: Which within? Block Offset: Which within? Set: Group of checked each access Index: Which within cache? Tag: Is this the right one?

40 Definitions Byte Offset: Which byte within word Block Offset: Which within? Set: Group of checked each access Index: Which within cache? Tag: Is this the right one?

41 Definitions Byte Offset: Which byte within word Block Offset: Which word within block Set: Group of checked each access Index: Which within cache? Tag: Is this the right one?

42 Definitions Byte Offset: Which byte within word Block Offset: Which word within block Set: Group of blocks checked each access Index: Which within cache? Tag: Is this the right one?

43 Definitions Byte Offset: Which byte within word Block Offset: Which word within block Set: Group of blocks checked each access Index: Which set within cache? Tag: Is this the right one?

44 Definitions Block (Line) Hit Miss Hit time / Access time Miss Penalty

45 Definitions Block - unit of data transfer bytes/ words Hit Miss Hit time / Access time Miss Penalty

46 Definitions Block - unit of data transfer bytes/ words Hit - data found in this cache Miss Hit time / Access time Miss Penalty

47 Definitions Block - unit of data transfer bytes/ words Hit - data found in this cache Miss - data not found in this cache w Send request to lower level Hit time / Access time Miss Penalty

48 Definitions Block - unit of data transfer bytes/ words Hit - data found in this cache Miss - data not found in this cache w Send request to lower level Hit time / Access time w Time to access this cache Miss Penalty

49 Definitions Block - unit of data transfer bytes/words Hit - data found in this cache Miss - data not found in this cache w Send request to lower level Hit time / Access time w Time to access this cache Miss Penalty w Time to receive block from lower level w Not always constant

50 Example 1 Direct-Mapped Block size=2 words Direct-Mapped Cache Valid Tag Data x Tag Index Block Offset Byte Offset

51 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data Reference Stream: Hit/Miss 0b b b b b b Miss Rate: Tag Index Block Offset Byte Offset

52 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data 72 Reference Stream: Hit/Miss 0b b b b b b Miss Rate: Tag Index Block Offset Byte Offset

53 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag 10 Data M[76-79] M[72-75] 72 Reference Stream: Hit/Miss 0b M 0b b b b b Miss Rate: Tag Index Block Offset Byte Offset

54 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag 10 Data M[76-79] M[72-75] 20 Reference Stream: Hit/Miss 0b M 0b b b b b Miss Rate: Tag Index Block Offset Byte Offset

55 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] 20 Reference Stream: Hit/Miss 0b M 0b b b b b Miss Rate: Tag Index Block Offset Byte Offset

56 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] 56 Reference Stream: Hit/Miss 0b M 0b M 0b b b b Miss Rate: Tag Index Block Offset Byte Offset

57 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] 11 M[60-63] M[56-59] 56 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b b b Miss Rate: Tag Index Block Offset Byte Offset

58 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] 16 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b b b Miss Rate: Tag Index Block Offset Byte Offset

59 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] 16 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b b Miss Rate: Tag Index Block Offset Byte Offset

60 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[20-23] M[16-19] M[76-79] M[72-75] M[60-63] M[56-59] 20 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b b Miss Rate: Tag Index Block Offset Byte Offset

61 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[20-23] M[16-19] M[76-79] M[72-75] M[60-63] M[56-59] 20 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b H 0b Miss Rate: Tag Index Block Offset Byte Offset

62 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] 36 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b H 0b M Miss Rate: Tag Index Block Offset Byte Offset

63 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[36-39] M[32-35] M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] 36 Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b H 0b M Miss Rate: Tag Index Block Offset Byte Offset

64 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[36-39] M[32-35] M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b H 0b M Miss Rate: Tag Index Block Offset Byte Offset

65 Example 1 Direct-Mapped Block size=2 words Valid Direct-Mapped Cache Tag Data M[36-39] M[32-35] M[76-79] M[72-75] M[20-23] M[16-19] M[60-63] M[56-59] Reference Stream: Hit/Miss 0b M 0b M 0b M 0b H 0b H 0b M Miss Rate: 4 / 6 = 67% Hit Rate: 2 / 6 = 33% Tag Index Block Offset Byte Offset
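The whole of Example 1 can be replayed in a few lines. The 4-slot, 8-byte-block geometry is inferred from the figure's 2-bit index, so treat this as a sketch rather than the slide's exact hardware:

```python
# Direct-mapped cache simulation for Example 1's reference stream
# (byte addresses 72, 20, 56, 16, 20, 36; 8-byte blocks, 4 slots assumed).
BLOCK_SIZE, NUM_SLOTS = 8, 4
cache = {}  # index -> tag of the block currently resident in that slot

def access(byte_addr):
    block = byte_addr // BLOCK_SIZE
    index, tag = block % NUM_SLOTS, block // NUM_SLOTS
    if cache.get(index) == tag:
        return 'H'
    cache[index] = tag  # miss: fetch the block, evicting whatever was there
    return 'M'

results = [access(a) for a in (72, 20, 56, 16, 20, 36)]
```

The simulation reproduces the slides' outcome, M M M H H M: 16 hits because bytes 16-23 form one block with 20, and the second 20 is a repeat.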

66 Implementation Byte Address 0x Tag Valid Index Tag Data Byte Offset Block offset = MUX Hit? Data

67 Example 2 You are implementing a 64-Kbyte cache, 32-bit address The block size (line size) is 16 bytes. Each word is 4 bytes How many bits is the block offset? How many bits is the index? How many bits is the tag?

68 Example 2 You are implementing a 64-Kbyte cache The block size (line size) is 16 bytes. Each word is 4 bytes How many bits is the block offset? w 16 / 4 = 4 words -> 2 bits How many bits is the index? How many bits is the tag?

69 Example 2 You are implementing a 64-Kbyte cache The block size (line size) is 16 bytes. Each word is 4 bytes, address 32 bits How many bits is the block offset? w 16 / 4 = 4 words -> 2 bits How many bits is the index? w 64*1024 / 16 = 4096 blocks -> 12 bits How many bits is the tag?

70 Example 2 You are implementing a 64-Kbyte cache The block size (line size) is 16 bytes. Each word is 4 bytes, address 32 bits How many bits is the block offset? w 16 / 4 = 4 words -> 2 bits How many bits is the index? w 64*1024 / 16 = 4096 blocks -> 12 bits How many bits is the tag? w 32 - (12 + 2 + 2) = 16 bits
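Example 2's arithmetic, written out as a sketch (the parameter names are illustrative, not from the slides):

```python
import math

# Example 2: 64 KB direct-mapped cache, 16-byte blocks, 4-byte words,
# 32-bit byte addresses.
cache_bytes, block_bytes, word_bytes, addr_bits = 64 * 1024, 16, 4, 32

byte_offset_bits = int(math.log2(word_bytes))                  # 2: byte within word
block_offset_bits = int(math.log2(block_bytes // word_bytes))  # 2: word within block
index_bits = int(math.log2(cache_bytes // block_bytes))        # 12: 4096 slots
tag_bits = addr_bits - index_bits - block_offset_bits - byte_offset_bits  # 16
```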

71 Direct-mapped $ w Block size = 2 words w Total size = 16 words Word addresses w 0 w 16 w 1 w 17 w 32 w 16 w 36 w 45 What is the hit rate? Example

72 Example Direct-mapped $ w Block size = 2 words w Total size = 16 words Word addresses w 0 w 16 w 1 w 17 w 32 w 16 w 36 w 45 What is the hit rate?
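The slide leaves the hit rate as an exercise. A short simulation (direct-mapped, 2-word blocks, 16 words total so 8 slots; the addresses given are word addresses) suggests every access misses:

```python
# Direct-mapped: 2-word blocks, 16 words total -> 8 slots.
NUM_SLOTS = 8
cache = {}  # index -> tag
hits = 0
for word_addr in (0, 16, 1, 17, 32, 16, 36, 45):
    block = word_addr // 2
    index, tag = block % NUM_SLOTS, block // NUM_SLOTS
    if cache.get(index) == tag:
        hits += 1
    else:
        cache[index] = tag
```

Addresses 0, 1, 16, 17, and 32 all map to index 0 and keep evicting one another, so the hit rate is 0/8: a pure conflict-miss pattern, motivating the next slides.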

73 Reducing Cache Conflicts Problem: w Lines that map to same cache index conflict w Lines conflict even if other cache lines unused Solution: w Have multiple cache lines for each mapping

74 Cache Set Associativity Set: Group of cache lines address can map to Direct-mapped: 1 location for block n-way set associative: n locations for block Fully-associative: Maps to any location Direct-mapped 2-way set associative Fully-associative Set Set Set 0

75 Cache Set Associativity Decreases conflicts => increases hit rate On cache request, must check every cache line in set w Increases hit time Number of sets smaller than direct mapped, so fewer index bits w lg (number of sets) where sets < # cache lines Tag bits increase

76 2-way set associative $ w Block size = 2 words w Total size = 16 words Word addresses w 0 w 16 w 1 w 17 w 32 w 16 w 36 w 45 What is the hit rate? Example

77 Example 2-way set associative $ w Block size = 2 words w Total size = 16 words Word addresses w 0 w 16 w 1 w 17 w 32 w 16 w 36 w 45 What is the hit rate?
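The same reference stream can be replayed 2-way set associative: 16 words total, 2-word blocks, so 8 blocks in 4 sets of 2. LRU replacement is assumed here; the slide does not say which policy it uses:

```python
from collections import OrderedDict

# 2-way set associative: 4 sets of 2 lines, 2-word blocks, LRU replacement.
NUM_SETS, WAYS = 4, 2
sets = [OrderedDict() for _ in range(NUM_SETS)]  # per set: tag -> None, oldest first
hits = 0
for word_addr in (0, 16, 1, 17, 32, 16, 36, 45):
    block = word_addr // 2
    index, tag = block % NUM_SETS, block // NUM_SETS
    lines = sets[index]
    if tag in lines:
        hits += 1
        lines.move_to_end(tag)         # refresh: now most recently used
    else:
        if len(lines) == WAYS:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
```

With two ways, the blocks holding 0/1 and 16/17 no longer evict each other, and the simulation gives 3 hits out of 8 (37.5%), versus zero in the direct-mapped case.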

78 Implementation Byte Address 0x Valid Tag Tag Data Index Valid Byte Offset Block offset Tag Data 1 Hit? = MUX = MUX MUX Data

79 Example You are implementing a 1Mbyte 4-way set associative cache, 32-bit address The block size (line size) is 256 bytes. How many bits is the block offset? How many bits is the index? How many bits is the tag?
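The slide leaves this one as an exercise; applying the same breakdown as Example 2, now with 4 ways (a sketch, same illustrative names as before):

```python
import math

# 1 MB 4-way set associative cache, 256-byte blocks, 4-byte words,
# 32-bit byte addresses.
cache_bytes, block_bytes, word_bytes, ways, addr_bits = 1024 * 1024, 256, 4, 4, 32

byte_offset_bits = int(math.log2(word_bytes))                  # 2
block_offset_bits = int(math.log2(block_bytes // word_bytes))  # 6 (64 words per block)
num_sets = cache_bytes // block_bytes // ways                  # 1024 sets
index_bits = int(math.log2(num_sets))                          # 10
tag_bits = addr_bits - index_bits - block_offset_bits - byte_offset_bits  # 14
```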

80 What Happens on Cache Miss? Detect desired block is not there w Valid bit 0 OR w Tag not one we're looking for If valid bit is set but tag not one we're looking for, evict current block Request line from lower level Upon receipt of data from lower level, set tag and valid bits and store data. Pass data up to requestor

81 How caches work Classic abstraction Each level of hierarchy has no knowledge of the configuration of lower level L1 cache's perspective Me L1 L2 cache's perspective Me L2 Cache Memory L2 Cache Memory DRAM DRAM

82 Memory Operation at any level Address 1. Me Cache 1. Cache receives request Memory

83 Memory operation at any level Address 1. Me 2. Cache 1. Cache receives request 2. Look for item in cache Memory

84 Memory operation at any level Address 1. Me 2. Cache 3. Data 1. Cache receives request 2. Look for item in cache Hit - return data Memory

85 Memory operation at any level Address 1. Me Memory Cache 1. Cache receives request 2. Look for item in cache Hit - return data Miss - request memory

86 Memory operation at any level Address 1. Me Memory Cache Cache receives request 2. Look for item in cache Hit - return data Miss - request memory receive data update cache

87 Memory operation at any level Address 1. Me Memory Cache 5. Data Cache receives request 2. Look for item in cache Hit - return data Miss - request memory receive data update cache return data

88 Performance Hit: latency = Miss: latency = Goal: minimize misses!!!

89 Performance Hit: latency = access time Miss: latency = Goal: minimize misses!!!

90 Performance Hit: latency = access time Miss: latency = access time + miss penalty Goal: minimize misses!!!

91 Performance How does the memory system affect CPI? Penalty on cache hit: w hit time w frequently only 1 cycle needed to access on cache hit Penalty on cache miss: w miss time time to get from lower level of memory CPI = 1 + memory stalls/instruction = 1 + (% miss) (cache miss penalty)

92 L1 cache's perspective Me Memory L1 L1's miss penalty contains the access of L2, and possibly the access of DRAM!!! L2 Cache DRAM

93 Multi-level Caches Base CPI 1.0, 500MHz clock Main memory-100 cycles, L2-10 cycles L1 miss rate per instruction - 5% W/L2-2% of instructions go to DRAM What is the speedup with the L2 cache?

94 Multi-level Caches CPI = 1 + memory stalls / instructions

95 Multi-level Caches CPI = 1 + memory stalls / instruction CPI old = 1 + 5% miss/instr * 100 cycles/miss = 1 + 5 = 6 cycles/instr

96 Multi-level Caches CPI = 1 + memory stalls / instruction CPI old = 1 + 5% miss/instr * 100 cycles/miss = 1 + 5 = 6 cycles/instr CPI new = 1 + L2% * L2 penalty + Mem% * Mem penalty = 1 + 5% * 10 + 2% * 100 = 3.5 cycles/instr

97 Multi-level Caches CPI = 1 + memory stalls / instruction CPI old = 1 + 5% miss/instr * 100 cycles/miss = 1 + 5 = 6 cycles/instr CPI new = 1 + L2% * L2 penalty + Mem% * Mem penalty = 1 + 5% * 10 + 2% * 100 = 3.5 cycles/instr Speedup = 6 / 3.5 = 1.7
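The slide's arithmetic, reproduced as a sketch (base CPI 1.0, 100-cycle memory, 10-cycle L2, 5% of instructions miss L1, 2% also miss L2 and go to DRAM):

```python
# Without L2: every L1 miss pays the full 100-cycle memory penalty.
cpi_old = 1 + 0.05 * 100

# With L2: all L1 misses pay 10 cycles for L2; the 2% that also miss L2
# pay an additional 100 cycles for DRAM.
cpi_new = 1 + 0.05 * 10 + 0.02 * 100

speedup = cpi_old / cpi_new  # 6.0 / 3.5, about 1.71
```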

98 Average Memory Access Time AMAT = L1 access time + L1 miss penalty L1 miss penalty = L2 access time + L2 miss penalty L2 miss penalty = Memory access time + Memory miss penalty

99 Calculate AMAT Organization: w L1 cache Access time is 1 cycle Hit rate of 90% w L2 cache Access time is 10 cycles Hit rate of 95% w Memory Access time is 100 cycles Hit rate of 100%
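Applying the AMAT recurrence from the previous slide to this organization (a sketch of the requested calculation):

```python
# AMAT = access time + miss rate * miss penalty, applied level by level.
mem_amat = 100                        # memory always hits (100% hit rate)
l2_amat = 10 + (1 - 0.95) * mem_amat  # 10 + 0.05 * 100 = 15 cycles
l1_amat = 1 + (1 - 0.90) * l2_amat    # 1 + 0.10 * 15  = 2.5 cycles
```

So even with a 100-cycle memory, the hierarchy delivers an average access time of 2.5 cycles.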

100 Ways To Improve Cache Performance Make the cache bigger w Pro: More stuff can fit in the cache so stuff doesn't have to get thrown out as often w Con: Time to access larger memory is longer Reduce the number of conflicts in the cache by increasing associativity w Pro: Multiple memory lines that map to same cache set can reside in cache simultaneously w Con: More time needed to determine if there is a hit because have to check multiple cache blocks

101 Ways To Improve Cache Performance Use multiple levels of cache w Access time of non-primary cache not as important. More important for it to have lower miss rate. w Pro: Reduces (average) miss penalty if there is a hit in lower level of cache w Con: Takes up space and increases (worst-case) latency if access misses in this level of cache. Make the block size larger to exploit spatial locality w Pro: Fewer misses for sequential accesses w Pro: Decreases bits dedicated to tags w Con: Fewer blocks in cache for given cache size w Con: Miss penalty may be larger because larger blocks need to be retrieved from lower level of hierarchy

102 2-way set associative $ w Block size = 4 words w Total size = 32 words Word addresses w 2 w 35 w 63 w 110 w 210 w 77 w 3 w 97 w 170 What is the hit rate? Example

103 Cache Writes There are multiple copies of the data lying around w L1 cache, L2 cache, DRAM Do we write to all of them? Do we wait for the write to complete before the processor can proceed?

104 Do we write to all of them? Write-through - write to all levels of hierarchy Write-back - write to lower level only when cache line gets evicted from cache w creates inconsistent data - different values for same item in cache and DRAM. w Inconsistent data in highest level in cache is referred to as dirty

105 Write-Through CPU sw $3, 0($5) L1 L2 Cache DRAM

106 Write-Back CPU sw $3, 0($5) L1 L2 Cache DRAM

107 Write-through vs Write-back Which performs the write faster? w Write-back - it only writes the L1 cache Which has faster evictions from a cache? w Write-through - no write involved, just overwrite tag Which causes more bus traffic? w Write-through. DRAM is written every store. Write-back only writes on eviction.
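The bus-traffic point can be made with a toy model (not from the slides): a program that stores to the same cached block many times before the block is evicted.

```python
# Toy model: 100 stores land in one cached block before it is evicted.
stores_to_same_block = 100

# Write-through: every store also goes to DRAM.
write_through_dram_writes = stores_to_same_block

# Write-back: the block is written to DRAM once, when the dirty block
# is finally evicted.
write_back_dram_writes = 1
```

The gap (100 DRAM writes vs. 1) is why write-back wins on bus traffic, at the cost of dirty blocks and slower evictions.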

108 Beyond The Cache: Memory

109 Memory System Design Challenges DRAM is designed for density, not speed DRAM is slower than the bus We are allowed to change the width, the number of DRAMs, and the bus protocol, but the access latency stays slow. Widening anything increases the cost by quite a bit.

110 Narrow Configuration CPU Given: w 1 clock cycle request w 15 cycles / word DRAM latency w 1 cycle / word bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? Cache Bus DRAM

111 Narrow Configuration CPU Given: w 1 clock cycle request w 15 cycles / word DRAM latency w 1 cycle / word bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? 1 cycle + 15 cycles/word * 8 words + 1 cycle/word * 8 words = 129 cycles Cache Bus DRAM

112 Wide Configuration CPU Given: w 1 clock cycle request w 15 cycles / 2 words DRAM latency w 1 cycle / 2 words bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? Cache Bus DRAM

113 Wide Configuration CPU Given: w 1 clock cycle request w 15 cycles / 2 words DRAM latency w 1 cycle / 2 words bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? 1 cycle + 15 cycles/2 words * 8 words + 1 cycle/2 words * 8 words = 65 cycles Cache Bus DRAM

114 Interleaved Configuration CPU Byte 0 in DRAM 0, byte 1 in DRAM 1, Byte 2 in DRAM 0,... Given: w 1 clock cycle request w 15 cycles / word DRAM latency w 1 cycle / word bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? Cache Bus DRAM DRAM

115 Interleaved Configuration CPU Given: w 1 clock cycle request w 15 cycles / word DRAM latency w 1 cycle / word bus latency If a cache block is 8 words, what is the miss penalty of an L2 cache miss? 1 cycle + 15 cycles / 2 words * 8 words + 1 cycle / word * 8 words = 69 cycles Cache Bus DRAM DRAM
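The three configurations' miss penalties, reproduced side by side as a sketch (1-cycle request, 15-cycle DRAM access, 1-cycle-per-word bus, 8-word block, as given on the slides):

```python
words = 8  # cache block size in words

# Narrow: one word at a time through DRAM and the bus.
narrow = 1 + 15 * words + 1 * words                # 129 cycles

# Wide: DRAM and bus both move two words at a time.
wide = 1 + 15 * (words // 2) + 1 * (words // 2)    # 65 cycles

# Interleaved: two DRAM banks overlap their 15-cycle accesses,
# but the bus still carries one word per cycle.
interleaved = 1 + 15 * (words // 2) + 1 * words    # 69 cycles
```

Interleaving buys most of the wide configuration's DRAM overlap without paying for a wider (and costlier) bus.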

116 DRAM Optimizations Fast page mode w Allow repeated accesses to row buffer without another row access time Synchronous DRAM w Add clock signal to DRAM interface to make synchronous w Programmable register holds number of bytes to transfer over many cycles Double Data Rate (DDR) w Transfer data on rising and falling clock edges instead of just one.

117 DRAM Optimizations Make DRAM chip act like a memory system Each chip has interleaved memory and a high speed interface RDRAM w Switch RAS/CAS lines to a bus that allows multiple accesses to be in flight simultaneously You don't have to wait for one DRAM request to finish before sending another request Direct RDRAM w Don't multiplex over one bus. Have 3: Data Row Column


More information

10/19/17. You Are Here! Review: Direct-Mapped Cache. Typical Memory Hierarchy

10/19/17. You Are Here! Review: Direct-Mapped Cache. Typical Memory Hierarchy CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H Katz http://insteecsberkeleyedu/~cs6c/ Parallel Requests Assigned to computer eg, Search

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now? cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 22: Introduction to Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Caches hold a subset of data from the main

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12a Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

CpE 442. Memory System

CpE 442. Memory System CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12 Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/19/17 Fall 2017 - Lecture #16 1 Parallel

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. 13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third

More information

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:

More information

Caches Part 1. Instructor: Sören Schwertfeger. School of Information Science and Technology SIST

Caches Part 1. Instructor: Sören Schwertfeger.   School of Information Science and Technology SIST CS 110 Computer Architecture Caches Part 1 Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Memory Hierarchies &

Memory Hierarchies & Memory Hierarchies & Cache Memory CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 4/26/2009 cse410-13-cache 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

CS152 Computer Architecture and Engineering Lecture 16: Memory System

CS152 Computer Architecture and Engineering Lecture 16: Memory System CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson

More information

ECE468 Computer Organization and Architecture. Memory Hierarchy

ECE468 Computer Organization and Architecture. Memory Hierarchy ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap

More information

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance

More information

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: Bernhard Boser & Randy H Katz http://insteecsberkeleyedu/~cs61c/ 10/18/16 Fall 2016 - Lecture #15 1 Outline

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

CPE 631 Lecture 04: CPU Caches

CPE 631 Lecture 04: CPU Caches Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR

More information

Cycle Time for Non-pipelined & Pipelined processors

Cycle Time for Non-pipelined & Pipelined processors Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all

More information

Caching Basics. Memory Hierarchies

Caching Basics. Memory Hierarchies Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site: Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

Lecture 11 Cache. Peng Liu.

Lecture 11 Cache. Peng Liu. Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff Lecture 20: ory Hierarchy Main ory and Enhancing its Performance Professor Alvin R. Lebeck Computer Science 220 Fall 1999 HW #4 Due November 12 Projects Finish reading Chapter 5 Grinch-Like Stuff CPS 220

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Lecture 18: DRAM Technologies

Lecture 18: DRAM Technologies Lecture 18: DRAM Technologies Last Time: Cache and Virtual Memory Review Today DRAM organization or, why is DRAM so slow??? Lecture 18 1 Main Memory = DRAM Lecture 18 2 Basic DRAM Architecture Lecture

More information

Review : Pipelining. Memory Hierarchy

Review : Pipelining. Memory Hierarchy CS61C L11 Caches (1) CS61CL : Machine Structures Review : Pipelining The Big Picture Lecture #11 Caches 2009-07-29 Jeremy Huddleston!! Pipeline challenge is hazards "! Forwarding helps w/many data hazards

More information

Memory Hierarchy: Caches, Virtual Memory

Memory Hierarchy: Caches, Virtual Memory Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories

More information

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Datapath Today s Topics: technologies Technology trends Impact on performance Hierarchy The principle of locality

More information

Chapter 6 Objectives

Chapter 6 Objectives Chapter 6 Memory Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University Computer Architecture Memory Hierarchy Lynn Choi Korea University Memory Hierarchy Motivated by Principles of Locality Speed vs. Size vs. Cost tradeoff Locality principle Temporal Locality: reference to

More information

Caches Concepts Review

Caches Concepts Review Caches Concepts Review What is a block address? Why not bring just what is needed by the processor? What is a set associative cache? Write-through? Write-back? Then we ll see: Block allocation policy on

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

CENG4480 Lecture 09: Memory 1

CENG4480 Lecture 09: Memory 1 CENG4480 Lecture 09: Memory 1 Bei Yu byu@cse.cuhk.edu.hk (Latest update: November 8, 2017) Fall 2017 1 / 37 Overview Introduction Memory Principle Random Access Memory (RAM) Non-Volatile Memory Conclusion

More information

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality Administrative EEC 7 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter

More information

5. Memory Hierarchy Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

5. Memory Hierarchy Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 5. Memory Hierarchy Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Movie Rental Store You have a huge warehouse with every movie ever made.

More information