Lecture 11 Cache and Virtual Memory
1 Lecture 11 Cache and Virtual Memory Peng Liu 1
2 Associative Cache Example 2
3 Tag & Index with Set-Associative Caches Assume a 2^n-byte cache with 2^m-byte blocks that is 2^a-way set-associative. Which bits of the address are the tag or the index? The m least significant bits are the byte select within the block. Basic idea: the cache contains 2^n / 2^m = 2^(n-m) blocks; each cache way contains 2^(n-m) / 2^a = 2^(n-m-a) blocks. Cache index: the (n-m-a) bits after the byte select; the same index is used with all cache ways. Observation: for a fixed size, the length of the tags increases with the associativity, so associative caches incur more overhead for tags. 3
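The bit arithmetic above can be checked with a short Python sketch (the function name and the 32-bit address width are illustrative assumptions, not from the slides):

```python
def address_bits(n, m, a, addr_width=32):
    """Field widths for a 2^n-byte, 2^a-way set-associative cache
    with 2^m-byte blocks and addr_width-bit addresses."""
    offset = m                         # byte select within the block
    index = n - m - a                  # same index used by all 2^a ways
    tag = addr_width - index - offset  # tag grows as associativity grows
    return tag, index, offset

# 32KB cache (n=15), 64B blocks (m=6), 4-way (a=2): 2^7 = 128 sets
print(address_bits(15, 6, 2))   # (19, 7, 6)
# Same cache direct-mapped (a=0) needs a shorter tag:
print(address_bits(15, 6, 0))   # (17, 9, 6)
```

Note how the tag width grows from 17 to 19 bits as associativity goes from direct-mapped to 4-way, matching the observation on the slide.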
4 Placement Policy (diagram: memory block 12 mapped into an 8-block cache) Block 12 can be placed: fully associative: anywhere; 2-way set associative: anywhere in set 0 (12 mod 4); direct mapped: only into block 4 (12 mod 8). 4
5 Direct-Mapped Cache (diagram: address split into tag (t bits), index (k bits), and block offset (b bits); 2^k lines of valid bit + tag + data block; one tag comparison drives HIT and selects the data word or byte) 5
6 2-Way Set-Associative Cache (diagram: two ways, each with valid bit, tag, and data block; both ways' tags are compared in parallel against the address tag; a mux selects the data word or byte on a HIT) 6
7 Fully Associative Cache (diagram: every line's tag is compared in parallel against the address tag; the block offset selects the data word or byte on a HIT) 7
8 Replacement Methods Which line do you replace on a miss? Direct mapped: easy, you have only one choice; replace the line at the index you need. N-way set associative: need to choose which way to replace. Random (choose one at random) or Least Recently Used (LRU, the one used least recently). True LRU is often difficult to compute, so people use approximations that really track "not recently used" rather than least recently used. 8
9 Replacement Policy In an associative cache, which block from a set should be evicted when the set becomes full? Random. Least Recently Used (LRU): the LRU cache state must be updated on every access; a true implementation is only feasible for small sets (2-way), so a pseudo-LRU binary tree is often used for 4-8 way. First In, First Out (FIFO), a.k.a. Round-Robin: used in highly associative caches. Not Least Recently Used (NLRU): FIFO with an exception for the most recently used block or blocks. This is a second-order effect. Why? Replacement only happens on misses. 9
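True LRU for one set can be modeled directly; this toy sketch (the function name and access trace are made up) shows why the state must be updated on every access, which is what makes true LRU hardware expensive beyond 2 ways:

```python
def simulate_set(ways, accesses):
    """Count hits in one cache set under true-LRU replacement."""
    lru = []          # front = least recently used, back = most recent
    hits = 0
    for tag in accesses:
        if tag in lru:
            hits += 1
            lru.remove(tag)       # LRU state updated on *every* access
        elif len(lru) == ways:
            lru.pop(0)            # evict the least recently used line
        lru.append(tag)           # accessed line becomes most recent
    return hits

print(simulate_set(2, ["A", "B", "A", "C", "B"]))  # 1 hit
print(simulate_set(2, ["A", "B", "A", "B"]))       # 2 hits
```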
10 Block Size and Spatial Locality A block is the unit of transfer between the cache and memory (diagram: tag plus Word0-Word3, a 4-word block, b=2). Split the CPU address into a block address (32-b bits) and an offset (b bits); 2^b = block size, a.k.a. line size (in bytes). Larger block size has distinct hardware advantages: less tag overhead; exploit fast burst transfers from DRAM; exploit fast burst transfers over wide busses. What are the disadvantages of increasing block size? Fewer blocks => more conflicts. Can waste bandwidth. 10
11 CPU-Cache Interaction (5-stage pipeline) (diagram: the PC feeds the primary instruction cache; IR, decode/register fetch, ALU, and the primary data cache follow; the entire CPU stalls on a data cache miss while the cache refill data arrives from lower levels of the memory hierarchy) 11
12 Improving Cache Performance Average memory access time = Hit time + Miss rate x Miss penalty. To improve performance: reduce the hit time, reduce the miss rate, or reduce the miss penalty. What is the simplest design strategy? The biggest cache that doesn't increase hit time past 1-2 cycles (approx 8-32KB in modern technology). [Design issues are more complex with out-of-order superscalar processors.] 12
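The AMAT formula on the slide is one line of Python; the example numbers below are illustrative, not from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, straight from the slide's formula."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 20-cycle penalty
print(amat(1, 0.05, 20))   # 2.0 cycles on average
```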
13 Serial-versus-Parallel Cache and Memory Access α is the HIT RATIO: the fraction of references that hit in the cache; 1 - α is the MISS RATIO: the remaining references. Serial search (processor -> cache, then main memory on a miss): average access time = t_cache + (1 - α) t_mem. Parallel search (processor -> cache and main memory at once): average access time = α t_cache + (1 - α) t_mem. The savings are usually small: t_mem >> t_cache and the hit ratio α is high. Parallel search requires high bandwidth on the memory path, and the complexity of handling parallel paths can slow t_cache. 13
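The two formulas can be compared numerically; the hit ratio and latencies below are illustrative assumptions:

```python
def serial_avg(alpha, t_cache, t_mem):
    # Check the cache first; go to memory only on a miss.
    return t_cache + (1 - alpha) * t_mem

def parallel_avg(alpha, t_cache, t_mem):
    # Start cache and memory together; the memory result is used on a miss.
    return alpha * t_cache + (1 - alpha) * t_mem

# 95% hit ratio, 1-cycle cache, 100-cycle memory:
print(serial_avg(0.95, 1, 100))    # ~6.0 cycles
print(parallel_avg(0.95, 1, 100))  # ~5.95 cycles: the savings are small
```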
14 Causes for Cache Misses Compulsory: first-reference to a block a.k.a. cold start misses - misses that would occur even with infinite cache Capacity: cache is too small to hold all data needed by the program - misses that would occur even under perfect replacement policy Conflict: misses that occur because of collisions due to block-placement strategy - misses that would not occur with full associativity 14
15 Effect of Cache Parameters on Performance Larger cache size + reduces capacity and conflict misses - hit time will increase Higher associativity + reduces conflict misses - may increase hit time Larger block size + reduces compulsory and capacity (reload) misses - increases conflict misses and miss penalty 15
16 Multilevel Caches A memory cannot be large and fast Increasing sizes of cache at each level CPU L1$ L2$ DRAM Local miss rate = misses in cache / accesses to cache Global miss rate = misses in cache / CPU memory accesses Misses per instruction = misses in cache / number of instructions 16
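The local/global distinction above follows from the fact that only L1 misses ever reach L2; a minimal sketch with made-up miss rates:

```python
def global_l2_miss_rate(l1_local, l2_local):
    """Only L1 misses reach L2, so the global L2 miss rate is the
    product of the local miss rates along the path."""
    return l1_local * l2_local

# Illustrative: 5% local L1 miss rate, 20% local L2 miss rate
rate = global_l2_miss_rate(0.05, 0.20)
print(rate)   # ~0.01: only 1% of all CPU memory accesses miss in L2
```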
17 Multilevel Caches Primary (L1) caches attached to CPU Small, but fast Focusing on hit time rather than hit rate Level-2 cache services misses from primary cache Larger, slower, but still faster than main memory Unified instruction and data Focusing on hit rate rather than hit time Main memory services L2 cache misses Some high-end systems include L3 cache 17
18 A Typical Memory Hierarchy Split instruction & data primary caches (on-chip SRAM) Multiple interleaved memory banks (off-chip DRAM) CPU RF L1 Instruction Cache L1 Data Cache Unified L2 Cache Memory Memory Memory Memory Multiported register file (part of CPU) Large unified secondary cache (on-chip SRAM) 18
19 Itanium-2 On-Chip Caches (Intel/HP, 2002) Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load+2 store), single cycle latency Level 2: 256KB, 4-way s.a, 128B line, quad-port (4 load or 4 store), five cycle latency Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve cycle latency 22
20 What About Writes? Where do we put the data we want to write? In the cache? In main memory? In both? Caches have different policies for this question Most systems store the data in the cache (why?) Some also store the data in memory as well (why?) Interesting observation Processor does not need to wait until the store completes 23
21 Cache Write Policies: Major Options Write-through (write data go to cache and memory) Main memory is updated on each cache write Replacing a cache entry is simple (just overwrite new block) Memory write causes significant delay if pipeline must stall Write-back (write data only goes to the cache) Only the cache entry is updated on each cache write so main memory and cache data are inconsistent Add dirty bit to the cache entry to indicate whether the data in the cache entry must be committed to memory Replacing a cache entry requires writing the data back to memory before replacing the entry if it is dirty 24
22 Write Policy Trade-offs Write-through: misses are simpler and cheaper (no write-back to memory) and it is easier to implement, but it requires buffering to be practical and uses a lot of bandwidth to the next level of memory. Write-back: writes are fast on a hit; multiple writes within a block require only one writeback later; efficient block transfer on write back to memory at eviction. 25
23 Write Policy Choices Cache hit: write through: write both cache & memory generally higher traffic but simplifies cache coherence write back: write cache only (memory is written only when the entry is evicted) a dirty bit per block can further reduce the traffic Cache miss: no write allocate: only write to main memory write allocate (aka fetch on write): fetch into cache Common combinations: write through and no write allocate write back with write allocate 26
24 Write Buffer to Reduce Read Miss Penalty (diagram: CPU, data cache, write buffer, unified L2 cache; the buffer holds evicted dirty lines for a write-back cache OR all writes for a write-through cache) The processor is not stalled on writes, and read misses can go ahead of the write to main memory. Problem: the write buffer may hold the updated value of a location needed by a read miss. Simple scheme: on a read miss, wait for the write buffer to go empty. Faster scheme: check the write buffer addresses against the read miss address; if there is no match, allow the read miss to go ahead of the writes, else return the value in the write buffer. 27
25 Write Buffers for Write-Through Caches (diagram: processor, cache, write buffer, lower level memory) The write buffer holds data awaiting write-through to lower level memory. Q. Why a write buffer? A. So the CPU doesn't stall. Q. Why a buffer, why not just one register? A. Bursts of writes are common. Q. Are Read After Write (RAW) hazards an issue for the write buffer? A. Yes! Drain the buffer before the next read, or check the write buffer addresses. 28
26 Avoiding the Stalls for Write-Through Use write buffer between cache and memory Processor writes data into the cache and the write buffer Memory controller slowly drains buffer to memory Write buffer: a first-in-first-out buffer (FIFO) Typically holds a small number of writes Can absorb small bursts as long as the long term rate of writing to the buffer does not exceed the maximum rate of writing to DRAM 29
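A toy model of that FIFO write buffer (class name and capacity are made up) shows the one rule that matters: the processor stalls only when the buffer is full, and the memory controller drains oldest-first:

```python
from collections import deque

class WriteBuffer:
    """Toy FIFO write buffer between the cache and memory."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()

    def write(self, addr, data):
        """Returns True if accepted, False if the CPU must stall."""
        if len(self.entries) == self.capacity:
            return False              # buffer full: absorb no more bursts
        self.entries.append((addr, data))
        return True

    def drain_one(self):
        """Memory controller retires the oldest write, if any."""
        return self.entries.popleft() if self.entries else None

wb = WriteBuffer(capacity=2)
print(wb.write(0x100, 1), wb.write(0x104, 2))  # True True
print(wb.write(0x108, 3))                      # False: CPU would stall
print(wb.drain_one())                          # (256, 1): oldest retires first
```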
27 Cache Write Policy: Allocation Options What happens on a cache write that misses? It's actually two subquestions. Do you allocate space in the cache for the address? Write-allocate vs. no-write-allocate. Actions: select a cache entry, evict the old contents, update the tags. Do you fetch the rest of the block contents from memory? Of interest if you do write-allocate; remember a store updates up to 1 word from a wider block. Fetch-on-miss vs. no-fetch-on-miss. For no-fetch-on-miss you must remember which words are valid: use fine-grain valid bits in each cache line. 30
28 Typical Choices Write-back caches: write-allocate, fetch-on-miss. Write-through caches: write-allocate, fetch-on-miss; write-allocate, no-fetch-on-miss; or no-write-allocate, write-around. Modern hardware supports multiple policies, selected by the OS at some coarse granularity. 31
29 Write Miss Actions (table: steps for each policy combination) Write-through, write-allocate, fetch-on-miss: pick replacement, fetch block, write cache, write memory. Write-through, write-allocate, no-fetch-on-miss: pick replacement, write partial cache, write memory. Write-through, no-write-allocate (write-around): write memory only; on a cache hit, invalidate the tag (write invalidate). Write-back, write-allocate, fetch-on-miss: pick replacement, write back the dirty victim, fetch block, write cache. Write-back, write-allocate, no-fetch-on-miss: pick replacement, write back the dirty victim, write partial cache. 32
30 Be Careful, Even with Write Hits Reading from a cache: read the tags and data in parallel; if it hits, return the data, else go to the lower level. Writing a cache can take more time: first read the tag to determine hit/miss, then overwrite the data on a hit. Otherwise, you may overwrite dirty data or write the wrong cache way. Can you ever access the tag and write data in parallel? 33
31 Splitting Caches Most processors have separate caches for instructions & data Often noted $I and $D Advantages Extra access port Can customize to specific access patterns Low hit time Disadvantages Capacity utilization Miss rate 34
32 Write Policy: Write-Through vs Write-Back Write-Through: data written to the cache block is also written to lower-level memory; debug is easy; read misses do not produce writes; repeated writes all make it to the lower level. Write-Back: write data only to the cache and update the lower level when a block falls out of the cache; debug is hard; read misses can produce writes (of evicted dirty blocks); repeated writes do not all make it to the lower level. Additional option: let writes to an un-cached address allocate a new cache line ( write-allocate ). 35
33 Cache Design: Datapath + Control Most design errors come from incorrect specification of state machine behavior! Common bugs: stalls, block replacement, write buffer. (diagram: a control state machine drives the tag and data blocks; address and data paths connect the CPU on one side to lower level memory on the other) 36
34 Cache Controller Example cache characteristics: direct-mapped, write-back, write-allocate; block size: 4 words (16 bytes); cache size: 16KB (1024 blocks); 32-bit byte addresses; valid bit and dirty bit per block; blocking cache: the CPU waits until the access is complete. (diagram: address breakdown) 37
35 Signals between the Processor and the Cache 38
36 Finite State Machines Use an FSM to sequence the control steps. A set of states, with a transition on each clock edge. State values are binary encoded, and the current state is stored in a register. Next state = fn(current state, current inputs). Control output signals = fo(current state). 39
37 Cache Controller FSM Idle state: waiting for a valid read or write request from the processor. Compare Tag state: testing for hit or miss. If hit, set Cache Ready after the read or write -> Idle state. If miss, update the cache tag; if dirty -> Write-Back state, else -> Allocate state. Write-Back state: writing the 128-bit block to memory; waiting for the ready signal from memory -> Allocate state. Allocate state: fetching the new block from memory. 40
38 Main Memory Supporting Caches Use DRAMs for main memory Fixed width (e.g., 1 word) Connected by fixed-width clocked bus Bus clock is typically slower than CPU clock Example cache block read 1 bus cycle for address transfer 15 bus cycles per DRAM access 1 bus cycle per data transfer For 4-word block, 1-word-wide DRAM Miss penalty = 1 + 4x15 + 4x1 = 65 bus cycles Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle 41
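The miss-penalty arithmetic on the slide can be reproduced directly (default arguments copy the slide's numbers; the function name is illustrative):

```python
def block_read_cycles(words, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Miss penalty for a cache block read over a 1-word-wide DRAM bus:
    1 address cycle, then one DRAM access and one transfer per word."""
    return addr_cycles + words * dram_cycles + words * xfer_cycles

penalty = block_read_cycles(4)
print(penalty)                     # 65 bus cycles for a 4-word block
print(round(4 * 4 / penalty, 2))  # 0.25 bytes per bus cycle (16B / 65)
```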
39 Increasing Memory Bandwidth 42
40 DRAM Technology 43
41 Why DRAM over SRAM? Density! SRAM cell: large; 6 transistors (nFET and pFET), 3 interface wires, needs Vdd and Gnd. DRAM cell: small; 1 transistor + 1 capacitor (nFET only), 2 interface wires, no Vdd. Density advantage: 3X to 10X, depending on the metric. 44
42 DRAM: Reading, Writing, Refresh Writing DRAM: drive the data on the bit line, select the row. The capacitor holds state for ~60 ms; then a refresh must be done. Reading DRAM: select the row, sense the bit line (~1 million electrons), write the value back. Refresh: a dummy read.
43 Synchronous DRAM (SDRAM) Interface A clocked bus protocol (ex: 100 MHz) Note! This example is best-case! For a random access, DRAM takes many more than 2 cycles! Cache controller puts commands on bus (CAS = Column Address Strobe) Data comes out several cycles later. 46
44 Advanced DRAM Organization Bits in a DRAM are organized as a rectangular array DRAM accesses an entire row Burst mode: supply successive words from a row with reduced latency Double data rate (DDR) DRAM Transfer on rising and falling clock edges Quad data rate (QDR) DRAM Separate DDR inputs and outputs DIMMs: small boards with multiple DRAM chips connected in parallel Functions as a higher capacity, wider interface DRAM chip Easier to manipulate, replace,.. 47
45 Measuring Performance 48
46 Measuring Performance The memory system is important for performance. Cache access time often determines the overall system clock cycle time, since it is often the slowest pipeline stage. Memory stalls are a large contributor to CPI: stalls due to instructions & data, reading & writing; stalls include both cache miss stalls and write buffer stalls. Memory system & performance: CPU Time = (CPU Cycles + Memory Stall Cycles) * Cycle Time. MemStallCycles = Read Stall Cycles + Write Stall Cycles. CPI = CPIpipe + AvgMemStallCycles. CPIpipe = 1 + HazardStallCycles. 49
47 Memory Performance Read stalls are fairly easy to understand: Read Stall Cycles = Reads/Prog * ReadMissRate * ReadMissPenalty. Write stalls depend upon the write policy. Write-through: Write Stalls = (Writes/Prog * WriteMissRate * WriteMissPenalty) + Write Buffer Stalls. Write-back: Write Stalls = (Writes/Prog * WriteMissRate * WriteMissPenalty). The write miss penalty can be complex: it can be partially hidden if the processor can continue executing, and it can include extra time to write back a value we are evicting. 50
48 Worst-Case Simplicity Assume that write and read misses cause the same delay. MemoryStallCycles = MemoryAccesses/Program * MissRate * MissPenalty = Instructions/Program * Misses/Instruction * MissPenalty. In a single-level cache system, MissPenalty = the latency of DRAM. In a multi-level cache system, MissPenalty is the latency of the L2 cache, etc.; calculate by considering MissRateL2, MissPenaltyL2, etc. Watch out: global vs local miss rate for L2. 51
49 Simple Cache Performance Example Consider the following Miss rate for instruction access is 5% Miss rate for data access is 8% Data references per instruction are 0.4 CPI with perfect cache is 2 Read and write miss penalty is 20 cycles Including possible write buffer stalls What is the performance of this machine relative to one without misses? Always start by considering execution times (IC*CPI*CCT) But IC and CCT are the same here, so focus on CPI 52
50 Performance Solution Find the CPI for the base system without misses: CPI no misses = CPIperfect = 2. Find the CPI for the system with misses: Misses/inst = I Cache Misses + D Cache Misses = 0.05 + (0.08*0.4) = 0.082. Memory Stall Cycles = Misses/Inst * MissPenalty = 0.082*20 = 1.64 cycles/inst. CPI with misses = CPIperfect + Memory Stall Cycles = 2 + 1.64 = 3.64. Compare the performance: n = Performance no misses / Performance with misses = CPI with misses / CPI no misses = 3.64 / 2 = 1.82. 53
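The solution's arithmetic can be checked in a few lines (all numbers copied from the example):

```python
base_cpi = 2.0
misses_per_inst = 0.05 + 0.08 * 0.4           # I-cache + D-cache misses
stall_cycles = misses_per_inst * 20           # 20-cycle miss penalty
cpi = base_cpi + stall_cycles

print(round(misses_per_inst, 3))  # 0.082 misses/inst
print(round(stall_cycles, 2))     # 1.64 cycles/inst
print(round(cpi, 2))              # 3.64
print(round(cpi / base_cpi, 2))   # 1.82x slower than the perfect cache
```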
51 Another Cache Problem Given the following data: base CPI of 1.5; 1 instruction reference per instruction fetch; 0.27 loads/instruction; 0.13 stores/instruction. A 64KB set-associative cache with a 4-word block size has a miss rate of 1.7%. Memory access time = 4 cycles + #words/block. Suppose the cache uses a write-through, write-around write strategy without a write buffer. How much faster would the machine be with a perfect write buffer? CPUtime = Instruction Count * (CPIbase + CPImemory) * ClockCycleTime. Performance is proportional to CPI = CPIbase + CPImemory. 54
52 No Write Buffer (diagram: CPU, cache, lower level memory) CPI memory = reads/inst. * miss rate * read miss penalty + writes/inst. * write penalty. Read miss penalty = 4 cycles + 4 words * 1 cycle/word = 8 cycles. Write penalty = 4 cycles + 1 word * 1 cycle/word = 5 cycles. CPI memory = (1 ifetch + 0.27 ld)/inst. * 0.017 * 8 cycles + (0.13 st)/inst. * 5 cycles = 0.17 cycles/inst. + 0.65 cycles/inst. = 0.82 cycles/inst. CPI overall = 1.5 cycles/inst. + 0.82 cycles/inst. = 2.32 cycles/inst. 55
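The no-write-buffer case reduces to a short calculation (inputs copied from the problem statement):

```python
reads_per_inst = 1.0 + 0.27      # 1 instruction fetch + 0.27 loads
stores_per_inst = 0.13
miss_rate = 0.017
read_miss_penalty = 4 + 4 * 1    # 4 cycles + 4 words * 1 cycle/word = 8
write_penalty = 4 + 1 * 1        # every store goes to memory: 5 cycles

cpi_memory = (reads_per_inst * miss_rate * read_miss_penalty
              + stores_per_inst * write_penalty)
print(round(cpi_memory, 2))        # 0.82 cycles/inst
print(round(1.5 + cpi_memory, 2))  # 2.32 overall CPI
```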
53 Perfect Write Buffer (diagram: CPU, cache, write buffer, lower level memory) CPI memory = reads/inst. * miss rate * 8 cycle read miss penalty + writes/inst. * (1 - miss rate) * 1 cycle hit penalty. A hit penalty is required because on hits we must access the cache tags during the MEM cycle to determine a hit, then stall the processor for a cycle to update the hit cache block. CPI memory = 0.17 cycles/inst. + (0.13 st)/inst. * (1 - 0.017) * 1 cycle = 0.17 cycles/inst. + 0.13 cycles/inst. = 0.30 cycles/inst. CPI overall = 1.5 cycles/inst. + 0.30 cycles/inst. = 1.80 cycles/inst. 56
54 Perfect Write Buffer + Cache Write Buffer (diagram: CPU, cache with a one-entry cache write buffer (CWB), write buffer, lower level memory) CPI memory = reads/inst. * miss rate * 8 cycle read miss penalty. Avoid a hit penalty on writes by adding a one-entry write buffer to the cache itself and writing the last store hit to the data array during the next store's MEM cycle. Hazard: on loads, the CWB must be checked along with the cache! CPI memory = 0.17 cycles/inst. CPI overall = 1.5 cycles/inst. + 0.17 cycles/inst. = 1.67 cycles/inst. 57
55 Virtual Memory 58
56 Motivation #1: Large Address Space for Each Executing Program Each program thinks it has a ~2^32-byte address space of its own, though it may not use it all. Available main memory may be much smaller. (diagram of the address space: kernel (OS) memory (code, data, heap, stack) from 0xFFFF_FFFF down to 0x8000_0000, invisible to user code; the user stack (created at runtime, $sp the stack pointer) growing down; the run-time heap (managed by malloc) growing up from brk; the read/write segment (.data, .bss) at 0x1000_0000; the read-only segment (.text) at 0x0040_0000, loaded from the executable file; unused space at 0x0000_0000) 59
57 Motivation #2: Memory Management for Multiple Programs At any point in time, a computer may be running multiple programs, e.g., Firefox + Thunderbird. Questions: How do we share memory between multiple programs? How do we avoid address conflicts? How do we protect programs? Isolation and selective sharing. 60
58 Virtual Memory in a Nutshell Use the hard disk (or Flash) as a large storage for the data of all programs. Main memory (DRAM) is a cache for the disk, managed jointly by hardware and the operating system (OS). Each running program has its own virtual address space (the address space shown in the previous figure), protected from other programs. Frequently-used portions of the virtual address space are copied to DRAM; DRAM = physical address space. Hardware + OS translate virtual addresses (VA) used by the program to physical addresses (PA) used by the hardware. Translation enables relocation (DRAM <-> disk) & protection. 61
59 Reminder: Memory Hierarchy Everything is a Cache for Something Else (table: access time / capacity / managed by) Registers: 1 cycle, ~500B, software/compiler. L1 cache: 1-3 cycles, ~64KB, hardware. L2 cache: 5-10 cycles, 1-10MB, hardware. Main memory: ~100 cycles, ~10GB, software/OS. Disk: ~100GB, software/OS (access time measured in millions of cycles). 62
60 DRAM vs. SRAM as a Cache DRAM vs. disk is more extreme than SRAM vs. DRAM. Access latencies: DRAM is ~10X slower than SRAM; disk is ~100,000X slower than DRAM. Importance of exploiting spatial locality: the first byte is ~100,000X slower than successive bytes on disk, vs. a ~4X improvement for page-mode vs. regular accesses to DRAM. 63
61 Impact of These Properties on Design Bottom line: design decisions made for virtual memory are driven by the enormous cost of misses (disk accesses). Consider the following parameters for DRAM as a cache for the disk. Line size? Large, since the disk is better at transferring large blocks and this minimizes the miss rate. Associativity? High, to minimize the miss rate. Write-through or write-back? Write-back, since we can't afford to perform small writes to disk. 64
62 Terminology for Virtual Memory Virtual memory uses DRAM as a cache for the disk. New terms: a VM block is called a page, the unit of data moving between disk and DRAM; it is typically larger than a cache block (e.g., 4KB or 16KB). Virtual and physical address spaces can be divided up into virtual pages and physical pages (e.g., contiguous chunks of 4KB). A VM miss is called a page fault; more on this later. 65
63 Locating an Object in a Cache SRAM Cache (L1,L2, etc) Tag stored with cache line Maps from cache block to a memory address No tag for blocks not in cache If not in cache, then it is in main memory Hardware retrieves and manages tag information Can quickly match against multiple tags 66
64 Locating an Object in a Cache (cont.) DRAM cache (virtual memory): each allocated page of virtual memory has an entry in the page table, a mapping from virtual pages to physical pages, with one entry per page in the virtual address space. A page table entry exists even if the page is not in memory; it then specifies the disk address. The OS retrieves and manages the page table information. (diagram: page table mapping object names to locations in the cache/DRAM or on disk) 67
65 A System with Physical Memory Only Examples: Most Cray machines, early PCs, nearly all embedded systems, etc Addresses generated by the CPU point directly to bytes in physical memory 68
66 A System with Virtual Memory Examples: workstations, servers, modern PCs, etc. (diagram: CPU virtual addresses 0..N-1 mapped through the page table to physical memory addresses 0..P-1 or to disk) Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table). 69
67 Page Faults (Similar to Cache Misses) What if an object is on disk rather than in memory? The page table entry indicates the virtual address is not in memory, and an OS exception handler is invoked to move the data from disk into memory. The OS has full control over placement: full associativity to minimize future misses. (diagram: page table before and after the fault) 70
68 Does VM Satisfy Original Motivations? Multiple active programs can share physical address space Address conflicts are resolved All programs think their code is at 0x Data from different programs can be protected Programs can share data or code when desired 71
69 Answer: Yes, Using Separate Address Spaces Per Program Each program has its own virtual address space and its own page table. Addresses 0x from different programs can map to different locations or to the same location as desired. The OS controls how virtual pages are assigned to physical memory. 72
70 Translation: High-level View Fixed-size pages; a physical page is sometimes called a frame. 73
71 Translation: Process 74
72 Translation Process Explained Valid page: check the access rights (R,W,X) against the access type; generate the physical address if allowed; generate a protection fault (exception) if the access is illegal. Invalid page: the page is not currently mapped and a page fault is generated. Faults are handled by the operating system. Sometimes a fault is due to a program error (e.g., accessing out of the bounds of an array) => the program is terminated. Sometimes it is due to caching => refill & restart: the desired data or code is available on disk, space is allocated in DRAM, the page is copied from disk, and the page table is updated; replacement may be needed. 75
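The steps above can be sketched as a toy single-level translation (the dictionary-based page table, field names, and example mapping are all illustrative assumptions):

```python
PAGE_SIZE = 4096   # 4KB pages => 12-bit page offset

def translate(page_table, vaddr, access):
    """Translate one virtual address; returns a (status, value) pair."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table.get(vpn)
    if pte is None or not pte["valid"]:
        return ("page fault", vpn)          # OS refills from disk, restarts
    if access not in pte["rights"]:
        return ("protection fault", vpn)    # illegal access => exception
    return ("ok", pte["frame"] * PAGE_SIZE + offset)

pt = {2: {"valid": True, "frame": 7, "rights": {"R", "W"}}}
print(translate(pt, 2 * PAGE_SIZE + 0x10, "R"))  # ('ok', 28688)
print(translate(pt, 5 * PAGE_SIZE, "R"))         # ('page fault', 5)
print(translate(pt, 2 * PAGE_SIZE, "X"))         # ('protection fault', 2)
```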
73 VM: Replacement and Writes To reduce page fault rate, OS uses least-recently used (LRU) replacement Reference bit (aka use bit) in PTE set to 1 on access to page Periodically cleared to 0 by OS A page with reference bit = 0 has not been used recently Disk writes take millions of cycles Block at once, not individual locations Write through is impractical Use write-back Dirty bit in PTE set when page is written 76
74 VM: Issues with Unaligned Accesses Memory access might be aligned or unaligned What happens if unaligned address access straddles a page boundary? What if one page is present and the other is not? Or, what if neither is present? MIPS architecture disallows unaligned memory access Interesting legacy problem on 80x86 which does support unaligned access 77
75 Fast Translation Using a TLB Address translation would appear to require extra memory references: one to access the PTE, then the actual memory access. But access to page tables has good locality, so use a fast hardware cache of PTEs within the processor, called a Translation Look-aside Buffer (TLB). Typical: 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, 0.01%-1% miss rate. Misses can be handled by hardware or software. 78
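A toy fully associative TLB with LRU replacement (class name and sizes are illustrative) captures the essential behavior: a hit returns the frame quickly, a miss means walking the page table and refilling:

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB with LRU replacement."""
    def __init__(self, entries=64):
        self.entries = entries
        self.map = OrderedDict()          # vpn -> frame, kept in LRU order

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)     # refresh LRU position on a hit
            return self.map[vpn]
        return None                       # miss: walk the page table

    def refill(self, vpn, frame):
        if len(self.map) == self.entries:
            self.map.popitem(last=False)  # evict the least recently used
        self.map[vpn] = frame

tlb = TLB(entries=2)
tlb.refill(1, 10); tlb.refill(2, 20)
print(tlb.lookup(1))   # 10 (hit)
tlb.refill(3, 30)      # evicts vpn 2, the least recently used entry
print(tlb.lookup(2))   # None (miss)
```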
76 Fast Translation Using a TLB 79
77 TLB Entries The TLB is a cache for page table entries (PTEs). The data for a TLB entry (== a PTE entry): physical page number (frame #), access rights (R/W bits), and any other PTE information (dirty bit, LRU info, etc). The tags for a TLB entry: the virtual page number (the portion of it not used for indexing into the TLB), a valid bit, and LRU bits (if the TLB is associative and LRU replacement is used). 80
78 TLB Case Study: MIPS R2000/R3000 Consider the MIPS R2000/R3000 processors Addresses are 32 bits with 4KB pages (12 bit offset) TLB has 64 entries, fully associative Each entry is 64 bits wide 81
79 TLB Misses If the page is in memory: load the PTE from memory and retry. This could be handled in hardware, though it can get complex for more complicated page table structures, or in software, by raising a special exception with an optimized handler (this is what MIPS does, using a special vectored interrupt). If the page is not in memory (page fault): the OS handles fetching the page and updating the page table, then restarts the faulting instruction. 82
80 TLB & Memory Hierarchies Once address is translated, it used to access memory hierarchy A hierarchy of caches (L1, L2, etc) 83
81 TLB and Cache Interaction Basic process: use the TLB to translate the VA into a PA, then use the PA to access the caches and DRAM. Question: can you ever access the TLB and the cache in parallel? 84
82 TLB Caveats What happens to the TLB when switching between programs? The OS must flush the entries in the TLB, causing a large number of TLB misses after every switch. Alternatively, use PIDs (process IDs) in each TLB entry: this allows entries from multiple programs to co-exist, with gradual replacement. Limited reach: a 64-entry TLB with 8KB pages maps only 0.5MB, smaller than many L2 caches in most systems, so the TLB miss rate can exceed the L2 miss rate! Potential solutions: multilevel TLBs (just like multi-level caches) or larger pages. 85
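The reach arithmetic on the slide is simply entries times page size; the 4MB-page figure below is an illustrative what-if, not from the lecture:

```python
def tlb_reach_bytes(entries, page_bytes):
    """Memory mapped by a fully populated TLB: entries x page size."""
    return entries * page_bytes

print(tlb_reach_bytes(64, 8 * 1024) // 1024, "KB")   # 512 KB, as on the slide
# Larger pages are one fix: the same 64 entries with 4MB pages reach 256 MB
print(tlb_reach_bytes(64, 4 * 1024 * 1024) // (1024 * 1024), "MB")
```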
83 Larger Pages Advantages Page Size Tradeoff Smaller page tables Fewer page faults and more efficient transfer with larger applications Improved TLB coverage Disadvantages Higher internal fragmentation Smaller Pages Advantages Improved time to start up small processes with fewer pages Internal fragmentation is low (important for small programs) Disadvantages High overhead in large page tables General trend toward larger pages 1978:512B, 1984:4KB, 1990:16KB, 2000:64KB 86
84 Multiple Page Sizes Many machines support multiple page sizes SPARC: 8KB, 64KB, 1MB, 4MB MIPS R4000: 4KB -16MB Page size dependent upon application OS kernel used large pages User applications use smaller pages Issues Software complexity TLB complexity 87
85 Virtual Memory Summary Use hard disk ( or Flash) as large storage for data of all programs Main memory (DRAM) is a cache for the disk Managed jointly by hardware and the operating system (OS) Each running program has its own virtual address space Address space as shown in previous figure Protected from other programs Frequently-used portions of virtual address space copied to DRAM DRAM = physical address space Hardware + OS translate virtual addresses (VA) used by program to physical addresses (PA) used by the hardware Translation enables relocation & protection 88
86 Acknowledgements These slides contain material from courses: UCB CS152 Stanford EE108B 89
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationEE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Christos Kozyrakis Stanford University
EE108B Lecture 13 Caches Wrap Up Processes, Interrupts, and Exceptions Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements Lab3 and PA2.1 are due ton 2/27 Don t forget
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationAnnouncements. EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Measuring Performance. Memory Performance
Announcements EE18B Lecture 13 Caches Wrap Up Processes, Interrupts, and Exceptions Lab3 and PA2.1 are due ton 2/27 Don t forget to study the textbook Don t forget the review sessions Christos Kozyrakis
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More information5. Memory Hierarchy Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
5. Memory Hierarchy Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Movie Rental Store You have a huge warehouse with every movie ever made.
More informationChapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST
Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationMemory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB
Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar
More informationvirtual memory Page 1 CSE 361S Disk Disk
CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple
More informationComputer Architecture. Memory Hierarchy. Lynn Choi Korea University
Computer Architecture Memory Hierarchy Lynn Choi Korea University Memory Hierarchy Motivated by Principles of Locality Speed vs. Size vs. Cost tradeoff Locality principle Temporal Locality: reference to
More informationImproving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches
Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging 6.823, L8--1 Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Highly-Associative
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 29: an Introduction to Virtual Memory Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Virtual memory used to protect applications
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationV. Primary & Secondary Memory!
V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationVirtual Memory. Motivations for VM Address translation Accelerating translation with TLBs
Virtual Memory Today Motivations for VM Address translation Accelerating translation with TLBs Fabián Chris E. Bustamante, Riesbeck, Fall Spring 2007 2007 A system with physical memory only Addresses generated
More informationMemory Technologies. Technology Trends
. 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Different Storage Memories Chapter 5 Large and Fast: Exploiting Memory
More informationComputer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James
Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationThe levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms
The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More informationHY225 Lecture 12: DRAM and Virtual Memory
HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationTopic 18: Virtual Memory
Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationThe Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):
The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:
More informationvirtual memory. March 23, Levels in Memory Hierarchy. DRAM vs. SRAM as a Cache. Page 1. Motivation #1: DRAM a Cache for Disk
5-23 March 23, 2 Topics Motivations for VM Address translation Accelerating address translation with TLBs Pentium II/III system Motivation #: DRAM a Cache for The full address space is quite large: 32-bit
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle
More informationTopic 18 (updated): Virtual Memory
Topic 18 (updated): Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy. Jiang Jiang
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Jiang Jiang jiangjiang@ic.sjtu.edu.cn [Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, MK] Chapter 5 Large
More informationViews of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)
CS6290 Memory Views of Memory Real machines have limited amounts of memory 640KB? A few GB? (This laptop = 2GB) Programmer doesn t want to be bothered Do you think, oh, this computer only has 128MB so
More informationVirtual Memory. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Virtual Memory Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Precise Definition of Virtual Memory Virtual memory is a mechanism for translating logical
More informationCS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches
CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCS/ECE 3330 Computer Architecture. Chapter 5 Memory
CS/ECE 3330 Computer Architecture Chapter 5 Memory Last Chapter n Focused exclusively on processor itself n Made a lot of simplifying assumptions IF ID EX MEM WB n Reality: The Memory Wall 10 6 Relative
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third
More informationVirtual Memory Oct. 29, 2002
5-23 The course that gives CMU its Zip! Virtual Memory Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs class9.ppt Motivations for Virtual Memory Use Physical
More informationMemory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy
ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses
More informationMotivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk
class8.ppt 5-23 The course that gives CMU its Zip! Virtual Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy The course has 4 modules Module 1 Instruction Set Architecture (ISA) Simple Pipelining and Hazards Module 2 Superscalar Architectures
More informationCENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationCPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner
CPS104 Computer Organization and Programming Lecture 16: Virtual Memory Robert Wagner cps 104 VM.1 RW Fall 2000 Outline of Today s Lecture Virtual Memory. Paged virtual memory. Virtual to Physical translation:
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance
More informationBackground. Memory Hierarchies. Register File. Background. Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory.
Memory Hierarchies Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory Mem Element Background Size Speed Price Register small 1-5ns high?? SRAM medium 5-25ns $100-250 DRAM large
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationAgenda. EE 260: Introduction to Digital Design Memory. Naive Register File. Agenda. Memory Arrays: SRAM. Memory Arrays: Register File
EE 260: Introduction to Digital Design Technology Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa 2 Technology Naive Register File Write Read clk Decoder Read Write 3 4 Arrays:
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy. Part II Virtual Memory
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Part II Virtual Memory Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system
More informationComputer Systems. Virtual Memory. Han, Hwansoo
Computer Systems Virtual Memory Han, Hwansoo A System Using Physical Addressing CPU Physical address (PA) 4 Main memory : : 2: 3: 4: 5: 6: 7: 8:... M-: Data word Used in simple systems like embedded microcontrollers
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationChapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)
Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)
More information14 May 2012 Virtual Memory. Definition: A process is an instance of a running program
Virtual Memory (VM) Overview and motivation VM as tool for caching VM as tool for memory management VM as tool for memory protection Address translation 4 May 22 Virtual Memory Processes Definition: A
More informationMemory Hierarchy Requirements. Three Advantages of Virtual Memory
CS61C L12 Virtual (1) CS61CL : Machine Structures Lecture #12 Virtual 2009-08-03 Jeremy Huddleston Review!! Cache design choices: "! Size of cache: speed v. capacity "! size (i.e., cache aspect ratio)
More informationCS152 Computer Architecture and Engineering. Lecture 15 Virtual Memory Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.
CS152 Computer Architecture and Engineering Lecture 15 Virtual Memory 2004-10-21 Dave Patterson (www.cs.berkeley.edu/~patterson) John Lazzaro (www.cs.berkeley.edu/~lazzaro) www-inst.eecs.berkeley.edu/~cs152/
More informationVirtual Memory - Objectives
ECE232: Hardware Organization and Design Part 16: Virtual Memory Chapter 7 http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy Virtual Memory - Objectives
More informationVirtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili
Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed
More informationThe Memory Hierarchy & Cache
Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory
More informationChapter 8. Virtual Memory
Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:
More informationCPS 104 Computer Organization and Programming Lecture 20: Virtual Memory
CPS 104 Computer Organization and Programming Lecture 20: Virtual Nov. 10, 1999 Dietolf (Dee) Ramm http://www.cs.duke.edu/~dr/cps104.html CPS 104 Lecture 20.1 Outline of Today s Lecture O Virtual. 6 Paged
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationRandom-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics
Systemprogrammering 27 Föreläsning 4 Topics The memory hierarchy Motivations for VM Address translation Accelerating translation with TLBs Random-Access (RAM) Key features RAM is packaged as a chip. Basic
More information