Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
- Meryl Lindsey
- 5 years ago
Transcription
1 Reducing Hit Times
- Hit time has a critical influence on cycle time or CPI
- Keep L1 small and simple
  - small is always faster and can be put on chip
  - an interesting compromise is to keep the tags on chip and the block data off chip
    - this provides a quick tag check and also a larger cache capacity
  - simple (by using direct-mapped caches) means shorter hit time
    - note that direct-mapped caches may have a higher miss rate than associative caches
    - but concurrent read and tag check makes the hit time (clock cycle) shorter
- Avoid VM address -> PM address translation
  - use virtually addressed caches, accessed by VM addresses
  - address translation proceeds in parallel with the cache search
    - if the translation indicates that the VM address is not mapped to a PM address, then void the cache search result
    - if the translation indicates a protection violation, then raise an exception
    - if there is a cache miss, then bring the cache block in using the PM address
Chapter 5 page 35
2 Problems with Virtual Caches
- A task switch causes the same VM address to refer to different PM addresses
  - hence the cache must be flushed, creating a large task-switch overhead
  - to avoid flushing, we need OS-generated tags to identify processes
  - the cache tag field then includes a PID, which may increase hit time
- VM alias problems
  - OS and user code have different VM addresses that map to the same PM address, which results in 2 copies in the cache
  - need hardware anti-aliasing mechanisms to guarantee a single copy
  - or need a software "page coloring" solution
    - e.g., Sun's UNIX guarantees the last 18 bits of the VM and PM addresses are the same
- Common compromise (DEC Alpha and HP 7200)
  - virtually addressed, physically tagged
3 Page Coloring in Sun's UNIX
- Objective: VM and PM addresses match in the last 18 bits, so no two different VMAs that map to the same PMA can land in different cache blocks
- A direct-mapped cache of 256 KB (2^18 bytes) or smaller can never have duplicated PMAs for blocks
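The page-coloring argument can be checked with a small sketch. The block size, the helper names, and the sample addresses below are illustrative assumptions, not values from the slides; only the 18-bit rule and the 256 KB direct-mapped limit come from the text.

```python
COLOR_BITS = 18            # low-order bits that page coloring keeps equal in VMA and PMA
CACHE_BYTES = 256 * 1024   # direct-mapped cache of 2**18 bytes
BLOCK_BYTES = 32           # assumed block size (not given on the slide)


def obeys_coloring(vma: int, pma: int) -> bool:
    """True if the virtual and physical addresses agree in the low 18 bits."""
    mask = (1 << COLOR_BITS) - 1
    return (vma & mask) == (pma & mask)


def block_index(addr: int) -> int:
    """Block selected by addr in a direct-mapped cache of CACHE_BYTES."""
    return (addr % CACHE_BYTES) // BLOCK_BYTES


# Hypothetical physical address and two virtual aliases of it.
pma = 0x01234560
alias_a = 0x40034560
alias_b = 0x7F074560

# Both aliases obey the coloring rule, so they select the same cache block
# and cannot create two copies of the same physical block.
same_block = block_index(alias_a) == block_index(alias_b) == block_index(pma)
```

Because the cache index and block offset together use at most 18 bits, every alias of a physical block falls on the same cache line, which is why no duplicate copy can exist.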
4 Virtually-Addressed, Physically-Tagged Caches
- Use part of the page offset within the VM address to index the cache
  - the page offset of a VM address consists of "index" and "block offset"
  - the index field is used to select a set in the cache
  - the block offset is used to select a word within the cache block
  - cache blocks still contain "physical tags" as in a physical cache
- Advantage: VMA -> PMA translation occurs simultaneously with the cache index
  - once the VM address is translated, the tag field of the translated PM address can be compared with the tags in the cache set
  - for direct-mapped, only one tag needs to be compared
  - for n-way set associative, all n tags in the set are compared
- See Figure 5.27
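A minimal sketch of the address split described above, assuming power-of-two sizes. The function name and the sample sizes (8 KB pages, 32 B blocks, 256 sets) are illustrative assumptions; the constraint that index plus block offset must fit inside the page offset is the one the slide states.

```python
def vipt_fields(vaddr: int, page_bytes: int, block_bytes: int, num_sets: int):
    """Return (set index, block offset) taken only from the page-offset bits.

    The index and block offset must fit inside the page offset, otherwise
    the cache could not be indexed before translation finishes.
    """
    if block_bytes * num_sets > page_bytes:
        raise ValueError("index + block offset do not fit in the page offset")
    page_offset = vaddr % page_bytes          # untranslated low-order bits
    block_offset = page_offset % block_bytes  # selects a byte/word in the block
    index = (page_offset // block_bytes) % num_sets  # selects the cache set
    return index, block_offset


# Example: 8 KB pages, 32 B blocks, 256 sets (an 8 KB direct-mapped cache).
index, block_offset = vipt_fields(0x12345678, 8192, 32, 256)
```

The physical tag compare still waits for translation, but the set has already been read out by then.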
5 Faster Write Hits
- Write hits are slower since the tag check must be performed before the write can proceed
- Hence pipeline the writes (e.g. DEC Alpha)
  - tags and data are split and addressed independently
  - the tag check of the (i+1)th write request occurs in the same cycle as the write of the ith request, in a pipelined way
- Result
  - looks like a write happens on every cycle
  - cycle time can stay short since the real write is spread over several cycles
  - mostly works since the CPU is not dependent on data from a write
6 Cache Improvement Summary
Technique | Miss Rate | Miss Penalty | Hit Time | HW Complexity | Comments
Larger Block Size | win | lose | | easy | trivial engineering effort
Higher Associativity | win | | lose | 1 | associative match isn't free
Victim Caches | win | | | 2 | e.g. HP 7200
Pseudo-Associative | win | | | 2 | Used in L2 of MIPS R10000
HW Prefetch of I and D | win | | | 2 | D fetch hard to do
Compiler controlled prefetch | win | | | 3 | Needs non-blocking cache
Compiler cache scheduling (blocking, merging, loop interchange, ...) | win | | | 0 | Too bad it's hard to do for all applications - loop focus for now
Prioritizing Read Misses | | win | | 1 | write buffer - simple
Subblock placement | | win | | 1 | good at reducing tag overhead
Early restart + critical word first | | win | | 2 | MIPS R10K, IBM 620
Nonblocking Caches | | win | | 3 | MIPS R10K, AXP
Second Level Caches | | big win | | 2 | big additional cost
Small and Simple Caches | lose | | win | 0 | trivial so it's widely used
Avoid VM->PM translation effects | | | win | 2 | Alpha AXP 21064, PA
Pipelining Writes | | | win | 1 | AXP 21064, PA-8000, UltraSPARC
7 Main Memory
- At the low level of the memory hierarchy
- 3 important issues
  - Capacity
    - Bell's law: 1 MB per MIPS needed
    - the real key here is to avoid those costly page faults
  - Latency
    - how long does it take to get the data back
    - by addressing big chunks, like an entire cache block, this can be amortized
    - critical to cache performance when the miss is to main memory
  - Bandwidth
    - affects the time it takes to transfer the block
    - a key issue when DMA service from an I/O device is considered
    - also a key issue with the very large block sizes in lower-level caches
8 Memory Technology
- SRAMs and DRAMs are different
  - DRAM: 1 transistor/bit; SRAM: 4-6 transistors/bit
  - DRAM capacity is 4-8 times that of SRAM at the same feature size
  - SRAM speed is 8-16 times that of DRAM, but its cost is higher by about the same factor
- Main memory today means DRAM
  - Multiplexed address lines: row and then column access
  - 2-dimensional address: rows go to a buffer and the subsequent column address selects a subrow
  - Refresh needed every few milliseconds
- Where are we today (Figure 5.35)
  - 64 Mbit chips
  - RAS access time between 50 and 65 ns
  - CAS access 10 ns
  - cycle time 90 ns (separation between subsequent accesses)
9 Consider Example
- 4 cycles to send the address
- 24 cycles to access a word in the memory unit
- 4 cycles to transmit the data
- Hence if main memory is organized by word, then 32 cycles are spent for every word
- Given a cache block size of 4 words, the miss penalty is 32 * 4 = 128 cycles
- Clearly we need a better organizational model
10 Memory Organization Improvements
- Wider memory
  - Make the width of main memory match the cache block size
  - Easy to do: need 4 words on a miss, so just quadruple the memory bandwidth; following the numbers in the last slide, miss penalty now?
  - Problem is the cost of the wider bus between cache and MM
  - moreover, since the CPU accesses one word at a time, a multiplexer is needed between cache and CPU, which may adversely affect the cache hit time
- Interleaved memory
  - Bus bandwidth is the same but we make it work more often
  - Organize the memory in banks so they read simultaneously and then each deliver one word to the bus in turn; miss penalty now?
- Both are optimized for sequential memory accesses
  - i.e., they capitalize on the spatial locality principle, just like caches do
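The two "miss penalty now?" questions can be worked through with the numbers from the example slide (4 cycles to send the address, 24 to access a word, 4 to transmit, 4-word blocks). This is a sketch under two assumptions: the wide memory moves the whole block in one access and one transfer, and the interleaved memory has at least 4 banks that all start their access after a single address send and then take turns on the one-word bus.

```python
SEND_ADDRESS = 4      # cycles to send the address
ACCESS_WORD = 24      # cycles to access a word in the memory unit
TRANSFER_WORD = 4     # cycles to transmit one bus-width of data
WORDS_PER_BLOCK = 4   # cache block size in words


def one_word_wide() -> int:
    """Every word pays address + access + transfer in sequence."""
    return WORDS_PER_BLOCK * (SEND_ADDRESS + ACCESS_WORD + TRANSFER_WORD)


def block_wide() -> int:
    """Memory and bus are a full block wide: one address, one access, one transfer."""
    return SEND_ADDRESS + ACCESS_WORD + TRANSFER_WORD


def interleaved() -> int:
    """Banks access in parallel; the words then cross the narrow bus one by one."""
    return SEND_ADDRESS + ACCESS_WORD + WORDS_PER_BLOCK * TRANSFER_WORD


penalties = {
    "one-word-wide": one_word_wide(),   # 128 cycles
    "block-wide": block_wide(),         # 32 cycles
    "interleaved": interleaved(),       # 44 cycles
}
```

Interleaving recovers most of the benefit of the wide organization without widening the bus or adding a multiplexer on the cache hit path.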
11 Reducing Bank Access Conflicts
- Like to deliver one word from a bank per cycle
  - Therefore # of banks >= access time per word in cycles, to avoid any access gap
  - Ex: # banks = 8 & access time per word = 10 cycles -- access gap?
- Interleaved memory is ideal for sequential access
  - Bad if data references go to the same bank
  - e.g., accessing array words #0, #128, #256, etc. in a 128-bank IM
- Addressing in power-of-2 interleaved memory
  - bank # = (address) mod (# of banks)
  - offset within a bank = (address) / (# of banks)
- Addressing in prime-number interleaved memory
  - offset within a bank = (address) mod (# of words in a bank)
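The addressing rules above can be tried out in a short sketch. The function names are illustrative. Two points are assumptions rather than statements from the slide: the prime-number scheme is taken to select the bank with the same modulo rule as the power-of-2 scheme, and the access gap is modeled simply as the cycles a bank is still busy when a sequential stream returns to it.

```python
def power_of_two_map(address: int, num_banks: int):
    """Return (bank #, offset within bank) for power-of-2 interleaving."""
    return address % num_banks, address // num_banks


def prime_map(address: int, num_banks: int, words_per_bank: int):
    """Return (bank #, offset within bank) for prime-number interleaving.

    Assumes num_banks is prime and shares no factor with words_per_bank,
    so every address below num_banks * words_per_bank maps to a unique slot.
    """
    return address % num_banks, address % words_per_bank


def access_gap(num_banks: int, access_cycles: int) -> int:
    """Stall cycles each time a sequential stream wraps around to a busy bank."""
    return max(0, access_cycles - num_banks)


stride = (0, 128, 256)
conflicting = [power_of_two_map(a, 128)[0] for a in stride]   # all hit bank 0
spread = [power_of_two_map(a, 127)[0] for a in stride]        # 127 banks: 0, 1, 2
gap = access_gap(num_banks=8, access_cycles=10)               # 2 cycles
slots = {prime_map(a, 3, 8) for a in range(24)}               # 24 distinct slots
```

With 128 banks a stride of 128 words hammers a single bank, while a prime bank count such as 127 spreads the same stride across different banks.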
12 Addressing in Power-of-2 Interleaved Memory
- See Figure 5.32
13 Reducing Bank Access Conflicts
- Power-of-2 vs. prime-number interleaved memories
- See Figure 5.34
14 Simple Calculation Problems
- A machine has 16MB main memory organized into 32-way interleaving and a 64KB cache
- The cache block size is 512B and the cache is always presented with a PM address
- Q1: How many bits are in the tag field?
  - (1) for a 16-way set associative cache? (a) 8 (b) 10 (c) 12 (d) 14 (e) 16
  - (2) for a direct-mapped cache? (a) 8 (b) 10 (c) 12 (d) 14 (e) 16
- Q2: For a direct-mapped cache, what is the physical address of a byte at offset 63 of memory bank 3 (all numbers start from 0)? (a) (b) (c) (d)
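Q1 can be worked out mechanically: the tag is whatever remains of the physical address after the set index and block offset are removed. The helper below is a sketch that assumes byte addressing and power-of-two sizes; its name and parameters are illustrative.

```python
def tag_bits(memory_bytes: int, cache_bytes: int, block_bytes: int, ways: int) -> int:
    """Tag width in bits for a physically addressed cache (power-of-two sizes)."""
    address_bits = memory_bytes.bit_length() - 1   # log2 of the memory size
    offset_bits = block_bytes.bit_length() - 1     # log2 of the block size
    num_sets = cache_bytes // (block_bytes * ways)
    index_bits = num_sets.bit_length() - 1         # log2 of the number of sets
    return address_bits - index_bits - offset_bits


MB, KB = 1024 * 1024, 1024

# 16 MB memory -> 24-bit address; 512 B blocks -> 9 offset bits.
sixteen_way = tag_bits(16 * MB, 64 * KB, 512, ways=16)   # 8 sets -> 3 index bits
direct_mapped = tag_bits(16 * MB, 64 * KB, 512, ways=1)  # 128 sets -> 7 index bits
```

Under these assumptions the 16-way cache needs 24 - 3 - 9 = 12 tag bits and the direct-mapped cache needs 24 - 7 - 9 = 8.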
15 Virtual Memory
- Permits applications to use an address space larger than the main memory in size
- Helps with multiple process management
  - Each process gets its own portion of memory
  - Access protection can be imposed on a per-process basis
  - Mapping of all VM dynamically onto one shared PM
  - Mapping also facilitates relocation
- Applications and CPU run in virtual space
  - Mapping onto physical space is invisible to the application
- VM management applies between main memory and secondary (disk) hierarchy levels
  - Miss becomes a page or address fault
  - Block becomes a page or segment
16 Typical Performance Parameters
Parameter | Value
Page Size | 4KB - 64KB
L1 Cache Hit Time | 1-2 clock cycles
Virtual Hit (e.g. in Main Memory) | clock cycles
Miss Penalty - all the way to disk | 700K - 6M clock cycles
Disk access time | 500K - 4M cycles
Page Transfer time | 200K - 2M clock cycles
Page Fault Rate | .00001% - .001%
Main Memory Size | 4MB - 4GB
- It's a lot like what happens in a cache
  - But all the numbers are much, much larger
  - With the exception of the miss rate
17 Cache vs. VM Differences
- Replacement
  - Cache miss is handled by hardware
  - Page fault is often handled by the OS
    - since the page fault penalty is very large
    - hence more sophisticated strategies can be used to reduce the miss penalty
- Addresses
  - VM space is determined by the address space of the CPU
  - Cache size is independent of the CPU address space
- Lower level memory
  - For caches: the main memory is not shared by something else
  - For VM: most of the disk contains the file system
    - The file system is addressed differently, usually in I/O space
    - The VM lower level is usually called the swap space
18 2 VM Styles - Paged or Segmented
- Pages are fixed-size blocks
- Segments' sizes vary from 1 byte to 2**32
Aspect | Page | Segment
Words per address | One - contains page and offset | Two - possible large max size, hence need two words to address segment and offset
Programmer visible | No | Sometimes yes
Replacement | Trivial - due to fixed size | Hard - need to find contiguous space ==> GC necessary or wasted memory
Memory inefficiency | Internal fragmentation - wasted part of a page | External fragmentation - due to variable size blocks
Disk efficiency | Yes - adjust page size to balance access and transfer time | Not always - segment size varies
19 The 4 Questions for VM
- Block Placement
  - Choice between lower miss rates and complex placement, or vice versa
  - Miss penalty is huge
  - So choose low miss rate ==> place anywhere
  - Similar to the fully associative cache model
- Block Addressing - both use an additional data structure
  - Pages - use a page table
    - Virtual page number ==> physical page number, then concatenate the offset
    - Tag bit to indicate presence in main memory
  - Segments - use a segment table
    - segment # ==> starting physical address of segment + offset
    - Segment table needs an entry for every possible segment
    - Lots of little segments mean a large segment table - always a possibility
20 More on Page Tables
- Normal Page Tables
  - Size: number of entries = number of virtual pages
  - For a 32-bit virtual address, 4K pages & 4 bytes per entry
    - size of page table = (4 GB / 4 KB) * 4B = 4MB required
  - too large for today's machines - must go for smaller VM space, bigger pages, OR
- Inverted Page Table
  - Why allocate an entry for each VM page? Instead we allocate an entry for each PM block (page)
  - Example: PM size = 64MB, then the size of the inverted page table = (64MB/4KB)*4B = 64KB
  - Hash the virtual page number into an entry of the page table
  - Then compare the virtual page number with the tag stored in the hashed entry to make sure it is a match
  - If miss, go to the full page table stored on disk - this implies 2 disk accesses in the worst case; however, we trade an increased worst-case penalty for a decrease in capacity misses since there is now more room for real pages rather than page table pages
  - check the valid bit of the page table entry - if valid then the page is in PM - else page fault
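Both size calculations on this slide follow the same formula: number of pages covered times bytes per entry. A small sketch, with an illustrative helper name:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def page_table_bytes(covered_bytes: int, page_bytes: int, entry_bytes: int) -> int:
    """Table size when there is one entry per page of the covered space."""
    return (covered_bytes // page_bytes) * entry_bytes


# Normal page table: one entry per virtual page of a 32-bit (4 GB) space.
normal = page_table_bytes(4 * GB, 4 * KB, 4)      # 4 MB

# Inverted page table: one entry per physical page of a 64 MB memory.
inverted = page_table_bytes(64 * MB, 4 * KB, 4)   # 64 KB
```

The inverted table scales with physical memory rather than with the virtual address space, which is where the 64x saving in this example comes from.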
21 Back to the 4Q's for VM
- Block Replacement
  - LRU is the best, so use it to minimize the huge miss penalty
  - However, as with caches, true LRU is very expensive - so
    - Page table contains a use tag
    - On access the use tag is set
    - OS checks them periodically - records what it sees in an internal data structure - then clears all the use tags
    - On a miss the OS decides the least used based on the records in its data structure
  - Note - worth a few OS cycles to avoid the huge miss penalty
- Write Strategy
  - Always write back
  - Due to the access time to the disk, write through is impractical
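The use-tag scheme above can be sketched as follows. The slide does not say what the OS keeps in its internal data structure; this example assumes an 8-bit "aging" history per page, which is one common choice and not necessarily the one the lecture had in mind. The class and method names are illustrative.

```python
class UseBitTracker:
    """Approximate LRU from per-page use bits plus an OS-side history record."""

    def __init__(self, pages):
        self.use_bit = {page: False for page in pages}  # set by "hardware" on access
        self.history = {page: 0 for page in pages}      # assumed 8-bit aging record

    def access(self, page):
        """On access the use tag is set."""
        self.use_bit[page] = True

    def periodic_sweep(self):
        """OS records each use bit in the history, then clears all use bits."""
        for page, used in self.use_bit.items():
            recent = 0x80 if used else 0
            self.history[page] = (self.history[page] >> 1) | recent
            self.use_bit[page] = False

    def pick_victim(self):
        """On a miss, choose the page whose history shows the least recent use."""
        return min(self.history, key=self.history.get)


tracker = UseBitTracker(["A", "B", "C"])
tracker.access("A")
tracker.access("B")
tracker.periodic_sweep()
tracker.access("A")
tracker.access("C")
tracker.periodic_sweep()
victim = tracker.pick_victim()   # "B": not touched since the first sweep
```

The sweep costs a few OS cycles per page, which is cheap next to a page fault that goes all the way to disk.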
22 Page Size?
- An architectural choice
- Large pages are good:
  - Reduces page table size
  - Amortizes the long disk access
  - If spatial locality is good then hit rate will improve
- Large pages are bad:
  - More internal fragmentation
    - If everything is random then the last page of each program segment is only half full on average
    - Half of bigger is still bigger
    - If there are 3 segments per process: code, data, and stack
    - Then 1.5 pages are wasted for each process
  - Process start-up takes longer
    - since at least 1 page will be necessary and the transfer penalty is higher
- And vice versa of course
23 Address Translation
- Page tables are large and are themselves paged in some systems
  - So double page faults are possible -- bad for performance
- If locality applies then cache the references
  - This is called a TLB (a special cache)
  - a TLB entry consists of a tag (virtual page #), PM block #, valid bit, use bit, protection field, dirty bit, etc.
  - TLB and page table must be consistent with each other
- TLB is in the CPU critical path
  - TLB must be checked before the cache access can hit
  - Result is the cache hit time may get stretched a bit
  - Virtually-indexed, physically-tagged caches will help
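A toy model of the TLB as a cache of page-table entries. Everything here is a simplifying assumption for illustration: dictionaries stand in for the hardware structures, only the frame number is stored (no valid, use, protection, or dirty fields), there is no capacity limit or replacement, and a missing page-table entry simply raises KeyError in place of a page fault.

```python
PAGE_BYTES = 8192   # assumed page size (13-bit page offset)


def translate(vaddr: int, tlb: dict, page_table: dict):
    """Return (physical address, tlb_hit) for a virtual address.

    tlb and page_table both map virtual page number -> physical frame number.
    """
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    if vpn in tlb:
        frame, hit = tlb[vpn], True
    else:
        frame = page_table[vpn]   # slow path: walk the page table
        tlb[vpn] = frame          # refill so the TLB stays consistent with the table
        hit = False
    return frame * PAGE_BYTES + offset, hit


page_table = {2: 7}   # virtual page 2 lives in physical frame 7
tlb = {}
first = translate(2 * PAGE_BYTES + 100, tlb, page_table)    # miss, then refill
second = translate(2 * PAGE_BYTES + 100, tlb, page_table)   # hit
```

The second lookup skips the page table entirely, which is the locality effect the slide relies on.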
24 More on TLBs
- The TLB must be on chip - otherwise it is worthless
  - Small TLBs are worthless anyway
  - Large TLBs are expensive - fully associative
- Typical TLBs
  - Block size - same as a page table entry - 1 or 2 words
  - Hit time - 1 cycle
  - Miss penalty - 10 to 30 cycles
  - Miss rate - .1% to 2%
  - TLB size - 32 B to 8 KB
- They're expensive but necessary ==> Price of CPUs is going up!
25 e.g. AXP TLB
[Figure: AXP TLB lookup. Labels shown: virtual address with a 30-bit page frame number and a 13-bit page offset; 32 TLB entries in total, each with V, R, W bits, a VPN tag, and a physical frame # (field widths of 30 and 21 bits appear in the figure); protection and hit outputs; a 32:1 mux; the resulting physical address. Circled numbers indicate steps that could be pipelined.]
26 A Simple Calculation Problem (5.8(b), Text)
- A machine with base CPI = 1.5; 20% are data transfer instructions
- A write-back "virtual" cache with 32B per cache block; miss rate = 2.2%; 50% of cache blocks are dirty
- Memory latency = 40 cycles; transfer rate = 4B/cycle
- TLB does not slow down a cache hit; TLB miss rate = 0.2% and TLB miss penalty = 20 cycles
- What is the CPI?
- CPI is affected by three sources of stalls:
  - (1) instruction fetch stalls
  - (2) data reference stalls
  - (3) TLB access stalls when there are cache misses
- So CPI = 1.5 + [(1+20%)*2.2%*72] + [(1+20%)*2.2%*(1+50%)*0.2%*20]
  - 72 = (40 + 32/4) * (1+50%) is the miss penalty in cycles: 48 cycles to move one block, plus a write-back of the victim block 50% of the time
  - The 3rd term above accounts for the TLB stalls per instruction
  - (1+20%) is the number of memory references per instruction
  - 2.2%*(1+50%) is the number of TLB accesses per memory reference, because on a miss, 50% of the time the TLB needs to do another VM->PM translation to flush a victim cache block
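The CPI expression can be evaluated step by step. This sketch assumes the base CPI of 1.5 is the first term of the sum (the slide refers to a "3rd term", which implies it) and that the 72-cycle miss penalty is the block transfer time scaled by the dirty write-back fraction. Variable names are illustrative.

```python
base_cpi = 1.5
refs_per_instruction = 1 + 0.20      # one instruction fetch + 0.2 data references
cache_miss_rate = 0.022
dirty_fraction = 0.50
block_bytes = 32
memory_latency = 40                  # cycles
bytes_per_cycle = 4
tlb_miss_rate = 0.002
tlb_miss_penalty = 20                # cycles

# Time to move one block: latency plus transfer (40 + 32/4 = 48 cycles).
block_time = memory_latency + block_bytes / bytes_per_cycle

# Half of the misses also write back a dirty victim: 48 * 1.5 = 72 cycles.
miss_penalty = block_time * (1 + dirty_fraction)

cache_stalls = refs_per_instruction * cache_miss_rate * miss_penalty

# The TLB is consulted only on cache misses, plus once more for dirty victims.
tlb_accesses = refs_per_instruction * cache_miss_rate * (1 + dirty_fraction)
tlb_stalls = tlb_accesses * tlb_miss_rate * tlb_miss_penalty

cpi = base_cpi + cache_stalls + tlb_stalls
```

Under these assumptions the cache stalls contribute about 1.90 cycles per instruction and the TLB stalls under 0.002, for a CPI of roughly 3.40, so the TLB term is negligible next to the cache miss term.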
. 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationECE468 Computer Organization and Architecture. Virtual Memory
ECE468 Computer Organization and Architecture Virtual Memory ECE468 vm.1 Review: The Principle of Locality Probability of reference 0 Address Space 2 The Principle of Locality: Program access a relatively
More informationBackground. Memory Hierarchies. Register File. Background. Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory.
Memory Hierarchies Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory Mem Element Background Size Speed Price Register small 1-5ns high?? SRAM medium 5-25ns $100-250 DRAM large
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3
More informationECE4680 Computer Organization and Architecture. Virtual Memory
ECE468 Computer Organization and Architecture Virtual Memory If I can see it and I can touch it, it s real. If I can t see it but I can touch it, it s invisible. If I can see it but I can t touch it, it
More informationVirtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationMemory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology
Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationCMSC 611: Advanced Computer Architecture. Cache and Memory
CMSC 611: Advanced Computer Architecture Cache and Memory Classification of Cache Misses Compulsory The first access to a block is never in the cache. Also called cold start misses or first reference misses.
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationCS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches
CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationLRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.
LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E
More informationOutline. 1 Reiteration. 2 Cache performance optimization. 3 Bandwidth increase. 4 Reduce hit time. 5 Reduce miss penalty. 6 Reduce miss rate
Outline Lecture 7: EITF20 Computer Architecture Anders Ardö EIT Electrical and Information Technology, Lund University November 21, 2012 A. Ardö, EIT Lecture 7: EITF20 Computer Architecture November 21,
More informationMemory Hierarchy: Caches, Virtual Memory
Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories
More informationVirtual Memory: From Address Translation to Demand Paging
Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 9, 2015
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle
More informationChapter 8. Virtual Memory
Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationCS61C Review of Cache/VM/TLB. Lecture 26. April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson)
CS6C Review of Cache/VM/TLB Lecture 26 April 3, 999 Dave Patterson (http.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs6c/schedule.html Outline Review Pipelining Review Cache/VM/TLB Review
More informationTopic 18: Virtual Memory
Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection
More informationVirtual Memory. Motivation:
Virtual Memory Motivation:! Each process would like to see its own, full, address space! Clearly impossible to provide full physical memory for all processes! Processes may define a large address space
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (I)
COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap
More information