CSE 4201 Memory Hierarchy Design Ch. 5 (Hennessy and Patterson)
- Ariel Fields
- 6 years ago
1 CSE 4201 Memory Hierarchy Design Ch. 5 (Hennessy and Patterson)
2 Memory Hierarchy We need huge amount of cheap and fast memory Memory is either fast or cheap; never both. Do as politicians do: fake it Give a little bit of fast memory and tons of cheap memory. As technology progresses Cheap becomes cheaper rapidly Fast becomes faster rapidly Cheap does not become fast, or fast cheap
3 Since the 80's... Processors became 10,000 times faster Memory became 10 times faster Back then cache was for high performance systems Now we need multiple levels of cache
4 Addressing Scheme [Figure: diagram labeled Cache, Index, Tag, Mapping, Address, Block Offset, Set, Block Address, Match?, Block]
5 Set-Associative
(Set Address) = (Block Address) MOD K, where K is the number of sets in the cache
(Block Address) = (Address) DIV b, where b is the number of bytes in the block
(Block Offset) = (Address) MOD b
A set has n blocks (n-way associative). Every block has data and an address (tag).
If K=1, fully associative. If n=1, direct mapped.
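The arithmetic on this slide can be sketched in a few lines of Python. The function name `split_address` and its parameter names are illustrative and do not come from the slides. The tag formula (block address DIV K) is the usual convention and is assumed here, since the slide only says that every block stores a tag.

```python
def split_address(address, block_bytes, num_sets):
    """Split a byte address into (tag, set index, block offset).

    block_bytes is b and num_sets is K in the slide's notation.
    """
    block_address = address // block_bytes   # (Address) DIV b
    block_offset = address % block_bytes     # (Address) MOD b
    set_index = block_address % num_sets     # (Block Address) MOD K
    tag = block_address // num_sets          # assumed: the remaining high bits
    return tag, set_index, block_offset


if __name__ == "__main__":
    # 64-byte blocks and 512 sets, the Opteron L1 figures used later in the slides
    print(split_address(0x12345, 64, 512))
```

With `num_sets=1` every address maps to set 0, which is the fully associative case.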
6 Victim Selection Which block to expel to make room for new entry Least recently used Random First In First Out All work more or less the same. LRU is rarely exact, almost always approximate Little effect on big caches About 10% for smaller
7 What Happens on a Write? Writes are less common than reads All instruction fetches are reads Stores are 10% of the instructions, loads, 25% We have 10 writes for every 125 reads Better take good care of the reads Writes are costlier than reads We write 1-8 bytes at a time in a block typically bytes long Have issues with consistency
8 Write Through, Back Write through No need to write back on a cache miss No need to have dirty bit Write back Less bus traffic
9 Write Through, Back Write through <-> no-write-allocate Allocate cache block only on reads Multiple writes w/o immediate read do not disturb cache Write back <-> write-allocate Makes subsequent reads fast
10 AMD Opteron Cache L1 cache (data): 64 KB, 64 byte blocks 1 K blocks 2-way associative 512 sets. LRU Write back, write allocate 64 bogo-bits address 48 virtual, 40 physical
11 AMD Opteron Cache Various sizes: Physical address: 40 bits Block address: 34 bits Block offset: 6 bits Cache index: 9 bits Tag: 25 bits Size of set: 2 blocks (2-way set associative) Number of clock cycles: 2 (2 stalls on hazard)
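The field widths listed above follow from the cache parameters. A minimal sketch, assuming power-of-two sizes; the function name `cache_geometry` and the dictionary keys are made up for illustration.

```python
def cache_geometry(cache_bytes, block_bytes, ways, phys_addr_bits):
    """Derive block count, set count and address field widths.

    Assumes cache_bytes, block_bytes and the resulting set count
    are all powers of two.
    """
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = (block_bytes - 1).bit_length()   # log2(block_bytes)
    index_bits = (sets - 1).bit_length()           # log2(sets)
    return {
        "blocks": blocks,
        "sets": sets,
        "block_address_bits": phys_addr_bits - offset_bits,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": phys_addr_bits - offset_bits - index_bits,
    }


if __name__ == "__main__":
    # 64 KB, 64-byte blocks, 2-way, 40-bit physical address
    print(cache_geometry(64 * 1024, 64, 2, 40))
```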
12 AMD Opteron Cache Steps of cache hit: The 40 bits are split into tag(25), index(9), offset(6) A set (2 blocks) is retrieved using the index Their tags are compared and their valid bits checked The correct block is selected The 3 MSBits of the offset are used to select the word to be read/written. Update the LRU bits
13 AMD Opteron Cache Steps of cache miss Same up until we know it is a miss Identify a victim (LRU) If victim dirty write back If read, stall until next level responds, if write continue (provisionally)
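The hit and miss steps from the last two slides can be mimicked with a toy model. This is a sketch only: it tracks tags and dirty bits, not data, it uses exact LRU (real hardware usually approximates it), and the class name `SetAssociativeCache` and its fields are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy write-back, write-allocate cache with exact LRU per set."""

    def __init__(self, num_sets=512, ways=2, block_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One OrderedDict per set: tag -> dirty flag, least recent first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.write_backs = 0

    def access(self, address, is_write=False):
        block_address = address // self.block_bytes
        index = block_address % self.num_sets
        tag = block_address // self.num_sets
        blocks = self.sets[index]              # retrieve the set by index

        if tag in blocks:                      # tag match on a valid block
            blocks.move_to_end(tag)            # update the LRU order
            blocks[tag] = blocks[tag] or is_write
            return "hit"

        if len(blocks) == self.ways:           # set full: evict the LRU victim
            _, dirty = blocks.popitem(last=False)
            if dirty:
                self.write_backs += 1          # dirty victim is written back
        blocks[tag] = is_write                 # allocate the new block
        return "miss"
```

A usage example: with `SetAssociativeCache(num_sets=2, ways=2)`, the addresses 0, 128 and 256 all fall in set 0, so touching all three forces an eviction.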
14 Miss Rate Not your elementary school teacher The three Cs Compulsory (the first time) Capacity (the cache cannot hold all the blocks the program needs) Conflict (when the blocks have to share the same spot) We may add one more: coherency
15 Sneaky Miss Rate Miss Rate can be misleading Defined as misses per (1000) access(es) Our delay is related to misses per instruction Misses per instruction is the miss rate times the memory accesses per instruction. Even this can be misleading We want to reduce the delay
16 What is the delay Avg. Mem. Access Time = Hit Time + Miss Rate * Miss Penalty We do better by decreasing any of the three quantities in the right hand side Unfortunately, these always involve trade-offs And, they are just an approximation of the effect on the execution time.
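The two formulas from the last two slides, written as a short sketch. The function names are illustrative, and the numbers in the usage lines are made-up inputs, not measurements.

```python
def misses_per_instruction(miss_rate, accesses_per_instruction):
    """Miss rate (misses per access) times memory accesses per instruction."""
    return miss_rate * accesses_per_instruction


def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Avg. Mem. Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical inputs: 2% miss rate, 1.36 accesses per instruction,
    # 1-cycle hit, 100-cycle miss penalty.
    print(misses_per_instruction(0.02, 1.36))
    print(average_memory_access_time(1, 0.02, 100))
```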
17 Complications... What exactly is a miss in speculative execution? How much does a miss cost under dynamic scheduling? Under multi-threading? If we allow a miss over miss?
18 Example Effective Access time for 16KB+16KB split cache Miss per 1000 instructions: 3.82 (instr. Cache), 40.9 (data cache) Mix: 36% of instructions are load/store Hit: 1 cycle, Miss: 100 cycles
19 Example Instruction miss rate: (3.82/1000)/1.00 = 0.004 Data miss rate: (40.9/1000)/0.36 = 0.114 Percent of references that are instructions 100/136 = 74% Avg. Mem. Acc. Time: 74%*(1 + 0.004*100) + 26%*(1 + 0.114*100)
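The example can be checked numerically. This sketch takes its inputs from the previous slide (misses per 1000 instructions, 36% loads/stores, 1-cycle hit, 100-cycle miss) and assumes exactly one instruction fetch per instruction. The function name `split_cache_amat` is made up for illustration.

```python
def split_cache_amat(instr_misses_per_1000=3.82, data_misses_per_1000=40.9,
                     data_refs_per_instr=0.36, hit_time=1, miss_penalty=100):
    """Average memory access time for a split instruction/data cache."""
    # Convert misses per 1000 instructions into misses per access.
    instr_miss_rate = instr_misses_per_1000 / 1000   # one fetch per instruction
    data_miss_rate = data_misses_per_1000 / 1000 / data_refs_per_instr

    # Share of all memory references that are fetches versus loads/stores.
    total_refs = 1 + data_refs_per_instr
    instr_fraction = 1 / total_refs
    data_fraction = data_refs_per_instr / total_refs

    return (instr_fraction * (hit_time + instr_miss_rate * miss_penalty)
            + data_fraction * (hit_time + data_miss_rate * miss_penalty))


if __name__ == "__main__":
    print(round(split_cache_amat(), 2))  # about 4.29 cycles with unrounded inputs
```

The slide's rounded figures (74%, 26%, 0.004, 0.114) give a slightly different value than the unrounded computation here.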
20 Miss Penalty under Dynamic Execution Is it the full latency? Is it the exposed latency? What about the latency due to contention by speculative instructions Any form of latency has the same problem Simple (simplistic) solution Find which instruction did not commit in time Attribute the stall to it
21 How to Increase Performance Larger cache Obviously reduces misses Increases cost, power Increases hit time Larger block Decreases compulsory (initial) misses Better exploits spatial locality Decreases number of blocks Increases miss penalty, bus traffic
22 How to Increase Performance Higher associativity Reduce conflict misses Increase hit time, silicon area, power consumption Multilevel cache Reduces hit time and miss penalty Increase cost and power Give priority to read misses Let reads jump the queue Overlap TLB and cache read...
23 TLB and Cache Cache understands physical addresses We have to consult the TLB to convert a virtual address to physical How about if we overlap the two? When is such a thing possible?
24 What is the Trick? TLB is a small cache that associates a (virtual) page number to (physical) frame number The page offset and the frame offset are the same and need no translation If the page offset is enough to index the set in the cache We do not need any bit from the frame number We can retrieve the set while the TLB does the translation When the TLB is done we compare the tags
25 This is the Trick [Figure: diagram labeled Cache, Index, Tag, Mapping, Physical, TLB, Block Offset, Virtual, Set, Block Address, Match?, Block]
26 Disadvantages Cache size is limited to Page size * associativity Usually we want a medium size page and a large cache. There are ways to deviate from this rule with extra hardware.
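The size constraint can be expressed as a simple check: the index and block offset bits together must fit inside the page offset, which is the same as saying that the cache size divided by the associativity must not exceed the page size. The function names and the 4 KB page size in the usage lines are assumptions for illustration.

```python
def can_overlap_tlb(cache_bytes, ways, page_bytes):
    """True if the set can be indexed using page-offset bits only."""
    bytes_per_way = cache_bytes // ways   # sets * block size
    return bytes_per_way <= page_bytes


def max_overlapped_cache_bytes(page_bytes, ways):
    """Largest cache that still allows the overlap: page size * associativity."""
    return page_bytes * ways


if __name__ == "__main__":
    page = 4 * 1024  # assumed page size
    print(can_overlap_tlb(32 * 1024, 8, page))   # 4 KB per way fits in a page
    print(can_overlap_tlb(64 * 1024, 2, page))   # 32 KB per way does not
    print(max_overlapped_cache_bytes(page, 2))
```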
27 11 Advanced Optimizations We organize them in 5 groups Reduce Hit Time Increase bandwidth Reduce Miss Penalty Reduce Miss Rate Prefetching
28 Small is Beautiful Small and simple caches are faster Reduce size Reduce associativity Rely on L2 cache L1 cache sizes do not change much with technology
29 Way Prediction Tag comparison costly Store, along with the data, prediction bits for the next access The index is augmented by the predictor bits The data is sent to the CPU while we check the tags If the tags do not match, we send an Oops! Pentium 4 uses it
30 Example Hit rate 85% (typical) Hit: 1 cycle Miss: 3 cycles Without: 2 cycles 0.85*1 + 0.15*3 = 1.3 < 2
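The same arithmetic as a sketch, so that other prediction rates can be tried. The function name is illustrative.

```python
def way_predicted_access_time(prediction_hit_rate, hit_cycles, miss_cycles):
    """Expected access time when a correct prediction costs hit_cycles
    and a misprediction costs miss_cycles."""
    return (prediction_hit_rate * hit_cycles
            + (1 - prediction_hit_rate) * miss_cycles)


if __name__ == "__main__":
    with_prediction = way_predicted_access_time(0.85, 1, 3)
    without_prediction = 2
    print(round(with_prediction, 2), with_prediction < without_prediction)
```

With these costs, prediction stops paying off once the prediction hit rate drops below 50%, where the expected time reaches 2 cycles.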
31 Trace Cache Seems so devious... It is almost Harry-Potterish The cache contains dynamic trace (sequence of instructions as they are executed) Branch prediction is folded into the cache Pentium 4 uses it for its micro-operations cache
32 Cache Pipeline Most caches have more than 1 cycle Pipelining is tried and true Embed the cache pipeline into the CPU pipeline Pentium 4 takes 4 cycles (despite way prediction, etc)
33 Non-Blocking Miss Allow hit under miss Or multiple miss under miss FP intensive programs benefit from multiple miss under miss Dynamic execution benefits from it
34 Multi-Banked Cache Multi-bank (aka interleaved) memories were always popular Suits best for L2 cache Allows each bank to be smaller Allows each to work independently Increase bandwidth AMD Opteron has 2 banks, Sun T1 has 4
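One common way to spread blocks over banks is sequential interleaving, where consecutive block addresses go to consecutive banks. The slides do not say which mapping the Opteron or the T1 uses, so the mapping below is an assumption, and the function name is illustrative.

```python
def bank_of(address, block_bytes=64, num_banks=4):
    """Bank holding the block that contains this byte address,
    assuming sequential interleaving of block addresses."""
    block_address = address // block_bytes
    return block_address % num_banks


if __name__ == "__main__":
    # Six consecutive 64-byte blocks over four banks
    print([bank_of(a) for a in range(0, 64 * 6, 64)])
```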
35 Critical Word First Critical (the one we asked) first If the block is transmitted in multiple cycles Early restart Do not wait for the whole block to arrive
36 Merging Write Buffer A write miss might be in the (victim) write-back buffer Similar idea to victim buffer (virtual memory)
37 Compiler Re-ordering Try to access arrays the way they are in the cache The magic behind fast matrix multiplication (blocking) Break the matrix into pieces that are comfortably fitting in the cache
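A minimal sketch of blocking for matrix multiplication. It shows only the loop structure: the matrices are assumed square, the block size of 2 is arbitrary, and plain Python lists will not show the cache benefit that a compiled language would. The function name is illustrative.

```python
def blocked_matmul(a, b, block=2):
    """Multiply two square matrices one block-by-block tile at a time."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # The three outer loops walk over tiles; the inner loops stay inside
    # one tile so the same small pieces of a, b and c are reused.
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += a_ik * b[k][j]
    return c


if __name__ == "__main__":
    print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In practice the block size is chosen so that the tiles being worked on fit comfortably in the cache.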
38 Prefetching Hardware If two misses in the same page, prefetch Most prefetch instructions from the instruction cache Opteron and P-4 prefetch data too. Compiler Insert special prefetch instructions Needs non-blocking cache Increase traffic
39 Memory Technologies SRAM Static RAM Big transistors optimized for speed DRAM Cheap capacitors Optimized for density Reads destructively Needs refreshing
40 DRAMs Rule the Desktop Memory size and CPU speed grow at the same speed It always took about a second to scan the whole memory. Through most of their history DRAM capacity increased 4-fold every three years Now it increases 4-fold every four years. Speed increases about 5% per year.
41 DRAM Organization [Figure: block diagram labeled Address Buffer, Row Decoder, Column Decoder, Memory array, Sense Amps, Data-I/O]
42 RAS and CAS Row Address Strobe Column Address Strobe First goes RAS Whole row is copied out CAS selects the bit or bits
43 Improving RAM Fast Page Mode Increment CAS several times with the same RAS Make use of the modularity available Memory is organized in blocks 1-4 Mbits each for manufacturing reasons. Naturally interleaved
44 SDRAM, DDR Synchronous DRAM Shares the clock with the CPU No synchronization overhead in communication. DDR Double Data Rate Front end of memory is fast Heavily interleaved back-ends
45 Virtual Memory Expand RAM to disk (not that useful today) Allow multiple processes to share the physical memory Allow arbitrary mapping File I/O, shared memory, dynamic libraries, etc Critical to security
46 Security Virtual memory handled through the kernel Page tables can be manipulated only in monitor mode A process does not have the means to access the space of another process
47 However... A kernel is a huge program Huge programs have bugs Most bugs cause the system to crash A few of them are exploits.
48 A better way... Use virtual machines Much smaller Fewer bugs One extra level of protection VMs have other advantages as well Share a computer Cloud computing Can migrate a live program to different H/W
49 VMM Virtual Machine Monitors (hypervisors) Allow a guest OS to run efficiently as a process on a host OS User level code runs natively System calls are trapped and emulated VMM mediates between the guest OS and the H/W on the host Network connection, USB device management, etc Filesystem and state.
50 ISA Support An ISA supporting virtualization is called virtualizable Virtualization is a new idea (geologically speaking) Attempts by guest to execute privileged instructions result in traps The problem is that not all relevant instructions result in traps And handling virtual memory is tricky
51 Virtual Memory for Virtual Machines Normally we distinguish between Virtual memory Physical memory Now we have an intermediate level Real memory Guest OS maps virtual to real VMM maps real to physical
52 Shadow Page Table Two step process Too slow Interferes with h/w assisted virtual memory Shadow tables do it in one shot But this means guest OS cannot manage the page tables of its own processes TLBs must have PID tags and/or be flushed on context switch IBM ('70s) had one more level of indirection
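The difference between the two-step translation and a shadow table can be sketched with dictionaries standing in for page tables. This is a toy model: real page tables are multi-level structures with permission bits, and the function names and page numbers are made up for illustration.

```python
def translate_two_step(virtual_page, guest_table, vmm_table):
    """Guest OS maps virtual -> real, then the VMM maps real -> physical."""
    real_page = guest_table[virtual_page]
    return vmm_table[real_page]


def build_shadow_table(guest_table, vmm_table):
    """Compose both mappings so virtual -> physical takes one lookup."""
    return {
        virtual_page: vmm_table[real_page]
        for virtual_page, real_page in guest_table.items()
        if real_page in vmm_table          # skip pages the VMM has not backed
    }


if __name__ == "__main__":
    guest = {0: 5, 1: 7}      # virtual page -> real page (hypothetical)
    vmm = {5: 42, 7: 13}      # real page -> physical page (hypothetical)
    shadow = build_shadow_table(guest, vmm)
    print(shadow[1], translate_two_step(1, guest, vmm))
```

The shadow table has to be rebuilt or patched whenever either mapping changes, which is why the VMM must intercept the guest's page table updates.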
53 Virtualized I/O There are far too many devices and drivers to handle I/O happens with the mediation of H/W, so it would be too slow to handle with emulation Solution: generic devices for each type of I/O Network: time shared or NATed.
54 Example: Xen Instead of trying to emulate everything just to trick the guest Allow small modifications to the guest to keep things simple. It is called paravirtualization and Xen is the most popular example (VMWare is another)
55 The Tricks of Xen Augment kernel E.g. 1% of Linux is modified Uses the protection levels of x86 Xen at level 0 (highest), guest OS at 1, apps at 3 Wraps I/O devices in special virtual machines (driver domains) and talks to them with page remapping
56 VMM and ISA Designers of ISAs were cheapos! To save a couple of bits had the same instruction behave differently in monitor mode and user mode POPF (pop flags) ignores privileged flags in user mode 70's technology IBM-370 is still the gold standard.
57 Cache and I/O Should we do I/O with the cache? Get the data immediately with perfect consistency Slows down processor, infects cache I/O directly with memory Most popular Works well with write through (no stale data) Or can mark pages as non-cacheable Or flush cache Or send cache invalidations
58 Fallacy: Predicting cache performance Miss rates vary by a factor of 10,000 or more Tremendous difference between instruction and data miss rates
59 RAMBUS promises RAMBUS had a bandwidth 8 times higher than competition Performance was only 0-15% faster overall Cost was 2-3 times higher (20% larger die) The reason was that most of the traffic is at the L2 cache.
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationCache Optimisation. sometime he thought that there must be a better way
Cache sometime he thought that there must be a better way 2 Cache 1. Reduce miss rate a) Increase block size b) Increase cache size c) Higher associativity d) compiler optimisation e) Parallelism f) prefetching
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationLec 11 How to improve cache performance
Lec 11 How to improve cache performance How to Improve Cache Performance? AMAT = HitTime + MissRate MissPenalty 1. Reduce the time to hit in the cache.--4 small and simple caches, avoiding address translation,
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationCS 654 Computer Architecture Summary. Peter Kemper
CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available
More informationMain Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be
More informationECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU
More informationCS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook
CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array
More informationChapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.
Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.5) Memory Technologies Dynamic Random Access Memory (DRAM) Optimized
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationMemory Hierarchy Y. K. Malaiya
Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More information24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.
24-vm.txt Mon Nov 21 22:13:36 2011 1 Notes on Virtual Machines 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Tannenbaum, 3.2 Barham, et al., "Xen and the art of virtualization,"
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationImproving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches
Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging 6.823, L8--1 Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Highly-Associative
More informationBackground. Memory Hierarchies. Register File. Background. Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory.
Memory Hierarchies Forecast Memory (B5) Motivation for memory hierarchy Cache ECC Virtual memory Mem Element Background Size Speed Price Register small 1-5ns high?? SRAM medium 5-25ns $100-250 DRAM large
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationMemories. CPE480/CS480/EE480, Spring Hank Dietz.
Memories CPE480/CS480/EE480, Spring 2018 Hank Dietz http://aggregate.org/ee480 What we want, what we have What we want: Unlimited memory space Fast, constant, access time (UMA: Uniform Memory Access) What