Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!
- Derick Jenkins
- 5 years ago
MicroComputer Engineering, Memory Acceleration

Why Care About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
- µProc: 60%/yr (2X/1.5 yr), following Moore's Law
- DRAM: 9%/yr (2X/10 yrs)
- Processor-memory performance gap grows 50%/year
[Figure: performance (log scale, 1 to 100 and beyond) versus time for CPU and DRAM]

Cache Memory
- Reason: main memory is too slow
- Solution: cache memory

Virtual Memory
- Reasons: multi-process systems (abstraction and memory protection)
- Solution: page tables (holding per-process translations)
- Fast translation by a Translation Lookaside Buffer (TLB)
- Demand paging (OS support)

DRAMs over Time
[Table: DRAM generations (surviving column labels: 4 Mb, 64 Mb, 1 Gb) with rows for first-generation sample date, memory size, die size (mm²), memory area (mm²), and memory cell area (µm²). From Kazuhiro Sakashita, Mitsubishi.]
Observations: Two Different Types of Locality
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
- DRAM (main memory) is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
- SRAM (cache) is fast but expensive and not very dense: a good choice for providing the user FAST access time.

Memory Hierarchy of a Modern Computer
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
[Figure: processor (datapath, control, registers), on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk). Speed (ns): 1s, Xs, 10s, 100s; disk: 10 ms. Size (bytes): K, K..M, G, T.]

Levels of the Memory Hierarchy
Staging transfer units; upper levels are faster, lower levels are larger.
- Registers and cache: instruction operands, managed by the program, 4/8 bytes
- Cache and memory: blocks, managed by the cache controller
- Memory and disk: pages, managed by the OS, 512-4K bytes
- Disk and tape (backup): files, managed by the user/operator, Mbytes

The Art of Memory System Design
- Workload or benchmark programs drive the processor, which issues a reference stream to the cache ($) and memory (MEM):
  <op,addr>, <op,addr>, <op,addr>, <op,addr>, ...
  op: i-fetch, read, write
- Optimize the memory system organization to minimize the average memory access time for typical workloads.

Address Mapping
[Figure: MIPS pipeline issuing instruction and data virtual addresses, mapped to 24-bit physical addresses in user memory and kernel memory]
- User process 2 is running: here we need page table 2 for address mapping.

Translation Lookaside Buffer (TLB)
[Figure: MIPS pipeline, virtual address into the TLB, 24-bit physical address out to user memory and kernel memory]
- On a TLB hit, the virtual address is translated into a 24-bit physical address by hardware.
- We never call the kernel.
So Far, NO GOOD
[Figure: MIPS pipeline (IM, DE, EX, DM) with a 5 ns TLB producing a 24-bit physical address for 60 ns RAM; critical path 20 ns; the pipe STALLs]
- The MIPS pipe is clocked at 50 MHz.
- But RAM needs 3 cycles to read/write, which STALLS the pipe.

Let's Put in a Cache
[Figure: the same pipeline with a 5 ns TLB and a 15 ns cache in front of the 60 ns RAM; critical path 20 ns]
- The MIPS pipe is clocked at 50 MHz.
- A cache hit never STALLS the pipe.

Fully Associative
- 24-bit PA; tag is PA[23:2]
- Check all cache lines; cache hit if PA[23:2] = TAG
- PA[1:0] selects the byte within the data word
- 2^16 lines × 4 bytes = 256 KB

Fully Associative Cache
- Very good hit ratio (nr hits / nr accesses)
- But too expensive: checking all 2^16 cache lines concurrently needs a comparator for each line, which is a lot of hardware.

Direct Mapped
- 24-bit PA; tag is PA[23:18]
- The index selects ONE cache line; cache hit if PA[23:18] = TAG
- PA[1:0] selects the byte within the data word
- 2^16 lines × 4 bytes = 256 KB

Direct Mapped Cache
- Not so good hit ratio: each line can hold only certain addresses, so there is less freedom.
- But much cheaper to implement: only one line is checked, so only one comparator is needed.
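To make the direct-mapped lookup concrete, here is a minimal Python sketch. It assumes the 256 KB cache has 2^16 lines of 4 bytes, so that PA[1:0] is the byte offset, PA[17:2] is the index, and PA[23:18] is the tag; the index range is inferred from those sizes, and the names `split_address` and `DirectMappedCache` are illustrative only.

```python
# Sketch only: field widths are inferred from a 24-bit PA and a
# 256 KB cache of 2**16 four-byte lines (an assumption, see lead-in).
OFFSET_BITS = 2    # PA[1:0]: byte within the 4-byte line
INDEX_BITS = 16    # PA[17:2]: selects ONE of 2**16 cache lines
PA_BITS = 24       # PA[23:18] is what remains for the tag


def split_address(pa):
    """Return (tag, index, offset) for a 24-bit physical address."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address must fit in 24 bits")
    offset = pa & ((1 << OFFSET_BITS) - 1)
    index = (pa >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tracks tags only; one comparison per access, no data storage."""

    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)  # None means "empty line"

    def access(self, pa):
        """Return True on a hit; on a miss, install the new tag."""
        tag, index, _ = split_address(pa)
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        return hit


cache = DirectMappedCache()
a = 0x000004
b = a + (1 << 18)  # same index as a, different tag
print(split_address(a), split_address(b))
print([cache.access(pa) for pa in (a, a, b, a)])
```

Addresses `a` and `b` are 256 KB apart, so they compete for the same line: the second access to `a` hits, but after `b` is loaded the next access to `a` misses again.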
Set Associative
- 24-bit PA; tag is PA[23:18-z]
- The index selects ONE set of lines, of size 2^z; cache hit if PA[23:18-z] = TAG in the set
- PA[1:0] selects the byte within the data word
- 2^16 lines × 4 bytes = 256 KB

Set Associative Cache
- Quite good hit ratio: the number (set) of different addresses for each line is greater than that of a direct mapped cache.
- The larger Z, the better the hit ratio, but the more expensive: 2^z comparators.
- Cost-performance tradeoff: 2^z-way set associative.

Cache Miss
- A miss should be handled by the hardware. If handled by the OS it would be very slow (>> 60 ns).
- On a cache miss: stall the pipe, read the new data into the cache, release the pipe; now we get a cache hit.

Extreme Example: Single Big Line
[Figure: one cache entry with tag and data bytes 3, 2, 1, 0]
- Cache size = 4 bytes, block size = 4 bytes: only ONE entry in the cache.
- If an item is accessed, it is likely that it will be accessed again soon, but it is unlikely that it will be accessed again immediately.
- The next access will likely be a miss again: we continually load data into the cache but discard (force out) it before it is used again.
- Worst nightmare of a cache designer: the Ping Pong Effect.
- Conflict misses are misses caused by different memory locations mapped to the same cache index.
  - Solution 1: make the cache size bigger.
  - Solution 2: multiple entries for the same cache index.

A Summary on Sources of Cache Misses
- Compulsory (cold start or process migration, first reference): first access to a block. Cold fact of life: not a whole lot you can do about it. Note: if you are going to run billions of instructions, compulsory misses are insignificant.
- Conflict (collision): multiple memory locations mapped to the same cache location.
  - Solution 1: increase cache size.
  - Solution 2: increase associativity.
- Capacity: the cache cannot contain all blocks accessed by the program.
  - Solution: increase cache size.
- Invalidation: another process (e.g., I/O) updates memory.

Block Size Tradeoff
- In general, a larger block size takes advantage of spatial locality, BUT:
  - A larger block size means a larger miss penalty: it takes longer to fill up the block.
  - If the block size is too big relative to the cache size, the miss rate will go up: too few cache blocks.
- In general, Average Access Time = Hit Time × (1 - Miss Rate) + Miss Penalty × Miss Rate
[Figure: three plots against block size. Miss penalty rises with block size. Miss rate first falls (exploits spatial locality), then rises (fewer blocks compromises temporal locality). Average access time falls, then rises (increased miss penalty and miss rate).]
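The average access time formula above can be evaluated directly. The following Python sketch uses the formula exactly as stated on the slide, so `miss_penalty` is the full time of an access that misses. The 20 ns hit time matches the pipeline cycle used earlier; the 80 ns miss penalty and the miss rates are hypothetical values chosen for illustration.

```python
def average_access_time(hit_time, miss_penalty, miss_rate):
    """Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate


# Hypothetical numbers: 20 ns on a hit, 80 ns for an access that misses.
for miss_rate in (0.01, 0.05, 0.20):
    t = average_access_time(hit_time=20.0, miss_penalty=80.0,
                            miss_rate=miss_rate)
    print(f"miss rate {miss_rate:.2f}: average access time {t:.1f} ns")
```

The loop shows how quickly the average drifts away from the hit time as the miss rate grows, which is why both curves in the block size tradeoff matter.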
Cache Hierarchy
- Small, fast and expensive VS slow, big and inexpensive
- The cache contains copies. What if the copies are changed? INCONSISTENCY
[Figure: 256 KB cache (I and D), RAM, 2 GB HD]

Cache Miss, Write Through/Back
To avoid INCONSISTENCY we can:
- Write Through
  - Always write data to RAM.
  - Not so good performance (a write takes 60 ns).
  - Therefore, WT is always combined with write buffers so that we don't wait for the lower-level memory.
- Write Back
  - Write data to memory only when the cache line is replaced.
  - We need a dirty bit (D) for each cache line.
  - The D-bit is set by hardware on a write operation.
  - Much better performance, but more complex hardware.

Write Buffer for Write Through
[Figure: processor, cache, write buffer, DRAM]
- A write buffer is needed between the cache and memory.
  - Processor: writes data into the cache and the write buffer.
  - Memory controller: writes the contents of the buffer to memory.
- The write buffer is just a FIFO. Typical number of entries: 4.
- Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle.
- Memory system designer's nightmare: store frequency (w.r.t. time) -> 1 / DRAM write cycle, which causes write buffer saturation.

Write Buffer Saturation
- Store frequency (w.r.t. time) -> 1 / DRAM write cycle.
- If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row):
  - The store buffer will overflow no matter how big you make it.
  - The CPU cycle time <= DRAM write cycle time.
- Solutions for write buffer saturation:
  - Use a write back cache.
  - Install a second-level (L2) cache.
[Figure: processor, cache, write buffer, L2 cache, DRAM]

Replacement Strategy in Hardware
- A direct mapped cache selects ONE cache line: no replacement strategy is needed.
- A set/fully associative cache selects a set of lines, so we need a strategy to select one cache line:
  - Random, Round Robin: not so good, spoils the idea of an associative cache.
  - Least Recently Used (move-to-top strategy): good, but complex and costly for large Z.
  - We could use an approximation (heuristic): Not Recently Used (replace if not used for a certain time).

Sequential RAM Access
- Accessing sequential words from RAM is faster than accessing RAM randomly: only the lower address bits change.
- How could we exploit this?
  - Let each cache line hold an array of data words.
  - Give the base address and array size.
  - Burst read the array from RAM to the cache.
  - Burst write the array from the cache to RAM.
- We might deploy early restart, i.e. fill the cache line starting with the word causing the miss. Once that word is filled, the pipe can be released and we can fill the rest of the line in the background.
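As an illustration of Least Recently Used replacement and of why associativity cures the ping-pong effect, here is a small Python sketch of a set-associative cache that tracks tags only. The class name, the tiny cache sizes, and the two example addresses are hypothetical; a hardware implementation would not use an ordered dictionary, this only models the move-to-top behaviour.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of an LRU set-associative cache (ways=1: direct mapped)."""

    def __init__(self, num_sets, ways, line_bytes=4):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set: oldest entry first, newest last.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # "move to top": now most recent
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used tag
        lines[tag] = True
        return False


# Two addresses that map to the same index, accessed alternately.
trace = [0, 16] * 4

direct = SetAssociativeCache(num_sets=4, ways=1)   # 4 lines, direct mapped
two_way = SetAssociativeCache(num_sets=2, ways=2)  # same 4 lines, 2-way
for address in trace:
    direct.access(address)
    two_way.access(address)

print("direct mapped:", direct.hits, "hits,", direct.misses, "misses")
print("2-way LRU:    ", two_way.hits, "hits,", two_way.misses, "misses")
```

Both caches hold four lines in total. In the direct mapped one the two addresses keep forcing each other out, while the 2-way cache keeps both after the two initial compulsory misses.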
System Startup, RESET
- Random cache contents: we might read incorrect values from the cache.
- We need to know if the contents are valid: a V-bit for each cache line.
- Let the hardware clear all V-bits on RESET.
- Set the V-bit and clear the D-bit for the line copied from RAM to the cache.

Final Cache Model
[Figure: each cache line holds V, D, tag, and data; the 24-bit PA is split at bits 2+j, 1+j, and 0]
- Tag is PA[23:18-z]; PA[1+j:0] selects the data within the line.
- The index selects ONE set of lines, of size 2^z.
- Cache hit if (PA[23:18-z] = TAG) and V, in the set.
- Set the D bit on a write.

Translation Lookaside Buffer (TLB)
[Figure: MIPS pipeline, virtual address into the TLB, 24-bit physical address out to user memory and kernel memory]
- On a TLB hit, the virtual address is translated into a 24-bit physical address by hardware.
- We never call the kernel.

Virtual Address and a Physical Address Cache
- It takes an extra memory access to translate VA to PA.
- This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
[Figure: CPU sends VA to translation, PA goes to the cache; on a hit the data returns to the CPU, on a miss main memory is accessed]

Virtual Address Cache
- Why access the cache with a PA at all? VA caches have a problem: the synonym / alias problem.
- Two different virtual addresses map to the same physical address => two different cache entries hold data for the same physical address.
- For an update: must update all cache entries with the same physical address, or memory becomes inconsistent.
- Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits; or a software-enforced alias boundary: same LSBs of VA and PA within the cache size.
[Figure: CPU sends VA to the cache; on a hit the data returns to the CPU, on a miss the VA is translated to a PA for main memory]

Translation Look-Aside Buffers
- Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
- TLBs are usually small, typically not more than 256 entries even on high-end machines. This permits fully associative lookup on these machines.
- Most mid-range machines use small n-way set associative organizations.
Translation with a TLB
[Figure: the CPU sends a VA to the TLB lookup; on a TLB hit the PA goes to the cache; on a TLB miss the translation is done through the OS page table; on a cache hit the data returns to the CPU, on a cache miss main memory is accessed]
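The translation path in the figure can be sketched in Python as a small fully associative TLB in front of a page table. Everything specific here is an assumption for illustration: the 4 KB page size, the 4-entry TLB, the LRU replacement, and the dictionary standing in for the OS page table.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KB pages: the low 12 bits are the page offset


class TLB:
    """Fully associative TLB model with LRU replacement (sketch only)."""

    def __init__(self, page_table, entries=4):
        self.page_table = page_table  # virtual page number -> physical frame
        self.entries = entries
        self.cache = OrderedDict()    # oldest translation first
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn = va >> PAGE_BITS
        offset = va & ((1 << PAGE_BITS) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)
        else:
            # TLB miss: fall back to the (much slower) page table walk.
            self.misses += 1
            if vpn not in self.page_table:
                raise LookupError(f"page fault on virtual page {vpn}")
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # drop the LRU translation
            self.cache[vpn] = self.page_table[vpn]
        return (self.cache[vpn] << PAGE_BITS) | offset


page_table = {0: 5, 1: 9}          # hypothetical per-process mapping
tlb = TLB(page_table)
for va in (0x0123, 0x1FFF, 0x0004):
    print(hex(va), "->", hex(tlb.translate(va)))
print("TLB hits:", tlb.hits, "misses:", tlb.misses)
```

The page offset passes through unchanged; only the page number is replaced, which is the property that overlapped cache and TLB access relies on.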
Reducing Translation Time
- Machines with TLBs go one step further to reduce the number of cycles per cache access.
- They overlap the cache access with the TLB access.
- This works because the high-order bits of the VA are used to look in the TLB while the low-order bits are used as the index into the cache.

Overlapped Cache & TLB Access
[Figure: the TLB does an associative lookup on the virtual page number while the page displacement indexes a cache of 1 K lines of 4 bytes; the PA from the TLB is compared with the cache tag to give hit/miss]
IF cache hit AND (cache tag = PA) THEN deliver data to CPU
ELSE IF [cache miss OR (cache tag <> PA)] AND TLB hit THEN access memory with the PA from the TLB
ELSE do standard VA translation

Problems With Overlapped TLB Access
- Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation.
- This usually limits things to small caches, large page sizes, or high n-way set associative caches if you want a large cache.
- Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K.
[Figure: an 11-bit cache index plus 2 byte-offset bits laid over the virtual page # and disp fields; the top index bit is changed by VA translation, but is needed for cache lookup]
- Solutions: go to 8 K byte page sizes; go to a 2-way set associative cache; or SW guarantees VA[13] = PA[13].
[Figure: 2-way set associative cache, two ways of 1 K lines of 4 bytes, 10-bit index]

Summary: Cache, TLB, Virtual Memory
- Caches, TLBs, and virtual memory are all understood by examining how they deal with 4 questions:
  - Where can data be placed?
  - How is data found?
  - What data is replaced on a miss?
  - How are writes handled (consistency problem)?
- Caches speed up average access time.
- Page tables map virtual addresses to physical addresses.
- TLBs are important for fast translation.
- TLB misses are significant in processor performance (some systems can't access all of the 2nd-level cache without TLB misses).

Summary: Memory Hierarchy
- Virtual memory was controversial at the time: can SW automatically manage 64 KB across many programs?
- 1000X DRAM growth removed the controversy.
- Today VM allows many processes to share a single memory without having to swap all processes to disk; VM protection is more important than the memory space increase.
- Today CPU time is a function of (ops, cache misses) vs. just f(ops): what does this mean to compilers, data structures, algorithms?
- VTune performance analyzer, cache misses.
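The constraint behind overlapped cache and TLB access can be checked with a few lines of Python. The sketch below assumes the usual statement of the rule: the index and block-offset bits stay inside the page offset when the cache size divided by the associativity does not exceed the page size. The function name and the 4 KB baseline page size are assumptions for illustration.

```python
def overlap_is_safe(cache_bytes, ways, page_bytes):
    """True if every cache index bit lies within the page offset.

    One way of the cache spans cache_bytes // ways bytes; if that span
    fits in a page, translation cannot change any bit used for indexing.
    """
    if cache_bytes <= 0 or ways <= 0 or page_bytes <= 0:
        raise ValueError("sizes and associativity must be positive")
    return cache_bytes // ways <= page_bytes


KB = 1024
cases = [
    ("4 KB direct mapped, 4 KB pages", 4 * KB, 1, 4 * KB),
    ("8 KB direct mapped, 4 KB pages", 8 * KB, 1, 4 * KB),
    ("8 KB 2-way,         4 KB pages", 8 * KB, 2, 4 * KB),
    ("8 KB direct mapped, 8 KB pages", 8 * KB, 1, 8 * KB),
]
for label, cache_bytes, ways, page_bytes in cases:
    verdict = "ok" if overlap_is_safe(cache_bytes, ways, page_bytes) else "index bit is translated"
    print(f"{label}: {verdict}")
```

The second case is the 8 K example from the slides; the third and fourth correspond to two of the listed solutions (2-way set associativity and larger pages). The software guarantee on the translated bit is not modelled here.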
Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.
Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple
More informationCMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5
Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:
More informationCMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,
More informationModern Computer Architecture
Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap
More informationCS61C : Machine Structures
CS 61C L24 VM II (1) ist.eecs.berkele.edu/~cs61c/su5 CS61C : Machie Structures Lecture #24: VM II Address Mappig: Virtual Address: VPN offset 25-8-2 Ad Carle idex ito page table located i phsical memor
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (I)
COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap
More informationMaster Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts
More informationTime. Who Cares About the Memory Hierarchy? Performance. Where Have We Been?
CS5 / EE5365 cache. Where Have We Been? Multi-Cycle Control Finite State Machines Microsequencing (Microprogramming) Exceptions Pipelining Datapath Making use of multi-cycle datapath Pipelining Control
More informationCourse Site: Copyright 2012, Elsevier Inc. All rights reserved.
Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios
More informationCSC 220: Computer Organization Unit 11 Basic Computer Organization and Design
College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:
More informationCS152: Computer Architecture and Engineering Caches and Virtual Memory. October 31, 1997 Dave Patterson (http.cs.berkeley.
CS152 Computer Architecture and Engineering Caches and Virtual Memory October 31, 1997 Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides http//www-inst.eecs.berkeley.edu/~cs152/ cs 152 L1
More informationTime. Recap: Who Cares About the Memory Hierarchy? Performance. Processor-DRAM Memory Gap (latency)
Recap Who Cares About the Hierarchy? -DRAM Gap (latency) CS52 Computer Architecture and Engineering s and Virtual October 3, 997 Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides http//www-inst.eecs.berkeley.edu/~cs52/
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationComputer Architecture ELEC3441
CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationCPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner
CPS104 Computer Organization and Programming Lecture 16: Virtual Memory Robert Wagner cps 104 VM.1 RW Fall 2000 Outline of Today s Lecture Virtual Memory. Paged virtual memory. Virtual to Physical translation:
More informationHandout 4 Memory Hierarchy
Handout 4 Memory Hierarchy Outline Memory hierarchy Locality Cache design Virtual address spaces Page table layout TLB design options (MMU Sub-system) Conclusion 2012/11/7 2 Since 1980, CPU has outpaced
More informationCMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationCOEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory
1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationCPS 104 Computer Organization and Programming Lecture 20: Virtual Memory
CPS 104 Computer Organization and Programming Lecture 20: Virtual Nov. 10, 1999 Dietolf (Dee) Ramm http://www.cs.duke.edu/~dr/cps104.html CPS 104 Lecture 20.1 Outline of Today s Lecture O Virtual. 6 Paged
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationECE468 Computer Organization and Architecture. Virtual Memory
ECE468 Computer Organization and Architecture Virtual Memory ECE468 vm.1 Review: The Principle of Locality Probability of reference 0 Address Space 2 The Principle of Locality: Program access a relatively
More informationECE4680 Computer Organization and Architecture. Virtual Memory
ECE468 Computer Organization and Architecture Virtual Memory If I can see it and I can touch it, it s real. If I can t see it but I can touch it, it s invisible. If I can see it but I can t touch it, it
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More information14:332:331. Week 13 Basics of Cache
14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006
More informationAppendix D. Controller Implementation
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);
More informationMemory Technologies. Technology Trends
. 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM
More informationThe University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.
Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive
More informationLecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University
Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27
More informationCS152 Computer Architecture and Engineering Lecture 18: Virtual Memory
CS152 Computer Architecture and Engineering Lecture 18: Virtual Memory March 22, 1995 Dave Patterson (patterson@cs) and Shing Kong (shingkong@engsuncom) Slides available on http://httpcsberkeleyedu/~patterson
More informationLECTURE 10: Improving Memory Access: Direct and Spatial caches
EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationCISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review
CISC 662 Graduate Computer Architecture Lecture 6 - Cache and virtual memory review Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David
More informationCS152 Computer Architecture and Engineering Lecture 17: Cache System
CS152 Computer Architecture and Engineering Lecture 17 System March 17, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http//http.cs.berkeley.edu/~patterson
More informationRegisters. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH
PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U
More informationMulti-Threading. Hyper-, Multi-, and Simultaneous Thread Execution
Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig
More informationChapter 4 The Datapath
The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationLecture 28: Data Link Layer
Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig
More information1&1 Next Level Hosting
1&1 Next Level Hostig Performace Level: Performace that grows with your requiremets Copyright 1&1 Iteret SE 2017 1ad1.com 2 1&1 NEXT LEVEL HOSTING 3 Fast page loadig ad short respose times play importat
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationSwitching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1
Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)
More informationCPE 631 Lecture 04: CPU Caches
Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR
More informationPage 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "
Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationLet!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies
1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of
More informationEECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements
EECS15 - Digital Design Lecture 11 SRAM (II), Caches September 29, 211 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http//www-inst.eecs.berkeley.edu/~cs15 Fall
More informationECE468 Computer Organization and Architecture. Memory Hierarchy
ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:
More informationCENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
Virtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
Python Programming: An Introduction to Computer Science
Python Programming: An Introduction to Computer Science Chapter 1 Computers and Programs 1 Objectives To understand the respective roles of hardware and software in a computing system. To learn what computer scientists
Memory Hierarchy Review
EECS 252 Graduate Computer Architecture Lecture 3 0 (continued) Review of Caches and Virtual January 27 th, 20 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley
Elementary Educational Computer
Chapter 5 Elementary Educational Computer. General structure of the Elementary Educational Computer (EEC) The EEC conforms to the 5 units structure defined by von Neumann's model (.) All units are presented in a simplified
CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department
Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Random Number. Used with permission under Creative
SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters.
SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters. The mean of a representative sample is a good estimate of the mean of the population that
Lecture 11. Virtual Memory Review: Memory Hierarchy
Lecture 11 Virtual Memory Review: Memory Hierarchy 1 Administration Homework 4 -Due 12/21 HW 4 Use your favorite language to write a cache simulator. Input: address trace, cache size, block size, associativity
Memory Hierarchy Y. K. Malaiya
Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath
Page 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
Threads and Concurrency in Java: Part 1
Concurrency Threads and Concurrency in Java: Part 1 What every computer engineer needs to know about concurrency: Concurrency is to untrained programmers as matches are to small children. It is all too easy to get burned.
Threads and Concurrency in Java: Part 1
Threads and Concurrency in Java: Part 1 1 Concurrency What every computer engineer needs to know about concurrency: Concurrency is to untrained programmers as matches are to small children. It is all too easy to get burned.
Page 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
Virtual Memory Virtual memory first used to relieve programmers from the burden of managing overlays.
CSE420 Virtual Memory Prof. Mokhtar Aboelaze York University Based on Slides by Prof. L. Bhuyan (UCR) Prof. M. Shaaban (RIT) Virtual Memory Virtual memory first used to relieve programmers from the burden
Performance (1/latency) 1000 100 10 Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0.
Since 1980, CPU has outpaced DRAM... EEL 5764: Graduate Computer Architecture Appendix C Hierarchy Review Ann Gordon-Ross Electrical and Computer Engineering University of Florida http://www.ann.ece.ufl.edu/
Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis
Intro to Algorithm Analysis Analysis Metrics Slides. Table of Contents. Analysis Metrics 3. Exact Analysis Rules 4. Simple Summation 5. Summation Formulas 6. Order of Magnitude 7. Big-O notation 8. Big-O Theorems
CS 111 Green: Program Design I Lecture 27: Speed (cont.); parting thoughts
CS 111 Green: Program Design I Lecture 27: Speed (cont.); parting thoughts By Nascarking - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=38671041 Robert H. Sloan (CS) & Rachel Poretsky
TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
ECE ECE4680
ECE468. -4-7 The Motivation for s System ECE468 Computer Organization and Architecture DRAM Hierarchy System Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access time
EITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache
EN1640: Design of Computing Systems Topic 06: Memory System
EN1640: Design of Computing Systems Topic 06: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third
Lecture 17 Introduction to Memory Hierarchies Why it's important Fundamental lesson(s) Suggested reading: (HP Chapter
Processor components Multicore processors and programming Processor comparison vs. Lecture 17 Introduction to Memory Hierarchies CSE 30321 Suggested reading: (HP Chapter 5.1-5.2) Writing more
Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components
Announcements Reading Chapter 4 (4.1-4.2) Project #4 is on the web note policy about project #3 missing components Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 in class 1 Project #4 notes IPv6Init,
CS61C Review of Cache/VM/TLB. Lecture 26. April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson)
CS61C Review of Cache/VM/TLB Lecture 26 April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs61c/schedule.html Outline Review Pipelining Review Cache/VM/TLB Review
Uniprocessors. HPC Prof. Robert van Engelen
Uniprocessors HPC Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Multiprocessors Processor architectures
Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics
Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3
Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy
ENG3380 Computer Organization and Architecture Cache Memory Part II Winter 2017 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses
CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today's Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS101 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today's Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
UNIVERSITY OF MORATUWA
UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Engineering 2014 Intake Semester 2 Examination CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours January 2016
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
Virtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
CS2410 Computer Architecture. Flynn's Taxonomy
CS2410 Computer Architecture Dept. of Computer Science University of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/index.html 1 Flynn's Taxonomy SISD Single instruction stream Single data stream (SIMD)
Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer
The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Datapath Today's Topics: technologies Technology trends Impact on performance Hierarchy The principle of locality
Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns - 2.5ns, $2000 - $5000 per GB Dynamic
ECE4050 Data Structures and Algorithms. Lecture 6: Searching
ECE4050 Data Structures and Algorithms Lecture 6: Searching 1 Search Given: Distinct keys k_1, k_2, ..., k_n and collection L of n records of the form (k_1, I_1), (k_2, I_2), ..., (k_n, I_n) where I_j is the information associated
Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky
Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data
V. Primary & Secondary Memory
V. Primary & Secondary Memory Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns - 2.5ns, $2000 - $5000 per GB Dynamic RAM (DRAM)