The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.
|
|
- Debra Hunter
- 5 years ago
- Views:
Transcription
1 Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive per bit tha slower memory Solutio: orgaize memory system ito a hierarchy Etire addressable memory space available i largest, slowest memory Icremetally smaller ad faster memories, each cotaiig a subset of the memory below it, proceed i steps up toward the processor Temporal ad spatial locality isures that early all refereces ca be foud i smaller memories Gives the allusio of a large, fast memory beig preseted to the processor Itroductio 2 Chapter 2 Istructios: Laguage of the Computer 1
2 Memory Hierarchy Memory Performace Gap 3 4Itroductio Chapter 2 Istructios: Laguage of the Computer 2
3 Memory Hierarchy Desig Memory hierarchy desig becomes more crucial with recet multi-core processors: Aggregate peak badwidth grows with # cores: Itel Core i7 ca geerate two refereces per core per clock Four cores ad 3.2 GHz clock 25.6 billio 64-bit data refereces/secod billio 128-bit istructio refereces/secod = GB/s! DRAM badwidth is oly 8% of this (34.1 GB/s) Requires: Multi-port, pipelied caches Two levels of cache per core Shared third-level cache o chip Performace ad Power High-ed microprocessors have >10 MB o-chip cache Cosumes large amout of area ad power budget 5 6Itroductio Chapter 2 Istructios: Laguage of the Computer 3
4 Memory Hierarchy Basics Whe a word is ot foud i the cache, a miss occurs: Fetch word from lower level i hierarchy, requirig a higher latecy referece Lower level may be aother cache or the mai memory Also fetch the other words cotaied withi the block Takes advatage of spatial locality Place block ito cache i ay locatio withi its set, determied by address block address MOD umber of sets i cache Memory Hierarchy Basics sets => -way set associative Direct-mapped cache => oe block per set Fully associative => oe set Writig to cache: two strategies Write-through Immediately update lower levels of hierarchy Write-back Oly update lower levels of hierarchy whe a updated block is replaced Both strategies use write buffer to make writes asychroous 7 8Itroductio Chapter 2 Istructios: Laguage of the Computer 4
5 Memory Hierarchy Basics Miss rate Fractio of cache access that result i a miss Itroductio Causes of misses Compulsory First referece to a block Capacity Blocks discarded ad later retrieved Coflict Program makes repeated refereces to multiple addresses from differet blocks that map to the same locatio i the cache 9 Memory Hierarchy Basics Itroductio Speculative ad multithreaded processors may execute other istructios durig a miss Reduces performace impact of misses 10 Chapter 2 Istructios: Laguage of the Computer 5
6 Memory Hierarchy Basics Six basic cache optimizatios: Larger block size Reduces compulsory misses Icreases capacity ad coflict misses, icreases miss pealty Larger total cache capacity to reduce miss rate Icreases hit time, icreases power cosumptio Higher associativity Reduces coflict misses Icreases hit time, icreases power cosumptio Higher umber of cache levels Reduces overall memory access time Givig priority to read misses over writes Reduces miss pealty Avoidig address traslatio i cache idexig Reduces hit time Itroductio 11 Memory Techology ad Optimizatios Performace metrics Latecy is cocer of cache Badwidth is cocer of multiprocessors ad I/O Access time Time betwee read request ad whe desired word arrives Cycle time Miimum time betwee urelated requests to memory Memory Techology ad Optimizatios SRAM memory has low latecy, use for cache Orgaize DRAM chips ito may baks for high badwidth, use for mai memory 12 Chapter 2 Istructios: Laguage of the Computer 6
7 Memory Techology SRAM Requires low power to retai bit Requires 6 trasistors/bit DRAM Must be re-writte after beig read Must also be periodically refreshed Every ~ 8 ms (roughly 5% of time) Each row ca be refreshed simultaeously Oe trasistor/bit Address lies are multiplexed: Upper half of address: row access strobe (RAS) Lower half of address: colum access strobe (CAS) Memory Techology ad Optimizatios 13 Iteral Orgaizatio of DRAM Memory Techology ad Optimizatios 14 Chapter 2 Istructios: Laguage of the Computer 7
8 Memory Techology Amdahl: Memory capacity should grow liearly with processor speed Ufortuately, memory capacity ad speed has ot kept pace with processors Some optimizatios: Multiple accesses to same row Sychroous DRAM Added clock to DRAM iterface Burst mode with critical word first Wider iterfaces Double data rate (DDR) Multiple baks o each DRAM device Memory Techology ad Optimizatios 15 Memory Optimizatios Memory Techology ad Optimizatios 16 Chapter 2 Istructios: Laguage of the Computer 8
9 Memory Optimizatios Memory Techology ad Optimizatios 17 Memory Optimizatios DDR: DDR2 Lower power (2.5 V -> 1.8 V) Higher clock rates (266 MHz, 333 MHz, 400 MHz) DDR3 1.5 V 800 MHz DDR V 1333 MHz Memory Techology ad Optimizatios GDDR5 is graphics memory based o DDR3 18 Chapter 2 Istructios: Laguage of the Computer 9
10 Memory Optimizatios Reducig power i SDRAMs: Lower voltage Low power mode (igores clock, cotiues to refresh) Graphics memory: Achieve 2-5 X badwidth per DRAM vs. DDR3 Wider iterfaces (32 vs. 16 bit) Higher clock rate Possible because they are attached via solderig istead of socketted DIMM modules Memory Techology ad Optimizatios 19 Memory Power Cosumptio Memory Techology ad Optimizatios 20 Chapter 2 Istructios: Laguage of the Computer 10
11 Stacked/Embedded DRAMs Stacked DRAMs i same package as processor High Badwidth Memory (HBM) Memory Techology ad Optimizatios 21 Flash Memory Type of EEPROM Types: NAND (deser) ad NOR (faster) NAND Flash: Reads are sequetial, reads etire page (.5 to 4 KiB) 25 us for first byte, 40 MiB/s for subsequet bytes SDRAM: 40 s for first byte, 4.8 GB/s for subsequet bytes 2 KiB trasfer: 75 us vs 500 s for SDRAM, 150X slower 300 to 500X faster tha magetic disk Memory Techology ad Optimizatios 22 Chapter 2 Istructios: Laguage of the Computer 11
12 NAND Flash Memory Must be erased (i blocks) before beig overwritte Novolatile, ca use as little as zero power Limited umber of write cycles (~100,000) $2/GiB, compared to $20-40/GiB for SDRAM ad $0.09 GiB for magetic disk Memory Techology ad Optimizatios Phase-Chage/Memrister Memory Possibly 10X improvemet i write performace ad 2X improvemet i read performace 23 Memory Depedability Memory is susceptible to cosmic rays Soft errors: dyamic errors Detected ad fixed by error correctig codes (ECC) Hard errors: permaet errors Use spare rows to replace defective rows Chipkill: a RAID-like error recovery techique Memory Techology ad Optimizatios 24 Chapter 2 Istructios: Laguage of the Computer 12
13 Advaced Optimizatios Reduce hit time Small ad simple first-level caches Way predictio Icrease badwidth Pipelied caches, multibaked caches, o-blockig caches Reduce miss pealty Critical word first, mergig write buffers Reduce miss rate Compiler optimizatios Reduce miss pealty or miss rate via parallelizatio Hardware or compiler prefetchig Advaced Optimizatios 25 L1 Size ad Associativity Advaced Optimizatios Access time vs. size ad associativity 26 Chapter 2 Istructios: Laguage of the Computer 13
14 L1 Size ad Associativity Advaced Optimizatios Eergy per read vs. size ad associativity 27 Way Predictio To improve hit time, predict the way to pre-set mux Mis-predictio gives loger hit time Predictio accuracy > 90% for two-way > 80% for four-way I-cache has better accuracy tha D-cache First used o MIPS R10000 i mid-90s Used o ARM Cortex-A8 Exted to predict block as well Way selectio Icreases mis-predictio pealty Advaced Optimizatios 28 Chapter 2 Istructios: Laguage of the Computer 14
15 Pipelied Caches Pipelie cache access to improve badwidth Examples: Petium: 1 cycle Petium Pro Petium III: 2 cycles Petium 4 Core i7: 4 cycles Icreases brach mis-predictio pealty Makes it easier to icrease associativity Advaced Optimizatios 29 Multibaked Caches Orgaize cache as idepedet baks to support simultaeous access ARM Cortex-A8 supports 1-4 baks for L2 Itel i7 supports 4 baks for L1 ad 8 baks for L2 Advaced Optimizatios Iterleave baks accordig to block address 30 Chapter 2 Istructios: Laguage of the Computer 15
16 Noblockig Caches Allow hits before previous misses complete Hit uder miss Hit uder multiple miss L2 must support this I geeral, processors ca hide L1 miss pealty but ot L2 miss pealty Advaced Optimizatios 31 Critical Word First, Early Restart Critical word first Request missed word from memory first Sed it to the processor as soo as it arrives Early restart Request words i ormal order Sed missed work to the processor as soo as it arrives Advaced Optimizatios Effectiveess of these strategies depeds o block size ad likelihood of aother access to the portio of the block that has ot yet bee fetched 32 Chapter 2 Istructios: Laguage of the Computer 16
17 Mergig Write Buffer Whe storig to a block that is already pedig i the write buffer, update write buffer Reduces stalls due to full write buffer Do ot apply to I/O addresses Advaced Optimizatios No write bufferig Write bufferig 33 Compiler Optimizatios Loop Iterchage Swap ested loops to access memory i sequetial order Blockig Istead of accessig etire rows or colums, subdivide matrices ito blocks Requires more memory accesses but improves locality of accesses Advaced Optimizatios 34 Chapter 2 Istructios: Laguage of the Computer 17
18 Blockig for (i = 0; i < N; i = i + 1) }; for (j = 0; j < N; j = j + 1) { r = 0; for (k = 0; k < N; k = k + 1) r = r + y[i][k]*z[k][j]; x[i][j] = r; 35 Blockig for (jj = 0; jj < N; jj = jj + B) }; for (kk = 0; kk < N; kk = kk + B) for (i = 0; i < N; i = i + 1) for (j = jj; j < mi(jj + B,N); j = j + 1) { r = 0; for (k = kk; k < mi(kk + B,N); k = k + 1) r = r + y[i][k]*z[k][j]; x[i][j] = x[i][j] + r; 36 Chapter 2 Istructios: Laguage of the Computer 18
19 Hardware Prefetchig Fetch two blocks o miss (iclude ext sequetial block) Advaced Optimizatios Petium 4 Pre-fetchig 37 Compiler Prefetchig Isert prefetch istructios before data is eeded No-faultig: prefetch does t cause exceptios Advaced Optimizatios Register prefetch Loads data ito register Cache prefetch Loads data ito cache Combie with loop urollig ad software pipeliig 38 Chapter 2 Istructios: Laguage of the Computer 19
20 Use HBM to Exted Hierarchy 128 MiB to 1 GiB Smaller blocks require substatial tag storage Larger blocks are potetially iefficiet Advaced Optimizatios Oe approach (L-H): Each SDRAM row is a block idex Each row cotais set of tags ad 29 data segmets 29-set associative Hit requires a CAS 39 Use HBM to Exted Hierarchy Aother approach (Alloy cache): Mold tag ad data together Use direct mapped Advaced Optimizatios Both schemes require two DRAM accesses for misses Two solutios: Use map to keep track of blocks Predict likely misses 40 Chapter 2 Istructios: Laguage of the Computer 20
21 Use HBM to Exted Hierarchy Advaced Optimizatios 41 Summary Advaced Optimizatios 42 Chapter 2 Istructios: Laguage of the Computer 21
22 Virtual Memory ad Virtual Machies Protectio via virtual memory Keeps processes i their ow memory space Role of architecture Provide user mode ad supervisor mode Protect certai aspects of CPU state Provide mechaisms for switchig betwee user mode ad supervisor mode Provide mechaisms to limit memory accesses Provide TLB to traslate addresses Virtual Memory ad Virtual Machies 43 Virtual Machies Supports isolatio ad security Sharig a computer amog may urelated users Eabled by raw speed of processors, makig the overhead more acceptable Allows differet ISAs ad operatig systems to be preseted to user programs System Virtual Machies SVM software is called virtual machie moitor or hypervisor Idividual virtual machies ru uder the moitor are called guest VMs Virtual Memory ad Virtual Machies 44 Chapter 2 Istructios: Laguage of the Computer 22
23 Requiremets of VMM Guest software should: Behave o as if ruig o ative hardware Not be able to chage allocatio of real system resources VMM should be able to cotext switch guests Hardware must allow: System ad use processor modes Privileged subset of istructios for allocatig system resources Virtual Memory ad Virtual Machies 45 Impact of VMs o Virtual Memory Each guest OS maitais its ow set of page tables VMM adds a level of memory betwee physical ad virtual memory called real memory VMM maitais shadow page table that maps guest virtual addresses to physical addresses Requires VMM to detect guest s chages to its ow page table Occurs aturally if accessig the page table poiter is a privileged operatio Virtual Memory ad Virtual Machies 46 Chapter 2 Istructios: Laguage of the Computer 23
24 Extedig the ISA for Virtualizatio Objectives: Avoid flushig TLB Use ested page tables istead of shadow page tables Allow devices to use DMA to move data Allow guest OS s to hadle device iterrupts For security: allow programs to maage ecrypted portios of code ad data Virtual Memory ad Virtual Machies 47 Fallacies ad Pitfalls Predictig cache performace of oe program from aother Simulatig eough istructios to get accurate performace measures of the memory hierarchy Not deliveryig high memory badwidth i a cache-based system 48 Chapter 2 Istructios: Laguage of the Computer 24
Course Site: Copyright 2012, Elsevier Inc. All rights reserved.
Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios
More informationMaster Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast
More informationCMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.
Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple
More informationCMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (III)
COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory
More informationCSE 502 Graduate Computer Architecture
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 CSE 502 Graduate Computer Architecture Lec 11-14 Advanced Memory Memory Hierarchy Design Larry Wittie Computer Science, StonyBrook
More informationMulti-Threading. Hyper-, Multi-, and Simultaneous Thread Execution
Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5
Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:
More informationMemories. CPE480/CS480/EE480, Spring Hank Dietz.
Memories CPE480/CS480/EE480, Spring 2018 Hank Dietz http://aggregate.org/ee480 What we want, what we have What we want: Unlimited memory space Fast, constant, access time (UMA: Uniform Memory Access) What
More informationCMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationPage 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!
Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationCSC 220: Computer Organization Unit 11 Basic Computer Organization and Design
College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must
More informationIntroduction to cache memories
Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space
More informationAppendix D. Controller Implementation
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies
More informationComputer Architecture ELEC3441
CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by
More informationOperating System Concepts. Operating System Concepts
Chapter 4: Mass-Storage Systems Logical Disk Structure Logical Disk Structure Disk Schedulig Disk Maagemet RAID Structure Disk drives are addressed as large -dimesioal arrays of logical blocks, where the
More informationMemory Hierarchy Basics
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases
More informationMEMORY HIERARCHY DESIGN
1 MEMORY HIERARCHY DESIGN Chapter 2 OUTLINE Introduction Basics of Memory Hierarchy Advanced Optimizations of Cache Memory Technology and Optimizations 2 READINGASSIGNMENT Important! 3 Read Appendix B
More informationChapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings
Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The
More informationMemory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple
Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss
More informationUniprocessors. HPC Prof. Robert van Engelen
Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures
More informationUNIVERSITY OF MORATUWA
UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016
More information1. SWITCHING FUNDAMENTALS
. SWITCING FUNDMENTLS Switchig is the provisio of a o-demad coectio betwee two ed poits. Two distict switchig techiques are employed i commuicatio etwors-- circuit switchig ad pacet switchig. Circuit switchig
More informationBasic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.
5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator
More informationCS61C : Machine Structures
CS 61C L24 VM II (1) ist.eecs.berkele.edu/~cs61c/su5 CS61C : Machie Structures Lecture #24: VM II Address Mappig: Virtual Address: VPN offset 25-8-2 Ad Carle idex ito page table located i phsical memor
More informationUH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu
UH-MEM: Utility-Based Hybrid Memory Maagemet Yag Li, Saugata Ghose, Jogmoo Choi, Ji Su, Hui Wag, Our Mutlu 1 Executive Summary DRAM faces sigificat techology scalig difficulties Emergig memory techologies
More informationReducing SDRAM Energy Consumption in Embedded Systems Λ
Reducig SDRAM Eergy Cosumptio i Embedded Systems Λ Jelea Trajkovic, Alexader Veidebaum Uiversity of Califoria, Irvie fjeleat,alexvg@ics.uci.edu Abstract DRAM eergy cosumptio i embedded systems ca be very
More informationInstruction and Data Streams
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad
More informationFundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018
Fudametals of Chapter 1 Microprocessor ad Microcotroller Dr. Farid Farahmad Updated: Tuesday, Jauary 16, 2018 Evolutio First came trasistors Itegrated circuits SSI (Small-Scale Itegratio) to ULSI Very
More informationn Explore virtualization concepts n Become familiar with cloud concepts
Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Review Istructio Set Architecture Istructio Set The repertoire of istructios of a computer Differet computers have differet istructio
More informationCMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago
CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device
More informationMultiprocessors. HPC Prof. Robert van Engelen
Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies
More informationLRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.
LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E
More informationMemory technology and optimizations ( 2.3) Main Memory
Memory technology and optimizations ( 2.3) 47 Main Memory Performance of Main Memory: Latency: affects Cache Miss Penalty» Access Time: time between request and word arrival» Cycle Time: minimum time between
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 22 Database Recovery Techiques Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Recovery algorithms Recovery cocepts Write-ahead
More informationOutline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers
Outlie CSCI 4730 s! What is a s?!! System Compoet Architecture s Overview Questios What is a?! What are the major operatig system compoets?! What are basic computer system orgaizatios?! How do you commuicate
More informationArquitectura de Computadores
Arquitectura de Computadores Capítulo 2. Procesadores segmetados Based o the origial material of the book: D.A. Patterso y J.L. Heessy Computer Orgaizatio ad Desig: The Hardware/Software Iterface 4 th
More informationSCI Reflective Memory
Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o
More informationLecture 1: Introduction and Fundamental Concepts 1
Uderstadig Performace Lecture : Fudametal Cocepts ad Performace Aalysis CENG 332 Algorithm Determies umber of operatios executed Programmig laguage, compiler, architecture Determie umber of machie istructios
More informationChapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig
More information,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics
,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also
More informationCOSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1
COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,
More informationn Learn how resiliency strategies reduce risk n Discover automation strategies to reduce risk
Chapter Objectives Lear how resiliecy strategies reduce risk Discover automatio strategies to reduce risk Chapter #16: Architecture ad Desig Resiliecy ad Automatio Strategies 2 Automatio/Scriptig Resiliet
More informationA collection of open-sourced RISC-V processors
Riscy Processors A collectio of ope-sourced RISC-V processors Ady Wright, Sizhuo Zhag, Thomas Bourgeat, Murali Vijayaraghava, Jamey Hicks, Arvid Computatio Structures Group, CSAIL, MIT 4 th RISC-V Workshop
More informationCS2410 Computer Architecture. Flynn s Taxonomy
CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)
More informationReliable Transmission. Spring 2018 CS 438 Staff - University of Illinois 1
Reliable Trasmissio Sprig 2018 CS 438 Staff - Uiversity of Illiois 1 Reliable Trasmissio Hello! My computer s ame is Alice. Alice Bob Hello! Alice. Sprig 2018 CS 438 Staff - Uiversity of Illiois 2 Reliable
More informationAvid Interplay Bundle
Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers
More informationFAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS
SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG
More informationECE4050 Data Structures and Algorithms. Lecture 6: Searching
ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 4 The Processor Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler CPI ad Cycle time Determied
More informationEnd Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization
Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationAdvanced Computer Architecture- 06CS81-Memory Hierarchy Design
Advanced Computer Architecture- 06CS81-Memory Hierarchy Design AMAT and Processor Performance AMAT = Average Memory Access Time Miss-oriented Approach to Memory Access CPIExec includes ALU and Memory instructions
More informationChapter 9. Pointers and Dynamic Arrays. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 9 Poiters ad Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 9.1 Poiters 9.2 Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Slide 9-3
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationCSC 631: High-Performance Computer Architecture
CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 9: Memory Part I CSC 631: High-Performance Computer Architecture 1 Introduction Programmers want unlimited amounts of memory with low
More informationPython Programming: An Introduction to Computer Science
Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists
More informationThreads and Concurrency in Java: Part 1
Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationThreads and Concurrency in Java: Part 1
Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationOnes Assignment Method for Solving Traveling Salesman Problem
Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:
More informationComputer Graphics Hardware An Overview
Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)
More informationLecture 28: Data Link Layer
Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig
More informationThe VSS CCD photometry spreadsheet
The VSS CCD photometry spreadsheet Itroductio This Excel spreadsheet has bee developed ad tested by the BAA VSS for aalysig results files produced by the multi-image CCD photometry procedure i AIP4Wi v2.
More informationElementary Educational Computer
Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified
More informationLower Bounds for Sorting
Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig
More informationAPPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS
APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful
More informationLecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming
Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis
More informationReview: The ACID properties
Recovery Review: The ACID properties A tomicity: All actios i the Xactio happe, or oe happe. C osistecy: If each Xactio is cosistet, ad the DB starts cosistet, it eds up cosistet. I solatio: Executio of
More informationCSE 305. Computer Architecture
CSE 305 Computer Architecture Computer Architecture Course Teachers Rifat Shahriyar (rifat1816@gmail.com) Johra Muhammad Moosa Textbook Computer Orgaizatio ad Desig (The Hardware/Software Iterface) David
More informationSession Initiated Protocol (SIP) and Message-based Load Balancing (MBLB)
F5 White Paper Sessio Iitiated Protocol (SIP) ad Message-based Load Balacig (MBLB) The ability to provide ew ad creative methods of commuicatios has esured a SIP presece i almost every orgaizatio. The
More informationU8 Flash Memory Controller
U8 Flash Memory Cotroller U8 U8 Flash Memory Cotroller The Hyperstoe U8 family of Flash Memory Cotrollers together with provided applicatio ad Flash specific firmware offers a easy-to-use turkey platform
More informationAn Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem
A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.
More informationMEMORY HIERARCHY DESIGN. B649 Parallel Architectures and Programming
MEMORY HIERARCHY DESIGN B649 Parallel Architectures and Programming Basic Optimizations Average memory access time = Hit time + Miss rate Miss penalty Larger block size to reduce miss rate Larger caches
More informationAnalysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis
Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems
More informationChapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.
Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.5) Memory Technologies Dynamic Random Access Memory (DRAM) Optimized
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More information