Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access
Cache Performance
Samira Khan
March 28, 2017

Agenda
- Review from last lecture
  - Cache access
  - Associativity
  - Replacement
- Cache performance

Cache Abstraction and Metrics
- The address is presented to the tag store and the data store.
  - Tag store: is the address in the cache? (plus bookkeeping). Produces hit/miss.
  - Data store: stores memory blocks.
- Cache hit rate = (# hits) / (# hits + # misses) = (# hits) / (# accesses)
- Average memory access time (AMAT) = (hit-rate × hit-latency) + (miss-rate × miss-latency)

Direct-Mapped Cache: Placement and Access
- Assume byte-addressable memory: 256 bytes, 8-byte blocks → 32 blocks
- Assume cache: 64 bytes, 8 blocks
- Direct-mapped: a block can go to only one location
- 8-bit address: 2-bit tag, 3-bit index, 3-bit byte in block
- Each tag store entry holds a valid bit (V) and a tag
- Addresses with the same index contend for the same location, which causes conflict misses
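The address split and the AMAT formula above can be sketched in C as follows. This is a minimal illustration for the toy configuration on the slide (8-bit addresses, 8-byte blocks, 8 cache blocks); the function and constant names are assumptions made for this example and do not come from the lecture.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths for the toy machine: 8-bit address = 2-bit tag,
   3-bit index, 3-bit byte-in-block. Names are illustrative only. */
enum { OFFSET_BITS = 3, INDEX_BITS = 3 };

/* Byte position inside the 8-byte block (lowest 3 bits). */
static unsigned block_offset(uint8_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1u);
}

/* Cache location the block must use (middle 3 bits). */
static unsigned cache_index(uint8_t addr) {
    return ((unsigned)addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
}

/* Remaining upper 2 bits, kept in the tag store. */
static unsigned cache_tag(uint8_t addr) {
    return (unsigned)addr >> (OFFSET_BITS + INDEX_BITS);
}

/* AMAT as written on the slide, with miss-rate = 1 - hit-rate. */
static double amat(double hit_rate, double hit_latency, double miss_latency) {
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency;
}
```

For instance, addresses 0x00 and 0x40 produce the same index but different tags, which is exactly the situation that leads to conflict misses in a direct-mapped cache.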
Direct-Mapped Cache: Placement and Access (example)
- Access sequence: A, B, A, B, A, B
- A = 0b 00 000 xxx, B = 0b 01 000 xxx (8-bit address: 2-bit tag, 3-bit index, 3-bit byte in block), so A and B share an index but differ in tag
- Access A: MISS. Fetch A and update the tag.
- Access B: tags do not match, MISS. Fetch block B and update the tag.
- Next access to A: tags do not match, MISS. Fetch block A and update the tag.

Set Associative Cache
- Same access sequence: A, B, A, B, A, B
- 8-bit address: 3-bit tag, 2-bit index, 3-bit byte in block
- A and B can reside in the same set at the same time, so the access is a HIT

Associativity (and Tradeoffs)
- Degree of associativity: how many blocks can map to the same index (or set)?
- Higher associativity
  ++ Higher hit rate
  -- Slower cache access time (hit latency and data access latency)
  -- More expensive hardware (more comparators)
- Diminishing returns from higher associativity
[Figure: hit rate versus associativity]
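The A, B, A, B, A, B example can be replayed with a small simulator. The sketch below assumes the 8-block cache from the slides, LRU replacement within a set, and concrete addresses 0x00 for A and 0x40 for B; `count_hits` and its parameters are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOTAL_BLOCKS = 8, OFFSET_BITS = 3 };

/* Counts hits for a trace of 8-bit addresses on an 8-block cache with the
   given number of ways (1 = direct-mapped). LRU uses a per-line timestamp. */
static int count_hits(const uint8_t *trace, size_t n, int ways) {
    int valid[TOTAL_BLOCKS] = {0};
    unsigned tag[TOTAL_BLOCKS] = {0};
    size_t last_use[TOTAL_BLOCKS] = {0};
    int hits = 0;

    assert(ways > 0 && ways <= TOTAL_BLOCKS && TOTAL_BLOCKS % ways == 0);
    int sets = TOTAL_BLOCKS / ways;

    for (size_t t = 0; t < n; ++t) {
        unsigned block = (unsigned)trace[t] >> OFFSET_BITS; /* memory block number */
        int set = (int)(block % (unsigned)sets);            /* index bits */
        unsigned tg = block / (unsigned)sets;               /* tag bits */
        int base = set * ways;
        int hit_way = -1;
        int victim = base;

        /* Compare the tag against every way of the selected set. */
        for (int w = base; w < base + ways; ++w)
            if (valid[w] && tag[w] == tg) hit_way = w;

        if (hit_way >= 0) {
            ++hits;
            last_use[hit_way] = t;
            continue;
        }
        /* Miss: use the first invalid way, otherwise the least recently used. */
        for (int w = base; w < base + ways; ++w) {
            if (!valid[w]) { victim = w; break; }
            if (last_use[w] < last_use[victim]) victim = w;
        }
        valid[victim] = 1;
        tag[victim] = tg;
        last_use[victim] = t;
    }
    return hits;
}
```

Under these assumptions the direct-mapped configuration misses on every access, while the 2-way configuration misses only on the first access to each block.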
Issues in Set-Associative Caches
- Think of each block in a set as having a priority, indicating how important it is to keep the block in the cache
- Key issue: how do you determine/adjust block priorities?
- There are three key decisions in a set: insertion, promotion, eviction (replacement)
  - Insertion: what happens to priorities on a cache fill? Where to insert the incoming block, and whether or not to insert the block
  - Promotion: what happens to priorities on a cache hit? Whether and how to change block priority
  - Eviction/replacement: what happens to priorities on a cache miss? Which block to evict and how to adjust priorities

Eviction/Replacement Policy
- Which block in the set to replace on a cache miss?
  - Any invalid block first
  - If all are valid, consult the replacement policy
    - Random
    - FIFO
    - Least recently used (how to implement?)
    - Not most recently used
    - Least frequently used
    - Hybrid replacement policies

[Figure: LRU example. After the access pattern A, C, B, D, one 4-way set holds A, B, C, D and A is the least recently used block. On the access to E (pattern A, C, B, D, E), E replaces A and the set holds E, B, C, D.]
[Figure sequence, access pattern A, C, B, D, E: with the set holding E, B, C, D, the recency positions are updated one block at a time until E is the most recently used block and C is the least recently used block.]
[Figure sequence, access pattern A, C, B, D, E, B: the hit on B promotes B to most recently used and the positions of the other blocks are adjusted.]

Implementing LRU
- Idea: evict the least recently accessed block
- Problem: need to keep track of the access ordering of blocks
- Question: 2-way set associative cache: what do you need to implement LRU perfectly?
- Question: 16-way set associative cache: what do you need to implement LRU perfectly? What is the logic needed to determine the LRU victim?
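One way to think about the bookkeeping is to keep each set as an explicit recency stack. The sketch below is a software model only, not a hardware design: it assumes a 4-way set with single-character block names, and `struct lru_set` and `lru_access` are hypothetical names. For a 2-way set a single bit per set is enough to record which way was used last; for higher associativity the full ordering must be tracked, which is what the array does here.

```c
#include <assert.h>
#include <string.h>

enum { WAYS = 4 };

/* One set kept as a recency stack: block[0] is the most recently used
   block, block[used - 1] is the least recently used block. */
struct lru_set {
    char block[WAYS];
    int used;
};

/* Accesses block b. Returns the evicted block, or 0 if nothing was evicted. */
static char lru_access(struct lru_set *s, char b) {
    char evicted = 0;
    int pos = 0;

    assert(s != NULL && b != 0);
    while (pos < s->used && s->block[pos] != b) ++pos;  /* search the set */

    if (pos == s->used) {               /* miss */
        if (s->used == WAYS) {          /* set is full: evict the LRU block */
            evicted = s->block[WAYS - 1];
            pos = WAYS - 1;
        } else {                        /* free way available */
            s->used++;
        }
    }
    /* Shift the more recent blocks down one position, then place b at MRU. */
    memmove(&s->block[1], &s->block[0], (size_t)pos);
    s->block[0] = b;
    return evicted;
}
```

Replaying the pattern A, C, B, D, E, B from the figures on an empty set evicts A when E arrives and leaves B as the most recently used block with C as the next victim.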
Approximations of LRU
- Most modern processors do not implement "true LRU" (also called "perfect LRU") in highly-associative caches
- Why?
  - True LRU is complex
  - LRU is an approximation to predict locality anyway (i.e., not the best possible cache management policy)
- Examples:
  - Not MRU (not most recently used)

Cache Replacement Policy: LRU or Random
- LRU vs. Random: which one is better?
  - Example: 4-way cache, cyclic references to A, B, C, D, E → 0% hit rate with LRU policy
- Set thrashing: when the "program working set" in a set is larger than the set associativity
  - Random replacement policy is better when thrashing occurs
- In practice:
  - Depends on workload
  - Average hit rates of LRU and Random are similar
- Best of both worlds: hybrid of LRU and Random
  - How to choose between the two? Set sampling
  - See Qureshi et al., "A Case for MLP-Aware Cache Replacement," ISCA.

What Is in a Tag Store Entry?
- Valid bit
- Tag
- Replacement policy bits
- Dirty bit?
  - Write back vs. write through caches

Handling Writes (I)
- When do we write the modified data in a cache to the next level?
  - Write through: at the time the write happens
  - Write back: when the block is evicted
- Write-back
  + Can consolidate multiple writes to the same block before eviction
    - Potentially saves bandwidth between cache levels and saves energy
  -- Need a bit in the tag store indicating the block is "dirty/modified"
- Write-through
  + Simpler
  + All levels are up to date. Consistent
  -- More bandwidth intensive; no coalescing of writes
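The set-thrashing example (4-way set, cyclic references to A, B, C, D, E) can be checked with a few lines of C. This is a sketch under the assumption of a single 4-way set with perfect LRU; blocks are numbered 0, 1, 2, ... instead of being named A to E, and `cyclic_lru_hits` is a hypothetical helper.

```c
#include <assert.h>

enum { WAYS = 4 };

/* Hits seen by one 4-way LRU set when `distinct` blocks, numbered
   0..distinct-1, are referenced cyclically for `accesses` references. */
static int cyclic_lru_hits(int distinct, int accesses) {
    int block[WAYS];
    int last_use[WAYS];
    int used = 0, hits = 0;

    assert(distinct > 0 && accesses >= 0);
    for (int t = 0; t < accesses; ++t) {
        int b = t % distinct;           /* next block in the cycle */
        int way = -1;

        for (int w = 0; w < used; ++w)  /* look the block up in the set */
            if (block[w] == b) way = w;

        if (way >= 0) {
            ++hits;
        } else if (used < WAYS) {
            way = used++;               /* fill an empty way */
        } else {
            way = 0;                    /* evict the least recently used way */
            for (int w = 1; w < WAYS; ++w)
                if (last_use[w] < last_use[way]) way = w;
        }
        block[way] = b;
        last_use[way] = t;
    }
    return hits;
}
```

With five blocks in the cycle, LRU always evicts the block that will be needed next, so every reference misses; with four blocks everything after the initial fills hits.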
Handling Writes (II)
- Do we allocate a cache block on a write miss?
  - Allocate on write miss
  - No-allocate on write miss
- Allocate on write miss
  + Can consolidate writes instead of writing each of them individually to the next level
  + Simpler because write misses can be treated the same way as read misses
  -- Requires (?) transfer of the whole cache block
- No-allocate
  + Conserves cache space if locality of writes is low (potentially better cache hit rate)

Instruction vs. Data Caches
- Separate or unified?
- Unified:
  + Dynamic sharing of cache space: no overprovisioning that might happen with static partitioning (i.e., split I and D caches)
  -- Instructions and data can thrash each other (i.e., no guaranteed space for either)
  -- I and D are accessed in different places in the pipeline. Where do we place the unified cache for fast access?
- First-level caches are almost always split
  - Mainly for the last reason above
- Second and higher levels are almost always unified

Multi-level Caching in a Pipelined Design
- First-level caches (instruction and data)
  - Decisions very much affected by cycle time
  - Small, lower associativity
  - Tag store and data store accessed in parallel
- Second-level, third-level caches
  - Decisions need to balance hit rate and access latency
  - Usually large and highly associative; latency less critical
  - Tag store and data store accessed serially
- Serial vs. parallel access of levels
  - Serial: second-level cache accessed only if the first level misses
  - Second level does not see the same accesses as the first
    - First level acts as a filter (filters some temporal and spatial locality)
    - Management policies are therefore different

Cache Performance
Cache Parameters vs. Miss/Hit Rate
- Cache size
- Block size
- Associativity
- Replacement policy
- Insertion/placement policy

Cache Size
- Cache size: total data (not including tag) capacity
  - Bigger can exploit temporal locality better
  - Not ALWAYS better
- Too large a cache adversely affects hit and miss latency
  - Smaller is faster => bigger is slower
  - Access time may degrade the critical path
- Too small a cache
  - Doesn't exploit temporal locality well
  - Useful data replaced often
- Working set: the whole set of data the executing application references
  - Within a time interval
[Figure: hit rate versus cache size, with the "working set" size marked]

Block Size
- Block size is the data that is associated with an address tag
- Too small blocks
  - Don't exploit spatial locality well
  - Have larger tag overhead
- Too large blocks
  - Too few total # of blocks → less temporal locality exploitation
  - Waste of cache space and bandwidth/energy if spatial locality is not high
- Will see more examples later
[Figure: hit rate versus block size]

Associativity
- How many blocks can map to the same index (or set)?
- Larger associativity
  - Lower miss rate, less variation among programs
  - Diminishing returns, higher hit latency
- Smaller associativity
  - Lower cost
  - Lower hit latency
    - Especially important for L1 caches
- Power of 2 associativity required?
[Figure: hit rate versus associativity]
Higher Associativity
[Figure: higher-associativity cache organization; 8-bit address split into a 4-bit tag, 1-bit index, and 3-bit byte in block]

Classification of Cache Misses
- Compulsory miss
  - First reference to an address (block) always results in a miss
  - Subsequent references should hit unless the cache block is displaced for the reasons below
- Capacity miss
  - Cache is too small to hold everything needed
  - Defined as the misses that would occur even in a fully-associative cache (with optimal replacement) of the same capacity
- Conflict miss
  - Defined as any miss that is neither a compulsory nor a capacity miss

How to Reduce Each Miss Type
- Compulsory
  - Caching cannot help
  - Prefetching
- Conflict
  - More associativity
  - Other ways to get more associativity without making the cache associative
    - Victim cache
    - Hashing
    - Software hints?
- Capacity
  - Utilize cache space better: keep blocks that will be referenced
  - Software management: divide the working set such that each "phase" fits in the cache
Cache Performance with Code Examples

Matrix Sum

    int sum1(int matrix[4][8]) {
        int sum = 0;
        for (int i = 0; i < 4; ++i) {
            for (int j = 0; j < 8; ++j) {
                sum += matrix[i][j];
            }
        }
        return sum;
    }

Access pattern: matrix[0][0], [0][1], [0][2], ..., [1][0], ...

Exploiting Spatial Locality
- 8B cache block, 4 blocks, LRU, 4B integer
- Access pattern: matrix[0][0], [0][1], [0][2], ..., [1][0], ...

    [0][0] → miss
    [0][1] → hit
    [0][2] → miss
    [0][3] → hit
    [0][4] → miss
    [0][5] → hit
    [0][6] → miss
    [0][7] → hit
    [1][0] → miss
    [1][1] → hit

- Cache blocks after the first row: [0][0]-[0][1], [0][2]-[0][3], [0][4]-[0][5], [0][6]-[0][7]
- Replace: [1][0]-[1][1] takes the place of [0][0]-[0][1], giving [1][0]-[1][1], [0][2]-[0][3], [0][4]-[0][5], [0][6]-[0][7]

Exploiting Spatial Locality
- Block size and spatial locality
  - Larger blocks exploit spatial locality
  - But larger blocks mean fewer blocks for the same size
  - Less good at exploiting temporal locality
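The miss/hit trace above can be reproduced with a small model of the cache. The sketch below assumes a fully associative cache of four 8-byte blocks with LRU replacement, 4-byte integers, and a matrix that starts on a block boundary; the slide does not state the associativity or the alignment, so both are assumptions. The `column_major` flag is a hypothetical addition that replays the same elements with the two loops swapped, for comparison.

```c
#include <assert.h>

enum { CACHE_BLOCKS = 4, INTS_PER_BLOCK = 2 };  /* 8B block / 4B int */

/* Misses when a rows x cols int matrix is summed once, either row by row
   (column_major == 0) or column by column (column_major != 0). */
static int count_misses(int rows, int cols, int column_major) {
    long block[CACHE_BLOCKS];
    int last_use[CACHE_BLOCKS];
    int used = 0, misses = 0;

    assert(rows > 0 && cols > 0);
    for (int t = 0; t < rows * cols; ++t) {
        /* Element visited at step t in the chosen loop order. */
        int i = column_major ? t % rows : t / cols;
        int j = column_major ? t / rows : t % cols;
        long b = ((long)i * cols + j) / INTS_PER_BLOCK;  /* memory block */
        int way = -1;

        for (int w = 0; w < used; ++w)
            if (block[w] == b) way = w;

        if (way < 0) {                      /* miss */
            ++misses;
            if (used < CACHE_BLOCKS) {
                way = used++;               /* fill an empty block */
            } else {
                way = 0;                    /* evict the LRU block */
                for (int w = 1; w < CACHE_BLOCKS; ++w)
                    if (last_use[w] < last_use[way]) way = w;
            }
            block[way] = b;
        }
        last_use[way] = t;
    }
    return misses;
}
```

In this model the 4 × 8 matrix misses on every other access in row order, matching the trace. The column order only gets its hits because the four rows happen to fit in the four cache blocks; with eight rows the same model misses on every access in column order.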
Alternate Matrix Sum

    int sum2(int matrix[4][8]) {
        int sum = 0;
        // swapped loop order
        for (int j = 0; j < 8; ++j) {
            for (int i = 0; i < 4; ++i) {
                sum += matrix[i][j];
            }
        }
        return sum;
    }

Access pattern: matrix[0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], ...

Bad at Exploiting Spatial Locality
- 8B cache block, 4B integer
- Access pattern: matrix[0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], ...

    [0][0] → miss
    [1][0] → miss
    [2][0] → miss
    [3][0] → miss
    [0][1] → hit
    [1][1] → hit
    [2][1] → hit
    [3][1] → hit
    [0][2] → miss
    [1][2] → miss

- Cache blocks after the first column: [0][0]-[0][1], [1][0]-[1][1], [2][0]-[2][1], [3][0]-[3][1]
- Replace: [0][2]-[0][3], [1][0]-[1][1], [2][0]-[2][1], [3][0]-[3][1]
- Replace: [0][2]-[0][3], [1][2]-[1][3], [2][0]-[2][1], [3][0]-[3][1]

A note on matrix storage
- A is an N × N matrix. It can be represented as a 2D array or as a flat array; the flat form makes dynamic sizes easier:

    float A_2d_array[N][N];
    float *A_flat = malloc(N * N * sizeof(float));
    A_flat[i * N + j] === A_2d_array[i][j]

Matrix squaring: $B_{ij} = \sum_{k} A_{ik} A_{kj}$

    /* version 1: inner loop is k, middle is j */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                B[i * N + j] += A[i * N + k] * A[k * N + j];
[Figure sequence: 4 × 4 matrices B and A (version 1, ijk order); the elements of A read for each term of $B_{00}$ are highlighted one term at a time.]

$B_{00} = \sum_{k=0}^{3} A_{0k} A_{k0}$
$B_{00} = (A_{00} A_{00}) + (A_{01} A_{10}) + (A_{02} A_{20}) + (A_{03} A_{30})$
[Figure sequence, continued: the last term of $B_{00}$, then the terms of $B_{01}$ and $B_{02}$; the row of A read through $A_{ik}$ is contiguous, while $A_{kj}$ walks down a column.]

$B_{00} = \sum_{k=0}^{3} A_{0k} A_{k0} = (A_{00} A_{00}) + (A_{01} A_{10}) + (A_{02} A_{20}) + (A_{03} A_{30})$
$B_{01} = \sum_{k=0}^{3} A_{0k} A_{k1} = (A_{00} A_{01}) + (A_{01} A_{11}) + (A_{02} A_{21}) + (A_{03} A_{31})$
$B_{02} = \sum_{k=0}^{3} A_{0k} A_{k2} = (A_{00} A_{02}) + (A_{01} A_{12}) + (A_{02} A_{22}) + (A_{03} A_{32})$

Conclusion
- $A_{ik}$ has spatial locality
- $B_{ij}$ has temporal locality
Matrix squaring: $B_{ij} = \sum_{k} A_{ik} A_{kj}$

    /* version 2: outer loop is k, middle is i */
    for (int k = 0; k < N; ++k)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                B[i * N + j] += A[i * N + k] * A[k * N + j];

Access pattern, k = 0, i = 0:

    B[0][0] += A[0][0] * A[0][0]
    B[0][1] += A[0][0] * A[0][1]
    B[0][2] += A[0][0] * A[0][2]
    B[0][3] += A[0][0] * A[0][3]

Access pattern, k = 0, i = 1:

    B[1][0] += A[1][0] * A[0][0]
    B[1][1] += A[1][0] * A[0][1]
    B[1][2] += A[1][0] * A[0][2]
    B[1][3] += A[1][0] * A[0][3]

kij order, k = 0, i = 0 (the first term of each sum is the one computed in this pass):

$B_{00} = (A_{00} A_{00}) + (A_{01} A_{10}) + (A_{02} A_{20}) + (A_{03} A_{30})$
$B_{01} = (A_{00} A_{01}) + (A_{01} A_{11}) + (A_{02} A_{21}) + (A_{03} A_{31})$
$B_{02} = (A_{00} A_{02}) + (A_{01} A_{12}) + (A_{02} A_{22}) + (A_{03} A_{32})$
$B_{03} = (A_{00} A_{03}) + (A_{01} A_{13}) + (A_{02} A_{23}) + (A_{03} A_{33})$

kij order, k = 0, i = 1:

$B_{10} = (A_{10} A_{00}) + (A_{11} A_{10}) + (A_{12} A_{20}) + (A_{13} A_{30})$
$B_{11} = (A_{10} A_{01}) + (A_{11} A_{11}) + (A_{12} A_{21}) + (A_{13} A_{31})$
$B_{12} = (A_{10} A_{02}) + (A_{11} A_{12}) + (A_{12} A_{22}) + (A_{13} A_{32})$
$B_{13} = (A_{10} A_{03}) + (A_{11} A_{13}) + (A_{12} A_{23}) + (A_{13} A_{33})$

kij order
- $B_{ij}$ and $A_{kj}$ have spatial locality
- $A_{ik}$ has temporal locality

ijk order
- $A_{ik}$ has spatial locality
- $B_{ij}$ has temporal locality
Which order is better?
- Order kij performs much better
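Both loop orders compute the same result; only the order in which memory is touched changes. The sketch below wraps the two versions from the slides in functions so they can be compared on the same input. The function names `square_ijk` and `square_kij` and the use of `size_t` are assumptions for this example, and the code checks equivalence only; it does not measure performance.

```c
#include <assert.h>
#include <stddef.h>

/* B += A * A for an n x n matrix stored as a flat row-major array.
   Version 1: ijk order (inner loop walks down a column of A). */
static void square_ijk(size_t n, const float *A, float *B) {
    assert(A != NULL && B != NULL);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < n; ++k)
                B[i * n + j] += A[i * n + k] * A[k * n + j];
}

/* Same computation, version 2: kij order (inner loop walks along a row
   of both B and A, so consecutive iterations touch adjacent elements). */
static void square_kij(size_t n, const float *A, float *B) {
    assert(A != NULL && B != NULL);
    for (size_t k = 0; k < n; ++k)
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                B[i * n + j] += A[i * n + k] * A[k * n + j];
}
```

Both functions accumulate into B, so B must be zeroed before the call if only the product is wanted. To observe the performance difference claimed above, one could time the two functions on a matrix much larger than the cache.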
More informationStoring Matrices on Disk: Theory and Practice Revisited
Storng Matrces on Dsk: Theory and Practce Revsted Y Zhang Duke Unversty yzhang@cs.duke.edu Kamesh Munagala Duke Unversty kamesh@cs.duke.edu Jun Yang Duke Unversty junyang@cs.duke.edu ASTRACT We consder
More informationA Predictable Execution Model for COTS-based Embedded Systems
2011 17th IEEE Real-Tme and Embedded Technology and Applcatons Symposum A Predctable Executon Model for COTS-based Embedded Systems Rodolfo Pellzzon, Emlano Bett, Stanley Bak, Gang Yao, John Crswell, Marco
More informationCircuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)
Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,
More informationCS1100 Introduction to Programming
Factoral (n) Recursve Program fact(n) = n*fact(n-) CS00 Introducton to Programmng Recurson and Sortng Madhu Mutyam Department of Computer Scence and Engneerng Indan Insttute of Technology Madras nt fact
More informationNews. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example
Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt
More informationProgramming in Fortran 90 : 2017/2018
Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values
More informationLECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming
CEE 60 Davd Rosenberg p. LECTURE NOTES Dualty Theory, Senstvty Analyss, and Parametrc Programmng Learnng Objectves. Revew the prmal LP model formulaton 2. Formulate the Dual Problem of an LP problem (TUES)
More informationQ3: Block Replacement. Replacement Algorithms. ECE473 Computer Architecture and Organization. Memory Hierarchy: Set Associative Cache
Fundamental Questions Computer Architecture and Organization Hierarchy: Set Associative Q: Where can a block be placed in the upper level? (Block placement) Q: How is a block found if it is in the upper
More informationSolving Planted Motif Problem on GPU
Solvng Planted Motf Problem on GPU Naga Shalaja Dasar Old Domnon Unversty Norfolk, VA, USA ndasar@cs.odu.edu Ranjan Desh Old Domnon Unversty Norfolk, VA, USA dranjan@cs.odu.edu Zubar M Old Domnon Unversty
More informationConcurrent models of computation for embedded software
Concurrent models of computaton for embedded software and hardware! Researcher overvew what t looks lke semantcs what t means and how t relates desgnng an actor language actor propertes and how to represent
More informationStorage Binding in RTL synthesis
Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne
More informationConditional Speculative Decimal Addition*
Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant
More informationFRES-CAR: An Adaptive Cache Replacement Policy
FRES-CAR: An Adaptve Cache Replacement Polcy George Palls, Athena Vaal, Eythms Sdropoulos Department of Informatcs Arstotle Unversty of Thessalon, 54124, Thessalon, Greece gpalls@ccf.auth.gr, {avaal, eythms}@csd.auth.gr
More informationOutline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011
9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals
More informationClustered Multimedia NOD : Popularity-Based Article Prefetching and Placement
Clustered Multmeda NOD : Popularty-Based Artcle Prefetchng and Placement Y.J.Km, T.U.Cho, K.O.Jung, Y.K.Kang, S.H.Park, K-Dong Chung Department of Computer Scence, Pusan Natonal Unversty, Korea Abstract
More informationBrave New World Pseudocode Reference
Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be
More informationComputer Animation and Visualisation. Lecture 4. Rigging / Skinning
Computer Anmaton and Vsualsaton Lecture 4. Rggng / Sknnng Taku Komura Overvew Sknnng / Rggng Background knowledge Lnear Blendng How to decde weghts? Example-based Method Anatomcal models Sknnng Assume
More informationCSE 326: Data Structures Quicksort Comparison Sorting Bound
CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson
More informationPerformance Study of Parallel Programming on Cloud Computing Environments Using MapReduce
Performance Study of Parallel Programmng on Cloud Computng Envronments Usng MapReduce Wen-Chung Shh, Shan-Shyong Tseng Department of Informaton Scence and Applcatons Asa Unversty Tachung, 41354, Tawan
More informationUSING GRAPHING SKILLS
Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp
More informationA Statistical Model Selection Strategy Applied to Neural Networks
A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos
More informationSample Solution. Advanced Computer Networks P 1 P 2 P 3 P 4 P 5. Module: IN2097 Date: Examiner: Prof. Dr.-Ing. Georg Carle Exam: Final exam
Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans
More informationChapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative
Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory
More informationMultiple Sub-Row Buffers in DRAM: Unlocking Performance and Energy Improvement Opportunities
Multple Sub-Row Buffers n DRAM: Unlockng Performance and Energy Improvement Opportuntes ABSTRACT Nagendra Gulur Texas Instruments (Inda) nagendra@t.com Mahesh Mehendale Texas Instruments (Inda) m-mehendale@t.com
More informationMQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices
MQSm: A Framework for Enablng Realstc Studes of Modern Mult-Queue SSD Devces Arash Tavakkol, Juan Gómez-Luna, and Mohammad Sadrosadat, ETH Zürch; Saugata Ghose, Carnege Mellon Unversty; Onur Mutlu, ETH
More informationVIRTUAL MEMORY READING: CHAPTER 9
VIRTUAL MEMORY READING: CHAPTER 9 9 MEMORY HIERARCHY Core! Processor! Core! Caching! Main! Memory! (DRAM)!! Caching!! Secondary Storage (SSD)!!!! Secondary Storage (Disk)! L cache exclusive to a single
More informationReliability and Energy-aware Cache Reconfiguration for Embedded Systems
Relablty and Energy-aware Cache Reconfguraton for Embedded Systems Yuanwen Huang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle FL 326-62, USA
More informationParallelism for Nested Loops with Non-uniform and Flow Dependences
Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr
More information