Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden
Transcription
1 Optimizing for Speed
Erik Hagersten, Uppsala University, Sweden

What is the potential gain?
  Latency difference L$ and mem: ~5x
  Bandwidth difference L$ and mem: ~x
  Repeated TLB misses add a factor ~-3x
  Execute from L$ instead of from mem ==> 5-5x improvement
  At least a factor -x is within reach

Optimizing for cache performance
  Keep the active footprint small
  Use the entire cache line once it has been brought into the cache
  Fetch a cache line prior to its usage
  Let the CPU that already has the data in its cache do the job
  ...

What can go wrong? A simple example
  Perform a diagonal copy times
  [Figure: N x N array]
2 Example: Loop order

  //Optimized Example A
  for (i=1; i<N; i++) {
    for (j=1; j<N; j++) {
      A[i][j] = A[i-1][j-1];
    }
  }

  //Unoptimized Example A
  for (j=1; j<N; j++) {
    for (i=1; i<N; i++) {
      A[i][j] = A[i-1][j-1];
    }
  }

Performance Difference: Loop order
  [Figure: speedup vs. unoptimized code as a function of array side, for Athlon64 x2, Pentium D and Core Duo]

Example: Sparse data

  //Optimized Example A
  for (i=1; i<N; i++) {
    for (j=1; j<N; j++) {
      A_data[i][j] = A_data[i-1][j-1];
    }
  }

  //Unoptimized Example A
  for (i=1; i<N; i++) {
    for (j=1; j<N; j++) {
      A[i][j].data = A[i-1][j-1].data;
    }
  }

  [Figure: data fields packed densely (optimized) vs. spread out between other fields (unoptimized)]

Performance Difference: Sparse Data
  [Figure: speedup vs. unoptimized code as a function of array side, for Athlon64 x2, Pentium D and Core Duo]
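The loop-order example can be tried out with a small sketch like the one below. It is only an illustration under assumptions not stated on the slides: the array is stored as a flat row-major buffer of doubles, and the function names are hypothetical. Both functions compute the same diagonal copy; only the order in which memory is touched differs.

```c
#include <stddef.h>

/* Diagonal copy with the row index in the outer loop: the inner loop
   walks along a row, so consecutive iterations touch adjacent addresses. */
void diag_copy_rows_inner(size_t n, double *a)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = 1; j < n; j++)
            a[i * n + j] = a[(i - 1) * n + (j - 1)];
}

/* Same result, but the inner loop walks down a column, so consecutive
   iterations are n elements apart (poor use of each fetched cache line). */
void diag_copy_cols_inner(size_t n, double *a)
{
    for (size_t j = 1; j < n; j++)
        for (size_t i = 1; i < n; i++)
            a[i * n + j] = a[(i - 1) * n + (j - 1)];
}
```

Timing the two functions for growing `n` on a given machine is one way to reproduce the kind of speedup curve the slide refers to; the actual numbers depend on the hardware.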
3 Loop Merging

  /* Unoptimized */
  for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1)
      a[i][j] = 2 * b[i][j];
  for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1)
      c[i][j] = K * b[i][j] + d[i][j]/2;

  /* Optimized */
  for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1) {
      a[i][j] = 2 * b[i][j];
      c[i][j] = K * b[i][j] + d[i][j]/2;
    }

Padding of data structures
  [Figure: cache lookup (index, tag compare, hit logic, select/multiplexer, data) for the addresses A, A+56*8 and A+56**8]

Padding of data structures
  [Figure: the same lookup for the addresses A, A+56*8+padding and A+56**8+*padding]
  Allocate more memory than needed

Blocking

  /* Unoptimized ARRAY: x = y * z */
  for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1)
      {r = 0;
       for (k = 0; k < N; k = k + 1)
         r = r + y[i][k] * z[k][j];
       x[i][j] = r;
      };

  [Figure: access patterns in X, Y and Z]
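The effect of padding can be illustrated with a small address calculation. This is a sketch under assumed parameters, not the cache on the slides: the line size and number of sets are arguments, the array holds doubles in row-major order, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Set index of a byte address in a cache with `num_sets` sets of
   `line_size`-byte lines. */
size_t cache_set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    return (size_t)((addr / line_size) % num_sets);
}

/* Byte address of element [row][col] in a row-major array of doubles whose
   rows hold `cols` elements followed by `pad` unused padding elements. */
uintptr_t element_addr(uintptr_t base, size_t row, size_t col,
                       size_t cols, size_t pad)
{
    return base + (row * (cols + pad) + col) * sizeof(double);
}
```

With, say, 64-byte lines and 256 sets, a row length of 2048 doubles places the first element of every row in the same set, so a column walk keeps evicting its own lines. Adding one cache line of padding per row (8 doubles) moves each row's first element to the next set.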
4 Blocking

  /* Optimized ARRAY: X = Y * Z */
  for (jj = 0; jj < N; jj = jj + B)
    for (kk = 0; kk < N; kk = kk + B)
      for (i = 0; i < N; i = i + 1)
        for (j = jj; j < min(jj+B,N); j = j + 1)
          {r = 0;
           for (k = kk; k < min(kk+B,N); k = k + 1)
             r = r + y[i][k] * z[k][j];
           x[i][j] += r;
          };

  [Figure: X (partial solution), Y and Z, with the first block and second block marked]

Blocking: the Movie!

  /* Optimized ARRAY: X = Y * Z */
  for (jj = 0; jj < N; jj = jj + B)                /* Loop 5 */
    for (kk = 0; kk < N; kk = kk + B)              /* Loop 4 */
      for (i = 0; i < N; i = i + 1)                /* Loop 3 */
        for (j = jj; j < min(jj+B,N); j = j + 1)   /* Loop 2 */
          {r = 0;
           for (k = kk; k < min(kk+B,N); k = k + 1) /* Loop 1 */
             r = r + y[i][k] * z[k][j];
           x[i][j] += r;
          };

  [Figure: animation of the blocks touched in X (partial solution), Y and Z by each loop; first block, second block]

Prefetching

  /* Unoptimized */
  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
      x[j][i] = 2 * x[j][i];

  /* Optimized */
  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++) {
      PREFETCH x[j+1][i]
      x[j][i] = 2 * x[j][i];
    }

  (Typically, the HW prefetcher will successfully prefetch sequential streams)

Cache Affinity
  Schedule the process on the processor it last ran
  Allocate and free data buffers in a LIFO order
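The blocked multiplication can be written as compilable C along the following lines. This is a sketch under assumptions of its own: square matrices stored as flat row-major buffers, a block size `bsz` greater than zero chosen by the caller, and hypothetical function names. The unblocked version is included so the two can be compared.

```c
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Unblocked reference: x = y * z for n x n row-major matrices. */
void matmul_naive(size_t n, const double *y, const double *z, double *x)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double r = 0.0;
            for (size_t k = 0; k < n; k++)
                r += y[i * n + k] * z[k * n + j];
            x[i * n + j] = r;
        }
}

/* Blocked version: works on bsz x bsz tiles of z so that a tile is reused
   while it is still in the cache. x accumulates partial sums, so it is
   cleared first. bsz must be greater than zero. */
void matmul_blocked(size_t n, size_t bsz,
                    const double *y, const double *z, double *x)
{
    for (size_t t = 0; t < n * n; t++)
        x[t] = 0.0;

    for (size_t jj = 0; jj < n; jj += bsz)
        for (size_t kk = 0; kk < n; kk += bsz)
            for (size_t i = 0; i < n; i++)
                for (size_t j = jj; j < min_sz(jj + bsz, n); j++) {
                    double r = 0.0;
                    for (size_t k = kk; k < min_sz(kk + bsz, n); k++)
                        r += y[i * n + k] * z[k * n + j];
                    x[i * n + j] += r; /* partial solution for this tile */
                }
}
```

A reasonable starting point is to pick `bsz` so that a `bsz` x `bsz` tile of doubles fits comfortably in the cache level being targeted, then measure.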
5 Optimize for other caches
  TLB...
  Avoid random accesses to huge data structs (Ex. huge hashing table)
  Avoid few accesses per page (very sparse data)

Commercial Break: Acumem's Multicore Tools
  Erik Hagersten, Uppsala University, Sweden, eh@it.uu.se

Acumem SlowSpotter
  Mission:
    Find the SlowSpots
    Assess their importance
    Enable non-experts to fix them
    Improve the productivity of performance experts
  [Figure: source code (C, C++, Fortran, OpenMP; here the unoptimized array multiplication x = y * z) is built with any compiler into a binary]

Acumem SlowSpotter
  What? Where? How? Help!
  [Figure: source code (C, C++, Fortran...) is built with any compiler into a binary; a sampler runs the binary on the host system and produces a fingerprint (~MB); the analysis combines the fingerprint with the target system parameters and produces advice]
6 A One-Click Report Generation
  Fill in the following fields:
    Application to run
    Input arguments
    Working dir (where to run the app)
    (Limit, if you like, data gathered here, e.g., start gathering after sec. and stop after sec.)
    Cache size of the target system for optimization (e.g., L1 or L2 size)
  Click this button to create a report

  [Figure: miss rate and fetch rate vs. cache size; cache utilization (fraction of cache data utilized); predicted fetch rate (if utilization 100%)]

Loop Focus Tab
  Spotting the crime
  List of bad loops
  Cache size to optimize for
  Explaining what to do
7 Bandwidth Focus Tab
  Spotting the crime
  List of Bandwidth SlowSpots
  Explaining what to do

Resource Sharing Example: Libquantum
  A quantum computer simulation
  Widely used in research (download from: )
  + lines of C, fairly complex code
  Runs an experiment in ~3 min
  Throughput improvement:
  [Figure: relative throughput vs. number of cores used]

Utilization Analysis: Libquantum
  [Figure: fetch rate and predicted fetch rate if utilization = 100%, vs. cache size, for the original code; cache utilization (fraction of cache data utilized): .3%]
  [Figure: records laid out in memory as data, status, data, status, ...]
  Only accessing status data in main loop
  Need 3 MB per thread!

Utilization Optimization

  for (i = 0; i < MAX; i++) {
    ... = huge_data[i].status + ...

  for (i = 0; i < MAX; i++) {
    ... = huge_data_status[i] + ...

SlowSpotter's First Advice: Improve Utilization
  Change one data structure
  Involves ~ lines of code
  Takes a non-expert 3 min
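The data-structure change behind the utilization advice can be sketched as follows. The record layout here is an assumption made for illustration (the slides only show that each record carries data plus a status field); the struct and function names are hypothetical and are not taken from libquantum.

```c
#include <stddef.h>

/* Hypothetical record: a payload the main loop does not need, plus the
   small status field it does need. */
struct record {
    double data[3];
    int status;
};

/* Array of records: every status read drags the unused payload of the
   record into the cache with it. */
long sum_status_aos(const struct record *recs, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += recs[i].status;
    return sum;
}

/* Status fields split out into their own array: each fetched cache line
   holds only values the loop actually uses. */
long sum_status_soa(const int *status, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += status[i];
    return sum;
}
```

Both loops compute the same value; the second simply touches fewer bytes per useful element, which is what the utilization figure measures.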
8 After Utilization Optimization: Libquantum
  [Figure: fetch rate vs. cache size; old fetch rate (original code), predicted fetch rate, and new fetch rate (utilization optimization); cache utilization 95%]

Utilization Optimization
  Two positive effects from better utilization:
    Each fetch brings in more useful data ==> lower fetch rate
    The same amount of useful data can fit in a smaller cache ==> shift left

Reuse Analysis: Libquantum (SPEC CPU2006 - 462.libquantum)
  [Figure: fetch rate for utilization optimization and for utilization + fusion optimization]

  ...
  toffoli(huge_data,...)
  cnot(huge_data, ...)
  ...

  fused_toffoli_cnot(huge_data,...)
  ...

  Second-Fifth SlowSpotter Advice: Improve reuse of data
    Fuse functions traversing the same data
    Here: four fused functions created
    Takes a non-expert < h

Effect: Reuse Optimization
  [Figure: old fetch rate (utilization optimization) and new fetch rate (utilization + fusion optimization) vs. cache size]
  The miss in the second loop goes away
  Still need the same amount of cache to fit all data
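Fusing two functions that traverse the same data can be sketched as below. The two passes are hypothetical stand-ins chosen for illustration (an unconditional bit flip and a controlled bit flip over an array of 64-bit words); they are not libquantum's actual `toffoli` or `cnot` routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Pass A (hypothetical): flip bit `target` in every element. */
void flip_all(uint64_t *state, size_t n, unsigned target)
{
    for (size_t i = 0; i < n; i++)
        state[i] ^= (uint64_t)1 << target;
}

/* Pass B (hypothetical): flip bit `target` where bit `control` is set. */
void cond_flip_all(uint64_t *state, size_t n, unsigned control, unsigned target)
{
    for (size_t i = 0; i < n; i++)
        if ((state[i] >> control) & 1u)
            state[i] ^= (uint64_t)1 << target;
}

/* Fused version: applies pass A and then pass B to each element while it
   is still in a register, so the array is traversed once instead of twice. */
void fused_flip_cond_flip(uint64_t *state, size_t n, unsigned flip_target,
                          unsigned control, unsigned target)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t v = state[i] ^ ((uint64_t)1 << flip_target);
        if ((v >> control) & 1u)
            v ^= (uint64_t)1 << target;
        state[i] = v;
    }
}
```

When the array is much larger than the cache, the second separate pass misses on every line again; the fused loop avoids that second round of fetches, although the data set still needs the same amount of cache to fit entirely.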
9 Utilization + Reuse Optimization: Libquantum
  [Figure: old fetch rate (utilization optimization) and new fetch rate (utilization + fusion optimization) vs. cache size]
  Fetch rate down to .3% for MB
  Same as a 3 MB cache originally

Summary: Libquantum
  [Figure: throughput vs. # cores used for original, utilization optimization, and utilization + fusion; .7x]

Demo Time!
  Libquantum: original code, spatial opt, spatial + loop fusion
  Edit-compile-analysis cycle: min
  [Figure: throughput performance]

Original Cigar Throughput
  [Figure: throughput vs. # cores]
  Throughput scalability is a different way to look at the performance of an application. Here, several single-threaded instances of the application are run at the same time. Even though the different instances do not explicitly depend on each other, they will nevertheless fight over the shared resources, e.g., running four threads on four cores implies that each thread will get one quarter of the shared cache. A system using four cores to run four instances of Cigar will actually result in a lower throughput than if only three cores were used.
10 Throughput Performance: Intel Core (Intel Xeon E535)
  [Figure: throughput, original vs. optimized; 33x]
  The optimization puts a much lower pressure on the shared cache, resulting in a 33x better throughput for four cores.

Throughput Performance (AMD's Istanbul)
  [Figure: throughput, original vs. optimized; 7x]
  AMD's new six-core Istanbul processor can enjoy a 7x better throughput due to the optimization on six cores.

Throughput Performance (Intel i7)
  [Figure: normalized throughput vs. # threads, original vs. optimized; 3x]
  Intel's new four-core i7 (Nehalem) processor enjoys a 3x better throughput due to the optimization on four cores. Note that each core can run up to two threads.

Cache sharing issues
11 Fighting for shared resources
  1st Order MC Performance Problems
  Additional multicore issues:
    Even less cache resources per application
    Sharing of cache resources
    Wasted cache usage
  [Figure: binaries running on cores that share a cache and memory; cache misses; "The larger cache, the better"; $ wasted]

Example: Hints to avoid cache pollution (non-temporal prefetches)
  [Figure: miss rate vs. actual cache size for one instance and for four instances; Hint: Don't allocate!; limit = actual/]
  [Figure: throughput, original vs. Lim=.7MB; % faster]

Example: Hints for mixed workloads (non-temporal prefetches)
  [Figure: miss rate vs. cache size for Libquantum (streaming), LBM (bigger is better) and bzip (tiny), with the actual cache size marked]
  [Figure: performance of bzip, Libquantum, LBM and their geometric mean when run individually, in mix, and in mix patched; AMD Opteron; 5%]

Some performance tools
  Free licenses
    Oprofile
    GNU: gprof
    AMD: code analyst
    Google performance tools
    Virtual Inst: High Productivity Supercomputing
    Sun Studio
    HP: Multicore toolkit (some free, some not)
  Not free
    Intel: Vtune and many more
    Allinea, TotalView, (for MPI)
    Acumem (of course)
More informationReading. 14. Subdivision curves. Recommended:
eadng ecommended: Stollntz, Deose, and Salesn. Wavelets for Computer Graphcs: heory and Applcatons, 996, secton 6.-6., A.5. 4. Subdvson curves Note: there s an error n Stollntz, et al., secton A.5. Equaton
More informationBeautiful & practical
Tps for purchasng your ktchen The 2 sdes of a ktchen Beautful & practcal For long-lastng joy n your new ktchen Experence shows that a ktchen wll last about 15 years or longer. However, t stll has to prove
More informationFace Recognition University at Buffalo CSE666 Lecture Slides Resources:
Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural
More informationRandom Kernel Perceptron on ATTiny2313 Microcontroller
Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department
More informationAll-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University
Approxmate All-Pars shortest paths Approxmate dstance oracles Spanners and Emulators Ur Zwck Tel Avv Unversty Summer School on Shortest Paths (PATH05 DIKU, Unversty of Copenhagen All-Pars Shortest Paths
More informationAn Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices
Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationDESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT
DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,
More informationMIXED INTEGER-DISCRETE-CONTINUOUS OPTIMIZATION BY DIFFERENTIAL EVOLUTION Part 1: the optimization method
MIED INTEGER-DISCRETE-CONTINUOUS OPTIMIZATION BY DIFFERENTIAL EVOLUTION Part : the optmzaton method Joun Lampnen Unversty of Vaasa Department of Informaton Technology and Producton Economcs P. O. Box 700
More information12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification
Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero
More informationWhy visualisation? IRDS: Visualization. Univariate data. Visualisations that we won t be interested in. Graphics provide little additional information
Why vsualsaton? IRDS: Vsualzaton Charles Sutton Unversty of Ednburgh Goal : Have a data set that I want to understand. Ths s called exploratory data analyss. Today s lecture. Goal II: Want to dsplay data
More informationInter-protocol fairness between
Inter-protocol farness between TCP New Reno and TCP Westwood+ Nels Möller, Chad Barakat, Konstantn Avrachenkov, and Etan Altman KTH, School of Electrcal Engneerng SE- 44, Sweden Emal: nels@ee.kth.se INRIA
More informationShared Running Buffer Based Proxy Caching of Streaming Sessions
Shared Runnng Buffer Based Proxy Cachng of Streamng Sessons Songqng Chen, Bo Shen, Yong Yan, Sujoy Basu Moble and Meda Systems Laboratory HP Laboratores Palo Alto HPL-23-47 March th, 23* E-mal: sqchen@cs.wm.edu,
More informationRADIX-10 PARALLEL DECIMAL MULTIPLIER
RADIX-10 PARALLEL DECIMAL MULTIPLIER 1 MRUNALINI E. INGLE & 2 TEJASWINI PANSE 1&2 Electroncs Engneerng, Yeshwantrao Chavan College of Engneerng, Nagpur, Inda E-mal : mrunalngle@gmal.com, tejaswn.deshmukh@gmal.com
More information