Parallel matrix-vector multiplication


Appendix A. Parallel matrix-vector multiplication

The reduced transition matrix of the three-dimensional cage model for gel electrophoresis, described in section 3.2, becomes excessively large for polymer lengths more than L = 12. Parallel machines often have more memory than commonly used sequential machines such as workstations or PCs, and this memory can be used to solve larger problems. Our task is then to distribute the matrix over the processors, such that the problem can be solved as efficiently as possible, hopefully also improving the performance by a factor close to the number of processors used.

A.1 BSP

A bulk synchronous parallel (BSP) program operates by alternating between a phase where all processors simultaneously compute local results and a phase where they communicate with each other. A superstep in a BSP algorithm consists of a computation phase followed by a communication phase. Before and after each communication phase a global synchronization is carried out. The BSPlib library (for the programming language C) [78, 79] consists of only 20 primitives and is based on one-sided communications. One-sided communications, as opposed to two-sided communications, cannot create deadlock situations. The communication mechanisms built into the BSP library are remote write, remote read and bulk synchronous message passing. In all three cases the remote processor is, at least conceptually, passive in the current superstep. The basic communication primitives are summarized below.

- Remote write: the processor that executes a put statement copies a block of memory to a remote memory address at the time of the next synchronization.
- Remote read: the processor that executes a get statement copies a block of memory from a remote memory address at the time of the next synchronization.
- Bulk synchronous message passing: the processor that executes a send statement sends a message, consisting of a tag and a payload part, to the buffer of a remote processor at the time of the next synchronization. The messages can be read from the buffer by a move operation after the next synchronization.

The BSP cost model consists of four parameters: the number of processors p, the speed of the processors s, the communication time g and the synchronization time l. The speed of the processors is measured as the number of floating point operations per second. The communication time is measured as the average time taken to communicate a single word to a remote processor, when all the processors are simultaneously communicating; the unit of time is the time per floating point operation (flop). The synchronization time is the amount of time needed for all processors to synchronize, also measured in flop time.

As mentioned earlier, a BSP program is either in a computing phase or in a communication phase. This makes predicting the performance of algorithms much easier than in the case of parallel programming models where computation and communication are interleaved in a less structured fashion. The analysis of the cost of a superstep is relatively simple. For each processor $i$ we count the number of flops $w_i$, the number of words sent to other processors $h_i^{(s)}$ and the number of words received $h_i^{(r)}$. The time taken by processor $i$ for computation is $w_i$ and for communication is $h_i = \max(h_i^{(s)}, h_i^{(r)})$. The cost of the superstep is $\max_i(w_i) + \max_i(h_i)\,g + l$. This shows that optimally we should divide the problem to be solved in equal parts, in the sense that the calculations and communications are evenly distributed over the available processors.
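The superstep cost formula above can be sketched in a few lines of Python. This is only an illustration: the class name `ProcessorLoad`, the function `superstep_cost`, and the sample workloads are hypothetical, and the value l = 100 is an arbitrary placeholder rather than a measured parameter.

```python
from dataclasses import dataclass


@dataclass
class ProcessorLoad:
    """Work of one processor in a single superstep (hypothetical container)."""
    flops: int            # w_i
    words_sent: int       # h_i^(s)
    words_received: int   # h_i^(r)


def superstep_cost(loads, g, l):
    """Return max_i(w_i) + max_i(h_i) * g + l, measured in flop time units."""
    w_max = max(p.flops for p in loads)
    # h_i is the larger of the words sent and the words received by processor i.
    h_max = max(max(p.words_sent, p.words_received) for p in loads)
    return w_max + h_max * g + l


# Two processors with unbalanced work; g and l are given in flop time units.
loads = [ProcessorLoad(1000, 20, 10), ProcessorLoad(800, 5, 40)]
cost = superstep_cost(loads, g=3.8, l=100.0)
```

Because the cost is determined by the maxima over all processors, the slowest processor in each phase sets the pace, which is why an even division of work and communication matters.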
Of course, we should also take care to reduce the total amount of communication.

A.2 Matrix distribution

A good way to distribute an $n \times n$ dense matrix over $p = MN$ processors is a generalized $M \times N$ block/cyclic distribution: the rows are divided into $p$ row blocks of equal size and the columns into $N$ column blocks of equal size; then the matrix elements $a_{ij}$ are assigned to the processors as follows:

$\phi_0(i) = (i \,\mathrm{div}\, \tfrac{n}{p}) \bmod M, \qquad \phi_1(j) = j \,\mathrm{div}\, \tfrac{n}{N}, \qquad a_{ij} \mapsto P(\phi_0(i) + M\phi_1(j)),$   (A.1)

as shown in figure A.1. The vector elements are best distributed to the same processor as the diagonal of the matrix. Note that for each generalized block/cyclic distribution:

- all processors have an equally large part of the matrix;
- each column is distributed over $M$ processors;
- each row is distributed over $N$ processors;
- each processor has the same number of submatrices;
- each processor has the same number of diagonal elements.

Figure A.1: $M \times N$ generalized block/cyclic distribution for matrices on $p = MN = 6$ processors. The rows have a block-cyclic distribution, with $p$ blocks which are cyclically numbered $0, 1, \ldots, M-1, 0, 1, \ldots$, and the columns have a block distribution, with $N$ blocks numbered $0, 1, \ldots, N-1$. From left to right: M = 6, N = 1; M = 3, N = 2; M = 2, N = 3 and M = 1, N = 6.

This scheme fits within the general Cartesian framework of the work of Bisseling and McColl [80]; it is similar but not identical to the block/cyclic distribution. The approach of Bisseling and McColl to the matrix-vector product $r = Ax$ can be divided into four stages:

- fanout: the elements $x_j$ are communicated to the processors containing the values $a_{ij}$;
- local matrix-vector multiplications: the partial results $u_{it} = \sum_j a_{ij} x_j$ are computed, with the sum taken over only the local values of $a_{ij}$, which all have the same $t = \phi_1(j)$;
- fanin: the partial results, $u_{i\phi_1(j)}$, of the processors are sent to the processor that possesses the corresponding element $r_i$;
- summation of the partial results: $r_i = \sum_{t=0}^{N-1} u_{it}$.
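The assignment rule (A.1) and the four stages can be illustrated with a small sequential simulation. The sketch below assumes that n is divisible by p = MN; the function names `owner` and `four_stage_matvec` and the test matrix are hypothetical, and no real communication takes place (fanout and fanin are implicit in the indexing).

```python
def owner(i, j, n, M, N):
    """Processor index of element a_ij under equation (A.1).

    Assumes n is divisible by p = M * N.
    """
    p = M * N
    phi0 = (i // (n // p)) % M   # block-cyclic over the p row blocks
    phi1 = j // (n // N)         # block distribution over the N column blocks
    return phi0 + M * phi1


def four_stage_matvec(A, x, M, N):
    """Sequentially mimic the fanout / local multiply / fanin / summation stages."""
    n = len(A)
    # Stages 1-2: every element a_ij contributes to the partial result u[i][t]
    # of the processor that owns it, where t = phi1(j) = owner // M.
    u = [[0.0] * N for _ in range(n)]
    for i in range(n):
        for j in range(n):
            t = owner(i, j, n, M, N) // M
            u[i][t] += A[i][j] * x[j]
    # Stages 3-4: the N partial results of each row are collected and summed.
    return [sum(row) for row in u]


# Small example: n = 12 on p = 6 processors with M = 3, N = 2.
n, M, N = 12, 3, 2
A = [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]
x = [float(j + 1) for j in range(n)]
r = four_stage_matvec(A, x, M, N)
```

For this example every processor owns 24 of the 144 matrix elements and 2 of the 12 diagonal elements, in line with the properties listed above.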
If the matrix is divided into rows (which is the special case N = 1 for our generalized block/cyclic distribution), the fanin and summation of partial sums is avoided; this saves some communication, but all processors then have to communicate with all other processors in the fanout part. On the other hand, if the matrix is divided into columns (M = 1), then the fanout communication is avoided and the fanin communication is an all-to-all operation. For the general $M \times N$ distribution, the fanout is an M-to-M communication and the fanin an N-to-N communication. The communication then takes $O((M + N)\,\tfrac{n}{p})\,g$ time, instead of $O(MN\,\tfrac{n}{p})\,g$. The communication is minimal if $M = N = \sqrt{p}$ is used.

For a sparse matrix, the algorithm is adapted to avoid computations and communications involving zero elements: elements $x_j$ are only sent if the corresponding $a_{ij} \neq 0$; partial sums are only computed using products $a_{ij} x_j$ with $a_{ij} \neq 0$; and the partial sums are only sent and summed if they are nonzero. The next section shows how advantage is taken of the specific sparsity structure of the matrix.

A.3 Exploiting the sparsity structure

In our problem, for L > 12, we cannot afford to store the complete matrix on a single processor, so we need to distribute it over a number of processors. The matrix we have to deal with is sparse and we exploit this in our computations, since we only handle nonzero elements $A_{ij}$. The standard approach to communicate a subset of elements of a vector is to gather all elements and their global indices in separate arrays, and then send those arrays to the processors that need them. The overhead of repeatedly sending the same arrays with indices may be removed by sending them only the first time the matrix-vector multiplication is performed, but the overhead of repeatedly packing and unpacking the vector elements cannot be removed in general. Our transition matrix has a particular structure with patches with many nonzero elements.
We exploit this to make communications faster by sending contiguous subvectors, avoiding the packing and unpacking overhead. Consider a rectangular patch (i.e., a contiguous submatrix). A value $x_j$ must be sent to the owner of the patch if an element $A_{ij}$ in column $j$ of the patch is nonzero. It is likely that most columns of the patch have at least one nonzero, so we might as well send all $x_j$ for that patch. This makes it possible to send a contiguous subvector of $x$, which is more efficient than sending separate components; this comes at the expense of a few unnecessary communications. The trade-off can be shifted by increasing or decreasing the patch size.

To find suitable patches, we first divide the state vector into contiguous subvectors. We use a heuristic to partition the matrix into blocks of rows with approximately the same number of nonzeros. If we use P processors, and we want each processor to have K subvectors, we have to divide the vector into KP subvectors. (The factor K is the overpartitioning factor.) This initial division tries to minimize the computation time. Next, we adjust the divisions to reduce communication: a suitable patch in the matrix corresponds to an input subvector of kink representations where only the last few bits differ, and also to an output subvector with that property. Therefore, we search for a pair of adjacent kink representations that has a different bit as much as possible to the left. This is a suitable place to split. We try to keep the distance from the starting point as small as possible.

As an example of the structure of the reduced transition matrices and the division into submatrices, we show the nonzero structure of the matrix for L = 5 in figure A.2 and its corresponding communication matrix in figure A.3 (left). The communication matrix is built from the partitioned transition matrix, by considering each submatrix as a single element. It is a sparse matrix of much smaller size which determines the communication requirements. Our communication matrix for L = 13 is given in figure A.3 (right).

Figure A.2: Reduced transition matrix for polymer length L = 5. The size of the matrix is … and it has 233 nonzero elements, shown as black squares. To the left of each row is the corresponding kink representation written as a binary number, with black circles denoting 1 and open ones 0. The horizontal lines on the left show the initial division of the reduced state vector into eight contiguous parts, optimized to balance the number of nonzeros in the corresponding matrix rows. The jumps of these lines indicate slight adjustments to make the division fit the nonzero structure of the matrix. The resulting vector division induces a division of the rows and columns of the matrix, and hence a partitioning into 64 submatrices, shown by the gray checkerboard pattern. Complete submatrices are now assigned to the processors of a parallel computer.

Figure A.3: Communication matrix for L = 5 (left) and L = 13 (right). Note that the matrix for L = 5 can be obtained by replacing each nonempty submatrix in Fig. A.2 by a single nonzero element. The communication matrix for L = 13, of size …, is distributed over 16 processors in a row distribution.

A.4 Timings

Our computations were performed on a Cray T3E computer. The peak performance of a single node of the Cray T3E is 600 Mflop/s for computations. The bsp probe benchmark shows a performance of 47 Mflop/s per node [78]. The peak interprocessor bandwidth is 500 Mbyte/s (bidirectional). The bsp probe benchmark shows a sustained bidirectional performance of 94 Mbyte/s per processor when all 64 processors communicate at the same time. This is equivalent to a BSP parameter g = 3.8, where g is the cost in flop time units of one 64-bit word leaving or entering a processor. The measured global synchronization time for 64 processors is 48 µs, which is equivalent to l = … flop time units.

Table A.1 presents the execution time of one iteration of the algorithm in two forms: the BSP cost a + bg + cl counts the flops and the communications and thus gives the time on an arbitrary computer with BSP parameters g and l, whereas the time in milliseconds gives the measured time on this particular architecture, split into computation and communication time. (The total measured synchronization time is negligible.) The BSP cost can be used to predict the run time of our algorithm on different architectures. Table A.1 also gives the efficiency and speedup relative to a sequential program.

Table A.1: BSP cost, time, efficiency, and speedup for one matrix-vector multiplication. Columns: L, P, BSP cost (of the form a + bg + 2l), time (ms), efficiency (%), speedup; the last row reports a speedup of 42.9.

Peak computation performance is often only reached for dense matrix-matrix multiplication; the performance for sparse matrix-vector multiplication is always much lower. Comparing the flop count and the measured computation time for the largest problem, L = 15, we see that we achieve about 10.5 Mflop/s per processor. Comparing the communication count with the measured communication time, we obtain a g-value of 8.1 µs (or g = 3.8 flop units; see above). This means that we attain the maximum sustainable communication speed. This is due to the design of our algorithm, which communicates contiguous subvectors instead of single components. Furthermore, the results show that our choice to optimize mainly the computation (by choosing a row distribution) is justified for this architecture: the communication time is always less than a third of the total time. For a different machine, with a higher value of g, more emphasis must be placed on optimizing the communication, leading to a two-dimensional distribution.

Each iteration of our computation contains one matrix-vector multiplication. The number of iterations needed for convergence depends on the length of the polymer, and on the applied electric field. The iteration was stopped when either the accuracy was better than $10^{-10}$, or the number of iterations exceeded …. In the latter case, the accuracy was computed at termination. Typically, for L = 15 and a low electric field strength, … iterations are needed, taking about 6 hours per data point. Only computed values with accuracy $10^{-4}$ or better are shown in figure 5.3. For L = 12, we compared the output of the parallel program with that of the sequential program and found the difference to be within rounding errors.
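The stopping rule described above (stop when the accuracy is good enough or when an iteration limit is reached, and report the accuracy at termination) can be sketched as follows. The text does not specify the iteration scheme or how the accuracy is measured, so this sketch assumes a plain normalized power iteration and takes the accuracy to be the largest componentwise change between successive iterates. The function name `iterate_until_converged`, the default `max_iter=1000`, and the 2 × 2 example matrix are all hypothetical.

```python
def iterate_until_converged(matvec, x0, tol=1e-10, max_iter=1000):
    """Repeat x <- normalize(A x) until the change drops below tol.

    matvec   : function computing A x (one matrix-vector product per iteration)
    tol      : accuracy threshold (1e-10, as in the text)
    max_iter : iteration limit (hypothetical default; must be at least 1)
    Returns the final vector, the accuracy at termination, and the iteration count.
    """
    x = list(x0)
    accuracy = float("inf")
    for iteration in range(1, max_iter + 1):
        y = matvec(x)
        norm = sum(abs(v) for v in y)
        y = [v / norm for v in y]
        # Assumed accuracy measure: largest change in any component.
        accuracy = max(abs(a - b) for a, b in zip(x, y))
        x = y
        if accuracy < tol:
            break
    return x, accuracy, iteration


# Toy column-stochastic matrix whose stationary vector is (5/6, 1/6).
A = [[0.9, 0.5],
     [0.1, 0.5]]


def toy_matvec(v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]


x, accuracy, iterations = iterate_until_converged(toy_matvec, [0.5, 0.5])
```

Returning the accuracy together with the result makes it possible to filter data points afterwards, for instance keeping only those that reached a given accuracy when the iteration limit was hit.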
The total speedup for L = 15, compared to a naive implementation (for which one would need 38.5 Tbyte of memory), is a factor …: a factor of … by using a reduced state space, a factor of 2 by shifting the eigenvalues of the reduced transition matrix, and a factor 42.9 by using a parallel program on 64 processors.
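As a short worked check of the parallel contribution, and assuming the usual definition of parallel efficiency as speedup divided by the number of processors (the symbols E, S and P are introduced here only for illustration), the speedup of 42.9 on 64 processors corresponds to an efficiency of roughly two thirds:

```latex
E = \frac{S}{P} = \frac{42.9}{64} \approx 0.67
```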
Repeater Inserton for TwoTermnal Nets n ThreeDmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI  EPFL, CH5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.
More informationHierarchical clustering for gene expression data analysis
Herarchcal clusterng for gene expresson data analyss Gorgo Valentn emal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of coregulated and functonally
More informationMotivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:
4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://etellgentnternetmarketng.com/webste/frustratedcomputeruser2/
More informationA MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS
Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley ChunHung
More informationFast Computation of Shortest Path for Visiting Segments in the Plane
Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 49 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang
More informationHarvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)
Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst
More informationPreconditioning Parallel Sparse Iterative Solvers for Circuit Simulation
Precondtonng Parallel Sparse Iteratve Solvers for Crcut Smulaton A. Basermann, U. Jaekel, and K. Hachya 1 Introducton One mportant mathematcal problem n smulaton of large electrcal crcuts s the soluton
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationOverview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION
Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup
More informationCSCI 104 Sorting Algorithms. Mark Redekopp David Kempe
CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal
More informationRandom Kernel Perceptron on ATTiny2313 Microcontroller
Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department
More informationRealtime Motion Capture System Using One Video Camera Based on Color and Edge Distribution
Realtme Moton Capture System Usng One Vdeo Camera Based on Color and Edge Dstrbuton YOSHIAKI AKAZAWA, YOSHIHIRO OKADA, AND KOICHI NIIJIMA Graduate School of Informaton Scence and Electrcal Engneerng,
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationDESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT
DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,
More informationCHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION
48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue
More informationPrivate Information Retrieval (PIR)
2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stockmarket
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationLearning the Kernel Parameters in Kernel Minimum Distance Classifier
Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and ZhHua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department
More informationSelftuning Histograms: Building Histograms Without Looking at Data
Selftunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn  Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com
More informationConditional Speculative Decimal Addition*
Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant
More informationModule Management Tool in Software Development Organizations
Journal of Computer Scence (5): 8, 7 ISSN 5966 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. AlRababah and Mohammad A. AlRababah Faculty of IT, AlAhlyyah Amman Unversty,
More informationConvolutional interleaver for unequal error protection of turbo codes
Convolutonal nterleaver for unequal error protecton of turbo codes Sna Vaf, Tadeusz Wysock, Ian Burnett Unversty of Wollongong, SW 2522, Australa Emal:{sv39,wysock,an_burnett}@uow.edu.au Abstract: Ths
More informationHeterogeneous Parallel Computing: from Clusters of Workstations to Hierarchical Hybrid Platforms
Heterogeneous Parallel Computng: from Clusters of Workstatons to Herarchcal Hybrd Platforms A.L. Lastovetsky 1 DOI: 10.14529/jsf140304 c The Author 2014. Ths paper s publshed wth open access at SuperFr.org
More informationTHE lowdensity paritycheck (LDPC) code is getting
Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space lowdensty partycheck (LDPC) codes
More informationComparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments
Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,
More informationCommunicationMinimal Partitioning and Data Alignment for Af"ne Nested Loops
CommuncatonMnmal Parttonng and Data Algnment for Af"ne Nested Loops HYUKJAE LEE 1 AND JOSÉ A. B. FORTES 2 1 Department of Computer Scence, Lousana Tech Unversty, Ruston, LA 71272, USA 2 School of Electrcal
More informationOn a RegistrationBased Approach to Sensor Network Localization
1 On a RegstratonBased Approach to Sensor Network Localzaton R. Sanyal, M. Jaswal, and K. N. Chaudhury arxv:177.2866v2 [math.oc] 9 Nov 217 Abstract We consder a regstratonbased approach for localzng
More informationLecture #15 Lecture Notes
Lecture #15 Lecture Notes The ocean water column s very much a 3D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal
More informationGSLM Operations Research II Fall 13/14
GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are
More informationOutline. Midterm Review. Declaring Variables. Main Variable Data Types. Symbolic Constants. Arithmetic Operators. Midterm Review March 24, 2014
Mdterm Revew March 4, 4 Mdterm Revew Larry Caretto Mechancal Engneerng 9 Numercal Analyss of Engneerng Systems March 4, 4 Outlne VBA and MATLAB codng Varable types Control structures (Loopng and Choce)
More information