arxiv: v3 [cs.na] 18 Mar 2015


A Fast Block Low-Rank Dense Solver with Applications to Finite-Element Matrices

AmirHossein Aminfar a,1,*, Sivaram Ambikasaran b,2, Eric Darve c,1

a 496 Lomita Mall, Room 14, Stanford, CA, 94305
b Warren Weaver Hall, Room-115A, 251, Mercer Street, New York, NY 10012
c 496 Lomita Mall, Room 9, Stanford, CA, 94305

* Corresponding author. Email address: aminfar@stanford.edu (AmirHossein Aminfar)
1 Mechanical Engineering Department, Stanford University
2 Courant Institute of Mathematical Sciences, New York University

Preprint submitted to Journal of Computational Physics, March 2015

Abstract

This article presents a fast solver for the dense frontal matrices that arise from the multifrontal sparse elimination process of 3D elliptic PDEs. The solver relies on the fact that these matrices can be efficiently represented as a hierarchically off-diagonal low-rank (HODLR) matrix. To construct the low-rank approximation of the off-diagonal blocks, we propose a new pseudo-skeleton scheme, the boundary distance low-rank approximation, that picks rows and columns based on the location of their corresponding vertices in the sparse matrix graph. We compare this new low-rank approximation method to the adaptive cross approximation (ACA) algorithm and show that it achieves better speedup, especially for unstructured meshes. Using the HODLR direct solver as a preconditioner (with a low tolerance) to the GMRES iterative scheme, we can reach machine accuracy much faster than a conventional LU solver. Numerical benchmarks are provided for frontal matrices arising from 3D finite element problems corresponding to a wide range of applications.

Keywords: Fast direct solvers, Iterative solvers, Numerical linear algebra, Hierarchically off-diagonal low-rank matrices, Multifrontal elimination, Adaptive cross approximation.

1. Introduction

In many engineering applications, solving large finite element systems is of great significance. Consider the system $Ax = b$ arising from the finite element discretization of an elliptic PDE, where $A \in \mathbb{R}^{N \times N}$ is a sparse matrix with a symmetric pattern. In many practical applications, the matrix $A$ might be ill-conditioned and thus challenging for iterative methods. On the other hand, conventional direct solver algorithms, while being robust in handling ill-conditioned matrices, are computationally expensive ($O(N^{1.5})$ for 2D meshes and $O(N^2)$ for 3D meshes). Since one of the main bottlenecks in the direct multifrontal solve process is the high computational cost of solving dense frontal matrices, we mainly focus on solving these matrices in this article. Our goal is to build an iterative solver, which utilizes a fast direct solver as a preconditioner for the dense frontal matrices. The direct solver in this scheme acts as a highly accurate preconditioner. This approach combines the advantages of the iterative and direct solve algorithms, i.e., it is fast, accurate and robust in handling ill-conditioned matrices.

To be consistent with our previous work, we adopt the notations used in [3]. We should also mention that $n$ refers to the size of dense matrices and $N$ refers to the size of sparse matrices (e.g., number of degrees of freedom in a finite-element mesh).

In the next section, we review the previous literature on both dense structured solvers and sparse multifrontal solvers. We then introduce a hierarchical off-diagonal low-rank (from now on abbreviated as HODLR) direct solver in Section 4. In Section 5, we introduce the boundary distance low-rank (BDLR) algorithm as a robust low-rank approximation scheme for representing the off-diagonal blocks of the frontal matrices. Section 6 discusses the application of the iterative solver with a fast HODLR direct solver preconditioner to the sparse multifrontal solve process and demonstrates the solver for a variety of 3D meshes. We also show an application in combination with the FETI-DP method [], which is a family of domain decomposition algorithms to accelerate finite-element analysis on parallel computers. We present the results and numerical benchmarks in Section 7.

2. Previous Work

2.1. Fast Direct Solvers for Dense Hierarchical Matrices

Hierarchical matrices are data-sparse representations of a certain class of dense matrices. This representation relies on the fact that these matrices can be sub-divided into a hierarchy of smaller block matrices, and certain sub-blocks (based on the admissibility criterion) can be efficiently represented as low-rank matrices. We refer the readers to [7, 31, 6, 8, 11, 14, 1] for more details. These matrices were introduced in the context of integral equations [7, 31, 6, 41] arising out of elliptic partial differential equations and potential theory.
Subsequently, it has also been observed that dense fill-ins in finite element matrices [58], radial basis function interpolation [3], kernel density estimation in machine learning, covariance structure in statistical models [15], Bayesian inversion [3, 5, 6], Kalman filtering [43], and Gaussian processes [4] can also be efficiently represented as data-sparse hierarchical matrices. Broadly speaking, these matrices can be grouped into two general categories based on the admissibility criterion: (i) Strong admissibility: sub-blocks that correspond to the interaction between well-separated clusters are low-rank; (ii) Weak admissibility: sub-blocks corresponding to non-overlapping interactions are low-rank. Ambikasaran [1] provides a detailed description of these different hierarchical structures.

We review some of the previously developed structured dense solvers for hierarchical matrices and discuss them in relation to our work. Hackbusch [7, 6] introduced the concept of $\mathcal{H}$-matrices, which are the most general class of hierarchical matrices with the strong admissibility criterion [7, 6, 8, 3, 9, 31, 3, 8, 9, 11]. Contrary to the HODLR matrix structure, in which the off-diagonal blocks are low-rank, in $\mathcal{H}$-matrices the off-diagonal blocks are further decomposed into low-rank and full-rank blocks. Thus, the rank can be kept small. In HODLR, we make a single low-rank approximation for the off-diagonal blocks and the rank is larger as a result. In exchange, the HODLR structure makes for a much simpler representation and is often used because of its simplicity compared to the $\mathcal{H}$-matrix structure.

Hackbusch [6] suggests a recursive block low-rank factorization scheme for $\mathcal{H}$-matrices. This method is based on the idea that all the dense matrix algebra (matrix multiplication and matrix addition) can be replaced by $\mathcal{H}$-matrix algebra. As a result, the inverse of an $\mathcal{H}$-matrix can also be approximated as an $\mathcal{H}$-matrix itself. This results in a computational complexity of $O(n \log^2(n))$ for an $\mathcal{H}$-matrix factorization.

We note that the approach in this paper is based on the Woodbury matrix identity. It is therefore different from the algorithm in Hackbusch [6], for example. The latter is based on a block LU factorization, while the Woodbury identity reduces the global solve to block diagonal solves followed by a correction update.

The HODLR matrix structure is the most general off-diagonal low-rank structure with weak admissibility. Solvers for this matrix class have a computational cost of $O(n \log^2 n)$. In an HODLR matrix, the off-diagonal low-rank bases do not have a nested structure across different levels [3]. The HSS matrix is an HODLR matrix but, in addition, has a nested off-diagonal low-rank structure. Solvers for the HSS matrices have an $O(n)$ complexity [59, 13].

Martinsson and Rokhlin [49] discuss an $O(n)$ direct solver for boundary integral equations based on the HSS structure. Their method is based on the fact that for a matrix of rank $r$, there exists a well-conditioned column operation, which leaves $r$ columns unchanged and sets the remaining columns to zero. Using this idea, they derive a two-sided compressed factorization of the inverse of the HSS matrix. Their generic algorithm requires $O(n^2)$ operations to construct the inverse. However, they accelerate their algorithm to $O(n \log^{\kappa}(n))$ when applied to two-dimensional contour integral equations.

Chandrasekaran et al. [14] present a fast $O(n)$ direct solver for HSS matrices. In their article, they construct an implicit $ULV^H$ factorization of an HSS matrix, where $U$ and $V$ are unitary matrices, $L$ is a lower triangular matrix and $H$ is the conjugate transpose operator. Their method is based on the Woodbury matrix identity and the fact that for a low-rank representation of the form $UBV^H$, where $U$ and $V$ are thin matrices with $r$ columns, there exists a unitary transformation $Q$ for which only the last $r$ rows of $QU$ are nonzero. They use this observation to recursively solve the linear system of equations.
Since this method requires constructing an HSS tree, the authors suggest an algorithm that uses the SVD or the rank revealing QR decomposition, recursively, to construct the HSS tree in $O(n^2)$ time.

Gillman et al. [4] discuss an $O(n)$ algorithm for directly solving integral equations in one-dimensional domains. The algorithm relies on applying the Sherman-Morrison-Woodbury formula (see for example [3]) recursively to an HSS tree structure to achieve $O(r^2 n)$ solve complexity, where $r$ is the rank of the off-diagonal blocks in the HSS matrix. They also describe an $O(r^2 n)$ algorithm for constructing an HSS representation of the matrix resulting from a Nyström discretization of a boundary integral equation.

Ho and Greengard [36] present a fast direct solver for HSS matrices. They use the interpolative decomposition (ID) algorithm (see for example [16]) with random sampling to obtain the low-rank representations of the off-diagonal blocks. The computational complexity of the low-rank approximation algorithm is $O(mn \log r + r^2 n)$ for a matrix $K \in \mathbb{R}^{m \times n}$. After obtaining the hierarchical matrix representation of the original dense matrix, new variables and equations are introduced into the system of equations. Finally, all equations are assembled into an extended sparse matrix and a conventional sparse solver is used to factorize the sparse matrix. This method has a computational complexity of $O(n)$ for both the pre-computation and solution phases for boundary integral equations in 2D, while in 3D these phases cost $O(n^{1.5})$ and $O(n \log(n))$ respectively.

Kong et al. [4] have developed an $O(n^2)$ dense solver for HODLR matrices. Similar to [49], they accelerate their algorithm to $O(n \log^2(n))$ when applied to boundary integral equations. Their method uses the Sherman-Morrison-Woodbury formula to construct a one-sided hierarchical factorization of the inverse of these matrices, in which each factor is a block diagonal matrix. The low-rank approximation scheme in their paper is based on the rank revealing QR algorithm.
The authors use the pivoted Gram-Schmidt algorithm to obtain $r$ orthogonal basis vectors for the low-rank matrix in question. For a matrix $K \in \mathbb{R}^{m \times n}$ with rank $r$, this low-rank approximation scheme requires $O(mnr)$ operations. Then, they use a randomized algorithm from [55] to accelerate their low-rank approximation scheme. This accelerated low-rank approximation algorithm costs $O(mn \log(l) + lnr)$ in the general case where $r < l < \min(m, n)$.

Ambikasaran and Darve [3] present an $O(n \log^2(n))$ solver for HODLR matrices and an $O(n \log(n))$ solver for p-HSS matrices. This approach differs from the approach mentioned in [4] in that, while [4] constructs the inverse, [3] constructs a factorization of the matrix. Each factor in this factorization scheme is a block diagonal matrix with each block being a low-rank perturbation of the identity matrix. The authors then use the Sherman-Morrison-Woodbury formula to invert each block in the block diagonal factors. The article uses the Chebyshev low-rank approximation scheme to factorize the off-diagonal blocks.

As mentioned above, solvers for the HSS matrix structure have the lowest computational complexity ($O(r^2 n)$, $r$ being the rank of approximation) among hierarchically off-diagonal low-rank matrix structures. While the HSS structure is attractive, the nested structure makes it more complicated and more difficult to work with, compared to the simpler HODLR structure. Furthermore, the off-diagonal rank increases from root to leaves in the HSS tree, whereas the off-diagonal ranks at each level are independent from each other in the HODLR structure. This often leads to a lower average off-diagonal rank in the HODLR structure compared to HSS.

A point worth mentioning is that the solver discussed in the current article relies on a purely algebraic technique (instead of analytic or geometry-based techniques) to construct the low-rank approximation of the off-diagonal blocks. Analytic low-rank approximation techniques like the Chebyshev low-rank approximation, multipole expansions, etc., are only applicable when the matrix definition involves an analytical kernel function.
In this article, we propose a boundary distance low-rank approximation (from now on abbreviated as BDLR), which relies on the underlying sparse matrix graph to choose the desired rows and columns in constructing a low-rank representation. We also compare with the adaptive cross approximation algorithm [51] (from now on abbreviated as ACA), which is also a purely algebraic scheme to construct low-rank approximations of the off-diagonal blocks. Due to its black-box nature, the solver can handle a wide range of dense matrices arising from boundary integral equations, covariance matrices in statistics, frontal matrices arising in the context of finite-element matrices, etc. Table 1 summarizes the dense solver algorithms mentioned above.

| Article | Matrix Class | Factorization | Application |
|---|---|---|---|
| Hackbusch [7, 6] | H | Recursive block factorization of the matrix | BEM integral operators |
| Martinsson and Rokhlin [49] | HSS | Two sided compressed factorization of the inverse | 2D boundary integral equations |
| Chandrasekaran et al. [14] | HSS | $ULV^H$ factorization of the matrix | Radial basis function matrices |
| Gillman et al. [4] | HSS | Data sparse factorization of the inverse | 1D integral equations with Nyström discretization |
| Ho and Greengard [36] | HSS | Factorization of the extended sparse system | 2D and 3D boundary integral equations |
| Kong et al. [4] | HODLR | One sided hierarchical factorization of the inverse | Boundary integral equations |
| Ambikasaran and Darve [3] | HODLR, p-HSS | Block-diagonal factorization of the matrix | Interpolation using radial basis functions |
| This article | HODLR | Recursive block LU factorization of the matrix | Finite-element matrices |

Table 1: Summary of fast dense structured solvers.

2.2. Fast Direct Solvers for Sparse Matrices

As mentioned in Section 2.1, we are interested in accelerating the direct solve process for finite-element matrices. In this article, we focus on the finite-element discretization of elliptic PDEs. One common way of factorizing such matrices is the sparse Cholesky factorization. The efficiency of this algorithm strongly depends on the ordering of mesh nodes [54]. Sparse Cholesky factorization takes $O(N^2)$ flops in 2D with a typical row-wise or column-wise mesh ordering, where $N$ is the number of degrees of freedom [58]. The most efficient method for solving such matrices is the multifrontal method with nested dissection [], which takes $O(N^{1.5})$ flops for two-dimensional and $O(N^2)$ for three-dimensional meshes [54].

The multifrontal method was originally introduced by Duff & Reid [19], George [] and Liu [45], as an extension to the frontal method of Irons [38]. In this algorithm, the overall factorization is done by factorizing smaller dense frontal matrices [44]. For each node or super-node in the elimination tree, the frontal matrix is obtained using an update process called the extend-add process, which involves updates from the previously eliminated nodes.

Martinsson [47] uses a spiral elimination approach along with HSS compression of Schur complements to achieve $O(N \log N)$ time complexity. This approach is not based on the multifrontal method and requires a mesh that can be partitioned into concentric annuli.

Gillman et al. [3] proposed an accelerated nested dissection algorithm for obtaining the Dirichlet-to-Neumann operator associated with a 2D elliptic boundary value problem. The authors approximate the Schur complements that appear in the elimination process as hierarchically block separable (HBS) matrices, a structure similar to HSS matrices. Using this matrix structure, they are able to obtain the Dirichlet-to-Neumann operator with a cost of $O(N)$, compared to $O(N^{1.5})$ for the conventional multifrontal method with nested dissection.

There have been some recent efforts to reduce the computational cost of the multifrontal method with nested dissection. Xia et al. [58] observed that frontal and update matrices in the multifrontal elimination process can be approximated with hierarchically semi-separable (HSS) matrices.
The authors develop a structured extend-add process to facilitate the formation of the frontal matrices using the HSS structure. Next, they perform a structured dense Cholesky factorization on the obtained frontal matrix. The authors use the algorithm in [13] to compute the explicit factorizations of HSS matrices. Using this procedure, they are able to achieve nearly linear time complexity for 2D meshes. However, only regular well-shaped meshes in 2D are considered in the article. Schmitz et al. [54] extend the approach of [58] to a more general setting of unstructured and adaptive grids in 2D.

Xia [56] introduced an efficient multifrontal factorization for general sparse matrices. The author approximates the frontal matrices using the HSS structure and introduces the concept of reduced HSS matrices that reduce the computational cost of operations on HSS matrices. For simplicity, this approach keeps the update matrices as dense matrices, which leads to high memory consumption for large sparse matrices. Xia [57] introduces a new algorithm that overcomes this deficiency by randomization. That is, instead of passing dense update matrices along the elimination tree, this approach passes a skinny randomized matrix-vector product. In addition to saving memory, this approach only requires skinny extend-adds (extend-adds on all rows and only a subset of columns), which leads to improvements in efficiency. This method is based on the work of Martinsson [48], which provides an algorithm for constructing HSS matrices using randomized matrix-vector products.

Amestoy et al. [7] introduce a new low-rank matrix format called the Block Low-Rank (BLR) structure, a flat, non-hierarchical block matrix structure, for representing frontal matrices obtained in the multifrontal elimination process. The authors show that BLR is a good alternative to hierarchical structures like H and HSS matrices in terms of storage costs, flop count and parallelization for representing frontal matrices. The article demonstrates that the BLR format reduces the flop count and storage requirements for factorizing frontal matrices arising from a variety of large matrices coming from different physics applications. However, there is no discussion of the extend-add operations for BLR matrices. Furthermore, the article does not demonstrate a full multifrontal solver based on the BLR frontal matrix representation.

The approach presented here is based on the multifrontal method [44]. It does not require constructing and maintaining HSS trees and can be applied to any mesh structure. Our method is based on the observation that the frontal matrices obtained during the multifrontal elimination process have an HODLR structure. This observation was also made by [58]. In order to factorize (eliminate) these frontal matrices, we present a dense HODLR structured solver. If the rank $r$ is $O(1)$ (that is, a function of $\epsilon$ only), the algorithm has a computational cost of $O(r^2 n \log^2 n)$ for an $n \times n$ frontal matrix. When solving 3D PDEs, we typically have that $r \in O(n^{1/2})$. In that case, the computational cost is $O(r^2 n)$, where $r$ is the largest rank found, at the top of the tree.
This cost is, in fact, slightly favorable compared to what is reported for HSS in [57] (see Table 4.3, p. 19), at least asymptotically for $n \to \infty$. The $\log n$ factor disappears because the rank is bounded by a geometric series associated with the rank. We will benchmark the structured elimination (solve) process for frontal matrices corresponding to separators at various levels of the sparse elimination tree, for many different types of sparse matrices.

It is worth mentioning that, contrary to previous works which have mainly benchmarked matrices in the University of Florida Sparse Matrix Collections [17], we focus on frontal matrices arising from large and complicated mesh structures. These matrices are often very ill-conditioned and cannot be solved using traditional iterative techniques like GMRES [5] with diagonal preconditioning. Our benchmarks show that obtaining a good preconditioner for unstructured meshes is significantly harder compared to structured meshes. Furthermore, solving 3D problems is an order of magnitude more difficult than 2D problems, as the off-diagonal rank is significantly higher in 3D. Hence, this article mainly focuses on 3D meshes. Table 2 shows a summary of various fast sparse matrix solvers in the literature.

3. An Iterative Solver with Direct Solver Preconditioning

In this paper, we investigate using a fast HODLR direct solver as a preconditioner to the GMRES [5] iterative scheme. In this case, we use a relatively low accuracy for the direct solver.

| Article | Methodology | Test Cases & Application |
|---|---|---|
| Martinsson [47] | HSS compression and spiral elimination | Meshes that can be partitioned into concentric annuli |
| Gillman et al. [3] | Approximating Schur complements as HBS matrices and using HBS algebra. | 2D elliptic boundary value problems discretized using a 5 point stencil on a regular square grid. |
| Xia et al. [58] | HSS approximation of frontal matrices and structured extend-add | 2D structured meshes |
| Schmitz et al. [54] | Modified [58] to accommodate adaptive and unstructured grids | 2D adaptive and unstructured meshes that roughly follow the pattern of a regular mesh |
| Xia [56] | Introduction of reduced HSS matrices that reduce the operation cost on HSS matrices. For simplification, the update matrices are kept as dense matrices. | Helmholtz Equation in 2D and University of Florida Sparse Matrix Collections [17] |
| Xia [57] | HSS compression using randomization techniques in [48]. Passing randomized matrix-vector products instead of dense update matrices and performing skinny extend-add operations. | Helmholtz Equation in 2D and University of Florida Sparse Matrix Collections [17] |
| Amestoy et al. [7] | BLR format for representing frontal matrices. No discussion of BLR extend-add process. | Large matrices coming from different physics applications |

Table 2: Summary of fast sparse direct solvers.

We will show that this approach is much faster than both a conventional LU solver and a high accuracy direct HODLR solver. We should also mention that this preconditioning method can be applied to any iterative solver (conjugate gradient (CG) [35], etc.).

4. A Fast Direct Solver for HODLR Matrices

One bottleneck of sparse solvers is the factorization of the dense frontal matrices that appear during the multifrontal elimination process. To accelerate the factorization of dense frontal matrices, we approximate them as HODLR matrices. As mentioned in Section 2.1, HODLR matrices can be factorized in $O(n \log^2 n)$, which is a significant improvement over conventional dense factorizations, which typically scale as $O(n^3)$.

4.1. HODLR Matrices

A HODLR matrix has low-rank off-diagonal blocks at multiple levels.
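As described in [3], a 2-level HODLR matrix, $K \in \mathbb{R}^{n \times n}$, can be written as shown in Equation (2):

$$
K = \begin{bmatrix} K_1^{(1)} & U_1^{(1)} (V_{1,2}^{(1)})^T \\ U_2^{(1)} (V_{2,1}^{(1)})^T & K_2^{(1)} \end{bmatrix} \tag{1}
$$

$$
= \begin{bmatrix}
\begin{bmatrix} K_1^{(2)} & U_1^{(2)} (V_{1,2}^{(2)})^T \\ U_2^{(2)} (V_{2,1}^{(2)})^T & K_2^{(2)} \end{bmatrix} & U_1^{(1)} (V_{1,2}^{(1)})^T \\
U_2^{(1)} (V_{2,1}^{(1)})^T & \begin{bmatrix} K_3^{(2)} & U_3^{(2)} (V_{3,4}^{(2)})^T \\ U_4^{(2)} (V_{4,3}^{(2)})^T & K_4^{(2)} \end{bmatrix}
\end{bmatrix} \tag{2}
$$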

where for a $p$-level HODLR matrix, $K_i^{(p)} \in \mathbb{R}^{n/2^p \times n/2^p}$, $U_1^{(p)}, U_2^{(p)}, V_{1,2}^{(p)}, V_{2,1}^{(p)} \in \mathbb{R}^{n/2^p \times r}$ and $r \ll n$. Further nested compression of the off-diagonal blocks will lead to HSS structures [3].

4.2. Solver Derivation and Algorithm

Contrary to the method introduced by Hackbusch [7], which utilizes sequential block LU factorization, the HODLR direct solve algorithm presented in this section is based on the Woodbury matrix identity (see for example [33, 3]). Although we do not use the formula explicitly, we perform the exact same operations. Looking at Equation (4), our method assumes that both diagonal blocks are nonsingular and factorizes them independently. However, Hackbusch [7] only assumes that the top diagonal block is invertible and factorizes the top diagonal block first. He then constructs the remaining Schur complement and continues on with the factorization. In comparing the two methods, one can see that because of the independent factorization of the diagonal blocks, the method presented in this section is better suited to parallel implementations.

Consider the following linear equation:

$$
Kx = F \tag{3}
$$

where $K \in \mathbb{R}^{n \times n}$ is an HODLR matrix and $x, F \in \mathbb{R}^{n \times s}$. Now let us write $K$ as a one-level HODLR matrix and rewrite Equation (3):

$$
\begin{bmatrix} K_1^{(1)} & U_1^{(1)} V_{1,2}^{(1)T} \\ U_2^{(1)} V_{2,1}^{(1)T} & K_2^{(1)} \end{bmatrix}
\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
= \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \tag{4}
$$

where $x_i^{(1)}, F_i \in \mathbb{R}^{(n/2) \times s}$. We now introduce two new variables $y_1^{(1)}$ and $y_2^{(1)}$:

$$
y_1^{(1)} = V_{2,1}^{(1)T} x_1^{(1)} \tag{5}
$$

$$
y_2^{(1)} = V_{1,2}^{(1)T} x_2^{(1)} \tag{6}
$$

Rearranging (4), we have:

$$
\underbrace{\begin{bmatrix}
K_1^{(1)} & 0 & 0 & U_1^{(1)} \\
0 & K_2^{(1)} & U_2^{(1)} & 0 \\
-V_{2,1}^{(1)T} & 0 & I & 0 \\
0 & -V_{1,2}^{(1)T} & 0 & I
\end{bmatrix}}_{\tilde{K}}
\underbrace{\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \\ y_1^{(1)} \\ y_2^{(1)} \end{bmatrix}}_{\tilde{x}}
= \underbrace{\begin{bmatrix} F_1 \\ F_2 \\ 0 \\ 0 \end{bmatrix}}_{\tilde{F}} \tag{7}
$$

We now factorize the top diagonal block of $\tilde{K}$, which consists of $K_1^{(1)}$ and $K_2^{(1)}$. Since this sub-block of $\tilde{K}$ is a block diagonal matrix, we only need to factorize $K_1^{(1)}$ and $K_2^{(1)}$.
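After eliminating the top off-diagonal block, we are left with the Schur complement:

$$
S^{(1)} = \begin{bmatrix}
I & V_{2,1}^{(1)T} (K_1^{(1)})^{-1} U_1^{(1)} \\
V_{1,2}^{(1)T} (K_2^{(1)})^{-1} U_2^{(1)} & I
\end{bmatrix} \tag{8}
$$

All we have to do now is to solve the Schur complement system:

$$
S^{(1)} \begin{bmatrix} y_1^{(1)} \\ y_2^{(1)} \end{bmatrix}
= \begin{bmatrix} V_{2,1}^{(1)T} (K_1^{(1)})^{-1} F_1 \\ V_{1,2}^{(1)T} (K_2^{(1)})^{-1} F_2 \end{bmatrix} \tag{9}
$$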
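As a consistency check on Equations (8) and (9), the Schur complement system can be derived directly from the block rows of Equation (7), assuming as in the text that both diagonal blocks are nonsingular. The first two block rows express $x$ in terms of $y$, and substituting them into the definitions (5) and (6) eliminates $x$:

```latex
\begin{aligned}
x^{(1)}_1 &= (K^{(1)}_1)^{-1}\bigl(F_1 - U^{(1)}_1 y^{(1)}_2\bigr), \qquad
x^{(1)}_2 = (K^{(1)}_2)^{-1}\bigl(F_2 - U^{(1)}_2 y^{(1)}_1\bigr), \\[4pt]
y^{(1)}_1 &= V^{(1)T}_{2,1} x^{(1)}_1
\;\Longrightarrow\;
y^{(1)}_1 + V^{(1)T}_{2,1} (K^{(1)}_1)^{-1} U^{(1)}_1 \, y^{(1)}_2
  = V^{(1)T}_{2,1} (K^{(1)}_1)^{-1} F_1, \\[4pt]
y^{(1)}_2 &= V^{(1)T}_{1,2} x^{(1)}_2
\;\Longrightarrow\;
V^{(1)T}_{1,2} (K^{(1)}_2)^{-1} U^{(1)}_2 \, y^{(1)}_1 + y^{(1)}_2
  = V^{(1)T}_{1,2} (K^{(1)}_2)^{-1} F_2 .
\end{aligned}
```

Stacking the last two relations gives exactly the block matrix $S^{(1)}$ acting on $(y^{(1)}_1, y^{(1)}_2)$ with the right-hand side of Equation (9). The system has only as many unknowns as the sum of the two off-diagonal ranks, which is why the correction step is cheap when $r \ll n$.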

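At this point, we can write $x_1^{(1)}$ and $x_2^{(1)}$ in terms of $(K_1^{(1)})^{-1}$ and $(K_2^{(1)})^{-1}$:

$$
\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
= \begin{bmatrix} (K_1^{(1)})^{-1} & \\ & (K_2^{(1)})^{-1} \end{bmatrix}
\begin{bmatrix} F_1 - U_1^{(1)} y_2^{(1)} \\ F_2 - U_2^{(1)} y_1^{(1)} \end{bmatrix} \tag{10}
$$

Since both $K_1^{(1)}$ and $K_2^{(1)}$ are HODLR matrices, we can apply the same procedure for factorizing them. Thus, we have arrived at a recursive algorithm for solving (7). The factorization step corresponds to the computation and storage of all the terms that are independent of the right hand side (i.e., the Schur complements at all levels).

4.3. Algorithm Summary

We now summarize the recursive HODLR direct solver algorithm. For a matrix such as $K \in \mathbb{R}^{n \times n}$, we have to carry out the following procedure at each recursion level $p$ for all blocks $1 \le i \le 2^p$:

4.3.1. Factorize

1. Find the low-rank approximation of the off-diagonal blocks ($U_1^{(p)}, U_2^{(p)}, V_{1,2}^{(p)}, V_{2,1}^{(p)}$).

2. Define $Z_1^{(0)} = \emptyset$. For each level $p$, starting at the top level ($p = 0$), let:

$$
\begin{bmatrix} Z_1^{(p+1)} \\ Z_2^{(p+1)} \end{bmatrix}
= \begin{bmatrix} \begin{matrix} U_1^{(p+1)} \\ U_2^{(p+1)} \end{matrix} & Z^{(p)} \end{bmatrix} \tag{11}
$$

In the equation above, on the right-hand side, we are vertically concatenating two matrices ($U_1^{(p+1)}$ and $U_2^{(p+1)}$) to form a matrix at level $p$.

3. Recursively solve the following equations:

$$
\begin{bmatrix} d_1^{(p+1)} & c_1^{(p+1)} \end{bmatrix} = (K_1^{(p+1)})^{-1} Z_1^{(p+1)} \tag{12}
$$

$$
\begin{bmatrix} d_2^{(p+1)} & c_2^{(p+1)} \end{bmatrix} = (K_2^{(p+1)})^{-1} Z_2^{(p+1)} \tag{13}
$$

where $d^{(p+1)}$ and $c^{(p+1)}$ correspond to the $U^{(p+1)}$ and $Z^{(p)}$ portions of the right hand sides respectively.

4. Obtain $S^{(p)}$, using Equations (8) and (9):

$$
S^{(p)} = \begin{bmatrix}
I & (V_{2,1}^{(p+1)})^T d_1^{(p+1)} \\
(V_{1,2}^{(p+1)})^T d_2^{(p+1)} & I
\end{bmatrix} \tag{14}
$$

5. Obtain $d^{(p)}, c^{(p)}$ for $p \ge 1$ using:

$$
\begin{bmatrix} d^{(p)} & c^{(p)} \end{bmatrix}
= \left( I - \begin{bmatrix} 0 & d_1^{(p+1)} \\ d_2^{(p+1)} & 0 \end{bmatrix}
(S^{(p)})^{-1}
\begin{bmatrix} V_{2,1}^{(p+1)T} & 0 \\ 0 & V_{1,2}^{(p+1)T} \end{bmatrix} \right)
\begin{bmatrix} c_1^{(p+1)} \\ c_2^{(p+1)} \end{bmatrix} \tag{15}
$$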
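The one-level elimination of Section 4.2 (Equations (5) to (10)) can be checked numerically on a small example. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names (`matmul`, `solve`, `one_level_solve`) and the test matrices are hypothetical, the diagonal blocks are handled by dense Gaussian elimination instead of recursion, and the off-diagonal factors are given rather than computed by a low-rank scheme.

```python
def matmul(A, B):
    """Product of two dense matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(r):
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, s = len(A), len(B[0])
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + s):
                M[i][j] -= f * M[k][j]
    X = [[0.0] * s for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(s):
            acc = M[i][n + c] - sum(M[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = acc / M[i][i]
    return X

def one_level_solve(K1, K2, U1, V12, U2, V21, F1, F2):
    """Solve [[K1, U1 V12^T], [U2 V21^T, K2]] [x1; x2] = [F1; F2]."""
    d1, d2 = solve(K1, U1), solve(K2, U2)    # K_i^{-1} U_i
    g1, g2 = solve(K1, F1), solve(K2, F2)    # K_i^{-1} F_i
    V21T, V12T = transpose(V21), transpose(V12)
    top_right = matmul(V21T, d1)             # V21^T K1^{-1} U1
    bottom_left = matmul(V12T, d2)           # V12^T K2^{-1} U2
    r2, r1 = len(V21T), len(V12T)
    # Schur complement of Equation (8), acting on [y1; y2].
    S = [identity(r2)[i] + top_right[i] for i in range(r2)] + \
        [bottom_left[i] + identity(r1)[i] for i in range(r1)]
    rhs = matmul(V21T, g1) + matmul(V12T, g2)   # right-hand side of Equation (9)
    y = solve(S, rhs)
    y1, y2 = y[:r2], y[r2:]
    # Equation (10): x1 = K1^{-1}(F1 - U1 y2), x2 = K2^{-1}(F2 - U2 y1).
    c1, c2 = matmul(d1, y2), matmul(d2, y1)
    x1 = [[g - c for g, c in zip(gr, cr)] for gr, cr in zip(g1, c1)]
    x2 = [[g - c for g, c in zip(gr, cr)] for gr, cr in zip(g2, c2)]
    return x1 + x2

# Hypothetical 6 x 6 example with rank-1 off-diagonal blocks.
K1 = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
K2 = [[5.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 5.0]]
U1, V12 = [[1.0], [2.0], [0.5]], [[0.3], [0.1], [0.2]]
U2, V21 = [[0.4], [1.0], [0.2]], [[0.2], [0.5], [0.1]]
F1, F2 = [[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]
x = one_level_solve(K1, K2, U1, V12, U2, V21, F1, F2)

# Assemble the dense matrix only to check the residual K x - F.
K = [a + b for a, b in zip(K1, matmul(U1, transpose(V12)))] + \
    [a + b for a, b in zip(matmul(U2, transpose(V21)), K2)]
residual = max(abs(kx[0] - f[0]) for kx, f in zip(matmul(K, x), F1 + F2))
print(residual)
```

In a recursive solver, the calls to `solve(K1, ...)` and `solve(K2, ...)` would themselves be replaced by the same procedure applied to the diagonal blocks, which is where the extra right-hand sides $U^{(p+1)}$ of Equation (11) come from.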

4.3.2. Solve

1. Define $z_1^{(0)} = F$. For each level $p$, starting at the top level ($p = 0$), let:

$$
\begin{bmatrix} z_1^{(p+1)} \\ z_2^{(p+1)} \end{bmatrix} = z^{(p)} \tag{16}
$$

2. Recursively solve the following equations:

$$
x_1^{(p+1)} = (K_1^{(p+1)})^{-1} z_1^{(p+1)} \tag{17}
$$

$$
x_2^{(p+1)} = (K_2^{(p+1)})^{-1} z_2^{(p+1)} \tag{18}
$$

3. Obtain $x^{(p)}$ for $p \ge 0$ using:

$$
x^{(p)} = \left( I - \begin{bmatrix} 0 & d_1^{(p+1)} \\ d_2^{(p+1)} & 0 \end{bmatrix}
(S^{(p)})^{-1}
\begin{bmatrix} V_{2,1}^{(p+1)T} & 0 \\ 0 & V_{1,2}^{(p+1)T} \end{bmatrix} \right)
\begin{bmatrix} x_1^{(p+1)} \\ x_2^{(p+1)} \end{bmatrix} \tag{19}
$$

Note that $(S^{(p)})^{-1}$ was previously computed and this step is therefore only a series of matrix-matrix products. Hence, the computational cost is small compared to the previous factorization.

4.4. Solver Computational Cost

Assuming we use a fast ($O(n)$) low-rank approximation scheme, the cost of constructing and storing an HODLR matrix is $O(nr \log(n))$ [3], where $r$ is the rank of approximation. Looking at the procedure described in Section 4.3, we can write the following:

$$
C^{(p)}(r, s, n) = 2\, C^{(p+1)}\left(r, s + r, \frac{n}{2}\right) + O(nr^2) + O(nsr) \tag{20}
$$

where $C^{(p)}(r, s, n)$ is the computational cost associated with solving an $n \times n$ HODLR matrix at level $p$ with $s$ right hand sides and off-diagonal blocks of rank $r$. Equation (20) suggests that the cost of solving a HODLR matrix at level $p$ with $s$ right hand sides is made up of three contributions. The first contribution is associated with solving the two diagonal blocks at the lower level ($p + 1$) with $s + r$ right hand sides. The second contribution comes from constructing the Schur complement $S^{(p)}$ (Equation (8)) and the third contribution is the cost of constructing the right hand side of Equation (9). Writing Equation (20) as a sum, we have:

$$
C^{(0)}(r, s, n) = \sum_{p=1}^{\log(n/r)} O(pnr^2 + nsr) \tag{21}
$$

If the off-diagonal rank is constant throughout the various levels in the HODLR tree, the computational cost of the algorithm is $O(r^2 n \log^2(n))$ according to Equation (21). However, in many practical cases, the rank decays from root to leaves in the HODLR tree. Assume we can approximate $r$ as $O(n_p^{1/2})$, where $n_p$ is the size of a block at level $p$. Then, we have $r_p = O(r_1 / 2^{p/2})$, where $r_1$ is the rank at the top level. According to Equation (20), the total computational cost involves two sums:

$$
\sum_{p=1}^{\log(n/r)} r_p^2 = O(r_1^2)
$$

$$
\sum_{p=1}^{\log(n/r)} r_p \Bigl( s + \sum_{q \le p} r_q \Bigr)
= O\Bigl( \sum_{p=1}^{\log(n/r)} r_p (s + r_1) \Bigr)
= O\bigl( r_1 (s + r_1) \bigr)
$$

Note in particular that the second sum is $O(r_1^2)$ instead of $O(r^2 \log n)$. Finally:

$$
C^{(0)}(r, s, n) = O(nr^2) \tag{22}
$$

This result shows that in cases where the off-diagonal rank is decreasing, HODLR solvers can become very efficient and can compete with HSS solvers.

5. Low-Rank Approximation Schemes

In this section, we discuss the various low-rank approximation schemes used for obtaining a low-rank representation of the off-diagonal blocks of the HODLR matrices under consideration. Although a variety of low-rank approximation algorithms (SVD, rank revealing LU, rank revealing QR, randomized algorithms, etc.) are available, we require a scheme that has a computational cost of $O(rn)$, where $r$ is the rank of approximation and $n$ is the size of the matrix. In the context of this work, we cannot use randomized SVD methods, since no fast matrix-vector product algorithm applies in our benchmark settings. This limits our choices to methods like Chebyshev, partial pivoting ACA (Section 5.1) and the pseudo-skeleton low-rank approximation algorithm (Section 5.3). Each of these methods has certain drawbacks:

The Chebyshev low-rank approximation algorithm is only suited to cases dealing with the interaction of points via smooth kernels.

The partial pivoting ACA algorithm works well when the leverage score of the matrix [46] is uniform. That is, all rows and columns have roughly the same importance when constructing the low-rank approximation. However, in cases where certain rows or columns play a special role and are critical to include in the low-rank approximation, ACA might fail to properly identify them, resulting in an inaccurate low-rank approximation.

The accuracy of the pseudo-skeleton low-rank approximation scheme strongly depends on the method used for selecting rows and columns.
In order to construct a fast and robust low-rank approximation scheme, we introduce a method for selecting rows and columns in the pseudo-skeleton low-rank approximation algorithm. We call this new method the boundary distance low-rank approximation scheme (BDLR).

5.1. ACA Low-Rank Approximation

We use the ACA algorithm with partial pivoting as described by Rjasanow [51]. This algorithm is an algebraic low-rank approximation scheme and works on any dense matrix without any prior knowledge of the matrix. Both full pivoting and partial pivoting ACA search the matrix, or the remaining Schur complement, for a large entry and use this entry as the pivot. The full pivoting algorithm, similar to rank revealing LU, scans all the matrix entries. Partial pivoting ACA avoids this expensive search by looking at the largest entry in a single row/column at each step. The partial pivoting ACA algorithm has a cost of $O(r(m + n))$ for a matrix $A \in \mathbb{R}^{m \times n}$ [51], where $r$ is the rank of approximation.

5.2. Randomized Algorithms

Randomized algorithms as described by [53, 34, 18, 1] arrive at a low-rank approximation of a matrix $A$ by forming a lower dimensional matrix $Y$, obtained from sampling rows and/or columns of the original matrix or by applying random projections to $A$. They then obtain an orthonormal basis $Q$ for the range of $Y$ and approximate $A$ as:

$$
A \approx QQ^T A \tag{23}
$$

For a matrix of size $n \times n$, and without a fast matrix-vector product, these methods have a computational cost of $O(n^2)$. Otherwise, the cost can be brought down to $O(n)$ or $O(n \log n)$.

5.3. Pseudo-Skeleton and Boundary Distance Low-Rank Approximation

In order to construct a fast and accurate solver, we need an accurate and robust method to construct low-rank approximations. As we will show, BDLR is very robust and leads to accurate low-rank approximations. It works well in problems where the matrix can be related to a Green's function. (This is true for all linear PDE problems. Note that the Green's function needs to be smooth, with a singularity at the origin.) In that case, large entries correspond to points close in space, which we associate, as a simplification, with nodes in the graph that are connected by few edges. Although this is a simple heuristic, it worked very well in our examples and allowed us to efficiently form accurate low-rank approximations.
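For comparison with the graph-based heuristic, the partial pivoting ACA of Section 5.1 can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the implementation benchmarked in this article: the function name `aca_partial_pivoting`, the stopping rule (norm of the latest rank-1 term relative to a running Frobenius norm of the approximation), the zero-pivot threshold `1e-14`, and the test kernel $1/(1 + |x - y|)$ on two separated point sets are all choices made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aca_partial_pivoting(get_row, get_col, m, n, tol=1e-8, max_rank=None):
    """Return (us, vs) such that A is approximated by sum_k outer(us[k], vs[k])."""
    max_rank = min(m, n) if max_rank is None else max_rank
    us, vs, used_rows = [], [], set()
    norm2 = 0.0   # running squared Frobenius norm of the approximation
    i = 0         # first pivot row: an arbitrary choice
    while len(us) < max_rank:
        used_rows.add(i)
        # Row i of the residual A - sum_k u_k v_k^T (only one row of A is read).
        row = list(get_row(i))
        for u, v in zip(us, vs):
            row = [r - u[i] * vj for r, vj in zip(row, v)]
        j = max(range(n), key=lambda c: abs(row[c]))   # pivot column
        if abs(row[j]) > 1e-14:
            v_new = [r / row[j] for r in row]
            # Column j of the residual (only one column of A is read).
            col = list(get_col(j))
            for u, v in zip(us, vs):
                col = [c - v[j] * ui for c, ui in zip(col, u)]
            u_new = col
            cross = sum(dot(u_new, u) * dot(v, v_new) for u, v in zip(us, vs))
            step2 = dot(u_new, u_new) * dot(v_new, v_new)
            norm2 += 2.0 * cross + step2
            us.append(u_new)
            vs.append(v_new)
            if math.sqrt(step2) <= tol * math.sqrt(max(norm2, 0.0)):
                break
        unused = [r for r in range(m) if r not in used_rows]
        if not unused:
            break
        # Next pivot row: largest entry of the latest column among unused rows.
        i = max(unused, key=lambda r: abs(us[-1][r])) if us else unused[0]
    return us, vs

# Hypothetical test block: smooth kernel between two well-separated point sets.
m = n = 40
xs = [i / m for i in range(m)]
ys = [3.0 + j / n for j in range(n)]
A = [[1.0 / (1.0 + abs(x - y)) for y in ys] for x in xs]
us, vs = aca_partial_pivoting(lambda i: A[i],
                              lambda j: [A[r][j] for r in range(m)], m, n)
approx = [[sum(u[i] * v[j] for u, v in zip(us, vs)) for j in range(n)]
          for i in range(m)]
err = math.sqrt(sum((A[i][j] - approx[i][j]) ** 2
                    for i in range(m) for j in range(n)))
rel_err = err / math.sqrt(sum(a * a for row in A for a in row))
print(len(us), rel_err)
```

The sketch only ever reads one row and one column of the matrix per step, which is the property that makes partial pivoting cheaper than full pivoting. It also shows the weakness noted in Section 5: pivots are chosen from the rows and columns visited so far, so a row or column that matters but is never visited can be missed.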
The BDLR algorithm is a row and column selection algorithm in the pseudo-skeleton low-rank approximation scheme. The pseudo-skeleton algorithm allows us to construct a low-rank approximation of a matrix by choosing a subset of rows and columns of that matrix. As mentioned in [5], for a low-rank matrix A, if we pick a set of row indices ($i \in I = \{i_1, \ldots, i_r\}$) and a set of column indices ($j \in J = \{j_1, \ldots, j_r\}$) and define matrices C and R such that:

$$R = A(I, :) \tag{24}$$
$$C = A(:, J) \tag{25}$$

then we can approximate A as:

$$A \approx C \hat{A}^{-1} R \tag{26}$$

where $\hat{A} = A(I, J)$. If $\hat{A}$ is not a square matrix or is rank deficient, the Moore-Penrose pseudoinverse is needed for $\hat{A}^{-1}$. In order to achieve a certain accuracy, one can increase the number of chosen rows and columns until the desired accuracy is reached. To monitor the error in the scheme, we pick rows and columns that are not in the set of rows and columns already chosen for the low-rank approximation. We then monitor the relative Frobenius norm error on these rows and columns and increase the rank of the approximation until the relative Frobenius norm error falls below a certain tolerance.

For a rank r pseudo-skeleton low-rank approximation, the inversion of $\hat{A}$ has a computational cost of $O(r^3)$. Monitoring the error has a computational cost of $O(mr + nr - r^2)$ for $A \in \mathbb{R}^{m \times n}$. Thus, this method has an asymptotic complexity of $O(nr)$.

As mentioned in Section 1, we are predominantly interested in solving dense frontal matrices arising from the multifrontal elimination process of sparse finite-element matrices. In this case, every frontal matrix has a corresponding sparse matrix, which is a diagonal subblock of the original finite-element matrix. This sparse matrix describes a graph that has the rows and columns of the dense matrix as its vertices; the edges in this graph correspond to nonzero entries in the sparse matrix and describe the connection between these points. We use this graph in constructing the low-rank approximation of the off-diagonal blocks.

Entries in dense matrix blocks that correspond to FEM or BEM applications can be related to the inverse of a Green's function. The Green's function is large at short distances and then decays smoothly. We have a similar behavior for our dense blocks. Hence, we want to identify row/column pairs corresponding to large entries. These correspond to nodes in the graph that are close, that is, connected by few edges. Therefore, we use the distance between a vertex and the opposite vertex set (for a vertex that corresponds to a row, the distance to the set of vertices associated with the columns, and vice versa) as a criterion to determine whether to pick a row/column or not. For a set of row (column) vertices, we define the boundary vertices as the subset of vertices for which there exists an edge in the interaction graph connecting them to a vertex in the column (row) set.
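The boundary set defined above can be found with one pass over the graph, and a breadth-first search started from it gives every other vertex its hop distance to the boundary. The sketch below is only an illustration: the function names and the adjacency-dictionary representation are hypothetical, and it assumes that distances are measured along paths that stay inside the vertex's own (row or column) set.

```python
from collections import deque


def boundary_distances(adjacency, own_set, other_set):
    """Return {vertex: distance to the boundary} for the vertices of own_set.

    adjacency maps each vertex to an iterable of its neighbours in the
    sparse-matrix graph. Boundary vertices (distance 0) are the vertices of
    own_set with at least one neighbour in other_set. Assumption: paths are
    restricted to own_set; unreachable vertices are left out of the result.
    """
    own, other = set(own_set), set(other_set)
    boundary = [v for v in own
                if any(w in other for w in adjacency.get(v, ()))]
    dist = {v: 0 for v in boundary}
    queue = deque(boundary)
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w in own and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def select_by_depth(dist, depth):
    """Vertices whose distance to the boundary is at most depth, sorted."""
    return sorted(v for v, d in dist.items() if d <= depth)
```

For a path graph 0-1-2-3-4-5 with rows {0, 1, 2} and columns {3, 4, 5}, vertex 2 is the row boundary, and `select_by_depth` with a depth of 1 returns rows 1 and 2.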
Figure 1(a) shows an example of a matrix which corresponds to the interactions of a set of row points with a set of column points. In this particular example, the blue vertices are the boundary vertices. That is, they are the vertices closest to the boundary between the row and column sets of points. Now that we have defined the boundary nodes, we can designate an index d for every vertex in the row (column) set. This index is defined as the distance of a vertex to the vertices in the boundary set. In order to construct the low-rank approximation, we choose rows and columns based on their d index value. That is, we first choose rows (columns) that are in the boundary set (d = 0). We then add rows (columns) with a distance of one to the boundary (d = 1). For example, in Figure 1(a), the green nodes are labeled (d = 1) as they are separated from the blue boundary nodes (d = 0) by only one edge. We continue adding points based on the d index until we reach the desired accuracy. Figure 1(b) shows that the BDLR algorithm approximates the interaction of a set of row and column nodes with the interaction of the ones that are closest to the boundary (interaction of blue nodes).

As mentioned above, calculating the pseudo-skeleton low-rank approximation requires us to calculate the pseudoinverse of $\hat{A}$. For the BDLR algorithm, instead of using the SVD for calculating the pseudoinverse ($\hat{A}^{-1}$), we use a full pivoting LU factorization, which is slightly cheaper:

$$\hat{A} = P^{-1} L U Q^{-1} \tag{27}$$

where P and Q are permutation matrices. Let r be the rank of $\hat{A}$. Define $\tilde{R}$ and $\tilde{C}$ as:

$$\tilde{C} = (CQ)(:, 1:r)\,(U(1:r, 1:r))^{-1} \tag{28}$$
$$\tilde{R} = (L(1:r, 1:r))^{-1}\,(PR)(1:r, :) \tag{29}$$
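The factorization and the two triangular solves above can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the same number of rows and columns is selected so that the sampled block is square, and the numerical rank counts pivots larger than `eps` times the first pivot, which for full pivoting is the largest entry of the sampled block.

```python
def full_pivot_lu(a_hat, eps):
    """Full-pivoting LU of a square list-of-lists matrix.

    Returns (lu, rows, cols, rank): lu holds unit-lower L below the diagonal
    and U on and above it for the permuted matrix; rows[i] and cols[j] give
    the original indices now at position i and j.
    """
    k = len(a_hat)
    lu = [[float(x) for x in row] for row in a_hat]
    rows, cols = list(range(k)), list(range(k))
    rank, largest = 0, 0.0
    for s in range(k):
        # Largest entry of the trailing block becomes the pivot.
        p, q = max(((i, j) for i in range(s, k) for j in range(s, k)),
                   key=lambda ij: abs(lu[ij[0]][ij[1]]))
        pivot = lu[p][q]
        if s == 0:
            largest = abs(pivot)
        if abs(pivot) <= eps * largest:
            break  # remaining pivots are treated as zero
        lu[s], lu[p] = lu[p], lu[s]
        rows[s], rows[p] = rows[p], rows[s]
        for row in lu:
            row[s], row[q] = row[q], row[s]
        cols[s], cols[q] = cols[q], cols[s]
        for i in range(s + 1, k):
            lu[i][s] /= pivot
            for j in range(s + 1, k):
                lu[i][j] -= lu[i][s] * lu[s][j]
        rank += 1
    return lu, rows, cols, rank


def bdlr_factors(a, row_idx, col_idx, eps=1e-10):
    """Return (c_tilde, r_tilde) such that a ~= c_tilde * r_tilde."""
    a_hat = [[a[i][j] for j in col_idx] for i in row_idx]
    lu, rows, cols, r = full_pivot_lu(a_hat, eps)
    # C_tilde: solve x * U11 = (C Q)(:, 1:r), one row of A at a time.
    c_tilde = []
    for a_row in a:
        cq = [a_row[col_idx[cols[j]]] for j in range(r)]
        x = []
        for j in range(r):
            x.append((cq[j] - sum(x[l] * lu[l][j] for l in range(j))) / lu[j][j])
        c_tilde.append(x)
    # R_tilde: solve L11 * y = (P R)(1:r, :) with unit-diagonal L11.
    r_tilde = []
    for i in range(r):
        y = [float(x) for x in a[row_idx[rows[i]]]]
        for l in range(i):
            y = [v - lu[i][l] * w for v, w in zip(y, r_tilde[l])]
        r_tilde.append(y)
    return c_tilde, r_tilde
```

Neither inverse is formed explicitly; both factors come from substitution against the leading r-by-r triangular blocks.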

Here, C and R are the subsets of columns and rows we have picked using the BDLR scheme. We then have:

$$A \approx \tilde{C} \tilde{R} \tag{30}$$

$(U(1:r, 1:r))^{-1}$ and $(L(1:r, 1:r))^{-1}$ correspond to upper- and lower-triangular solves, respectively. The inverse matrices are not explicitly computed.

Figure 1: Classification of vertices based on distance from the other set. (a) Full matrix representation. (b) Low-rank matrix representation.

6. Application for Multifrontal Solve Process

In this section, we demonstrate how our fast dense solver algorithm can be applied to a sparse multifrontal solve process. We will not explain the multifrontal algorithm in detail. For a detailed review of the multifrontal method see [44]. We applied our fast solver as described in Section 3 to a variety of 3D finite-element problems. We investigate frontal matrices at various levels of the sparse matrix elimination tree corresponding to the elasticity equation. We use SCOTCH [5] to do the reordering in the sparse multifrontal solver.

Our goal is to apply our fast dense solver to the dense frontal matrices obtained in the multifrontal elimination process of a sparse finite-element matrix, and speed up the multifrontal algorithm to approximately $O(N^{4/3})$. The results shown in this paper can be viewed as a proof of concept of this idea. We should also mention that the approach presented in this article is fully general. We use SCOTCH [5] (which can partition any graph) to obtain the separators, and the resulting separators can always be handled by our algorithm without any change.

6.1. Elasticity Equation for a 3D Beam and a Cylinder Head Geometry

We consider the 3D Navier-Cauchy elastostatics equations with a beam geometry (Figure 2(a)):

$$(\lambda + \mu)\,\nabla(\nabla \cdot u) + \mu \nabla^2 u + F = 0 \tag{31}$$

where u is the displacement vector and λ and µ are Lamé parameters. For the beam geometry, we use 10-node tetrahedral elements (see, for example, Section 1. of this document, footnote 3) to discretize the above equation. For the cylinder head geometry, the mesh is composed of 8-node hexahedral, 6-node pentahedral and 4-node tetrahedral solid elements, and also 3-node shell elements. Figures 2(a) and 2(b) show a sample beam and cylinder head geometry, respectively. As can be seen, the meshes are unstructured for both geometries.

Figure 2: 3D unstructured mesh for the beam and cylinder head geometries. (a) Beam. (b) Cylinder head.

6.2. FETI-DP Solver for a 3D Elasticity Problem

Domain decomposition (DD) methods solve a problem by splitting it into several subdomains. Local problems are solved on each subdomain and a global linear system is used to couple these local solutions into a global solution for the entire problem [1]. FETI methods [, 4] are a family of domain decomposition algorithms with Lagrange multipliers that have been developed for the fast sequential and parallel iterative solution of large-scale systems of equations arising from the finite-element discretization of partial differential equations [].

In this article, we consider two sparse local FETI-DP matrices arising from the finite-element discretization of an elasticity problem in three dimensions. The first matrix corresponds to solving the elasticity equation with a structured mesh in three dimensions (Figure 3(a)), while the second matrix corresponds to solving the same problem using the geometry of an engine in an unstructured mesh (Figure 3(b)). Both matrices correspond to the stiffness matrix of one subdomain of a linear elastic 3D solid finite element model (Equation (31)) of their respective geometry. The discretization for the cube geometry uses 8-node (trilinear) hexahedral elements (see, for example, Section 11.3 of this online document, footnote 4), while the discretization for the engine geometry uses 10-node tetrahedral elements (see, for example, Section 1. of this document, footnote 5).

Figure 3: FETI-DP benchmark meshes. (a) Structured 3D FETI-DP mesh. (b) Unstructured 3D FETI-DP mesh.

7. Numerical Benchmarks

In this section we show some numerical results and benchmarks of our code. As our code uses the Eigen C++ library for matrix manipulations, we use the Eigen direct solvers as benchmark references.

7.1. Elasticity Equation for a 3D Beam and a Cylinder Head Geometry

We apply our solvers to frontal matrices arising from the multifrontal elimination of 3D elastostatics sparse matrices (Figures 2(a), 2(b)). We compare the fast BDLR direct solver and the ACA direct solver as preconditioners to the GMRES iterative scheme. Because of the particular geometry of the beam mesh, all frontal matrices are relatively small ( K) for this particular case. As can be seen in Figure 5, the singular values of a sample frontal matrix off-diagonal block decay rapidly and the block is in fact low-rank. Figures 4(a) and 4(b) show the distance of the row (column) index of each pivot obtained in the full pivoting LU factorization from the boundary between the row and column sets of vertices in the interaction graph for the beam problem. As we expected, larger pivots correspond to rows and columns that are closer to the boundary.

Figures 6(a) and 6(b) compare the relative error in approximating the top off-diagonal block using SVD versus the BDLR approximation for the beam and cylinder head geometry, respectively. That is, each point (x, y) in this plot represents the relative error in approximation (y) if we wanted a rank (x) approximation using one of the low-rank approximation algorithms. This corresponds to choosing the top singular values in the SVD, and to choosing the rows and columns that are closest to the boundary in the BDLR approximation. As can be seen in the plot, the curves associated with the BDLR scheme have a tolerance (ɛ). This means that after the LU factorization of $\hat{A}$ (see Section 5.3), we only keep rows and columns corresponding to pivots that are larger than ɛ times the magnitude of the largest pivot. We use this convention for all BDLR approximations in this paper. We can observe that as we decrease ɛ, we obtain a more accurate low-rank representation via the BDLR algorithm for the beam geometry. For the more complicated cylinder head geometry, we see that in order to obtain a good approximation for low values of ɛ, more rows and columns need to be included in the low-rank approximation, which corresponds to a higher depth parameter (d) in the BDLR scheme.

Figures 7(a) and 7(b) show a level by level timing of the factorization, solve and low-rank approximation of the BDLR solver applied to sample frontal matrices corresponding to the beam and cylinder head geometries, respectively. As can be seen, the off-diagonal rank decays from root to leaf, which confirms our assumptions in Section 4.4. Figures 8(a) and 8(b) show a detailed convergence analysis and comparison between the BDLR and ACA solvers as preconditioners to the GMRES iterative scheme.

7.2. FETI-DP Solver for a 3D Elasticity Problem

We apply the BDLR and ACA direct solver preconditioners to frontal matrices arising from the multifrontal elimination of local matrices in a FETI-DP solver. We considered two different classes of problems. One corresponds to solving the elasticity equation (Equation (31)) in a cube geometry with a structured mesh. The other corresponds to solving the same equation in an engine geometry with an unstructured mesh.
Figures 4(c) and 4(d) show that the largest pivot values of a sample off-diagonal block of a frontal matrix arising from the cube geometry correspond to rows and columns that are closer to the boundary. Figures 4(e) and 4(f) show that for the unstructured engine mesh, although most large pivots correspond to rows and columns near the boundary, there are some important rows and columns that are not included in the points closest to the boundary. Figure 6(c) shows that the error in the BDLR method is comparable to the SVD (optimal) algorithm for the structured cube problem. Figure 6(d) shows that, similar to Figure 6(b), we need to include more points (rows and columns) in order to achieve an accurate low-rank approximation for ɛ = 1 1. In other words, if there are insufficient rows and columns in the BDLR approximation, the matrix $\hat{A}$ (see Section 5.3) becomes low-rank and results in an LU factorization with very small pivots. These small pivots are the cause of the large relative error, as they become very large when inverted. Figures 8(c) and 8(d) show the convergence rate of various BDLR and ACA direct solver preconditioners for a sample frontal matrix arising from the cube and engine mesh, respectively.

7.3. Summary

Table 3 summarizes the solver timings for the various frontal matrices that we benchmarked. As can be seen, the iterative solve scheme with either a fast BDLR or ACA direct solver preconditioner can reach near machine accuracy much faster than a conventional LU solver in almost all cases. Furthermore, both BDLR and ACA achieve a relatively good speedup for all cases. However, for very large cases (1.5M structured cube and .3M unstructured cylinder head), one can observe that BDLR achieves a higher speedup compared to ACA.

One important point to note is that the convergence of both BDLR and ACA depends on the chosen parameters. For ACA, one can get better results by decreasing the tolerance. For BDLR, in order to achieve a given tolerance, one has to increase the depth parameter (d). It is possible for BDLR not to converge for a certain tolerance and depth parameter. This is because the depth and accuracy are related. In particular, the efficiency of the method is sometimes found to degrade if we reduce ɛ too much without increasing d sufficiently. This corresponds to the fact that we are trying to get a more accurate low-rank approximation but the pool of sample points is not sufficiently large to provide the desired accuracy. In that case, reducing ɛ may, in fact, lead to a degradation in the preconditioner, rather than an improvement.

An important advantage of the BDLR algorithm is that the rows and columns required for constructing the low-rank approximation are known a priori based on the structure of the separator graph. As we will demonstrate in a future article, this will allow us to significantly accelerate the extend-add process and to avoid constructing large dense frontal and update matrices, as we will only keep track of the rows and columns required by the BDLR algorithm.
Matrix Size ACA BDLR Speed-up Matrix Mesh Level 1e-1 1e-3 1e-5 1e-1 1e-3 1e-5 LU Type Type Sparse Dense ACA BDLR T I T I T I T I T I T I
1st 1.5M 3K 1.3e 3.85e 7 7.8e e 34.9e e 7 7.9e st 7.5K 6.99e e1 4.9e e 3.8e e1 7.38e nd 5.K.9e e e 3 3.3e e e e nd 5.K.5e e e1 3.38e e 6.6e e rd.K.77e e e e e e e Cube FETI Local 3rd 4K.8K 4.34e e e e e 7 3.7e e rd.K.e e e.91e e e 4 6.9e rd.5K 3.95e e1 7.61e 5.74e e 5.65e 4 1.e th.5K 4.65e e e 8.83e e e 5 1.e th.K 3.6e e 7.46e 6.6e e 5 3.3e 4 6.5e th 3.8K 5.7e e 68 4.e 16.e e 4 3.9e 3 3.4e Engine 9th 4K.8K 1.31e e e e e e e th.5K 1.34e e e e e e e st 1.9K x x 5.67e e e e e e nd 1.9K 1.31e e e-1 4.9e e e 4 4.5e Beam nd 3K 1.9K x x 4.88e e e e e 5 4.1e Stiffness 3rd 1.9K 6.67e e e e e e 4 4.3e rd 1.9K 1.19e e e e e e e th.3M 4K 3.84e 89 x x 1.5e x x 8.7e e e CHead nd 4.8K 4.69e e 4 1.7e1 3.45e 11.97e e e K 4th.6K 4.61e e e e e e e.3.54

Table 3: Summary of solver speed for various benchmark cases. All timings are measured in seconds. The GMRES accuracy and maximum number of iterations were set to 1 1 and 1 respectively for all cases. The letter x indicates that the solver did not converge within 1 iterations. All LU timings are obtained using Eigen's [39] partial pivoting LU solver. Level indicates the level of the dense frontal matrix in the sparse elimination tree. T and I refer to the total solve time and the number of iterations in the iterative solver, respectively. Iterative solver times are the total solve time for the iterative solver with a fast direct BDLR (ACA) solver preconditioner (low-rank computation, direct solve, iteration, etc.). For BDLR, we used a depth of 1, 3 and 5 for tolerances $10^{-1}$, $10^{-3}$ and $10^{-5}$, respectively. For the 4.8K and 3K cylinder head matrices, the results in the last BDLR column were obtained using a tolerance of $10^{-4}$ and a depth of 1. We have calculated the speedups by comparing the runtime of the conventional LU solver to the lowest runtime for each case.

8. Conclusion and Future Work

To reach our final goal of constructing a fast multifrontal solver, we need to improve the slow dense solves for the frontal matrices, which we demonstrate through various bench-

Figure 4: Row (column) distance versus pivot size for a variety of off-diagonal blocks of sample frontal matrices. Each panel plots the row or column distance from the boundary against pivot size (largest to smallest). Row (column) distance is the distance corresponding to the row (column) index of a pivot from the boundary as defined in Figure 1(a). This graph shows that large pivots are near the boundary interface, whereas the pivot size decays as we move away. This heuristically justifies our approach with BDLR. (a, b) Row and column distance for an off-diagonal block of an unstructured beam geometry frontal matrix of size .95K. (c, d) Row and column distance for an off-diagonal block of a structured cube geometry frontal matrix of size 3.75K. (e, f) Row and column distance for an off-diagonal block of an unstructured engine geometry frontal matrix of size 1.9K.


More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Lecture #15 Lecture Notes

Lecture #15 Lecture Notes Lecture #15 Lecture Notes The ocean water column s very much a 3-D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

LU Decomposition Method Jamie Trahan, Autar Kaw, Kevin Martin University of South Florida United States of America

LU Decomposition Method Jamie Trahan, Autar Kaw, Kevin Martin University of South Florida United States of America nbm_sle_sm_ludecomp.nb 1 LU Decomposton Method Jame Trahan, Autar Kaw, Kevn Martn Unverst of South Florda Unted States of Amerca aw@eng.usf.edu nbm_sle_sm_ludecomp.nb 2 Introducton When solvng multple

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

AVO Modeling of Monochromatic Spherical Waves: Comparison to Band-Limited Waves

AVO Modeling of Monochromatic Spherical Waves: Comparison to Band-Limited Waves AVO Modelng of Monochromatc Sphercal Waves: Comparson to Band-Lmted Waves Charles Ursenbach* Unversty of Calgary, Calgary, AB, Canada ursenbach@crewes.org and Arnm Haase Unversty of Calgary, Calgary, AB,

More information

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices Hgh resoluton 3D Tau-p transform by matchng pursut Wepng Cao* and Warren S. Ross, Shearwater GeoServces Summary The 3D Tau-p transform s of vtal sgnfcance for processng sesmc data acqured wth modern wde

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Multiblock method for database generation in finite element programs

Multiblock method for database generation in finite element programs Proc. of the 9th WSEAS Int. Conf. on Mathematcal Methods and Computatonal Technques n Electrcal Engneerng, Arcachon, October 13-15, 2007 53 Multblock method for database generaton n fnte element programs

More information

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation College of Engneerng and Coputer Scence Mechancal Engneerng Departent Mechancal Engneerng 309 Nuercal Analyss of Engneerng Systes Sprng 04 Nuber: 537 Instructor: Larry Caretto Solutons to Prograng Assgnent

More information

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array Inserton Sort Dvde and Conquer Sortng CSE 6 Data Structures Lecture 18 What f frst k elements of array are already sorted? 4, 7, 1, 5, 1, 16 We can shft the tal of the sorted elements lst down and then

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Sorting. Sorting. Why Sort? Consistent Ordering

Sorting. Sorting. Why Sort? Consistent Ordering Sortng CSE 6 Data Structures Unt 15 Readng: Sectons.1-. Bubble and Insert sort,.5 Heap sort, Secton..6 Radx sort, Secton.6 Mergesort, Secton. Qucksort, Secton.8 Lower bound Sortng Input an array A of data

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Accounting for the Use of Different Length Scale Factors in x, y and z Directions 1 Accountng for the Use of Dfferent Length Scale Factors n x, y and z Drectons Taha Soch (taha.soch@kcl.ac.uk) Imagng Scences & Bomedcal Engneerng, Kng s College London, The Rayne Insttute, St Thomas Hosptal,

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Dynamic wetting property investigation of AFM tips in micro/nanoscale

Dynamic wetting property investigation of AFM tips in micro/nanoscale Dynamc wettng property nvestgaton of AFM tps n mcro/nanoscale The wettng propertes of AFM probe tps are of concern n AFM tp related force measurement, fabrcaton, and manpulaton technques, such as dp-pen

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont) Loop Transformatons for Parallelsm & Localty Prevously Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Loop nterchange Loop transformatons and transformaton frameworks

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Module 6: FEM for Plates and Shells Lecture 6: Finite Element Analysis of Shell

Module 6: FEM for Plates and Shells Lecture 6: Finite Element Analysis of Shell Module 6: FEM for Plates and Shells Lecture 6: Fnte Element Analyss of Shell 3 6.6. Introducton A shell s a curved surface, whch by vrtue of ther shape can wthstand both membrane and bendng forces. A shell

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Recognizing Faces. Outline

Recognizing Faces. Outline Recognzng Faces Drk Colbry Outlne Introducton and Motvaton Defnng a feature vector Prncpal Component Analyss Lnear Dscrmnate Analyss !"" #$""% http://www.nfotech.oulu.f/annual/2004 + &'()*) '+)* 2 ! &

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis Internatonal Mathematcal Forum, Vol. 6,, no. 7, 8 Soltary and Travelng Wave Solutons to a Model of Long Range ffuson Involvng Flux wth Stablty Analyss Manar A. Al-Qudah Math epartment, Rabgh Faculty of

More information

Radial Basis Functions

Radial Basis Functions Radal Bass Functons Mesh Reconstructon Input: pont cloud Output: water-tght manfold mesh Explct Connectvty estmaton Implct Sgned dstance functon estmaton Image from: Reconstructon and Representaton of

More information

Reading. 14. Subdivision curves. Recommended:

Reading. 14. Subdivision curves. Recommended: eadng ecommended: Stollntz, Deose, and Salesn. Wavelets for Computer Graphcs: heory and Applcatons, 996, secton 6.-6., A.5. 4. Subdvson curves Note: there s an error n Stollntz, et al., secton A.5. Equaton

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information