© 2009 Society for Industrial and Applied Mathematics


SIAM J. MATRIX ANAL. APPL.
Vol. 31, No. 3, pp.
© 2009 Society for Industrial and Applied Mathematics

SUPERFAST MULTIFRONTAL METHOD FOR LARGE STRUCTURED LINEAR SYSTEMS OF EQUATIONS

JIANLIN XIA, SHIVKUMAR CHANDRASEKARAN, MING GU, AND XIAOYE S. LI

Abstract. In this paper we develop a fast direct solver for large discretized linear systems using the supernodal multifrontal method together with low-rank approximations. For linear systems arising from certain partial differential equations such as elliptic equations, during the Gaussian elimination of the matrices with proper ordering, the fill-in has a low-rank property: all off-diagonal blocks have small numerical ranks with proper definition of off-diagonal blocks. Matrices with this low-rank property can be efficiently approximated with semiseparable structures called hierarchically semiseparable (HSS) representations. We reveal the above low-rank property by ordering the variables with nested dissection and eliminating them with the multifrontal method. All matrix operations in the multifrontal method are performed in HSS forms. We present efficient ways to organize the HSS structured operations along the elimination. Some fast HSS matrix operations using tree structures are proposed. This new structured multifrontal method has nearly linear complexity and a linear storage requirement. Thus, we call it a superfast multifrontal method. It is especially suitable for large sparse problems and also has natural adaptability to parallel computations and great potential to provide effective preconditioners. Numerical results demonstrate the efficiency.

Key words. structured direct solver, hierarchically semiseparable matrix, low-rank property, superfast multifrontal method, nested dissection

AMS subject classifications. 15A23, 65F05, 65F30, 65F50

DOI / X

1. Introduction. In many computational and engineering problems it is critical to solve large structured linear systems of equations. Different structures come from different natures of the original problems or different techniques of discretization, linearization, or simplification.
It is usually important to take advantage of the special structures or to preserve the structures when necessary. Direct solvers often provide good chances to exploit the structures. Direct methods are attractive also due to their efficiency, reliability, and generality.

Here we are interested in the discretization of differential equations such as elliptic PDEs on a two-dimensional (2D) finite element mesh (grid) $\mathcal{M}$. We consider discretizations with a regular mesh or, more generally, discretizations with a well-shaped mesh [37, 38, 42], so that nested dissection or its generalizations [19, 20, 23, 32, 42] can be used. The finite element system

\[ Ax = b \tag{1.1} \]

associated with the discretization on $\mathcal{M}$ is considered, where $A$ is symmetric positive definite (SPD). Such matrices arise, say, when we apply finite difference or finite element techniques to solve 2D linear boundary value problems such as elliptic boundary problems on rectangular domains. For the purpose of presentation, we focus on 2D problems and demonstrate the potential of our ideas, although it is possible to generalize to other problems. For convenience, we will refer to the following model problem at some places.

Received by the editors January 5, 2009; accepted for publication (in revised form) by J. H. Brandts September 2, 2009; published electronically December 4.
Department of Mathematics, Purdue University, West Lafayette, IN (xiaj@math.purdue.edu).
Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA (shiv@ece.ucsb.edu).
Department of Mathematics, University of California, Berkeley, CA (mgu@math.berkeley.edu).
Lawrence Berkeley National Laboratory, MS 50F-1650, One Cyclotron Rd., Berkeley, CA (xsl@lbl.gov).

Model Problem 1.1. We use the 2D discrete Laplacian on a 5-point stencil as a model problem, where $A$ is a 5-diagonal $n^2 \times n^2$ sparse matrix:

\[
A = \begin{pmatrix} T & -I & & \\ -I & T & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & T \end{pmatrix}, \qquad
T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix}. \tag{1.2}
\]

Note that we can also use the 9-point or other stencils, since the actual entries or nonzero patterns are not referenced in the descriptions of our method below.

To factorize the discretized matrix $A$, people typically first order the mesh points. Direct factorization of such a matrix with rowwise or columnwise mesh ordering takes $O(n^4)$ flops [19]. The nested dissection ordering [19] gives an elimination scheme with $O(n^3)$ cost, which is optimal for any ordering in exact arithmetic [31] (ignoring any special techniques such as Strassen's algorithm [41]). But $O(n^3)$ is still large for a big $n$. Sometimes, iterative methods are cheaper if effective preconditioners are available. But for hard problems good preconditioners can be difficult to find.

1.1. Superfast multifrontal method. Direct solvers have been considered expensive because of the large amount of fill-in even if the original matrix is sparse. To effectively handle the fill-in, some people approximate full matrices in complicated problems with structured matrices such as $\mathcal{H}$-matrices [26, 28, 29], $\mathcal{H}^2$-matrices [3, 27, 30], quasiseparable matrices [17], semiseparable matrices [7, 9, 10], etc. Similarly, when solving discretized PDEs such as elliptic equations, we can develop fast direct solvers by exploiting the rank property and by approximating dense matrices in these problems with compact structured matrices without compromising the accuracy. These approximations are feasible, as we notice that the fill-in during the elimination with certain ordering is actually structured.
The fill-in is closely related to the Green's functions via Schur complements. Thus the off-diagonal blocks of the fill-in have small numerical ranks, which has been observed in [1, 2, 8, 24, 25, 35, 45] and other work. Due to this low-rank property, we show that the structured approximations of the dense $N \times N$ subproblems in those problems can be solved with a cost of $O(p^2 N)$, and this leads to a total complexity of $O(pn^2)$, where $p$ is a constant related to the PDE and the accuracy in the matrix approximations and $n$ is the mesh dimension. Our solver includes both the approximation stage of dense matrices and the direct solution stage using the approximations. We still call the overall procedure a direct solver.

Our work shares ideas with those in [1, 2, 8, 24, 25, 35, 45]. Here, we fully integrate sparse matrix techniques (nested dissection) in the context of a supernodal multifrontal method, and we use tree structures for both the overall matrix factorization and the intermediate data structure.

The multifrontal method is one of the most important direct methods for sparse matrix solutions [16, 34]. In our direct solver we order the mesh nodes into separators with nested dissection [19] and organize the elimination process with a supernodal multifrontal method. The method eliminates separators and accumulates updates locally following an elimination tree [22, 33].

Moreover, the dense intermediate matrices (fill-in) in the supernodal multifrontal method for many discretized PDEs have the above low-rank property. Those dense matrices are then approximated by tree-structured semiseparable matrices, called hierarchically semiseparable (HSS) matrices [11, 13, 14, 46]. HSS matrices are closely related to the $\mathcal{H}^2$-matrices in [3, 27, 30]. Many HSS operations such as matrix multiplications and system solutions can be done in nearly linear complexity. HSS matrices feature hierarchical low-rank properties in the off-diagonal blocks. More specifically, we say that a matrix has the low-rank property if all its off-diagonal blocks have small ranks or numerical ranks (Definition 2.6).

In our supernodal multifrontal method, all matrix operations are conducted efficiently via HSS approximations. Some basic HSS operations can be found in [11, 13, 14]. In this paper, we provide some new HSS algorithms necessary to convert the multifrontal method into a structured one. These new HSS algorithms provide an innovative way of handling HSS matrix operations by tree techniques.

Both the multifrontal method and the HSS structure have nice hierarchical tree structures. They both take good advantage of dense matrix computations and have efficient storage, good data locality, and natural adaptability to parallel computations. The usage of HSS matrices in the multifrontal method leads to an efficient structured multifrontal method. For problems such as the 5-point or 9-point finite difference Laplacian, this structured multifrontal method has nearly linear complexity. We thus call this method a superfast multifrontal method. The method is also memory efficient. By setting a relatively large tolerance in the matrix approximations, the method can also work as an efficient and effective preconditioner. We have developed a software package for the solver and various HSS operations.

1.2. Outline of the paper. This paper is organized as follows.
In section 2, we give an introduction to the multifrontal method, nested dissection, and HSS structures. Section 3 demonstrates the low-rank property and presents an overview of our new superfast multifrontal method. The detailed superfast structured multifrontal algorithm is discussed in section 4. Two major steps are covered: structured elimination of separators and structured matrix assembly (called extend-add). Some new HSS operations are proposed. Section 5 demonstrates the efficiency of the superfast multifrontal method with numerical experiments in terms of the model problem and a linear elasticity problem. Section 6 gives some general remarks.

2. Review of the multifrontal method and HSS representations. In this section, we briefly review the multifrontal method and HSS representations, which are the building blocks of our superfast structured multifrontal method.

2.1. Multifrontal method with nested dissection ordering. In the direct factorization of a sparse matrix, usually, the rows and columns of the matrix are first ordered to reduce fill-in. Nested dissection [19] is an important method for finding an elimination ordering. Consider a discretized matrix and its associated mesh. Nested dissection orders mesh points with the aid of separators. A separator is a small set of mesh points whose removal divides the rest of the mesh or submesh into two disjoint pieces. The mesh is recursively partitioned with multiple levels of separators. Lower level separators are ordered before upper level ones; see Figure 2.1(i). Here, by a level we mean a set of separators at the same level of partition. After the nodes are ordered, we compute the Cholesky factorization $A = LL^T$ by Gaussian elimination, which corresponds to the elimination of the mesh points

(unknowns) from lower levels to upper levels. Mesh points may get connected during the elimination (Figure 2.1(ii)). The elimination of a mesh point pairwise connects all its neighbor points [19, 39].

Fig. 2.1. Separators in nested dissection and the connections of mesh points. (i) Partition with separators. (ii) Connections of points during elimination.

The factorization process can be conducted following the multifrontal method [16, 34]. The central idea of the multifrontal method is to reorganize the overall factorization of $A$ into partial updates and factorizations of small dense matrices. It has been widely used in numerical methods for PDEs, optimization, fluid dynamics, and other areas. Suppose $j-1$ steps of factorizations of $A$ are finished. Some portions of the first $j-1$ columns of $L$ contribute to later computations in the form of outer-product updates [34]. The nonzero portion of column $j$ of $A$ and some early outer-product contributions are assembled together by an operation called extend-add. The resulting matrix is called a frontal matrix $F_j$. One step of factorization of $F_j$ gives the $j$th column of $L$. The Schur complement is called the update matrix. See [34] for a formal discussion of the procedure.

To conveniently consider how the update contributions are passed, a powerful tool called the elimination tree is used. The following definitions can be found in [33, 34, 40, etc.].

Definition 2.1. The elimination tree $T(A)$ of an $N \times N$ matrix $A$ is the tree structure with $N$ nodes $\{1, \ldots, N\}$ such that node $p$ is the parent of $j$ if and only if $p = \min\{i > j \mid l_{ij} \neq 0\}$, where $A = LL^T$ and $L = (l_{ij})_{N \times N}$ is lower triangular.

In addition, the concept of an assembly tree is given in [34]. In this work, we do not distinguish between these two types of trees. We also assume the elimination follows the postordering of the elimination tree.

We use a supernodal version of the multifrontal method together with nested dissection to solve the discretized problems. Each separator in nested dissection is considered to be a node in the postordering elimination tree.
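Definition 2.1 can be made concrete with a short dense computation. The following is only a minimal sketch, assuming a small dense SPD matrix and 0-based node numbers; the function name `elimination_tree` and the tolerance `tol` are hypothetical choices for illustration and are not part of the paper or of any sparse solver package.

```python
import numpy as np

def elimination_tree(A, tol=1e-12):
    """Parent array of the elimination tree of a dense SPD matrix A.

    Follows Definition 2.1 with 0-based indices:
    parent[j] = min{ i > j : l_ij != 0 }, and parent[j] = -1 for a root.
    """
    L = np.linalg.cholesky(A)          # A = L L^T, L lower triangular
    N = A.shape[0]
    parent = np.full(N, -1)
    for j in range(N):
        # Rows below the diagonal with a nonzero entry in column j of L.
        below = np.nonzero(np.abs(L[j + 1:, j]) > tol)[0]
        if below.size:
            parent[j] = j + 1 + below[0]
    return parent

# Usage: a tridiagonal matrix gives a chain, an arrowhead matrix gives a star.
tridiag = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
arrow = 4 * np.eye(4)
arrow[3, :3] = 1.0
arrow[:3, 3] = 1.0
print(elimination_tree(tridiag))
print(elimination_tree(arrow))
```

A practical sparse code would compute the tree symbolically from the nonzero pattern of $A$ without forming $L$; the dense factorization here is used only because it mirrors the definition directly.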
The separators are put into different levels of the tree (Figure 2.2). During elimination, the separators are eliminated following the postordering elimination tree. The elimination of a separator will connect all its neighbor separator pieces. For a separator $i$, let $p_1, p_2, \ldots, p_k$ be the pieces of the uneliminated neighbor separators of $i$ which are directly or indirectly connected to $i$ (due to matrix factorization). We say that the pieces $\{i, p_1, p_2, \ldots, p_k\}$ form an element and that $i$ is the pivot separator of this element. Figure 2.3 shows two examples.

Fig. 2.2. Ordering separators. (i) Ordering separators. (ii) Separator tree/nested dissection elimination tree.

Fig. 2.3. Examples of elements. The ordering of $p_1, p_2, p_3, p_4$ may not necessarily follow their order in the elimination tree. (i) Leaf node. (ii) Non-leaf node.

If separator $i$ is a bottom level separator in nested dissection (or a leaf node in the elimination tree) (Figure 2.3(i)), the frontal matrix $F_i$ is directly formed from $A$:

\[
F_i \equiv F_i^0 = \begin{pmatrix} A_{ii} & A_{ip_1} & \cdots & A_{ip_k} \\ A_{p_1 i} & & & \\ \vdots & & 0 & \\ A_{p_k i} & & & \end{pmatrix}, \qquad
U_i = -\begin{pmatrix} A_{p_1 i} \\ \vdots \\ A_{p_k i} \end{pmatrix} A_{ii}^{-1} \begin{pmatrix} A_{ip_1} & \cdots & A_{ip_k} \end{pmatrix}. \tag{2.1}
\]

The elimination of separator $i$ provides the block column in $L$ corresponding to $i$, and the Schur complement is the $i$th update matrix $U_i$.

If separator $i$ is not a leaf node (Figure 2.3(ii)), we assume it has two children $c_1$ and $c_2$. The update matrices $U_{c_1}$ and $U_{c_2}$ represent contributions from the subtrees rooted at $c_1$ and $c_2$, respectively. Then $F_i$ is obtained by assembling $F_i^0$, $U_{c_1}$, and $U_{c_2}$ with the extend-add process, and the elimination of separator $i$ yields $U_i$:

\[
F_i = F_i^0 \mathbin{\leftrightarrow\mkern-14mu\updownarrow} U_{c_1} \mathbin{\leftrightarrow\mkern-14mu\updownarrow} U_{c_2}
= \begin{pmatrix} L_{ii} & 0 \\ L_{Bi} & I \end{pmatrix} \begin{pmatrix} L_{ii}^T & L_{Bi}^T \\ 0 & U_i \end{pmatrix}, \tag{2.2}
\]

where $\mathbin{\leftrightarrow\mkern-14mu\updownarrow}$ denotes extend-add and $B$ denotes the element boundary $\{p_1, p_2, \ldots, p_k\}$.

For convenience, when presenting the ideas of handling the connections of separators, we use the situation $k = 4$ as in the model problem. Situations with a general $k$ can be similarly discussed (subsection 4.6). Here, as shown in Figure 2.4, we say that the separator pieces $\{c_1, p_1^L, p_2, p_3^L, i\}$ form the left child element and $\{c_2, p_1^R, p_4, p_3^R, i\}$ form the right child element.
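The partial factorization in (2.2), and the leaf-node case (2.1) as a special instance, can be illustrated with dense blocks. This is a minimal unstructured sketch, assuming the pivot separator occupies the leading `m` rows and columns of the frontal matrix; the name `eliminate_pivot` is hypothetical and does not refer to the authors' software.

```python
import numpy as np

def eliminate_pivot(F, m):
    """Eliminate the leading m x m pivot block of a symmetric frontal matrix F.

    With F = [[F_ii, F_iB], [F_Bi, F_BB]], returns (L_ii, L_Bi, U) such that
    F = [[L_ii, 0], [L_Bi, I]] @ [[L_ii^T, L_Bi^T], [0, U]], as in (2.2).
    """
    F_ii, F_Bi, F_BB = F[:m, :m], F[m:, :m], F[m:, m:]
    L_ii = np.linalg.cholesky(F_ii)            # F_ii = L_ii L_ii^T
    L_Bi = np.linalg.solve(L_ii, F_Bi.T).T     # L_Bi = F_Bi L_ii^{-T}
    U = F_BB - L_Bi @ L_Bi.T                   # update matrix (Schur complement)
    return L_ii, L_Bi, U

# Usage on a random SPD frontal matrix with a pivot block of order 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))
F = X @ X.T + 6 * np.eye(6)
L_ii, L_Bi, U = eliminate_pivot(F, 2)
print(np.allclose(U, U.T))
```

When the $F_{BB}$ block is zero, as for a leaf node, the returned `U` reduces to $-A_{Bi} A_{ii}^{-1} A_{iB}$, which is the update matrix in (2.1).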

Fig. 2.4. In extend-add, separator pieces in the left and right child elements, marked by the solid-line oval box and the dashed-line oval box, respectively.

Fig. 2.5. Example: A binary tree with postordering and two levels of HSS off-diagonal blocks of a matrix, where the indices follow the postordering of the tree. (i) Postordering of a binary tree. (ii) 2nd level HSS blocks. (iii) 1st level HSS blocks.

Recursive application of the above procedure to all nodes of the elimination tree leads to a supernodal multifrontal method. The supernodal multifrontal method with nested dissection for factorizing $A$ in (1.2) costs $O(n^3)$ flops. The number of nonzeros in the Cholesky factor is $O(n^2 \log_2 n)$. A stack of size $O(n^2)$ is used to store the update matrices.

2.2. Hierarchically semiseparable structures. Semiseparable or quasiseparable structures have attracted a lot of interest in recent years. In our superfast multifrontal method, frontal matrices and update matrices are approximated by semiseparable matrices. Semiseparable forms of upper level frontal matrices are obtained from lower level ones recursively. In this subsection, we review a tree-structured semiseparable representation. Note that this tree structure is used for each frontal matrix and is not associated with the outer assembly tree. Thus, this subsection can be understood independently of the multifrontal method.

There are different definitions for semiseparable matrices [17, 18, 43, 44]. One definition often used is based on the low-rankness of appropriate off-diagonal blocks. Here we use the HSS off-diagonal blocks in [11, 12, 13, 14], as shown in the example in Figure 2.5. We first define HSS blocks with the aid of a full binary tree (a binary tree where each node except the root has exactly one sibling) and its postordering.

Definition 2.2 (HSS blocks). HSS blocks are block rows or columns excluding the diagonal parts defined at different levels of splittings of a matrix as follows.
Given a full binary tree with its postordering, an $N \times N$ matrix $H$, and a partition sequence $\{m_{i_j}\}_{j=1}^k$, where $i_j$, $j = 1, 2, \ldots, k$, are the leaf nodes of the tree and $\sum_{j=1}^k m_{i_j} = N$, partition $H$ into $k$ block rows (columns) following $\{m_{i_j}\}_{j=1}^k$ so that block row (column) $j$ has $m_{i_j}$ rows (columns) of $H$. Any block row (column) excluding the $m_{i_j} \times m_{i_j}$ diagonal block is called a bottom level HSS (off-diagonal) block. Associate

with each leaf a bottom level HSS block. An HSS block for an upper level node is defined recursively from child HSS blocks but with the appropriate diagonal block removed.

Fig. 2.6. An HSS tree corresponding to Figure 2.5 and the structured form of the matrix. (i) HSS tree. (ii) Hierarchical form of the matrix.

We emphasize that postordered trees are used in this paper so that the HSS blocks in Figure 2.5 are indexed following the ordering of the tree nodes. This significantly simplifies the HSS notation below and the coding, as discussed in [11], since for each node only one index is needed instead of two or three as in [12, 13, 14]. A full binary tree with $k$ leaves has $2k-1$ nodes in total, where node 1 is the first leaf and $2k-1$ is the root. The binary tree used in the above definition is called an HSS tree (Figure 2.6(i), also called a merge tree in [12, 13, 14]), which helps define the HSS structure.

Definition 2.3 (HSS tree and HSS representation). An HSS tree $T = (V, E)$ that defines an HSS representation for a matrix $H$ is the binary tree in Definition 2.2 and is further defined as follows. Let node 1 be the first leaf, $2k-1$ be the root, and $i_j$, $j = 1, 2, \ldots, k$, be the leaves of $T$. Each node $i \in V$ ($i < 2k-1$) is associated with matrices $D_i, U_i, V_i, R_i, W_i, B_i$, which are called generators of $H$. The HSS representation of $H$ is given by the generators $\{R_i, W_i, B_i\}_{i=1}^{2k-2}$ and $\{D_{i_j}, U_{i_j}, V_{i_j}\}_{j=1}^k$, which satisfy the recursive definition of upper level generators $D_i$, $U_i$, and $V_i$:

\[
\begin{aligned}
D_i &= \begin{pmatrix} D_{c_1} & U_{c_1} B_{c_1} V_{c_2}^T \\ U_{c_2} B_{c_2} V_{c_1}^T & D_{c_2} \end{pmatrix}, && i \in V \text{ is a nonleaf node}, \\
U_i &= \begin{pmatrix} U_{c_1} R_{c_1} \\ U_{c_2} R_{c_2} \end{pmatrix}, \quad
V_i = \begin{pmatrix} V_{c_1} W_{c_1} \\ V_{c_2} W_{c_2} \end{pmatrix}, && i \in V \setminus \{2k-1\} \text{ is a nonleaf node},
\end{aligned} \tag{2.3}
\]

so that at the top level, $D_{2k-1} \equiv H$, where $c_1$ and $c_2$ represent the left and right children of $i$, respectively.

Remark 2.4. Generators are indexed following the postordering of the tree nodes. The generators $D_i, U_i, V_i$ for a nonleaf node $i$ are not explicitly stored.
$R_i$ and $W_i$ are empty matrices if $i$ is a direct child of the root.

The following is a block $4 \times 4$ HSS example corresponding to Figure 2.5(ii) and Figure 2.6, with block row sizes $m_1, m_2, m_4, m_5$:

\[
H = \begin{pmatrix}
D_1 & U_1 B_1 V_2^T & U_1 R_1 B_3 W_4^T V_4^T & U_1 R_1 B_3 W_5^T V_5^T \\
U_2 B_2 V_1^T & D_2 & U_2 R_2 B_3 W_4^T V_4^T & U_2 R_2 B_3 W_5^T V_5^T \\
U_4 R_4 B_6 W_1^T V_1^T & U_4 R_4 B_6 W_2^T V_2^T & D_4 & U_4 B_4 V_5^T \\
U_5 R_5 B_6 W_1^T V_1^T & U_5 R_5 B_6 W_2^T V_2^T & U_5 B_5 V_4^T & D_5
\end{pmatrix}. \tag{2.4}
\]

To see the hierarchical structure of (2.4), we can write $H$, with block row sizes $m_1 + m_2$ and $m_4 + m_5$, as

\[
H = \begin{pmatrix} D_3 & U_3 B_3 V_6^T \\ U_6 B_6 V_3^T & D_6 \end{pmatrix},
\]

corresponding to Figure 2.5(iii), where the generators are obtained by setting $i = 3$, $c_1 = 1$, $c_2 = 2$ in (2.3). However, only the generators in (2.4) are explicitly stored.

Remark 2.5. In the HSS representation,
- each $U_i$ is an appropriate column basis matrix for an HSS block row. For example, the second level HSS block row associated with node $i = 1$ is
\[ U_1 \begin{pmatrix} B_1 V_2^T & R_1 B_3 \begin{pmatrix} W_4^T V_4^T & W_5^T V_5^T \end{pmatrix} \end{pmatrix}; \]
- we can verify the following [11]: to identify a block of $H$, say, the $(2, 3)$ block in (2.4), we can use the directed path $2 \to 3 \to 6 \to 4$ connecting the 2nd and 3rd nodes at the bottom level (nodes 2 and 4 as marked) of the HSS tree, which gives the product $U_2 R_2 B_3 W_4^T V_4^T$;
- for a symmetric HSS matrix, we can set $U_i = V_i$, $R_i = W_i$, and $B_i = B_j^T$, $i = 1, 2, \ldots, 2k-2$, where $j$ represents the sibling of each $i$.

Theoretically, an HSS form representation can be constructed for any matrix $H$ [11, 14] and an appropriate HSS tree. However, such a representation is generally more useful when the HSS blocks have small (numerical) ranks.

Definition 2.6 (numerical rank and HSS rank). In this work, the numerical rank of any matrix block with a relative or absolute tolerance $\tau$ is the rank obtained by applying rank revealing QR factorizations [5, 6] or $\tau$-accurate SVD (SVD with a tolerance $\tau$ for singular values) to the block. For a matrix and an HSS tree, the maximum of the numerical ranks of the HSS blocks at all tree levels is called the HSS rank (with a given $\tau$) of the matrix.

Later, we say a matrix is hierarchically separable if its HSS rank is small with a given $\tau$. For an HSS matrix with a small HSS rank $p$, if all $B_i$ generators in its HSS representation have sizes close to $p$, we say that the HSS form is compact. It is shown in [11, 14] that for a matrix in compact HSS form, nearly linear complexity system solvers exist. Many other HSS matrix operations such as structure generation, compression, etc., are also very efficient. The reader is referred to [11, 12, 13, 14] for more details on HSS representations.

3. Superfast multifrontal method: Low-rank property and overview. Notice that in the multifrontal method for discretized matrices the frontal and update matrices are generally dense because of the mutual connections among mesh nodes (Figure 2.1(ii)). The elimination together with the extend-add operation on such a dense $N \times N$ matrix typically takes $O(N^3)$ flops in exact arithmetic. Here we consider approximations of these dense matrices. Approximations of dense matrices are feasible in solving linear systems derived from discretizations of certain PDEs such as elliptic equations, as we discover that low-rank properties exist in these problems. Similar results can also be found in [1, 2, 8, 24, 25, 35].

In this work, we take advantage of the hierarchical tree structures of both HSS matrices and the multifrontal method. We have developed a series of efficient HSS operations [11, 14]. Additional HSS operations necessary for our superfast multifrontal method will be presented here. With these techniques, we are able to produce a structured multifrontal method, and we reduce the total complexity for solving discretized problems such as (1.2) from $O(n^3)$ to $O(pn^2)$ and storage from $O(n^2 \log_2 n)$ to $O(n^2 \log_2 p)$, where $p$ is a parameter related to the problem and the tolerance for matrix approximations.
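The low-rank property can be probed numerically on a small instance of Model Problem 1.1. The sketch below is an illustration under several assumptions that are not from the paper: it uses a single middle grid line as the only separator, forms the dense Schur complement on that line, and measures the numerical rank of one off-diagonal block with a relative-tolerance SVD in the sense of Definition 2.6. The helper names `laplacian_2d`, `separator_schur_complement`, and `numerical_rank` are hypothetical.

```python
import numpy as np

def numerical_rank(B, tau):
    """Number of singular values of B larger than tau times the largest one."""
    s = np.linalg.svd(B, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > tau * s[0]))

def laplacian_2d(n):
    """Dense n^2 x n^2 5-point Laplacian of (1.2), one diagonal block per grid line."""
    T = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    E = np.eye(n, k=1) + np.eye(n, k=-1)
    return np.kron(np.eye(n), T) - np.kron(E, np.eye(n))

def separator_schur_complement(n):
    """Eliminate every mesh point except the middle grid line (a separator)."""
    A = laplacian_2d(n)
    grid = np.arange(n * n).reshape(n, n)
    sep = grid[n // 2, :]                          # indices of the middle line
    rest = np.setdiff1d(np.arange(n * n), sep)     # the two disjoint subdomains
    A_ss = A[np.ix_(sep, sep)]
    A_sr = A[np.ix_(sep, rest)]
    A_rr = A[np.ix_(rest, rest)]
    return A_ss - A_sr @ np.linalg.solve(A_rr, A_sr.T)

# Usage: numerical ranks of the upper-right block of the dense Schur complement.
S = separator_schur_complement(31)
h = S.shape[0] // 2
for tau in (1e-2, 1e-4, 1e-8):
    print(tau, numerical_rank(S[:h, h:], tau))
```

This experiment is much simpler than the ones reported in the tables of this section, which use frontal and update matrices from a full nested dissection ordering; it is meant only to show how a numerical rank with a given $\tau$ is evaluated.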

Table 3.1. Numerical ranks with different relative tolerances $\tau$ of four $F_{iB}$ blocks from mesh dimensions $n = 127, 255, 511$, and $1023$, respectively. Note that the size of $F_{iB}$ can be larger than $n$. (Headings: size($F_{iB}$); $\tau$.)

Table 3.2. HSS ranks of an order 1023 block $F_{ii}$ with different $\tau$, where the bottom level HSS block row size is about 16 and a perfect binary HSS tree with 64 nodes is used. (Headings: $\tau$; HSS rank.)

3.1. Off-diagonal numerical ranks. It has been shown in [1, 2, 8, 24, 25] that the low-rank property exists in the LU factorizations of finite-element matrices from elliptic operators and some other problems. Here in the context of the supernodal multifrontal method, we show some rank results of the frontal and update matrices. For a separator $i$ and all its neighbors (denoted $B$) as shown in Figure 2.3, we order them and their interior nodes properly (Definition 3.1 below). The corresponding frontal matrix $F_i$ has the following form:

\[
F_i = \begin{pmatrix} F_{ii} & F_{iB} \\ F_{Bi} & F_{BB} \end{pmatrix}
= \begin{pmatrix} L_{ii} & 0 \\ L_{Bi} & I \end{pmatrix} \begin{pmatrix} L_{ii}^T & L_{Bi}^T \\ 0 & U_i \end{pmatrix}, \tag{3.1}
\]

where the elimination of separator $i$ gives the update matrix $U_i$. For the frontal and update matrices in the supernodal multifrontal method for solving Model Problem 1.1, we have the following critical rank observations:
- The off-diagonal block $F_{iB}$ has a small numerical rank.
- The HSS blocks of $F_{ii}$ have small numerical ranks.
- The HSS blocks of $U_i$ have small numerical ranks.

Some rank results are reported as follows.

(1) Numerical rank of $F_{iB}$. We choose some frontal matrices $F_i$ and compute the numerical rank of $F_{iB}$ in each $F_i$. Table 3.1 shows the rank results.

(2) Off-diagonal numerical ranks of $F_{ii}$. We then test the HSS ranks of $F_{ii}$ with different relative tolerances. As an example, we use the frontal matrix corresponding to the top level separator of a mesh. In such a situation, $F_i$ ($\equiv F_{ii}$) has order 1023. We choose a fixed block row size and make all bottom level HSS off-diagonal blocks have approximately the same row dimension, so that the HSS tree is a perfect binary tree. (As an example, when there are four block rows, a binary tree as in Figure 2.6 is used.
Again, this binary tree is not related to the elimination tree.) Table 3.2 shows the HSS ranks, which are relatively small as compared with the size of $F_{ii}$.

(3) Off-diagonal numerical ranks of $U_i$. Similarly, Table 3.3 shows HSS ranks of an update matrix $U_i$ with different tolerances. The mesh dimension is $n =$ [...]. Similar situations hold for the frontal matrices. When $n$ is larger, the low-rank property is more significant.

The low-rank property has a certain physical background. For example, the paper [4] considers a 2D physical model consisting of a set of particles with pairwise interactions satisfying Coulomb's law. The authors define well-separated sets of particles to

be the sets that have strong interactions within sets but weak ones between different sets. Here when we consider 2D meshes for discretized problems, we have a similar situation. In Figure 2.1(ii) the mesh points in the current-level element are mutually connected. But the connections differ for different points. We can think of the points that are closer to each other, or not well separated, as having stronger connections; see Figure 3.1. For theoretical analysis of the low-rank property, see, e.g., [1, 2, 8, 24, 25].

Table 3.3. HSS ranks of an update matrix $U_i$ with different $\tau$, where the bottom level HSS block row size is about 16 and a perfect binary HSS tree with 64 nodes is used. (Headings: $\tau$; HSS rank.)

Fig. 3.1. Examples of well-separated sets in an element.

3.2. Overview of the structured extend-add process and the structured multifrontal method. We can take advantage of the previous low-rank property in the multifrontal process to get a new structured method. There are two major tasks: to replace the traditional dense Cholesky factorization by a structured one, and to develop a structured extend-add process. We use HSS matrices to approximate frontal and update matrices. HSS matrices can be quickly factorized due to the low-rank property. HSS forms are accumulated bottom-up along the assembly tree.

The structured extend-add process is relatively complicated, since the mesh nodes and separators are generally not consistent with the HSS block partitions. We consider a separator $i$, its four neighbor pieces $p_1, p_2, p_3, p_4$ at upper levels of the assembly tree, and its children $c_1$ and $c_2$ as in Figure 2.3(ii). In the traditional multifrontal method, the frontal matrix $F_i$ is obtained from the extend-add operation (2.2)

\[
F_i = F_i^0 \mathbin{\leftrightarrow\mkern-14mu\updownarrow} U_{c_1} \mathbin{\leftrightarrow\mkern-14mu\updownarrow} U_{c_2} \equiv F_i^0 + \hat{U}_{c_1} + \hat{U}_{c_2}, \tag{3.2}
\]

where each $\hat{U}_c$ is a subtree update matrix obtained from $U_c$ by matching indices to $F_i^0$ and inserting zero entries [34]. The matrices in (3.2) take the nonzero patterns as illustrated in Figure 3.2. There are three key issues in developing the structured extend-add process.

(1) Uniform ordering.
Firstly, in order to effectively conduct (3.2) and to handle the interactions of elements, we need to match the ordering of separators and mesh points at different levels. The ordering can be predetermined in a symbolic factorization stage. We define the following uniform ordering whose effectiveness in revealing the low-rank property is shown by our numerical experiments.

Definition 3.1 (uniform ordering). The separator pieces and mesh points within an element are uniformly ordered if the neighbors are ordered counterclockwise as

shown in Figure 3.3(i) (or clockwise in Figure 3.3(ii), which can be considered as a transpose of Figure 3.3(i)), and the mesh points inside each separator piece are ordered following the natural ordering of mesh points (left-right and top-down).

Fig. 3.2. Matrices in extend-add (3.2), where the + shaped bars correspond to the overlap $p_1^L \cap p_1^R$ in separator $p_1$.

Fig. 3.3. Uniform ordering of neighbors and mesh points and the resulting rank pattern of $F_i$. (i) Uniform ordering. (ii) Transpose of (i). (iii) Rank pattern of $F_i$.

Table 3.4. Making child element separator pieces consistent with the current level pieces $i, p_1, p_2, p_3, p_4$, following the uniform ordering.

Matrix | Uniform ordering | Permutation | Padding zero blocks
$F_i$ | $i, p_1, p_2, p_3, p_4$ | / | /
$U_{c_1}$ | $i, p_1^L, p_2, p_3^L$ | $i, p_1^L, p_2, p_3^L$ | $i, \{p_1^L, 0\}, p_2, \{p_3^L, 0\}, 0$
$U_{c_2}$ | $p_1^R, i, p_3^R, p_4$ | $i, p_1^R, p_3^R, p_4$ | $i, \{0, p_1^R\}, 0, \{0, p_3^R\}, p_4$

According to the uniform ordering for Figure 2.3(ii), we have the ordering of the separator pieces and their corresponding matrices as shown in the first two columns of Table 3.4. Clearly, the neighbor orderings associated with $U_{c_1}$ and $U_{c_2}$ do not match with $F_i$. Thus, permutations of the separator pieces in the two child elements are needed for (3.2); see the third column of Table 3.4.

(2) Incompatible separator pieces. Secondly, the separator pieces for $U_{c_1}$ are not fully compatible with those for $U_{c_2}$. That is, separator pieces in one child element may not appear in the other. For example, $p_2$ appears in the left child element but not in the right one. Thus, we need to insert some zero blocks into $U_{c_1}$ and $U_{c_2}$. In terms of the separators, we attach a zero piece to $p_1^L$ so that the length of $\{p_1^L, 0\}$ is consistent with $p_1$. Other separators are processed similarly; see the fourth column of Table 3.4.

Fig. 3.4. Illustration of the superfast multifrontal method, where unstructured matrices are in dark color and others are structured. In each oval box, a frontal matrix is partially factorized and the update matrix is computed. Traditional factorizations are used below the switching level and structured factorizations above it.

(3) Overlaps. Lastly, the child elements share (parts of) the neighbors. The left and right child elements share the entire separator $i$, which becomes the pivot separator in the upper element. The piece $p_1^L$ in the left child element and $p_1^R$ in the right one satisfy $p_1^L \cup p_1^R = p_1$. In general, $p_1^L \cap p_1^R$ is nonempty (Figures 2.3(ii), 2.4, and 3.2) and is shared by both the left and right child elements. Furthermore, $p_1^L \cap p_1^R$ may not always correspond to an entire HSS block row/column. Then certain blocks may need to be split and merged with others. The techniques in subsection 4.2.2 can be used. A similar situation holds for $p_3^L$ and $p_3^R$.

All the matrix operations are done in HSS forms to provide a structured extend-add process. After $F_i$ is formed, we eliminate the pivot separator $i$ and compute the Schur complement $U_i$ with the fast HSS algorithms in [11]. The structured extend-add is used again, and the process repeats.

Before going into the details of the HSS operations, we give an overview of the structured multifrontal algorithm. A pictorial illustration is shown in Figure 3.4.

Algorithm 3.2 (Structured supernodal multifrontal method).
1. Use nested dissection to order the nodes. Build the postordering elimination tree of separators.
2. Do traditional factorization and extend-add at certain bottom levels.
3. At a switching level, construct HSS approximations of update matrices and do structured extend-add.
4. Following the nested dissection ordering, do structured factorization and extend-add at each upper level.
   (a) Eliminate a separator by factorizing the pivot blocks of the structured frontal matrix to obtain the structured update matrix.
   (b) Do structured extend-add and repeat.
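For the traditional (unstructured) levels in step 2 of Algorithm 3.2, the extend-add in (3.2) amounts to scattering each update matrix into the positions of the frontal matrix that carry the same mesh indices, with zeros implied elsewhere. The following is a minimal dense sketch under that reading; the function name `extend_add` and the index-list arguments are hypothetical, and the HSS structured version developed in the paper is not shown.

```python
import numpy as np

def extend_add(F0, idx_F, updates):
    """Dense extend-add: F = F0 + sum of zero-padded, index-matched updates.

    idx_F   : global mesh indices labeling the rows/columns of F0.
    updates : list of (U, idx_U) pairs; idx_U labels the rows/columns of U
              and must be a subset of idx_F (in any order).
    """
    position = {g: k for k, g in enumerate(idx_F)}
    F = np.array(F0, dtype=float)                 # work on a copy
    for U, idx_U in updates:
        loc = np.array([position[g] for g in idx_U])
        F[np.ix_(loc, loc)] += U                  # padded entries stay untouched
    return F

# Usage: two child updates that touch different parts of a 3 x 3 front.
F0 = np.zeros((3, 3))
U1 = np.array([[1.0, 2.0], [2.0, 1.0]])
U2 = np.array([[5.0]])
F = extend_add(F0, [10, 20, 30], [(U1, [10, 30]), (U2, [20])])
print(F)
```

Index matching through `position` plays the role of the permutation column of Table 3.4, and the entries that no update touches correspond to the padded zero blocks.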
Two layers of trees are used: the outer layer elimination tree and an inner layer HSS tree for each separator. Steps 4(a) and 4(b) are the two major structured operations.

4. Superfast multifrontal method: Detailed algorithm. According to the previous section, in the supernodal multifrontal method, the update matrices and frontal matrices are approximated by compact HSS matrices. There are two major

tasks: the structured elimination in step 4(a) of Algorithm 3.2 and the HSS structured extend-add in step 4(b) of Algorithm 3.2. In this section we briefly review a fast generalized HSS factorization in [11] and then discuss in detail some new HSS algorithms which build the HSS structured extend-add.

Fig. 4.1. Permuting two subtrees.

4.1. Fast generalized HSS Cholesky factorization. The elimination of a separator with order $N$ can be done by the fast generalized HSS Cholesky factorization in [11] in $O(p^2 N)$ flops, where $p$ is an appropriate HSS rank. The fast generalized HSS Cholesky factorization computes explicit factorizations of HSS matrices where the factors (called generalized HSS factors) consist of triangular matrices, permutations, and other orthogonal matrices. For a given SPD HSS matrix with generators $\{D_j\}$, $\{U_j\ (\equiv V_j)\}$, $\{W_j\ (\equiv R_j)\}$, $\{B_j\}$, the major steps include the following:
1. Introduce zeros into off-diagonal blocks by compressing $U_j$ generators. The compression is done by rank revealing QR factorizations or $\tau$-accurate SVD.
2. Partially factorize $D_j$. The subblocks of $D_j$ corresponding to zero off-diagonal entries are eliminated.
3. Merge the uneliminated subblock of $D_j$ with that of the sibling of $j$ in the HSS tree. Pass the block to the parent.
4. The HSS matrix is reduced to a new one with a smaller size and fewer blocks. Repeat the process.

This process is applied to $F_{ii}$ of an HSS form frontal matrix $F_i$ as in (3.1). This elimination corresponds to the removal of the subtree for $F_{ii}$ from the HSS tree for $F_i$. Or, more specifically, after this elimination the HSS subtree for $F_{ii}$ shrinks to one single node. The generators associated with this single node are used to update the remaining nodes. This leads to the Schur complement, or the update matrix $U_i$, in HSS form. The details are given in [11]. Note that $L_{ii}$ in (3.1) is now the generalized HSS factor (see [11] for an example), and $U_i = F_{BB} - L_{Bi} L_{Bi}^T$ is essentially computed by a structured low-rank update.
The structured low-rank update that produces $U_i$ is fast because $F_{BB}$ and $L_{Bi}$ share some common generators.

4.2. Some basic HSS operations needed in structured extend-add. In order to convert the standard extend-add process into a structured one, we need some basic HSS operations, which are used to address the issues discussed in subsection 3.2. These operations include permuting, merging/splitting, and inserting/deleting HSS blocks in an HSS matrix.

4.2.1. Permuting HSS blocks. It is convenient to permute an HSS matrix by permuting its HSS tree. We can generally get the new HSS form of the permuted matrix by updating just a few generators. For example, consider permuting two neighbor block rows/columns at a certain level of the HSS matrix. This corresponds to the permutation of two neighbor HSS subtrees whose roots are siblings; see Figure 4.1. In this simple situation, we can directly exchange the generators associated with $i$ and $j$, and all their children, without updating the matrices. The new matrix is still in HSS form. For more complicated situations, say, if the two subtrees are not neighbors, we can identify the path connecting the two subtrees and update the matrices associated with the nodes in the path and those connected to the path. As this varies for different situations, we come back to it in subsection 4.3, where the permutations are specifically designed for our superfast multifrontal method.

[Figure 4.2. Merging and splitting nodes of an HSS tree.]

4.2.2. Merging and splitting HSS blocks. We first look at a simple situation.

(1) Basic merging and splitting. The hierarchical structure of HSS matrices makes it very convenient to merge and to split HSS blocks. For a node $i$ with two children $c_1$ and $c_2$ in an HSS tree (Figure 4.2), we can merge $c_1$ and $c_2$ and update node $i$ as in (2.3) in Definition 2.3. On the other hand, if we want to split $i$ into $c_1$ and $c_2$, then we need to find $D_{c_k}, U_{c_k}, V_{c_k}, R_{c_k}, W_{c_k}, B_{c_k}$, $k = 1, 2$, such that (2.3) is satisfied. First, partition $U_i$, $V_i$, $D_i$ conformally as

$$U_i = \begin{pmatrix} U_{i;1} \\ U_{i;2} \end{pmatrix}, \quad V_i = \begin{pmatrix} V_{i;1} \\ V_{i;2} \end{pmatrix}, \quad D_i = \begin{pmatrix} D_{c_1} & D_{i;1,2} \\ D_{i;2,1} & D_{c_2} \end{pmatrix}. \tag{4.1}$$

Then compute QR factorizations

$$\begin{pmatrix} U_{i;1} & D_{i;1,2} \end{pmatrix} = U_{c_1} \begin{pmatrix} R_{c_1} & T_1 \end{pmatrix}, \quad \begin{pmatrix} T_1^T & V_{i;2} \end{pmatrix} = V_{c_2} \begin{pmatrix} B_{c_1}^T & W_{c_2} \end{pmatrix}, \tag{4.2}$$

$$\begin{pmatrix} U_{i;2} & D_{i;2,1} \end{pmatrix} = U_{c_2} \begin{pmatrix} R_{c_2} & T_2 \end{pmatrix}, \quad \begin{pmatrix} T_2^T & V_{i;1} \end{pmatrix} = V_{c_1} \begin{pmatrix} B_{c_2}^T & W_{c_1} \end{pmatrix}. \tag{4.3}$$

Equations (4.1)-(4.3) provide all the necessary new generators.

(2) Advanced merging and splitting. There are more complicated situations that are very useful. Sometimes we need to maintain the tree structure during merging or splitting. For an HSS matrix $H$, we consider splitting a piece from a leaf node $i$ of the HSS tree of $H$ and then merging that piece with a neighbor $j$. We look at a general situation where $i$ and $j$ are not siblings. Without loss of generality, we make two simplifications. One is that the matrix is symmetric; another is that block $j$ is an empty block (or a zero block whose size is to be set by the splitting). It suffices to look at an example in Figure 4.3, where $i = 5$ and $j = 10$.
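The basic merge in (1), and the sibling exchange of subsection 4.2.1, can be illustrated on tiny generators. The sketch below assumes that (2.3) is the standard HSS recursion, $D_i = \begin{pmatrix} D_{c_1} & U_{c_1} B_{c_1} V_{c_2}^T \\ U_{c_2} B_{c_2} V_{c_1}^T & D_{c_2} \end{pmatrix}$, $U_i = \begin{pmatrix} U_{c_1} R_{c_1} \\ U_{c_2} R_{c_2} \end{pmatrix}$, $V_i = \begin{pmatrix} V_{c_1} W_{c_1} \\ V_{c_2} W_{c_2} \end{pmatrix}$; Definition 2.3 itself is not reproduced in this section. The dictionary layout and the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def merge(c1, c2):
    """Merge two sibling nodes into their parent (assumed form of (2.3)).

    Each child is a dict with generator keys D, U, V, R, W, B.
    Returns the parent's D, U, V.
    """
    d12 = matmul(matmul(c1["U"], c1["B"]), transpose(c2["V"]))
    d21 = matmul(matmul(c2["U"], c2["B"]), transpose(c1["V"]))
    top = [r1 + r2 for r1, r2 in zip(c1["D"], d12)]
    bottom = [r1 + r2 for r1, r2 in zip(d21, c2["D"])]
    return {
        "D": top + bottom,
        "U": matmul(c1["U"], c1["R"]) + matmul(c2["U"], c2["R"]),
        "V": matmul(c1["V"], c1["W"]) + matmul(c2["V"], c2["W"]),
    }

if __name__ == "__main__":
    c1 = {"D": [[1]], "U": [[1]], "V": [[2]], "R": [[3]], "W": [[1]], "B": [[5]]}
    c2 = {"D": [[7]], "U": [[2]], "V": [[1]], "R": [[1]], "W": [[4]], "B": [[3]]}
    print(merge(c1, c2)["D"])  # original sibling order
    print(merge(c2, c1)["D"])  # siblings exchanged, generators untouched
```

Calling `merge(c2, c1)` exchanges the two siblings without modifying any generator, and the resulting `D` is the symmetric permutation of the one from `merge(c1, c2)`. This is the simple case in which a permutation requires no matrix updates.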
We first split node $i$ into two child nodes $c_1$ and $c_2$ by the method above, (4.1)-(4.3). Then move $c_2$ to the position of $j$. After that we merge $c_1$ into $i$. The details are as follows. Identify the path connecting nodes $c_2$ and $j$:

$$p(c_2, j):\ c_2 - 5 - 6 - 7 - 8 - 9 - 15 - 14 - 13 - 12 - 11 - 10.$$

We observe that in order to get a new HSS representation for $H$ (denoted $\tilde H$), all the generators associated with the nodes in this path and those directly connected to it should be updated.

[Figure 4.3. Splitting a block (node 5) of an HSS tree, where $c_1$ and $c_2$ are virtual nodes.]

We use tilded notation for the generators of $\tilde H$, with $c_1$ not merged into $i$ yet. Call the subtrees rooted at nodes 9 and 14 the left and right subtrees, respectively. To get the HSS tree for $\tilde H$, we consider the following connections, where by a connection we mean the product of the generators associated with the path connecting two nodes.
- The connection between nodes 15 and $c_2$ should be transferred to the connection between 15 and 10.
- The connection between node $c_2$ and each node $k \in \{1, 2, 3, 4, c_1\}$ that is directly connected to the path $p(c_2, j)$ should be transferred to the connection between nodes 10 and $k$.
- The connection between any two nodes $k_1, k_2 \in \{1, 2, 3, 4, c_1\}$ that are directly connected to the path $p(c_2, j)$ should remain the same.
- The connections between node 15 and each node $k \in \{1, 2, 3, 4, c_1\}$ that is directly connected to the path $p(c_2, j)$ should remain the same.

All these relations can be reflected by appropriate products of generators associated with the nodes; see [46] for the details. We can assemble all the matrix products into one single equation

$$\begin{pmatrix}
(0 \;\; \tilde R_{14}^T \tilde U_{14}^T) \\
\tilde R_1 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T) \\
\tilde R_2 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T)) \\
\tilde R_3 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T))) \\
\tilde R_4 (\tilde B_3^T \;\; \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T)))) \\
\tilde R_{c_1} (\tilde B_4^T \;\; \tilde R_5 (\tilde B_3^T \;\; \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T)))))
\end{pmatrix}
=
\begin{pmatrix}
0 & R_9^T R_8^T R_7^T R_6^T R_5^T R_{c_2}^T U_{c_2}^T \\
R_1 R_9 & B_1 R_7^T R_6^T R_5^T R_{c_2}^T U_{c_2}^T \\
R_2 (B_1^T \;\; R_8 R_9) & B_2 R_6^T R_5^T R_{c_2}^T U_{c_2}^T \\
R_3 (B_2^T \;\; R_7 (B_1^T \;\; R_8 R_9)) & B_3 R_5^T R_{c_2}^T U_{c_2}^T \\
R_4 (B_3^T \;\; R_6 (B_2^T \;\; R_7 (B_1^T \;\; R_8 R_9))) & B_4 R_{c_2}^T U_{c_2}^T \\
R_{c_1} (B_4^T \;\; R_5 (B_3^T \;\; R_6 (B_2^T \;\; R_7 (B_1^T \;\; R_8 R_9)))) & B_{c_1} U_{c_2}^T
\end{pmatrix}, \tag{4.4}$$

where $\tilde U_{14} \equiv \tilde U_{10} \tilde R_{10} \tilde R_{11} \tilde R_{12} \tilde R_{13}$.
Equation (4.4) is partitioned into four nonzero blocks corresponding to the above four types of connection changes. It turns out that we can construct the generators on the left-hand side of (4.4) in the following sequential way. First, consider the last row in (4.4). Compute a QR factorization of the right-hand side such that

$$\tilde R_{c_1} (\tilde B_4^T \;\; \tilde R_5 (\tilde B_3^T \;\; \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T))))) = Q_1 T_1.$$

Then partition $T_1 = (T_{1;1} \;\; T_{1;2})$ such that $T_{1;1}$ has the same column dimension as $\tilde B_4^T$. Thus, we can let

$$\tilde R_{c_1} = Q_1, \quad \tilde B_4^T = T_{1;1}, \quad \tilde R_5 (\tilde B_3^T \;\; \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T)))) = T_{1;2}. \tag{4.5}$$

In this way, one layer (the last row) is removed from (4.4). Next, combine (4.5) with the fourth row of (4.4):

$$\begin{pmatrix} \tilde R_4 \\ \tilde R_5 \end{pmatrix} (\tilde B_3^T \;\; \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T)))) = \begin{pmatrix} R_4 (B_3^T \;\; R_6 (B_2^T \;\; R_7 (B_1^T \;\; R_8 R_9))) \;\;\; B_4 R_{c_2}^T U_{c_2}^T \\ T_{1;2} \end{pmatrix}.$$

Again, compute a QR factorization $Q_2 T_2$ of the right-hand side. Then partition $Q_2^T = (Q_{2;1}^T \;\; Q_{2;2}^T)$ such that $Q_{2;1}$ has the same row dimension as $B_4$, and partition $T_2 = (T_{2;1} \;\; T_{2;2})$ such that $T_{2;1}$ has the same column dimension as $\tilde B_3^T$. We can set

$$\tilde R_4 = Q_{2;1}, \quad \tilde R_5 = Q_{2;2}, \quad \tilde B_3^T = T_{2;1}, \quad \tilde R_6 (\tilde B_2^T \;\; \tilde R_7 (\tilde B_1^T \;\; \tilde R_8 (\tilde R_9 \;\; \tilde B_9 \tilde U_{14}^T))) = T_{2;2}. \tag{4.6}$$

Now we can combine (4.6) with the third row of (4.4), and the above procedure repeats. Finally, it is trivial to merge node $c_1$ into $i$. The overall process costs no more than $O(N)$ flops for an order-$N$ matrix $H$.

[Figure 4.4. Inserting a node into an HSS tree in two different ways, where dark nodes and edges should be updated or created. (i) HSS tree example. (ii) Inserting a node. (iii) Another way to insert.]

4.2.3. Inserting and deleting HSS blocks. Sometimes we need to insert a block row/column into an HSS matrix or to remove one from it. Removing a block is usually straightforward. For simplicity, in this subsection we consider symmetric HSS matrices.
To remove a node $i$ from an HSS tree, we remove any generators associated with the subtree rooted at $i$ and merge the sibling node $j$ of $i$ into its parent $p$ by setting $U_p = U_j R_j$ and $D_p = D_j$ if $j$ is a leaf node. To insert a block row/column into an HSS matrix, the result depends on the desired HSS structure. For example, suppose Figure 4.4(i) is the original HSS tree for an HSS matrix, and we insert a new node $i$ between nodes 2 and 4 to get a new matrix with a tree structure as in Figure 4.4(ii) or Figure 4.4(iii). We need only to update a few generators (shown in dark in Figure 4.4). Again, we can consider the connection changes between nodes and use QR factorizations to find the new generators. The details are similar to those in the previous subsection. Note that in Figure 4.4, the HSS tree can be more general and that the node to be inserted can also represent another HSS tree. In all cases, we need only to update a few generators to get the new matrix. Thus the overall process is fast.

4.3. HSS structured extend-add. In this subsection, we use the previous basic HSS operations to build the structured extend-add process, as outlined in subsection 3.2. We consider a general element $i$ and its two child elements, as shown in Figure 2.3 or 2.4, where $i$ is the pivot separator in the assembly tree. The frontal and update matrices in the extend-add

$$F_i = F_i^0 \oplus U_{c_1} \oplus U_{c_2} \equiv F_i^0 + \hat U_{c_1} + \hat U_{c_2},$$

with $\oplus$ denoting the extend-add operator, have the relationship as shown in Figure 3.2, where all matrices should now be in HSS forms.

The general HSS extend-add procedure is as follows. Assume that, before the extend-add, the frontal matrices $F_{c_1}$ and $F_{c_2}$ for the left and right child elements, respectively, are already in HSS forms. (These HSS forms come recursively from lower level separators or from simple constructions at the starting level of structured factorizations.) For simplicity, assume each separator is represented by one leaf of the HSS tree, although each leaf can potentially be a subtree. There are then five leaf nodes in each HSS tree. As we use full binary HSS trees, it is natural to use trees as shown in the first row of Figure 4.5. Such a tree has the minimum depth among all full binary trees. Note that the separators are ordered with the uniform ordering. Following the uniform ordering of the separator pieces $i, p_1, p_2, p_3, p_4$, the HSS tree of $F_i$ has the form in Figure 4.6. Therefore, we should transform the tree structures in the first row of Figure 4.5 to the structure in Figure 4.6 (also see Table 3.4).
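In dense arithmetic, the extend-add that this subsection restructures amounts to padding each update matrix with zeros to the index set of $F_i$ and adding. A minimal sketch of that dense operation follows, assuming each matrix carries an explicit list of global indices; the name `extend_add` and the argument layout are hypothetical. The structured version obtains the same sum by permuting, inserting, and splitting HSS tree nodes instead of scattering entries.

```python
def extend_add(index_i, f0, updates):
    """Dense extend-add: F_i = F_i^0 + zero-padded update matrices.

    index_i -- global indices labelling the rows/columns of F_i
    f0      -- initial frontal matrix F_i^0 on index_i (list of rows)
    updates -- list of (indices, matrix) pairs; each index list must be
               a subset of index_i
    """
    position = {g: k for k, g in enumerate(index_i)}
    f = [row[:] for row in f0]
    for indices, u in updates:
        local = [position[g] for g in indices]
        for r, lr in enumerate(local):
            for c, lc in enumerate(local):
                f[lr][lc] += u[r][c]  # scatter-add into the frontal matrix
    return f

if __name__ == "__main__":
    f0 = [[4, 1, 0], [1, 4, 1], [0, 1, 4]]
    u_c1 = ([10, 11], [[1, 2], [2, 3]])
    u_c2 = ([11, 12], [[5, 6], [6, 7]])
    print(extend_add([10, 11, 12], f0, [u_c1, u_c2]))
```

In the demo, index 11 appears in both update matrices, so both contribute to the same entry. This plays the role of the separator overlaps that the structured procedure must align before the HSS addition.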
Figure 4.5 shows the process of generating $\hat U_{c_1}$ and $\hat U_{c_2}$ from $F_{c_1}$ and $F_{c_2}$, respectively. The HSS trees of $F_{c_1}$ and $F_{c_2}$ are shown in the first row of Figure 4.5, with their leaf nodes marked by the separators in Figure 2.4. Typically, there are five steps as follows for an extend-add operation to advance from the level of $c_1$ and $c_2$ to the level of $i$ (for convenience, we also include the partial factorization of frontal matrices at the beginning):
0. Eliminate $c_1$ and $c_2$ and get update matrices $U_{c_1}$ and $U_{c_2}$ in HSS forms.
1. Permute the trees for $U_{c_1}$ and $U_{c_2}$, as in the third column of Table 3.4.
2. Insert appropriate zero nodes into the HSS trees of $U_{c_1}$ and $U_{c_2}$ to get $\hat U_{c_1}$ and $\hat U_{c_2}$, respectively; see the fourth column of Table 3.4.
3. Split HSS blocks in $\hat U_{c_1}$ and $\hat U_{c_2}$ to handle overlaps in separators.
4. Write the initial frontal matrix $F_i^0$ in HSS form based on the tree structure of $U_{c_1} \oplus U_{c_2} \equiv \hat U_{c_1} + \hat U_{c_2}$.
5. Get $F_i$ by adding the HSS matrices $F_i^0$, $\hat U_{c_1}$, $\hat U_{c_2}$. Compress $F_i$ when necessary.

[Figure 4.5. Operations on child elements in the structured extend-add process (3.2). (i) Operations on the left child element. (ii) Operations on the right child element. The rows show $F_{c_1}$ and $F_{c_2}$, then elimination and permutation giving the permuted $U_{c_1}$ and $U_{c_2}$, then insertion of zero blocks giving $\hat U_{c_1}$ and $\hat U_{c_2}$.]

Step 0 can be done by applying the generalized HSS Cholesky factorization in subsection 4.1 to the leading principal blocks of $F_{c_j}$ corresponding to separator $c_j$ for $j = 1, 2$ (first row in Figure 4.5). When separator $c_j$ is removed, the Schur complement/update matrix $U_{c_j}$ is obtained by updating the remaining tree nodes.

Step 1 is to permute some branches of the HSS tree of $U_{c_j}$. Note that even if the HSS tree has more levels, we still need to update just a few top level nodes because we need only to permute the four separator pieces. This means the cost of permutations in the superfast multifrontal method is $O(1)$, even if the update matrix has dimension $N$. Specifically, to permute $U_{c_1}$ (Figure 4.5, left column, from row 1 to row 2), we exchange the two separator pieces indicated in the third column of Table 3.4. We can set new generators (in tilded notation) of the permuted tree to be

$$\tilde B_5 = B_4^T, \quad \tilde R_8 = B_7^T R_6^T, \quad \tilde B_2 = R_2 R_3 B_7, \quad \tilde R_2 = R_2 B_3, \quad \tilde B_3 = I.$$

Similarly, to permute $U_{c_2}$ (Figure 4.5, right column, from row 1 to row 2), we exchange $p_3^R$ with the other piece indicated in Table 3.4. The new generators of the permuted tree should satisfy

$$\tilde B_2 = R_2 R_3 R_4^T, \quad \tilde B_8 = B_7^T R_6^T R_5^T, \quad \begin{pmatrix} \tilde R_2 \\ \tilde R_4 \end{pmatrix} \tilde B_3 \begin{pmatrix} \tilde R_8^T & \tilde R_5^T \end{pmatrix} = \begin{pmatrix} R_2 R_3 B_7 & R_2 R_3 R_5^T \\ R_4 R_6 B_7 & B_4 \end{pmatrix}.$$

[Figure 4.6. HSS tree structure of $F_i$, where the dark bars represent overlaps $p_1^L \cap p_1^R$ and $p_3^L \cap p_3^R$, respectively.]

An SVD of the right-hand side of the last equation can provide all the generators on the left-hand side. Although these formulas look specific, they are sufficient for general extend-add in the superfast multifrontal method.

At step 2, we insert zero blocks into the permuted update matrices to get $\hat U_{c_1}$ and $\hat U_{c_2}$. For the left child element, a zero block (node) is attached to the matrix (tree); see Figure 4.5, left column, row 3. This operation is trivial in that only certain zero generators should be added. For the right child element, a zero node is inserted between $p_1^R$ and $p_3^R$. This has already been discussed in subsection 4.2.3.

After we get the HSS trees as shown in the last row of Figure 4.5, we use step 3 to handle the overlaps so that $\hat U_{c_1}$ and $\hat U_{c_2}$ will have the same HSS tree structure and row/column indices. As discussed in section 3.2, overlaps occur in separators $p_1$ and $p_3$; see Figures 2.3 and 2.4. As an example, the overlap $p_1^L \cap p_1^R$ may correspond to HSS blocks in both $\hat U_{c_1}$ and $\hat U_{c_2}$ or, more likely, parts of their HSS blocks. For the latter case, in order to match the HSS structures of $p_1^L \cap p_1^R$ in $\hat U_{c_1}$ and $\hat U_{c_2}$, we need to cut $p_1^L \cap p_1^R$ from either $p_1^L$, on the right end, or from $p_1^R$, on the left end, and to merge it with a nearby zero block. We use the splitting procedure in subsection 4.2.2. Now, $\hat U_{c_1}$ and $\hat U_{c_2}$ have the same HSS tree structure, which becomes the structure of $\hat U_{c_1} + \hat U_{c_2}$ and also of $F_i$ (Figure 4.6).

Then at step 4, we convert the initial frontal matrix $F_i^0$ into an HSS form following the HSS structure of $\hat U_{c_1} + \hat U_{c_2}$. Because the original matrix $A$ is sparse, $F_i^0$ is generally sparse also. We are often able to write the HSS form of $F_i^0$ in advance. For example, for Model Problem 1.1 with nested dissection ordering, $F_i^0$ in (2.1) has the sparsity pattern where $A_{ii}$ is tridiagonal, $A_{p_2 i} = 0$, $A_{p_4 i} = 0$, and each of $A_{p_1 i}$ and $A_{p_3 i}$ has only one nonzero entry. Such a matrix has HSS rank 2.
Now at step 5, we are ready to get $F_i$ by computing the HSS sum of $F_i^0$, $\hat U_{c_1}$, and $\hat U_{c_2}$ with formulas in [11]. The sizes of the generators of $F_i$ increase after this addition, although the actual HSS rank of $F_i$ will not. Thus the HSS addition is usually followed by a compression step [11]. Now $F_i$ is in compact HSS form, and we can continue the factorization along the elimination tree.

4.4. Algorithm and performance. Based on the previous discussions, we present the main superfast multifrontal algorithm and its analysis. Before that, we first clarify a few implementation issues for the HSS operations.

4.4.1. Implementation issues. One issue is related to the mesh boundary. For convenience, we can assume the mesh boundary corresponds to empty separators. We may then have empty nodes in HSS trees. Empty nodes do not accumulate or change and are not associated with any actual operation. Another issue is to predetermine the HSS structures before the actual factorizations. Similar to other sparse direct solvers, we can have a symbolic factorization stage which is used after nested dissection to approximately predict the HSS structures of the frontal/update matrices in the elimination. Finally, for the purpose of computational performance, we usually avoid too large or too small HSS block sizes.

4.4.2. Factorization algorithm. We provide the superfast multifrontal method and analyze its performance.

Algorithm 4.1 (Superfast multifrontal method with HSS structures).
1. Use nested dissection to order the nodes in the $n \times n$ mesh. Build an elimination tree with separator ordering. Assume the total number of separators to be $k$ and the total number of levels to be $l = \log_2 n$.
2. Decide $l_0$, the number of bottom levels of traditional factorizations (see Theorem 4.2 below).
3. For separators $i = 1, \ldots, k$
   (a) If separator $i$ is at level $l_i > l - l_0$, do traditional Cholesky factorization and extend-add.
       i. If $i$ is a leaf node in the elimination tree, obtain the frontal matrix $F_i$ from $A$ and compute $U_i$ as in (2.1). Push $U_i$ onto an update matrix stack.
       ii. Otherwise, pop two update matrices $U_{c_1}$ and $U_{c_2}$ from the update matrix stack, where $c_1$ and $c_2$ are the children of $i$. Use extend-add to form the frontal matrix $F_i$ as in (2.2). Factorize $F_i$ and get $U_i$ as in (2.2). Push $U_i$ onto the update matrix stack.
   (b) If separator $i$ is at the switching level $l_i = l - l_0$,
       i. Following step 3(a), build $F_i$, factorize its pivot block, and get $U_i$.
       ii. Construct a simple HSS form for $U_i$ with few blocks (1, 2, or 4, etc.). Push the HSS form of $U_i$ onto the update matrix stack.
   (c) Otherwise (separator $i$ is at level $l_i < l - l_0$), do structured factorization and extend-add at upper levels.
       i. Pop two HSS matrices $U_{c_1}$ and $U_{c_2}$ from the stack. Use HSS extend-add to form the frontal matrix $F_i$, as described in subsection 4.3.
       ii. Compute the generalized HSS Cholesky factorization of the leading principal blocks of $F_i$ and compute the Schur complement, which is $U_i$ (in HSS form). Push the HSS form of $U_i$ onto the stack.

About the complexity and storage requirement of the algorithm, we have the following theorem.

Theorem 4.2. Assume $p$ is the maximum of all HSS ranks of the frontal and update matrices throughout the multifrontal method. Then the optimal complexity of Algorithm 4.1 is $O(pn^2)$. In this situation, the number of bottom levels of traditional Cholesky factorizations is $l_0 = O(\log_2 p)$, the bottom level traditional Cholesky factorizations and upper level structured factorizations take the same amount of work, the storage required for the factors is $O(n^2 \log_2 p)$, and the update matrix stack size is $O(pn)$.
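The control flow of Algorithm 4.1 can be sketched as a postorder walk with an update matrix stack. The sketch below is schematic only. It assumes a complete binary elimination tree with the root at level 1 and the leaves at level $l$ (a numbering convention inferred from the level tests in step 3, not stated explicitly), it replaces every matrix by a string label, and the names `schedule`, `levels`, and `l0` are hypothetical.

```python
def schedule(levels, l0):
    """Classify each separator of a complete binary elimination tree.

    levels -- total number of levels l (root at level 1, leaves at level l)
    l0     -- number of bottom levels handled by dense factorization
    Returns (log, max_depth): the postorder list of (level, kind) pairs
    and the largest size reached by the update matrix stack.
    """
    log = []
    stack = []
    max_depth = 0

    def visit(level):
        nonlocal max_depth
        if level < levels:
            visit(level + 1)   # left child separator
            visit(level + 1)   # right child separator
            stack.pop()        # pop the two child update matrices
            stack.pop()
        if level > levels - l0:
            kind = "dense"     # step 3(a): traditional factorization
        elif level == levels - l0:
            kind = "switch"    # step 3(b): dense work, HSS form pushed
        else:
            kind = "hss"       # step 3(c): structured factorization
        log.append((level, kind))
        stack.append("dense update" if kind == "dense" else "HSS update")
        max_depth = max(max_depth, len(stack))

    visit(1)
    return log, max_depth

if __name__ == "__main__":
    log, depth = schedule(levels=4, l0=2)
    for level, kind in log:
        print(level, kind)
    print("maximum stack size:", depth)
```

With `levels=4` and `l0=2`, the two bottom levels are processed densely, level 2 is the switching level, and only the root is factorized in HSS form. Theorem 4.2 concerns how to choose `l0` so that the dense and structured parts cost the same amount of work.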


More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman)

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman) CS: Algorthms and Data Structures Prorty Queues and Heaps Alan J. Hu (Borrowng sldes from Steve Wolfman) Learnng Goals After ths unt, you should be able to: Provde examples of approprate applcatons for

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR Judth Aronow Rchard Jarvnen Independent Consultant Dept of Math/Stat 559 Frost Wnona State Unversty Beaumont, TX 7776 Wnona, MN 55987 aronowju@hal.lamar.edu

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

Multiblock method for database generation in finite element programs

Multiblock method for database generation in finite element programs Proc. of the 9th WSEAS Int. Conf. on Mathematcal Methods and Computatonal Technques n Electrcal Engneerng, Arcachon, October 13-15, 2007 53 Multblock method for database generaton n fnte element programs

More information

A Five-Point Subdivision Scheme with Two Parameters and a Four-Point Shape-Preserving Scheme

A Five-Point Subdivision Scheme with Two Parameters and a Four-Point Shape-Preserving Scheme Mathematcal and Computatonal Applcatons Artcle A Fve-Pont Subdvson Scheme wth Two Parameters and a Four-Pont Shape-Preservng Scheme Jeqng Tan,2, Bo Wang, * and Jun Sh School of Mathematcs, Hefe Unversty

More information

A Facet Generation Procedure. for solving 0/1 integer programs

A Facet Generation Procedure. for solving 0/1 integer programs A Facet Generaton Procedure for solvng 0/ nteger programs by Gyana R. Parja IBM Corporaton, Poughkeepse, NY 260 Radu Gaddov Emery Worldwde Arlnes, Vandala, Oho 45377 and Wlbert E. Wlhelm Teas A&M Unversty,

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements Explct Formulas and Effcent Algorthm for Moment Computaton of Coupled RC Trees wth Lumped and Dstrbuted Elements Qngan Yu and Ernest S.Kuh Electroncs Research Lab. Unv. of Calforna at Berkeley Berkeley

More information

Module 6: FEM for Plates and Shells Lecture 6: Finite Element Analysis of Shell

Module 6: FEM for Plates and Shells Lecture 6: Finite Element Analysis of Shell Module 6: FEM for Plates and Shells Lecture 6: Fnte Element Analyss of Shell 3 6.6. Introducton A shell s a curved surface, whch by vrtue of ther shape can wthstand both membrane and bendng forces. A shell

More information

NOVEL CONSTRUCTION OF SHORT LENGTH LDPC CODES FOR SIMPLE DECODING

NOVEL CONSTRUCTION OF SHORT LENGTH LDPC CODES FOR SIMPLE DECODING Journal of Theoretcal and Appled Informaton Technology 27 JATIT. All rghts reserved. www.jatt.org NOVEL CONSTRUCTION OF SHORT LENGTH LDPC CODES FOR SIMPLE DECODING Fatma A. Newagy, Yasmne A. Fahmy, and

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson

More information

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm Internatonal Journal of Advancements n Research & Technology, Volume, Issue, July- ISS - on-splt Restraned Domnatng Set of an Interval Graph Usng an Algorthm ABSTRACT Dr.A.Sudhakaraah *, E. Gnana Deepka,

More information

Reading. 14. Subdivision curves. Recommended:

Reading. 14. Subdivision curves. Recommended: eadng ecommended: Stollntz, Deose, and Salesn. Wavelets for Computer Graphcs: heory and Applcatons, 996, secton 6.-6., A.5. 4. Subdvson curves Note: there s an error n Stollntz, et al., secton A.5. Equaton

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

UNIT 2 : INEQUALITIES AND CONVEX SETS

UNIT 2 : INEQUALITIES AND CONVEX SETS UNT 2 : NEQUALTES AND CONVEX SETS ' Structure 2. ntroducton Objectves, nequaltes and ther Graphs Convex Sets and ther Geometry Noton of Convex Sets Extreme Ponts of Convex Set Hyper Planes and Half Spaces

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

CHAPTER 10: ALGORITHM DESIGN TECHNIQUES

CHAPTER 10: ALGORITHM DESIGN TECHNIQUES CHAPTER 10: ALGORITHM DESIGN TECHNIQUES So far, we have been concerned wth the effcent mplementaton of algorthms. We have seen that when an algorthm s gven, the actual data structures need not be specfed.

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Sorting. Sorting. Why Sort? Consistent Ordering

Sorting. Sorting. Why Sort? Consistent Ordering Sortng CSE 6 Data Structures Unt 15 Readng: Sectons.1-. Bubble and Insert sort,.5 Heap sort, Secton..6 Radx sort, Secton.6 Mergesort, Secton. Qucksort, Secton.8 Lower bound Sortng Input an array A of data

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Line Clipping by Convex and Nonconvex Polyhedra in E 3

Line Clipping by Convex and Nonconvex Polyhedra in E 3 Lne Clppng by Convex and Nonconvex Polyhedra n E 3 Václav Skala 1 Department of Informatcs and Computer Scence Unversty of West Bohema Unverztní 22, Box 314, 306 14 Plzeò Czech Republc e-mal: skala@kv.zcu.cz

More information

Intra-Parametric Analysis of a Fuzzy MOLP

Intra-Parametric Analysis of a Fuzzy MOLP Intra-Parametrc Analyss of a Fuzzy MOLP a MIAO-LING WANG a Department of Industral Engneerng and Management a Mnghsn Insttute of Technology and Hsnchu Tawan, ROC b HSIAO-FAN WANG b Insttute of Industral

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information