1.5D PARALLEL SPARSE MATRIX-VECTOR MULTIPLY
ENVER KAYAASLAN, BORA UÇAR, AND CEVDET AYKANAT

Abstract. There are three common parallel sparse matrix-vector multiply algorithms: 1D row-parallel, 1D column-parallel and 2D row-column-parallel. The 1D parallel algorithms offer the advantage of having only one communication phase. The 2D parallel algorithm, on the other hand, is more scalable due to a high level of flexibility in distributing fine-grain tasks, but it suffers from two communication phases. Here, we introduce the novel concept of heterogeneous messages, where a heterogeneous message may contain both input-vector entries and partially computed output-vector entries. This concept not only leads to a decreased number of messages but also enables fusing the input- and output-communication phases into a single phase. These findings are utilized to propose a 1.5D parallel sparse matrix-vector multiply algorithm, which is called local row-column-parallel. The proposed algorithm requires a local fine-grain partitioning, where locality refers to the constraint that each fine-grain task be assigned to a processor that contains either its input-vector entry, or its output-vector entry, or both. This constraint, nevertheless, turns out not to be very restrictive, so that we achieve a partitioning quality close to that of the 2D parallel algorithm. We propose two methods for local fine-grain partitioning. The first method is based on a novel directed hypergraph partitioning model that minimizes total communication volume while maintaining a load-balance constraint as well as an additional locality constraint, which is handled by adopting and adapting a recent, simple yet effective approach. The second method has two parts: the first part finds a distribution of the input and output vectors, and the second part finds a nonzero/task distribution that exactly minimizes total communication volume while keeping the vector distribution intact.
We conduct our experiments on a large set of test matrices to evaluate the partitioning qualities and partitioning times of these proposed 1.5D methods.

Key words. sparse matrix partitioning, parallel sparse matrix-vector multiplication, directed hypergraph model, bipartite vertex cover, combinatorial scientific computing

AMS subject classifications. 0C0, 0C, 0C0, F0, F0, Y0

Introduction. The sparse matrix-vector multiply is a fundamental operation in many iterative solvers, such as those for linear systems, eigensystems and least squares problems. This renders the parallelization of sparse matrix-vector multiply an important problem. Since the same sparse matrix is multiplied many times during the iterations of such applications, several comprehensive sparse matrix partitioning models and methods have been proposed and implemented for scaling parallel sparse matrix-vector multiply operations on distributed-memory systems. The parallel sparse matrix-vector multiply operation is composed of fine-grain tasks of multiply-and-add operations, where each fine-grain task involves an input-vector entry, a nonzero and a partial result on an output-vector entry. Here, each fine-grain task is associated with a separate nonzero and, by the owner-computes rule, is assumed to be performed by the processor that contains the associated nonzero. In the literature, there are three basic sparse matrix-vector multiply algorithms: row-parallel, column-parallel and row-column-parallel. The row- and column-parallel algorithms are 1D parallel, whereas the row-column-parallel algorithm is 2D parallel. In row-parallel sparse matrix-vector multiply, all fine-grain tasks associated with the nonzeros of a row are combined into a composite task: the inner product of a sparse row vector and a dense input vector. This row-oriented combination requires rowwise partitioning, where the nonzeros of a row and the respective output-vector entry are all assigned to the same processor.
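The fine-grain task model can be made concrete with a short sketch (our own illustration, not from the paper; a coordinate-list layout and the function name are assumptions). Each nonzero a_ij defines one multiply-and-add task y_i += a_ij * x_j, and the owner-computes rule places that task on the processor holding a_ij:

```python
# A minimal sketch of the fine-grain task model, assuming the sparse matrix
# is given as a list of (i, j, a_ij) coordinate triples.
def spmv_fine_grain(nonzeros, x, m):
    """Compute y = A*x (m output rows) as one multiply-and-add task per nonzero."""
    y = [0.0] * m
    for i, j, a_ij in nonzeros:
        # Owner-computes rule: this task runs on the processor holding a_ij.
        y[i] += a_ij * x[j]
    return y
```

For A = [[2, 0], [1, 3]] and x = [1, 2], the three tasks produce y = [2, 7].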
Affiliations: Independent Researcher; CNRS and University of Lyon, France; Bilkent University, Turkey.

Similarly, in column-parallel sparse matrix-vector multiply, all fine-grain tasks associated with the nonzeros of a column are combined into a composite task: a daxpy operation over a dense output vector, involving a sparse column vector and an input-vector entry. This column-oriented combination requires columnwise partitioning, where the nonzeros of a column and the respective input-vector entry are all assigned to the same processor. In row-parallel sparse matrix-vector multiply, all messages are communicated in an input-communication phase called expand, where each message contains only input-vector entries. In column-parallel sparse matrix-vector multiply, on the other hand, all messages are communicated in an output-communication phase called fold, where each message contains only partially computed output-vector entries. In row-column-parallel sparse matrix-vector multiply, there is no restriction of any kind on distributing input- and output-vector entries and nonzeros, which is also referred to as fine-grain partitioning. In the row-column-parallel algorithm, some messages are communicated in the expand phase and some in the fold phase. Each message of the expand phase contains only input-vector entries, as in the row-parallel algorithm, whereas each message of the fold phase contains only partially computed output-vector entries, as in the column-parallel algorithm. In all three sparse matrix-vector multiply algorithms, the messages are homogeneous, that is, each message contains either only input-vector entries or only partially computed output-vector entries. In order to solve each of the above-mentioned three partitioning problems, a different hypergraph model has been proposed, where vertex partitioning with minimum cutsize while maintaining balance on part weights exactly corresponds to matrix partitioning with minimum total communication volume while maintaining computational load balance on processors.
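To make a homogeneous expand phase concrete, here is a small sketch (our own illustration; the function name and the owner-map layout are assumptions) that collects, for a rowwise partition, which x entries each processor must send. Every resulting message carries input-vector entries only:

```python
def expand_messages(nonzeros, row_owner, col_owner):
    """Map (sender, receiver) -> sorted x-indices for the row-parallel expand phase.

    row_owner[i] owns row i (and hence task y_i); col_owner[j] owns x_j.
    """
    msgs = {}
    for i, j, _ in nonzeros:
        task_proc, x_proc = row_owner[i], col_owner[j]
        if task_proc != x_proc:  # x_j must travel to the processor doing the task
            msgs.setdefault((x_proc, task_proc), set()).add(j)
    return {pair: sorted(js) for pair, js in msgs.items()}
```

For an upper-triangular 2x2 example with each processor owning one row and the conformal x entry, only the off-diagonal nonzero triggers a message.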
These hypergraph models are as follows: the column-net hypergraph model [
expand-fold phase. The proposed local row-column-parallel algorithm requires local fine-grain partitioning, where a fine-grain partition is said to be local if each fine-grain task is local either to its input-vector entry, or to its output-vector entry, or to both. This flexibility in assigning fine-grain tasks brings an opportunity to perform sparse matrix-vector multiply in parallel with a partitioning time and partitioning quality close to those of the 1D and 2D parallel algorithms, respectively. We propose two methods to obtain a 1.5D local fine-grain partition, each with a different setting and approach; some preliminary studies on these methods are given in our recent work [
Fig.: A fine-grain task a_ij and its parallel computation.
Fig.: A task-and-data distribution Π(y ← Ax) of matrix-vector multiply on a sample sparse matrix A and its block structure.
In column-parallel sparse matrix-vector multiply, the basic computational units are the columns. For an input-vector entry x_j assigned to processor P_k, the fine-grain tasks associated with the nonzeros of A_{*j} = {a_ij ∈ A : 1 ≤ i ≤ m} are combined into a composite task of the daxpy operation ŷ^(k) ← ŷ^(k) + A_{*j} x_j, which is to be carried out on P_k, where ŷ^(k) is the partially computed output vector of P_k. As a result, a task-and-data distribution Π(y ← Ax) of matrix-vector multiply on A for the column-parallel algorithm should satisfy the following condition:

a_ij ∈ A^(k) whenever x_j ∈ x^(k), (.9)

and in the literature this kind of distribution is known as columnwise partitioning [
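A single composite daxpy task can be sketched as follows (our own illustration; a column stored as (row, value) pairs and a dict of partial outputs are assumed layouts):

```python
def column_daxpy(yhat, column, x_j):
    """One composite task of column-parallel SpMV: yhat += A[:, j] * x_j.

    `column` lists the (i, a_ij) pairs of column j; `yhat` is the processor's
    dict of partially computed output-vector entries.
    """
    for i, a_ij in column:
        yhat[i] = yhat.get(i, 0.0) + a_ij * x_j
    return yhat
```

The partial results accumulated in `yhat` are what the fold phase later sends to the owners of the corresponding output-vector entries.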
Algorithm: The row-column-parallel sparse matrix-vector multiply.
For each processor P_k:
1. (expand) For each nonzero column stripe A^(l)_{*k}, l ≠ k: (a) form the vector x̂^(k)_l, which contains only those entries of x^(k) corresponding to nonzero columns in A^(l)_{*k}, and (b) send x̂^(k)_l to P_l.
2. For each nonzero row stripe A^(k)_{l*}, l ≠ k, compute: (a) ŷ^(l)_k ← A^(k)_{lk} x^(k) and (b) ŷ^(l)_k ← ŷ^(l)_k + Σ_{r≠k} A^(k)_{lr} x̂^(r)_k.
3. (fold) For each nonzero row stripe A^(k)_{l*}, l ≠ k: (a) form the vector ŷ^(l)_k, which contains only those entries corresponding to nonzero rows in A^(k)_{l*}, and (b) send ŷ^(l)_k to P_l.
4. Compute the output subvector: (a) y^(k) ← A^(k)_{kk} x^(k), (b) y^(k) ← y^(k) + Σ_{l≠k} A^(k)_{kl} x̂^(l)_k, and (c) y^(k) ← y^(k) + Σ_{l≠k} ŷ^(k)_l.

of such messages. Then, the total reduction in the number of messages equals the number of heterogeneous messages of the local row-column-parallel algorithm.

Task-communication dependency graph. We first introduce a two-way categorization of input- and output-vector entries and a four-way categorization of fine-grain tasks (
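The arithmetic of the row-column-parallel algorithm above can be checked with a sequential simulation (our own sketch; it uses a flat nonzero-to-processor map instead of block stripes, and the function name is an assumption). The expand, local-compute and fold steps are modeled as explicit data movements:

```python
def spmv_row_column_parallel(nonzeros, x, m, nnz_owner, K):
    """Simulate 2D row-column-parallel SpMV on K 'processors'.

    nnz_owner maps (i, j) to the processor holding a_ij. Returns the
    assembled output vector y of length m.
    """
    # Expand phase: each x_j is delivered to every processor that holds a
    # nonzero in column j (a homogeneous message of input-vector entries).
    local_x = [dict() for _ in range(K)]
    for i, j, _ in nonzeros:
        local_x[nnz_owner[(i, j)]][j] = x[j]
    # Local multiply-and-add into partially computed output entries.
    partial = [dict() for _ in range(K)]
    for i, j, a_ij in nonzeros:
        p = nnz_owner[(i, j)]
        partial[p][i] = partial[p].get(i, 0.0) + a_ij * local_x[p][j]
    # Fold phase: partial results travel to the owner of each y_i and are summed.
    y = [0.0] * m
    for p in range(K):
        for i, val in partial[p].items():
            y[i] += val
    return y
```

On the same 2x2 example used earlier, the simulation reproduces the serial result regardless of how the nonzeros are spread over the processors.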
Fig.: (a) Task-communication dependency graph; (b)-(e) topological orderings for the row-parallel, row-column-parallel, column-parallel, and local row-column-parallel sparse matrix-vector multiply algorithms. Legend: expand: input-communication phase; fold: output-communication phase; NL: nonlocal tasks; the remaining categories are input-output-local, input-local, and output-local tasks.

a dependency on the output-communication phase; the nonlocal tasks, however, are linked with both communication phases. Figure
In the column-parallel algorithm, each of the fine-grain tasks is either input-output-local or input-local due to the columnwise partitioning condition (
Fig.: A sample local fine-grain partition, showing an input-output-local task, an input-local task, and two output-local tasks.

In the induced block structure, each nonzero off-diagonal block is split between only the two processors that own its row and column stripes, while each diagonal block resides entirely on its owner:

A_{lk} = A^(k)_{lk} + A^(l)_{lk}. (.)

For instance, A_{12} = A^(1)_{12} + A^(2)_{12}, A_{13} = A^(1)_{13} + A^(3)_{13}, etc. Figure
Algorithm: The local row-column-parallel sparse matrix-vector multiply.
For each processor P_k:
1. For each nonzero off-diagonal block A_{lk} = A^(k)_{lk} + A^(l)_{lk}, compute ŷ^(l)_k ← A^(k)_{lk} x^(k) (the input-local tasks of P_k).
2. (expand-fold) For each nonzero off-diagonal block A_{lk}: (a) form the vector x̂^(k)_l, which contains only those entries of x^(k) corresponding to nonzero columns in A^(l)_{lk}; (b) form the vector ŷ^(l)_k, which contains only those entries of ŷ^(l)_k corresponding to nonzero rows in A^(k)_{lk}; (c) send the heterogeneous message [x̂^(k)_l, ŷ^(l)_k] to processor P_l.
3. Compute the output subvector: (a) y^(k) ← A_{kk} x^(k) (the input-output-local tasks of P_k); (b) y^(k) ← y^(k) + Σ_{l≠k} A^(k)_{kl} x̂^(l)_k (the output-local tasks of P_k); (c) y^(k) ← y^(k) + Σ_{l≠k} ŷ^(k)_l (partial results of the input-local tasks of other processors).

Fig.: An illustration of Algorithm
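The fused expand-fold message can be illustrated with a toy helper (our own sketch, not the authors' code; all names are assumptions). Each message bundles the x entries the receiver's output-local tasks need with the ŷ partial results the sender's input-local tasks produced:

```python
def make_heterogeneous_message(x_local, needed_cols, yhat_partials, rows_for_peer):
    """Form the single [x-hat, y-hat] message of the fused expand-fold phase.

    needed_cols: column indices the peer needs from our x entries;
    rows_for_peer: row indices of partial outputs we computed for the peer.
    """
    x_part = {j: x_local[j] for j in needed_cols}
    y_part = {i: yhat_partials[i] for i in rows_for_peer}
    return {"x": x_part, "yhat": y_part}  # one message, two kinds of payload
```

Because the two payloads share one message, a processor pair that would exchange both an expand and a fold message in the 2D algorithm exchanges only one message here.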
Two proposed methods for local row-column-parallel partitioning. In this section, we propose two methods to find a local row-column-parallel partition, as required by the 1.5D local row-column-parallel sparse matrix-vector multiply. One method finds the vector and nonzero distributions simultaneously, whereas the other has two parts in which the vector and nonzero distributions are found separately.

A directed hypergraph model for simultaneous vector and nonzero distribution. In this method, we adopt the elementary hypergraph model for fine-grain partitioning of [
Fig.: An illustration of attaining a local fine-grain partition through vertex partitioning of the directed hypergraph model that satisfies the locality constraints: (a) a sparse matrix, (b) its directed hypergraph model, (c) a K-way local hypergraph partition, (d) the local fine-grain partition. The input- and output-data vertices are drawn with triangles and rectangles, respectively.

Each task vertex is amalgamated into v_x(j) if column j has a smaller number of nonzeros than row i, and it is amalgamated into v_y(i) if vice versa, where ties are broken arbitrarily. The result is a reduced hypergraph that contains only input- and output-data vertices amalgamated with task vertices, where the weight of a data vertex equals the number of task vertices amalgamated into it. As a result, the locality constraint on vertex partitioning of the initial directed hypergraph naturally holds through vertex partitioning of the reduced hypergraph, for which the net directions become irrelevant. A vertex partition of this reduced hypergraph can be obtained by any existing hypergraph partitioning tool and then trivially decoded as a local fine-grain partition. Figure
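The amalgamation rule can be sketched directly (our own illustration on a coordinate-list input; note that ties here go to the column vertex, whereas the paper breaks them arbitrarily):

```python
def amalgamate(nonzeros, m, n):
    """Merge each task vertex v(a_ij) into the lighter of v_x(j) and v_y(i).

    Returns data-vertex weights: ('x', j) or ('y', i) -> number of merged tasks.
    """
    row_nnz = [0] * m
    col_nnz = [0] * n
    for i, j, _ in nonzeros:
        row_nnz[i] += 1
        col_nnz[j] += 1
    weights = {}
    for i, j, _ in nonzeros:
        # Column lighter (or tied): merge into the input-data vertex v_x(j);
        # otherwise merge into the output-data vertex v_y(i).
        target = ('x', j) if col_nnz[j] <= row_nnz[i] else ('y', i)
        weights[target] = weights.get(target, 0) + 1
    return weights
```

By construction every task ends up on a vertex that carries either its input or its output entry, so any partition of the reduced hypergraph is automatically local.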
Fig.: An illustration of local fine-grain partitioning through task-vertex amalgamations: (a) task-vertex amalgamations, (b) the reduced hypergraph, (c) a K-way hypergraph partition, (d) the local fine-grain partition. The input- and output-data vertices are drawn with triangles and rectangles, respectively.

the recursive-bisection framework, which distorts the locality of task vertices so that a partition obtained in further recursive steps is no longer a local fine-grain partition.

Optimal nonzero distribution to minimize total communication volume. This method is composed of two parts. The first part finds a vector distribution (Π(x), Π(y)), and the second part finds a nonzero/task distribution Π(A) that exactly minimizes total communication volume over all possible local fine-grain partitions that abide by the vector distribution (Π(x), Π(y)) of the first part. In this way, we generate a local fine-grain partition Π(y ← Ax) = (Π(A), Π(x), Π(y)). The first part can be accomplished by any conventional data partitioning method, such as 1D partitioning; this section is devoted to the second part of the method. Consider the block structure (
Fig.: A sample sparse matrix A and its block structure induced by an input-data distribution Π(x) = {x^(1), x^(2), x^(3)} and an output-data distribution Π(y) = {y^(1), y^(2), y^(3)}.

be performed independently for minimizing total communication volume. In the local row-column-parallel algorithm, P_l sends [x̂^(k)_l, ŷ^(l)_k] to P_k, where x̂^(k)_l corresponds to the nonzero columns of A^(k)_{lk} and ŷ^(l)_k corresponds to the nonzero rows of A^(l)_{lk}, for a nonzero/task distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}. Then, we can derive the following formula for the communication volume φ_{lk} from P_l to P_k:

φ_{lk} = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}), (.)

where n̂(·) and m̂(·) refer to the number of nonzero columns and nonzero rows of the input submatrix, respectively. The total communication volume φ is then computed by summing the communication volumes incurred by each nonzero off-diagonal block of the block structure. The problem of our interest can then be described as follows.

Problem. Given A and a vector distribution (Π(x), Π(y)), find a nonzero/task distribution Π(A) such that each nonzero off-diagonal block satisfies A_{lk} = A^(k)_{lk} + A^(l)_{lk} and each diagonal block satisfies A_{kk} = A^(k)_{kk} for the block structure induced by (Π(x), Π(y)), minimizing the total communication volume φ = Σ_{k≠l} φ_{lk}.

Let G_{lk} = (U_{lk} ∪ V_{lk}, E_{lk}) be the bipartite graph representation of A_{lk}, where U_{lk} and V_{lk} are the sets of vertices corresponding to the rows and columns of A_{lk}, respectively, and E_{lk} is the set of edges corresponding to the nonzeros of A_{lk}.
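For a single off-diagonal block, the volume formula above can be evaluated directly from a nonzero assignment. The sketch below is our own illustration (names are assumptions); `assignment` maps each nonzero of the block to 'k' or 'l':

```python
def block_comm_volume(block_nonzeros, assignment):
    """phi_lk = n-hat(part on P_k) + m-hat(part on P_l) for one block:
    number of nonzero columns assigned to P_k plus number of nonzero rows
    assigned to P_l."""
    cols_on_k = {j for (i, j) in block_nonzeros if assignment[(i, j)] == 'k'}
    rows_on_l = {i for (i, j) in block_nonzeros if assignment[(i, j)] == 'l'}
    return len(cols_on_k) + len(rows_on_l)
```

Summing this quantity over all nonzero off-diagonal blocks gives the total communication volume of a candidate local fine-grain partition.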
Based on this notation, the following theorem states a correspondence between the problem of distributing the nonzeros/tasks of A_{lk} to minimize the communication volume φ_{lk} from P_l to P_k and the problem of finding a minimum vertex cover of G_{lk}.

Theorem. Let A_{lk} be a nonzero off-diagonal block and G_{lk} = (U_{lk} ∪ V_{lk}, E_{lk}) be its bipartite graph representation.
1. For any vertex cover S_{lk} of G_{lk}, there is a nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk} such that |S_{lk}| ≥ n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}).
2. For any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}, there is a vertex cover S_{lk} of G_{lk} such that |S_{lk}| = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}).
Proof. We prove the two parts of the theorem separately.
1. Take any vertex cover S_{lk} of G_{lk}. Consider any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk} such that

a_ij ∈ A^(k)_{lk} if v_j ∈ S_{lk} and u_i ∉ S_{lk},
a_ij ∈ A^(l)_{lk} if v_j ∉ S_{lk} and u_i ∈ S_{lk}, (.)
a_ij ∈ A^(k)_{lk} or A^(l)_{lk} if v_j ∈ S_{lk} and u_i ∈ S_{lk}.

Since v_j ∈ S_{lk} for every a_ij ∈ A^(k)_{lk} and u_i ∈ S_{lk} for every a_ij ∈ A^(l)_{lk}, we have |S_{lk} ∩ V_{lk}| ≥ n̂(A^(k)_{lk}) and |S_{lk} ∩ U_{lk}| ≥ m̂(A^(l)_{lk}), which in turn leads to

|S_{lk}| ≥ n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}). (.)

2. Take any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}. Consider S_{lk} = {u_i ∈ U_{lk} : a_ij ∈ A^(l)_{lk}} ∪ {v_j ∈ V_{lk} : a_ij ∈ A^(k)_{lk}}, where |S_{lk}| = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}). Now, consider a nonzero a_ij ∈ A_{lk} and its corresponding edge {u_i, v_j} ∈ E_{lk}. If a_ij ∈ A^(k)_{lk}, then v_j ∈ S_{lk}; otherwise u_i ∈ S_{lk}, since a_ij ∈ A^(l)_{lk}. So S_{lk} is a vertex cover of G_{lk}.

At this point, however, it is still not clear how the reduction from the problem of distributing nonzeros/tasks to the problem of finding a minimum vertex cover holds. For this purpose, using Theorem
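Since each block graph is bipartite, a minimum vertex cover can be obtained from a maximum matching via Koenig's theorem. The sketch below is our own implementation (all names are assumptions): it finds a maximum matching with augmenting paths, derives the cover, and then splits the nonzeros exactly as in the first part of the proof:

```python
def min_vertex_cover(rows, edges):
    """Minimum vertex cover of a bipartite graph via augmenting-path matching
    plus Koenig's theorem. Returns (cover_rows, cover_cols)."""
    adj = {u: [] for u in rows}
    for u, v in edges:
        adj[u].append(v)
    match_u, match_v = {}, {}

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_v or augment(match_v[v], seen):
                match_u[u], match_v[v] = v, u
                return True
        return False

    for u in rows:
        augment(u, set())
    # Koenig: alternating search from unmatched row vertices.
    visited_u, visited_v = set(), set()
    stack = [u for u in rows if u not in match_u]
    visited_u.update(stack)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in visited_v:
                visited_v.add(v)
                w = match_v.get(v)
                if w is not None and w not in visited_u:
                    visited_u.add(w)
                    stack.append(w)
    return {u for u in rows if u not in visited_u}, visited_v

def distribute_block(edges, cover_rows, cover_cols):
    """Split nonzeros as in the proof: column covered -> P_k, else row -> P_l."""
    part_k = [(u, v) for u, v in edges if v in cover_cols]
    part_l = [(u, v) for u, v in edges if v not in cover_cols and u in cover_rows]
    return part_k, part_l
```

On a block whose two nonzeros share one column, the cover is that single column vertex, so both nonzeros go to P_k and the block costs one word of communication.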
Fig.: The minimum vertex cover model for a nonzero off-diagonal block to minimize the communication volume from P_l to P_k. Due to the minimum vertex cover, P_l sends a single heterogeneous message containing the corresponding input-vector entries and partial results to P_k.

Algorithm: Nonzero/task distribution to minimize total communication volume.
1: procedure NonzeroTaskDistributeVolume(A, Π(x), Π(y))
2:   for each nonzero off-diagonal block A_{lk} do Equation (
Fig.: (a) A minimum vertex cover of the bipartite graph representation of each nonzero off-diagonal block of Figure
running with default parameters and setting the maximum allowable imbalance ratio as  %. Since PaToH depends on randomization, we report the geometric mean of ten different runs for each partitioning instance. In all experiments, we report the results using a generic tool called performance profiles [
Method   SpMV algorithm              Partitioning
1D       row-parallel                rowwise
2D       row-column-parallel         fine-grain
1.5D-H   local row-column-parallel   local fine-grain
1.5D-V   local row-column-parallel   local fine-grain

Fig.: Performance profiles that compare total communication volume and load balance using test matrices with no dense rows/columns for two values of K: (a) total communication volume, (b) load balance, (c) total communication volume, (d) load balance.

respectively, for the larger value of K. Figures
Fig.: Performance profiles that compare total communication volume and load balance using test matrices with dense rows/columns for two values of K: (a) total communication volume, (b) load balance, (c) total communication volume, (d) load balance.

performances in terms of total communication volume as expected. Figure
Fig.: Performance profiles that compare total message count and maximum message count for the three methods 1D, 2D and 1.5D-H, as well as maximum communication volume per processor and partitioning time for all methods, using all test matrices: (a) total message count, (b) maximum message count, (c) maximum volume, (d) partitioning time.

still be favorable to the other methods for particular matrices due to the low communication volume it may yield. In short, if the sparse matrix contains dense rows/columns, then 1.5D-H seems to be the method of choice in general; otherwise, 1.5D-V and 2D are reasonable alternatives competing with each other.

Conclusion and further discussions. This paper introduced 1.5D parallelism for sparse matrix-vector multiply. We presented the local row-column-parallel sparse matrix-vector multiply algorithm that uses this introduced 1.5D parallelism. This algorithm is the fourth parallel algorithm in the literature for sparse matrix-vector multiply, in addition to the well-known 1D row-parallel, 1D column-parallel and 2D row-column-parallel ones. In this paper, we also proposed two methods (1.5D-H and 1.5D-V) to distribute tasks and data in accordance with the requirements of the proposed 1.5D parallel algorithm. Using an extensive set of matrices from the UFL sparse
matrix collection, we compared the partitioning qualities of these two methods against the baseline 1D and 2D methods. The experiments suggest the use of the local row-column-parallel sparse matrix-vector multiply with a local fine-grain partition obtained by the proposed directed hypergraph model for matrices that contain dense rows/columns, as we observe a performance close to that of 2D fine-grain partitioning in terms of partitioning quality but with a considerably smaller number of messages and significant efficiency. We consider the problem mainly from a theoretical point of view and leave the performance of 1.5D parallel sparse matrix-vector multiply algorithms in terms of parallel multiply timings as future work. We note that the main ideas behind the proposed 1.5D parallelism, such as heterogeneous messaging and avoiding nonlocal tasks through a locality constraint on partitioning, are of course not restricted to the parallel sparse matrix-vector multiply operation; these ideas can be extended to other parallel computations as well.

REFERENCES

[1] Ümit V. Çatalyürek and Cevdet Aykanat, Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems, 10 (1999), pp. 673-693.
[2] Ümit V. Çatalyürek and Cevdet Aykanat, A fine-grain hypergraph model for 2D decomposition of sparse matrices, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2001.
[3] Ümit V. Çatalyürek and Cevdet Aykanat, PaToH (Partitioning Tool for Hypergraphs), in Encyclopedia of Parallel Computing, Springer, 2011.
[4] Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar, On two-dimensional sparse matrix partitioning: Models, methods, and a recipe, SIAM Journal on Scientific Computing, 32 (2010), pp. 656-683.
[5] Elizabeth D. Dolan and Jorge J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201-213.
[6] Enver Kayaaslan, Bora Uçar, and Cevdet Aykanat, Semi-two-dimensional partitioning for parallel sparse matrix-vector multiplication, in Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, IEEE, 2015.
[7] Daniël M. Pelt and Rob H. Bisseling, A medium-grain method for fast 2D bipartitioning of sparse matrices, in Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, IEEE, 2014.
[8] Bora Uçar and Cevdet Aykanat, Revisiting hypergraph models for sparse matrix partitioning, SIAM Review, 49 (2007), pp. 595-603.
[9] Bora Uçar and Cevdet Aykanat, Partitioning sparse matrices for parallel preconditioned iterative methods, SIAM Journal on Scientific Computing, 29 (2007), pp. 1683-1709.
Combinatorial problems in a Parallel Hybrid Linear Solver Ichitaro Yamazaki and Xiaoye Li Lawrence Berkeley National Laboratory François-Henry Rouet and Bora Uçar ENSEEIHT-IRIT and LIP, ENS-Lyon SIAM workshop
More informationOn shared-memory parallelization of a sparse matrix scaling algorithm
On shared-memory parallelization of a sparse matrix scaling algorithm Ümit V. Çatalyürek, Kamer Kaya The Ohio State University Dept. of Biomedical Informatics {umit,kamer}@bmi.osu.edu Bora Uçar CNRS and
More informationAdvances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing
Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Erik G. Boman 1, Umit V. Catalyurek 2, Cédric Chevalier 1, Karen D. Devine 1, Ilya Safro 3, Michael M. Wolf
More informationAugmenting Hypergraph Models with Message Nets to Reduce Bandwidth and Latency Costs Simultaneously
Augmenting Hypergraph Models with Message Nets to Reduce Bandwidth and Latency Costs Simultaneously Oguz Selvitopi, Seher Acer, and Cevdet Aykanat Bilkent University, Ankara, Turkey CSC16, Albuquerque,
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More informationDownloaded 01/21/15 to Redistribution subject to SIAM license or copyright; see
SIAM J. SCI. COMPUT. Vol. 36, No. 5, pp. C568 C590 c 2014 Society for Industrial and Applied Mathematics SIMULTANEOUS INPUT AND OUTPUT MATRIX PARTITIONING FOR OUTER-PRODUCT PARALLEL SPARSE MATRIX-MATRIX
More informationfor Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University
Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University 06533 Bilkent, Ankara, Turkey
More informationVertex Magic Total Labelings of Complete Graphs 1
Vertex Magic Total Labelings of Complete Graphs 1 Krishnappa. H. K. and Kishore Kothapalli and V. Ch. Venkaiah Centre for Security, Theory, and Algorithmic Research International Institute of Information
More informationDesign of Parallel Algorithms. Models of Parallel Computation
+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes
More informationExploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture K. Akbudak a, C.Aykanat
More informationFor example, the system. 22 may be represented by the augmented matrix
Matrix Solutions to Linear Systems A matrix is a rectangular array of elements. o An array is a systematic arrangement of numbers or symbols in rows and columns. Matrices (the plural of matrix) may be
More informationHypergraph Partitioning for Computing Matrix Powers
Hypergraph Partitioning for Computing Matrix Powers Nicholas Knight Erin Carson, James Demmel UC Berkeley, Parallel Computing Lab CSC 11 1 Hypergraph Partitioning for Computing Matrix Powers Motivation
More informationSPARSE matrix-vector multiplication (SpMV) is a
Spatiotemporal Graph and Hypergraph Partitioning Models for Sparse Matrix-Vector Multiplication on Many-Core Architectures Nabil Abubaker, Kadir Akbudak, and Cevdet Aykanat Abstract There exist graph/hypergraph
More informationCOUNTING PERFECT MATCHINGS
COUNTING PERFECT MATCHINGS JOHN WILTSHIRE-GORDON Abstract. Let G be a graph on n vertices. A perfect matching of the vertices of G is a collection of n/ edges whose union is the entire graph. This definition
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationrppatoh: Replicated Partitioning Tool for Hypergraphs
rppatoh: Replicated Partitioning Tool for Hypergraphs R. Oguz Selvitopi Computer Engineering Department Bilkent University Ankara, 06800 Turkey reha@cs.bilkent.edu.tr roguzsel@gmail.com Ata Turk Computer
More informationHypergraph-Theoretic Partitioning Models for Parallel Web Crawling
Hypergraph-Theoretic Partitioning Models for Parallel Web Crawling Ata Turk, B. Barla Cambazoglu and Cevdet Aykanat Abstract Parallel web crawling is an important technique employed by large-scale search
More informationVertex Magic Total Labelings of Complete Graphs
AKCE J. Graphs. Combin., 6, No. 1 (2009), pp. 143-154 Vertex Magic Total Labelings of Complete Graphs H. K. Krishnappa, Kishore Kothapalli and V. Ch. Venkaiah Center for Security, Theory, and Algorithmic
More informationStructured System Theory
Appendix C Structured System Theory Linear systems are often studied from an algebraic perspective, based on the rank of certain matrices. While such tests are easy to derive from the mathematical model,
More informationAcyclic Colorings of Graph Subdivisions
Acyclic Colorings of Graph Subdivisions Debajyoti Mondal, Rahnuma Islam Nishat, Sue Whitesides, and Md. Saidur Rahman 3 Department of Computer Science, University of Manitoba Department of Computer Science,
More informationAbusing a hypergraph partitioner for unweighted graph partitioning
Abusing a hypergraph partitioner for unweighted graph partitioning B. O. Fagginger Auer R. H. Bisseling Utrecht University February 13, 2012 Fagginger Auer, Bisseling (UU) Mondriaan Graph Partitioning
More informationThe block triangular form and bipartite matchings
Outline The block triangular form and bipartite matchings Jean-Yves L Excellent and Bora Uçar GRAAL, LIP, ENS Lyon, France CR-07: Sparse Matrix Computations, November 2010 http://graal.ens-lyon.fr/~bucar/cr07/
More informationOn the Balanced Case of the Brualdi-Shen Conjecture on 4-Cycle Decompositions of Eulerian Bipartite Tournaments
Electronic Journal of Graph Theory and Applications 3 (2) (2015), 191 196 On the Balanced Case of the Brualdi-Shen Conjecture on 4-Cycle Decompositions of Eulerian Bipartite Tournaments Rafael Del Valle
More informationDiscrete Mathematics
Discrete Mathematics Lecturer: Mgr. Tereza Kovářová, Ph.D. tereza.kovarova@vsb.cz Guarantor: doc. Mgr. Petr Kovář, Ph.D. Department of Applied Mathematics, VŠB Technical University of Ostrava About this
More informationDense Matrix Algorithms
Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationMapReduce-based Parallelization of Sparse Matrix Kernels for Large-scale Scientific Applications
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe MapReduce-based Parallelization of Sparse Matrix Kernels for Large-scale Scientific Applications Gunduz Vehbi Demirci a,
More informationGraph Theory Day Four
Graph Theory Day Four February 8, 018 1 Connected Recall from last class, we discussed methods for proving a graph was connected. Our two methods were 1) Based on the definition, given any u, v V(G), there
More informationSite-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation
786 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 5, MAY 20 Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation Ali Cevahir, Cevdet Aykanat, Ata
More informationChapter 8 Dense Matrix Algorithms
Chapter 8 Dense Matrix Algorithms (Selected slides & additional slides) A. Grama, A. Gupta, G. Karypis, and V. Kumar To accompany the text Introduction to arallel Computing, Addison Wesley, 23. Topic Overview
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationFill-in reduction in sparse matrix factorizations using hypergraphs
Fill-in reduction in sparse matrix factorizations using hypergraphs Oguz Kaya, Enver Kayaaslan, Bora Uçar, Iain S. Duff To cite this version: Oguz Kaya, Enver Kayaaslan, Bora Uçar, Iain S. Duff. Fill-in
More informationPartitioning Spatially Located Load with Rectangles: Algorithms and Simulations
Partitioning Spatially Located Load with Rectangles: Algorithms and Simulations, Erdeniz Ozgun Bas, Umit V. Catalyurek Department of Biomedical Informatics, The Ohio State University {esaule,erdeniz,umit}@bmi.osu.edu
More informationMATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix.
MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. Row echelon form A matrix is said to be in the row echelon form if the leading entries shift to the
More informationAN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE GRAPHS
Volume 8 No. 0 208, -20 ISSN: 3-8080 (printed version); ISSN: 34-3395 (on-line version) url: http://www.ijpam.eu doi: 0.2732/ijpam.v8i0.54 ijpam.eu AN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationMathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts
Mathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts interpreting a schematic drawing, estimating the amount of
More informationAlgebraic Graph Theory- Adjacency Matrix and Spectrum
Algebraic Graph Theory- Adjacency Matrix and Spectrum Michael Levet December 24, 2013 Introduction This tutorial will introduce the adjacency matrix, as well as spectral graph theory. For those familiar
More informationOn the Relationships between Zero Forcing Numbers and Certain Graph Coverings
On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,
More informationChordal Graphs and Minimal Free Resolutions
Chordal Graphs and Minimal Free Resolutions David J. Marchette David A. Johannsen Abstract The problem of computing the minimal free resolution of the edge ideal of a graph has attracted quite a bit of
More informationEgemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for
Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and
More informationMath 1B03/1ZC3 - Tutorial 3. Jan. 24th/28th, 2014
Math 1B03/1ZC3 - Tutorial 3 Jan. 24th/28th, 2014 Tutorial Info: Website: http://ms.mcmaster.ca/ dedieula. Math Help Centre: Wednesdays 2:30-5:30pm. Email: dedieula@math.mcmaster.ca. Elementary Matrices
More informationSparse Hypercube 3-Spanners
Sparse Hypercube 3-Spanners W. Duckworth and M. Zito Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3052, Australia Department of Computer Science, University of
More informationarxiv: v1 [cs.dm] 21 Dec 2015
The Maximum Cardinality Cut Problem is Polynomial in Proper Interval Graphs Arman Boyacı 1, Tinaz Ekim 1, and Mordechai Shalom 1 Department of Industrial Engineering, Boğaziçi University, Istanbul, Turkey
More informationExercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:
Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3). Use the inversion algorithm to find the inverse of an invertible matrix. Express
More informationLATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS
LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS TIMOTHY L. VIS Abstract. A significant problem in finite optimization is the assignment problem. In essence, the assignment
More informationREGULAR GRAPHS OF GIVEN GIRTH. Contents
REGULAR GRAPHS OF GIVEN GIRTH BROOKE ULLERY Contents 1. Introduction This paper gives an introduction to the area of graph theory dealing with properties of regular graphs of given girth. A large portion
More informationOn Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)
On Modularity Clustering Presented by: Presented by: Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) Modularity A quality index for clustering a graph G=(V,E) G=(VE) q( C): EC ( ) EC ( ) + ECC (, ')
More informationComputer Graphics Hands-on
Computer Graphics Hands-on Two-Dimensional Transformations Objectives Visualize the fundamental 2D geometric operations translation, rotation about the origin, and scale about the origin Learn how to compose
More informationSummary of Raptor Codes
Summary of Raptor Codes Tracey Ho October 29, 2003 1 Introduction This summary gives an overview of Raptor Codes, the latest class of codes proposed for reliable multicast in the Digital Fountain model.
More informationK 4 C 5. Figure 4.5: Some well known family of graphs
08 CHAPTER. TOPICS IN CLASSICAL GRAPH THEORY K, K K K, K K, K K, K C C C C 6 6 P P P P P. Graph Operations Figure.: Some well known family of graphs A graph Y = (V,E ) is said to be a subgraph of a graph
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More informationCS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning
CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning Parallel sparse matrix-vector product Lay out matrix and vectors by rows y(i) = sum(a(i,j)*x(j)) Only compute terms with A(i,j) 0 P0 P1
More information15. The Software System ParaLab for Learning and Investigations of Parallel Methods
15. The Software System ParaLab for Learning and Investigations of Parallel Methods 15. The Software System ParaLab for Learning and Investigations of Parallel Methods... 1 15.1. Introduction...1 15.2.
More informationMonday, 12 November 12. Matrices
Matrices Matrices Matrices are convenient way of storing multiple quantities or functions They are stored in a table like structure where each element will contain a numeric value that can be the result
More informationNumerical Algorithms
Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0
More informationSpectral Graph Sparsification: overview of theory and practical methods. Yiannis Koutis. University of Puerto Rico - Rio Piedras
Spectral Graph Sparsification: overview of theory and practical methods Yiannis Koutis University of Puerto Rico - Rio Piedras Graph Sparsification or Sketching Compute a smaller graph that preserves some
More informationMathematics High School Geometry
Mathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts interpreting a schematic drawing, estimating the amount of
More informationBilinear Programming
Bilinear Programming Artyom G. Nahapetyan Center for Applied Optimization Industrial and Systems Engineering Department University of Florida Gainesville, Florida 32611-6595 Email address: artyom@ufl.edu
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationData Partitioning. Figure 1-31: Communication Topologies. Regular Partitions
Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy
More informationMathematics. 2.1 Introduction: Graphs as Matrices Adjacency Matrix: Undirected Graphs, Directed Graphs, Weighted Graphs CHAPTER 2
CHAPTER Mathematics 8 9 10 11 1 1 1 1 1 1 18 19 0 1.1 Introduction: Graphs as Matrices This chapter describes the mathematics in the GraphBLAS standard. The GraphBLAS define a narrow set of mathematical
More information(Lec 14) Placement & Partitioning: Part III
Page (Lec ) Placement & Partitioning: Part III What you know That there are big placement styles: iterative, recursive, direct Placement via iterative improvement using simulated annealing Recursive-style
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationParallelizing LU Factorization
Parallelizing LU Factorization Scott Ricketts December 3, 2006 Abstract Systems of linear equations can be represented by matrix equations of the form A x = b LU Factorization is a method for solving systems
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 5: Sparse Linear Systems and Factorization Methods Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 18 Sparse
More information3 Identify shapes as two-dimensional (lying in a plane, flat ) or three-dimensional ( solid ).
Geometry Kindergarten Identify and describe shapes (squares, circles, triangles, rectangles, hexagons, cubes, cones, cylinders, and spheres). 1 Describe objects in the environment using names of shapes,
More informationSHORTEST PATHS ON SURFACES GEODESICS IN HEAT
SHORTEST PATHS ON SURFACES GEODESICS IN HEAT INF555 Digital Representation and Analysis of Shapes 28 novembre 2015 Ruoqi He & Chia-Man Hung 1 INTRODUCTION In this project we present the algorithm of a
More information3D Computer Graphics. Jared Kirschner. November 8, 2010
3D Computer Graphics Jared Kirschner November 8, 2010 1 Abstract We are surrounded by graphical displays video games, cell phones, television sets, computer-aided design software, interactive touch screens,
More informationNew Challenges In Dynamic Load Balancing
New Challenges In Dynamic Load Balancing Karen D. Devine, et al. Presentation by Nam Ma & J. Anthony Toghia What is load balancing? Assignment of work to processors Goal: maximize parallel performance
More informationLecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the
More informationAn Improved Measurement Placement Algorithm for Network Observability
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper
More information