1.5D PARALLEL SPARSE MATRIX-VECTOR MULTIPLY
ENVER KAYAASLAN, BORA UÇAR, AND CEVDET AYKANAT

Abstract. There are three common parallel sparse matrix-vector multiply algorithms: 1D row-parallel, 1D column-parallel and 2D row-column-parallel. The 1D parallel algorithms offer the advantage of having only one communication phase. The 2D parallel algorithm, on the other hand, is more scalable due to a high level of flexibility in distributing fine-grain tasks, but it suffers from two communication phases. Here, we introduce the novel concept of heterogeneous messages, where a heterogeneous message may contain both input-vector entries and partially computed output-vector entries. This concept not only leads to a decreased number of messages but also enables fusing the input- and output-communication phases into a single phase. These findings are utilized to propose a 1.5D parallel sparse matrix-vector multiply algorithm, which is called local row-column-parallel. The proposed algorithm requires a local fine-grain partitioning, where locality refers to the constraint that each fine-grain task be assigned to a processor that contains either its input-vector entry, or its output-vector entry, or both. This constraint, nevertheless, turns out not to be very restrictive, so that we achieve a partitioning quality close to that of the 2D parallel algorithm. We propose two methods for local fine-grain partitioning. The first method is based on a novel directed hypergraph partitioning model that minimizes total communication volume while maintaining a load-balance constraint as well as an additional locality constraint, which is handled by adopting and adapting a recent, simple yet effective approach. The second method has two parts: the first part finds a distribution of the input and output vectors, and the second part finds a nonzero/task distribution that exactly minimizes total communication volume while keeping the vector distribution intact.
We conduct our experiments on a large set of test matrices to evaluate the partitioning qualities and partitioning times of these proposed 1.5D methods.

Key words. sparse matrix partitioning, parallel sparse matrix-vector multiplication, directed hypergraph model, bipartite vertex cover, combinatorial scientific computing

AMS subject classifications. 0C0, 0C, 0C0, F0, F0, Y0

Introduction. The sparse matrix-vector multiply is a fundamental operation in many iterative solvers, such as those for linear systems, eigensystems and least squares problems. This renders the parallelization of sparse matrix-vector multiply an important problem. Since the same sparse matrix is multiplied many times during the iterations of such applications, several comprehensive sparse matrix partitioning models and methods have been proposed and implemented for scaling parallel sparse matrix-vector multiply operations on distributed-memory systems. The parallel sparse matrix-vector multiply operation is composed of fine-grain tasks of multiply-and-add operations, where each fine-grain task involves an input-vector entry, a nonzero and a partial result on an output-vector entry. Here, each fine-grain task is associated with a separate nonzero and, by the owner-computes rule, is assumed to be performed by the processor that contains the associated nonzero. In the literature, there are three basic sparse matrix-vector multiply algorithms: row-parallel, column-parallel and row-column-parallel. The row- and column-parallel algorithms are 1D parallel, whereas the row-column-parallel algorithm is 2D parallel. In row-parallel sparse matrix-vector multiply, all fine-grain tasks associated with the nonzeros of a row are combined into a composite task: the inner product of a sparse row vector and a dense input vector. This row-oriented combination requires rowwise partitioning, where the nonzeros of a row and the respective output-vector entry are all assigned to the same processor.
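The fine-grain task model can be made concrete with a short sketch (our own illustration, not from the paper; a coordinate-list layout and the function name are assumptions). Each nonzero a_ij defines one multiply-and-add task y_i += a_ij * x_j, and the owner-computes rule places that task on the processor holding a_ij:

```python
# A minimal sketch of the fine-grain task model, assuming the sparse matrix
# is given as a list of (i, j, a_ij) coordinate triples.
def spmv_fine_grain(nonzeros, x, m):
    """Compute y = A*x (m output rows) as one multiply-and-add task per nonzero."""
    y = [0.0] * m
    for i, j, a_ij in nonzeros:
        # Owner-computes rule: this task runs on the processor holding a_ij.
        y[i] += a_ij * x[j]
    return y
```

For A = [[2, 0], [1, 3]] and x = [1, 2], the three tasks produce y = [2, 7].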
Affiliations: Independent Researcher; CNRS and University of Lyon, France; Bilkent University, Turkey.

Similarly, in column-parallel sparse matrix-vector multiply, all fine-grain tasks associated with the nonzeros of a column are combined into a composite task: a daxpy operation over a dense output vector, involving a sparse column vector and an input-vector entry. This column-oriented combination requires columnwise partitioning, where the nonzeros of a column and the respective input-vector entry are all assigned to the same processor. In row-parallel sparse matrix-vector multiply, all messages are communicated in an input-communication phase called expand, where each message contains only input-vector entries. In column-parallel sparse matrix-vector multiply, on the other hand, all messages are communicated in an output-communication phase called fold, where each message contains only partially computed output-vector entries. In row-column-parallel sparse matrix-vector multiply, there is no restriction of any kind on distributing input- and output-vector entries and nonzeros, which is also referred to as fine-grain partitioning. In the row-column-parallel algorithm, some messages are communicated in the expand phase and some in the fold phase. Each message of the expand phase contains only input-vector entries, as in the row-parallel algorithm, whereas each message of the fold phase contains only partially computed output-vector entries, as in the column-parallel algorithm. In all three sparse matrix-vector multiply algorithms, the messages are homogeneous, that is, each message contains either only input-vector entries or only partially computed output-vector entries. In order to solve each of the above-mentioned three partitioning problems, a different hypergraph model has been proposed, where vertex partitioning with minimum cutsize while maintaining balance on part weights exactly corresponds to matrix partitioning with minimum total communication volume while maintaining computational load balance on processors.
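To make a homogeneous expand phase concrete, here is a small sketch (our own illustration; the function name and the owner-map layout are assumptions) that collects, for a rowwise partition, which x entries each processor must send. Every resulting message carries input-vector entries only:

```python
def expand_messages(nonzeros, row_owner, col_owner):
    """Map (sender, receiver) -> sorted x-indices for the row-parallel expand phase.

    row_owner[i] owns row i (and hence task y_i); col_owner[j] owns x_j.
    """
    msgs = {}
    for i, j, _ in nonzeros:
        task_proc, x_proc = row_owner[i], col_owner[j]
        if task_proc != x_proc:  # x_j must travel to the processor doing the task
            msgs.setdefault((x_proc, task_proc), set()).add(j)
    return {pair: sorted(js) for pair, js in msgs.items()}
```

For an upper-triangular 2x2 example with each processor owning one row and the conformal x entry, only the off-diagonal nonzero triggers a message.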
These hypergraph models are as follows: the column-net hypergraph model [
expand-fold phase. The proposed local row-column-parallel algorithm requires local fine-grain partitioning, where a fine-grain partition is said to be local if each fine-grain task is local either to its input-vector entry, or to its output-vector entry, or to both. This flexibility in assigning fine-grain tasks brings an opportunity to perform sparse matrix-vector multiply in parallel with a partitioning time and partitioning quality close to those of the 1D and 2D parallel algorithms, respectively. We propose two methods to obtain a 1.5D local fine-grain partition, each with a different setting and approach; some preliminary studies on these methods are given in our recent work [
Fig.: A fine-grain task a_ij and its parallel computation.
Fig.: A task-and-data distribution Π(y ← Ax) of matrix-vector multiply on a sample sparse matrix A and its block structure.
In column-parallel sparse matrix-vector multiply, the basic computational units are the columns. For an input-vector entry x_j assigned to processor P_k, the fine-grain tasks associated with the nonzeros of A_{*j} = {a_ij ∈ A : 1 ≤ i ≤ m} are combined into a composite task of the daxpy operation ŷ^(k) ← ŷ^(k) + A_{*j} x_j, which is to be carried out on P_k, where ŷ^(k) is the partially computed output vector of P_k. As a result, a task-and-data distribution Π(y ← Ax) of matrix-vector multiply on A for the column-parallel algorithm should satisfy the following condition:

a_ij ∈ A^(k) whenever x_j ∈ x^(k), (.9)

and in the literature this kind of distribution is known as columnwise partitioning [
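A single composite daxpy task can be sketched as follows (our own illustration; a column stored as (row, value) pairs and a dict of partial outputs are assumed layouts):

```python
def column_daxpy(yhat, column, x_j):
    """One composite task of column-parallel SpMV: yhat += A[:, j] * x_j.

    `column` lists the (i, a_ij) pairs of column j; `yhat` is the processor's
    dict of partially computed output-vector entries.
    """
    for i, a_ij in column:
        yhat[i] = yhat.get(i, 0.0) + a_ij * x_j
    return yhat
```

The partial results accumulated in `yhat` are what the fold phase later sends to the owners of the corresponding output-vector entries.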
Algorithm: The row-column-parallel sparse matrix-vector multiply.
For each processor P_k:
1. (expand) For each nonzero column stripe A^(l)_{*k}, l ≠ k: (a) form the vector x̂^(k)_l, which contains only those entries of x^(k) corresponding to nonzero columns in A^(l)_{*k}, and (b) send x̂^(k)_l to P_l.
2. For each nonzero row stripe A^(k)_{l*}, l ≠ k, compute: (a) ŷ^(l)_k ← A^(k)_{lk} x^(k) and (b) ŷ^(l)_k ← ŷ^(l)_k + Σ_{r≠k} A^(k)_{lr} x̂^(r)_k.
3. (fold) For each nonzero row stripe A^(k)_{l*}, l ≠ k: (a) form the vector ŷ^(l)_k, which contains only those entries corresponding to nonzero rows in A^(k)_{l*}, and (b) send ŷ^(l)_k to P_l.
4. Compute the output subvector: (a) y^(k) ← A^(k)_{kk} x^(k), (b) y^(k) ← y^(k) + Σ_{l≠k} A^(k)_{kl} x̂^(l)_k, and (c) y^(k) ← y^(k) + Σ_{l≠k} ŷ^(k)_l.

of such messages. Then, the total reduction in the number of messages equals the number of heterogeneous messages of the local row-column-parallel algorithm.

Task-communication dependency graph. We first introduce a two-way categorization of input- and output-vector entries and a four-way categorization of fine-grain tasks (
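The arithmetic of the row-column-parallel algorithm above can be checked with a sequential simulation (our own sketch; it uses a flat nonzero-to-processor map instead of block stripes, and the function name is an assumption). The expand, local-compute and fold steps are modeled as explicit data movements:

```python
def spmv_row_column_parallel(nonzeros, x, m, nnz_owner, K):
    """Simulate 2D row-column-parallel SpMV on K 'processors'.

    nnz_owner maps (i, j) to the processor holding a_ij. Returns the
    assembled output vector y of length m.
    """
    # Expand phase: each x_j is delivered to every processor that holds a
    # nonzero in column j (a homogeneous message of input-vector entries).
    local_x = [dict() for _ in range(K)]
    for i, j, _ in nonzeros:
        local_x[nnz_owner[(i, j)]][j] = x[j]
    # Local multiply-and-add into partially computed output entries.
    partial = [dict() for _ in range(K)]
    for i, j, a_ij in nonzeros:
        p = nnz_owner[(i, j)]
        partial[p][i] = partial[p].get(i, 0.0) + a_ij * local_x[p][j]
    # Fold phase: partial results travel to the owner of each y_i and are summed.
    y = [0.0] * m
    for p in range(K):
        for i, val in partial[p].items():
            y[i] += val
    return y
```

On the same 2x2 example used earlier, the simulation reproduces the serial result regardless of how the nonzeros are spread over the processors.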
Fig.: (a) Task-communication dependency graph; (b)-(e) topological orderings for the row-parallel, row-column-parallel, column-parallel, and local row-column-parallel sparse matrix-vector multiply algorithms. Legend: expand: input-communication phase; fold: output-communication phase; NL: nonlocal tasks; the remaining categories are input-output-local, input-local, and output-local tasks.

a dependency on the output-communication phase; the nonlocal tasks, however, are linked with both communication phases. Figure
In the column-parallel algorithm, each of the fine-grain tasks is either input-output-local or input-local due to the columnwise partitioning condition (
Fig.: A sample local fine-grain partition, showing an input-output-local task, an input-local task, and two output-local tasks.

In the induced block structure, each nonzero off-diagonal block is split between only the two processors that own its row and column stripes, while each diagonal block resides entirely on its owner:

A_{lk} = A^(k)_{lk} + A^(l)_{lk}. (.)

For instance, A_{12} = A^(1)_{12} + A^(2)_{12}, A_{13} = A^(1)_{13} + A^(3)_{13}, etc. Figure
Algorithm: The local row-column-parallel sparse matrix-vector multiply.
For each processor P_k:
1. For each nonzero off-diagonal block A_{lk} = A^(k)_{lk} + A^(l)_{lk}, compute ŷ^(l)_k ← A^(k)_{lk} x^(k) (the input-local tasks of P_k).
2. (expand-fold) For each nonzero off-diagonal block A_{lk}: (a) form the vector x̂^(k)_l, which contains only those entries of x^(k) corresponding to nonzero columns in A^(l)_{lk}; (b) form the vector ŷ^(l)_k, which contains only those entries of ŷ^(l)_k corresponding to nonzero rows in A^(k)_{lk}; (c) send the heterogeneous message [x̂^(k)_l, ŷ^(l)_k] to processor P_l.
3. Compute the output subvector: (a) y^(k) ← A_{kk} x^(k) (the input-output-local tasks of P_k); (b) y^(k) ← y^(k) + Σ_{l≠k} A^(k)_{kl} x̂^(l)_k (the output-local tasks of P_k); (c) y^(k) ← y^(k) + Σ_{l≠k} ŷ^(k)_l (partial results of the input-local tasks of other processors).

Fig.: An illustration of Algorithm
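The fused expand-fold message can be illustrated with a toy helper (our own sketch, not the authors' code; all names are assumptions). Each message bundles the x entries the receiver's output-local tasks need with the ŷ partial results the sender's input-local tasks produced:

```python
def make_heterogeneous_message(x_local, needed_cols, yhat_partials, rows_for_peer):
    """Form the single [x-hat, y-hat] message of the fused expand-fold phase.

    needed_cols: column indices the peer needs from our x entries;
    rows_for_peer: row indices of partial outputs we computed for the peer.
    """
    x_part = {j: x_local[j] for j in needed_cols}
    y_part = {i: yhat_partials[i] for i in rows_for_peer}
    return {"x": x_part, "yhat": y_part}  # one message, two kinds of payload
```

Because the two payloads share one message, a processor pair that would exchange both an expand and a fold message in the 2D algorithm exchanges only one message here.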
Two proposed methods for local row-column-parallel partitioning. In this section, we propose two methods to find a local row-column-parallel partition, as required by the 1.5D local row-column-parallel sparse matrix-vector multiply. One method finds the vector and nonzero distributions simultaneously, whereas the other has two parts in which the vector and nonzero distributions are found separately.

A directed hypergraph model for simultaneous vector and nonzero distribution. In this method, we adopt the elementary hypergraph model for fine-grain partitioning of [
Fig.: An illustration of attaining a local fine-grain partition through vertex partitioning of the directed hypergraph model that satisfies the locality constraints: (a) a sparse matrix, (b) its directed hypergraph model, (c) a K-way local hypergraph partition, (d) the local fine-grain partition. The input- and output-data vertices are drawn with triangles and rectangles, respectively.

Each task vertex is amalgamated into v_x(j) if column j has a smaller number of nonzeros than row i, and it is amalgamated into v_y(i) if vice versa, where ties are broken arbitrarily. The result is a reduced hypergraph that contains only input- and output-data vertices amalgamated with task vertices, where the weight of a data vertex equals the number of task vertices amalgamated into it. As a result, the locality constraint on vertex partitioning of the initial directed hypergraph naturally holds through vertex partitioning of the reduced hypergraph, for which the net directions become irrelevant. A vertex partition of this reduced hypergraph can be obtained by any existing hypergraph partitioning tool and then trivially decoded as a local fine-grain partition. Figure
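The amalgamation rule can be sketched directly (our own illustration on a coordinate-list input; note that ties here go to the column vertex, whereas the paper breaks them arbitrarily):

```python
def amalgamate(nonzeros, m, n):
    """Merge each task vertex v(a_ij) into the lighter of v_x(j) and v_y(i).

    Returns data-vertex weights: ('x', j) or ('y', i) -> number of merged tasks.
    """
    row_nnz = [0] * m
    col_nnz = [0] * n
    for i, j, _ in nonzeros:
        row_nnz[i] += 1
        col_nnz[j] += 1
    weights = {}
    for i, j, _ in nonzeros:
        # Column lighter (or tied): merge into the input-data vertex v_x(j);
        # otherwise merge into the output-data vertex v_y(i).
        target = ('x', j) if col_nnz[j] <= row_nnz[i] else ('y', i)
        weights[target] = weights.get(target, 0) + 1
    return weights
```

By construction every task ends up on a vertex that carries either its input or its output entry, so any partition of the reduced hypergraph is automatically local.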
Fig.: An illustration of local fine-grain partitioning through task-vertex amalgamations: (a) task-vertex amalgamations, (b) the reduced hypergraph, (c) a K-way hypergraph partition, (d) the local fine-grain partition. The input- and output-data vertices are drawn with triangles and rectangles, respectively.

the recursive-bisection framework, which distorts the locality of task vertices so that a partition obtained in further recursive steps is no longer a local fine-grain partition.

Optimal nonzero distribution to minimize total communication volume. This method is composed of two parts. The first part finds a vector distribution (Π(x), Π(y)), and the second part finds a nonzero/task distribution Π(A) that exactly minimizes total communication volume over all possible local fine-grain partitions that abide by the vector distribution (Π(x), Π(y)) of the first part. In this way, we generate a local fine-grain partition Π(y ← Ax) = (Π(A), Π(x), Π(y)). The first part can be accomplished by any conventional data partitioning method, such as 1D partitioning; this section is devoted to the second part of the method. Consider the block structure (
Fig.: A sample sparse matrix A and its block structure induced by an input-data distribution Π(x) = {x^(1), x^(2), x^(3)} and an output-data distribution Π(y) = {y^(1), y^(2), y^(3)}.

be performed independently for minimizing total communication volume. In the local row-column-parallel algorithm, P_l sends [x̂^(k)_l, ŷ^(l)_k] to P_k, where x̂^(k)_l corresponds to the nonzero columns of A^(k)_{lk} and ŷ^(l)_k corresponds to the nonzero rows of A^(l)_{lk}, for a nonzero/task distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}. Then, we can derive the following formula for the communication volume φ_{lk} from P_l to P_k:

φ_{lk} = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}), (.)

where n̂(·) and m̂(·) refer to the number of nonzero columns and nonzero rows of the input submatrix, respectively. The total communication volume φ is then computed by summing the communication volumes incurred by each nonzero off-diagonal block of the block structure. The problem of our interest can then be described as follows.

Problem. Given A and a vector distribution (Π(x), Π(y)), find a nonzero/task distribution Π(A) such that each nonzero off-diagonal block satisfies A_{lk} = A^(k)_{lk} + A^(l)_{lk} and each diagonal block satisfies A_{kk} = A^(k)_{kk} for the block structure induced by (Π(x), Π(y)), minimizing the total communication volume φ = Σ_{k≠l} φ_{lk}.

Let G_{lk} = (U_{lk} ∪ V_{lk}, E_{lk}) be the bipartite graph representation of A_{lk}, where U_{lk} and V_{lk} are the sets of vertices corresponding to the rows and columns of A_{lk}, respectively, and E_{lk} is the set of edges corresponding to the nonzeros of A_{lk}.
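For a single off-diagonal block, the volume formula above can be evaluated directly from a nonzero assignment. The sketch below is our own illustration (names are assumptions); `assignment` maps each nonzero of the block to 'k' or 'l':

```python
def block_comm_volume(block_nonzeros, assignment):
    """phi_lk = n-hat(part on P_k) + m-hat(part on P_l) for one block:
    number of nonzero columns assigned to P_k plus number of nonzero rows
    assigned to P_l."""
    cols_on_k = {j for (i, j) in block_nonzeros if assignment[(i, j)] == 'k'}
    rows_on_l = {i for (i, j) in block_nonzeros if assignment[(i, j)] == 'l'}
    return len(cols_on_k) + len(rows_on_l)
```

Summing this quantity over all nonzero off-diagonal blocks gives the total communication volume of a candidate local fine-grain partition.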
Based on this notation, the following theorem states a correspondence between the problem of distributing the nonzeros/tasks of A_{lk} to minimize the communication volume φ_{lk} from P_l to P_k and the problem of finding a minimum vertex cover of G_{lk}.

Theorem. Let A_{lk} be a nonzero off-diagonal block and G_{lk} = (U_{lk} ∪ V_{lk}, E_{lk}) be its bipartite graph representation.
1. For any vertex cover S_{lk} of G_{lk}, there is a nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk} such that |S_{lk}| ≥ n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}).
2. For any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}, there is a vertex cover S_{lk} of G_{lk} such that |S_{lk}| = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}).
Proof. We prove the two parts of the theorem separately.
1. Take any vertex cover S_{lk} of G_{lk}. Consider any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk} such that

a_ij ∈ A^(k)_{lk} if v_j ∈ S_{lk} and u_i ∉ S_{lk},
a_ij ∈ A^(l)_{lk} if v_j ∉ S_{lk} and u_i ∈ S_{lk}, (.)
a_ij ∈ A^(k)_{lk} or A^(l)_{lk} if v_j ∈ S_{lk} and u_i ∈ S_{lk}.

Since v_j ∈ S_{lk} for every a_ij ∈ A^(k)_{lk} and u_i ∈ S_{lk} for every a_ij ∈ A^(l)_{lk}, we have |S_{lk} ∩ V_{lk}| ≥ n̂(A^(k)_{lk}) and |S_{lk} ∩ U_{lk}| ≥ m̂(A^(l)_{lk}), which in turn leads to

|S_{lk}| ≥ n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}). (.)

2. Take any nonzero distribution A_{lk} = A^(k)_{lk} + A^(l)_{lk}. Consider S_{lk} = {u_i ∈ U_{lk} : a_ij ∈ A^(l)_{lk}} ∪ {v_j ∈ V_{lk} : a_ij ∈ A^(k)_{lk}}, where |S_{lk}| = n̂(A^(k)_{lk}) + m̂(A^(l)_{lk}). Now, consider a nonzero a_ij ∈ A_{lk} and its corresponding edge {u_i, v_j} ∈ E_{lk}. If a_ij ∈ A^(k)_{lk}, then v_j ∈ S_{lk}; otherwise u_i ∈ S_{lk}, since a_ij ∈ A^(l)_{lk}. So S_{lk} is a vertex cover of G_{lk}.

At this point, however, it is still not clear how the reduction from the problem of distributing nonzeros/tasks to the problem of finding a minimum vertex cover holds. For this purpose, using Theorem
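Since each block graph is bipartite, a minimum vertex cover can be obtained from a maximum matching via Koenig's theorem. The sketch below is our own implementation (all names are assumptions): it finds a maximum matching with augmenting paths, derives the cover, and then splits the nonzeros exactly as in the first part of the proof:

```python
def min_vertex_cover(rows, edges):
    """Minimum vertex cover of a bipartite graph via augmenting-path matching
    plus Koenig's theorem. Returns (cover_rows, cover_cols)."""
    adj = {u: [] for u in rows}
    for u, v in edges:
        adj[u].append(v)
    match_u, match_v = {}, {}

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_v or augment(match_v[v], seen):
                match_u[u], match_v[v] = v, u
                return True
        return False

    for u in rows:
        augment(u, set())
    # Koenig: alternating search from unmatched row vertices.
    visited_u, visited_v = set(), set()
    stack = [u for u in rows if u not in match_u]
    visited_u.update(stack)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in visited_v:
                visited_v.add(v)
                w = match_v.get(v)
                if w is not None and w not in visited_u:
                    visited_u.add(w)
                    stack.append(w)
    return {u for u in rows if u not in visited_u}, visited_v

def distribute_block(edges, cover_rows, cover_cols):
    """Split nonzeros as in the proof: column covered -> P_k, else row -> P_l."""
    part_k = [(u, v) for u, v in edges if v in cover_cols]
    part_l = [(u, v) for u, v in edges if v not in cover_cols and u in cover_rows]
    return part_k, part_l
```

On a block whose two nonzeros share one column, the cover is that single column vertex, so both nonzeros go to P_k and the block costs one word of communication.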
Fig.: The minimum vertex cover model for a nonzero off-diagonal block to minimize the communication volume from P_l to P_k. Due to the minimum vertex cover, P_l sends a single heterogeneous message containing the corresponding input-vector entries and partial results to P_k.

Algorithm: Nonzero/task distribution to minimize total communication volume.
1: procedure NonzeroTaskDistributeVolume(A, Π(x), Π(y))
2:   for each nonzero off-diagonal block A_{lk} do Equation (
Fig.: (a) A minimum vertex cover of the bipartite graph representation of each nonzero off-diagonal block of Figure
running with default parameters and setting the maximum allowable imbalance ratio as  %. Since PaToH depends on randomization, we report the geometric mean of ten different runs for each partitioning instance. In all experiments, we report the results using a generic tool called performance profiles [
Method   SpMV algorithm              Partitioning
1D       row-parallel                rowwise
2D       row-column-parallel         fine-grain
1.5D-H   local row-column-parallel   local fine-grain
1.5D-V   local row-column-parallel   local fine-grain

Fig.: Performance profiles that compare total communication volume and load balance using test matrices with no dense rows/columns for two values of K: (a) total communication volume, (b) load balance, (c) total communication volume, (d) load balance.

respectively, for the larger value of K. Figures
Fig.: Performance profiles that compare total communication volume and load balance using test matrices with dense rows/columns for two values of K: (a) total communication volume, (b) load balance, (c) total communication volume, (d) load balance.

performances in terms of total communication volume as expected. Figure
Fig.: Performance profiles that compare total message count and maximum message count for the three methods 1D, 2D and 1.5D-H, as well as maximum communication volume per processor and partitioning time for all methods, using all test matrices: (a) total message count, (b) maximum message count, (c) maximum volume, (d) partitioning time.

still be favorable to the other methods for particular matrices due to the low communication volume it may yield. In short, if the sparse matrix contains dense rows/columns, then 1.5D-H seems to be the method of choice in general; otherwise, 1.5D-V and 2D are reasonable alternatives competing with each other.

Conclusion and further discussions. This paper introduced 1.5D parallelism for sparse matrix-vector multiply. We presented the local row-column-parallel sparse matrix-vector multiply algorithm that uses this introduced 1.5D parallelism. This algorithm is the fourth parallel algorithm in the literature for sparse matrix-vector multiply, in addition to the well-known 1D row-parallel, 1D column-parallel and 2D row-column-parallel ones. In this paper, we also proposed two methods (1.5D-H and 1.5D-V) to distribute tasks and data in accordance with the requirements of the proposed 1.5D parallel algorithm. Using an extensive set of matrices from the UFL sparse
matrix collection, we compared the partitioning qualities of these two methods against the baseline 1D and 2D methods. The experiments suggest the use of the local row-column-parallel sparse matrix-vector multiply with a local fine-grain partition obtained by the proposed directed hypergraph model for matrices that contain dense rows/columns, as we observe a performance close to that of 2D fine-grain partitioning in terms of partitioning quality but with a considerably smaller number of messages and significant efficiency. We consider the problem mainly from a theoretical point of view and leave the performance of 1.5D parallel sparse matrix-vector multiply algorithms in terms of parallel multiply timings as future work. We note that the main ideas behind the proposed 1.5D parallelism, such as heterogeneous messaging and avoiding nonlocal tasks through a locality constraint on partitioning, are of course not restricted to the parallel sparse matrix-vector multiply operation; these ideas can be extended to other parallel computations as well.

REFERENCES

[1] Ümit V. Çatalyürek and Cevdet Aykanat, Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems, 10 (1999), pp. 673-693.
[2] Ümit V. Çatalyürek and Cevdet Aykanat, A fine-grain hypergraph model for 2D decomposition of sparse matrices, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2001.
[3] Ümit V. Çatalyürek and Cevdet Aykanat, PaToH (Partitioning Tool for Hypergraphs), in Encyclopedia of Parallel Computing, Springer, 2011.
[4] Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar, On two-dimensional sparse matrix partitioning: Models, methods, and a recipe, SIAM Journal on Scientific Computing, 32 (2010), pp. 656-683.
[5] Elizabeth D. Dolan and Jorge J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201-213.
[6] Enver Kayaaslan, Bora Uçar, and Cevdet Aykanat, Semi-two-dimensional partitioning for parallel sparse matrix-vector multiplication, in Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, IEEE, 2015.
[7] Daniël M. Pelt and Rob H. Bisseling, A medium-grain method for fast 2D bipartitioning of sparse matrices, in Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, IEEE, 2014.
[8] Bora Uçar and Cevdet Aykanat, Revisiting hypergraph models for sparse matrix partitioning, SIAM Review, 49 (2007), pp. 595-603.
[9] Bora Uçar and Cevdet Aykanat, Partitioning sparse matrices for parallel preconditioned iterative methods, SIAM Journal on Scientific Computing, 29 (2007), pp. 1683-1709.
Combinatorial problems in a Parallel Hybrid Linear Solver Ichitaro Yamazaki and Xiaoye Li Lawrence Berkeley National Laboratory François-Henry Rouet and Bora Uçar ENSEEIHT-IRIT and LIP, ENS-Lyon SIAM workshop
More informationOn shared-memory parallelization of a sparse matrix scaling algorithm
On shared-memory parallelization of a sparse matrix scaling algorithm Ümit V. Çatalyürek, Kamer Kaya The Ohio State University Dept. of Biomedical Informatics {umit,kamer}@bmi.osu.edu Bora Uçar CNRS and
More informationAdvances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing
Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Erik G. Boman 1, Umit V. Catalyurek 2, Cédric Chevalier 1, Karen D. Devine 1, Ilya Safro 3, Michael M. Wolf
More informationAugmenting Hypergraph Models with Message Nets to Reduce Bandwidth and Latency Costs Simultaneously
Augmenting Hypergraph Models with Message Nets to Reduce Bandwidth and Latency Costs Simultaneously Oguz Selvitopi, Seher Acer, and Cevdet Aykanat Bilkent University, Ankara, Turkey CSC16, Albuquerque,
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More informationDownloaded 01/21/15 to Redistribution subject to SIAM license or copyright; see
SIAM J. SCI. COMPUT. Vol. 36, No. 5, pp. C568 C590 c 2014 Society for Industrial and Applied Mathematics SIMULTANEOUS INPUT AND OUTPUT MATRIX PARTITIONING FOR OUTER-PRODUCT PARALLEL SPARSE MATRIX-MATRIX
More informationfor Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University
Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University 06533 Bilkent, Ankara, Turkey
More informationVertex Magic Total Labelings of Complete Graphs 1
Vertex Magic Total Labelings of Complete Graphs 1 Krishnappa. H. K. and Kishore Kothapalli and V. Ch. Venkaiah Centre for Security, Theory, and Algorithmic Research International Institute of Information
More informationDesign of Parallel Algorithms. Models of Parallel Computation
+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes
More informationExploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture K. Akbudak a, C.Aykanat
More informationFor example, the system. 22 may be represented by the augmented matrix
Matrix Solutions to Linear Systems A matrix is a rectangular array of elements. o An array is a systematic arrangement of numbers or symbols in rows and columns. Matrices (the plural of matrix) may be
More informationHypergraph Partitioning for Computing Matrix Powers
Hypergraph Partitioning for Computing Matrix Powers Nicholas Knight Erin Carson, James Demmel UC Berkeley, Parallel Computing Lab CSC 11 1 Hypergraph Partitioning for Computing Matrix Powers Motivation
More informationSPARSE matrix-vector multiplication (SpMV) is a
Spatiotemporal Graph and Hypergraph Partitioning Models for Sparse Matrix-Vector Multiplication on Many-Core Architectures Nabil Abubaker, Kadir Akbudak, and Cevdet Aykanat Abstract There exist graph/hypergraph
More informationCOUNTING PERFECT MATCHINGS
COUNTING PERFECT MATCHINGS JOHN WILTSHIRE-GORDON Abstract. Let G be a graph on n vertices. A perfect matching of the vertices of G is a collection of n/ edges whose union is the entire graph. This definition
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationrppatoh: Replicated Partitioning Tool for Hypergraphs
rppatoh: Replicated Partitioning Tool for Hypergraphs R. Oguz Selvitopi Computer Engineering Department Bilkent University Ankara, 06800 Turkey reha@cs.bilkent.edu.tr roguzsel@gmail.com Ata Turk Computer
More informationHypergraph-Theoretic Partitioning Models for Parallel Web Crawling
Hypergraph-Theoretic Partitioning Models for Parallel Web Crawling Ata Turk, B. Barla Cambazoglu and Cevdet Aykanat Abstract Parallel web crawling is an important technique employed by large-scale search
More informationVertex Magic Total Labelings of Complete Graphs
AKCE J. Graphs. Combin., 6, No. 1 (2009), pp. 143-154 Vertex Magic Total Labelings of Complete Graphs H. K. Krishnappa, Kishore Kothapalli and V. Ch. Venkaiah Center for Security, Theory, and Algorithmic
More informationStructured System Theory
Appendix C Structured System Theory Linear systems are often studied from an algebraic perspective, based on the rank of certain matrices. While such tests are easy to derive from the mathematical model,
More informationAcyclic Colorings of Graph Subdivisions
Acyclic Colorings of Graph Subdivisions Debajyoti Mondal, Rahnuma Islam Nishat, Sue Whitesides, and Md. Saidur Rahman 3 Department of Computer Science, University of Manitoba Department of Computer Science,
More informationAbusing a hypergraph partitioner for unweighted graph partitioning
Abusing a hypergraph partitioner for unweighted graph partitioning B. O. Fagginger Auer R. H. Bisseling Utrecht University February 13, 2012 Fagginger Auer, Bisseling (UU) Mondriaan Graph Partitioning
More informationThe block triangular form and bipartite matchings
Outline The block triangular form and bipartite matchings Jean-Yves L Excellent and Bora Uçar GRAAL, LIP, ENS Lyon, France CR-07: Sparse Matrix Computations, November 2010 http://graal.ens-lyon.fr/~bucar/cr07/
More informationOn the Balanced Case of the Brualdi-Shen Conjecture on 4-Cycle Decompositions of Eulerian Bipartite Tournaments
Electronic Journal of Graph Theory and Applications 3 (2) (2015), 191 196 On the Balanced Case of the Brualdi-Shen Conjecture on 4-Cycle Decompositions of Eulerian Bipartite Tournaments Rafael Del Valle
More informationDiscrete Mathematics
Discrete Mathematics Lecturer: Mgr. Tereza Kovářová, Ph.D. tereza.kovarova@vsb.cz Guarantor: doc. Mgr. Petr Kovář, Ph.D. Department of Applied Mathematics, VŠB Technical University of Ostrava About this
More informationDense Matrix Algorithms
Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationMapReduce-based Parallelization of Sparse Matrix Kernels for Large-scale Scientific Applications
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe MapReduce-based Parallelization of Sparse Matrix Kernels for Large-scale Scientific Applications Gunduz Vehbi Demirci a,
More informationGraph Theory Day Four
Graph Theory Day Four February 8, 018 1 Connected Recall from last class, we discussed methods for proving a graph was connected. Our two methods were 1) Based on the definition, given any u, v V(G), there
More informationSite-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation
786 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 5, MAY 20 Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation Ali Cevahir, Cevdet Aykanat, Ata
More informationChapter 8 Dense Matrix Algorithms
Chapter 8 Dense Matrix Algorithms (Selected slides & additional slides) A. Grama, A. Gupta, G. Karypis, and V. Kumar To accompany the text Introduction to arallel Computing, Addison Wesley, 23. Topic Overview
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationFill-in reduction in sparse matrix factorizations using hypergraphs
Fill-in reduction in sparse matrix factorizations using hypergraphs Oguz Kaya, Enver Kayaaslan, Bora Uçar, Iain S. Duff To cite this version: Oguz Kaya, Enver Kayaaslan, Bora Uçar, Iain S. Duff. Fill-in
More informationPartitioning Spatially Located Load with Rectangles: Algorithms and Simulations
Partitioning Spatially Located Load with Rectangles: Algorithms and Simulations, Erdeniz Ozgun Bas, Umit V. Catalyurek Department of Biomedical Informatics, The Ohio State University {esaule,erdeniz,umit}@bmi.osu.edu
More informationMATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix.
MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. Row echelon form A matrix is said to be in the row echelon form if the leading entries shift to the
More informationAN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE GRAPHS
Volume 8 No. 0 208, -20 ISSN: 3-8080 (printed version); ISSN: 34-3395 (on-line version) url: http://www.ijpam.eu doi: 0.2732/ijpam.v8i0.54 ijpam.eu AN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationMathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts
Mathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts interpreting a schematic drawing, estimating the amount of
More informationAlgebraic Graph Theory- Adjacency Matrix and Spectrum
Algebraic Graph Theory- Adjacency Matrix and Spectrum Michael Levet December 24, 2013 Introduction This tutorial will introduce the adjacency matrix, as well as spectral graph theory. For those familiar
More informationOn the Relationships between Zero Forcing Numbers and Certain Graph Coverings
On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,
More informationChordal Graphs and Minimal Free Resolutions
Chordal Graphs and Minimal Free Resolutions David J. Marchette David A. Johannsen Abstract The problem of computing the minimal free resolution of the edge ideal of a graph has attracted quite a bit of
More informationEgemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for
Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and
More informationMath 1B03/1ZC3 - Tutorial 3. Jan. 24th/28th, 2014
Math 1B03/1ZC3 - Tutorial 3 Jan. 24th/28th, 2014 Tutorial Info: Website: http://ms.mcmaster.ca/ dedieula. Math Help Centre: Wednesdays 2:30-5:30pm. Email: dedieula@math.mcmaster.ca. Elementary Matrices
More informationSparse Hypercube 3-Spanners
Sparse Hypercube 3-Spanners W. Duckworth and M. Zito Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3052, Australia Department of Computer Science, University of
More informationarxiv: v1 [cs.dm] 21 Dec 2015
The Maximum Cardinality Cut Problem is Polynomial in Proper Interval Graphs Arman Boyacı 1, Tinaz Ekim 1, and Mordechai Shalom 1 Department of Industrial Engineering, Boğaziçi University, Istanbul, Turkey
More informationExercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:
Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3). Use the inversion algorithm to find the inverse of an invertible matrix. Express
More informationLATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS
LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS TIMOTHY L. VIS Abstract. A significant problem in finite optimization is the assignment problem. In essence, the assignment
More informationREGULAR GRAPHS OF GIVEN GIRTH. Contents
REGULAR GRAPHS OF GIVEN GIRTH BROOKE ULLERY Contents 1. Introduction This paper gives an introduction to the area of graph theory dealing with properties of regular graphs of given girth. A large portion
More informationOn Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)
On Modularity Clustering Presented by: Presented by: Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) Modularity A quality index for clustering a graph G=(V,E) G=(VE) q( C): EC ( ) EC ( ) + ECC (, ')
More informationComputer Graphics Hands-on
Computer Graphics Hands-on Two-Dimensional Transformations Objectives Visualize the fundamental 2D geometric operations translation, rotation about the origin, and scale about the origin Learn how to compose
More informationSummary of Raptor Codes
Summary of Raptor Codes Tracey Ho October 29, 2003 1 Introduction This summary gives an overview of Raptor Codes, the latest class of codes proposed for reliable multicast in the Digital Fountain model.
More informationK 4 C 5. Figure 4.5: Some well known family of graphs
08 CHAPTER. TOPICS IN CLASSICAL GRAPH THEORY K, K K K, K K, K K, K C C C C 6 6 P P P P P. Graph Operations Figure.: Some well known family of graphs A graph Y = (V,E ) is said to be a subgraph of a graph
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More informationCS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning
CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning Parallel sparse matrix-vector product Lay out matrix and vectors by rows y(i) = sum(a(i,j)*x(j)) Only compute terms with A(i,j) 0 P0 P1
More information15. The Software System ParaLab for Learning and Investigations of Parallel Methods
15. The Software System ParaLab for Learning and Investigations of Parallel Methods 15. The Software System ParaLab for Learning and Investigations of Parallel Methods... 1 15.1. Introduction...1 15.2.
More informationMonday, 12 November 12. Matrices
Matrices Matrices Matrices are convenient way of storing multiple quantities or functions They are stored in a table like structure where each element will contain a numeric value that can be the result
More informationNumerical Algorithms
Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0
More informationSpectral Graph Sparsification: overview of theory and practical methods. Yiannis Koutis. University of Puerto Rico - Rio Piedras
Spectral Graph Sparsification: overview of theory and practical methods Yiannis Koutis University of Puerto Rico - Rio Piedras Graph Sparsification or Sketching Compute a smaller graph that preserves some
More informationMathematics High School Geometry
Mathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts interpreting a schematic drawing, estimating the amount of
More informationBilinear Programming
Bilinear Programming Artyom G. Nahapetyan Center for Applied Optimization Industrial and Systems Engineering Department University of Florida Gainesville, Florida 32611-6595 Email address: artyom@ufl.edu
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationData Partitioning. Figure 1-31: Communication Topologies. Regular Partitions
Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy
More informationMathematics. 2.1 Introduction: Graphs as Matrices Adjacency Matrix: Undirected Graphs, Directed Graphs, Weighted Graphs CHAPTER 2
CHAPTER Mathematics 8 9 10 11 1 1 1 1 1 1 18 19 0 1.1 Introduction: Graphs as Matrices This chapter describes the mathematics in the GraphBLAS standard. The GraphBLAS define a narrow set of mathematical
More information(Lec 14) Placement & Partitioning: Part III
Page (Lec ) Placement & Partitioning: Part III What you know That there are big placement styles: iterative, recursive, direct Placement via iterative improvement using simulated annealing Recursive-style
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationParallelizing LU Factorization
Parallelizing LU Factorization Scott Ricketts December 3, 2006 Abstract Systems of linear equations can be represented by matrix equations of the form A x = b LU Factorization is a method for solving systems
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 5: Sparse Linear Systems and Factorization Methods Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 18 Sparse
More information3 Identify shapes as two-dimensional (lying in a plane, flat ) or three-dimensional ( solid ).
Geometry Kindergarten Identify and describe shapes (squares, circles, triangles, rectangles, hexagons, cubes, cones, cylinders, and spheres). 1 Describe objects in the environment using names of shapes,
More informationSHORTEST PATHS ON SURFACES GEODESICS IN HEAT
SHORTEST PATHS ON SURFACES GEODESICS IN HEAT INF555 Digital Representation and Analysis of Shapes 28 novembre 2015 Ruoqi He & Chia-Man Hung 1 INTRODUCTION In this project we present the algorithm of a
More information3D Computer Graphics. Jared Kirschner. November 8, 2010
3D Computer Graphics Jared Kirschner November 8, 2010 1 Abstract We are surrounded by graphical displays video games, cell phones, television sets, computer-aided design software, interactive touch screens,
More informationNew Challenges In Dynamic Load Balancing
New Challenges In Dynamic Load Balancing Karen D. Devine, et al. Presentation by Nam Ma & J. Anthony Toghia What is load balancing? Assignment of work to processors Goal: maximize parallel performance
More informationLecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the
More informationAn Improved Measurement Placement Algorithm for Network Observability
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper
More information