
The Pennsylvania State University
The Graduate School
College of Engineering

FAST PARALLEL TRIAD CENSUS AND TRIANGLE LISTING ON SHARED-MEMORY PLATFORMS

A Thesis in Computer Science and Engineering
by
Sindhuja Parimalarangan

2016 Sindhuja Parimalarangan

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

May 2016

The thesis of Sindhuja Parimalarangan was reviewed and approved by the following:

Kamesh Madduri, Assistant Professor, Department of Computer Science and Engineering, Thesis Advisor
Mahmut Taylan Kandemir, Professor, Department of Computer Science and Engineering, Director of Graduate Studies
John Hannan, Associate Professor, Department of Computer Science and Engineering, Interim Associate Department Head

Signatures are on file in the Graduate School.

Abstract

Triad census and triangle counting are essential graph analysis measures used in areas such as social network analysis and systems biology. Triad census is a graph analytic used for comparative network analysis and to characterize local structure in directed networks. For large sparse graphs, an algorithm by Batagelj and Mrvar is considered the state of the art for computing the triad census. In this work, we present a new parallel algorithm for triad census. Our algorithm takes advantage of a specific graph vertex identifier ordering to reduce the operation count. We also develop several new variants for exact triangle counting and triangle listing in large sparse, undirected graphs. Further, we implement a parallel sampling-based algorithm for approximate triangle counting. We show that our parallel triangle counting variants outperform other recently developed triangle counting methods on current Intel multicore and manycore processors. We also achieve good strong scaling for both triad census and triangle counting on these platforms.

Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1. Introduction
Chapter 2. Background
    2.1 Triangle Counting
    2.2 Triad Census
Chapter 3. New Serial and Parallel Algorithms
    3.1 Triangle Counting
    3.2 Approximate Triangle Counting
    3.3 Triad Census
Chapter 4. Performance Discussion
    4.1 Experimental Methodology
    4.2 Results and Performance Analysis
        Triad Census
        Triangle Counting
        Approximate Triangle Counting
        Performance Scaling
        Impact of ordering on overall performance
        Performance comparisons to prior work
Chapter 5. Conclusions and Future Work
Bibliography

List of Figures

2.1 A directed graph representation for a canonical triangle
2.2 16 isomorphism classes for triads in directed graphs
3.1 Illustrating all possible triangle counting algorithm variants
4.1 Triad census analysis of various graphs
4.2 Parallel scaling of triangle counting methods on SNB and KNC processors
4.3 Parallel scaling of triad census methods on SNB and KNC processors
4.4 Parallel scaling of approximate triangle counting on a single node of Lion-XG (SNB)

List of Tables

3.1 Operation counts for all the counting variants
4.1 Directed graphs used to evaluate performance of our new triad census approaches. TC indicates total number of connected triads
4.2 Undirected graphs used to evaluate performance of our new triangle counting approaches. TC indicates total triangle count
4.3 Triad census execution times (in seconds)
4.4 Triangle counting performance on a Sandy Bridge node
4.5 Triangle counting performance on KNC
4.6 Serial approximate triangle counting performance on a single compute node of Lion-XG (SNB)
4.7 Parallel approximate triangle counting performance on a single compute node of Lion-XG (SNB)
4.8 Performance impact of ordering strategy (NAT: natural, SF: smaller degree first, LF: larger degree first) for parallel triad census and parallel triangle counting on SNB (16 cores). Table values are performance improvements over random vertex ordering (higher values are better)

Acknowledgments

I would like to heartily thank my advisor, Prof. Kamesh Madduri, for his guidance and support in my research. I extend my warm regards to my family and friends for their unwavering support. This work is supported by the US National Science Foundation grants ACI and CCF .

Chapter 1

Introduction

A triad is a subgraph on three vertices. There are different connectivity patterns that can occur between three nodes in a graph [1]. In an undirected graph, there are four possible triads: all three vertices connected to each other; the null triad, with no connections between the three vertices; the one-edge subgraph; and the two-edge subgraph. In a directed graph, sixteen nonisomorphic triads can be enumerated based on the direction of edges and the number of vertices connected to each other. Those that involve only two vertices connected by one or more directed edges are also referred to as dyads. The triad census of a directed graph refers to determining the number of each of the 16 kinds of triads, i.e., determining the frequencies of isomorphic triads [2]. It is possible, and relevant for certain applications, to count only one or a few of the 16 triad types.

When all three vertices are connected in an undirected graph, the triad is called a triangle. A triangle is considered to be the most basic non-trivial subgraph. The fundamental triangle-related problems are triangle finding, triangle counting (determining the number of triangles), and triangle listing (reporting the vertices involved in each triangle found) [3] [4]. Triangle counting is generally classified into two types: global triangle counting and local triangle counting. Global triangle counting involves finding the total number of triangles in an undirected graph, while local triangle counting calculates the number of triangles incident to each node. The latter can be used to yield results for the former, but there are more efficient methods for global triangle counting. Approximation procedures and heuristics have been developed for both triad census and triangle counting [5] [6]. Depending on the application, either exact or approximate methods of triangle counting are used.

The notion of a triad has its roots in sociology and social network analysis,

with work on the triadic closure concept motivating it [7]. Triad census is a graph analytic used for comparative network analysis and to characterize local structure in directed networks [8]. The aforementioned 16 triads can be further classified into mutual, asymmetric, and null triads (none of the three vertices connected by an edge), each with an associated attribute of up, down, cyclical, or transitive. The mutual attribute depicts a bidirectional edge, while asymmetric denotes a unidirectional edge between two vertices. This classification aids in identifying the required pattern for a specific application.

Triad census helps in detecting various motifs and structural properties between three nodes of a network. For example, in a friendship network, a fully connected triad would depict three mutual friends, increasing the likelihood of their neighbors becoming friends too. Triad census gives a measure of reciprocity of relations within a network, which can lead to deductions about the stability and hierarchy of the network. Triad distribution and density can be used to detect strongly connected components on the internet, which can further aid congestion control and bandwidth allocation. Triadic transitivity (or intransitivity) analysis in directed graphs provides key information about graph equilibrium and the direction of graph growth. Too little transitivity could indicate a disorganized portion of the graph, while high transitivity would mean that the portion is internally highly clustered but isolated from the rest of the graph.

Statistics of triangles in an undirected graph are elemental for complex network analysis attributes like the clustering coefficient and transitivity ratio [9]. Triangle counting plays an important role in security graph applications. Triangle distributions can be used to classify spam and non-spam hosts on a network.
Triangles in a graph of web pages denote mutual recognition and are used as a seed to identify thematic structures in the graph. Triangles (like triads) depict homophily and transitivity in social networks [9]. They contribute to the computation of the Jaccard index, a measure of difference (or similarity) between communities. This information, in conjunction with other data, can predict future vertex or edge additions. Triangles are abundant in protein-interaction networks [10], which facilitates linkage between structural and functional properties of these biological networks.

For large (and usually sparse) real-world networks, naive triad census and triangle counting algorithms are inefficient with respect to space, time, or both. Information from triad census and triangle counting is widely used in various

applications. A major part of these algorithms involves loop computations with high potential for parallelization. Vertex ordering strategies and smart data structures have a sizeable impact on the time and space complexity of the algorithms. This motivates implementing the algorithms in a faster and more space-efficient way, and is a driving factor for the approximate algorithms and parallelization strategies implemented in both shared and distributed memory. With the advent and availability of multicore and manycore processors, implementations are shaped to increase performance by utilizing the inherent parallel architecture and cache structure of these processors. The simplicity and scalability of the OpenMP standard promote its use for parallelized execution on shared-memory platforms. Exploring such optimization methods has been the focus of this thesis.

Chapter 2

Background

Numerous applications have motivated the investigation of a multitude of algorithmic approaches for triangle counting and triad census [11]. The algorithms vary in memory constraints and time complexity, as well as in parallelization strategies for implementation on shared- and distributed-memory platforms.

2.1 Triangle Counting

The fastest-known algorithms for triangle counting are based on matrix multiplication, with running time Θ(n^γ), γ < 2.376. The ones frequently used in practice have been described by Latapy [4]. Let G(V, E) be an undirected graph with n = |V| vertices and m = |E| edges. Further assume that G is a simple graph (no multiple edges) with no self-loops. Let Adj(v) be the set of adjacencies of v, Adj(v) = {u : (v, u) ∈ E}. The degree of v, d_v, is the size of Adj(v). For undirected graphs, Σ_{v∈V} d_v = 2m. Let d_max = max_{v∈V} d_v. Crude triangle counting algorithms take Θ(n³) time and Θ(n²) space.

Matthieu Latapy [4] describes a series of space-efficient and time-efficient algorithms for triangle counting. The first algorithm counts triangles by determining the number of triangles in two parts of the graph and then merging the obtained results. The graph is partitioned by a threshold K, where K ∈ Θ(m^((ω−1)/(ω+1))). The triangles formed by vertices {v : d_v < K} are counted based on the intersection of their adjacency lists, along with constraints to ensure that the same triangle is not counted twice. The number of triangles formed by vertices {v : d_v > K} is determined by a fast matrix product of the adjacency matrix. The two results are merged to get a total count. Although such an algorithm can take advantage of sparse matrix computations, it has a restrictive space complexity in addition to error-prone fast matrix implementations. It is demonstrated that, given the adjacency matrix representation, crude triangle listing can be achieved in Θ(n³) time and Θ(n²) space. The main drawback of this is that the time complexity is the same for sparse and dense graphs. Listing procedures based on vertex iterators and edge iterators improve upon the time complexity to Θ(m·d_max) for sparse graphs by using adjacency array representations. However, the performance of these algorithms degrades when the maximum degree is unbounded.

An improved algorithm for sparse graphs, referred to as the forward algorithm and proposed earlier by Schank and Wagner [2], is presented with Θ(m^(3/2)) time and Θ(m) space complexity. It uses only the adjacency array representation and an injective function η() such that for any two vertices u, v ∈ V, η(u) < η(v) if d(u) > d(v). Every vertex u is associated with an array A(u) = {v ∈ Adj(u) : η(v) < η(u)}. Walking the graph in ascending order of η() and looking for intersections in the A() lists ensures that there are no duplicates in triangle counting.

Algorithm 1 Compact-forward algorithm for triangle listing.
1: procedure CompactForward(V, E)
2:    Renumber the vertices of G according to η(), where η(u) < η(v) if d(u) > d(v)
3:    Sort the adjacency array associated with each vertex by η()
4:    for each vertex v ∈ V, taken in order of η() do
5:        for each vertex u ∈ Adj(v) with η(u) > η(v) do
6:            u′ ← first neighbor of u; v′ ← first neighbor of v
7:            while there are untreated neighbors of u and v and η(u′) < η(v) and η(v′) < η(v) do
8:                if η(u′) < η(v′) then set u′ to the next neighbor of u
9:                else if η(u′) > η(v′) then set v′ to the next neighbor of v
10:               else list {v, u, u′} as a triangle; set u′ to the next neighbor of u; set v′ to the next neighbor of v

The forward algorithm has been further improved upon for space complexity.
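For concreteness, the forward strategy just described can be sketched in a few lines of Python. This is a minimal rendering, not the thesis implementation: it uses Python set intersection in place of the sorted-merge step of the compact-forward variant, and the function name is ours.

```python
def forward_triangles(adj):
    """List each triangle of an undirected graph exactly once.

    adj: dict mapping vertex -> set of neighbors.
    """
    # eta assigns smaller labels to higher-degree vertices:
    # eta(u) < eta(v) if d(u) > d(v)
    order = sorted(adj, key=lambda v: -len(adj[v]))
    eta = {v: i for i, v in enumerate(order)}
    A = {v: [] for v in adj}   # A[v]: neighbors of v already processed
    triangles = []
    for s in order:            # walk vertices in ascending eta order
        for t in adj[s]:
            if eta[t] > eta[s]:
                # every common vertex of A[s] and A[t] closes a triangle
                for w in set(A[s]) & set(A[t]):
                    triangles.append((w, s, t))
                A[t].append(s)
    return triangles

# A 4-clique contains exactly 4 triangles, each listed once:
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(len(forward_triangles(k4)))  # 4
```

Because a vertex s is appended to A(t) only after all of its lower-η neighbors have been processed, every triangle is reported exactly once, matching the duplicate-avoidance argument above.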
The forward algorithm requires the arrays η() and A() to be stored in memory throughout the graph computations. In the newer compact-forward algorithm, the vertices are renumbered by η(), and there is no need to maintain the η() array after that. Vertex ordering by degree in this algorithm yields a time complexity of Θ(m^(3/2)) and a space complexity of Θ(2m + 2n).

Another technique proposed by Schank and Wagner to bring down execution

time [2] is a variant of the edge-iterator algorithm that attempts to cut down on time by using hash containers for the adjacencies of each vertex. An intersection of adjacencies can be determined by checking, for each element of the smaller container, whether it is present in the larger container. It has a time complexity of Θ(Σ_{{u,v}∈E} min{d_u, d_v}). This is in fact a precursor to the forward algorithm described above, and the forward algorithm also has a hashed implementation. Such algorithms, however, require complex data structures and hash functions for their implementations.

Ortmann and Brandes [3] classify triangle counting algorithms into two types: neighborhood intersection and adjacency testing. In neighborhood intersection, all edges are iterated over, and an intersection is checked for between the adjacencies of the incident vertices. The edge-iterator, forward, and compact-forward algorithms discussed earlier fall in this category. Adjacency testing comprises two stages: marking the adjacencies of a vertex, and scanning the neighbors of each adjacency to look for a marked vertex. It can be faster than neighborhood intersection, as the scanning step is performed via bit vectors. However, it requires additional space to store these bit vectors and poses a latency challenge when accessing their elements. Node-iterator algorithms fall in this category. This classification is the same as our adjacency intersection (AI)-based and adjacency marking (AM)-based methods. Algorithms 2 and 3 give the templates for each of these methods. The key distinction is that the AM-based methods use additional storage for adjacency lookups and perform faster set intersections in comparison to the AI-based methods.

Algorithm 2 The general structure of adjacency intersection-based triangle counting algorithms.
1: procedure TriCount-AI(V, E)
2:    tc ← 0
3:    for all v ∈ V do
4:        for all u ∈ Adj(v) do
5:            tc ← tc + |Adj(v) ∩ Adj(u)|
6:    return tc / 3

Performance of triangle counting algorithms can also be improved by vertex reordering and renumbering. Degree ordering (vertices sorted by ascending or descending order of their degree) and smallest-first ordering (used commonly in graph coloring) are some of the common techniques. In Figure 2.1, we

Algorithm 3 The general structure of adjacency marking-based triangle counting algorithms.
1: procedure TriCount-AM(V, E)
2:    tc ← 0
3:    for all v ∈ V do
4:        for all u ∈ Adj(v) do
5:            mark u
6:        for all u ∈ Adj(v) do
7:            for all w ∈ Adj(u) do
8:                if w is marked then tc ← tc + 1
9:        for all u ∈ Adj(v) do
10:           unmark u
11:   return tc / 3

show one way of generating a canonical representation, which is by selecting only the triples u, v, w that satisfy u < v < w. Note that this canonical representation can also be used to convert an undirected triangle to a directed one, with edges oriented from vertices with lower labels to vertices with higher labels. The edges can be labeled as two short edges (S1, S2) and one long edge (L). Both orderings follow the rationale that the time complexity of choosing the next vertex for a triangle check is bounded by the maximum degree of the graph G. It has been observed that determining smallest-first ordering takes a little more time, and thus degree ordering has an overall better time complexity. However, smallest-first ordering is more suitable for small-world graphs, while degree ordering is more suitable for graphs with skewed degree distributions.

Figure 2.1: A directed graph representation for a canonical triangle.

Large graph data has posed a challenge in terms of storing and processing

it in a time- and space-efficient manner. This has been dealt with via a combination of three methods: graph partitioning [12], approximation techniques [13], and parallelization [14]. Prudent graph partitioning practices take into account the cache structure of the implementation platform and store only a chunk of the graph data in main memory at any point in time. The MapReduce approach is commonly used for parallelizing graph partitioning algorithms for triangle counting [15] [16]. Suri and Vassilvitskii [17] present a MapReduce algorithm that distributes work according to the memory available in each node. Their algorithm is agnostic of the sequential triangle counting algorithm itself. Park et al. [18] extend this algorithm to larger graphs by increasing the maximum load each reducer node can handle.

Approximate triangle counting has been approached using a variety of strategies, each with trade-offs in memory, speed, and accuracy. A parallel multilevel shared-memory implementation for subgraph enumeration called FASCIA (Fast Approximate Subgraph Counting and Enumeration) [6] performs approximate triangle counting using color-coding techniques. This tool has support for both shared- and distributed-memory systems.

Shun and Tangwongsan [19] recently developed several shared-memory parallel schemes for triangle listing and related problems. They address the load balancing difficulties in triangle counting by resorting to dynamic multithreading. The parallel algorithms designed are cache-oblivious [20], eliminating complex tuning requirements. They report parallel performance results of their implementations on a quad-socket, 40-core Intel server. Their main approach is also based on the compact-forward algorithm. They cover two approaches for exact triangle counting, namely the merge approach and the hash approach, to arrive at the intersection between the adjacency lists of the two vertices concerned.
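The two intersection kernels mentioned above can be sketched as follows. This is illustrative Python with names of our choosing, not code from [19]; the merge variant assumes the adjacency arrays are sorted.

```python
def intersect_merge(a, b):
    """Count common elements of two sorted adjacency arrays by striding."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            count += 1
            i += 1
            j += 1
    return count

def intersect_hash(a, b):
    """Count common elements by probing a hash set built from the smaller list."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    lookup = set(small)
    return sum(1 for x in large if x in lookup)

print(intersect_merge([1, 3, 5, 7], [3, 4, 5, 9]))  # 2
```

The merge kernel touches both lists once (Θ(p + q) for lists of sizes p and q) and needs no extra memory, while the hash kernel trades auxiliary space for constant-time membership tests.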
This work also reviews several prior approaches to exact and approximate triangle counting. We present a detailed empirical comparison of our new methods to the fastest approach from [19].

J. Kim et al. [21] propose an Overlapped and Parallel Triangulation framework for multicore platforms called OPT. Their algorithm is a triangle listing algorithm based on the edge-iterator and vertex-iterator procedures. The paper deals with large-scale graphs by dividing them into smaller graphs. Each small graph is loaded into memory and the algorithm is executed over it; this is done repeatedly until the entire graph is covered. They also have a parallel implementation of the algorithm on multicore platforms. The algorithm overlaps triangle computations in internal

and external memory. Their work addresses triangle listing in dynamic graph processing in light of parallel in-memory computations. Chu and Cheng [22] also deal with exact triangle listing for graphs that cannot fit in main memory. Local triangle counts of the partitioned graphs are combined to provide the global triangle count. Their focus is on avoiding random memory accesses.

For applications that entail local triangle statistics, L. Becchetti et al. [23] describe two semi-streaming approximate triangle counting algorithms. Other implementations of approximate triangle counting include those based on the eigenvalues of the adjacency matrix of a graph [24]. These have shown improved speed compared to algorithms that rely on intersecting the adjacency lists of the involved vertices. Tsourakakis et al. [25] propose a parallelizable preprocessing method called DOULION. This approximate triangle estimation algorithm selects the vertices and edges to be checked for triangle incidence on the basis of a sparsification parameter. Yet another probabilistic model for estimating the global triangle count uses results from the birthday paradox [26]. This is a streaming algorithm, and the birthday paradox is used to predict the number of closed wedges from the stream of edges. It is more space efficient, requiring Θ(√n) space under the conditions of constant transitivity and a higher number of edges compared to wedges.

Wedge sampling has been used to provide triangle counts with high accuracy [5]. This is based on the observation that triangles are closed wedges, so estimating the number of closed wedges yields the global clustering coefficient and hence the global triangle count. The approximate count is determined by inspecting k sampled wedges for closure.

Algorithm 4 An outline of the wedge sampling algorithm for approximate triangle counting.
1: procedure WedgeSampling(V, E)
2:    Determine the wedge probability distribution W_v
3:    Select k wedge centers at random according to W_v
4:    for each selected center v do
5:        Choose two vertices u1, u2 ∈ Adj(v) uniformly at random
6:        Check whether the wedge (u1, v, u2) is closed by an edge between u1 and u2
7:    return estimate of triangle count X

The value of k is determined by two constants δ and ε: k = 0.5 ε⁻² ln(2/δ).
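A minimal Python sketch of this sampling scheme follows. It assumes an adjacency-set graph representation; the function name and parameter defaults are illustrative choices of ours, not the thesis implementation.

```python
import random
from math import ceil, log

def wedge_sample_triangle_estimate(adj, eps=0.1, delta=0.05, seed=1):
    """Estimate the global triangle count by sampling wedges."""
    k = ceil(0.5 * eps**-2 * log(2 / delta))   # k = 0.5 * eps^-2 * ln(2/delta)
    rng = random.Random(seed)
    # number of wedges centered at v: d_v choose 2
    wedges = {v: len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj}
    total_wedges = sum(wedges.values())
    centers = [v for v in adj if wedges[v] > 0]
    weights = [wedges[v] for v in centers]
    closed = 0
    for _ in range(k):
        v = rng.choices(centers, weights=weights)[0]  # center picked ∝ its wedge count
        u1, u2 = rng.sample(sorted(adj[v]), 2)        # a uniform wedge at v
        if u2 in adj[u1]:                             # wedge is closed: a triangle
            closed += 1
    kappa = closed / k                 # estimated global clustering coefficient
    return kappa * total_wedges / 3    # each triangle closes exactly 3 wedges
```

On a 4-clique every sampled wedge is closed, so the estimate is exact: 12 wedges / 3 = 4 triangles.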

With this value of k, Algorithm 4 outputs an estimate X of the triangle count TC such that the relative error |X − TC|/TC is less than ε with probability greater than (1 − δ). Such computations make k independent of graph size; a common k value suffices for medium-sized as well as large graphs, which greatly reduces the relative number of operations. However, all m edges are used to compute the wedge probability distribution W_v. Although only k vertices are considered in computing the clustering coefficient (and hence the triangle count), tuning the parameters ε and δ can help achieve extremely high accuracy. This is also because these k vertices are picked according to the wedge probability distribution of the graph. Seshadhri et al. [5] report an accuracy of 99.9% using the wedge sampling algorithm.

2.2 Triad Census

As shown in Figure 2.2, triads in directed graphs can be divided into 16 isomorphism classes. When considering any three vertices in a directed graph, we can have one of three cases: a null triad, with none of the vertices connected to one another; dyadic triads, with only two of the three vertices connected; and connected triads, where all three vertices are connected to one or more of the other two vertices. The connections can be asymmetric or mutual (unidirectional or bidirectional for the vertices concerned). The patterns in the figure are further labeled based on the edge direction (U: up, D: down), transitivity (T: transitive), and cyclicity (C) [27]. The naming of the triad types follows a three-number rule: the number of mutual dyads, the number of asymmetric dyads, and the number of null dyads.

Like triangle counting algorithms, sequential triad census algorithms also use a canonical ordering of vertices [28] [29] to avoid repeated lookups. Moody [1] provides matrix-based equations to compute each type of triad separately with O(n²) complexity.
These equations are useful for scanning the graph for a specific triad type, but simultaneously determining the counts of all triads reduces the total amount of data accessed and increases reuse. Batagelj and Mrvar [30] present a subquadratic triad census algorithm of complexity O(m·d_max) that is suitable for graphs with a low d_max. This algorithm is implemented in the Pajek [31] graph analysis software package. Apart from providing visualization features, Pajek moves away from matrix-representation-based network

Figure 2.2: 16 isomorphism classes for triads in directed graphs.

analysis to more sophisticated techniques. While the asymptotic bounds for triad census are the same as for triangle counting, this algorithm greatly reduces the number of adjacency intersections required to perform the census. Algorithm 5 lists the main routine. N(v) denotes the set of all neighbors of v; in addition to adjacencies from outgoing arcs, incoming arcs are also considered in this set. Canonical ordering is used to ensure every triad is counted only once. The algorithm uses a simple subroutine called TriCode and an array called TriTypes to reduce the number of conditional statements. TriCode inspects the specific connectivity pattern of u, v, and w and assigns a value between 0 and 63 to the triple currently being inspected. This value is then used to index the TriTypes array to determine the triad type corresponding to this pattern. Connected triads are identified in the main

Algorithm 5 Batagelj–Mrvar [30] triad census algorithm.
1: procedure TriadCensus-BM(V, E)
2:    for i ← 1 to 16 do
3:        C[i] ← 0
4:    for all v ∈ V do
5:        for all u ∈ N(v) do
6:            if v < u then
7:                S ← (N(u) ∪ N(v)) \ {u, v}
8:                if v ∈ Adj(u) and u ∈ Adj(v) then
9:                    tritype ← 3
10:               else
11:                   tritype ← 2
12:               C[tritype] ← C[tritype] + n − |S| − 2
13:               for all w ∈ S do
14:                   if (u < w) or (v < w and w < u and w ∉ N(v)) then
15:                       tricode ← TriCode(v, u, w)
16:                       tritype ← TriTypes[tricode]
17:                       C[tritype] ← C[tritype] + 1
18:   sum ← 0
19:   for i ← 2 to 16 do
20:       sum ← sum + C[i]
21:   C[1] ← n(n−1)(n−2)/6 − sum
22:   return C

algorithmic nested loop (line 13), while dyadic triads are counted based on the number of transitive edges (lines 8 to 12). Finally, null triads are not explicitly computed, but instead determined using the total triad count of n(n−1)(n−2)/6. Batagelj and Mrvar applied this algorithm to internet routing data and reported their triad census results. It is considered the state of the art, and we use it as the basis for the new algorithms described in Chapter 3.

Chin et al. [32, 33] discuss parallelizations of the Batagelj–Mrvar sequential algorithm on shared-memory architectures (Cray XMT) and evaluate performance with loop futures and interleaved scheduling techniques. Seshadhri et al. [5] present a novel strategy to approximately count each triad pattern. They adapt the wedge sampling strategy for this purpose, which they also use to determine triangle counts.
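The TriCode/TriTypes idea can be demonstrated in Python. The bit order and class numbering below are our own illustrative choices (Batagelj and Mrvar fix a specific encoding); the class table is derived generically by canonicalizing the 64 codes under vertex permutations, which recovers exactly the 16 isomorphism classes.

```python
from itertools import permutations

# bit order: one bit per possible arc among the abstract triple (0, 1, 2)
ARCS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]

def tricode(adj, v, u, w):
    """6-bit connectivity code (0..63) of the ordered triple (v, u, w)."""
    name = {0: v, 1: u, 2: w}
    code = 0
    for bit, (a, b) in enumerate(ARCS):
        if name[b] in adj[name[a]]:   # arc name[a] -> name[b] present?
            code |= 1 << bit
    return code

def build_tritypes():
    """Map each of the 64 codes to an isomorphism-class index."""
    def canon(code):
        arcs = [ARCS[b] for b in range(6) if code >> b & 1]
        # canonical form: lexicographically smallest arc set over the
        # 6 permutations of the three vertices
        return min(tuple(sorted((p[a], p[b]) for a, b in arcs))
                   for p in permutations(range(3)))
    classes, table = {}, [0] * 64
    for code in range(64):
        table[code] = classes.setdefault(canon(code), len(classes))
    return table

tritypes = build_tritypes()
print(len(set(tritypes)))  # 16 -- the triad isomorphism classes
```

Two triples related by a vertex permutation receive codes that map to the same class, which is exactly what lets the census replace per-triple conditionals with one table lookup.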

Chapter 3

New Serial and Parallel Algorithms

We first introduce the new parallel approaches for triangle counting, followed by triad census, with implementation details. These new algorithms aim to improve both time and space efficiency. This is achieved by experimenting with various existing techniques and enhancing them with vertex ordering and cache-efficient parallelization procedures.

3.1 Triangle Counting

We use the adjacency intersection and adjacency marking classification from the previous section, and systematically list all possible variants for canonical triangle counting and triangle listing; see Figure 3.1. We define Adj⁺(v) to be the subset of Adj(v) comprising vertices w with w > v, and d⁺_v to be the cardinality of Adj⁺(v). Adj⁻(v) and d⁻_v are defined similarly. For an undirected graph with no self-loops, d_v = d⁻_v + d⁺_v and Adj⁺(v) ∩ Adj⁻(v) = ∅. Intersection-based algorithms are typically structured as shown in Algorithm 2, and marking-based algorithms are similar to Algorithm 3. The three AI algorithms shown in the figure differ by the adjacency sets that are tested for common elements. All three variants search for the canonical triangle (v, u, w), v < u < w, thus avoiding duplicate counting. The six AM variants differ by which pair of adjacency sets are involved in the marking and scanning process. Again, they all maintain the same vertex ordering.

In our implementation of adjacency intersection-based methods, we perform a set

Table 3.1: Operation counts for all the counting variants.

Variant   Operation count                          Comments
AI1       O(Σ_v d⁺_v + Σ_v d⁺_v·d_v)               use with low-degree-first ordering
AI2       O(Σ_v d⁻_v + Σ_v d⁻_v·d_v)               use high-degree-first ordering to get compact-forward [4]
AI3       O(Σ_v (d⁺_v)² + Σ_v (d⁻_v)²)             better to use AI1/AI2 with degree-based ordering
AM1       O(Σ_v d⁺_v + Σ_v d⁺_v·d⁻_v)              does not exploit ordering
AM2       O(Σ_v d⁻_v + Σ_v (d⁻_v)²)                use with high-degree-first ordering
AM3       O(Σ_v d⁺_v + Σ_v (d⁻_v)²)                similar to AM2
AM4       O(Σ_v d⁺_v + Σ_v (d⁺_v)²)                use with low-degree-first ordering
AM5       O(Σ_v d⁻_v + Σ_v d⁺_v·d⁻_v)              similar to AM1
AM6       O(Σ_v d⁻_v + Σ_v (d⁺_v)²)                similar to AM4

intersection by striding through the sorted sequences of adjacencies and identifying common entries. This is similar to the merge routine used in the merge sort algorithm. For two sorted lists of sizes p and q, determining their intersection requires Θ(p + q) operations using this simple merge-like routine. The main difference between the three intersection-based algorithms is the choice of adjacency lists to intersect. Again, we use the ordering v < u < w to guide us. We can also precisely determine the operation count for the overall algorithm in terms of vertex degrees. For instance, each intersection in AI1 requires O(d⁺_v + d⁺_u) operations, and so the overall operation count is O(Σ_{v∈V} d⁺_v + Σ_{(v,u)∈E, v<u} (d⁺_v + d⁺_u)). The second term simplifies to O(Σ_{v∈V} d⁺_v·d_v), giving the overall bounds shown in Table 3.1. We similarly derive the bounds for the two other AI variants.

The marking-based approaches essentially perform set intersections using O(n) auxiliary space. This reduces the operation counts to the ones given in

Figure 3.1: Illustrating all possible triangle counting algorithm variants. Adjacency intersection-based: (1) Adj⁺[v] ∩ Adj⁺[u]; (2) Adj⁻[u] ∩ Adj⁻[w]; (3) Adj⁺[v] ∩ Adj⁻[w]. Adjacency marking-based: (1) mark Adj⁺[v], scan Adj⁺[u]; (2) mark Adj⁻[u], scan Adj⁻[w]; (3) mark Adj⁺[v], scan Adj⁻[w]; (4) mark Adj⁺[u], scan Adj⁺[v]; (5) mark Adj⁻[w], scan Adj⁻[u]; (6) mark Adj⁻[w], scan Adj⁺[v].

Table 3.1. Note that all these variants count correctly for any vertex labeling. However, some variants perform fewer operations than others for certain vertex orderings. Consider graphs with highly skewed vertex degree distributions. If we reorder the vertices such that vertices of lower degree are assigned lower vertex identifiers, we reduce Σ_v (d⁺_v)². This is the theoretical justification for using the AI1 and AM4 variants and reordering vertices such that lower-degree vertices are assigned low vertex identifiers, directly reducing the operation count. A further improvement over this simple degree-based vertex ordering is to use a core number-based vertex ordering, as suggested by Ortmann and Brandes [3]. Ortmann and Brandes further show that the running time bound under favorable orderings is O(m·α(G)), where α(G) is the arboricity of the graph. The arboricity of real-world sparse graphs is typically low [29], and hence these ordering heuristics work well in practice.

In Algorithms 6 and 7, we list the pseudocode for our parallel implementations

Algorithm 6 Our parallel adjacency intersection (AI)-based triangle counting algorithm (variant 1), exploiting vertex ordering and avoiding redundant counting.
1: procedure TriCountOptPar-AI(V, E)
2:   tc_l ← 0                              ▷ thread-local count
3:   for all v ∈ V pardo                   ▷ Parallelize
4:     for all u ∈ Adj⁺(v) do
5:       tc_l ← tc_l + |Adj⁺(v) ∩ Adj⁺(u)|
6:   tc ← sum of all thread-local counts tc_l
7:   return tc

of AI1 and AM4, respectively. On shared-memory systems, the outer loop over the vertices can be partitioned among multiple threads, with each thread updating a local variable to track the triangle count. These values are aggregated at the end to obtain the global triangle count. This scheme is thus fairly simple to parallelize, with the only synchronization required at the end. However, note that with the degree-based ordering of vertices, the operation counts and running time increase as v increases. A few threads may have to work longer than the rest (for example, scanning more vertices because a thread is assigned a high-degree vertex), leaving the other threads idle, since the final triangle count cannot be computed until all threads have contributed their individual counts. A naive static partitioning of the outer loop across threads may thus lead to considerable load imbalance. We explore different loop scheduling strategies in our empirical evaluation of these methods. Algorithm 7 lists the pseudocode for the AM4 variant. In our implementation, we use a per-thread bit vector to mark adjacencies. For a graph with 50 million vertices and 250 threads of execution, this scheme requires about 1.5 GB of additional memory to store the bit vectors, and so the AM variants remain applicable to a large class of graphs. This approach trades strided adjacency array accesses for potentially random memory lookups. However, if the bit vector can be cached and reused, the random memory access latency can be amortized.
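As a concrete illustration, a minimal C/OpenMP sketch of the AI1 scheme of Algorithm 6 follows, assuming a CSR representation whose adjacency arrays store only higher-numbered neighbors (Adj⁺) in sorted order. The function and array names are illustrative, not taken from the thesis code.

```c
#include <assert.h>

/* Merge-like intersection of two sorted adjacency lists (Theta(p+q) work).
 * Returns the number of common elements, i.e., triangles closed here. */
static long intersect_count(const int *a, int p, const int *b, int q) {
    long count = 0;
    int i = 0, j = 0;
    while (i < p && j < q) {
        if (a[i] < b[j])      i++;
        else if (a[i] > b[j]) j++;
        else { count++; i++; j++; }
    }
    return count;
}

/* AI1-style triangle count over a CSR graph restricted to higher-numbered
 * neighbors (Adj+); adjacencies are assumed sorted by vertex identifier.
 * The OpenMP reduction plays the role of the thread-local counts tc_l. */
long tricount_ai1(int n, const int *xadj, const int *adjncy) {
    long tc = 0;
    #pragma omp parallel for schedule(dynamic, 50) reduction(+ : tc)
    for (int v = 0; v < n; v++) {
        for (int e = xadj[v]; e < xadj[v + 1]; e++) {
            int u = adjncy[e];
            tc += intersect_count(adjncy + xadj[v], xadj[v + 1] - xadj[v],
                                  adjncy + xadj[u], xadj[u + 1] - xadj[u]);
        }
    }
    return tc;
}
```

The `schedule(dynamic, 50)` clause mirrors the loop scheduling choices evaluated later; without OpenMP enabled, the pragma is simply ignored and the code runs serially.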
Again, the use of degree ordering with the AM4 variant permits this, as the marked vector is reused across multiple iterations of the loop over v (the loop at line 8 of Algorithm 7). While the algorithms do not show the steps for triangle listing, the actual implementation is straightforward: each thread maintains a large in-memory buffer storing the triple of vertex identifiers for each triangle, and the buffer is written to disk when it is full.
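A corresponding C sketch of the AM4 marking scheme of Algorithm 7, shown serial for brevity (in the parallel version each thread owns its own mark array, as described above). The CSR layout and names are illustrative assumptions, not the thesis code.

```c
#include <assert.h>
#include <stdlib.h>

/* AM4-style marking-based triangle count. out_xadj/out_adj hold the
 * higher-numbered (Adj+) neighbors and in_xadj/in_adj the lower-numbered
 * (Adj-) neighbors, both in CSR form. */
long tricount_am4(int n, const int *out_xadj, const int *out_adj,
                  const int *in_xadj, const int *in_adj) {
    unsigned char *mark = calloc((size_t)n, 1); /* thread-local in parallel */
    long tc = 0;
    for (int u = 0; u < n; u++) {
        /* Mark Adj+(u). */
        for (int e = out_xadj[u]; e < out_xadj[u + 1]; e++)
            mark[out_adj[e]] = 1;
        /* For each v in Adj-(u), scan Adj+(v) against the marks. */
        for (int e = in_xadj[u]; e < in_xadj[u + 1]; e++) {
            int v = in_adj[e];
            for (int f = out_xadj[v]; f < out_xadj[v + 1]; f++)
                if (mark[out_adj[f]]) tc++;
        }
        /* Unmark Adj+(u) before moving to the next vertex. */
        for (int e = out_xadj[u]; e < out_xadj[u + 1]; e++)
            mark[out_adj[e]] = 0;
    }
    free(mark);
    return tc;
}
```

Each triangle v < u < w is counted exactly once, at its middle vertex u, which is why no redundant counting occurs.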

Algorithm 7 Our parallel adjacency marking (AM)-based triangle counting algorithm (variant 4), exploiting vertex ordering and avoiding redundant counting.
1: procedure TriCountOptPar-AM(V, E)
2:   tc_l ← 0                        ▷ thread-local count
3:   for i ← 1, n do                 ▷ thread-local mark array
4:     M[i] ← 0
5:   for all u ∈ V pardo             ▷ Parallelize
6:     for all w ∈ Adj⁺(u) do
7:       M[w] ← 1
8:     for all v ∈ Adj⁻(u) do
9:       for all w ∈ Adj⁺(v) do
10:        if M[w] = 1 then
11:          tc_l ← tc_l + 1
12:    for all w ∈ Adj⁺(u) do
13:      M[w] ← 0
14:  tc ← sum of all thread-local counts tc_l
15:  return tc

3.2 Approximate Triangle Counting

Our approximate triangle counting algorithm is an extension of the wedge sampling method of Seshadhri et al. [5]. We choose k random vertices from the wedge probability distribution of the given graph. For each of these vertices, we select two of its adjacencies uniformly at random and check whether the resulting wedge is closed (forming a triangle). Parallelization and optimization strategies are used to improve its time and space complexity. The algorithm offers ample scope for parallelization at different levels: determining the wedge probability distribution, extracting the k vertices, and estimating the number of closed wedges. In contrast to common methodology, this method interestingly uses clustering coefficient estimates to determine the triangle count. The total number of wedges centered at a vertex v is C(d_v, 2). This information is used to create a wedge probability distribution over the graph's vertices. When k vertices are chosen from this distribution, we indirectly give preference to vertices with higher degrees. This process of randomly selecting k vertices is easily parallelized. Two adjacencies of each of the k vertices are then selected at random to form a wedge. The approximation strategies end here. The possibility of a closed wedge between the

Algorithm 8 Parallel wedge sampling algorithm.
1: procedure WedgeCountPar(V, E)
2:   k ← 0.5·ε⁻²·ln(2/δ)
3:   totalW ← 0                                   ▷ total possible wedges
4:   for all v ∈ V pardo                          ▷ wedge probability distribution
5:     wedgeTotal_v ← C(d_v, 2)                   ▷ number of possible wedges centered at v
6:     totalW ← totalW + wedgeTotal_v
7:   meanCC ← 0                                   ▷ mean local clustering coefficient
8:   kVertices ← k vertices sampled uniformly at random from the wedge distribution   ▷ Parallelized
9:   for all v ∈ kVertices pardo                  ▷ estimation of closed wedges
10:    r1, r2 ← two adjacencies of v selected uniformly at random
11:    for all w ∈ Adj(r1) do
12:      if w = r2 then meanCC ← meanCC + 1
13:  meanCC ← meanCC/k
14:  tc ← meanCC · totalW/3
15:  return tc

chosen adjacencies is determined by intersecting their respective adjacency lists, with no explicit array stored in memory for this purpose. The resulting clustering coefficient estimates are combined to output an approximate triangle count. We optimize this algorithm for locality by ordering the vertices and adjacency lists by degree [34]. This also reduces memory traffic, as the adjacency intersection used to determine closed wedge counts is similar to the triangle counting AI variants described above. The vertex ordering lets us inspect only the part of r1's adjacency list that can contain r2, and we use binary search to determine the existence of r2. This eliminates a large number of iterations in the nested loop. With appropriately small values of δ and ε, this parallel implementation obtains highly accurate triangle counts. From the loop used to estimate the number of closed wedges, we can see that the smaller the value of k, the fewer vertices are examined, but accuracy also falls. Note also, from the formula used to determine k, that it is independent of the size or type of the graph. We performed experiments to arrive at an acceptable value of k.
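The sampling scheme of Algorithm 8 can be sketched as follows (serial, with the binary search refinement omitted for brevity; the function names, the linear-scan sampling of the wedge distribution, and the CSR layout are illustrative assumptions rather than the thesis code).

```c
#include <assert.h>
#include <stdlib.h>

/* Wedge-sampling triangle count estimator, in the spirit of Seshadhri et
 * al.; the graph is CSR over the full (undirected) adjacency lists. */
double tricount_wedge_sample(int n, const int *xadj, const int *adj, int k) {
    /* Total number of wedges: sum over v of C(d_v, 2). */
    double total_w = 0.0;
    for (int v = 0; v < n; v++) {
        double d = xadj[v + 1] - xadj[v];
        total_w += d * (d - 1) / 2.0;
    }
    long closed = 0;
    for (int s = 0; s < k; s++) {
        /* Sample a wedge center proportional to its wedge count. */
        double target = (double)rand() / RAND_MAX * total_w, acc = 0.0;
        int v = 0;
        for (; v < n - 1; v++) {
            double d = xadj[v + 1] - xadj[v];
            acc += d * (d - 1) / 2.0;
            if (acc >= target) break;
        }
        int d = xadj[v + 1] - xadj[v];
        if (d < 2) continue;
        /* Pick two distinct adjacencies r1, r2 uniformly at random. */
        int i = rand() % d, j = rand() % (d - 1);
        if (j >= i) j++;
        int r1 = adj[xadj[v] + i], r2 = adj[xadj[v] + j];
        /* The wedge closes iff r2 appears in Adj(r1). */
        for (int e = xadj[r1]; e < xadj[r1 + 1]; e++)
            if (adj[e] == r2) { closed++; break; }
    }
    /* Mean closure probability times total wedges, divided by 3, since
     * each triangle contributes three closed wedges. */
    return (double)closed / k * total_w / 3.0;
}
```

On a complete graph every sampled wedge closes, so the estimate is exact regardless of the random choices, which makes a convenient sanity check.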

3.3 Triad Census

We next discuss our new methods for triad census. Our main contribution is to combine the vertex reordering strategies with the Batagelj-Mrvar algorithm, in order to reduce operation counts. We also further simplify the algorithm to remove extraneous conditions. We count and list canonical triads only, again given by the triple v < u < w. We first define N⁺(v) and N⁻(v), analogous to the definition of Adj(v) in the undirected case: N⁺(v) comprises all vertices k such that there is a directed edge from v to k, or from k to v, or both, and k > v. Since N⁺(v) and N⁻(v) rely on the ordering of vertices from both the incoming and outgoing arcs of v, we found it best to combine these two lists. We use an adjacency array representation of the graph. To quickly detect the direction of an arc given just the adjacency, we use an optimization proposed by Chin et al. [33]: two bits of the 32- or 64-bit word holding the adjacency identifier compactly store the edge direction. We set the bits to 11 if there are arcs in both directions, and to 01 (10) for just an outgoing (incoming) arc. This compact scheme avoids unnecessary adjacency lookups and permits both AI- and AM-based implementations. A second optimization is to implicitly determine the vertices that are added to S (see Algorithm 5), without actually creating the array. This is made possible by using a merge-like routine to stride through the sorted adjacency lists of u and w, similar to the step in AI. We refer to our implementation of the census algorithm with these two changes (two bits for adjacency direction, implicit S) as the baseline version of the census routine. Building on the baseline, we develop two optimized variants (AM and AI). The AM-based approach is listed in Algorithm 9. To simplify the pseudocode, we remove the lines corresponding to counting the dyadic triads. After we do so, we can exploit the sorted ordering of the adjacencies.
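The two-bit direction encoding described above might look like the following sketch: a 32-bit adjacency word reserves its top two bits for the arc direction, limiting vertex identifiers to 2^30. The macro and function names, and the exact bit positions, are illustrative assumptions; the thesis code may lay the bits out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Direction codes stored in the top two bits of an adjacency word:
 * 01 = outgoing arc only, 10 = incoming arc only, 11 = both directions. */
#define DIR_OUT  0x1u
#define DIR_IN   0x2u
#define DIR_BOTH (DIR_OUT | DIR_IN)

static inline uint32_t pack_adj(uint32_t vertex_id, uint32_t dir) {
    /* Keep the low 30 bits for the identifier, the top 2 for direction. */
    return (dir << 30) | (vertex_id & 0x3FFFFFFFu);
}

static inline uint32_t adj_vertex(uint32_t packed) {
    return packed & 0x3FFFFFFFu;
}

static inline uint32_t adj_dir(uint32_t packed) {
    return packed >> 30;
}
```

Because the direction travels with the adjacency entry itself, checking whether an arc is outgoing, incoming, or reciprocal needs no second lookup into the other endpoint's list.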
S is again maintained implicitly, and is different from the one in Algorithm 5. Each thread uses a local array to mark vertices and simplify adjacency intersections. The total triad count is computed by summing all individual thread triad counts after all threads complete. We parallelize this approach in a shared-memory environment by distributing

Algorithm 9 Counting all connected triads using our optimized parallel approach (adjacency marking variant).
1: procedure TriadCensusOptPar-AM(V, E)
2:   for i ← 4, 16 do                   ▷ thread-local census counts
3:     C_l[i] ← 0
4:   for i ← 1, n do                    ▷ thread-local mark array
5:     M[i] ← 0
6:   for all v ∈ V pardo               ▷ Parallelize
7:     for all u ∈ N⁺(v) do M[u] ← 1
8:     for all u ∈ N⁺(v) do
9:       for all w ∈ N⁻(u) with w > v do
10:        if M[w] = 0 then
11:          tricode ← TriCode(v, u, w)
12:          tritype ← TriTypes[tricode]
13:          C_l[tritype] ← C_l[tritype] + 1
14:      S ← N⁺(u) ∪ N⁺(v) \ {x : x ∈ N⁺(u) and x > v}   ▷ S maintained implicitly
15:      for all w ∈ S do
16:        tricode ← TriCode(v, u, w)
17:        tritype ← TriTypes[tricode]
18:        C_l[tritype] ← C_l[tritype] + 1
19:    for all u ∈ N⁺(v) do M[u] ← 0
20:  C ← sum of all thread-local counts C_l
21:  return C

iterations of the outer loop to multiple threads. Notice again that minimal communication and synchronization are required for both counting and listing, as threads only need to update local counts. Both variants we implement work for any vertex ordering, but a degree-based ordering certainly benefits graphs with skewed degree distributions, as the loops have operation counts proportional to Σ_v (d⁺_v)². As in the triangle counting case, a naive outer loop parallelization will lead to significant load imbalance, since we use a specific vertex ordering. The operation count analysis of this algorithm is similar to the previous case of triangle counting. Step 14 of the algorithm iterates more times than in the triangle counting variants, so the operation count is Σ_v (d⁺_v)² instead of Σ_v d⁺_v. Moreover, the larger number of patterns that we track increases the complexity. Specifically, we need to determine the connectivity of every triple in both directions to update the

appropriate triad count. This means we need to inspect a larger range for w (see line 9 of the algorithm) compared to a similar triangle counting algorithm. A minor additional optimization over the baseline is a simplified TriCode routine (not shown here): we reduce conditional checks for the existence of edges using bitwise operations. We parallelize optimized implementations of the AI and AM variants of triad census to obtain performance results. AI is more space-efficient, as we do not store S explicitly as in the baseline version; instead, the relevant adjacencies are scanned and compared directly for common elements. AM has to reserve extra space for the bit vectors used to mark adjacencies. Both AI and AM use the TriCode subroutine, which we optimize to remove branching operations that can hinder parallelization. In general, the code has minimal conditional operations; where possible, they are replaced by logical loop limits or bitwise operations. Proper loop start and end points are determined through binary search to avoid redundant iterations. Random pointer references and accesses are curtailed by copying arrays locally. The code outputs only the connected triads, as they are the ones that are challenging to optimize; the other types of triads can be calculated in constant time using formulas.
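The branch-free TriCode idea can be sketched as follows: given the two-bit direction codes for the pairs (v, u), (v, w), and (u, w), as stored in the adjacency words, the arc-existence bits are packed into a single table index with shifts and ORs rather than conditionals. The packing order is an illustrative assumption; the thesis code and its TriTypes lookup table may differ.

```c
#include <assert.h>

/* Each pair's direction code uses two bits (01 = forward arc, 10 = reverse
 * arc, 11 = both), as in the encoding described earlier. Packing the three
 * pair codes yields a 6-bit tricode in [0, 63] that can index a precomputed
 * 64-entry TriTypes table mapping codes to the 16 triad isomorphism
 * classes, with no branching at all. */
static inline unsigned tricode(unsigned dir_vu, unsigned dir_vw,
                               unsigned dir_uw) {
    return (dir_vu & 0x3u) | ((dir_vw & 0x3u) << 2) | ((dir_uw & 0x3u) << 4);
}
```

Replacing six edge-existence branches with this single expression is the kind of conditional-to-bitwise substitution the text describes.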

Chapter 4

Performance Discussion

We evaluate the performance of our optimized triad census and triangle counting variants on several large-scale graphs. Both sequential and parallel implementations are assessed for efficiency on multicore and manycore platforms.

4.1 Experimental Methodology

Table 4.1 lists all the directed graph instances that we use. We chose these graphs from the Koblenz network repository [35] and the UFL Sparse Matrix Collection [36]. Most of them are crawls of the web or of social networks, and the original sources of these graphs are also listed in the table. We removed any self loops and parallel edges present in the original graphs. The table also lists the total count of all connected triads (isomorphism classes 4 to 16 in Figure 2.2) in these test graphs. We omit the counts of the dyadic patterns and the null triad, as these counts are indirectly derived from the rest, and they are several orders of magnitude larger than the count of the connected triads. The total raw counts of connected triads vary by nearly six orders of magnitude among the input graphs, from 320 million to 123 trillion. We also report d_max for each graph. As mentioned earlier, the skew in degree distribution motivates the vertex ordering schemes. We also use several undirected graphs, listed in Table 4.2, to evaluate the performance of the triangle counting variants. Of the 10 input graphs, 6 are derived from the directed graphs in Table 4.1 by symmetrizing them and removing parallel edges. The input graphs are ordered by increasing connected triad/triangle counts, as we note that the operation counts, as well as the algorithm running times, appear to scale linearly with the total count.

We preprocess all the data to reorder the vertex labels in non-decreasing degree order, and write the graphs to disk. We also evaluate the impact of alternate vertex orderings on overall performance. Note that the running times reported in this section include neither the time to read the graph from disk nor the time to reorder the graph. For large graphs, the reordering time is negligible compared to the overall census running time. For instance, reordering the twitter graph takes less than 1 minute in serial, whereas the census method running time in parallel is nearly 7.25 hours. However, initial graph I/O time and reordering time may not be negligible for small graphs. We evaluate performance on a single compute node of the TACC Stampede supercomputer. Each compute node has two 8-core Intel Xeon E5 processors (Sandy Bridge, SNB, microarchitecture) and one Intel Xeon Phi SE10P coprocessor (Knights Corner, KNC, microarchitecture). The Xeon processors can access 32 GB of DDR3 memory. The Xeon Phi coprocessor has 61 cores and 8 GB of GDDR5 memory. We compile our programs (written in C and using OpenMP) using the Intel C/C++ compilers (v15.0.2) with -O3 optimization. We bind threads to cores using the KMP_AFFINITY and MIC_KMP_AFFINITY environment variables; we use the compact strategy on the Xeon E5s and the balanced strategy on the Xeon Phi. To compare our implementations to prior work, we run Shun and Tangwongsan's ordered-merge variant [19] for exact triangle counting, which is similar to our AI1 variant. We built this code using the same version of the Intel C++ compilers, and control parallelism using the CILK_NWORKERS and MIC_CILK_NWORKERS variables. We did not find any publicly available parallel triadic census codes or implementations, and so most of our comparisons are relative to the baseline version (i.e., our parallel implementation of the Batagelj-Mrvar algorithm).
To analyze the performance of the approximate triangle counting algorithms, we ran them on a single compute node of the Lion-XG cluster at Penn State. A compute node of Lion-XG has two 8-core Intel Xeon E5 processors (SNB microarchitecture). Processor cores are clocked at 2.6 GHz, and each server has 32 GB of main memory. The code is compiled using C/C++ compilers with -O3 optimization and OpenMP support.

Table 4.1: Directed graphs used to evaluate performance of our new triad census approaches. For each graph, the table lists n, m (×10⁶), d_max, TC (the total number of connected triads), and sources: patentcite [35, 37], cage15 [36], soc-pokec [11, 35], soc-livejournal [8, 35], HV15R [36], flickr [35, 38], indochina-2004 [36, 39, 40], arabic [36, 39-41], it-2004 [36, 39, 40], and twitter [35, 42].

4.2 Results and Performance Analysis

4.2.1 Triad Census

In Table 4.3, we report the parallel performance of our three census variants on the dual-socket, 8-core Intel Xeon (SNB) and the Xeon Phi (KNC). The results are obtained using OpenMP dynamic scheduling with a chunk size of 10 on both SNB and KNC for all the graphs. We also experimented with static scheduling with chunk sizes of 1 and 10, and dynamic scheduling with chunk sizes of 50 and 100. We found that dynamic scheduling with a chunk size of 10 gave the best results for the majority of the graphs, and so we selected these settings. We could not execute census counts for some graphs on KNC due to memory limitations. The first observation from these running time results is that our optimized AI and AM variants provide a significant improvement over the baseline: on SNB, AM is 2.37× faster than the baseline, and on KNC, AI is 1.77× faster. Using the total connected triad count information from Table 4.1, we can compute a performance rate, the number of connected triads counted per second (TCPS). For 16-way threading on SNB, this value ranges from 867 million TCPS (for patentcite)

Table 4.2: Undirected graphs used to evaluate performance of our new triangle counting approaches. For each graph, the table lists n, m (×10⁶), d_max, TC (the total triangle count), and sources: hugetrace [36], soc-pokec [11, 35], cage15 [36], soc-livejournal [8, 35], rgg_n_2_24_s [36], orkut [9, 35], HV15R [36], kron_g500-logn [36], twitter [35, 42], and indochina-2004 [36, 39, 40].

to 5665 billion TCPS (for it-2004). For KNC, the performance ranges from 1604 (patentcite) to 3000 TCPS (HV15R). On both systems, this performance rate increases with problem size. The next important observation is that

Table 4.3: Triad census execution times (in seconds) for the Base, AI, and AM variants on SNB (16 cores) and KNC (61 cores), for the graphs of Table 4.1. The largest instances take the longest: the it-2004 baseline exceeds 1 hour, and for twitter the Base and AI variants exceed 8 hours while AM completes in 7.25 hours.

different algorithmic variants are faster on each system. On SNB, AM is faster for most of the instances, whereas AI is consistently faster on KNC. Further, there is a considerable gap between AI and AM performance on KNC. This can be attributed to the memory access patterns of the AM variants: the regular merge-like intersection routine is a better fit for KNC than the random-access-based AM routine. On SNB, however, the large last-level cache and the relatively lower thread count mean that the overhead of random reads and writes to the bit vectors can be amortized by caching. We also collected cache performance statistics for the AI and AM variants on SNB. With the twitter graph, on a single compute node of Lion-XG, the L3 cache misses for AM are around 17% of the cache accesses, whereas for AI they are around 25%. For some of the smaller graphs, KNC is actually slightly faster than the dual-socket SNB. We also observe a noticeable performance impact of dynamic scheduling for some of the regular graphs (cage15, HV15R); we achieve better performance for these instances with larger dynamic chunk sizes or with static scheduling.

Figure 4.1: Triad census analysis of various graphs (normalized relative frequency of each connected triad type, on a log scale, for flickr, HV15R, indochina-2004, soc-LiveJournal1, and soc-pokec).

Figure 4.1 provides an example illustration of the types of analytics that are possible with exact triad census. We compare the relative frequency of each

connected triad, normalized to the total connected triad count. The Y axis is on a log scale, and we note a range of relative frequencies spanning six orders of magnitude. For the indochina-2004 web crawl, pattern 5 has the largest count, indicating the presence of vertices with very high in-degree and low out-degree. We also observe that patterns 10 and 14 are highly underrepresented in indochina-2004. Such observations have applications in social media analysis, where the graph structure has direct implications for growth and community formation in the network.

4.2.2 Triangle Counting

We next report the sequential and parallel performance of our implementations for triangle counting. Table 4.4 gives the performance of AI and AM on SNB. We also give the parallel performance achieved with Shun and Tangwongsan's (ST) ordered-merge code. We use OpenMP dynamic scheduling with a chunk size of 50 on SNB, and also experimented with chunk sizes of 10 and 100. The AM variant is fastest for 9 out of 10 graph instances. As problem size increases, the performance of AM relative to AI also increases. The serial time corresponds to the running time of either the AI or AM variant (whichever is faster) without OpenMP pragmas; hence, the parallel speedup reported in the table is absolute speedup. We notice that the speedup is better for larger graph instances. The performance of AI is comparable to ST for a few instances, but is better for most others, such as orkut and soc-LiveJournal1. We were unable to run ST on the largest instance (twitter). We also refer to the serial and parallel running times reported on various Intel platforms in [3, 19] and find that our approaches are comparable to, or faster than, the times reported in these papers. For example, on KNC, our AI implementation is 179× faster than ST for the orkut graph. The performance rates achieved, in terms of triangles counted per second, are comparable to those of the census implementation.
In Table 4.5, we give the performance of triangle counting on KNC. These results were obtained with dynamic scheduling and a chunk size of 10. We again notice that AI outperforms AM, similar to the census case. ST is dominated by AI on all the graphs, and there are a few instances where ST fails to execute. Also, for a majority of the graphs, AI on KNC is faster than AM on SNB, which is a notable result. The

Table 4.4: Triangle counting performance on a Sandy Bridge node: serial time, parallel (16-core) times for ST, AI, and AM, and the absolute parallel speedup of AM, for hugetrace, soc-pokec, cage15, soc-livejournal, rgg_n_2_24_s, orkut, HV15R, kron_g500-logn, twitter (on which ST fails), and indochina-2004.

Table 4.5: Triangle counting performance on KNC (61 cores): serial time, times for ST, AI, and AM, and the absolute parallel speedup of AI; ST fails on rgg_n_2_24_s, HV15R, and indochina-2004.

variation in normalized performance is not as high on KNC as it is on SNB. We also note very high absolute speedups, although single-threaded performance is admittedly very low due to high instruction and memory latencies. We observe better speedups for social networks and web crawls than for the structured sparse matrices, indicating that there is likely more room for tuning and improvement on these data sets.

Table 4.6: Serial approximate triangle counting performance on a single compute node of Lion-XG (SNB): exact and approximate counting times and the resulting speedup for soc-pokec, orkut, indochina-2004, twitter, and friendster.

4.2.3 Approximate Triangle Counting

We now report the performance of our approximate triangle counting implementation. We recorded results for both serial and parallel implementations: we first compare the serial performance of the exact and approximate triangle algorithms, and then look at the parallel execution times. Our implementation provides an average accuracy of 99.6% with proper tuning of the δ and ε values used to obtain k for a graph. We observed that, in general, δ = 0.1 and ε = 0.1 produce results of high accuracy. Keeping in mind a requirement for high accuracy, we study the serial implementation results. From Table 4.6, we can see that the approximate serial algorithm is considerably faster than the exact counting algorithm. This is mainly because a much smaller number of vertices is considered for triangle counting. soc-pokec shows almost no speedup, as its serial time is too small to really improve upon: the major contributors to the execution time dominate equally in the approximate and exact approaches. It is important to note that since k is agnostic of the size of the graph, it yields different speedups depending on the graph structure. Once the graph size crosses a certain threshold, the speedup appears roughly constant, at about 6×. Another factor contributing to the lower execution time of the approximate algorithm is that it involves many constant-time operations, such as selecting vertices randomly. Apart from the loop computing the probability distribution, the sizes of the other loops are determined by the value of k (which is fixed even if the graph is very large) and the (larger) degree of the vertex adjacencies being examined for closed wedges. Since

Table 4.7: Parallel approximate triangle counting performance on a single compute node of Lion-XG (SNB): times on 1, 2, 4, 8, and 16 cores and the resulting speedup for soc-pokec, orkut, indochina-2004, twitter, and friendster.

most real-world graphs share a characteristic skewed degree distribution with very few vertices of high degree, the probability of choosing a vertex that leads to a large number of iterations is very low. Thus, in an amortized sense, most of the time consumed depends on k, which is in the user's control. Table 4.7 shows good speedups after parallelization. We use OpenMP pragmas to parallelize three main steps: computing the probability distribution, computing k, and determining the number of closed wedges. For the first two steps, the main operation is choosing vertices at random, so static partitioning suffices, as work is well balanced across cores.

4.3 Performance Scaling

In Figures 4.2 and 4.3, we plot the relative speedups achieved by each of the variants on two graphs, for triangle counting and triad census, respectively. For counting, AI shows the best scaling on both SNB and KNC; this behavior is due to the space-efficient nature of AI. ST's scaling is comparable to AM for soc-pokec, but is slightly lower for soc-LiveJournal1 on both platforms. Figure 4.3 shows that triad census scaling is comparable for all three variants, which is as expected. Thus, the overall running time improvements for triad census are due to the algorithmic changes in our new variants in comparison to the baseline scheme. Load imbalance is also not significant for these graphs, probably due to dynamic scheduling and the choice of a reasonable chunk size. Figure 4.4 and Table 4.7 show the speedups achieved by our approximate triangle

Figure 4.2: Parallel scaling of triangle counting methods (AI, AM, and ST) with increasing thread counts on SNB and KNC processors, for soc-pokec and soc-LiveJournal1.

Figure 4.3: Parallel scaling of triad census methods (AI, AM, and Base) with increasing thread counts on SNB and KNC processors, for patentcite and soc-pokec.

implementation over 16 cores. Scaling is good up to 8 cores; going from 8 to 16 cores makes little difference, probably because there is not enough work for 16 cores. For smaller graphs like soc-pokec, speedup stops even at 8 cores, as the overheads of assigning tasks to 8 cores exceed the actual per-core execution time. There are few overheads associated with synchronization, with only a few reductions performed across cores while computing the total number of closed wedges.

4.4 Impact of ordering on overall performance

We next study the impact of ordering on parallel performance. We report the performance of triad census and triangle counting with various ordering schemes, normalized to the performance with random ordering. We see that the performance of the triad census variants does not vary much, but the performance of the counting variants is significantly affected. Triad census remains largely agnostic to vertex reordering because of the many complex operations performed in every outer loop iteration. For indochina-2004, ordering makes a substantial difference, greater than an order

Figure 4.4: Parallel scaling of approximate triangle counting on a single node of Lion-XG (SNB).

of magnitude in AM performance with SF ordering.

Table 4.8: Performance impact of ordering strategy (NAT: natural, SF: smaller degree first, LF: larger degree first) for parallel triad census and parallel triangle counting on SNB (16 cores). Table values are performance improvements over random vertex ordering (higher values are better); census results are given for the AI and AM variants on patentcite, flickr, and indochina-2004, and counting results for the AI and AM variants on soc-livejournal, orkut, and indochina-2004.


More information

Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Mosaic: Processing a Trillion-Edge Graph on a Single Machine Mosaic: Processing a Trillion-Edge Graph on a Single Machine Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim Georgia Institute of Technology Best Student Paper @ EuroSys

More information

Algorithms for Grid Graphs in the MapReduce Model

Algorithms for Grid Graphs in the MapReduce Model University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations

Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations George M. Slota 1 Sivasankaran Rajamanickam 2 Kamesh Madduri 3 1 Rensselaer Polytechnic Institute, 2 Sandia National

More information

A subquadratic triad census algorithm for large sparse networks with small maximum degree

A subquadratic triad census algorithm for large sparse networks with small maximum degree A subquadratic triad census algorithm for large sparse networks with small maximum degree Vladimir Batagelj and Andrej Mrvar University of Ljubljana Abstract In the paper a subquadratic (O(m), m is the

More information

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured

More information

Algorithm Design (8) Graph Algorithms 1/2

Algorithm Design (8) Graph Algorithms 1/2 Graph Algorithm Design (8) Graph Algorithms / Graph:, : A finite set of vertices (or nodes) : A finite set of edges (or arcs or branches) each of which connect two vertices Takashi Chikayama School of

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Introduction to Graph Theory

Introduction to Graph Theory Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Basic Search Algorithms

Basic Search Algorithms Basic Search Algorithms Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract The complexities of various search algorithms are considered in terms of time, space, and cost

More information

Multicore Triangle Computations Without Tuning

Multicore Triangle Computations Without Tuning Multicore Triangle Computations Without Tuning Julian Shun, Kanat Tangwongsan 2 Computer Science Department, Carnegie Mellon University, USA 2 Computer Science Program, Mahidol University International

More information

A CSP Search Algorithm with Reduced Branching Factor

A CSP Search Algorithm with Reduced Branching Factor A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs Sungpack Hong 2, Nicole C. Rodia 1, and Kunle Olukotun 1 1 Pervasive Parallelism Laboratory, Stanford University

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms Analysis of Algorithms Unit 4 - Analysis of well known Algorithms 1 Analysis of well known Algorithms Brute Force Algorithms Greedy Algorithms Divide and Conquer Algorithms Decrease and Conquer Algorithms

More information

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4)

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4) S-72.2420/T-79.5203 Basic Concepts 1 S-72.2420/T-79.5203 Basic Concepts 3 Characterizing Graphs (1) Characterizing Graphs (3) Characterizing a class G by a condition P means proving the equivalence G G

More information

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A Comparative Study on Exact Triangle Counting Algorithms on the GPU A Comparative Study on Exact Triangle Counting Algorithms on the GPU Leyuan Wang, Yangzihao Wang, Carl Yang, John D. Owens University of California, Davis, CA, USA 31 st May 2016 L. Wang, Y. Wang, C. Yang,

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

Unit 4: Formal Verification

Unit 4: Formal Verification Course contents Unit 4: Formal Verification Logic synthesis basics Binary-decision diagram (BDD) Verification Logic optimization Technology mapping Readings Chapter 11 Unit 4 1 Logic Synthesis & Verification

More information

Chapter 4: Implicit Error Detection

Chapter 4: Implicit Error Detection 4. Chpter 5 Chapter 4: Implicit Error Detection Contents 4.1 Introduction... 4-2 4.2 Network error correction... 4-2 4.3 Implicit error detection... 4-3 4.4 Mathematical model... 4-6 4.5 Simulation setup

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION

CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster)

More information

On the Approximability of Modularity Clustering

On the Approximability of Modularity Clustering On the Approximability of Modularity Clustering Newman s Community Finding Approach for Social Nets Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607,

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Parallel Graph Algorithms

Parallel Graph Algorithms Parallel Graph Algorithms Design and Analysis of Parallel Algorithms 5DV050/VT3 Part I Introduction Overview Graphs definitions & representations Minimal Spanning Tree (MST) Prim s algorithm Single Source

More information

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer Module 2: Divide and Conquer Divide and Conquer Control Abstraction for Divide &Conquer 1 Recurrence equation for Divide and Conquer: If the size of problem p is n and the sizes of the k sub problems are

More information

Efficient Counting of Network Motifs

Efficient Counting of Network Motifs Efficient Counting of Network Motifs Dror Marcus School of Computer Science Tel-Aviv University, Israel Email: drormarc@post.tau.ac.il Yuval Shavitt School of Electrical Engineering Tel-Aviv University,

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

Link Prediction in Graph Streams

Link Prediction in Graph Streams Peixiang Zhao, Charu C. Aggarwal, and Gewen He Florida State University IBM T J Watson Research Center Link Prediction in Graph Streams ICDE Conference, 2016 Graph Streams Graph Streams arise in a wide

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

Locality. Christoph Koch. School of Computer & Communication Sciences, EPFL

Locality. Christoph Koch. School of Computer & Communication Sciences, EPFL Locality Christoph Koch School of Computer & Communication Sciences, EPFL Locality Front view of instructor 2 Locality Locality relates (software) systems with the physical world. Front view of instructor

More information

STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation

STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation David A. Bader Georgia Institute of Technolgy Adam Amos-Binks Carleton University, Canada Jonathan Berry Sandia

More information

MCL. (and other clustering algorithms) 858L

MCL. (and other clustering algorithms) 858L MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted

More information

Using Statistics for Computing Joins with MapReduce

Using Statistics for Computing Joins with MapReduce Using Statistics for Computing Joins with MapReduce Theresa Csar 1, Reinhard Pichler 1, Emanuel Sallinger 1, and Vadim Savenkov 2 1 Vienna University of Technology {csar, pichler, sallinger}@dbaituwienacat

More information

arxiv: v1 [cs.ds] 23 Jul 2014

arxiv: v1 [cs.ds] 23 Jul 2014 Efficient Enumeration of Induced Subtrees in a K-Degenerate Graph Kunihiro Wasa 1, Hiroki Arimura 1, and Takeaki Uno 2 arxiv:1407.6140v1 [cs.ds] 23 Jul 2014 1 Hokkaido University, Graduate School of Information

More information

PuLP. Complex Objective Partitioning of Small-World Networks Using Label Propagation. George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1

PuLP. Complex Objective Partitioning of Small-World Networks Using Label Propagation. George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 PuLP Complex Objective Partitioning of Small-World Networks Using Label Propagation George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania State

More information

Graph Algorithms. Definition

Graph Algorithms. Definition Graph Algorithms Many problems in CS can be modeled as graph problems. Algorithms for solving graph problems are fundamental to the field of algorithm design. Definition A graph G = (V, E) consists of

More information

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from

More information

Indexing and Hashing

Indexing and Hashing C H A P T E R 1 Indexing and Hashing This chapter covers indexing techniques ranging from the most basic one to highly specialized ones. Due to the extensive use of indices in database systems, this chapter

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

Fast algorithms for max independent set

Fast algorithms for max independent set Fast algorithms for max independent set N. Bourgeois 1 B. Escoffier 1 V. Th. Paschos 1 J.M.M. van Rooij 2 1 LAMSADE, CNRS and Université Paris-Dauphine, France {bourgeois,escoffier,paschos}@lamsade.dauphine.fr

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 3 Data Structures Graphs Traversals Strongly connected components Sofya Raskhodnikova L3.1 Measuring Running Time Focus on scalability: parameterize the running time

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

Solutions to Exam Data structures (X and NV)

Solutions to Exam Data structures (X and NV) Solutions to Exam Data structures X and NV 2005102. 1. a Insert the keys 9, 6, 2,, 97, 1 into a binary search tree BST. Draw the final tree. See Figure 1. b Add NIL nodes to the tree of 1a and color it

More information

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN

More information

Extremal Graph Theory: Turán s Theorem

Extremal Graph Theory: Turán s Theorem Bridgewater State University Virtual Commons - Bridgewater State University Honors Program Theses and Projects Undergraduate Honors Program 5-9-07 Extremal Graph Theory: Turán s Theorem Vincent Vascimini

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

Introduction III. Graphs. Motivations I. Introduction IV

Introduction III. Graphs. Motivations I. Introduction IV Introduction I Graphs Computer Science & Engineering 235: Discrete Mathematics Christopher M. Bourke cbourke@cse.unl.edu Graph theory was introduced in the 18th century by Leonhard Euler via the Königsberg

More information

Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network

Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network Wing Ning Li, CSCE Dept. University of Arkansas, Fayetteville, AR 72701 wingning@uark.edu

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

Simplicity is Beauty: Improved Upper Bounds for Vertex Cover

Simplicity is Beauty: Improved Upper Bounds for Vertex Cover Simplicity is Beauty: Improved Upper Bounds for Vertex Cover Jianer Chen, Iyad A. Kanj, and Ge Xia Department of Computer Science, Texas A&M University, College Station, TX 77843 email: {chen, gexia}@cs.tamu.edu

More information

Lecture 4: Graph Algorithms

Lecture 4: Graph Algorithms Lecture 4: Graph Algorithms Definitions Undirected graph: G =(V, E) V finite set of vertices, E finite set of edges any edge e = (u,v) is an unordered pair Directed graph: edges are ordered pairs If e

More information

Symmetric Product Graphs

Symmetric Product Graphs Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 5-20-2015 Symmetric Product Graphs Evan Witz Follow this and additional works at: http://scholarworks.rit.edu/theses

More information

Discrete mathematics , Fall Instructor: prof. János Pach

Discrete mathematics , Fall Instructor: prof. János Pach Discrete mathematics 2016-2017, Fall Instructor: prof. János Pach - covered material - Lecture 1. Counting problems To read: [Lov]: 1.2. Sets, 1.3. Number of subsets, 1.5. Sequences, 1.6. Permutations,

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Parallelization of Graph Isomorphism using OpenMP

Parallelization of Graph Isomorphism using OpenMP Parallelization of Graph Isomorphism using OpenMP Vijaya Balpande Research Scholar GHRCE, Nagpur Priyadarshini J L College of Engineering, Nagpur ABSTRACT Advancement in computer architecture leads to

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

DESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID

DESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID I J D M S C L Volume 6, o. 1, January-June 2015 DESIG AD OVERHEAD AALYSIS OF WORKFLOWS I GRID S. JAMUA 1, K. REKHA 2, AD R. KAHAVEL 3 ABSRAC Grid workflow execution is approached as a pure best effort

More information

Bandwidth Avoiding Stencil Computations

Bandwidth Avoiding Stencil Computations Bandwidth Avoiding Stencil Computations By Kaushik Datta, Sam Williams, Kathy Yelick, and Jim Demmel, and others Berkeley Benchmarking and Optimization Group UC Berkeley March 13, 2008 http://bebop.cs.berkeley.edu

More information

Application of the Computer Capacity to the Analysis of Processors Evolution. BORIS RYABKO 1 and ANTON RAKITSKIY 2 April 17, 2018

Application of the Computer Capacity to the Analysis of Processors Evolution. BORIS RYABKO 1 and ANTON RAKITSKIY 2 April 17, 2018 Application of the Computer Capacity to the Analysis of Processors Evolution BORIS RYABKO 1 and ANTON RAKITSKIY 2 April 17, 2018 arxiv:1705.07730v1 [cs.pf] 14 May 2017 Abstract The notion of computer capacity

More information

Pregel. Ali Shah

Pregel. Ali Shah Pregel Ali Shah s9alshah@stud.uni-saarland.de 2 Outline Introduction Model of Computation Fundamentals of Pregel Program Implementation Applications Experiments Issues with Pregel 3 Outline Costs of Computation

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

CSI 604 Elementary Graph Algorithms

CSI 604 Elementary Graph Algorithms CSI 604 Elementary Graph Algorithms Ref: Chapter 22 of the text by Cormen et al. (Second edition) 1 / 25 Graphs: Basic Definitions Undirected Graph G(V, E): V is set of nodes (or vertices) and E is the

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

Planar Graphs with Many Perfect Matchings and Forests

Planar Graphs with Many Perfect Matchings and Forests Planar Graphs with Many Perfect Matchings and Forests Michael Biro Abstract We determine the number of perfect matchings and forests in a family T r,3 of triangulated prism graphs. These results show that

More information

Mapping Vector Codes to a Stream Processor (Imagine)

Mapping Vector Codes to a Stream Processor (Imagine) Mapping Vector Codes to a Stream Processor (Imagine) Mehdi Baradaran Tahoori and Paul Wang Lee {mtahoori,paulwlee}@stanford.edu Abstract: We examined some basic problems in mapping vector codes to stream

More information

Measurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling

Measurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling Measurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling Tamara G. Kolda plus Grey Ballard, Todd Plantenga, Ali Pinar, C. Seshadhri Workshop on Incomplete Network Data Sandia National

More information

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1 UNIT I INTRODUCTION CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1. Define Graph. A graph G = (V, E) consists

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information