An optimal algorithm for counting network motifs
Physica A 381 (2007)

Royi Itzhack, Yelena Mogilevski, Yoram Louzoun
Math Department, Bar Ilan University, Ramat-Gan, Israel

Received 7 January 2007; received in revised form 14 February 2007
Available online 6 March 2007

Abstract

Network motifs are small connected subgraphs that occur at significantly higher frequencies in a given graph than in random graphs with a similar degree distribution. Recently, network motifs have attracted attention as a tool for studying the microscopic details of networks. The commonly used algorithm for counting small-scale motifs is the one developed by Milo et al. This algorithm is extremely costly in CPU time and cannot, in practice, handle large networks of more than 100,000 edges on current CPUs. We present here a new optimal algorithm, based on network decomposition, for counting k-size network motifs with a constant memory cost and a CPU cost linear in the number of counted motifs. Our algorithm runs faster than previous full enumeration algorithms and uses a constant amount of memory. It also outperforms sampling algorithms. Our algorithm permits the counting of three- and four-node motifs in large networks consisting of more than 500,000 nodes and 5,000,000 links. For large networks, it runs more than a thousand times faster than current algorithms.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Graph; Networks; Motif; Algorithm

1. Introduction

Milo et al. [1] defined motifs as basic interaction patterns that recur throughout different kinds of networks more often than in random networks with the same degree distribution. In biological networks, a small set of network motifs appears to serve as the building blocks of transcription networks from bacteria to mammals [2]. Specific network motifs are also found in signal transduction networks, neuronal networks and other biological and non-biological networks [3].
The analysis of network motifs also plays a role in network classification [3] and in the analysis of structural network properties. A large amount of work has been devoted to the interpretation and application of network motifs, but much less effort has been devoted to the development of good motif counting algorithms. In general, the requirements for a k-motif counting algorithm in a given graph can be divided into three main elements [4]:

(1) Counting all k-size subgraphs occurring in the graph.
(2) Determining which of these subgraphs are isomorphic, and counting each isomorphic group only once.
(3) Comparing the motif count with the expected count in a random graph with the same connectivity structure.

Performing the first subtask (counting all connected k-size subgraphs) by explicitly enumerating all subgraphs of a certain size is extremely time consuming, because their number can be very large even in small, sparse networks. One approach proposed to overcome the high CPU cost is motif sampling, developed by Kashtan et al. [5] and by Wernicke [4]. Random sampling algorithms are efficient and can successfully approximate the expected number of network motifs using a small number of samples. Such algorithms collect samples from the whole network by repeatedly picking a random edge adjacent to the current subgraph until a k-size subgraph is complete, and they process only a small number of sampled subgraphs. Sampling methods improve the running time dramatically. However, such methods can only estimate the frequency of subgraphs and cannot provide an exact enumeration.

In this paper, we refine the first and second tasks by optimizing the motif count to the point where every motif is counted once and only once, with practically no overhead. We show the application of the algorithm to the measurement of three- and four-node subgraph occurrences. The time complexity of our algorithm is low enough to count k-motifs directly on any graph of up to millions of edges. It performs as well as or better than sampling algorithms.

The latest version of the motif counting algorithm provided by Prof. Alon is denoted mfinder1.2 [6]. This algorithm starts the subgraph search by choosing a random edge and extending it iteratively from both ends until it obtains a k-size subgraph [5].

Corresponding author: Y. Louzoun (e-mail address: ylouzoun@gmail.com).
The number of subgraphs increases approximately as the number of edges to the power of $k$, while the runtime increases much faster than that for subgraphs with $k \ge 3$, especially for a large number of nodes. We present here a new approach for the exact counting of network motifs with a minimal running time. The approach is based on network decomposition [7] via node removal. We detect all motifs containing a given node by examining all its incoming and outgoing neighbors up to order $k-1$, and then remove this node. We present the results of our algorithm for $k = 3$ and $4$ and describe the $k = 5$ algorithm. There is no point in developing motif counting algorithms for $k > 5$, since there are 1,530,843 different $k = 6$ subgraphs, making the results practically impossible to decipher and understand. However, if one is interested in the frequency of a specific k-motif for $k > 5$, the same algorithm can be extended to any $k$.

The rest of this paper is organized as follows. We first present the definitions and the motif counting algorithm. We then present computational results on motifs extracted from random (Erdős–Rényi, ER) and scale free networks, to compare the efficiency of our algorithm with standard counting and sampling algorithms.

2. Results

The k-motif counting problem is defined as the task of enumerating all connected subgraphs $G_k \subseteq G$ of size $k$. A k-motif is represented by a $k \times k$ connectivity matrix and all the matrices isomorphic to it. For example, the A-B-C motif can be represented by a connectivity matrix between A, B and C in which all values are zero except for the (A,B) and (B,C) cells. Swapping the rows and columns so that B takes the place of A and C takes the place of B would produce the same motif but a different matrix. Not all connectivity matrices are defined as motifs, since the subgraph has to be connected. Thus, there must be an undirected walk between every pair of nodes in a k-size motif.
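The matrix representation above can be illustrated with a small sketch. It is not taken from the paper: the function names `relabel` and `is_weakly_connected` and the integer node labels are assumptions made for illustration. It shows that relabeling the nodes of the A-B-C motif changes the matrix but not the motif, and how the connectivity requirement can be checked by ignoring edge direction.

```python
def relabel(matrix, perm):
    """Return the connectivity matrix after renaming node i to perm[i]."""
    k = len(matrix)
    out = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            out[perm[i]][perm[j]] = matrix[i][j]
    return out


def is_weakly_connected(matrix):
    """True if every node is reachable from node 0 when direction is ignored."""
    k = len(matrix)
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(k):
            if j not in seen and (matrix[i][j] or matrix[j][i]):
                seen.add(j)
                stack.append(j)
    return len(seen) == k


# The A -> B -> C motif with A=0, B=1, C=2: only cells (A,B) and (B,C) are set.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]

# Rename A->1, B->2, C->0: same motif, different matrix.
relabeled = relabel(chain, (1, 2, 0))

# A matrix with a single edge and an isolated node is not a motif.
disconnected = [[0, 1, 0],
                [0, 0, 0],
                [0, 0, 0]]
```

Here `chain` and `relabeled` are different matrices with the same number of edges, and both pass the connectivity check, while `disconnected` does not.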
In a given directed graph $G(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges, the time complexity of finding all the subgraphs $G_k \subseteq G$ has an upper bound of $O(E^k)$ [8] and a lower bound of $O(V c^{k-1})$. We present here an optimal motif counting algorithm with an efficiency close to the lower bound, in which each motif is counted precisely once, using a constant amount of memory (that does not depend on the network size). The algorithm is based on the decomposition of the original network through the systematic removal of nodes. We choose a random node. We count all motifs containing this node, using a memory structure that allows the direct enumeration of all motifs containing a given node. Once all the motifs containing a given node are counted, the node is removed from the network, and the process is repeated for the next node. The resulting time complexity for all size-k subgraphs, given an average total (incoming and outgoing) number of neighbors $c$, is less than $O(V c^{k-1} \log^{k-2}(c))$, where $c \ll V < E$.

Fig. 1. Counting pattern of motifs of size 3, 4 and 5. Starting from left to right: the first count is for all the permutations of the second level. The next count is permutations of $(k-2)$ nodes from the first level and 1 from the second level, and so on.

Scale free networks differ significantly from ER networks. In scale free networks, the systematic removal of the highest degree nodes decomposes the network after a small number of steps [7], whereas in ER networks one must remove enough nodes to bring the network below the percolation level. Based on this decomposition principle, we build for each node a k-motif tree similar to a breadth-first search (BFS) tree. We count in this tree each motif containing the node exactly once, and then remove the node.

In a graph $G(V,E)$, the k-neighborhood $V_i^k$ of node $v_i$ is defined as the subset of nodes within a distance of at most $k-1$ edges from or to $v_i$. $V_i^k$ is spanned by a k-motif tree $T_i^k$. The first level of $T_i^k$ is $v_{source} = T_1$, and $T_L$ denotes all the nodes in level $L$ of $T_i^k$. Level $T_L$ consists of all the neighbors of each node $t_j \in T_{L-1}$ that are not contained in level $L-1$. The k-motif spanning tree resembles a regular BFS tree in its breadth-first span of the local neighborhood. It differs from it in that the span ignores edge direction and that, within each level, nodes can appear more than once, depending on their ancestors (Fig. 2).
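The count-then-remove principle described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: it grows connected node sets by brute force and uses a set to discard duplicates, so it has neither the constant memory use nor the duplicate-free traversal of the k-motif tree. The function name `count_connected_subgraphs` and the edge-list input format are assumptions.

```python
def count_connected_subgraphs(edges, k):
    """Count connected k-node subgraphs of a directed edge list.

    Each node set is counted when its first-processed node is the source,
    because that node is removed before any later source is handled.
    """
    # Neighborhoods ignore direction: incoming and outgoing neighbors alike.
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # simple graphs only, no self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for source in sorted(adj):
        found = set()

        def grow(nodes):
            # Extend the current connected node set by one adjacent node.
            if len(nodes) == k:
                found.add(nodes)
                return
            frontier = set()
            for n in nodes:
                frontier |= adj[n]
            for nxt in frontier - nodes:
                grow(nodes | {nxt})

        grow(frozenset([source]))
        total += len(found)

        # Decomposition step: remove the source and all its edges.
        for nb in adj[source]:
            adj[nb].discard(source)
        del adj[source]
    return total


path_edges = [(1, 2), (2, 3), (3, 4)]
triangle_edges = [(1, 2), (2, 3), (3, 1)]
```

For instance, `count_connected_subgraphs(path_edges, 3)` finds the two node sets {1,2,3} and {2,3,4}, and the triangle contains a single connected 3-node set.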
All motifs containing the node $v_i$ are connected k-node subgraphs of $T_i^k$, and conversely all connected k-node subgraphs of $T_i^k$ are motifs. However, different subgraphs of $T_i^k$ can be the same motif. In order to count each such motif only once, we have developed a counting pattern. The counting pattern is based on a systematic increase in the analyzed depth. We first count all subgraphs of depth 1 (i.e. $v_i$ and $(k-1)$ of its direct neighbors). We then count all motifs with $(k-2)$ nodes of level 1 and one node of level 2, and so on. In Fig. 1, we show the application of this principle for $k = 3$ to $5$. For example, in the 4-motif count, we first count $v_{source}$ with three additional nodes from level two. Next (from left to right), we count the permutations of two nodes from level two and one of their sons from level three. The next count is all the possibilities of one node from level two and two of its sons from level three. The last pattern is one node from level two, its son from level three, and its grandson from level four. The exact recognition of the motif is also fast, and is performed with a CPU cost of $O(1)$ and a constant memory usage, as explained further below. Finally, after passing over the motif tree spanned by $v_{source}$, we remove $v_{source}$ and all edges connected to it from the network. The motif enumeration through the graph does not consume memory.

3. Four size motif algorithm count

We now describe in detail the $k = 4$ motif counting algorithm; the algorithm is similar for all other $k$. The four-motif count algorithm is based on four counting patterns. The first count pattern is $v_{source}$ and three of its sons (Fig. 1(4), left drawing). The second pattern covers all the possibilities of $v_{source}$ with two of its sons, one of them being $t_2 \in T_2$, and one grandson whose father is $t_2$ (Fig. 1(4), middle left drawing). The third pattern is $v_{source}$ with its son $t_2$ and two of $t_2$'s sons (Fig. 1(4), middle right drawing).
The last pattern is $v_{source}$, its son $t_2$, $t_2$'s son $t_3$ and $t_3$'s son (Fig. 1(4), right drawing). Different counts can actually represent the same motif. For example, if $t_2$'s son $t_3$ is also its brother, we would count every motif containing $v_{source}$, $t_2$, $t_3$ twice.
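The duplicate just described arises when a candidate node is also a neighbor of one of its ancestors. Assuming each neighbor list is kept in ascending order, such a membership test is a binary search costing $O(\log(c))$ per list. The following is a minimal sketch of that test only, not of the full left-to-right traversal rule; the names `in_sorted` and `already_covered` and the node IDs are hypothetical.

```python
from bisect import bisect_left


def in_sorted(sorted_neighbors, node):
    """Binary search: True if `node` occurs in the ascending list."""
    i = bisect_left(sorted_neighbors, node)
    return i < len(sorted_neighbors) and sorted_neighbors[i] == node


def already_covered(candidate, ancestor_neighbor_lists):
    """True if `candidate` is a neighbor of any of the listed ancestors.

    Such a candidate is a brother of a node higher up the path, so the
    same node set is reachable through another branch of the tree.
    """
    return any(in_sorted(nbrs, candidate) for nbrs in ancestor_neighbor_lists)


# Hypothetical sorted neighbor list of the source node.
source_neighbors = [2, 3, 4, 5, 6]
```

With these values, a candidate 5 found deeper in the tree would be flagged, because it is also a direct neighbor of the source, while a candidate 10 would not.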
Fig. 2. Motif counting tree of a directed network. The leftmost diagram is the network itself. We span a tree from $v_{source} = 1$. The middle tree spans the 3-motifs and the rightmost tree spans the 4-motifs. Each level consists of either incoming or outgoing edges to one of the fathers at the level above. Nodes can appear more than once at a specific level with a specific root, but not at different levels of the same root.

In order to count every motif only once, we move from left to right in the tree and, before including a node in a count, we check that it does not appear as a brother of one of its ancestors. Note that for $k = 5$ we also avoid cousins, and so on. In order to minimize the cost of the search, we first order all nodes. The neighbors of every node are ordered when the BFS tree is created. The cost of searching for uncles is thus only $O(\log(c))$. The ordering of the graph is done only once, with a marginal cost of $O(E \log(c))$.

We exemplify the process on a simple tree (Fig. 2). We first insert the first ordered couple (1,2). We then insert the first ordered triplet (1,2,3) and only then count all the quadruplets in the first counting pattern, i.e. (1,2,3,4), (1,2,3,5) and (1,2,3,6). Once done with the first counting pattern based on the (1,2,3) triplet, we move to the second counting pattern based on it, and count (1,2,3,7), (1,2,3,9) and (1,2,3,10). For each such count, we check that the last node added to the quartet is not its own uncle or left cousin; the cost of each such check is either $\log(c)$ or $\log^2(c)$. We then continue with the next triplet of (1,2), i.e. (1,2,4), and proceed in the same way. We obviously do not take into account at this stage any node to the left of 4. After the (1,2,6) triplet is done, we move to the quadruplets containing (1,2,7): (1,2,7,9) and (1,2,7,10). Note again that we do not count (1,2,7,5), since 5 is its own uncle.
If 10 were also a son of 1, we would not count (1,2,7,10) either. We then continue in a similar way with all triplets containing (1,2). At the end of the (1,2) pair count, we move to the (1,3) pair count and proceed in the same way with all parts of the tree to the right of 3.

The time complexity of the algorithm is $O(c^3 \log^2(c))$ for a single node. When all motifs involving a node are counted, the node is removed from the network, with a cost of $O(c)$. Note that $E$ (the number of edges), and with it $c$, decays as we decompose the network. The algorithm cost is thus bounded by $O(V c^3 \log^2(c)) = O(E c^2 \log^2(c))$. The double check that we have to make on all the pattern counts except the first contributes the factor of $\log^2(c)$, but since $c$ is usually very small it does not affect the CPU cost significantly. The $\log^{k-2}(c)$ factor is the difference between the cost of our algorithm and the number of motifs. The code of the 4-motif count algorithm is shown in Fig. 3; the colors represent the specific motif pattern counts.

4. Motif recognition

The above-mentioned algorithm only checks whether a quadruplet represents a motif, but it does not check which motif it is. In order to relate a quadruplet to a 4-motif in a single operation, each quadruplet is represented by a $4 \times 4$ square matrix $M$. $M_{ij}$ is 1 if there is a direct edge from node $i$ to node $j$; otherwise it is 0. Since we only treat simple graphs, $M_{ii}$ is set to 0. The matrix is then represented by a $k(k-1)$-bit string with $2^{k(k-1)}$ possible values. Each such bitstring represents a single motif, although multiple bitstrings can represent the same motif. There are 64, 4096 and 1,048,576 different possible bitstrings and 13, 199 and 5946 motifs for $k = 3$, $4$ and $5$, respectively. For $k = 6$, there are over $10^9$ possible bitstrings and over $10^6$ motifs, making the interpretation very cumbersome; we thus limit our code to $k = 5$ motifs.
Once a set of nodes representing a motif is counted, we add one to a motif counting array at the cell indexed by the appropriate bitstring. The cost of this stage is $k^2$.
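The bitstring indexing can be sketched as follows for $k = 3$. The bit order (row by row, skipping the diagonal) and the choice of the smallest code as the representative of an isomorphism class are assumptions made for this illustration; the paper does not specify either, and it merges isomorphic cells at the end of the run by summing them over all $k!$ relabelings.

```python
from itertools import permutations


def encode(matrix):
    """Pack the off-diagonal cells of a k x k 0/1 matrix into an integer."""
    k = len(matrix)
    code = 0
    for i in range(k):
        for j in range(k):
            if i != j:  # M_ii is always 0, so the diagonal is dropped
                code = (code << 1) | (1 if matrix[i][j] else 0)
    return code


def canonical(matrix):
    """Smallest code over all k! relabelings (assumed class representative)."""
    k = len(matrix)
    best = None
    for perm in permutations(range(k)):
        permuted = [[matrix[perm[i]][perm[j]] for j in range(k)]
                    for i in range(k)]
        code = encode(permuted)
        if best is None or code < best:
            best = code
    return best


k = 3
counts = [0] * (1 << (k * (k - 1)))  # one cell per possible bitstring

# Two labelings of the same directed chain.
chain_abc = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
chain_cab = [[0, 0, 0], [0, 0, 1], [1, 0, 0]]

for m in (chain_abc, chain_cab):
    counts[encode(m)] += 1  # O(k^2) work per counted node set
```

The two chains land in different cells of `counts` but share one `canonical` value, which is what allows their cells to be summed into a single motif count.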
Fig. 3. Four-motif count algorithm. We perform four counts, with the count patterns marked according to the legend at the bottom. First we count the source node with all the combinations of three of its neighbors. We then count the source node with two of its sons and one of its grandsons. The third pattern is the source node with one of its sons and two of its grandsons. The last count covers all the possibilities of the source node with one of its sons, one of its grandsons and one of its great-grandsons. The function count motif is performed as described in the motif recognition section.

At the end of the entire process, we sum all the array cells whose bitstrings represent isomorphic motifs to get the full motif count. In order to find all the isomorphic patterns, we calculate for each motif pattern all the $k!$ permutations of its rows and columns in the matrix. For each permuted matrix, we remove the diagonal and add the count of the original motif to the table cell whose index equals the value of the permuted bitstring.

5. Running time comparisons

We have used two data sets to compare the running time of the mfinder exhaustive algorithm with that of our algorithm. The first data set consists of Erdős–Rényi random networks ranging in size from 10 to 50,000 nodes, with varying average connectivities. We varied the connectivity while keeping the network size constant, and varied the network size while keeping the connectivity constant. The second data set consists of scale free networks with 50 to 50,000 nodes and a power of 2. The running times were compared for the 3- and 4-motif algorithms. The mfinder exhaustive search method is much slower than our algorithm for both 3- and 4-motifs. For example, the mfinder run time for a network composed of 50,000 nodes and 400,000 edges was 6 days, compared with 1.2 min for our algorithm (Figs. 4 and 5).
The run time ratio between mfinder and our algorithm increases from 10 to 500 as we increase the network size from 10 to 50,000 nodes (with a constant connectivity). Similar results are obtained for the ER networks when increasing the connectivity from 5 to 50 with a constant network size (1000 nodes) (Fig. 7) and for SF networks (Fig. 6).

Fig. 4. Three-motif counting running time comparison for the original network and randomized networks as a function of the network size (number of nodes) in ER networks. The dotted blue line represents our algorithm for 100 networks, the dashed red line represents the FANMOD sampling algorithm for 1000 networks and the full green line represents the mfinder1.2 algorithm for 100 networks. The networks had an average connectivity of eight neighbors per node. The inner figure shows the comparison in logarithmic scale. The growth rate of the running time is larger for the sampling algorithm than for our algorithm.

6. Comparison to sampling algorithms

The minimal CPU and memory cost of our algorithm allows us to extend it to higher values of $k$ and to networks that were not considered feasible before. This algorithm, while providing a full motif enumeration, actually outperforms sampling algorithms. We have compared our results with the FANMOD algorithm, which samples 100,000 subgraphs of each network (which for large networks is less than 0.01% of the total number of subgraphs in the network). In order to find the motifs that are significantly over- or under-represented, we compare the frequency of each motif in the original network with its frequency in another 100 networks of similar degree distribution, as proposed by Kashtan et al. [5]. Note that FANMOD needs a larger comparison set due to the limitations of the sampling method. For all the above-mentioned networks, the running time of the FANMOD sampling algorithm was higher than that of our algorithm for both 3- and 4-motifs.
For example, for a network composed of 5000 nodes and 15,000 edges, the running time of FANMOD (for the original network and 1000 networks of similar degree distribution) was 6 h, compared with 12 min for our algorithm (with 100 networks of similar degree distribution). Even assuming that both algorithms require the same number of random networks, our algorithm would outperform FANMOD. Moreover, the running time ratio between the FANMOD algorithm and our algorithm increases (weakly) with network size and connectivity (Figs. 4–7). To summarize, not only is our analysis more accurate, it is also much faster than sampling methods.
Fig. 5. Four-motif counting running time comparison for the original network and randomized networks as a function of the network size (number of nodes) in ER networks.

Fig. 6. Four-motif running time comparison for scale free networks as a function of the network size (number of nodes). The dotted blue line represents our algorithm for 100 networks, the dashed red line represents the FANMOD sampling algorithm for 1000 networks and the full green line represents the mfinder1.2 algorithm for 100 networks. The inner figure shows the comparison in logarithmic scale. The results are similar to the ER results.
Fig. 7. Four-motif comparison for randomized networks as a function of the average connectivity. The networks consist of 1000 nodes and the number of edges grows from 5000 to 50,000. The dotted blue line represents our algorithm for 100 networks, the dashed red line represents the FANMOD sampling algorithm for 1000 networks and the full green line represents the mfinder1.2 algorithm for 100 networks. The inner figure shows the comparison in logarithmic scale.

7. Discussion

Motif counting is now a common practice in the analysis of networks [9–15]. This practice has been limited up to now to small networks, since the standard algorithms are extremely slow and memory costly for large networks. In order to approximate the number of motifs in large networks, sampling methods were developed [4,5]. We present here an optimal algorithm for counting k-size motifs that is fast enough to allow motif counting in networks containing millions of edges in a reasonable time and with a constant amount of memory. The efficiency of our algorithm often makes it even faster than sampling methods, allowing a precise count where only approximations were possible before. The cost difference between our algorithm and the standard counting algorithm grows sharply with motif size, connectivity and node number. For example, in an ER network with 50,000 nodes and 400,000 edges, our algorithm is over 2000 times faster than the standard algorithm (5 days compared with 3.5 min). For larger networks, we could not compare the performances, since the standard algorithm cannot handle their motif count.

The main principle of our algorithm is the counting of all motifs passing through a node followed by the removal of this node. The systematic removal of nodes eventually leads to the decomposition of the network. We further improve the efficiency of the algorithm by avoiding subgraph redundancy among the neighbors of a given node.
Network decomposition can be achieved in SF networks by the removal of a small number of high degree nodes (hubs). However, the cost of computing all motifs passing through these nodes is very large, and the order in which nodes are removed from the network has only a minor effect on the efficiency of the algorithm (less than a factor of 2). We have also optimized the network scrambling required to compare the network with random networks with similar one-directional and bidirectional edge distributions, but this did not affect the running time, since the main cost is the motif search and not the scrambling. Finally, our fast algorithm opens the way for the exact analysis of large networks, such as the WWW, without the need to depend on the estimates of sampling methods.
References

[1] R. Milo, S. Shen-Orr, S. Itzkovitz, et al., Science 298 (5594) (2002) 824.
[2] S.S. Shen-Orr, R. Milo, S. Mangan, et al., Nat. Genet. 31 (1) (2002) 64.
[3] R. Milo, S. Itzkovitz, N. Kashtan, et al., Science 303 (5663) (2004)
[4] S. Wernicke, F. Rasche, Bioinformatics 22 (9) (2006)
[5] N. Kashtan, S. Itzkovitz, R. Milo, U. Alon, Bioinformatics 20 (11) (2004)
[6] Mfinder1.2 software. /
[7] R. Albert, H. Jeong, A.L. Barabasi, Nature 406 (6794) (2000) 378.
[8] N. Alon, R. Yuster, U. Zwick, Algorithmica 17 (3) (1997) 209.
[9] M. Babu, N.M. Luscombe, L. Aravind, et al., Curr. Opin. Struct. Biol. 14 (3) (2004) 283.
[10] D. Li, J. Li, S. Ouyang, et al., Proteomics 6 (2) (2006) 456.
[11] Y. Louzoun, L. Muchnik, S. Solomon, Bioinformatics 22 (5) (2006) 581.
[12] H.W. Ma, B. Kumar, U. Ditges, et al., Nucleic Acids Res. 32 (22) (2004)
[13] R.J. Prill, P.A. Iglesias, A. Levchenko, PLoS Biol. 3 (11) (2005) e343.
[14] R.V. Sole, S. Valverde, Trends Ecol. Evol. 21 (8) (2006) 419.
[15] O. Sporns, J.D. Zwi, Neuroinformatics 2 (2) (2004) 145.
More informationCS6200 Information Retreival. The WebGraph. July 13, 2015
CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationCME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh HW#3 Due at the beginning of class Thursday 03/02/17
CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) HW#3 Due at the beginning of class Thursday 03/02/17 1. Consider a model of a nonbipartite undirected graph in which
More informationHi everyone. I hope everyone had a good Fourth of July. Today we're going to be covering graph search. Now, whenever we bring up graph algorithms, we
Hi everyone. I hope everyone had a good Fourth of July. Today we're going to be covering graph search. Now, whenever we bring up graph algorithms, we have to talk about the way in which we represent the
More informationThe Simplex Algorithm
The Simplex Algorithm Uri Feige November 2011 1 The simplex algorithm The simplex algorithm was designed by Danzig in 1947. This write-up presents the main ideas involved. It is a slight update (mostly
More informationSolution to Problem 1 of HW 2. Finding the L1 and L2 edges of the graph used in the UD problem, using a suffix array instead of a suffix tree.
Solution to Problem 1 of HW 2. Finding the L1 and L2 edges of the graph used in the UD problem, using a suffix array instead of a suffix tree. The basic approach is the same as when using a suffix tree,
More informationGeneralized Network Flow Programming
Appendix C Page Generalized Network Flow Programming This chapter adapts the bounded variable primal simplex method to the generalized minimum cost flow problem. Generalized networks are far more useful
More informationSpecial course in Computer Science: Advanced Text Algorithms
Special course in Computer Science: Advanced Text Algorithms Lecture 8: Multiple alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg
More informationProperties of Biological Networks
Properties of Biological Networks presented by: Ola Hamud June 12, 2013 Supervisor: Prof. Ron Pinter Based on: NETWORK BIOLOGY: UNDERSTANDING THE CELL S FUNCTIONAL ORGANIZATION By Albert-László Barabási
More informationRepresentations of Graphs
ELEMENTARY GRAPH ALGORITHMS -- CS-5321 Presentation -- I am Nishit Kapadia Representations of Graphs There are two standard ways: A collection of adjacency lists - they provide a compact way to represent
More informationGraphs II: Trailblazing
Graphs II: Trailblazing Paths In an undirected graph, a path of length n from u to v, where n is a positive integer, is a sequence of edges e 1,, e n of the graph such that f(e 1 )={x 0,x 1 }, f(e 2 )={x
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationApproximating the number of Network Motifs
Approximating the number of Network Motifs Mira Gonen 1 and Yuval Shavitt 2 1 Bar Ilan University, Ramat Gan, Israel 2 Tel-Aviv University, Ramat Aviv, Israel Abstract. World Wide Web, the Internet, coupled
More informationCounting the Number of Isosceles Triangles in Rectangular Regular Grids
Forum Geometricorum Volume 17 (017) 31 39. FORUM GEOM ISSN 1534-1178 Counting the Number of Isosceles Triangles in Rectangular Regular Grids Chai Wah Wu Abstract. In general graph theory, the only relationship
More informationTHE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS
THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS Yair Wiseman 1* * 1 Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel Email: wiseman@cs.huji.ac.il, http://www.cs.biu.ac.il/~wiseman
More informationMatching and Planarity
Matching and Planarity Po-Shen Loh June 010 1 Warm-up 1. (Bondy 1.5.9.) There are n points in the plane such that every pair of points has distance 1. Show that there are at most n (unordered) pairs of
More informationProblem Set 1. Solution. CS4234: Optimization Algorithms. Solution Sketches
CS4234: Optimization Algorithms Sketches Problem Set 1 S-1. You are given a graph G = (V, E) with n nodes and m edges. (Perhaps the graph represents a telephone network.) Each edge is colored either blue
More informationGraphBLAS Mathematics - Provisional Release 1.0 -
GraphBLAS Mathematics - Provisional Release 1.0 - Jeremy Kepner Generated on April 26, 2017 Contents 1 Introduction: Graphs as Matrices........................... 1 1.1 Adjacency Matrix: Undirected Graphs,
More informationInterleaving Schemes on Circulant Graphs with Two Offsets
Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical
More informationLecture 5: Multiple sequence alignment
Lecture 5: Multiple sequence alignment Introduction to Computational Biology Teresa Przytycka, PhD (with some additions by Martin Vingron) Why do we need multiple sequence alignment Pairwise sequence alignment
More informationDiscrete mathematics II. - Graphs
Emil Vatai April 25, 2018 Basic definitions Definition of an undirected graph Definition (Undirected graph) An undirected graph or (just) a graph is a triplet G = (ϕ, E, V ), where V is the set of vertices,
More informationCS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam
CS 441 Discrete Mathematics for CS Lecture 26 Graphs Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Final exam Saturday, April 26, 2014 at 10:00-11:50am The same classroom as lectures The exam
More informationSpace Filling Curves and Hierarchical Basis. Klaus Speer
Space Filling Curves and Hierarchical Basis Klaus Speer Abstract Real world phenomena can be best described using differential equations. After linearisation we have to deal with huge linear systems of
More informationIntroduction to Parallel & Distributed Computing Parallel Graph Algorithms
Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental
More informationOn the Relationships between Zero Forcing Numbers and Certain Graph Coverings
On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,
More informationMCL. (and other clustering algorithms) 858L
MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted
More informationTreaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19
CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types
More informationStructured System Theory
Appendix C Structured System Theory Linear systems are often studied from an algebraic perspective, based on the rank of certain matrices. While such tests are easy to derive from the mathematical model,
More informationLecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions
U.C. Berkeley CS70 : Algorithms Midterm 2 Solutions Lecturers: Sanjam Garg and Prasad aghavra March 20, 207 Midterm 2 Solutions. (0 points) True/False Clearly put your answers in the answer box in front
More informationThe Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs
The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs Frankie Smith Nebraska Wesleyan University fsmith@nebrwesleyan.edu May 11, 2015 Abstract We will look at how to represent
More informationGoals! CSE 417: Algorithms and Computational Complexity!
Goals! CSE : Algorithms and Computational Complexity! Graphs: defns, examples, utility, terminology! Representation: input, internal! Traversal: Breadth- & Depth-first search! Three Algorithms:!!Connected
More informationAnalyzing the Peeling Decoder
Analyzing the Peeling Decoder Supplemental Material for Advanced Channel Coding Henry D. Pfister January 5th, 01 1 Introduction The simplest example of iterative decoding is the peeling decoder introduced
More informationAssignment 4 Solutions of graph problems
Assignment 4 Solutions of graph problems 1. Let us assume that G is not a cycle. Consider the maximal path in the graph. Let the end points of the path be denoted as v 1, v k respectively. If either of
More informationGraphs: basic concepts and algorithms
: basic concepts and algorithms Topics covered by this lecture: - Reminder Trees Trees (in-order,post-order,pre-order) s (BFS, DFS) Denitions: Reminder Directed graph (digraph): G = (V, E), V - vertex
More information2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006
2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,
More informationAtCoder World Tour Finals 2019
AtCoder World Tour Finals 201 writer: rng 58 February 21st, 2018 A: Magic Suppose that the magician moved the treasure in the order y 1 y 2 y K+1. Here y i y i+1 for each i because it doesn t make sense
More informationNetwork Motif & Triad Significance Profile Analyses On Software System
Network Motif & Triad Significance Profile Analyses On Software System Zhang Lin, Qian GuanQun,Zhang Li School of Compute Science and Engineering Beihang University 37# Xueyuan Rd, Beijing CHINA zhanglin@cse.buaa.edu.cn,qianguanqun@cse.buaa.edu.cn,lily@buaa.edu.cn
More informationCS/COE 1501 cs.pitt.edu/~bill/1501/ Graphs
CS/COE 1501 cs.pitt.edu/~bill/1501/ Graphs 5 3 2 4 1 0 2 Graphs A graph G = (V, E) Where V is a set of vertices E is a set of edges connecting vertex pairs Example: V = {0, 1, 2, 3, 4, 5} E = {(0, 1),
More informationD-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview
Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,
More informationCME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh HW#3 Due at the beginning of class Thursday 02/26/15
CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) HW#3 Due at the beginning of class Thursday 02/26/15 1. Consider a model of a nonbipartite undirected graph in which
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationCombinatorics Summary Sheet for Exam 1 Material 2019
Combinatorics Summary Sheet for Exam 1 Material 2019 1 Graphs Graph An ordered three-tuple (V, E, F ) where V is a set representing the vertices, E is a set representing the edges, and F is a function
More informationOutline. Graphs. Divide and Conquer.
GRAPHS COMP 321 McGill University These slides are mainly compiled from the following resources. - Professor Jaehyun Park slides CS 97SI - Top-coder tutorials. - Programming Challenges books. Outline Graphs.
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More information6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION
6 NEURAL NETWORK BASED PATH PLANNING ALGORITHM 61 INTRODUCTION In previous chapters path planning algorithms such as trigonometry based path planning algorithm and direction based path planning algorithm
More informationAn Introduction to Graph Theory
An Introduction to Graph Theory CIS008-2 Logic and Foundations of Mathematics David Goodwin david.goodwin@perisic.com 12:00, Friday 17 th February 2012 Outline 1 Graphs 2 Paths and cycles 3 Graphs and
More informationA Partition Method for Graph Isomorphism
Available online at www.sciencedirect.com Physics Procedia ( ) 6 68 International Conference on Solid State Devices and Materials Science A Partition Method for Graph Isomorphism Lijun Tian, Chaoqun Liu
More information2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity.
1. T F: Consider a directed graph G = (V, E) and a vertex s V. Suppose that for all v V, there exists a directed path in G from s to v. Suppose that a DFS is run on G, starting from s. Then, true or false:
More informationHYPERSPECTRAL IMAGE COMPRESSION
HYPERSPECTRAL IMAGE COMPRESSION Paper implementation of Satellite Hyperspectral Imagery Compression Algorithm Based on Adaptive Band Regrouping by Zheng Zhou, Yihua Tan and Jian Liu Syed Ahsan Ishtiaque
More informationTrees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.
Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial
More informationScanning Real World Objects without Worries 3D Reconstruction
Scanning Real World Objects without Worries 3D Reconstruction 1. Overview Feng Li 308262 Kuan Tian 308263 This document is written for the 3D reconstruction part in the course Scanning real world objects
More informationHEURISTICS optimization algorithms like 2-opt or 3-opt
Parallel 2-Opt Local Search on GPU Wen-Bao Qiao, Jean-Charles Créput Abstract To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with
More informationFrom Centrality to Temporary Fame: Dynamic Centrality in Complex Networks
From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks Dan Braha 1, 2 and Yaneer Bar-Yam 2 1 University of Massachusetts Dartmouth, MA 02747, USA 2 New England Complex Systems Institute
More informationMotif-based Classification in Journal Citation Networks
J. Software Engineering & Applications, 2008, 1: 53-59 Published Online December 2008 in SciRes (www.scirp.org/journal/jsea) Motif-based Classification in Journal Citation Networks Wenchen Wu 1, Yanni
More information1 Motivation for Improving Matrix Multiplication
CS170 Spring 2007 Lecture 7 Feb 6 1 Motivation for Improving Matrix Multiplication Now we will just consider the best way to implement the usual algorithm for matrix multiplication, the one that take 2n
More informationCOMPSCI 311: Introduction to Algorithms First Midterm Exam, October 3, 2018
COMPSCI 311: Introduction to Algorithms First Midterm Exam, October 3, 2018 Name: ID: Answer the questions directly on the exam pages. Show all your work for each question. More detail including comments
More informationNeMo: Fast Count of Network Motifs
NeMo: Fast Count of Network Motifs Michel KOSKAS 1, Gilles GRASSEAU 2, Etienne BIRMELÉ 2, Sophie SCHBATH 3 and Stéphane ROBIN 1 1 UMR 518 AgroParisTech / INRA, 16 rue C. Bernard, 75005, Paris, France {michel.koskas,stephane.robin}@agroparistech.fr
More informationIntroduction to Algorithms
6.006- Introduction to Algorithms Lecture 13 Prof. Constantinos Daskalakis CLRS 22.4-22.5 Graphs G=(V,E) V a set of vertices Usually number denoted by n E VxV a set of edges (pairs of vertices) Usually
More informationEfficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid
Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of
More informationCHAPTER 2. Graphs. 1. Introduction to Graphs and Graph Isomorphism
CHAPTER 2 Graphs 1. Introduction to Graphs and Graph Isomorphism 1.1. The Graph Menagerie. Definition 1.1.1. A simple graph G = (V, E) consists of a set V of vertices and a set E of edges, represented
More informationData Mining in Bioinformatics Day 5: Frequent Subgraph Mining
Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes
More informationReview: Graph Theory and Representation
Review: Graph Theory and Representation Graph Algorithms Graphs and Theorems about Graphs Graph implementation Graph Algorithms Shortest paths Minimum spanning tree What can graphs model? Cost of wiring
More informationInheritance Metrics: What do they Measure?
Inheritance Metrics: What do they Measure? G. Sri Krishna and Rushikesh K. Joshi Department of Computer Science and Engineering Indian Institute of Technology Bombay Mumbai, 400 076, India Email:{srikrishna,rkj}@cse.iitb.ac.in
More informationHere, we present efficient algorithms vertex- and edge-coloring graphs of the
SIAM J. ALG. DISC. METH. Vol. 7, No. 1, January 1986 (C) 1986 Society for Industrial and Applied Mathematics 016 EFFICIENT VERTEX- AND EDGE-COLORING OF OUTERPLANAR GRAPHS* ANDRZEJ PROSKUROWSKIf AND MACIEJ
More information