Supporting Information

Similar documents
arxiv: v1 [physics.soc-ph] 22 Oct 2018

Structure of biological networks. Presentation by Atanas Kamburov

Overlay (and P2P) Networks

Random Generation of the Social Network with Several Communities

M.E.J. Newman: Models of the Small World

Using graph theoretic measures to predict the performance of associative memory models

From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks

CAIM: Cerca i Anàlisi d Informació Massiva

Specific Communication Network Measure Distribution Estimation

Summary: What We Have Learned So Far

Complex Networks. Structure and Dynamics

SD 372 Pattern Recognition

Higher order clustering coecients in Barabasi Albert networks

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

MODELS FOR EVOLUTION AND JOINING OF SMALL WORLD NETWORKS

CS-E5740. Complex Networks. Scale-free networks

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

CSCI5070 Advanced Topics in Social Computing

Wednesday, March 8, Complex Networks. Presenter: Jirakhom Ruttanavakul. CS 790R, University of Nevada, Reno

EFFECT OF VARYING THE DELAY DISTRIBUTION IN DIFFERENT CLASSES OF NETWORKS: RANDOM, SCALE-FREE, AND SMALL-WORLD. A Thesis BUM SOON JANG

Incoming, Outgoing Degree and Importance Analysis of Network Motifs

Critical Phenomena in Complex Networks

Erdős-Rényi Model for network formation

Attack vulnerability of complex networks

Learning a Manifold as an Atlas Supplementary Material

Convex skeletons of complex networks

arxiv:cond-mat/ v3 [cond-mat.dis-nn] 30 Jun 2005

Properties of Biological Networks

Representation of 2D objects with a topology preserving network

Complex networks Phys 682 / CIS 629: Computational Methods for Nonlinear Systems

Free head motion eye gaze tracking using a single camera and multiple light sources

Unsupervised learning in Vision

Research on Community Structure in Bus Transport Networks

Memory As an Organizer of Dynamic Modules In a Network of Potential Interactions

The role of Fisher information in primary data space for neighbourhood mapping

Applying Social Network Analysis to the Information in CVS Repositories

Exercise set #2 (29 pts)

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Package fastnet. September 11, 2018

A BRIEF SURVEY ON FUZZY SET INTERPOLATION METHODS

Package fastnet. February 12, 2018

Supplementary Figure 1. Decoding results broken down for different ROIs

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

A New Online Clustering Approach for Data in Arbitrary Shaped Clusters

Locality-sensitive hashing and biological network alignment

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

An Adaptive Eigenshape Model

Response Network Emerging from Simple Perturbation

Supplementary Information for: Global similarity and local divergence in human and mouse gene co-expression networks

arxiv:cond-mat/ v1 21 Oct 1999

Motif Iteration Model for Network Representation

Models of Network Formation. Networked Life NETS 112 Fall 2017 Prof. Michael Kearns

Role of modular and hierarchical structure in making networks dynamically stable

Application of Characteristic Function Method in Target Detection

Topic II: Graph Mining

Network Traffic Measurements and Analysis

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

Characteristics of Preferentially Attached Network Grown from. Small World

Patch-based Object Recognition. Basic Idea

Small World Graph Clustering

Clustering Part 4 DBSCAN

Exploiting the Scale-free Structure of the WWW

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool.

arxiv:cond-mat/ v5 [cond-mat.dis-nn] 16 Aug 2006

Dynamic Aggregation to Support Pattern Discovery: A case study with web logs

An Empirical Analysis of Communities in Real-World Networks

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Artificial Neural Networks Unsupervised learning: SOM

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

How Do Real Networks Look? Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

FACE RECOGNITION USING INDEPENDENT COMPONENT

SI Networks: Theory and Application, Fall 2008

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

10-701/15-781, Fall 2006, Final

Schedule for Rest of Semester

ECE 158A - Data Networks

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Volume 2, Issue 11, November 2014 International Journal of Advance Research in Computer Science and Management Studies

An Evolving Network Model With Local-World Structure

Introduction to network metrics

Exploring Curve Fitting for Fingers in Egocentric Images

Simplicial Complexes of Networks and Their Statistical Properties

Specific Communication Network Measure Distribution Estimation

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Second-Order Assortative Mixing in Social Networks

Onset of traffic congestion in complex networks

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 3 Aug 2000

University of Florida CISE department Gator Engineering. Clustering Part 4

Master s Thesis. Title. Supervisor Professor Masayuki Murata. Author Yinan Liu. February 12th, 2016

RANDOM-REAL NETWORKS

Universal Behavior of Load Distribution in Scale-free Networks

Clustering and Visualisation of Data

Smarter Balanced Vocabulary (from the SBAC test/item specifications)

Collaboration networks and innovation in Canada s ICT Hardware Cluster. Catherine Beaudry and Melik Bouhadra Polytechnique Montréal

CS6716 Pattern Recognition

Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering

Road Sign Visualization with Principal Component Analysis and Emergent Self-Organizing Map

Math 7 Glossary Terms

Transcription:

Supporting Information C. Echtermeyer 1, L. da F. Costa 2, F. A. Rodrigues 3, M. Kaiser *1,4,5 November 25, 21 1 School of Computing Science, Claremont Tower, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK 2 Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, PO Box 369, 135-97 São Carlos, São Paulo, Brazil 3 Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, PO Box 668, 135-97 São Carlos, São Paulo, Brazil 4 Institute of Neuroscience, The Medical School, Framlington Place, Newcastle University, Newcastle-upon-Tyne NE2 4HH, UK 5 Department of Brain and Cognitive Sciences, Seoul National University, Seoul 151-746, Korea * Corresponding author. Electronic address: m.kaiser@newcastle.ac.uk Contents 1 Supplementary Figures 2 2 Notes on the Software-Implementation 6 2.1 Kernel-Bandwidth.................................... 6 2.2 Number of Singular Nodes w.............................. 6 2.3 Number of Motif Groups k............................... 6 3 Run-Time Complexity 6 4 A Small-World Emerging - Detailed Result Discussion 7 List of Figures S1 Run-time complexity of local network measures.................... 2 S2 Re-wiring process generating a small-world network.................. 2 S3 Graphical user interface for the BtA-workflow..................... 3 S4 Contour plots of PDFs estimated using two different Gaussian kernels....... 4 S5 Density functions of 2D-Gaussian with different correlation............. 5 1

1 Supplementary Figures 2.5 ER WS BA α 2 1.5 1% 25% edge-density 5% 1 r cv cc loc cc 2 K Figure S1: Run-time complexity O(n α ) of the used local measures (r, cv, cc, loc, cc 2, K) increases polynomially in network size n (average value for networks). Growth determining exponent α depends on edge-density (1%, 25%, 5%) and to lesser extent on network model (ER, WS, BA). 2 2 5 2 2 8 8 8 8 12 12 12 12 1 1 1 1 1 1 1 1 18 18 18 18 5 15 nz = 1 5 15 nz = 1 5 15 nz = 1 5 15 nz = 1 2 3 2 2 5 2 8 8 8 8 12 12 12 12 1 1 1 1 1 1 1 1 18 18 18 18 5 15 nz = 1 5 15 nz = 1199 5 15 nz = 1199 5 15 nz = 1199 Figure S2: Adjacency matrix and belonging network during rewiring process as described by Watts and Strogatz [14] (number of steps in upper right corner; inset: network representation with nodes arranged on a circle). Beginning with a perfectly regular ring lattice ( nodes) where each node is linked to its 6 closest neighbours (upper left), nodes are visited successively (one per step) and connections are randomly rewired with a probability of %. On the k th visit to a node, it is the link to the k th neighbour on the right, which is potentially rewired. After steps (lower right) every node has been visited three times and on average % of all links have changed. 2

a b c sorted node probabilities mean mean - SD relative differences d e f Figure S3: Graphical user interface for the BtA-workflow: a Nodes mapped to PCA-plane where their probability is coded by colour. b Sorted node probabilities and relative differences. Red and green colour indicates singular and regular nodes, respectively. Mean probability indicated by black line; blue line marks mean minus one standard deviation. Stems (cyan) indicate relative differences between their two adjacent probabilities. c Manual workflow-parameter control and options for result export. By default, changed settings show immediate effect in all plots (a,b,d f). d Contour plot of PDF with reduced feature vectors superimposed, whose colour indicates whether they are classified regular or singular. e PCA-plane (rescaled by standard deviations) showing differently coloured motif groups. f Bar plot showing the relative frequency for each motif-region. A brief characterisation of each motif is given above its bar. 3

a b PC1 PC2 PC2 PC1 Figure S4: Contour plots of PDFs estimated using two different Gaussian kernels (upper left inlet). a Identical kernel bandwidth in both dimensions. With this symmetric kernel, the estimated PDF shows broad spreading of probability mass along vertical axis. b Kernel bandwidths scaled according to standard deviation along corresponding PC-axis. This adopted kernel results in a thinner and better matching PDF. 4

a ρ =. b ρ =.6 c ρ =.9 Figure S5: Density functions of 2D-Gaussian (σ x = σ y = 1) with different correlation (ρ =.,.6,.9). a Uniformity of probability mass distribution around centre (µ x = µ y = ) without correlation. b Gradually increasing correlation leads to tilting and c concentration of probability mass along the diagonal. 5

2 Notes on the Software-Implementation Our implementation of the workflow including the automatic parameter determination is publicly available (http://www.biological-networks.org/). Below we briefly mention the workflowalternatives of the software. 2.1 Kernel-Bandwidth By default, the kernel-bandwidth is scaled according to the standard deviation along each principal component (PC) axis. Variability-based re-shaping of the kernel function improves the overall fit of the PDF to the points (Fig. S4). The kernel can also be made symmetric by deactivating the tick-box below the panel shown in Fig. S3a. 2.2 Number of Singular Nodes w By default, our implementation of the workflow chooses the number of singular nodes w according to equation (1). This setting can be overwritten by the user, who is provided with a plot of all nodes probabilities together with their relative differences (Fig. S3b). Manually chosen values for w can thereby be easily related to the default setting. 2.3 Number of Motif Groups k Our implementation of the workflow provides 3 alternatives to determine k: By default motifgroups are determined deterministically through cliques of overlapping ellipses, as illustrated in Fig. 4. The user can also choose to determine the number of clusters using the ellipses, but perform clustering with k-means++. As the last alternative, k-means++ can be applied with a customised number of motif-groups. 3 Run-Time Complexity The bulk of the runtime of the BtA-workflow is spent on step 1 where all selected local network measures are computed. Run-time complexity here depends on the measures that are chosen to characterise each node. We estimated how computational costs scale for six common local measures [4]. Like Costa et al. [5], we selected the normalised average degree r, the coefficient of variation of the degrees of the immediate neighbours of a node cv, the clustering coefficient cc [7, 14], the locality index loc, the hierarchical clustering coefficient of level two cc 2 [3], and the normalised node degree K. These measures have been applied to random networks, which have been generated according to the Erdős-Rényi (ER) [6], Watts-Strogatz (WS) [14], and Barabási and Albert (BA) [1] model. A polynomial function was fitted (root mean square error) to the average run-times to determine their dependence on network size. Additional to the size of the network, its edge density might also affect run-time, which is why we repeated the process while varying sparseness 1. The results show relatively stable growth rates, irrespective of network model or connection density: Our naïve implementations of the six measures show run-time complexities ranging from linear to less than cubic (Fig. S1). Costs are thus comparatively cheap considering methods that identify specific connectivity patterns by counting occurrences of particular subgraphs (e.g. [2, 8 11, 13]); such motif-counts also scale at least linearly in network size, but they show exponentially growing costs as the size of the motif-pattern increases [8]. In practice this often means that counts can not be determined for patterns involving 1 nodes or more [12], which renders some domains computationally intractable for this approach. In these cases the BtA-methodology might still be applicable: Local networks measures that only scale polynomially are comparatively fast to calculate and exceptional network characteristics can therefore even be identified in very large networks. 1 For each random network-model (ER, BS, WS) any combination of network size (n = 1,..., nodes) and edge density (1%, 25%, and 5%) has been evaluated times. 6

4 A Small-World Emerging - Detailed Result Discussion In total we identified 5 singular node motifs, which differ in frequency and time of emergence (Fig. 3c): Motifs 2, 3 and 5 appear right from the beginning of the rewiring process; motifs 2 and 5 gradually become more common over time, whereas 3 levels out after a transient peak. The remaining motifs 1 and especially 4 only become apparent at later stages towards which both become more frequent. The motifs temporally dependent expression levels can be understood by looking at their individual characteristics (Fig. 3d): 1. A node according to motif 1 has relatively few connections in contrast to its well connected neighbourhood. Other nodes that were initially linked to it have rewired themselves and because connections only change in % of the cases, motif 1 is rarely observed in early stages. 2. This contrasts the early appearance of motif 2 for which corresponding nodes are signified by many connections to a rather sparsely connected neighbourhood. From the starting point of a ring lattice such configuration occurs, as re-linking one of the initial regular connections destroys the local neighbourhood structure; if multiple nodes re-wire to the same target its degree grows, which makes the node a candidate for motif 2. 3. Motif 3-nodes have relatively few connections and nodes in their neighbourhood are similar in number of links and corresponding targets. This characterisation fits nodes linked to others that have been disconnected from the direct neighbours only. Such is likely to be observed during the first steps of the rewiring process, where links to the closest neighbour are replaced, which is in agreement with motif 3 s early peak in frequency. Later, when connections to further away neighbours are lost, the locality index decreases and fewer nodes fulfil the profile of motif 3. 4. The 4 th motif mostly starts to appear when nodes are visited for the third time and some of the longest initial connections are replaced. At these late stages the ring lattice has undergone substantial perturbation, such that nodes differ widely in their degree and interconnectivity. Motif 4 describes rarely connected nodes whose neighbours have a diverse number of connections; but instead of being linked between each other, neighbours share other common targets. 5. The final motif 5 can be best characterised by its relation to the rest of the network, which shows a higher degree of connectivity than any node involved in the motif. Neighbours of the motif-node further vary in their number of connections and do not link to each other. This motif emerges early on, but its frequency rises more quickly during the last re-wiring-pass. During that time the last initial links are broken up and motif 5 emerges, as more parts of the network finally become sparse enough. 7

References [1] Albert-László Barabási and Réka Albert. Emergence of Scaling in Random Networks. Science, 286(5439):59 12, 1999. [2] Ilaria Bordino, Debora Donato, Aristides Gionis, and Stefano Leonardi. Mining Large Networks with Subgraph Counting. In 8th IEEE International Conference on Data Mining (ICDM). IEEE, 8. ISBN 978--7695-352-9. [3] Luciano Da Fontoura Costa and Filipi Nascimento Silva. Hierarchical Characterization of Complex Networks. Journal of Statistical Physics, 125(4):841 76, 6. [4] Luciano Da Fontoura Costa, Francisco A. Rodrigues, G. Travieso, and P. R. Villas Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, 56 (1):167 242, 7. [5] Luciano Da Fontoura Costa, Francisco A. Rodrigues, Claus C. Hilgetag, and Marcus Kaiser. Beyond the average: Detecting global singular nodes from local features in complex networks. Europhysics Letters, 87(July):188, 9. [6] P. Erdös and A. Rényi. On Random Graphs I. Publ. Math. (Debrecen), 6:29 7, 1959. [7] Marcus Kaiser, Matthias Goerner, and Claus C. Hilgetag. Criticality of spreading dynamics in hierarchical cluster networks without inhibition. New Journal of Physics, 9:11, 7. [8] N Kashtan, S Itzkovitz, R Milo, and U. Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 2(11):1746 58, 4. [9] M Kuramochi and G Karypis. Frequent Subgraph Discovery. In Proceedings of the 1 IEEE International Conference on Data Mining, pages 313 2. IEEE Computer Society, 1. ISBN -7695-1119-8. [1] Manuel Middendorf, Etay Ziv, and Chris H. Wiggins. Inferring network mechanisms: the Drosophila melanogaster protein interaction network. Proceedings of the National Academy of Sciences of the United States of America, 12:3192 7, 5. [11] R Milo, S. Shen-Orr, S Itzkovitz, N Kashtan, D Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824 7, 2. [12] Pedro Ribeiro, Fernando Silva, and Marcus Kaiser. Strategies for Network Motifs Discovery. In 5th IEEE International Conference on e-science, pages 8 7. IEEE, 9. ISBN 978-- 7695-3877-8. [13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, 1994. ISBN 52138778. [14] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of small-world networks. Nature, 393:4 2, 1998. 8