Introduction to network metrics

Similar documents
Basics of Network Analysis

CAIM: Cerca i Anàlisi d Informació Massiva

Summary: What We Have Learned So Far

Complex-Network Modelling and Inference

Machine Learning and Modeling for Social Networks

Networks in economics and finance. Lecture 1 - Measuring networks

V2: Measures and Metrics (II)

arxiv:cond-mat/ v5 [cond-mat.dis-nn] 16 Aug 2006

Graph-theoretic Properties of Networks

Characteristics of Preferentially Attached Network Grown from. Small World

Graph Theory for Network Science

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

beyond social networks

1 Homophily and assortative mixing

CS-E5740. Complex Networks. Network analysis: key measures and characteristics

Critical Phenomena in Complex Networks

1 More configuration model

An introduction to the physics of complex networks

ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT

Topic II: Graph Mining

Introduction to Complex Networks Analysis

Higher order clustering coecients in Barabasi Albert networks

Deterministic small-world communication networks

Overlay (and P2P) Networks

Second-Order Assortative Mixing in Social Networks

arxiv:cond-mat/ v3 [cond-mat.dis-nn] 30 Jun 2005

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Complex networks Phys 682 / CIS 629: Computational Methods for Nonlinear Systems

M.E.J. Newman: Models of the Small World

SNA 8: network resilience. Lada Adamic

Exercise set #2 (29 pts)

Extracting Information from Complex Networks

TELCOM2125: Network Science and Analysis

Graph Theory for Network Science

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

Centrality Book. cohesion.

Topic mash II: assortativity, resilience, link prediction CS224W

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Wednesday, March 8, Complex Networks. Presenter: Jirakhom Ruttanavakul. CS 790R, University of Nevada, Reno

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

THE DEPENDENCY NETWORK IN FREE OPERATING SYSTEM

Universal Properties of Mythological Networks Midterm report: Math 485

caution in interpreting graph-theoretic diagnostics

CSCI5070 Advanced Topics in Social Computing

TELCOM2125: Network Science and Analysis

Analytical reasoning task reveals limits of social learning in networks

Erdős-Rényi Model for network formation

The Mathematical Description of Networks

NETWORK ANALYSIS. Duygu Tosun-Turgut, Ph.D. Center for Imaging of Neurodegenerative Diseases Department of Radiology and Biomedical Imaging

How Do Real Networks Look? Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

Network Thinking. Complexity: A Guided Tour, Chapters 15-16

Impact of Clustering on Epidemics in Random Networks

The network of scientific collaborations within the European framework programme

- relationships (edges) among entities (nodes) - technology: Internet, World Wide Web - biology: genomics, gene expression, proteinprotein

A Document as a Small World. selecting the term which co-occurs in multiple clusters. domains, from earthquake sequences [6] to register

Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows arxiv: v1 [physics.soc-ph] 12 May 2008

Immunization for complex network based on the effective degree of vertex

Response Network Emerging from Simple Perturbation

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

Signal Processing for Big Data

THE KNOWLEDGE MANAGEMENT STRATEGY IN ORGANIZATIONS. Summer semester, 2016/2017

Introduction to Networks and Business Intelligence

Nick Hamilton Institute for Molecular Bioscience. Essential Graph Theory for Biologists. Image: Matt Moores, The Visible Cell

Statistical analysis of the airport network of Pakistan

Path Length. 2) Verification of the Algorithm and Code

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

ECE 158A - Data Networks

Disclaimer. Lect 2: empirical analyses of graphs

First-improvement vs. Best-improvement Local Optima Networks of NK Landscapes

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Plan of the lecture I. INTRODUCTION II. DYNAMICAL PROCESSES. I. Networks: definitions, statistical characterization, examples II. Modeling frameworks

Heuristics for the Critical Node Detection Problem in Large Complex Networks

CS-E5740. Complex Networks. Scale-free networks

Modelling data networks research summary and modelling tools

Mathematics of Networks II

Social Data Management Communities

Using graph theoretic measures to predict the performance of associative memory models

An Evolving Network Model With Local-World Structure

arxiv: v1 [physics.soc-ph] 13 Jan 2011

Smallest small-world network

Failure in Complex Social Networks

Community Structure and Beyond

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Modeling and Simulating Social Systems with MATLAB

Social Networks. Slides by : I. Koutsopoulos (AUEB), Source:L. Adamic, SN Analysis, Coursera course

Constructing a G(N, p) Network

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

Small World Graph Clustering

Structure of Social Networks

Random graph models with fixed degree sequences: choices, consequences and irreducibilty proofs for sampling

Characterization of complex networks: A survey of measurements

1 Degree Distributions

Empirical analysis of online social networks in the age of Web 2.0

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Transcription:

Universitat Politècnica de Catalunya Version 0.5 Complex and Social Networks (2018-2019) Master in Innovation and Research in Informatics (MIRI)

Instructors Argimiro Arratia, argimiro@cs.upc.edu, http://www.cs.upc.edu/~argimiro/ Marta Arias, marias@cs.upc.edu, http://www.cs.upc.edu/~marias/ Please go to http://www.cs.upc.edu/~csn for all course s material, schedule, lab work, etc.

Network analysis Two major approaches: visual and statistical analysis (e.g., large scale properties). (from Webopedia) Statistical analysis: compression of information (e.g., one value that summarizes some aspect of the network).

Perspectives Metrics as compression of an adjacency matrix. Three perspectives: Distance between nodes. Transitivity Mixing (properties of vertices making an edge).

Geodesic path Geodesic path between two vertices u and v = shortest path between u and v [Newman, 2009] d ij : length of a geodesic path from the i-th to the j-th vertex (network or topological distance between i and j). dij = 1 if i and j are connected. dij = if i and j are in different connected components. Computed with a breadth-first search algorithm (in unweighted undirected networks).

Local distance measures l i : mean geodesic distance from vertex i Definitions: l i = 1 N d ij N l i = 1 N 1 j=1 N j=1(i j) C i : closeness centrality of vertex i. Definition (harmonic mean) C i = 1 N 1 as d ii = 0. Better than C i = 1/l i. or d ij as d ii = 0 N j=1(i j) 1 d ij,

Global distance metrics Diameter: largest geodesic distance. Mean (geodesic distance): l = 1 N N i=1 l i Problem: l might be. Solutions: focus on the largest connected component, mean over l within each connected component,... Mean closeness centrality: C = 1 N N i=1 C i

Global distance metrics Closeness measures have rarely been used (for historical reasons). The closeness centrality of a vertex can be seen as measure of the importance of a vertex (alternative approaches: degree, PageRank,...).

Transitivity Zachary s Karate Club A relation is transitive if a b and b c imply a c. Example: a b = a and b are friends. Edges as relations. Perfect transitivity: clique (complete graph) but real network are not cliques. Big question: how transitive are (social) networks?

Clustering coefficient A path of length two uvw is closed if u and w are connected. number of closed paths of length 2 C = number of paths of length 2 A proportion of transitive triples C = 1 perfect transitivity / C = 0 no transitivity (e.g.,:?). Algorithm: Consider each vertex as v in the path uvw, checking if u and w are connected (only vertices of degree 2 matter). Number of paths of length 2 =?. Equivalently: C = number of triangles 3 number of connected triples of vertices Key: triangle = set of three nodes forming a clique; number of connected triples = number of labelled trees of 3 vertices

Alternative clustering coefficient Watts & Strogatz (WS) clustering coefficient [Watts and Strogatz, 1998] Local clustering: C i = number of pairs of neighbors of i that are connected number of pairs of neighbours of i Assuming undirected graph without loops: N j 1 j=1 k=1 C i = a ija ik a jk ) Global clustering: C WS = 1 N ( ki 2 N i=1 C i

Comments on clustering coefficients I Given a network, C and C WS can differ substantially. C WS has been used very often for historical reasons (C WS was proposed first). C is can be dominated by the contribution of vertices of high degree (which have many adjancent nodes). C WS is can be dominated by the contribution of vertices of low degree (which are many in the majority of networks). C WS needs taking further decision on C i when k i < 2 (C is more elegant from a mathematical point of view).

Comments on clustering coefficients II Conclusion 0: C and C WS measure transitivity in different ways (different assumptions/goals). Conclusion 1: each measure has its strengths and weaknesses. Conclusion 2: explain your methods with precision!

Comments on efficient computation Computational challenge: time consuming computation of metrics on large networks. Solution: Monte Carlo methods for computing. Instead of computing N C WS = 1 N i=1 C i estimate C WS from a mean of C i over a small fraction of randomly selected vertices. High precision exploring a small fraction of nodes (e.g., 5%).

Degree correlations I What is the dependency between the degrees of vertices at both ends of an edge? Assortative mixing (by degree): high degree nodes tend to be connected to high degree nodes, typical of social networks (coauthorship in physics, film actor collaboration,...). Disassortative mixing (by degree): high degree nodes tend to be connected to low degree nodes, e.g., neural network (C. Elegans), ecological networks (trophic relations). No tendency (e.g., Erdös-Rényi graph, Barabási-Albert model).

Degree correlations II k i : degree of the i-th vertex. k i = k i 1: remaining degree of the i-th after discounting the edge i j. Correlation correlation between k i and k j for every edge i j. correlation between k i and k j for every edge i j. metric ρ: 1 ρ 1.

Interclass correlation Theoretical (interclass) correlation: ρ(x, Y ) = COV (X, Y ) σ X σ Y = E[(X E[X ])(Y E[Y ])] σ X σ Y = E[XY ] E[X ]E[Y ] σ X σ Y Symmetry: ρ(x, Y ) = ρ(y, X ), ρ S (X, Y ) = ρ S (Y, X ). Empirical correlation: Paired mesurements: (x 1, y 1 ),...,(x i, y i ),...,(x n, y n ). Sample (interclass) correlation: ρ s (X, Y ) = n i=1 (x i x)(y i ȳ) n i=1 (x i x) 2 n i=1 (y i x) 2

Intraclass correlation Theoretical intraclass correlation: Empirical correlation: ρ = COV intra(x ) σ(x ) 2 Paired measurements: (x 1,1, x 1,2 ),...,(x i,1, x i,2 ),...,(x n,1, x n,2 ) σ 2 s = 1 ρ s = (N 1)σs 2 x = 1 2N 1 2(N 1) n (x i,1 x)(x i,2 x) i=1 n (x i,1 + x i,2 ) i=1 n [ (xi,1 x) 2 + (x i,2 x) 2] i=1

Interclass vs intraclass correlation Interclass correlation: Correlation between two variables. Intraclass correlation: Correlation between two different groups (same variable) Extent to which members of the same group or class tend to act alike.

Degree correlations III Intraclass Pearson degree correlation: in an edge i j, X = k i and Y = k j [Newman, 2002]. Three possibilities Assortative mixing (by degree): ρ > 0, ρ s 0 Disassortative mixing (by degree): ρ < 0, ρ s 0 No tendency ρ = 0, ρ s 0 See Table I of [Newman, 2002] arxiv.org.

General comments on degree correlations I A priori, a least two ways of measuring degree correlations: X = ki and Y = k j (Pearson correlation coefficient) X = rank(ki ) and Y = rank(k j ) (Spearman rank correlation) rank(k): the smallest k has rank 1, the 2nd smallest k has rank 2 and so on. In case of tie, the degrees in a tie are assigned a mean rank. Example: Sorted degrees 1 3 5 6 6 6 8 4+5+6 4+5+6 The ranks are 1 2 3 3 3 4+5+6 3 7

General comments on degree correlations II For historical and sociological reasons, Pearson correlation coefficient has been dominant if not the only approach. A test of significance of ρ S has been missing (potentially problematic for ρ S close to 0). Spearman rank correlation can capture non-linear dependencies. Both can fail if the dependency is not monotonic.

General comments on degree correlations II Some general myths about correlations: ρ S must be large to be informative (e.g. ρ S > 0.5). A low value of ρ S can be significant (very small p-value). Rigorous testing is the key. Low but significant ρs can be due to: trends with lots of noise, or clear trends in a narrow domain. No useful information can be extracted from clouds of points. Counterexamples: Vietnam draft (see pp. 248-249 of Gnuplot in action, by Phillipp K. Janert). Menzerath s law in genomes.

General comments on degree correlations III The limits of degree correlations Degree correlations are global measures. The kind of mixing of a vertex might depend on its degree. Solution: The mean degree of nearest neighbours of degree k, i.e. k nn (k) An estimate of E[k k] = k k p(k k), the expected degree k of 1st neighbours (adjacent nodes) of a node of degree k. [Lee et al., 2006]. Statistical properties of sampled networks. Fig. 10 of arxiv.org / Fig. 9 of doi: 10.1103/PhysRevE.73.016102

References I Lee, S. H., Kim, P.-J., and Jeong, H. (2006). Statistical properties of sampled networks. Phys. Rev. E, 73:016102. Newman, M. (2009). Networks: an introduction. Oxford University Press. Newman, M. E. J. (2002). Assortative mixing in networks. Phys. Rev. Lett., 89:208701. Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of small-world networks. Nature, 393(6684):440.