Signal Processing for Big Data

Sergio Barbarossa

Summary
1. Networks
2. Algebraic graph theory
3. Random graph models
4. Operations on graphs

Networks
The simplest way to represent the interactions between different entities (machines, agents, people, ...) is a graph.
A graph is composed of vertices and edges connecting pairs of vertices.
Algebraic graph theory is a powerful framework for extracting network features from a graph.

Networks
More complex representations of interactions are hypergraphs and simplicial complexes, which incorporate more information than pairwise relations alone.

Networks
Examples
1. Technological networks
1.1 Internet
The vertices are routers.
The edges are physical links (fiber optic, wireless link, ...).

Networks
Examples
1. Technological networks
1.2 Power grid
The vertices are generating stations and switching substations.
The edges are high-voltage transmission lines.
[Figure: spatial distribution of load on the European power grid]

Networks
Examples
2. Information networks
2.1 World Wide Web
The vertices are webpages.
The edges are hyperlinks between pages.

Networks
Examples
2. Information networks
2.2 Citation networks
The vertices are papers or disciplines.
The edges represent citations.
Curiosity: Erdős number

Networks
Examples
3. Biological networks - Gene regulatory networks (GRN)
The vertices are proteins or the genes that code for them.
A directed edge from A to B indicates that A regulates the expression of B.
In a GRN, a gene may either promote or inhibit a transcription factor.
15/10/17 Signal Processing for Big Data

Networks
Examples
3. Biological networks - Gene regulatory networks (GRN)
Example: identifying the GRN that includes the protein p53 helped to uncover cancer-inducing mechanisms.
p53 plays a key role in a series of chemical reactions involved in DNA repair, cell apoptosis and cell cycle arrest.
A mutation of p53 induces a series of undesired behaviors.

Algebraic graph theory
Consider a graph with $N$ vertices and $E$ edges.
Adjacency matrix $A$ ($N \times N$): $a_{ij} = 1$ if there is an edge between node $i$ and node $j$, otherwise $a_{ij} = 0$
Degree matrix $D$ ($N \times N$): diagonal matrix with $d_{ii} = \sum_{j=1}^{N} a_{ij}$
Incidence matrix $B$ ($N \times E$): $B_{ij} = 1$ if vertex $i$ is the tail of edge $j$; $B_{ij} = -1$ if vertex $i$ is the head of edge $j$; $B_{ij} = 0$ otherwise
Laplacian matrix $L$ ($N \times N$): $L = D - A = B B^T$
Edge Laplacian $L_e$ ($E \times E$): $L_e = B^T B$
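The definitions above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a hypothetical 5-vertex undirected graph whose edges are given an arbitrary orientation (tail, head) so that the incidence matrix is defined; the edge list and variable names are illustrative only.

```python
import numpy as np

# Hypothetical example graph: 5 vertices (0-based), 7 edges.
# Each edge is written as (tail, head); the orientation is arbitrary.
edges = [(0, 1), (0, 2), (1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
N, E = 5, len(edges)

A = np.zeros((N, N), dtype=int)  # adjacency matrix
B = np.zeros((N, E), dtype=int)  # incidence matrix
for j, (tail, head) in enumerate(edges):
    A[tail, head] = A[head, tail] = 1
    B[tail, j] = 1    # vertex is the tail of edge j
    B[head, j] = -1   # vertex is the head of edge j

D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # graph Laplacian
L_e = B.T @ B               # edge Laplacian (E x E)

# For an undirected graph, D - A and B B^T coincide.
print(np.array_equal(L, B @ B.T))
```

The orientation chosen for each edge changes the signs in $B$ but not the product $B B^T$.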

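Algebraic graph theory
Example (graph with vertices 1, ..., 5)
$$A = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{bmatrix}, \qquad L = \begin{bmatrix} 2 & -1 & -1 & 0 & 0 \\ -1 & 3 & -1 & 0 & -1 \\ -1 & -1 & 4 & -1 & -1 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & -1 & -1 & -1 & 3 \end{bmatrix}$$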

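Algebraic graph theory
Example (directed graph with vertices 1, ..., 5)
$$B = \begin{bmatrix} \pm 1 & \pm 1 & 0 & 0 & 0 & 0 & 0 \\ \pm 1 & 0 & \pm 1 & \pm 1 & 0 & 0 & 0 \\ 0 & \pm 1 & \pm 1 & 0 & \pm 1 & \pm 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & \pm 1 & \pm 1 \\ 0 & 0 & 0 & \pm 1 & \pm 1 & 0 & \pm 1 \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$$
Each column of $B$ contains one $+1$ (tail) and one $-1$ (head), set by the orientation of the corresponding edge.
Note: for a directed graph (digraph), $L = D - A \neq B B^T$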

Algebraic graph theory
Properties
- The total number of paths of length $k$ between two nodes $i$ and $j$ (walks, i.e. vertices may repeat) is $[A^k]_{ij}$
- The total number of loops of length $k$ starting from node $i$ is $[A^k]_{ii}$
- The total number of loops of length $k$ is $\mathrm{tr}(A^k)$
- The number of triangles in a graph is $\mathrm{tr}(A^3)/6$
- By construction, $L \mathbf{1} = \mathbf{0}$, hence $\mathbf{1}$ is an eigenvector of $L$ associated with the zero eigenvalue
Given a vector $x$ defined over the vertices of a graph, the disagreement is
$$x^T L x = \sum_{(u,v) \in \mathcal{E}} (x_u - x_v)^2$$
where $\mathcal{E}$ is the edge set.
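These counting properties are easy to verify on a small graph. The sketch below assumes a 5-vertex adjacency matrix chosen for illustration and an arbitrary test vector `x`; it compares $\mathrm{tr}(A^3)/6$ with a brute-force triangle count and the quadratic form $x^T L x$ with the sum over edges.

```python
import numpy as np
from itertools import combinations

# Illustrative 5-vertex undirected graph (0-based indices).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]])
N = A.shape[0]
L = np.diag(A.sum(axis=1)) - A

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

# [A^2]_{ij}: number of length-2 walks from vertex i to vertex j.
walks_len2_0_to_3 = A2[0, 3]

# Triangles: each one is counted 6 times in tr(A^3)
# (3 starting vertices times 2 directions).
triangles = np.trace(A3) // 6
triangles_brute = sum(
    1 for i, j, k in combinations(range(N), 3)
    if A[i, j] and A[j, k] and A[i, k]
)

# Disagreement: quadratic form versus explicit sum over edges.
x = np.array([1.0, -2.0, 0.5, 3.0, 0.0])  # arbitrary vertex signal
quad_form = x @ L @ x
edge_sum = sum((x[u] - x[v]) ** 2
               for u, v in combinations(range(N), 2) if A[u, v])

print(triangles, triangles_brute, np.isclose(quad_form, edge_sum))
```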

Algebraic graph theory
Properties
If $G$ is a graph with $c$ connected components, then $\mathrm{rank}(B) = N - c$.
Sketch of the proof: consider the (left) null space of $B$, i.e. the vectors $z$ with $z^T B = 0$. If $(u, v)$ is an edge of the graph, then $z_u - z_v = 0$, so $z$ is constant over each connected component. The number of independent vectors $z$ is therefore $c$, which is the dimension of the left null space of $B$.
Equivalently, if $G$ is a graph with $c$ connected components, then $\mathrm{rank}(L) = N - c$.

Algebraic graph theory
Properties
Let us denote by $\lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ the eigenvalues of $L$
- By construction, the minimum eigenvalue of $L$ is $\lambda_1 = 0$
- The eigenvector associated with $\lambda_1 = 0$ is composed of all ones
- The multiplicity of $\lambda_1 = 0$ equals the number of connected components
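As a quick numerical illustration of the rank and multiplicity statements, the sketch below builds a hypothetical graph with two connected components (a triangle and a single edge) and counts the zero eigenvalues of its Laplacian. The tolerance `1e-9` is an arbitrary choice for deciding what counts as zero in floating point.

```python
import numpy as np

# Hypothetical graph with c = 2 components: triangle {0,1,2} and edge {3,4}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
N = 5
A = np.zeros((N, N))
for u, v in edges:
    A[u, v] = A[v, u] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)                  # ascending order
num_zero = int(np.sum(np.abs(eigvals) < 1e-9))   # multiplicity of eigenvalue 0
rank_L = np.linalg.matrix_rank(L)

print(num_zero, rank_L)  # number of components, and N minus that number
```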

Algebraic graph theory
Conductance
Let $S$ be a subset of the vertex set $V$.
$\partial S$ denotes the boundary of $S$, i.e. the set of edges with one end in $S$ and the other end outside $S$.
Conductance: $\phi := \min_S \frac{|\partial S|}{|S|}$ with $|S| \le |V|/2$
Theorem: $\phi$ is bounded in terms of $\lambda_2$, where $\lambda_2$ = second smallest eigenvalue of $L$; $\lambda_2$ measures graph connectivity
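For very small graphs the conductance can be computed by exhaustive search over subsets. The sketch below does this for an illustrative 5-vertex graph and compares the result with $\lambda_2$. The inequality checked at the end, $\lambda_2 / 2 \le \phi$, is a standard Cheeger-type lower bound for this definition of conductance and is used here as an assumption; it is not necessarily the exact statement of the theorem on the slide.

```python
import numpy as np
from itertools import combinations

# Illustrative 5-vertex undirected graph.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]])
N = A.shape[0]
L = np.diag(A.sum(axis=1)) - A

def boundary_size(S):
    """Number of edges with one end in S and the other end outside S."""
    inside = list(S)
    outside = [v for v in range(N) if v not in S]
    return int(A[np.ix_(inside, outside)].sum())

# Exhaustive search over all subsets with |S| <= |V|/2 (exponential cost,
# only feasible for tiny graphs).
phi = min(
    boundary_size(S) / len(S)
    for size in range(1, N // 2 + 1)
    for S in combinations(range(N), size)
)

lambda_2 = np.linalg.eigvalsh(L)[1]  # second smallest eigenvalue of L
print(phi, lambda_2, lambda_2 / 2 <= phi)
```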

Algebraic graph theory
Eigen-decomposition of $L$
From the Rayleigh-Ritz theorem:
$$\lambda_{\min}(L) \le \frac{x^T L x}{x^T x} \le \lambda_{\max}(L)$$
$$u_1 = \arg\min_x \frac{x^T L x}{x^T x} \quad \text{subject to } \|u_1\| = 1$$
$$u_i = \arg\min_x \frac{x^T L x}{x^T x} \quad \text{subject to } \|u_i\| = 1, \quad u_i^T u_j = 0, \; j = 1, \dots, i-1$$
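The Rayleigh-Ritz bounds can be illustrated numerically. The sketch below assumes an illustrative 5-vertex graph, uses `numpy.linalg.eigh` (which returns eigenvalues in ascending order with orthonormal eigenvectors), and evaluates the Rayleigh quotient on random vectors; the helper name `rayleigh` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Columns of U are the orthonormal eigenvectors u_1, ..., u_N.
eigvals, U = np.linalg.eigh(L)

def rayleigh(x):
    """Rayleigh quotient x^T L x / x^T x."""
    return (x @ L @ x) / (x @ x)

# Random vectors never leave the interval [lambda_min, lambda_max].
samples = [rayleigh(rng.standard_normal(5)) for _ in range(1000)]
print(eigvals[0] - 1e-9 <= min(samples), max(samples) <= eigvals[-1] + 1e-9)

# Each eigenvector attains its own eigenvalue as Rayleigh quotient.
print(np.isclose(rayleigh(U[:, 1]), eigvals[1]))
```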

Algebraic graph theory
Examples of eigenvectors
[Figure: two panels showing the eigenvectors $u_2$ and $u_3$]

Graph features
Diameter: maximal distance (number of hops along the geodesic path) between any pair of nodes
Denoting by $\langle k \rangle$ the average degree in a random graph:
If $\langle k \rangle < 1$, the graph is composed of isolated trees
If $\langle k \rangle > 1$, a giant cluster appears
If $\langle k \rangle \ge \ln N$, the graph is totally connected and the diameter is concentrated around $\ln N / \ln \langle k \rangle$

Graph features - Clustering coefficient
The clustering coefficient $C_i$ of a vertex $v_i$ is the number of links between the vertices within its neighborhood divided by the maximum number of links that could possibly exist between them.
The clustering coefficient for the whole system is the average of the clustering coefficients:
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$
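The definition translates directly into code. The sketch below assumes an illustrative 5-vertex graph and adopts the convention, not stated in the text, that a vertex with fewer than two neighbors has clustering coefficient 0; the function name `local_clustering` is hypothetical.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]])

def local_clustering(A, i):
    """Links among the neighbors of i divided by the maximum possible number."""
    nbrs = np.flatnonzero(A[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for degree 0 or 1
    links = A[np.ix_(nbrs, nbrs)].sum() / 2  # each link is counted twice
    return links / (k * (k - 1) / 2)

C_i = np.array([local_clustering(A, i) for i in range(A.shape[0])])
C = C_i.mean()  # clustering coefficient of the whole graph
print(C_i, C)
```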

Graph features
- Degree centrality: $\frac{d_i}{n-1}$
- Closeness centrality: $\frac{n-1}{\sum_{j \neq i} l(i,j)}$, where $l(i,j)$ denotes the number of links in the shortest path between $i$ and $j$
- Betweenness centrality: $\sum_{k \neq j;\; i \neq k,j} \frac{P_i(kj)/P(kj)}{(n-1)(n-2)/2}$, where $P_i(kj)$ denotes the number of geodesics (shortest paths) between $k$ and $j$ passing through node $i$, whereas $P(kj)$ is the number of geodesics between $k$ and $j$
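The three centralities above can be computed with breadth-first search on small graphs. The sketch below is one straightforward way to do it, assuming a connected, undirected, unweighted graph stored as an adjacency dictionary; the function names are hypothetical, and the betweenness sum runs over unordered pairs $\{k, j\}$, matching the normalization $(n-1)(n-2)/2$.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s and number of shortest paths from s to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def centralities(adj):
    """Degree, closeness and betweenness centrality (connected graph assumed)."""
    n = len(adj)
    info = {s: bfs_counts(adj, s) for s in adj}
    degree = {i: len(adj[i]) / (n - 1) for i in adj}
    closeness = {i: (n - 1) / sum(info[i][0][j] for j in adj if j != i)
                 for i in adj}
    betweenness = {}
    for i in adj:
        total = 0.0
        for k, j in combinations(adj, 2):
            if i in (k, j):
                continue
            dist_k, sigma_k = info[k]
            dist_i, sigma_i = info[i]
            # i lies on a geodesic between k and j iff the distances add up.
            if dist_k[i] + dist_i[j] == dist_k[j]:
                total += sigma_k[i] * sigma_i[j] / sigma_k[j]
        betweenness[i] = total / ((n - 1) * (n - 2) / 2)
    return degree, closeness, betweenness

# Usage on a hypothetical star graph: vertex 0 is the hub.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
degree, closeness, betweenness = centralities(star)
print(degree[0], closeness[0], betweenness[0])
```

In the star graph every geodesic between two leaves passes through the hub, so the hub attains the maximum value of all three measures.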

Graph features
Betweenness centrality
Example: fifteenth-century Florence
BC(Medici) = 0.522
BC(Strozzi) = 0.103
BC(Guadagni) = 0.255

Graph features
Eigenvector centrality
Idea: the importance of a vertex in a network increases with connections to other vertices that are themselves important:
$$\lambda x_i = \sum_{j=1}^{N} A_{ij} x_j$$
The solution is given by the eigenvector associated with the largest eigenvalue of $A$:
$$\lambda x = A x$$
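One common way to obtain this eigenvector is power iteration, which is a choice made here for illustration and not a method named in the slides. The sketch assumes an illustrative connected, non-bipartite 5-vertex graph (so the iteration converges) and a fixed number of 200 iterations.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]], dtype=float)

# Power iteration: repeatedly apply A and renormalize.
x = np.ones(A.shape[0])
for _ in range(200):
    x = A @ x
    x /= np.linalg.norm(x)

lam = x @ A @ x  # Rayleigh quotient, estimate of the largest eigenvalue

print(np.allclose(A @ x, lam * x))  # x satisfies A x = lambda x
print(int(np.argmax(x)))            # index of the most central vertex
```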

Random graph models
Erdős-Rényi
Each node is connected to each of the other $n-1$ nodes with probability $p$.
The presences of the links are statistically independent events.
The degree distribution is then
$$p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$$
mean degree: $(n-1)p$; standard deviation: $\sqrt{(n-1)p(1-p)}$
If the average number of nodes $s$ steps away from a random node is $c^s$, with $c$ the mean degree, the average number of steps necessary to reach any node (diameter) is
$$D \approx \frac{\ln n}{\ln c}$$
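The binomial degree statistics can be checked by sampling one such graph. The sketch below assumes illustrative values $n = 1000$ and $p = 0.01$ and a fixed random seed; it draws each of the $n(n-1)/2$ possible links independently and compares the empirical mean and standard deviation of the degrees with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.01  # illustrative values

# Draw each possible link (i < j) independently with probability p,
# then symmetrize to obtain an undirected graph without self-loops.
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(int)

degrees = A.sum(axis=1)
mean_theory = (n - 1) * p
std_theory = np.sqrt((n - 1) * p * (1 - p))

print(degrees.mean(), mean_theory)
print(degrees.std(), std_theory)
```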

Random graph models
Erdős-Rényi
Giant component (asymptotic behavior)
Let us denote by $u$ the fraction of nodes not belonging to the giant component.
For a vertex $i$ not to belong to the giant component, it must not be connected to the giant component via any other vertex.
For every other vertex $j$ in the graph, either (a) $i$ is not connected to $j$ by an edge, or (b) $i$ is connected to $j$ but $j$ is itself not a member of the giant component.
The total probability of not being connected to the giant component via vertex $j$ is $1 - p + pu$, hence
$$u = (1 - p + pu)^{n-1}$$

Random graph models
Erdős-Rényi
Giant component (asymptotic behavior)
The fraction $S = 1 - u$ of nodes in the giant component satisfies
$$S = 1 - e^{-cS}$$
where $c = (n-1)p$
[Figure: size of the giant component $S$ versus mean degree $c$]
Note: phase transition
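The equation for $S$ has no closed-form solution but is easy to solve numerically. The sketch below uses plain fixed-point iteration started from $S = 1$, which is one simple choice and an assumption of this example; the function name, iteration cap and tolerance are hypothetical.

```python
import math

def giant_component_fraction(c, max_iter=100_000, tol=1e-12):
    """Solve S = 1 - exp(-c S) by fixed-point iteration starting from S = 1."""
    S = 1.0
    for _ in range(max_iter):
        S_new = 1.0 - math.exp(-c * S)
        if abs(S_new - S) < tol:
            return S_new
        S = S_new
    return S

# Below the transition (c < 1) the iteration collapses to S = 0;
# above it (c > 1) a nonzero solution appears.
for c in (0.5, 1.5, 2.0, 4.0):
    print(c, giant_component_fraction(c))
```

Starting from $S = 1$ makes the iterates decrease monotonically toward the largest solution, so the trivial solution $S = 0$ is returned only when no positive solution exists.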

Random graph models
Phase transition in random graphs
Random graphs often exhibit phase transition phenomena, as do many physical systems (water-ice transition, magnetism, ...).
Phase transitions are often governed by small variations of a single parameter, e.g. the average degree.

Random graph models
Small world networks
Motivation
Purely random graphs exhibit a small average shortest path length (typically growing as the logarithm of the number of nodes) along with a small clustering coefficient.
However, many real-world networks have a small average shortest path length but also a clustering coefficient significantly higher than expected by chance.
Milgram experiment (six degrees of separation)
A small-world network is a graph with a high clustering coefficient in which most nodes are not neighbors of each other, yet every node can be reached from every other in a small number of hops.

Random graph models
Watts and Strogatz model: (i) a small average shortest path length, (ii) a large clustering coefficient
- starting from a regular graph
- rewiring edges with equal and independent probability $p_r$
[Figure: regular ($p_r = 0$), small-world, and (uncorrelated) random ($p_r = 1$) graphs, ordered by increasing randomness]

Random graph models
Watts and Strogatz model: (i) a small average shortest path length, (ii) a large clustering coefficient
For intermediate values of $p_r$, small-world behavior: average clustering ($C$) high, average distance ($L$) low
[Figure: average clustering $C$ and average distance $L$ versus $p_r$, for $p_r$ between 0 and 1]
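The rewiring construction of the Watts and Strogatz model can be sketched as follows. The details are assumptions made for this example: the regular graph is a ring in which each node is linked to its $k$ nearest neighbors ($k$ even), and a rewired edge keeps one endpoint and picks a new one uniformly among the nodes that would create neither a self-loop nor a duplicate link. The function name and parameters are hypothetical.

```python
import random

def watts_strogatz(n, k, p_r, seed=None):
    """Ring lattice with k neighbors per node, each edge rewired with prob. p_r."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}

    # Regular starting graph: link each node to its k/2 nearest neighbors
    # on each side of the ring.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)

    # Visit every lattice edge (i, i + j) once and rewire it with prob. p_r.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p_r:
                candidates = [v for v in range(n)
                              if v != i and v not in adj[i]]
                if not candidates:
                    continue  # node i is already linked to everyone
                new = rng.choice(candidates)
                adj[i].remove(old)
                adj[old].remove(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj

regular = watts_strogatz(30, 4, 0.0, seed=1)     # p_r = 0: regular ring
small_world = watts_strogatz(30, 4, 0.3, seed=1)
print(sorted(len(nbrs) for nbrs in small_world.values()))
```

Rewiring moves edges without creating or destroying them, so the total number of links stays at $nk/2$ for every value of $p_r$.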

Random graph models
Scale-free model
In most real networks, the degree distribution follows a polynomial (power-law) decay, as opposed to the exponential decay of purely random networks.
Scale-free networks exhibit polynomial decay.
Scale-free networks can be grown through a preferential attachment rule.
[Figure: random networks versus scale-free networks]

Random graph models
Scale-free model
The distinguishing characteristic of scale-free networks is that their degree distribution follows a power law, $P(k) \sim k^{-\gamma}$.
In words, some nodes act as "highly connected hubs" (high degree), but most nodes have a low degree.
The scale-free model has a systematically shorter average path length than a random graph (thanks to the hub nodes).

Random graph models
Network building rules (dynamic)
1. The network begins with an initial network of $m_0$ ($> 1$) nodes
2. Growth: new nodes are added to the network one at a time
3. Preferential attachment: each new node is connected to $m$ of the existing nodes with a probability proportional to the number of links that the existing node already has. Formally, the probability that the new node is connected to node $i$ is $p_i = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$ (rich get richer)
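The three building rules above can be sketched in a few lines. Several details are assumptions of this example and not part of the rules: the initial network is taken to be a complete graph on $m_0$ nodes (so every initial node has nonzero degree), $m \le m_0$, and the $m$ distinct targets are drawn by repeated degree-proportional sampling with duplicates rejected. The function name is hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, m, m0, seed=None):
    """Grow a network to n nodes; returns the list of edges."""
    assert 2 <= m0 and m <= m0 <= n
    rng = random.Random(seed)

    # Assumed initial network: complete graph on m0 nodes.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]

    # Node i appears k_i times in `stubs`, so a uniform draw from it
    # selects node i with probability k_i / sum_j k_j.
    stubs = [v for edge in edges for v in edge]

    for new in range(m0, n):
        targets = set()
        while len(targets) < m:      # reject duplicates to get m distinct nodes
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(n=200, m=2, m0=3, seed=0)
degree = Counter(v for edge in edges for v in edge)
print(len(edges), max(degree.values()), min(degree.values()))
```

Keeping a list with one entry per link endpoint avoids recomputing the probabilities $p_i$ after every insertion.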

Random graph models
Random geometric graphs
A random geometric graph is a random undirected graph drawn on a bounded region. It is generated by:
1. Placing vertices uniformly at random and independently on the region
2. Connecting two vertices $u$, $v$ if and only if the distance between them is smaller than a threshold $r$
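The two generation steps can be sketched with NumPy as follows, assuming the bounded region is the unit square, the distance is Euclidean, and illustrative values $n = 200$ and $r = 0.15$; none of these choices comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 0.15  # illustrative values

# Step 1: place n vertices uniformly and independently on the unit square.
points = rng.random((n, 2))

# Step 2: connect u and v iff their Euclidean distance is smaller than r.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
A = (dist < r).astype(int)
np.fill_diagonal(A, 0)  # no self-loops

degrees = A.sum(axis=1)
print(degrees.min(), degrees.mean(), degrees.max())
```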

Random graph models
Random geometric graphs
Def.: A graph is said to be $k$-connected ($k = 1, 2, 3, \dots$) if for each node pair there exist at least $k$ mutually independent paths connecting them.
Equivalently, a graph is $k$-connected if and only if no set of $(k-1)$ nodes exists whose removal would disconnect the graph.
The maximum value of $k$ for which a connected graph is $k$-connected is the connectivity $\kappa$ of $G$. It is the smallest number of nodes whose failure would disconnect $G$.
As the distance threshold $r_0$ increases, the resulting graph becomes (with high probability) $k$-connected at the moment it achieves a minimum degree $d_{\min}$ equal to $k$.

Random graph models
Random geometric graphs
Thm (Gupta & Kumar): Given a graph $G(n, r_n)$ with
$$r_n = \sqrt{\frac{\log n + c_n}{\pi n}}$$
the graph is connected with probability one as $n$ goes to infinity if and only if $\lim_{n \to \infty} c_n = \infty$
Example: $r_n = \sqrt{\frac{2 \log n}{\pi n}}$

Operations on graphs
Graph partitioning
Given a graph split into two complementary subsets $S$ and $S^c$, let us associate different labels with nodes belonging to different subsets:
$s_i = 1$ if $i$ belongs to $S$, $s_i = -1$ if $i$ belongs to $S^c$
Note: $\frac{1}{2}(1 - s_i s_j) = 0$ if $i$ and $j$ belong to the same set, $\frac{1}{2}(1 - s_i s_j) = 1$ if $i$ and $j$ belong to different sets
Definition: Cut size
$$R = \frac{1}{4} \sum_i \sum_j A_{ij} (1 - s_i s_j)$$
Problem: split a graph into two subsets in such a way that the cut size is minimum

Operations on graphs
Graph partitioning
The cut size can be rewritten as
$$R = \frac{1}{4} s^T L s$$
Constraints:
- number of nodes per cluster ($n_1$ nodes in $S$, $n_2$ nodes in $S^c$)
- bounded norm
Problem formulation:
$$s = \arg\min_s \; s^T L s \quad \text{subject to } s_i \in \{-1, 1\}, \quad \sum_{i=1}^{N} s_i = n_1 - n_2$$
This is a combinatorial problem
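The equivalence between the double-sum definition of the cut size and the quadratic form can be verified on a small example. The sketch below assumes an illustrative 5-vertex graph and a hypothetical split $S = \{0, 1\}$, and also counts the cut edges directly.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]], dtype=float)
N = A.shape[0]
L = np.diag(A.sum(axis=1)) - A

# Hypothetical split: S = {0, 1}, S^c = {2, 3, 4}.
s = np.array([1.0, 1.0, -1.0, -1.0, -1.0])

# Definition: R = 1/4 * sum_i sum_j A_ij (1 - s_i s_j).
R_double_sum = 0.25 * np.sum(A * (1 - np.outer(s, s)))

# Quadratic form: R = 1/4 * s^T L s.
R_quadratic = 0.25 * s @ L @ s

# Direct count of edges with endpoints in different subsets.
R_count = sum(1 for i in range(N) for j in range(i + 1, N)
              if A[i, j] and s[i] != s[j])

print(R_double_sum, R_quadratic, R_count)
```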

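Operations on graphs
Graph partitioning
Relaxed problem:
$$s = \arg\min_s \; s^T L s \quad \text{subject to } \sum_{i=1}^{N} s_i^2 = N, \quad \sum_{i=1}^{N} s_i = n_1 - n_2$$
Lagrangian:
$$\mathcal{L}(s; \lambda, \mu) = \sum_{j=1}^{N} \sum_{k=1}^{N} L_{jk} s_j s_k + \lambda \left( N - \sum_{j=1}^{N} s_j^2 \right) + 2\mu \left( n_1 - n_2 - \sum_{j=1}^{N} s_j \right)$$
Setting the gradient to zero, we get
$$L s = \lambda s + \mu \mathbf{1}$$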

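Operations on graphs
Graph partitioning
Relaxed problem:
Multiplying $L s = \lambda s + \mu \mathbf{1}$ from the left side by $\mathbf{1}^T$, we get
$$\mu = \frac{n_2 - n_1}{N} \lambda$$
Introducing the vector
$$x := s + \frac{\mu}{\lambda} \mathbf{1} = s + \frac{n_2 - n_1}{N} \mathbf{1}$$
we get
$$L x = \lambda x$$
$x$ is then an eigenvector of $L$
The cut size can be rewritten as
$$R = \frac{n_1 n_2}{N} \lambda$$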

Operations on graphs
Graph partitioning
Relaxed problem:
Since $x^T \mathbf{1} = 0$ rules out the all-ones eigenvector and the cut size grows with $\lambda$, $x$ is the eigenvector associated with the second smallest eigenvalue of $L$: $u_2$
The (real) solution is then
$$s_R = x + \frac{n_1 - n_2}{N} \mathbf{1}$$
The closest binary solution $s$ is obtained by maximizing the scalar product $s^T s_R$
The optimum is achieved by assigning $s_i = +1$ to the $n_1$ vertices with the largest $x_i + (n_1 - n_2)/N$ and $s_i = -1$ to the other vertices
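The whole relaxed procedure fits in a short script. The sketch below assumes a hypothetical test graph made of two 4-node cliques joined by a single edge, with $n_1 = n_2 = 4$. The scaling $x^T x = 4 n_1 n_2 / N$ is derived here from the two constraints and is not stated in the text, and since the sign of $u_2$ returned by the eigen-solver is arbitrary, either clique may receive the label $+1$.

```python
import numpy as np

def two_cliques(size):
    """Two complete graphs of `size` nodes joined by one bridging edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1
    A[size:, size:] = 1
    np.fill_diagonal(A, 0)
    A[size - 1, size] = A[size, size - 1] = 1
    return A

A = two_cliques(4)
N = A.shape[0]
n1 = n2 = N // 2
L = np.diag(A.sum(axis=1)) - A

# Eigenvector of the second smallest eigenvalue of L.
eigvals, U = np.linalg.eigh(L)
lambda_2, u2 = eigvals[1], U[:, 1]

# Scale u2 so that x^T x = 4 n1 n2 / N (implied by the two constraints).
x = u2 * np.sqrt(4 * n1 * n2 / N)
s_R = x + (n1 - n2) / N          # real-valued solution

# Closest binary vector: +1 for the n1 largest entries, -1 elsewhere.
s = -np.ones(N)
s[np.argsort(s_R)[-n1:]] = 1

R = 0.25 * s @ L @ s             # cut size of the binary solution
R_relaxed = n1 * n2 / N * lambda_2
print(s, R, R_relaxed)
```

The relaxed value $n_1 n_2 \lambda_2 / N$ is a lower bound on the cut size of any split with the same $n_1$ and $n_2$, so comparing it with $R$ indicates how much is lost by rounding.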

Operations on graphs
Graph partitioning
Example
[Figure: example graph and the values of $u_2$ over its vertices]

Operations on graphs
Graph partitioning
[Figure: two panels showing the values of $u_2$ and $u_3$ over the vertices of the example graph]
