Lecture 2: From Structured Data to Graphs and Spectral Analysis

Similar documents
Lecture 2: Geometric Graphs, Predictive Graphs and Spectral Analysis

REGULAR GRAPHS OF GIVEN GIRTH. Contents

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates

Graph drawing in spectral layout

Testing Isomorphism of Strongly Regular Graphs

Algebraic Graph Theory- Adjacency Matrix and Spectrum

CSCI5070 Advanced Topics in Social Computing

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,

Assessing and Safeguarding Network Resilience to Nodal Attacks

Random Generation of the Social Network with Several Communities

1. a graph G = (V (G), E(G)) consists of a set V (G) of vertices, and a set E(G) of edges (edges are pairs of elements of V (G))

Mathematics of Networks II

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components

Chapter 2. Spectral Graph Drawing

Basics of Graph Theory

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Introduction to Graph Theory

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

Local Fiedler Vector Centrality for Detection of Deep and Overlapping Communities in Networks

TELCOM2125: Network Science and Analysis

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 11

Clustering in Networks

Social-Network Graphs

Stanford University CS359G: Graph Partitioning and Expanders Handout 18 Luca Trevisan March 3, 2011

Mining Social Network Graphs

Ma/CS 6b Class 26: Art Galleries and Politicians

Machine Learning for Data Science (CS4786) Lecture 11

Paths, Circuits, and Connected Graphs

MAT 280: Laplacian Eigenfunctions: Theory, Applications, and Computations Lecture 18: Introduction to Spectral Graph Theory I. Basics of Graph Theory

Extracting Information from Complex Networks

ON THE STRONGLY REGULAR GRAPH OF PARAMETERS

My favorite application using eigenvalues: partitioning and community detection in social networks

Complex-Network Modelling and Inference

Design of Orthogonal Graph Wavelet Filter Banks

Lecture 3: Descriptive Statistics

Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

An Empirical Analysis of Communities in Real-World Networks

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

Maximizing edge-ratio is NP-complete

The clustering in general is the task of grouping a set of objects in such a way that objects

Application of Spectral Clustering Algorithm

CS388C: Combinatorics and Graph Theory

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. Author: Martin Jaggi Presenter: Zhongxing Peng

Graphs, Zeta Functions, Diameters, and Cospectrality

Social Network Analysis

CSCI-B609: A Theorist s Toolkit, Fall 2016 Sept. 6, Firstly let s consider a real world problem: community detection.

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning

Extremal Graph Theory: Turán s Theorem

AMS /672: Graph Theory Homework Problems - Week V. Problems to be handed in on Wednesday, March 2: 6, 8, 9, 11, 12.

Algebraic Graph Theory

Extremal Graph Theory. Ajit A. Diwan Department of Computer Science and Engineering, I. I. T. Bombay.

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.

Randomized Rumor Spreading in Social Networks

Spectral Clustering on Handwritten Digits Database

Distances in power-law random graphs

Kernighan/Lin - Preliminary Definitions. Comments on Kernighan/Lin Algorithm. Partitioning Without Nodal Coordinates Kernighan/Lin

Introduction aux Systèmes Collaboratifs Multi-Agents

Ma/CS 6b Class 13: Counting Spanning Trees

Conic Duality. yyye

CMPSCI611: The Simplex Algorithm Lecture 24

Large-Scale Networks. PageRank. Dr Vincent Gramoli Lecturer School of Information Technologies

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can

Overlay (and P2P) Networks

Signal Processing for Big Data

Bounds on graphs with high girth and high chromatic number

CS7540 Spectral Algorithms, Spring 2017 Lecture #2. Matrix Tree Theorem. Presenter: Richard Peng Jan 12, 2017

Models and Algorithms for Complex Networks

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS

Introduction to Engineering Systems, ESD.00. Networks. Lecturers: Professor Joseph Sussman Dr. Afreen Siddiqi TA: Regina Clewlow

On the Balanced Case of the Brualdi-Shen Conjecture on 4-Cycle Decompositions of Eulerian Bipartite Tournaments

Solving problems on graph algorithms

HW Graph Theory SOLUTIONS (hbovik) - Q

Introduction to spectral clustering

Cospectral digraphs from locally line digraphs

THE SHRIKHANDE GRAPH. 1. Introduction

Two-graphs revisited. Peter J. Cameron University of St Andrews Modern Trends in Algebraic Graph Theory Villanova, June 2014

A graph is a mathematical structure for representing relationships.

NOTE ON MINIMALLY k-connected GRAPHS

Erdős-Rényi Model for network formation

CSE 158 Lecture 11. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

Lecture 9. Semidefinite programming is linear programming where variables are entries in a positive semidefinite matrix.

Networks in economics and finance. Lecture 1 - Measuring networks

Non Overlapping Communities

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

CS256 Applied Theory of Computation

Assignment 1 Introduction to Graph Theory CO342

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Random Simplicial Complexes

The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs

8. The Postman Problems

ON INDEX EXPECTATION AND CURVATURE FOR NETWORKS

Transcription:

Lecture 2: From Structured Data to Graphs and Spectral Analysis Radu Balan February 9, 2017

Data Sets Today we discuss type of data sets and graphs. The overarching problem is the following: Main Problem Given a graph, discover if it can be explained as a structured data graph, or just as a random graph.

Data Sets Today we discuss type of data sets and graphs. The overarching problem is the following: Main Problem Given a graph, discover if it can be explained as a structured data graph, or just as a random graph. We shall discuss first how to construct a sequence of nested graphs from a data set. Two types of data: 1 Percolation model 2 Weighted graphs

Data Sets Percolation Models Fix a set of points in R d. Example, for d = 2: n = 10 2 = 100 Uniform (regular) lattice.

Data Sets Percolation Models Fix a set of points in R d. Example, for d = 2: n = 10 2 = 100 Nonuniform (irregular) lattice.

Data Sets Percolation Models Fix a set of points in R d. Example, for d = 2: n = 10 2 = 100 Nonuniform (irregular) lattice. Created by random perturbation of the regular lattice.

Data Sets Percolation Models Construct the matrix of pairwise distances: ( ) V = r k r j, r k = (x k, y k ). 1 k,j n

Data Sets Percolation Models Construct the matrix of pairwise distances: ( ) V = r k r j, r k = (x k, y k ). 1 k,j n Then sort the set of distances in an ascending order. This way we define an order on the set of pairs of points. Implicitly this defines an ascending order on the set of edges. We obtain a sequence of nested graphs (G t ) t 0 0 t m = n(n 1)/2 where t indicates the number of edges in the graph G t. Thus G t has n nodes and t edges.

Data Sets Percolation Models Play Examples: n = 100, regular/irregular, different types of norms: r k r j 2 = (x k x j ) 2 + (y k y j ) 2 r k r j = max( x k x j, y k y j )

Data Sets Weighted Graphs A different class of graphs: weighted graphs, (V, E, W ). Examples: 1 Joint co-authorship papers: V is the set of all authors; E is the list of joint papers; w(e i,j ) is the number of papers where both i and j are co-authors. 2 Protein-protein interaction or simultaneous expression. 3 Social networks: Facebook, LinkedIn: V is the set of users; E is the list of friendship links, or connections; w(e i,j ) is a measure of activity between i and j, e.g. number of endorsements, or like, or comments between i and j. 4 Communication networks... 5 Email datasets (Enron)

Data Sets Weighted Graphs The sequence of nested graphs is obtained by sort the edges according to their weights: start with the largest weight first, and then pick the next largest weight, and so on.

Data Sets Weighted Graphs The sequence of nested graphs is obtained by sort the edges according to their weights: start with the largest weight first, and then pick the next largest weight, and so on.

Data Sets Data Size Size matters:

Data Graphs Data Sets Data Size Size matters: Radu Balan () Graphs 1

Data Graphs Data Sets Data Size Size matters: By Paul Signac - Ophelia2, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12570159 (L Hirondelle Steamer on the Seine) Radu Balan () Graphs 1

Data Sets Public Datasets On Canvas you can find links to several public databases: 1 Duke: https://dnac.ssri.duke.edu/datasets.php 2 Stanford: https://snap.stanford.edu/data/ 3 Uni. Koblenz: http://konect.uni-koblenz.de/ 4 M. Newman (U. Michigan): http://www-personal.umich.edu/ mejn/netdata/ 5 A.L. Barabasi (U. Notre Dame): http://www3.nd.edu/ networks/resources.htm 6 UCI: https://networkdata.ics.uci.edu/resources.php

Spectral Analysis Basic Properties Last time we learned how to construct: the Adjacency matrix A, the Degree matrix D, the (unnormalized symmetric) graph Laplacian matrix = D A, the normalized Laplacian matrix = D 1/2 D 1/2, and the normalized asymmetric Laplacian matrix L = D 1.

Spectral Analysis Basic Properties Last time we learned how to construct: the Adjacency matrix A, the Degree matrix D, the (unnormalized symmetric) graph Laplacian matrix = D A, the normalized Laplacian matrix = D 1/2 D 1/2, and the normalized asymmetric Laplacian matrix L = D 1. We denote: n the number of vertices (also known as the size of the graph), m the number of edges, d(v) the degree of vertex v, d(i, j) the distance between vertex i and vertex j (length of the shortest path connecting i to j), and by D the diameter of the graph (the largest distance between two vertices = longest shortest path ).

Spectral Analysis Basic Properties In this section we summarize spectral properties of the Laplacian matrices. Theorem 1 = T 0, = T 0 are positive semidefinite matrices. 2 eigs( ) = eigs(l) [0, 2]. 3 0 is always an eigenvalue of,, L with same multiplicity. Its multiplicity is equal to the number of connected components of the graph. 4 λ max ( ) 2 max v d(v), i.e. the lagest eigenvalue of is bounded by twice the largest degree of the graph.

Spectral Analysis Basic Properties Theorem Let 0 = λ 0 λ 1 λ n 1 2 be the eigenvalues of (or L), that is eigs( ) = {λ 0, λ 1,, λ n 1 } = eigs(l). Then: n 1 1 i=0 λ i n. 2 n 1 i=0 λ i = n if and only if the graph is connected (i.e. no isolated vertices). 3 λ 1 n n 1. 4 λ 1 = n n 1 if and only if the graph is complete (i.e. any two vertices are connected by an edge). 5 If the graph is not complete then λ 1 1. 6 If the graph is connected then λ 1 > 0. If λ i = 0 and λ i+1 0 then the graph has exactly i + 1 connected components. 7 If the graph is connected (no isolated vertices) then λ n 1 n n 1.

Spectral Analysis Smallest nonnegative eigenvalue Theorem Assume the graph is connected. Thus λ 1 > 0. Denote by D its diameter and by d max, d, d H the maximum, average, and harmonic avergae of the degrees (d 1,, d n ): d max = max d j, d = 1 j n n d j, j=1 1 d H = 1 n n j=1 d j. Then 1 λ 1 1 nd. 2 Let µ = max 1 j n 1 1 λ j. Then 1 + (n 1)µ 2 n d H (1 (1 + µ)( d d H 1)).

Spectral Analysis Smallest nonnegative eigenvalue Theorem [continued] 3 Assume D 4. Then dmax 1 λ 1 1 2 (1 2 d max D ) + 2 D.

Spectral Analysis Comments on the proof Ingredients and key relations: 1. Let f = (f 1, f 2,, f n ) R n be a n-vector. Then: f, f = (f x f y ) 2 x y where x y if there is an edge between vertex x and vertex y (i.e. A x,y = 1). This proves positivity of all operators. 2. Last time we showed eigs( ) = eigs(l) because and L are similar matrices. 3. 0 is an eigenvalue for with eigenvector 1 = (1, 1,, 1). If multiple connected components, define such a 1 vector for each component (and 0 on rest). 4. λ max ( ) = 1 + λ max (D 1/2 AD 1/2 ).

Spectral Analysis Comments on the proof - 2 λ max (D 1/2 AD 1/2 ) = max f =1 D 1/2 AD 1/2 f, f = max f =1 i,j f i A i,j di f j dj Next use Cauchy-Schwartz to get f i f A i,j j di dj f 2 i A i,j = i,j d i i j i f 2 i = f 2 = 1. Thus λ max ( ) 2. Similarly λ max ( ) 2(max i d i ).

Spectral Analysis Comments on the proof - 2 λ max (D 1/2 AD 1/2 ) = max f =1 D 1/2 AD 1/2 f, f = max f =1 i,j f i A i,j di f j dj Next use Cauchy-Schwartz to get f i f A i,j j di dj f 2 i A i,j = i,j d i i j i f 2 i = f 2 = 1. Thus λ max ( ) 2. Similarly λ max ( ) 2(max i d i ). 5. If the graph is connected, trace( ) = n = n 1 i=0 λ i. Since λ 0 = 0 we get all statements of Theorem 2.

Spectral Analysis Comments on the proof - 2 λ max (D 1/2 AD 1/2 ) = max f =1 D 1/2 AD 1/2 f, f = max f =1 i,j f i A i,j di f j dj Next use Cauchy-Schwartz to get f i f A i,j j di dj f 2 i A i,j = i,j d i i j i f 2 i = f 2 = 1. Thus λ max ( ) 2. Similarly λ max ( ) 2(max i d i ). 5. If the graph is connected, trace( ) = n = n 1 i=0 λ i. Since λ 0 = 0 we get all statements of Theorem 2. 6. Theorem 3 is slightly more complicated (see [2]).

Spectral Analysis Special graphs: Cycles and Complete graphs Cycle graphs: like the hexagon in HW2. Remark: Adjacency matrices are circulant, and so are, = L.

Spectral Analysis Special graphs: Cycles and Complete graphs Cycle graphs: like the hexagon in HW2. Remark: Adjacency matrices are circulant, and so are, = L. Then argue the FFT forms a ONB of eigenvectors. Compute the eigenvalues as FFT of the generating sequence.

Spectral Analysis Special graphs: Cycles and Complete graphs Cycle graphs: like the hexagon in HW2. Remark: Adjacency matrices are circulant, and so are, = L. Then argue the FFT forms a ONB of eigenvectors. Compute the eigenvalues as FFT of the generating sequence. Consequence: The normalized Laplacian has the following eigenvalues: 1 For cycle on n vertices: λ k = 1 cos 2πk n, 0 k n 1. 2 For the complete graph on n vertices: λ 0 = 0, λ 1 = = λ n 1 = n n 1.

References B. Bollobás, Graph Theory. An Introductory Course, Springer-Verlag 1979. 99(25), 15879 15882 (2002). F. Chung, Spectral Graph Theory, AMS 1997. F. Chung, L. Lu, The average distances in random graphs with given expected degrees, Proc. Nat.Acad.Sci. 2002. R. Diestel, Graph Theory, 3rd Edition, Springer-Verlag 2005. P. Erdös, A. Rényi, On The Evolution of Random Graphs G. Grimmett, Probability on Graphs. Random Processes on Graphs and Lattices, Cambridge Press 2010. J. Leskovec, J. Kleinberg, C. Faloutsos, Graph Evolution: Densification and Shrinking Diameters, ACM Trans. on Knowl.Disc.Data, 1(1) 2007.