Graph Theory. Network Science: Graph theory. Graph theory Terminology and notation. Graph theory Graph visualization


Network Science: Graph Theory
Ozalp Babaoglu, Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna
www.cs.unibo.it/babaoglu/

Graph theory
Branch of mathematics for the study of structures called graphs, used to model pairwise relations between objects. Invented by the Swiss mathematician Leonhard Euler (15 April 1707 - 18 September 1783). Gives us the language and basic concepts to reason about networks.

Terminology and notation
Formally, a graph is a pair $G = (N, E)$, where $N$ is the set of nodes (vertices) and $E$ is the set of edges (links, arcs). We let $n$ denote the number of nodes and $m$ the number of edges in the graph. Example ($n = 4$, $m = 4$): use letters to label nodes and node pairs to label edges, so that $N = \{A, B, C, D\}$ and $E$ is a set of four node pairs.

Graph visualization
It is customary to draw the nodes as circles and the edges as lines that join two nodes. [Figure: a drawing of the example graph $G = (N, E)$.]
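As a minimal sketch of the pair notation, a graph can be stored as a set of nodes and a set of node pairs. The node labels follow the four-node example; the particular edge list below is a hypothetical choice made only for illustration, and `is_valid_graph` is a helper name introduced here.

```python
# A graph as a pair (N, E): a set of nodes and a set of node pairs.
# The edge list is an assumed example, not taken from the slides.
nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}


def is_valid_graph(nodes, edges):
    """Check that every edge joins two nodes that belong to the node set."""
    return all(u in nodes and v in nodes for u, v in edges)


n = len(nodes)  # number of nodes
m = len(edges)  # number of edges
print(n, m, is_valid_graph(nodes, edges))
```

Nothing in this representation records where the nodes are drawn, which matches the point that a graph is defined by its nodes and edges rather than by a picture.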

Graph visualization
The graph is defined by the list of nodes and edges, not by its particular visualization. The same graph may have many different visualizations. All represent the same graph, but some visualizations can be better than others.

Binary relations
Graphs represent arbitrary binary relations among objects. Nodes are the objects; the presence of an edge indicates that some relation R holds between the nodes, and the absence indicates that R does not hold. [Figure: two nodes joined by an edge (R is true) and two nodes with no edge (R is false).] Examples of binary relation R: greater than, is a friend of, trusts, loans money to, co-authored paper with, sits on a board-of-directors with.

Binary relations
Note that binary relations are limiting. For example, co-authorship among three people cannot be expressed through binary relations. If authors A, B and C publish a paper together, the co-authorship graph will represent this through three binary relations but loses the information that they actually co-authored a common paper.

Directed graphs
An edge, as we have defined it, is undirected and corresponds to a symmetric binary relation: A R B is true and B R A is true. An asymmetric binary relation holds in one direction only and is represented by a directed edge: A R B is true and B R A is false. Examples of asymmetric binary relations: follows (on Twitter), trusts, connected by a direct flight, loans money to, has a URL to.
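The difference between symmetric and asymmetric relations can be checked mechanically. The sketch below assumes directed edges are stored as ordered pairs; the function names and the two small relations are illustrative assumptions.

```python
def is_symmetric(directed_edges):
    """True if every directed edge (u, v) has its reverse (v, u) in the set."""
    return all((v, u) in directed_edges for u, v in directed_edges)


def as_directed(undirected_edges):
    """Expand each undirected edge into the two directed edges it stands for."""
    forward = {(u, v) for u, v in undirected_edges}
    backward = {(v, u) for u, v in undirected_edges}
    return forward | backward


# Hypothetical data: friendship is symmetric, "follows" need not be.
friendship = as_directed({("A", "B"), ("B", "C")})
follows = {("A", "B"), ("B", "C"), ("C", "B")}  # B does not follow A back

print(is_symmetric(friendship), is_symmetric(follows))
```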

Directed graphs
Directed graphs are more general than undirected graphs: an undirected edge (A, B) is equivalent to the two directed edges (A, B) and (B, A).

Weighted graphs
Both directed and undirected graphs can have a weight associated with edges to represent the strength of the relation. Examples of weighted graphs: co-authorship (how many joint publications); actors (number of joint films); citations (number of times one author cites another); flight routes (number of daily non-stop flights); interstate highway (distance between cities); Internet (transmission capacity of a link).

Some basic facts
What is the maximum number of edges that an undirected graph with $n$ nodes can have? Every node has an edge to every other node. Excluding self edges, each node will have $n - 1$ edges, for a total of $n(n-1)/2$ edges (corrected for double counting). Thus, for any undirected graph, $m \le n(n-1)/2$.

Some basic facts
How many different undirected graphs with $n$ nodes can there be? There can be at most $n(n-1)/2$ edges. Each edge can be present or absent, resulting in a total of $2^{n(n-1)/2}$ combinations. How many different undirected graphs with 3 nodes can there be? $2^{3(3-1)/2} = 2^3 = 8$.
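The two counts are easy to evaluate directly: the maximum number of edges is $n(n-1)/2$, and the number of distinct undirected graphs on $n$ labeled nodes is $2^{n(n-1)/2}$. A short sketch, with function names chosen here for illustration:

```python
def max_edges(n):
    """Maximum number of edges in an undirected graph with n nodes (no self edges)."""
    return n * (n - 1) // 2


def count_graphs(n):
    """Number of undirected graphs on n labeled nodes: each possible edge is present or absent."""
    return 2 ** max_edges(n)


for n in (3, 5, 6, 7, 8):
    print(n, max_edges(n), count_graphs(n))
```

Python integers have arbitrary precision, so the counts stay exact even when they become very large.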

Some basic facts
How does $2^{n(n-1)/2}$ grow with the number of nodes?

n     2^(n(n-1)/2)
5     1,024
6     32,768
7     2,097,152
8     268,435,456
9     68,719,476,736
10    35,184,372,088,832
15    40,564,819,207,303,340,847,894,502,572,032
20    1.569 x 10^57
24    1.214 x 10^83
30    8.87 x 10^130

Node degree
The degree of a node counts the number of edges that are incident on it, that is, its neighbors. For a directed graph, we distinguish between the in-degree and the out-degree of a node. [Figure: example nodes labeled with their in-degree and out-degree.]

Node degree distribution
In a graph with $n$ nodes, the node degrees are in the range between 0 and $n - 1$ (excluding self loops). How are node degrees distributed in this interval? Are all degrees equally likely or are some degrees more common than others? [Figure: histogram of frequency versus degree.]

Paths, cycles
A path in a graph is an alternating sequence of nodes and edges of the graph. If the graph is directed, the path must respect the direction of edges. A simple path is a path where the nodes do not repeat. A cycle is a path where the first and last nodes are the same, but otherwise all nodes are distinct.
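Node degrees and their distribution can be tallied in a few lines. This is a sketch under the assumption that an undirected graph is given as a set of node pairs and a directed graph as a set of ordered pairs; the example graph and all function names are hypothetical.

```python
from collections import Counter


def degrees(nodes, edges):
    """Degree of each node in an undirected graph given as a set of node pairs."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(nodes, directed_edges):
    """In-degree and out-degree of each node; an edge (u, v) points from u to v."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in directed_edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def degree_distribution(deg):
    """Map each degree value to the number of nodes that have it."""
    return dict(Counter(deg.values()))


# Hypothetical four-node example.
nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}
deg = degrees(nodes, edges)
dist = degree_distribution(deg)
indeg, outdeg = in_out_degrees(nodes, edges)  # same pairs, read as directed
```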

Paths, cycles
The length of a path in a graph is the number of steps it contains from beginning to end, that is, the number of edges. [Figure: an example graph with three node sequences marked as a simple path, a path that is not a simple path, and a cycle, each annotated with its length.]

Distance
The distance between two nodes in a graph is the length of the shortest path between them. If no path joins two nodes, the distance between them is infinite (or undefined). [Figure: example distances between pairs of nodes.]

Diameter
The diameter of a graph is the longest of the distances between all pairs of nodes, that is, the longest shortest path. [Figure: three example graphs labeled with their diameters.]

Connectivity, components
A subgraph is connected if there is a path between every pair of nodes. A component of a graph is a maximal connected subgraph. [Figure: a connected subgraph that is not a component because it is not maximal, and two components.]
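Distance, diameter and components can all be computed with breadth-first search. The following is a minimal sketch for undirected, unweighted graphs; the helper names and the six-node example are assumptions made for illustration.

```python
from collections import deque


def adjacency(nodes, edges):
    """Neighbor sets for an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj):
    """Maximal connected subgraphs, each returned as a set of nodes."""
    seen, result = set(), []
    for v in adj:
        if v not in seen:
            comp = set(distances_from(adj, v))
            seen |= comp
            result.append(comp)
    return result


def diameter(adj):
    """Longest shortest path; infinite if some pair of nodes is not connected."""
    best = 0
    for v in adj:
        dist = distances_from(adj, v)
        if len(dist) < len(adj):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


# Hypothetical example: a path A-B-C-D plus a separate edge E-F.
adj = adjacency(set("ABCDEF"), {("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")})
path = adjacency(set("ABCD"), {("A", "B"), ("B", "C"), ("C", "D")})
```

Nodes that breadth-first search never reaches are simply absent from the returned dictionary, which corresponds to an infinite (or undefined) distance.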

Connectivity, components
A graph is connected if it contains a single component. [Figure: a graph that is not connected and a graph that is connected.]

Connectivity, components
For directed graphs, the definitions are extended to strongly-connected components and strongly-connected graphs, taking into consideration the direction of edges. [Figure: a strongly-connected component and a strongly-connected graph.]

Giant components
If the largest component of a graph contains a significant proportion of all nodes, it is called the giant component.

Bridge
An edge in a graph is a bridge if deleting it increases the number of components of the graph. [Figure: a graph with its bridge marked.]
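The bridge definition translates directly into a brute-force test: delete each edge in turn and recount the components. This sketch favors clarity over speed (faster linear-time methods exist); the example of two triangles joined by one edge is hypothetical.

```python
def count_components(nodes, edges):
    """Number of connected components, found by repeated depth-first search."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count


def bridges(nodes, edges):
    """Edges whose deletion increases the number of components."""
    base = count_components(nodes, edges)
    return {e for e in edges if count_components(nodes, edges - {e}) > base}


# Hypothetical example: triangles A-B-C and D-E-F joined by the edge C-D.
nodes = set("ABCDEF")
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")}
print(bridges(nodes, edges))
```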

Clustering coefficient of a node
Clustering is a measure of how bunched up (unevenly distributed) the edges of a graph are. Formally, the clustering coefficient of node A is defined as the probability that two randomly selected friends of A are friends themselves, that is, the fraction of all pairs of A's friends who are also friends. It is defined only if A has at least two friends (otherwise 0). The clustering coefficient is always between 0 and 1.

Clustering coefficient of a node
A has four friends. Among the four friends, there are $(4 \times 3)/2 = 6$ possible friendships, but only four of them are actually present. Two are missing. Thus, the clustering coefficient of node A is $4/6 = 0.6666$. [Figure: node A, its four friends, and the two missing edges.]

Clustering coefficient of a graph
The clustering coefficient $C$ of graph $G$ is the average of the clustering coefficients of all nodes in $G$. [Figure: a five-node graph whose nodes have clustering coefficients $1/(2 \times 1/2) = 1$, $2/(3 \times 2/2) = 2/3$, $3/(4 \times 3/2) = 1/2$, $2/(3 \times 2/2) = 2/3$ and $1/(2 \times 1/2) = 1$.] $C = (1 + 2/3 + 1/2 + 1 + 2/3)/5 = 0.7666$.

Clustering coefficient of a graph
All nodes are identical and have 4 neighbors. The number of possible edges between pairs of neighbors is $4 \times 3/2 = 6$. How many pairs of neighbors are actually connected? Clustering coefficient of any node: $3/6 = 0.5$. Clustering coefficient of the entire graph: $C = 0.5$.
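The node and graph definitions can be sketched as follows. The five-node graph used here is an assumption (a hub X joined to A, B, C and D, which also form the path A-B-C-D); it is chosen because it yields the same average, 0.7666, as the five-node example in the text. Function names are illustrative.

```python
from itertools import combinations


def adjacency(edges):
    """Neighbor sets for an undirected graph (isolated nodes are not represented)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are neighbors themselves (0 if fewer than two)."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def graph_clustering(adj):
    """Average of the node clustering coefficients."""
    return sum(node_clustering(adj, v) for v in adj) / len(adj)


# Assumed example graph: hub X plus the path A-B-C-D.
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"),
         ("A", "B"), ("B", "C"), ("C", "D")]
adj = adjacency(edges)
print(node_clustering(adj, "X"), graph_clustering(adj))
```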

Clustering coefficient of a graph
Clustering quantifies the likelihood that nodes that share a common neighbor are neighbors themselves. Pick two neighbors of a node: are they neighbors? In social networks, it is very likely that triangles will indeed close over time (triadic closure).

Clustering coefficient of a graph
Alternative definition of the clustering coefficient of a graph: the proportion of all possible triangles that are actually closed. In the five-node example, the number of possible triangles is 10 (5 choose 3 $= 5!/(3!\,2!)$) and the number of closed triangles is 3, so the clustering coefficient is $3/10 = 0.3$ (compare to 0.7666).

Highly clustered
Recall that clustering quantifies the likelihood that nodes that share a common neighbor are neighbors themselves. Is $C$ alone sufficient to conclude that a graph is highly clustered? Does $C$ close to 1 mean highly clustered? Does $C$ close to 0 mean not highly clustered? [Figure: is the triangle closed?]

Highly clustered
The clustering coefficient of the entire graph, $C$, is the proportion of all possible triangles that are actually closed. Not necessarily true! Some number of triangles in a graph could be closed simply by chance. A graph is highly clustered only if the actual likelihood of a triangle being closed is substantially greater than what we would expect due to pure chance.
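The triangle-based definition can be sketched by enumerating every triple of nodes and checking whether all three edges are present. The five-node graph below is the same assumed example (hub X plus the path A-B-C-D), which gives 3 closed triangles out of 10 possible; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def closed_triangles(nodes, edges):
    """Count node triples whose three connecting edges are all present."""
    present = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= present
    )


def triangle_clustering(nodes, edges):
    """Closed triangles divided by the number of possible triangles, n choose 3."""
    possible = comb(len(nodes), 3)
    return closed_triangles(nodes, edges) / possible if possible else 0.0


# Assumed example graph.
nodes = {"X", "A", "B", "C", "D"}
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"),
         ("A", "B"), ("B", "C"), ("C", "D")]
print(closed_triangles(nodes, edges), triangle_clustering(nodes, edges))
```

Enumerating all triples costs on the order of $n^3$ checks, so this form is only practical for small graphs.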

Edge density
Edge density of a graph is the actual number of edges in proportion to the maximum possible number of edges: $\rho = \frac{m}{n(n-1)/2}$. Clearly, the edge density of any graph is between 0 and 1. Suppose we pick two nodes of a graph at random without regard to the graph structure (e.g., whether the two nodes share a common neighbor or not). What is the probability $p$ that the two nodes are connected? It is given exactly by the edge density of the graph.

Sparse and dense graphs
If $\rho$ is small, then the graph is sparse. If $\rho$ is large, then the graph is dense. [Figure: a sparse graph with $\rho = 3/(8 \times 7/2) = 3/28 = 0.107$ and a denser graph with $\rho = 11/28 = 0.3928$.]

Highly clustered
We will compare the clustering coefficient $C$ of a graph to its edge density $\rho$. We consider a graph to be highly clustered if $C \gg \rho$. [Figure: two example graphs. In the first, $C = 3/8 = 0.375$ is not high relative to the edge density; in the second, $C = (6 + 4/3)/8 = 0.9166$ is high relative to $\rho = 0.3928$.]

Highly clustered
Consider a ring with eight nodes. Clustering coefficient: $C = 0$. Edge density: $\rho = 2 \times 8/56 = 0.2857$. What if there are one thousand nodes? Clustering coefficient: $C = 0$. Edge density: $\rho = 2 \times 1000/(1000 \times 999) = 0.002$.
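The ring comparison can be reproduced with a short sketch. Nodes are numbered 0 to n-1; the `reach` parameter is a hypothetical addition that also covers a ring in which each node is joined to its second-nearest nodes, and the code assumes n is larger than twice `reach` so that no edge is duplicated.

```python
def edge_density(n, m):
    """Actual number of edges divided by the maximum possible number, n(n-1)/2."""
    return m / (n * (n - 1) / 2) if n > 1 else 0.0


def ring_edges(n, reach=1):
    """Ring of n nodes, each joined to its `reach` nearest nodes on either side."""
    return {frozenset((i, (i + k) % n)) for i in range(n) for k in range(1, reach + 1)}


def ring_clustering(n, reach=1):
    """Clustering coefficient of node 0; by symmetry every node has the same value."""
    edges = ring_edges(n, reach)
    nbrs = [v for v in range(1, n) if frozenset((0, v)) in edges]
    k = len(nbrs)
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if frozenset((nbrs[i], nbrs[j])) in edges
    )
    return links / (k * (k - 1) / 2)


for n in (8, 1000):
    for reach in (1, 2):
        rho = edge_density(n, len(ring_edges(n, reach)))
        print(n, reach, ring_clustering(n, reach), round(rho, 4))
```

The clustering coefficient does not change with the size of the ring while the edge density shrinks, which is why the two quantities are compared rather than reading the clustering coefficient on its own.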

Highly clustered
Consider an augmented ring with eight nodes. Clustering coefficient: $C = 0.5$. Edge density: $\rho = 2 \times 16/56 = 0.5714$. What if there are one thousand nodes? Clustering coefficient: $C = 0.5$. Edge density: $\rho = 2 \times 2000/(1000 \times 999) = 0.004$.

Centrality metrics
Try to identify nodes in a graph that are important, influential or popular. Why were the Medici an important family in 15th century Florence? [Figure: network of Florentine families: Pucci, Castellani, Peruzzi, Strozzi, Ridolfi, Bischeri, Tornabuoni, Medici, Guadagni, Albizzi, Lamberteschi, Barbadori, Salviati, Ginori, Acciaiuoli, Pazzi.]

Centrality metrics
Different notions of centrality: degree (well connectedness); betweenness (criticality for connectedness); closeness (short distances to the rest of the graph); eigenvector (importance). Centrality is a property of a single node but in the context of the entire graph. We can also define a global notion of centrality that applies to the entire graph: centralization.

Centrality metrics
Degree centrality: the greater the degree of a node, the more important it is. Appropriate for some settings (social networks), since nodes with high degree are better connected and can serve as introducers. [Figure: the Florentine families network with each node labeled by its degree; the Medici have degree 6.]

Centrality metrics
Problems with degree-based centrality: it is not able to capture the notion of brokerage, the ability of a node in a graph to act as a bridge between different components. [Figure: example graph with nodes labeled by degree.]

Betweenness
Define the betweenness of node $u$ to be the fraction of all pairwise shortest paths that go through $u$: $\sum_{i < j;\; i, j \neq u} \frac{g_{ij}(u)}{g_{ij}}$, where $g_{ij}$ = total number of shortest paths between $i, j$ and $g_{ij}(u)$ = number of shortest paths between $i, j$ that go through $u$.

Betweenness
[Figure: three example graphs. $4 \times 4 = 16$: all shortest paths between the 4 nodes to the left and the 4 nodes to the right. $6 \times (6 - 1)/2 = 30/2 = 15$: possible pairs among the 6 neighbors of the central node, and all shortest paths go through it. In the third, the node gets full credit for the shortest paths that go through it but only half the credit for the two shortest paths between the top and bottom nodes.]

Betweenness
[Figure: the Florentine families network with each node labeled by its betweenness.]

Closeness
What if it is not important to have many friends, or to be in a broker position? It may be important to be in a central position, close to the rest of the graph. The Acciaiuoli have degree 1 and betweenness 0 but are just one hop from the Medici.
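Betweenness as a sum of $g_{ij}(u)/g_{ij}$ over node pairs can be sketched with breadth-first search that also counts shortest paths. This is a straightforward, unoptimized version for small undirected graphs rather than a production algorithm; the function names and the two test graphs (a star with six leaves and a four-node cycle) are assumptions.

```python
from collections import deque


def adjacency(edges):
    """Neighbor sets for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, u):
    """Sum over unordered pairs i, j (both different from u) of g_ij(u) / g_ij."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_u, sigma_u = info[u]
    others = sorted(v for v in adj if v != u)
    total = 0.0
    for a in range(len(others)):
        for b in range(a + 1, len(others)):
            i, j = others[a], others[b]
            dist_i, sigma_i = info[i]
            if j not in dist_i or u not in dist_i:
                continue  # i and j (or i and u) are not connected
            if dist_i[u] + dist_u[j] == dist_i[j]:
                # Shortest i-j paths through u: (paths i to u) times (paths u to j).
                total += sigma_i[u] * sigma_u[j] / sigma_i[j]
    return total


star = adjacency([("hub", "leaf%d" % k) for k in range(6)])
cycle = adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
print(betweenness(star, "hub"), betweenness(cycle, "B"))
```

In the star, all 15 pairs of leaves have their only shortest path through the hub. In the four-node cycle, each node receives half credit for the two shortest paths between its two neighbors.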

Closeness
Define the closeness of node $u$ based on the average of the shortest path lengths between node $u$ and every other node in the graph: $\frac{n - 1}{\sum_{i \neq u} d(u, i)}$, where $d(u, i)$ = length of the shortest path between nodes $u$ and $i$. [Figure: example graphs with each node labeled by its closeness. For a node with six other nodes at total distance 11, the closeness is $6/11 = 0.5454$; for a node with seven other nodes at total distance 14, it is $7/14 = 0.5$.]

Closeness
[Figure: the Florentine families network with each node labeled by its closeness, computed as 14 divided by the sum of its distances to the other nodes; the Medici score $14/25 = 0.56$.]

Centrality metrics in directed graphs
Degree, betweenness and closeness centrality definitions extend naturally to directed graphs. Out-degree centrality is based on out-degree. In-degree centrality is based on in-degree. Betweenness centrality of a node becomes the fraction of all pairwise shortest directed paths that go through it. In-closeness is based on path lengths from all other nodes to the given node. Out-closeness is based on path lengths from the given node to all other nodes.
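Closeness, taken as the number of other nodes divided by the sum of shortest-path lengths to them, can be sketched with one breadth-first search per node. Returning 0 when some node is unreachable is a convention assumed here, not something the definition states; the seven-node star is a hypothetical example in which a leaf scores 6/11.

```python
from collections import deque


def closeness(adj, u):
    """(number of other nodes) / (sum of shortest-path lengths from u to them)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    if len(dist) < len(adj):
        return 0.0  # assumed convention: some node is unreachable from u
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0


# Hypothetical star: one hub joined to six leaves.
leaves = ["leaf%d" % k for k in range(6)]
star = {"hub": set(leaves)}
for leaf in leaves:
    star[leaf] = {"hub"}

print(closeness(star, "hub"), closeness(star, "leaf0"))
```

A leaf is one step from the hub and two steps from each of the other five leaves, so its distances sum to 11.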

Eigenvector centrality
Basic idea: the importance of a node in a graph is determined by the importance of its neighbors. Recursive definition! Extremely relevant and important for the web graph. Implemented for directed graphs by the PageRank algorithm that was the main technological innovation behind Google. On the web, what counts is not how many pages point to a given page but which pages point to that page. The slashdot effect.

Eigenvector centrality: PageRank
Informally, an important node in a directed graph is pointed to by lots of other important nodes. Let $R(t, A)$ be the rank of A at time $t$ and let $out(A)$ be its out-degree. A distributes its rank evenly over its out-edges so that each one receives $R(t, A)/out(A)$. The rank of A at time $t + 1$ is obtained by summing the ranks over all of its in-edges: $R(t+1, A) = \sum_{B \to A} R(t, B)/out(B)$.

Eigenvector centrality: PageRank
We have an equation like this for every node in the graph. How do we assign ranks to all nodes such that the set of equations for the entire graph is consistent (stable)? Formally, the solution is equivalent to solving for the eigenvector of a matrix (describing the connectivity of the graph). It can be approximated algorithmically by iterating. This is the contribution of Larry Page and Sergey Brin while at Stanford that led to the Google search engine.

Eigenvector centrality: PageRank
[Figure: an example directed graph with nodes labeled by rank. Node G has four in-edges, including ones from F, J and L; with every rank initially 1, the four contributions $1/2 + 1 + 1/3 + 1/4$ give G a rank of 2.083 after one step.]
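The update rule, in which each node splits its rank evenly over its out-edges and each node's new rank is the sum of what arrives on its in-edges, can be iterated directly. This is a bare sketch of that rule only: it omits the damping factor and the handling of nodes without out-edges that practical PageRank implementations add, and plain iteration is not guaranteed to settle on every graph. The three-page graph and all names are hypothetical.

```python
def pagerank_step(ranks, out_edges):
    """One update: every node splits its current rank evenly over its out-edges."""
    new_ranks = {v: 0.0 for v in ranks}
    for u, targets in out_edges.items():
        for v in targets:
            new_ranks[v] += ranks[u] / len(targets)
    return new_ranks


def iterate(out_edges, steps=100):
    """Start every node at rank 1 and apply the update a fixed number of times."""
    nodes = set(out_edges) | {v for targets in out_edges.values() for v in targets}
    ranks = {v: 1.0 for v in nodes}
    for _ in range(steps):
        ranks = pagerank_step(ranks, out_edges)
    return ranks


# Hypothetical web of three pages: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = iterate(web)
print({page: round(r, 3) for page, r in sorted(ranks.items())})
```

On this small graph the iteration settles because every page can reach every other page and the cycle lengths (2 and 3) share no common factor.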

Recap: Classes of graph properties
Global patterns (macroscopic aspects of graph structure): degree distribution, connectivity, path lengths, diameter, edge density.
Local patterns (microscopic aspects of graph structure): degree, clustering coefficient.
Centrality (a single node in the context (position) of the graph): betweenness, closeness, eigenvector.

Software tools
Gephi: interactive visualization and exploration platform for networks, https://gephi.github.io/
NetLogo: programmable multi-agent environment for modeling network dynamics, https://ccl.northwestern.edu/netlogo/