Graph theoretic concepts Devika Subramanian Comp 140 Fall 2008
The small world phenomenon: the phenomenon is surprising because the graph is very large (more than 6 billion vertices for the planet); the graph is sparse, in the sense that each person is connected to at most k other people (k about 1000); the graph is decentralized, with no dominant central vertex to which other vertices are directly connected; and the graph is highly clustered, in that most friendship circles strongly overlap.
History of research: Karinthy (1929), Hungarian novelist: "Chains" (five degrees of separation). Solomonoff and Rapoport (1951), theoretical biology: random graphs, phase transition in connectedness. Erdos and Renyi (1960), pure mathematics: founders of random graph theory (giant components). Milgram and Travers (1967), sociology and psychology: acquaintance network, six degrees of separation. Leskovec and Horvitz (2008).
Models for small world? Erdos-Renyi model: n nodes, with each pair of nodes joined by an edge independently with probability p. k = average degree, which is p(n-1) in expectation.
Erdos-Renyi model: when the average degree k < 1, an ER(n,p) graph consists of small, isolated clusters with small diameters. At k = 1, a giant component appears and the diameter peaks. For k > 1, almost all nodes are connected and the diameter shrinks.
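The three regimes can be explored with a small simulation. The following is a minimal sketch using only the Python standard library; the helper names er_graph and largest_component_size, the seed, and n = 500 are illustrative assumptions, not part of the course code or of networkx.

```python
import random
from collections import deque


def er_graph(n, p, rng):
    """Return an ER(n, p) graph as a dict mapping each vertex to a set of neighbors."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is joined independently with probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


rng = random.Random(140)  # fixed seed (arbitrary) so runs are repeatable
n = 500
for k in (0.5, 1.0, 4.0):
    p = k / (n - 1)  # expected degree of a vertex is p * (n - 1)
    g = er_graph(n, p, rng)
    print(k, largest_component_size(g))
```

For k well below 1 the largest component should stay small relative to n, while for k well above 1 it should cover most of the vertices; exact sizes vary with the seed.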
[Figure: Erdos-Renyi model]
Giant component property: in many real-world networks, we see a small diameter; few connected components, often just one giant component that emerges at a threshold probability (the "tipping points" of Malcolm Gladwell); and a degree distribution that follows a power law.
Power law: y = f(x) = x^{-a}
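One consequence of the formula is that log y = -a log x, so a power law appears as a straight line of slope -a on log-log axes. The sketch below checks this numerically; the exponent a = 2.5 and the sample points are arbitrary illustrative choices.

```python
import math

a = 2.5  # illustrative exponent, not taken from any dataset
xs = [1, 10, 100, 1000]
ys = [x ** -a for x in xs]

# Slope between consecutive points in log-log coordinates.
slopes = [
    (math.log(ys[i + 1]) - math.log(ys[i])) / (math.log(xs[i + 1]) - math.log(xs[i]))
    for i in range(len(xs) - 1)
]
print(slopes)  # each slope should be -a, up to floating-point rounding
```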
[Figure: Degree distributions of real-world networks]
Barabasi-Albert model: the graph is not static, but grows with time. Preferential attachment: the probability that a new vertex will be connected to vertex i is proportional to the degree of i, namely P(i) = k_i / sum_j k_j, where the sum runs over all vertices in the graph.
BA graph generation: start with a small fully connected graph. Add vertices one by one, attaching m edges from each new vertex to existing vertices, chosen probabilistically in proportion to the number of edges each existing vertex already has.
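The generation procedure can be sketched in plain Python. This is an illustrative version under stated assumptions, not the networkx implementation: the function name ba_graph is hypothetical, the seed graph is taken to be fully connected on m + 1 vertices, and duplicate targets are simply redrawn, which departs slightly from exact proportional sampling.

```python
import random


def ba_graph(n, m, rng):
    """Grow a graph to n vertices by preferential attachment (requires m >= 1)."""
    # Seed: a fully connected graph on m + 1 vertices.
    adj = {v: set() for v in range(m + 1)}
    endpoints = []  # each vertex appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # A uniform draw from endpoints picks vertex i with probability k_i / sum_j k_j.
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


g = ba_graph(200, 3, random.Random(140))
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few hubs at the top, many low-degree vertices at the bottom
```

Keeping a list in which each vertex appears once per incident edge turns degree-proportional sampling into a uniform choice from that list.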
Properties of BA model: small diameters; threshold phenomena; degree distribution follows a power law. Explains the formation of many graphs in the real world: WWW, collaboration networks, power networks, protein networks, citation networks, etc. networkx has a barabasi_albert_graph(n, m) function to generate such graphs.
Graph representations Devika Subramanian Comp 140 Fall 2008
Adjacency matrix representation: for a graph with n vertices, represent edges by an n x n array. If there is an edge between vertex i and vertex j, position (i,j) in the array is 1; otherwise it is 0. This representation extends to weighted graphs by replacing the 1s and 0s with other numbers.
Adjacency list representation: with each vertex in a graph, associate a list of adjacent vertices. For weighted graphs, associate a list of tuples (vertex, weight) representing adjacent vertices and their edge weights/costs.
Graph representations: adjacency matrix of the example graph (vertices 0 to 4; edges 0-1, 0-3, 1-2, 1-3, 2-3, 2-4)
         0  1  2  3  4
    0    0  1  0  1  0
    1    1  0  1  1  0
    2    0  1  0  1  1
    3    1  1  1  0  0
    4    0  0  1  0  0
Graph representations: adjacency lists of the example graph (vertices 0 to 4)
    0: [1,3]
    1: [0,2,3]
    2: [1,3,4]
    3: [0,1,2]
    4: [2]
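Both representations can be built from the same edge list. The following sketch uses the six edges of the example graph as read off the adjacency lists (0-1, 0-3, 1-2, 1-3, 2-3, 2-4); the variable names are illustrative.

```python
edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
n = 5

# Adjacency matrix: n x n array of 0s and 1s.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1  # undirected, so the matrix is symmetric

# Adjacency list: one list of neighbors per vertex.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)
for neighbors in adj_list:
    neighbors.sort()

print(matrix)
print(adj_list)
```

The matrix always uses n * n entries, while the lists use space proportional to the number of edges, which matters for sparse graphs.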
Weighted graph representation: weighted adjacency matrix of the example graph (edge weights 0-1: 210, 0-3: 440, 1-2: 203, 1-3: 314, 2-3: 260, 2-4: 270)
           0    1    2    3    4
    0      0  210    0  440    0
    1    210    0  203  314    0
    2      0  203    0  260  270
    3    440  314  260    0    0
    4      0    0  270    0    0
Weighted graph representations: weighted adjacency lists of the example graph, as (vertex, weight) tuples
    0: [(1,210),(3,440)]
    1: [(0,210),(2,203),(3,314)]
    2: [(1,203),(3,260),(4,270)]
    3: [(0,440),(1,314),(2,260)]
    4: [(2,270)]
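The weighted forms can be built the same way as the unweighted ones. This sketch assumes the edge weights shown on the slides and uses 0 in the matrix (and None in the list lookup) to mean "no edge"; the helper weight_from_list is a hypothetical name.

```python
weighted_edges = [
    (0, 1, 210), (0, 3, 440), (1, 2, 203),
    (1, 3, 314), (2, 3, 260), (2, 4, 270),
]
n = 5

w_matrix = [[0] * n for _ in range(n)]  # 0 means "no edge" here
w_list = {v: [] for v in range(n)}      # vertex -> list of (neighbor, weight)
for u, v, w in weighted_edges:
    w_matrix[u][v] = w
    w_matrix[v][u] = w
    w_list[u].append((v, w))
    w_list[v].append((u, w))


def weight_from_list(adj, u, v):
    """Scan u's list for v; return the edge weight, or None if there is no edge."""
    for neighbor, w in adj[u]:
        if neighbor == v:
            return w
    return None


print(w_matrix[1][3], weight_from_list(w_list, 1, 3))
print(w_list[4])
```

Looking up a weight is a single index operation in the matrix, but a scan of one vertex's list in the list form.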
networkx graph representation: graphs are packaged as objects. An object is some data together with a set of methods for accessing and manipulating the data. Noun-oriented programming (Guzdial); "ask, don't touch" philosophy (Kay). An object is an abstraction that hides implementation details and exposes a clean interface to you.
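networkx Interface
    import networkx as nx

    G = nx.Graph()                # G is an instance of a Graph
    for i in range(10):
        G.add_edge(i, i+1)
    nx.diameter(G)
    # Slide-era call, removed in networkx 2.4:
    #     nx.connected_component_subgraphs(G)
    # Equivalent in later releases:
    components = [G.subgraph(c) for c in nx.connected_components(G)]
    G = nx.binomial_graph(100, 0.05)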
Python classes: a class is a blueprint for an object. It defines how to create an object and defines the interface for interacting with the object. An object created from a class is an instance of that class.
networkx graph class (https://networkx.lanl.gov/reference/networkx/)
    class Graph(object):
        # Constructor
        def __init__(self, data=None, name=''):
            self.adj = {}
            if data is not None:
                # convert is a module inside the networkx package
                convert.from_whatever(data, create_using=self)
            self.name = name

        # Accessor
        def nodes(self):
            return self.adj.keys()
self refers to the object itself.
Graph class: defines variables adj and name, which are local to the graph object. Instead of passing adjacency lists and node lists, we encapsulate the data in an object and pass the object; much cleaner! We can change the underlying representation of the graph object without having package users change their code.
networkx graph constructor (https://networkx.lanl.gov/reference/networkx/)
    def __init__(self, data=None, name=''):
        self.adj = {}                  # empty adjacency dictionary
        if data is not None:
            # populate the new graph from existing data
            convert.from_whatever(data, create_using=self)
        self.name = name
networkx graph representation
    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def nodes(self):
        return self.adj.keys()

    def neighbors(self, n):
        return self.adj[n].keys()

    def add_edge(self, u, v=None):
        if v is None:
            (u, v) = u                 # a single (u, v) tuple was passed
        if u not in self.adj:
            self.adj[u] = {}
        if v not in self.adj:
            self.adj[v] = {}
        if u == v:
            return                     # no self-loop is stored; the node is still added
        self.adj[u][v] = None
        self.adj[v][u] = None
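Dictionary of dictionaries: the example graph stored as adj, which maps each vertex to a dictionary whose keys are its neighbors (each value is None)
    0: {1: None, 3: None}
    1: {0: None, 2: None, 3: None}
    2: {1: None, 3: None, 4: None}
    3: {0: None, 1: None, 2: None}
    4: {2: None}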
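The same structure can be built with ordinary dictionaries, which shows why the representation is convenient: testing for an edge is a dictionary lookup. This is a minimal sketch with plain dicts, not the networkx class itself.

```python
edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]

adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = None  # create u's inner dictionary on first use
    adj.setdefault(v, {})[u] = None

print(adj[2])                     # inner dictionary of vertex 2
print(3 in adj[0], 4 in adj[0])   # edge tests are dictionary membership tests
print(sorted(adj[1]))             # neighbors of vertex 1
```

The None values are placeholders; the inner dictionaries leave room to store per-edge data such as weights later.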
Special graphs: digraphs (https://networkx.lanl.gov/reference/networkx/). Inheritance: basic functions are inherited, new methods specific to digraphs are added, and some functions are overridden. Advantage: code reuse.
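The three aspects of inheritance listed above can be seen in a toy example. The classes TinyGraph and TinyDiGraph below are hypothetical stand-ins written for illustration; they are not the networkx Graph and DiGraph classes.

```python
class TinyGraph:
    """Undirected graph stored as a dictionary of dictionaries."""

    def __init__(self):
        self.adj = {}

    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def nodes(self):
        return list(self.adj)

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = None
        self.adj[v][u] = None


class TinyDiGraph(TinyGraph):
    """Directed variant: inherits add_node and nodes, overrides add_edge."""

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = None  # only the direction u -> v is recorded

    def successors(self, n):
        """New method that only makes sense for digraphs."""
        return list(self.adj[n])


d = TinyDiGraph()
d.add_edge(0, 1)
d.add_edge(1, 2)
print(d.nodes(), d.successors(1), d.successors(2))
```

Only the code that differs for directed graphs is written in the subclass; everything else is reused from the parent class.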
Public and private data: after G = nx.Graph(), G.adj can be set to anything we like. Convention: anything with two leading underscores is private. Encapsulation, or data hiding, means people access data via functions such as G.add_node(), G.add_edge(), G.nodes(), and G.edges(), rather than directly manipulating the internal structures.
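The effect of the two-underscore convention can be tried out on a small class. HiddenGraph below is a hypothetical example, not networkx code. Python does not strictly enforce privacy: inside a class body, a name with two leading underscores is rewritten ("mangled") to include the class name, which keeps casual outside access from working.

```python
class HiddenGraph:
    def __init__(self):
        self.__adj = {}  # inside the class this name is mangled to _HiddenGraph__adj

    def add_edge(self, u, v):
        self.__adj.setdefault(u, {})[v] = None
        self.__adj.setdefault(v, {})[u] = None

    def nodes(self):
        return list(self.__adj)


g = HiddenGraph()
g.add_edge(0, 1)

try:
    g.__adj  # not visible under this name from outside the class
    direct_access_worked = True
except AttributeError:
    direct_access_worked = False

print(g.nodes(), direct_access_worked)
```

The data is still reachable under its mangled name, so this is a convention that discourages direct access rather than a security mechanism.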
Advantages of encapsulation: by defining a specific interface, you can keep other modules from doing anything incorrect to your data. By limiting the functions you are going to support, you leave yourself free to change the internal data without messing up your users. Encapsulation makes code more modular, since you can change large parts of your classes without affecting other parts of the program, so long as they only use your public functions.
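The second advantage can be illustrated by swapping the internal representation behind a fixed interface. In this sketch, DictGraph, EdgeSetGraph, and degree_sequence are hypothetical names chosen for illustration; the client function uses only the public methods and so works unchanged with either class.

```python
class DictGraph:
    """Stores neighbors in a dictionary of sets."""

    def __init__(self):
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def nodes(self):
        return sorted(self._adj)

    def neighbors(self, n):
        return sorted(self._adj[n])


class EdgeSetGraph:
    """Same interface, but stores a flat set of (u, v) pairs in both directions."""

    def __init__(self):
        self._nodes = set()
        self._pairs = set()

    def add_edge(self, u, v):
        self._nodes.update((u, v))
        self._pairs.update(((u, v), (v, u)))

    def nodes(self):
        return sorted(self._nodes)

    def neighbors(self, n):
        return sorted(v for (u, v) in self._pairs if u == n)


def degree_sequence(graph):
    """Client code: uses only the public methods nodes() and neighbors()."""
    return [len(graph.neighbors(n)) for n in graph.nodes()]


edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
results = []
for cls in (DictGraph, EdgeSetGraph):
    g = cls()
    for u, v in edges:
        g.add_edge(u, v)
    results.append(degree_sequence(g))
print(results)
```

Both classes should report the same degree sequence, since the client never touches the internal data.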