Empirical Study of Graph Properties with Particular Interest Towards Random Graphs


Lee Weinstein
May 13, 2005

Abstract

This paper is an empirical study of graph properties for various graphs, including both deterministic graphs (those with a set structure) and random graphs. The main properties analyzed are graph diameter, radius, the eccentricity distribution, and the degree of the graph. The goal is to see whether random graph models are ideal for networks which seek to minimize network traffic while still keeping the distance between nodes small.

Contents

1 Introduction
2 Graph Theory Definitions and Terminology
3 All Pairs Shortest Path Algorithms
    Floyd-Warshall all-pairs shortest path
    Faster - All Pairs Shortest Distance (APD)
    Diameter and Radius
4 Deterministic Graphs
    Tests
        Histogram of Eccentricity
        Histogram of Shortest Path Lengths
    Graphs and Results
        Complete Graph
        Line Graphs
        Grid Graphs - 2D and 3D
        Complete K-ary Trees
        Free Cayley Graphs
    Comparison
5 Random Graphs
    Tests
        D Histogram of Eccentricity vs Shortest Path Length
        D Plot of Shortest Path Lengths
        D Histogram of Degree vs Eccentricity
    Graphs and Results
        Traditional Random Graphs
        Scale Free Graphs
    Comparison
6 Conclusion
7 Appendix
    APD
    Graph Generation
    Random Graph Code
    BA Algorithm Code
    LCD Algorithm Code
    Line Graph
    Free Cayley Graphs
    Functions to Get Data

Acknowledgements

Before I begin, I would like to thank a few people. Professor Steven Lindell, this paper is the result of a long and difficult journey, but we made it in the end. I want to especially thank you for all your help during Passover; without those extra meetings this paper would not exist. Professor Robert Manning, thank you for helping me with both Mathematica and TeX. Congratulations on receiving tenure; you deserve it. Without both Professor Manning and Professor Lindell this paper would not be possible. I would also like to thank all of the professors and teachers I have had throughout my life. Without them I would not be the person I am today. Finally, I thank my entire family for supporting me throughout my life.

1 Introduction

Imagine a world where, in order to call a friend on the telephone, you first had to call India, then Israel, then South Africa, and only then your friend located 10 miles away. This process would become annoying after only a few phone calls. A telephone network is similar to a complete network: all phones can call all other phones by dialing the correct phone number. Fortunately, phone companies route phone calls, but imagine if calling anyone required a direct connection between each pair of phones. This would require $\binom{n}{2}$ wires, where $n$ is the number of phones. That is a lot of wires and very inefficient, though the time to actually reach a person would be very short. Fortunately, the telephone company has simplified this process so that it does not need wires between every pair of phones, but only from each phone to a local call box, which is in turn connected to a local call center. The local call center is connected to many call boxes and also to a regional call center. The regional call centers are then connected to one another. Basically, the telephone companies divided the network so that each phone has only one wire going out, and then added various stages with many wires connecting the remaining network.

When designing this network, the phone company was interested in many different properties. They knew that they wanted each phone to have only one wire, but how did they know how many stages to divide the network into? Why didn't they have each phone connect to the main call center and route all phone calls from there? This answer seems very efficient. It uses very few wires, but it creates a lot of traffic at the main call center. Therefore, the phone companies needed to minimize the traffic while still keeping the number of wires to a minimum, so that a phone call goes through a small number of wires and the connection feels instantaneous to the caller. This is one real life example of the complexities of real world networks.
This paper will empirically study graphs of various structures. The focus of the study will be to better understand the problem of organizing a network in order to minimize traffic at any one spot in the network and to keep the distance between any two nodes in the network small.

2 Graph Theory Definitions and Terminology

Before the results of the study can be examined, it is important to have a good background in graph theory terminology and definitions. Graph theory is studied in many disciplines, and because of this each field has its own terminology. In this paper, node(s) and vertex (vertices) are the same object and are interchangeable. Additionally, edge(s) and link(s) are the same object and are interchangeable.

Definition 2.1 A graph $G = (V, E)$ is comprised of a vertex set $V$ and an edge set $E$ of ordered pairs of vertices that describe which nodes are connected.

The following is an example of a graph. In this example, the vertex set consists of seven different vertices and the edge set contains eight different edges. Mathematically, this graph can be written in the following notation:

$$(V, E) = (\{a, b, c, d, e, f, g\}, \{(a, b), (a, f), (a, e), (b, c), (b, g), (b, f), (d, g), (e, g)\})$$

In a graph each node represents some object in the network. An edge $(i, j)$ represents a connection between node $i$ and node $j$ in the network. An edge between two nodes in a pictured graph is drawn as a physical connection, but in the real networks that graphs represent, the connection can be physical or abstract. An example of a network that has physical connections between nodes is an airline routing network, where the nodes are the airports and the edges are the flights between two airports. On the other hand, a network with abstract connections is a social network, where the nodes are individuals and the edges are their friendships, which is an abstract concept.

Additionally, graphs can be either directed or undirected. In a directed graph, each edge has a direction. Both the airline routing network and social networks are undirected networks: normally planes fly from one airport to another and then back again, and in a social network friendship is reciprocal. An example of a directed network is the network formed by web pages. The nodes are the web pages and the edges are the links on each page. On my personal home page I might have a link to ESPN.com, but ESPN probably does not have a link to my home page. So from my home page an Internet surfer can reach ESPN.com, but from ESPN.com he cannot return to my home page through a link.

Definition 2.2 We say a graph is directed if we consider $(i, j) \neq (j, i)$. We say a graph is undirected if we consider $(i, j) = (j, i)$.

Definition 2.3 A path exists between node $i$ and node $j$ if there exists a way to traverse the edges from $i$ and reach $j$.
In a path the edges can be repeated, but this is not necessary. If there is a path between every pair of nodes then the graph is connected. The distance between node $i$ and node $j$ is the length of the shortest path between the two nodes. In this paper, we will restrict ourselves to connected networks where the length of each edge is 1. Therefore, the shortest path between 2 nodes is the minimum number of edges which must be followed in order to reach one from the other. The following four properties of graphs follow directly from the knowledge of the shortest distance between all nodes.

Definition 2.4 The eccentricity of a node $i$ is the maximum distance between node $i$ and all other nodes in the graph.

Definition 2.5 The diameter of a graph is the maximum eccentricity.

Definition 2.6 The radius of a graph is the minimum eccentricity.

Definition 2.7 The center of a graph is the set of all nodes whose eccentricity is equal to the radius of the graph.

If a graph is disconnected the eccentricity is defined to be infinite. Therefore both the diameter and radius will also be infinite. For this reason, this paper will focus only on connected graphs. In addition, all graphs will be undirected. The reason for this is that a graph may appear connected while, because of the direction of the edges, it is in fact disconnected. For example, the undirected graph above is a connected graph because there is a path between all nodes, but the directed version of the graph below is disconnected. On inspection of this graph it is easy to see that all edges at node $a$ are directed out of the node. Therefore there is no path from nodes $b$, $c$, $d$, $e$, $f$ or $g$ to node $a$, and so the diameter and radius of this graph are infinite. The diameter and radius of the undirected version of this graph can be read off a chart of the distances between all pairs of nodes.
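As an illustration of Definitions 2.4 to 2.7, the following minimal Python sketch recomputes the eccentricities, diameter, radius and center of the undirected example graph. It uses a plain breadth-first search rather than either of the all-pairs algorithms discussed later, and the function names (`build_adjacency`, `bfs_distances`, `eccentricities`) are illustrative choices, not names taken from the paper.

```python
from collections import deque

# Edge set of the example graph from Section 2, treated as undirected.
EDGES = [("a", "b"), ("a", "f"), ("a", "e"), ("b", "c"),
         ("b", "g"), ("b", "f"), ("d", "g"), ("e", "g")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)  # undirected: (i, j) = (j, i)
    return adj


def bfs_distances(adj, source):
    """Shortest distances from source when every edge has length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(adj):
    """Eccentricity of every node; the graph is assumed to be connected."""
    return {u: max(bfs_distances(adj, u).values()) for u in adj}


adj = build_adjacency(EDGES)
ecc = eccentricities(adj)
diameter = max(ecc.values())   # Definition 2.5
radius = min(ecc.values())     # Definition 2.6
center = sorted(u for u, e in ecc.items() if e == radius)  # Definition 2.7
print(ecc, diameter, radius, center)
```

For this graph the sketch gives a diameter of 3 and a radius of 2, with nodes b and g forming the center.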

This chart shows the distance between each pair of nodes. Since the eccentricity of a node is its maximum distance to all other nodes, this statistic is easily extracted from the chart and is located to the right of the distance chart. With the list of eccentricities, the diameter and radius are found to be the maximum and minimum element respectively. Therefore the diameter is 3 and the radius is 2.

Theorem 2.8 The diameter $d$ of a graph is bounded by the following inequality, where $r$ is the radius of the graph:

$$r \le d \le 2r$$

Proof:

Case 1: The upper bound on $d$ is $2r$. Suppose that the radius of a graph is $r$. This implies there is a node $c$ in the center of the graph whose eccentricity equals the radius. Given any 2 nodes $i$ and $j$, the distance between $i$ and $c$ is less than or equal to $r$. Additionally, the distance between $j$ and $c$ is also less than or equal to $r$. Therefore, by the triangle inequality, the distance between $i$ and $j$ is less than or equal to $2r$. The following picture shows this.

Case 2: The lower bound on $d$ is $r$. The radius $r$ is the minimum eccentricity in the graph by the definition of radius. The diameter $d$ is the maximum eccentricity in the graph by the definition of diameter. Clearly, the minimum eccentricity is less than or equal to the maximum eccentricity. QED

Definition 2.9 The degree of a node in a directed graph is the sum of its in-degree and out-degree, where the in-degree of node $i$ is the number of edges $(k, i)$ for some $k$ (inward edges), and the out-degree of node $i$ is the number of edges $(i, k)$ for some $k$ (outward edges). In undirected graphs, the degree of a node is either its in-degree or its out-degree (which are always equal).

3 All Pairs Shortest Path Algorithms

In order to compute the radius, diameter, and center of a graph it is essential to know the shortest distances from every node to every other node. This is a moderately difficult problem in graph theory, and computer scientists are still searching for the most efficient method.

3.1 Floyd-Warshall all-pairs shortest path

Currently, the most popular algorithm to find the shortest distance between all nodes is the Floyd-Warshall All-Pairs Shortest Path algorithm. Here is the pseudocode for this algorithm:

FLOYD-WARSHALL($W$): $W$ is a 0-1 matrix
1  $n \leftarrow \mathrm{rows}[W]$
2  $D^{(0)} \leftarrow W$
3  For $k \leftarrow 1$ to $n$
4      For $i \leftarrow 1$ to $n$
5          For $j \leftarrow 1$ to $n$
6              $d^{(k)}_{ij} \leftarrow \mathrm{MIN}\left(d^{(k-1)}_{ij},\ d^{(k-1)}_{ik} + d^{(k-1)}_{kj}\right)$
7  Return $D^{(n)}$

The input $W$ is an $n \times n$ matrix representing the length of the edge from $i$ to $j$, which is defined as the adjacency matrix. Here is the matrix in mathematical terms, where $E$ is the edge set of the graph:

$$w_{ij} = \begin{cases} 0, & i = j \\ 1, & (i, j) \in E \\ \infty, & (i, j) \notin E \end{cases} \qquad (1)$$

The output matrix $D$ is an $n \times n$ matrix representing the length of the shortest path from $i$ to $j$. Here is the matrix in mathematical terms:

$$d_{ij} = \begin{cases} 0, & i = j \\ \mathrm{Length}(i, j), & \text{otherwise} \end{cases} \qquad (2)$$
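The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the helper `weight_matrix` and the use of `math.inf` for missing edges are assumptions made here, nodes are numbered from 0 rather than 1, and a single matrix is updated in place instead of keeping a separate $D^{(k)}$ for every $k$ (which is safe because row $k$ and column $k$ do not change during iteration $k$).

```python
import math

INF = math.inf


def weight_matrix(n, edges):
    """Build W as in equation (1): 0 on the diagonal, 1 for an edge, infinity otherwise."""
    w = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j in edges:
        w[i][j] = 1
        w[j][i] = 1  # undirected graph
    return w


def floyd_warshall(w):
    """Return the matrix of shortest path lengths for the weight matrix w."""
    n = len(w)
    d = [row[:] for row in w]          # D(0) <- W, copied so w is left untouched
    for k in range(n):                 # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Usage: a line of four nodes 0 - 1 - 2 - 3.
D = floyd_warshall(weight_matrix(4, [(0, 1), (1, 2), (2, 3)]))
print(D)
```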

It is important to remember that the graphs are connected, so a path from $i$ to $j$ exists whenever $i \neq j$. With the adjacency matrix, all edges in the graph are known. The Floyd-Warshall algorithm makes use of this by scanning through the adjacency matrix for paths between each pair of nodes. This is done by the two inner nested loops on lines 4 and 5. The outer loop on line 3 is the interesting one. The idea of this algorithm is to find the shortest path between node $i$ and node $j$ which does not go through any node higher than $k$. If the current path between node $i$ and $j$ (the element $d^{(k-1)}_{ij}$) is longer than the current path between node $i$ and $k$ (the element $d^{(k-1)}_{ik}$) plus the path from node $k$ to $j$ (the element $d^{(k-1)}_{kj}$), then the length of the path between $i$ and $j$ is replaced with Length$(i, k)$ + Length$(k, j)$. If Length$(i, k)$ + Length$(k, j)$ is greater than Length$(i, j)$ then the current value is kept. Running this process through all nodes results in the all pairs shortest distance solution. The following picture explains this process very well. This algorithm is short and easy to understand but has three nested For loops of $O(1)$ operations, which makes the complexity $O(n^3)$.

3.2 Faster - All Pairs Shortest Distance (APD)

The Floyd-Warshall algorithm is somewhat slow (cubic time) and is the bottleneck for applications that require the all pairs shortest distance solution of a network. It was the most costly algorithm in this study, which is the reason that most of the graphs in this paper are not very large. The All Pairs Distance algorithm proposed by Raimund Seidel [23] is not very well known in the literature but provides an alternative to the Floyd-Warshall algorithm with a smaller complexity due to the use of matrix multiplication. Before the algorithm can be presented, it is necessary to first understand how matrix multiplication can be used to count paths of a given length.

Theorem 3.1 Let $A$ be an $n \times n$ adjacency matrix for a graph $G$.
$A^0[i, j]$ is the number of paths between $i$ and $j$ of length 0.
$A^1[i, j]$ is the number of paths between $i$ and $j$ of length 1.
$A^2[i, j]$ is the number of paths between $i$ and $j$ of length 2.

This theorem is now proved for the values $k = 0, 1, 2$ because that is all that is needed to understand the All Pairs Distance algorithm of Seidel.

Proof:

Case 1: $A^0$. Let $A^0$ be the identity matrix. This is clearly the number of paths of length 0 in the graph because there are zeros in every entry except where $i = j$, where there is a 1. There is always a path of length 0 from a node to itself.

Case 2: $A^1$. Let $A^1$ be the adjacency matrix for the graph. There is a path of length 1 exactly when there is an edge from node $i$ to node $j$, which is what the adjacency matrix records. If $i = j$ there is no path of length 1, since there are no self loops in these graphs, so following one edge never leads from a node back to itself.

Case 3: $A^2$.

$$A^2 = A^1 A^1, \qquad A^2[i, j] = \sum_{m=1}^{n} A[i, m]\, A[m, j]$$

The product $A[i, m]\, A[m, j]$ for each $m$ counts the paths between $i$ and $m$ and between $m$ and $j$. The following chart and picture show how this process is the same as a logical AND.

[Chart: truth table for the edges $(i, m) = (m, i)$ and $(j, m) = (m, j)$ and their logical AND.]

If there is an edge between $i$ and $m$ AND an edge between $m$ and $j$, then there is a path of length two between $i$ and $j$. Clearly adding all these up gives the total number of paths. QED

Now that it has been shown how matrix multiplication counts paths of length 0, 1 and 2, the All Pairs Distances pseudocode can be presented.
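Case 3 of the proof can be checked on a small graph. The sketch below multiplies the adjacency matrix of a three-node line 0 - 1 - 2 by itself using a plain nested-loop product; the helper name `mat_mul` is an illustrative choice. Since the paper allows edges to be repeated within a path, the diagonal entries count the out-and-back paths such as 1, 0, 1.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


# Adjacency matrix of the line graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

A2 = mat_mul(A, A)
print(A2)  # A2[i][j] is the number of paths of length 2 between i and j
```

Here `A2[0][2]` is 1 (the single path 0, 1, 2), `A2[0][1]` is 0 (no path of length exactly 2), and `A2[1][1]` is 2 (the paths 1, 0, 1 and 1, 2, 1).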

APD($A$): $A$ is a 0-1 matrix
1  $Z = A \cdot A$
2  $B = (b_{ij})$, where $b_{ij} = 1$ iff $i \neq j$ and ($a_{ij} = 1$ or $z_{ij} > 0$), and $b_{ij} = 0$ otherwise
3  If $b_{ij} = 1$ for all $i \neq j$ then Return $D = 2B - A$
   else:
4  $T = \mathrm{APD}(B)$
5  $X = T \cdot A$
6  Return $D = (d_{ij})$, where $d_{ij} = 2t_{ij}$ if $x_{ij} \ge t_{ij} \cdot \mathrm{degree}(j)$, and $d_{ij} = 2t_{ij} - 1$ if $x_{ij} < t_{ij} \cdot \mathrm{degree}(j)$

The best way to prove this algorithm is to explain how each line of the algorithm leads to the correct answer for the all pairs shortest distance problem. The input to APD is the adjacency matrix of the graph, just as for the Floyd-Warshall algorithm. Additionally, the adjacency matrix must represent a connected graph or this algorithm will not produce the correct answer. The following is a line by line analysis of the code:

1. $Z = A \cdot A$

$Z$ represents a matrix of the number of paths of length 2. Such a path is not necessarily the shortest path, just a path.

2. $B = (b_{ij})$, where $b_{ij} = 1$ iff $i \neq j$ and ($a_{ij} = 1$ or $z_{ij} > 0$), and $b_{ij} = 0$ otherwise

$B$ represents a 0-1 matrix in which a one in entry $b_{ij}$ represents a noncyclic path between $i$ and $j$ of length 1 or 2. $B$ can be thought of as an adjacency matrix for $A$ that has edges of either length 1 or 2.

3. If $b_{ij} = 1$ for all $i \neq j$ then Return $D = 2B - A$

This line tests whether the graph has a diameter of at most 2, i.e. the distance between any two nodes is $\le 2$. In all three cases, we will show that $D$ always has the correct length.

Case 1: If the distance between $i$ and $j$ is 0 then $i = j$. The diagonals of $A$ and $B$ contain zeros. Therefore, $d_{ij} = 2b_{ij} - a_{ij} = 0 - 0 = 0$.

Case 2: If the shortest distance between node $i$ and $j$ is 1, then there is an edge between these two nodes. Therefore, there is a 1 in $A$ at $a_{ij}$ because $A$ is the adjacency matrix for the graph. There is also a 1 in $B$ at $b_{ij}$ by the previous analysis of $B$. Therefore $d_{ij} = 2b_{ij} - a_{ij} = 2 - 1 = 1$.

Case 3: If the shortest distance between node $i$ and $j$ is 2, then there is no edge between these two nodes. Therefore, there is a 0 in $A$ at $a_{ij}$ but there is a 1 in $B$ at $b_{ij}$ by the previous analysis of $B$. Therefore $d_{ij} = 2b_{ij} - a_{ij} = 2 - 0 = 2$.

By these three cases, this algorithm is correct for graphs with diameter $\le 2$.

4. $T = \mathrm{APD}(B)$

If the graph does not have a diameter $\le 2$ then APD is called recursively with the new adjacency matrix $B$ for edges of length at most 2. After this recursive call, the matrix $T$ represents path lengths in $B$. $T$ can be represented pictorially with 2 cases depending on whether the path lengths in $A$ are even or odd. In other words, a path length in $A$ is either two times the corresponding entry of $T$ minus 1, or two times that entry, as shown in the two pictures respectively.

5. $X = T \cdot A$

The matrix $X$ represents sums of distances in $B$: $x_{ij}$ is the sum of the distances between node $i$ and every neighbor of $j$. This is a result of multiplying a matrix of path lengths in $B$ by the adjacency matrix $A$. The following pictures explain how paths of even and odd length contribute to $X$, where $x_{ij}$ is the sum of the distances of these paths over all neighbors $k$.

6. Return $D = (d_{ij})$, where $d_{ij} = 2t_{ij}$ if $x_{ij} \ge t_{ij} \cdot \mathrm{degree}(j)$, and $d_{ij} = 2t_{ij} - 1$ if $x_{ij} < t_{ij} \cdot \mathrm{degree}(j)$

$D$ represents the matrix of the shortest path lengths. By the picture above, the distance between $i$ and $j$ in $A$ is clearly either $2t_{ij}$ or $2t_{ij} - 1$, where the solid edges represent the path in $B$, the element at $t_{ij}$. All that remains to be shown is that the condition $x_{ij} \ge t_{ij} \cdot \mathrm{degree}(j)$ decides between the two cases, even and odd. This condition can be simplified by dividing both sides by $\mathrm{degree}(j)$. Since $x_{ij}$ is the sum of all distances between $i$ and $k$ (a neighbor of $j$), dividing by $\mathrm{degree}(j)$ yields the average shortest path length in $B$ between $i$ and a neighbor of $j$. Therefore, we need to show how this average distance determines the actual distance between $i$ and $j$ in $A$. The following picture helps explain this idea. Please note that this picture is a conceptual one and the degree of $j$ can take on any value from 1 to $n$.
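The six lines analysed above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a plain cubic-time `mat_mul` helper (an illustrative name), so it does not achieve the faster running time that motivates the algorithm, and it assumes a symmetric 0-1 adjacency matrix of a connected graph with no self loops.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def apd(a):
    """All pairs distances for the adjacency matrix a of a connected undirected graph."""
    n = len(a)
    z = mat_mul(a, a)                                       # line 1
    b = [[1 if i != j and (a[i][j] == 1 or z[i][j] > 0) else 0
          for j in range(n)] for i in range(n)]             # line 2
    if all(b[i][j] == 1 for i in range(n) for j in range(n) if i != j):
        # line 3: diameter at most 2, so D = 2B - A
        return [[2 * b[i][j] - a[i][j] for j in range(n)] for i in range(n)]
    t = apd(b)                                              # line 4
    x = mat_mul(t, a)                                       # line 5
    degree = [sum(row) for row in a]                        # degree(j) = sum of row j
    return [[2 * t[i][j] if x[i][j] >= t[i][j] * degree[j] else 2 * t[i][j] - 1
             for j in range(n)] for i in range(n)]          # line 6


def line_adjacency(n):
    """Adjacency matrix of a line of n nodes, used here only as a test graph."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


D = apd(line_adjacency(6))
print(D)
```

On a line of six nodes the distance between nodes `i` and `j` should be `abs(i - j)`, which gives a quick check of both the even and the odd case of line 6.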

Break the argument into two cases, where the shortest path between $i$ and $j$ is either even or odd.

Case 1: Even shortest path. Suppose that the shortest path in $A$ between $i$ and $j$ is even. Let $k$ be the neighbor of $j$ along a shortest path. The picture shows that $t_{ij} = t_{ik}$, with the distances $t_{ik}$ to the neighbors $k$ of $j$ taking one of the following two values:

$$t_{ik} = \begin{cases} t_{ij}, & \text{whenever } k \text{ lies along a minimal path} \\ t_{ij} + 1, & \text{otherwise} \end{cases} \qquad (3)$$

The first case is true because the path that $k$ lies along can only be as short as the shortest path. The second case is true because the largest the distance $t_{ik}$ can be is $t_{ij} + 1$, since there is a shortest path between $i$ and $j$ equal to $t_{ij}$ and $k$ is a neighbor of $j$. Therefore a shortest path between $i$ and $k$ cannot be greater than $t_{ij} + 1$. Therefore $x$, the average distance in $B$ between $i$ and the neighbors of $j$, is the sum of at least one path of $t_{ik} = t_{ij}$ and other paths of $t_{ij} + 1$, all divided by $\mathrm{degree}(j)$. Clearly $x \ge t_{ij}$ (the actual distance in $B$ between $i$ and $j$). The picture shows that this case corresponds to a distance in $A$ between $i$ and $j$ of 2 times the distance in $B$ between $i$ and $j$, or $2t_{ij}$.

Case 2: Odd shortest path. Suppose that the shortest path in $A$ between $i$ and $j$ is odd. Let $k$ be the neighbor of $j$ along a shortest path. The picture shows that the shortest path is $t_{ij} = t_{ik} + 1$, with the distances $t_{ik}$ to the neighbors $k$ of $j$ taking one of the following two values:

$$t_{ik} = \begin{cases} t_{ij} - 1, & \text{whenever } k \text{ lies along a minimal path} \\ t_{ij}, & \text{otherwise} \end{cases} \qquad (4)$$

The first case is true because the path that $k$ lies along can only be as short as the shortest path. The second case is true due to the condensing of two edges in $A$ into one edge in $B$. If the path is not the shortest then there will be at most one more edge along the path in $A$, but this edge plus the previous one is condensed into one edge in $B$. This results in a path length of $t_{ij}$. Therefore $x$, the average distance in $B$ between $i$ and the neighbors of $j$, is the sum of at least one path of $t_{ik} = t_{ij} - 1$ and other paths of $t_{ik} = t_{ij}$, divided by $\mathrm{degree}(j)$. Clearly $x < t_{ij}$ (the actual distance in $B$ between $i$ and $j$). The picture shows that this case corresponds to a distance in $A$ between $i$ and $j$ of 2 times the distance in $B$ between $i$ and $j$ minus 1, or $2t_{ij} - 1$.

As mentioned earlier, this faster algorithm to solve the all pairs distance problem has a better complexity than the $O(n^3)$ of the Floyd-Warshall version. Seidel says that the complexity of this algorithm is $O(M(n) \log n)$, where $M(n)$ is the complexity of matrix multiplication. It is natural that the complexity of APD scales with the complexity of matrix multiplication since it is the most costly part of the algorithm. The complexity of matrix multiplication on two $n \times n$ matrices is $O(n^{2.376})$ based on the results of Coppersmith and Winograd. Therefore, the complexity of APD is $O(n^{2.376} \log n)$. It is possible that the complexity of this algorithm will decrease in the future if a faster algorithm for matrix multiplication is discovered.

3.3 Diameter and Radius

The calculation of a graph's diameter and radius is trivial once the all pairs shortest distance problem is solved. The algorithms for each of these properties follow directly from their definitions. Let the function $\mathrm{Ecc}(j) = \max_i d_{ij}$ yield the eccentricity of node $j$.

1. Diameter = $\max_j \mathrm{Ecc}(j)$
2. Radius = $\min_j \mathrm{Ecc}(j)$

4 Deterministic Graphs

We define a deterministic graph to be a graph which has a set structure. In other words, the vertex set and edge set are known once the type of the graph and the number of nodes in the graph are known.

4.1 Tests

By running the faster All Pairs Shortest Distance algorithm APD with minor adjustments, I was able to find many graph properties: the diameter, radius, center set, degree set, eccentricity set, and all of the shortest path lengths. This raw data is analysed through the following charts.

4.1.1 Histogram of Eccentricity

A histogram of the nodes' eccentricities is helpful because we know that the minimum eccentricity is the radius and the maximum is the diameter. It is ideal for networks to have many short paths so that it is not costly for nodes to communicate with one another. Therefore an ideal histogram of eccentricity would have a small domain. This means that the difference between the diameter and radius would be small, and less than the upper bound of $r$ on that difference (which follows from $d \le 2r$). Additionally, ideal networks will have most nodes' eccentricities close to the radius. This corresponds to the curve formed by the histogram being decreasing, or negatively sloping. If the curve is positively sloping, this implies that most nodes have an eccentricity close to the diameter.

4.1.2 Histogram of Shortest Path Lengths

An ideal network's histogram of path lengths would look similar to an ideal histogram of eccentricity. The shortest possible path length is 1 and the longest shortest path length is the diameter. If this difference is small then many nodes will have short shortest path lengths. It is also ideal for the curve formed by the histogram to be negatively sloping because this implies that most path lengths are short. If a majority of the path lengths are close in value to the diameter, the network would be costly to navigate through.

4.2 Graphs and Results

In the following five sections, the following graphs will be analysed: complete graphs, line graphs, two dimensional grid graphs, three dimensional grid graphs, K-ary trees, and free Cayley graphs. Due to the deterministic nature of these graphs, their properties can be derived without the tests, but it is important to see the results of the tests in order to help understand the results for more complex graphs.

4.2.1 Complete Graph

A complete graph is a graph where there is an edge from every vertex to every other vertex. Therefore, a complete graph with $n$ nodes has $\binom{n}{2}$ edges and each node has a degree of $n - 1$. The following graph is an example of a complete graph with 25 nodes and 300 edges.
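The counts quoted above for the complete graph, and the two histograms described in Section 4.1, can be reproduced with a short Python sketch. It is an illustration only: it uses breadth-first search in place of APD, and the names `complete_graph` and `histograms` are illustrative choices rather than functions from the paper's appendix.

```python
from collections import Counter, deque
from itertools import combinations


def complete_graph(n):
    """Adjacency sets of the complete graph on nodes 0 .. n-1."""
    adj = {v: set() for v in range(n)}
    for i, j in combinations(range(n), 2):
        adj[i].add(j)
        adj[j].add(i)
    return adj


def bfs_distances(adj, source):
    """Shortest distances from source when every edge has length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def histograms(adj):
    """Return (eccentricity histogram, shortest path length histogram)."""
    ecc_hist, path_hist = Counter(), Counter()
    for u in adj:
        dist = bfs_distances(adj, u)
        ecc_hist[max(dist.values())] += 1
        for v, d in dist.items():
            if u < v:                  # count each unordered pair once
                path_hist[d] += 1
    return ecc_hist, path_hist


K25 = complete_graph(25)
num_edges = sum(len(nbrs) for nbrs in K25.values()) // 2
ecc_hist, path_hist = histograms(K25)
print(num_edges, dict(ecc_hist), dict(path_hist))
```

For 25 nodes this reports 300 edges, 25 nodes of eccentricity 1, and 300 shortest paths of length 1.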


As mentioned in the introduction, a complete graph has nodes with direct connections to each other. Therefore all of the shortest paths are of length 1, which gives each vertex an eccentricity of 1. It follows that the diameter and radius are also 1 and that all nodes are in the center of the graph. Therefore, a complete graph will produce uninteresting results from the tests. The histogram of the eccentricity will have $n$ occurrences of eccentricity 1. The histogram of the path lengths will have $\binom{n}{2}$ occurrences of a path length of 1. As shown below, the results from the tests confirm this. Complete networks have the best possible shortest path lengths since the diameter is 1, but they have a large average degree, which is costly to implement because of all the required edges.

4.2.2 Line Graphs

A line graph is a graph in which the nodes are connected in a line. A line graph with $n$ nodes has $n - 1$ edges, and all nodes except the two endpoints have degree 2 while the endpoints have degree 1. The following is a picture of a line graph with 10 nodes. Unlike a complete graph, where all the shortest path lengths are 1, here they can be very long. In fact, the diameter is $n - 1$, which is the length of the path between the two endpoints. The radius and the center can take on one of two values depending on the size of the network.

$$\text{Radius} = \begin{cases} \left\lceil \frac{n-1}{2} \right\rceil, & n \text{ is even} \\ \frac{n-1}{2}, & n \text{ is odd} \end{cases} \qquad (5)$$

$$\text{Center} = \begin{cases} \left\{ \frac{n}{2}, \frac{n}{2} + 1 \right\}, & n \text{ is even} \\ \left\{ \left\lceil \frac{n}{2} \right\rceil \right\}, & n \text{ is odd} \end{cases} \qquad (6)$$

The farther a node is from the center node, the larger its eccentricity. Since nodes move away from the center either to the left or to the right, there will be 2 nodes with each eccentricity between the radius $r$ and the diameter $d$. The histogram of path lengths should be a decreasing function: because of the line structure of the edges, all nodes have short paths, but as the path lengths increase only nodes farther away from the center node will have the longer path lengths. As shown below, the results from the tests on a line graph of 50 nodes with $d = 49$ and $r = 25$ confirm this.

4.2.3 Grid Graphs - 2D and 3D

A grid graph is a graph where the nodes are arranged on an $n \times n$ grid or an $n \times n \times n$ grid. In a two dimensional grid graph, there are $n^2$ nodes and $2(n^2 - n)$ edges. A node's degree can be one of three values depending on the location of the node in the grid. If the node is one of the four nodes on the corners of the grid it has a degree of 2. If the node is one of the $4(n - 2)$ other nodes on the border of the grid the degree is 3. Finally, if the node is one of the $n^2 - 4(n - 1)$ internal nodes it has a degree of 4. The diameter of a grid graph is $2(n - 1)$ because the longest shortest path runs across an entire row, which is $n - 1$ edges, and then up an entire column, which is another $n - 1$ edges. The radius is a little more complicated because it depends on whether $n$ is odd or even: an odd $n$ means that there will be only one center node, while an even $n$ implies that there will be 4 center nodes. One shortest path of length $r = n$ for $n$ even is a path $n/2$ edges across a row to the border and then $n/2$ edges up a column to a corner node. If $n$ is odd then one such path is $\lfloor n/2 \rfloor$ edges across a row and $\lfloor n/2 \rfloor$ edges up a column to a corner. This produces a radius of $n - 1$.

Three dimensional grid graphs are very similar. They have $n^3$ nodes and $2n(n^2 - n) + n^2(n - 1)$ edges. The formula for the number of edges comes directly from the number of edges in a two dimensional grid graph. There are $2(n^2 - n)$ edges in each $n \times n$ grid and there are $n$ of these grids. The $n^2(n - 1)$ term results from the fact that it requires $n(n - 1)$ edges to connect one level of grids and there are $n$ levels to the grid. The nodes' degrees in a three dimensional grid graph can be one of four values. The 8 corner nodes have a degree of 3, and the $12(n - 2)$ border nodes have a degree of 4. The internal border nodes have a degree of 5 while the internal nodes have a degree of 6. The diameter of the three dimensional grid graph is similar to the two dimensional version, with $d = 3(n - 1)$. The radius is also similar, with $r = \lfloor n/2 \rfloor + \lfloor n/2 \rfloor + \lfloor n/2 \rfloor$. Notice that the diameter and radius both decrease with a move from two dimensions to three dimensions (when comparing total nodes) because of the ability to fit more nodes per two dimensional grid in the three dimensional case. Here are two examples, a two dimensional grid graph and a three dimensional grid graph respectively.

Due to the lattice nature of this graph, the eccentricity increases as the nodes are farther from the center. Nodes near the center have a low eccentricity and there are not many with eccentricity equal to the radius, but moving out (either linearly or radially) from the center, more and more nodes share each larger eccentricity. The eccentricity continues to increase as a node is farther away from the center, but close to the border of the grid there are fewer nodes with a high eccentricity. This produces a roughly bell shaped histogram of eccentricities. The path length histogram is slightly more difficult to imagine. Given that the histogram of eccentricities has a bell shape to it, the path length histogram will also have some sort of bell shape to it. There will be many path lengths slightly lower than the radius because the radius is the minimum of the longest shortest paths for each node. Therefore every node has paths which are at least as long as the radius. Also, not many nodes will have path lengths as long as the diameter. Therefore the curve will decrease rapidly after the radius. As shown below, the results for a two dimensional grid graph with $d = 38$ and $r = 20$ confirm this argument. The results for a three dimensional 8x8x8 grid graph with $d = 21$ and $r = 12$ also agree with the above argument.
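The diameter and radius values quoted above for line and grid graphs can be checked with a minimal Python sketch. It again substitutes breadth-first search for APD, and the generator names `line_graph` and `grid_graph` are illustrative. The two dimensional grid is taken to be 20 by 20, which is an assumption made here: it is the size for which $2(n - 1) = 38$.

```python
from collections import deque
from itertools import product


def line_graph(n):
    """Nodes 0 .. n-1 connected in a line."""
    return {v: {u for u in (v - 1, v + 1) if 0 <= u < n} for v in range(n)}


def grid_graph(n, dims):
    """Grid with n nodes per side in dims dimensions; nodes are coordinate tuples."""
    adj = {}
    for node in product(range(n), repeat=dims):
        nbrs = set()
        for axis in range(dims):
            for step in (-1, 1):
                c = node[axis] + step
                if 0 <= c < n:
                    nbrs.add(node[:axis] + (c,) + node[axis + 1:])
        adj[node] = nbrs
    return adj


def bfs_distances(adj, source):
    """Shortest distances from source when every edge has length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_radius(adj):
    """Maximum and minimum eccentricity of a connected graph."""
    ecc = [max(bfs_distances(adj, u).values()) for u in adj]
    return max(ecc), min(ecc)


def edge_count(adj):
    """Number of undirected edges."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


print(diameter_and_radius(line_graph(50)))
print(diameter_and_radius(grid_graph(20, 2)), edge_count(grid_graph(20, 2)))
print(diameter_and_radius(grid_graph(8, 3)), edge_count(grid_graph(8, 3)))
```

The edge counts can be compared with the formulas $2(n^2 - n)$ and $2n(n^2 - n) + n^2(n - 1)$ given in the text.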

21 4.2.4 Complete K-ary Trees A complete K-ary tree is a tree where there is a starting node called the root with k edges to k unique nodes called internal nodes which in turn have k edges to k unique nodes. This process continues until a depth where instead of the nodes having k edges they each have just 1 edge to 1 new node called a child. The number of times the process repeats with 1 node having k edges is determined based on the depth of the tree. A tree with just the root has a depth of 0. A K-ary tree has n = kdp+1 1 k 1 nodes, where dp is the depth, and n 1 edges. The root node has a degree of k while the k d p leaf nodes have degree 1. The k dp+1 1 k 1 (k d p + 1) remaining internal nodes have a degree of k + 1. The following is an example of a complete binary (2-ary) tree with dp = 5 and a complete 3-ary tree with dp = 3. In K-ary complete trees, the diameter is simply two times the depth because the longest shortest path in the graph will be between leaves in the network but not just any leaves. The longest shortest path will be between leaves that a shortest path must go through the root. Therefore, the radius is the depth because the root will have the smallest of all longest shortest paths. The eccentricity of a node in a complete K-ary tree is based on the depth of the node in the graph. Therefore, the eccentricity histogram will be an increasing function. Not only will it be an increasing function but a fast increasing function because the number of nodes at a given depth with eccentricity equal to the depth is k d p. The path length histogram is also an increasing function but not as fast because all nodes have short path lengths but only those nodes closer to the depth of the tree will have longer paths. This graph will even decrease slightly before the diameter because only the leaf nodes have a path equal to the diameter but all of the leaf nodes plus their parent nodes have path lengths of d 1. 
As shown below, the results for a complete binary tree of depth 8, with d = 16 and r = 8, confirm this argument.
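The node count, diameter, radius, and eccentricity histogram of a complete K-ary tree can be reproduced with a short script. This is a Python sketch under assumptions, not the study's Mathematica code: the function names are hypothetical, nodes are numbered in breadth-first order so that the parent of node c is (c - 1) // k, and k is taken to be at least 2.

```python
from collections import Counter, deque


def complete_kary_tree(k, depth):
    """Adjacency lists of a complete k-ary tree (k >= 2), root = node 0."""
    n = (k ** (depth + 1) - 1) // (k - 1)   # node count formula from the text
    adj = [[] for _ in range(n)]
    for child in range(1, n):
        parent = (child - 1) // k            # breadth-first numbering
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def eccentricity(adj, source):
    """Longest shortest path starting at source (graph assumed connected)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist)


def eccentricity_histogram(adj):
    """Map each eccentricity value to the number of nodes that have it."""
    return Counter(eccentricity(adj, v) for v in range(len(adj)))


tree = complete_kary_tree(2, 8)
hist = eccentricity_histogram(tree)
print(len(tree), max(hist), min(hist))   # 511 16 8
print(sorted(hist.items()))              # counts double with each extra unit of eccentricity
```

For the binary tree of depth 8 the histogram holds 1 node at eccentricity 8 (the root) and 256 nodes at eccentricity 16 (the leaves), which is the fast increasing shape argued for above.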

The results for a 3-ary complete tree of depth 5, with d = 10 and r = 5, also agree with the above argument.

4.2.5 Free Cayley Graphs

Free Cayley graphs are a special type of tree structure. Like K-ary trees, they have $n - 1$ edges, one root node, and a number of leaf nodes. The difference here is that in order to build a free Cayley graph it is necessary to know the degree of the root node (it turns out that the degree of the root node is also the depth of the tree). The root node has degree $k$ and is connected to $k$ new nodes with $k$ edges. Each of these $k$ new nodes has degree $k$, but each is connected to only $k - 1$ new nodes. Each of those new nodes has a degree of $k - 1$ and is connected to $k - 2$ new nodes. This process continues until the new nodes are supposed to connect to $0 = k - k$ new nodes. Therefore, there are $n = 1 + k + k(k-1) + k(k-1)(k-2) + \cdots + k(k-1)(k-2)\cdots(k-(k-1))$ nodes. Because Free Cayley graphs are also trees, the diameter and radius are the same as they are for trees. Therefore, $d = 2dp$ and $r = dp$. The following are two examples of Free Cayley graphs with k = dp = 3 and k = dp = 4 respectively.
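The construction just described translates directly into a level-by-level build. The sketch below is in Python rather than the Mathematica of the appendix, and the function names are hypothetical; it follows the rule stated above, where the root has k children and a node at depth j has k - j children.

```python
from collections import deque
from math import factorial


def free_cayley_tree(k):
    """Adjacency lists of the tree described above: root fan-out k, then k-1, k-2, ..., 1."""
    adj = [[]]                # node 0 is the root
    frontier = [0]
    for depth in range(k):
        fanout = k if depth == 0 else k - depth
        next_frontier = []
        for parent in frontier:
            for _ in range(fanout):
                child = len(adj)
                adj.append([parent])
                adj[parent].append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return adj


def expected_node_count(k):
    """n = 1 + k + k(k-1) + ... + k!, written as a sum of falling factorials."""
    return sum(factorial(k) // factorial(k - j) for j in range(k + 1))


def eccentricity(adj, source):
    """Longest shortest path starting at source."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist)


tree = free_cayley_tree(5)
ecc = [eccentricity(tree, v) for v in range(len(tree))]
print(len(tree), expected_node_count(5))   # 326 326
print(max(ecc), min(ecc))                  # 10 5
```

For k = 3 the same build gives 1 + 3 + 6 + 6 = 16 nodes.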

Since Free Cayley graphs are trees, the basic shape of the histograms of eccentricities and path lengths will be the same as for K-ary trees. However, because these graphs decrease the degree of each node as the depth increases, there are fewer leaf nodes with eccentricity equal to the diameter than there would be in a K-ary tree with the same starting k. Therefore the histogram of eccentricities will increase more slowly. The same idea applies to the histogram of path lengths: path lengths equal to the diameter occur even less often, so there is a sharper drop in the curve near the diameter. As shown below, the results for a Free Cayley graph with k = dp = 5, d = 10, and r = 5 confirm this argument.

4.3 Comparison

The following chart is a collection of raw data for the previous graph structures with varying numbers of nodes.

By just looking at the diameter and radius data for these graphs it seems that tree graphs (either K-ary or Free Cayley graphs) have the lowest diameter and radius for the same number of nodes, but this is misleading. As was shown above, it is actually the three dimensional grid graphs which perform well. Though their diameter and radius are larger, most of the nodes are internal nodes in the grid and their eccentricity is closer to the radius than the diameter. The opposite happens in a tree structure because there are so many leaf nodes which have eccentricity equal to the diameter. The drawback of three dimensional grid graphs is that there are many edges in the network, although traffic is low since there are many different paths between nodes i and j due to the grid structure. The opposite holds in tree graphs: there is only one shortest path between nodes i and j, so there will be high traffic, especially through the root node, which is the only bridge from half of the nodes to the other half.

The main conclusion from these results is that it is best to know the application and then choose which of these graphs is best. For example, if a network could be organized into partitions where nodes do not communicate with each other unless they are in the same partition, a tree would be ideal because partitions could be arranged as the children of a given node. It might even be possible that the best of both worlds would be a tree of grids, with a connection between two grids if one is the parent or child of the other. The partitions could then be the grids.

5 Random Graphs

Unlike deterministic graphs, random graphs do not have a set structure. Instead, random graphs have a nondeterministic algorithm which describes how to build such a network. When given a network, the method to determine if it is indeed random is to find the function of the degree distribution. Due to the randomness of these graphs, there are more long range edges in the network, which help to reduce the shortest path lengths. According to this thought, random graphs should have a better diameter and radius than deterministic graphs with a similar number of nodes.
5.1 Tests

In order to test the notion that random graphs accomplish the goal of minimizing the number of edges and traffic in a network while still having a low diameter and radius, we will analyze two types of random graphs. When analyzing these graphs, we will use the two histograms that were used to analyze deterministic graphs and three new tests.

5.1.1 3D Histogram of Eccentricity vs Shortest Path Length

This histogram shows the distribution of path lengths for nodes with a given eccentricity. It is conceivable that a network will have some nodes with a high eccentricity, which contribute to a large diameter, even though each such node has only one long shortest path and all of its other shortest paths are much shorter than its eccentricity. This histogram will show whether nodes with a high eccentricity have many short paths or not. Three dimensional histograms are costly in Mathematica and therefore only networks with the number of nodes squared about 1000 will produce this type of three dimensional histogram.

5.1.2 3D Plot of Shortest Path Lengths

This is an experimental plot. The x and y axes are the nodes of the network, and the z axis represents the shortest path length between nodes x and y. This plot is similar to the two dimensional histogram of path lengths, but here it specifies which nodes produce a given path length. Therefore, the x and y axes must be sorted. Here they are sorted by eccentricity, with node 1 being a node with the lowest eccentricity, that is, the radius. Additionally, each path length is graphed once, since (i, j) = (j, i) because the graphs tested are undirected. Ideally, a trend should exist based on the eccentricity and path length, with nodes with a low eccentricity also having short path lengths and nodes with high eccentricity having long path lengths.

5.1.3 3D Histogram of Degree vs Eccentricity

Here, it would be ideal if nodes with a large degree have a small eccentricity. The thought is that if a node is well connected in the network, then the node should have a small maximum shortest path in the network. Nodes with low degree can then also have a small eccentricity if they are close to a node with high degree. Therefore, graphs with lots of high degree nodes will hopefully also have a low diameter. Three dimensional histograms are costly for Mathematica to implement, so only graphs with the number of nodes roughly around 1000 will produce this type of three dimensional histogram.

5.2 Graphs and Results

As mentioned previously, we will analyze two types of random graphs. The first will be traditional random graphs. The second will be scale free graphs, a recent discovery in random graph theory. Before the introduction of scale free networks in 1999 by Albert-László Barabási and Réka Albert, scholars believed that real world networks were best described by a traditional random graph.
Barabási and his research team did not believe this. They studied many real world networks and discovered a new random graph model, which they called scale free graphs. This model is discussed in the upcoming section on scale free graphs. First we will discuss traditional random graphs.

5.2.1 Traditional Random Graphs

A Traditional Random Graph is a graph where the edges connect randomly. More precisely, a traditional random graph has $n$ fixed vertices where each of the possible $\binom{n}{2}$ edges is present with probability $p$, independently of the others. Therefore, the expected value of the number of edges is $p\binom{n}{2}$. Due to the randomness in the edges, the degree of a node is characterized by the following Poisson distribution:

$P(i = d) = \frac{(pn)^d e^{-pn}}{d!}$

where $d$ is the degree of the node. The following is an example of a traditional random graph.

Since the edges appear randomly, it is difficult to predict what the results of the tests will be for this graph. The literature claims that traditional random graphs have a diameter $d \approx \frac{\ln n}{\ln(pn)}$ [5]. Therefore, the diameter can be expected to be fairly small. This makes sense: because a node has a chance to connect to every node in the network, there will be some long range connections, and the diameter and radius will be small. This is what we will be looking for in the results.

The following are the results of the tests for a graph with 100 nodes and p = 0.1. The diameter is 4 while the radius is 3. This is larger than the expected diameter of 2, but the number of nodes is small and the constant term in the formula is not known. The two dimensional histograms of path lengths and eccentricities are not very interesting because the diameter is so small that all path lengths and eccentricities are small. What is interesting is the three dimensional histograms.
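The quantities in this subsection are easy to evaluate numerically. The following Python sketch (hypothetical function names, not the random graph code of the appendix) generates a traditional random graph by flipping an independent coin for each possible edge, and evaluates the expected edge count, the Poisson degree probability, and the diameter estimate $\ln n / \ln(pn)$ for the n = 100, p = 0.1 case discussed above.

```python
import math
import random


def gnp_random_graph(n, p, seed=None):
    """Each of the n-choose-2 possible edges is present independently with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def expected_edges(n, p):
    return p * math.comb(n, 2)


def poisson_degree_probability(n, p, d):
    """P(degree = d) under the Poisson approximation with mean p*n."""
    lam = p * n
    return lam ** d * math.exp(-lam) / math.factorial(d)


def diameter_estimate(n, p):
    """ln(n) / ln(p*n); the constant factor is not known, as noted in the text."""
    return math.log(n) / math.log(p * n)


graph = gnp_random_graph(100, 0.1, seed=0)
edges = sum(len(neighbors) for neighbors in graph) // 2
print(edges, expected_edges(100, 0.1))   # observed count varies with the seed; expected is about 495
print(diameter_estimate(100, 0.1))       # about 2, the "expected diameter" quoted above
```

The estimate only makes sense when pn > 1; for smaller p the logarithm in the denominator is zero or negative, which matches the remark that sparse traditional random graphs tend to be disconnected.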

The histogram of eccentricity and path lengths tells us that in these graphs most nodes' path lengths are less than the diameter. This is useful, but since the diameter is so small it is not as interesting as the next plot. The histogram of degree and eccentricity shows that it is nodes with low degree that have a high eccentricity. It also shows that the nodes with the most common degree have a low eccentricity, which is good. With this data, it is expected that all of the path lengths are fairly short, especially when the nodes are sorted on eccentricity. The following chart is of a traditional random graph with n = 40 and p = 0.1. We could not get this plot for the previous graph due to Mathematica's inability to handle the number of data points. The diameter for this graph is 7, so it is slightly larger but still worth looking at.
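The data behind the three tests of Section 5.1 can be tabulated from an all-pairs distance table. The sketch below is an illustrative Python version with hypothetical function names, not the Mathematica functions of the appendix; it assumes a connected, undirected graph given as adjacency lists and returns counts rather than drawing the three dimensional plots.

```python
from collections import Counter, deque


def all_pairs_distances(adj):
    """dist[s][t] = shortest path length from s to t, by BFS from every node."""
    n = len(adj)
    dist = []
    for s in range(n):
        row = [None] * n
        row[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if row[w] is None:
                    row[w] = row[u] + 1
                    queue.append(w)
        dist.append(row)
    return dist


def eccentricity_vs_path_length(adj):
    """Counts of (eccentricity of u, length of shortest path from u to v), v != u."""
    dist = all_pairs_distances(adj)
    return Counter((max(row), d) for row in dist for d in row if d > 0)


def degree_vs_eccentricity(adj):
    """Counts of (degree of u, eccentricity of u)."""
    dist = all_pairs_distances(adj)
    return Counter((len(adj[u]), max(dist[u])) for u in range(len(adj)))


def sorted_pair_lengths(adj):
    """(rank a, rank b, path length) for each unordered pair, nodes ranked by eccentricity."""
    dist = all_pairs_distances(adj)
    order = sorted(range(len(adj)), key=lambda u: (max(dist[u]), u))
    return [(a, b, dist[order[a]][order[b]])
            for a in range(len(order)) for b in range(a + 1, len(order))]


# A path on four nodes, 0 - 1 - 2 - 3, as a tiny worked example.
path = [[1], [0, 2], [1, 3], [2]]
print(dict(degree_vs_eccentricity(path)))   # {(1, 3): 2, (2, 2): 2}
```

On this small path the two end nodes (degree 1) have eccentricity 3 and the two middle nodes (degree 2) have eccentricity 2, the same low degree and high eccentricity pairing reported for the random graph above.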


It can be seen in this picture that most nodes have path lengths less than the diameter, but as the eccentricity of the two nodes increases, there are more path lengths around the size of the diameter. It is also worth noting that for low values of p, these graphs are disconnected. This is evident in the chart at the end of this section.

5.2.2 Scale Free Graphs

Scale free networks were discovered in 1999 by Albert-László Barabási, Réka Albert and their research team when they found that the degree distribution of real world networks does not follow a Poisson distribution. Therefore, they knew real world networks could not be traditional random networks. Through investigation they found that the degree distribution that real world networks follow is similar to a power law. Their research concluded that there are two basic properties of real world networks that contribute to their power law degree distribution. The first property is that real world networks grow in size as time increases. The other is that when new nodes enter the network they connect to existing nodes with preferential attachment. The reason that Barabási decided to name these networks scale free is that, due to their power law degree distribution, the network appears to have no scale in terms of the degrees. In other words, there is no way to describe the network in terms of the most common degree of a node, as there is for a random graph (since a Poisson distribution has a maximum). For scale free networks, the probability that a given node $i$ will have a degree of $d$ follows a power law:

$P(k_i = d) = \frac{c}{d^{\alpha}}$

where $k_i$ is the degree of node $i$, while $c$ and $\alpha$ are positive constants. Scale free networks usually follow the bound $2 < \alpha < 3$. To find $\alpha$ for a given scale free network, the degree distribution is plotted on a log-log plot. From this plot, a best fit line is found, and the absolute value of the slope of the best fit line is $\alpha$. Barabási's model of scale free networks is the most common form and is defined as follows.

Definition 5.1 The Barabási-Albert Model. This model begins with $m_0$ vertices, and at each time step $t$ a new vertex is added to the graph. The new vertex is connected to $m \le m_0$ other vertices. The probability that the new vertex will connect to vertex $i$ is $\frac{k_i}{\sum_{j=1}^{t-1} k_j}$, where $k_i$ is the degree of vertex $i$.

The following is a real life scale free network. The network is a map of the protein-protein interactions in yeast [6].

A slight variation of this model is used here to build scale free networks. As with traditional random graphs, it is hard to speculate on properties other than the number of nodes ($m_0 + t$), the number of edges ($mt$), and the power law degree distribution. Since new nodes connect to older, more popular nodes, it is likely that scale free graphs will have the smallest diameter of any of the graphs studied so far. In fact, the literature says that the diameter is $\log n$ for $m = 1$ and asymptotically $\frac{\log n}{\log \log n}$ for $m \ge 2$ [8]. When studying random graphs it is necessary to compare different models that result in the same number of nodes and roughly the same number of edges. Therefore, here are the results for a scale free network with n = 100, m = 5, and 518 edges. The results of this graph can now be compared to the results of the traditional random graph.

First, here are the results for this graph. The first plot, the histogram of eccentricity and path lengths, has similar results to the traditional random graphs: those nodes with low eccentricity have short path lengths. The second plot yields more interesting results. It shows that there are lots of nodes with low degree that have a low eccentricity. The reason for this is that even low degree nodes are connected by an edge to a high degree node. This makes for short distances in the network, which is ideal because the distance for the entire network is kept to a minimum. The three dimensional chart of path lengths confirms this for us.
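Preferential attachment as in Definition 5.1 can be sketched in a few lines. The Python code below is an illustration under stated assumptions, not the BA or LCD code of the appendix and not the "slight variation" used for the results above: the function name is hypothetical, the seed graph is assumed to be a complete graph on $m_0 = m + 1$ vertices so that every starting vertex has nonzero degree, and repeated targets are redrawn so that each new vertex gets m distinct neighbors.

```python
import random


def barabasi_albert_graph(n, m, seed=None):
    """Grow a graph to n vertices; each new vertex attaches to m existing vertices
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    m0 = m + 1                               # assumed seed graph size
    adj = [set() for _ in range(n)]
    endpoints = []                           # vertex v appears here deg(v) times
    for i in range(m0):
        for j in range(i + 1, m0):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:              # redraw duplicates (an assumption)
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


graph = barabasi_albert_graph(100, 5, seed=0)
edges = sum(len(neighbors) for neighbors in graph) // 2
print(len(graph), edges)   # 100 485
```

Drawing uniformly from the `endpoints` list is what makes the choice proportional to degree, since a vertex of degree k appears k times in it. With this seed graph the edge count is 15 + 5(100 - 6) = 485, which differs from the 518 edges reported above because the variation used in the study is not reproduced here.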


As can be seen, most nodes have a short path between each other. Path lengths the size of the diameter are even less frequent in this network than with traditional random graphs. The only problem is that the high degree nodes may carry too much traffic. This is probably not a problem, though, because as the size of the network increases there will be more high degree nodes to share the load. Note that this plot is of a scale free network with n = 40 and m = 2. This graph is comparable to the graph in the previous section that had the same plot.

5.3 Comparison

The following chart is a collection of raw data for the random graph structures with varying numbers of nodes.

Through inspection of the chart, one can see that for traditional random graphs and scale free graphs with a similar number of nodes and edges, the scale free graphs have a small diameter and radius but the traditional random graphs have a diameter and radius equal to infinity. To see this compare random graphs on line 14-15, 16-17, and to scale free graphs on line 31-32, 35-36, and respectively. This is not to say that scale free graphs are always better than traditional random graphs. The reason that the traditional random graphs have a diameter of infinity is that they have such a low value of p in order to have the same number of edges as the scale free graphs. The traditional random graph on line 1 and the scale free graph on line 20 have the same number of nodes, a similar number of edges, and a similar diameter and radius. The same holds for the graph structures on lines 3 and 22. These two traditional random graphs have larger values of p. As long as both networks are connected, it is clear that their diameter and radius are fairly small and the difference between the diameter and radius is small too. Therefore all nodes will be able to communicate with each other quickly, as in a complete graph, but with a fraction of the number of edges.

6 Conclusion

When comparing random graphs and deterministic graphs it is clear that random graphs have a much smaller diameter and radius. When comparing random networks and deterministic networks with about the same number of nodes (and sometimes even the same number of edges), the average eccentricity is much smaller for random graphs. The reason for this is that the diameter of random graphs is less than or equal to the radius of comparable deterministic graphs. It is important to know that a traditional random graph may be disconnected if p is small enough. In this case, a deterministic graph with the same number of nodes and edges will always be superior since it is connected. Though it was not tested in this study, it is still possible that, even when the random graph is disconnected for small values of p, the shortest path lengths that do exist are very small compared to those of deterministic graphs of the same size.

Scale free networks seem to be always superior to deterministic graphs. As long as $m \ge 2$ and the number of nodes is sufficiently large, the graphs seem to always be connected and to have a small diameter and radius. The diameter is so small compared to deterministic graphs that not even the average eccentricity or average path length is important. The study of scale free networks is still emerging, but it is clear that these networks have a very low diameter.
This fact should make these types of networks very attractive in applications concerned with quick communication. Also, as scale free networks grow in size, more high degree nodes appear, which minimizes traffic by distributing the load on the network.

7 Appendix

7.1 APD

APDrule[A_] := Block[{},
  Z = A.A; (*matrix multiplication in Mathematica*)
  n = Length[A]; (*Always a square matrix*)
  B = Table[If[(i != j && (A[[i]][[j]] == 1 || Z[[i]][[j]] > 0)), 1, 0], {i, n}, {j, n}];
  If[MapAll[Min, (B + IdentityMatrix[n])] == 1, Return[2B - A]];
  Print["didnt exit",
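The listing above is cut off in this copy. Its first lines (squaring the adjacency matrix, building B, and returning 2B - A when B is complete) match the opening of Seidel's all-pairs-distance recursion, so the following Python sketch of that recursion is offered as a rough companion. Everything after the base case is an assumption based on the standard algorithm, not a reconstruction of the author's code, and the function names are hypothetical. The input is assumed to be the 0/1 adjacency matrix of a connected, undirected graph.

```python
def mat_mul(X, Y):
    """Plain integer matrix product of two n x n matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apd(A):
    """All-pairs shortest distances of a connected, undirected, unweighted graph."""
    n = len(A)
    Z = mat_mul(A, A)
    # B joins i and j when they are at distance 1 or 2 in A.
    B = [[1 if i != j and (A[i][j] == 1 or Z[i][j] > 0) else 0
          for j in range(n)] for i in range(n)]
    # Base case: B is complete, so every distance in A is 1 or 2.
    if all(B[i][j] == 1 for i in range(n) for j in range(n) if i != j):
        return [[2 * B[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    T = apd(B)                      # distances in the squared graph
    X = mat_mul(T, A)
    degree = [sum(row) for row in A]
    # The distance in A is 2T when no neighbor of j is closer to i in B, else 2T - 1.
    return [[2 * T[i][j] if X[i][j] >= T[i][j] * degree[j] else 2 * T[i][j] - 1
             for j in range(n)] for i in range(n)]


# Path graph 0 - 1 - 2 - 3 - 4: the distance between i and j is |i - j|.
A = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
D = apd(A)
print(D[0])   # [0, 1, 2, 3, 4]
```

The diameter is then the largest entry of D and the radius is the smallest row maximum. On a disconnected graph this sketch would recurse without terminating, so connectivity should be checked first.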
