Example: calculation of the clustering coefficient. Node N1 has 8 neighbors (red arrows). There are 12 connections among these neighbors (blue arrows).
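As a worked check of this example, the sketch below assumes the standard definition for an undirected graph, C_N = 2e_N / (k_N(k_N − 1)), where k_N is the number of neighbors of node N and e_N the number of edges among those neighbors. The formula itself is not printed on this slide, and the function name is illustrative.

```python
def clustering_coefficient(k: int, e: int) -> float:
    """Fraction of possible neighbor-neighbor edges that actually exist.

    Assumes an undirected graph: with k neighbors there are k*(k-1)/2 possible
    pairs, so C = e / (k*(k-1)/2) = 2e / (k*(k-1)).
    """
    if k < 2:
        return 0.0  # convention: undefined for fewer than 2 neighbors, report 0
    possible_pairs = k * (k - 1) / 2
    return e / possible_pairs

# Node N1 from the slide: 8 neighbors, 12 connections among them
c_n1 = clustering_coefficient(k=8, e=12)
print(round(c_n1, 3))  # 0.429
```

So 12 of the 28 possible neighbor pairs are connected, giving C_N1 = 3/7.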

Average clustering coefficient of a graph. An overall measure of a network's clustering: the arithmetic mean of C_N over all nodes. It can be used as another measure of network topology.
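A minimal sketch of the average clustering coefficient, assuming an undirected graph stored as a dictionary that maps each node to the set of its neighbors. The toy graph and function names are hypothetical. Nodes with fewer than two neighbors are counted as C_N = 0 here; some tools exclude them from the mean instead, which changes the result.

```python
from itertools import combinations

def local_clustering(adj: dict, node) -> float:
    """C_N = 2 * (edges among neighbors) / (k * (k - 1)) for an undirected graph."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj: dict) -> float:
    """Arithmetic mean of C_N over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to C
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(round(average_clustering(adj), 3))  # 0.583
```

Here A and B each have C = 1, C has C = 1/3, and D has C = 0, so the mean is 7/12.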

Degree distribution. The degree of a vertex is the number of edges by which it is connected to other vertices. The degree distribution of a graph is a function giving the fraction of vertices in the graph that have a given degree k:

$$p(k) = \frac{1}{N} \sum_{v_i \in V} \mathbf{1}\left[\deg(v_i) = k\right]$$

where N is the number of vertices in the vertex set V, and the indicator term counts how many vertices have degree k.
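The formula translates directly into a count followed by a division by N. This is a minimal sketch assuming an adjacency-set representation of an undirected graph; the toy graph is hypothetical.

```python
from collections import Counter

def degree_distribution(adj: dict) -> dict:
    """Return {k: p(k)}: the fraction of vertices with degree k."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical toy graph: hub H linked to four leaves, plus one leaf-leaf edge a-b
adj = {
    "H": {"a", "b", "c", "d"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
    "d": {"H"},
}
p = degree_distribution(adj)
print(p)  # {1: 0.4, 2: 0.4, 4: 0.2}
```

Because every vertex is counted exactly once, the values of p(k) sum to 1.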

The degree distribution is a way to classify graphs into categories. Random network: Poisson distribution. Scale-free network: power-law distribution, which appears as a straight line on a log-log plot.

Remember: scale-free networks. Scale-free means that while the vast majority of vertices are weakly connected, there also exist some highly interconnected super-vertices, or hubs. The term scale-free expresses that the ratio of highly to weakly connected vertices remains the same irrespective of the total number of links in the network. The scale-free model is very simple, elegant, and intuitive. To produce an artificial scale-free network possessing small-world properties, only two basic rules need to be followed: (1) Growth: the network is seeded with a small number of vertices, and new vertices are added over time. (2) Preferential attachment: the more connections a vertex has, the more likely new vertices are to be connected to it (the rich get richer!).
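The two rules can be sketched as a growth process in the style of the Barabási–Albert model. The assumptions here are illustrative: each new vertex attaches to m distinct existing vertices, and the seed is a small fully connected group of m + 1 vertices. Preferential attachment is implemented by sampling uniformly from a list in which every vertex appears once per edge end, so a vertex is chosen with probability proportional to its degree.

```python
import random

def preferential_attachment(n: int, m: int = 1, seed: int = 0) -> dict:
    """Grow an undirected graph of n vertices; each newcomer adds m edges."""
    rng = random.Random(seed)
    # Seed network: m + 1 fully connected vertices
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # One entry per edge end: uniform sampling here is degree-proportional
    targets = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:          # m distinct, degree-weighted targets
            chosen.add(rng.choice(targets))
        adj[new] = set()
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend([new, t])    # both endpoints gain one degree
    return adj

g = preferential_attachment(1000, m=2, seed=1)
degrees = sorted((len(nb) for nb in g.values()), reverse=True)
print("largest degrees:", degrees[:5], "median degree:", degrees[len(degrees) // 2])
```

Early vertices keep accumulating links, so a handful of hubs end up far above the typical degree of 2 or 3.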

Power law. A power law is a relationship in which one quantity varies as a power of another; it exhibits the property of scale invariance. The most common power laws relate two scalar quantities and have the form

$$f(x) = a x^{k}$$

where a and k are constants. k is typically called the scaling exponent, where the word "scaling" denotes the fact that a power-law function satisfies $f(cx) \propto f(x)$, where c is a constant; indeed $f(cx) = a (cx)^{k} = c^{k} f(x)$. Thus, a rescaling of the function's argument changes the constant of proportionality but preserves the shape of the function itself. Taking the logarithm of both sides clarifies this:

$$\log f(x) = k \log x + \log a$$

This expression has the form of a linear relationship with slope k. Rescaling the argument produces a linear shift of the function up or down but leaves both the basic form and the slope k unchanged:

$$\log f(cx) = k \log x + k \log c + \log a$$

Source: www.wikipedia.org
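The scale-invariance argument can be checked numerically. The constants a, k, and c below are arbitrary illustrative choices. The sketch confirms that f(cx)/f(x) = c^k for every x, that the log-log slope equals k, and that rescaling x shifts log f by k log c.

```python
import math

def power_law(x: float, a: float, k: float) -> float:
    """f(x) = a * x**k."""
    return a * x ** k

a, k, c = 3.0, -2.0, 10.0  # arbitrary illustrative constants

# Scale invariance: f(cx) / f(x) = c**k, independent of x
ratios = [power_law(c * x, a, k) / power_law(x, a, k) for x in (1.0, 2.0, 5.0, 7.5)]
print(ratios, c ** k)

# On log-log axes the function is a straight line with slope k
x1, x2 = 2.0, 50.0
slope = (math.log(power_law(x2, a, k)) - math.log(power_law(x1, a, k))) / (
    math.log(x2) - math.log(x1)
)

# Rescaling x by c shifts log f(x) up or down by k * log(c)
shift = math.log(power_law(c * x1, a, k)) - math.log(power_law(x1, a, k))
print(slope, shift, k * math.log(c))
```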

Power law. An example power-law graph, used to demonstrate a ranking by popularity. To the right is the long tail; to the left are the few that dominate. This is also known as the 80-20 rule: for many events, roughly 80% of the effects come from 20% of the causes. Source: www.wikipedia.org

P(k) decays much more slowly for large k than the rapid (faster than exponential) decay of a Poisson distribution: highly connected hubs occur far more frequently than expected in random graphs.
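To make the difference in tails concrete, the sketch below compares a Poisson distribution with mean degree 4 against a power law with γ = 2.5. Both parameter values are arbitrary illustrative choices, and the power law is normalized over k = 1..10,000 (the truncation is also an assumption). The Poisson probability is computed in log space to avoid overflowing k!.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(k) = e^-mean * mean^k / k!, evaluated via logarithms."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def power_law_pmf(k: int, gamma: float, k_max: int = 10_000) -> float:
    """P(k) proportional to k^-gamma, normalized over k = 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

for k in (5, 10, 20, 40):
    print(k, poisson_pmf(k, 4.0), power_law_pmf(k, 2.5))
```

At small k the Poisson curve is higher, but by k = 20 the power law is already several orders of magnitude more likely. That fat tail is why hubs appear in scale-free networks and essentially never in random graphs of comparable size.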

In scale-free networks, the degree distribution follows a power law:

$$p(k) \propto k^{-\gamma}, \quad \gamma \in \mathbb{R}^{+}$$
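Because log p(k) = −γ log k + constant, γ can be read off as the negative slope of a straight-line fit on log-log axes. The sketch below applies ordinary least squares to synthetic, noise-free data. The function name and the data are illustrative. On real, noisy degree data this simple fit is known to be biased, and maximum-likelihood estimators are generally preferred.

```python
import math

def fit_exponent(pk: dict) -> float:
    """Estimate gamma as minus the least-squares slope of log P(k) vs log k."""
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return -cov / var

# Synthetic, noise-free example: P(k) proportional to k^-2.5 for k = 1..100
pk = {k: k ** -2.5 for k in range(1, 101)}
print(round(fit_exponent(pk), 6))  # 2.5
```

The normalization constant only shifts the line vertically, so it does not affect the estimated slope.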

Degree distribution of a graph. The degree distribution

$$P(k) = \frac{m_k}{m}$$

measures the proportion of nodes in G that have degree k, where m_k is the number of nodes of degree k and m is the total number of nodes. P(k) is not a single number but a set of numbers, with one value for each degree k (k = 0, 1, 2, ...). Division by m ensures that the sum of P(k) over all values of k is equal to 1.

Biological systems have clustering coefficients that are higher than those of randomly connected graphs. With increasing size of the network, the clustering coefficient tends to decrease. Implications: biological networks contain many nodes with low degree but only relatively small numbers of hubs. Hubs are highly connected nodes that are surrounded by dense clusters of nodes with relatively low degree. Example: in both the protein interaction network and the transcription regulatory network of yeast, hubs are much more often connected to nodes of low degree than to other hubs.

Characterizations of graphs (degree distributions, clustering coefficients, etc.) are performed by comparison with random graphs. Random graph: edges are placed randomly between pairs of nodes. The degree distribution is a Poisson distribution (a narrow bell curve with small variance), so most nodes have a degree that is close to the average.
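A random graph for such comparisons can be sketched with the classic G(n, p) construction, in which each possible edge is included independently with probability p. The values of n, p, and the seed below are illustrative assumptions. For a Poisson distribution the variance equals the mean, so the sample variance of the degrees should come out close to the mean degree, which is expected to be (n − 1)p.

```python
import random

def random_graph(n: int, p: float, seed: int = 0) -> dict:
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

n, p = 1000, 0.004
g = random_graph(n, p, seed=42)
degrees = [len(nb) for nb in g.values()]
mean_k = sum(degrees) / n
var_k = sum((d - mean_k) ** 2 for d in degrees) / n
print(f"mean degree {mean_k:.2f} (expected {(n - 1) * p:.2f}), variance {var_k:.2f}")
```

Strictly, the degree of a G(n, p) node is binomial; the Poisson distribution is its limit for large n and small p.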

Most biological systems are not organized randomly. A network whose degree distribution more or less follows a power law is called a scale-free network. In biological and other real-world systems, a few hubs are connected to disproportionately many other nodes, while most other nodes are associated with far fewer edges than in a random graph. P(k) often follows a power-law distribution.

Example of a scale-free network

Summary: network models. A hierarchical architecture implies highly clustered neighborhoods that are only sparsely connected to one another, with communication between the different neighborhoods maintained by a few nodes.

A path in a graph is an alternating sequence of nodes and edges that begins at one node, ends at another, and does not visit any node more than once. The first node is called the start node and the last node is called the end node; both are called end (or terminal) nodes of the path. The other nodes in the path are internal nodes. Nodes A and D are connected by 5 paths: A B D; A B E D; A B E C D; A E D; A E C D.
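The five paths can be enumerated with a depth-first search that never revisits a node already on the current path. The figure is not available, so the edge list below is a hypothetical reconstruction. It assumes directed edges chosen so that exactly these five A-to-D paths exist. If the graph were undirected with an edge B–E, A E B D would be a sixth path.

```python
def all_simple_paths(adj: dict, start, end, path=None) -> list:
    """Depth-first enumeration of paths from start to end with no repeated node."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    found = []
    for nxt in adj.get(start, []):
        if nxt not in path:  # a path may not visit any node twice
            found.extend(all_simple_paths(adj, nxt, end, path))
    return found

# Hypothetical directed reconstruction of the figure's edges
adj = {"A": ["B", "E"], "B": ["D", "E"], "E": ["D", "C"], "C": ["D"], "D": []}
paths = all_simple_paths(adj, "A", "D")
for p in paths:
    print(" ".join(p))
```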

Two paths are independent (also called internally vertex-disjoint) if they do not have any internal node in common. At most 2 of these paths are mutually independent: A B D together with either A E D or A E C D.
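Independence can be checked pairwise by intersecting the internal nodes of two paths, that is, everything except the shared start and end nodes. The sketch hard-codes the five paths listed on the slide; the function name is illustrative.

```python
from itertools import combinations

def internally_disjoint(p: list, q: list) -> bool:
    """True if the two paths share no internal (non-terminal) node."""
    return not (set(p[1:-1]) & set(q[1:-1]))

paths = [list("ABD"), list("ABED"), list("ABECD"), list("AED"), list("AECD")]
independent_pairs = [
    (" ".join(p), " ".join(q))
    for p, q in combinations(paths, 2)
    if internally_disjoint(p, q)
]
print(independent_pairs)
```

Every path other than A B D passes through E, so no third path can be added to either pair.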

Example of paths

Shortest path (or distance). The shortest path between two nodes contains important information about the structure of the network. In random networks, the average shortest path is proportional to the logarithm of the number of nodes, log m. In scale-free networks, the average shortest path is shorter and grows more slowly as networks become larger.
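In an unweighted graph, shortest paths can be found by breadth-first search. The average is taken over all ordered pairs of distinct nodes that can reach each other (unreachable pairs are simply skipped here, which is one of several possible conventions). The ring graphs are hypothetical. They show how adding a single hub shortens the average path.

```python
from collections import deque

def bfs_distances(adj: dict, source) -> dict:
    """Hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(adj: dict) -> float:
    """Mean distance over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical example: a ring of 10 nodes, then the same ring plus one hub "H"
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
ring_with_hub = {i: ring[i] + ["H"] for i in ring}
ring_with_hub["H"] = list(range(10))
print(average_shortest_path(ring), average_shortest_path(ring_with_hub))
```

In the plain ring the average is 25/9 (about 2.78) steps; with the hub, no two nodes are more than 2 steps apart and the average drops to 18/11 (about 1.64).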

An algorithm to calculate the shortest path: Dijkstra's Algorithm for Shortest Route Problems. See the animated example at http://www.sce.carleton.ca/faculty/chinneck/po.html (under "algorithm animations").
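A compact sketch of Dijkstra's algorithm using a binary heap (Python's `heapq`), assuming non-negative edge weights and a directed graph stored as lists of (neighbor, weight) pairs. The small weighted network is hypothetical. The algorithm repeatedly settles the closest unsettled node and relaxes its outgoing edges; `prev` records the predecessor on the best known route so the path can be rebuilt.

```python
import heapq

def dijkstra(graph: dict, source):
    """Shortest distances from source; requires non-negative edge weights."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry: u was already settled via a shorter route
        settled.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def reconstruct(prev: dict, source, target) -> list:
    """Walk predecessors back from target (assumes target is reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical weighted, directed network
graph = {
    "S": [("A", 4), ("B", 1)],
    "B": [("A", 2), ("C", 5)],
    "A": [("C", 1)],
    "C": [],
}
dist, prev = dijkstra(graph, "S")
print(dist["C"], reconstruct(prev, "S", "C"))  # 4 ['S', 'B', 'A', 'C']
```

The direct edge S to A costs 4, but the detour through B costs only 3, which the algorithm discovers before settling A.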

Average shortest path (or distance) of metabolic networks. In an analysis of over 40 metabolic networks with 200 to 500 nodes, Jeong and colleagues found that a few hubs dominated the connectivity patterns. As a consequence, the average shortest path was essentially independent of the size of the network and always had an amazingly low value of about three reaction steps. In a different interpretation, one may consider the average path as an indicator of how quickly information can be sent through the network. According to this criterion, biological networks are indeed very efficient.

Construction of different types of networks

Small-world networks. A small-world network is scale-free (it has a power-law degree distribution and therefore a small average shortest distance), and it has a high clustering coefficient. Special structure of small-world networks: they consist of several loosely connected clusters of nodes, each surrounding a hub. In this organization, most nodes are not neighbors of each other, but moving from any one node to any other node in the network does not take many steps, because of the well-connected hubs.
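The combination of dense local clusters and short paths through hubs can be illustrated on a hypothetical toy network: three hubs linked to each other, each surrounded by four low-degree nodes that form two triangles with their hub. The toy is far too small to be scale-free in any statistical sense. It only shows the two measured properties. The reference value is the edge density, which is the expected clustering coefficient of a random graph with the same number of edges.

```python
from collections import deque
from itertools import combinations

def build_toy_network(n_clusters: int = 3) -> dict:
    """Hypothetical hub-and-cluster graph: linked hubs, each with 4 leaves."""
    adj = {}
    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    hubs = [f"H{i}" for i in range(n_clusters)]
    for i, hub in enumerate(hubs):
        a, b, c, d = (f"{x}{i}" for x in "abcd")
        for leaf in (a, b, c, d):
            link(hub, leaf)
        link(a, b)  # two triangles per cluster: hub-a-b and hub-c-d
        link(c, d)
    for h1, h2 in combinations(hubs, 2):
        link(h1, h2)
    return adj

def clustering(adj: dict, v) -> float:
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
    return 2 * links / (k * (k - 1))

def avg_path(adj: dict) -> float:
    """Mean BFS distance over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

g = build_toy_network()
n = len(g)
m = sum(len(nb) for nb in g.values()) // 2
density = 2 * m / (n * (n - 1))
avg_c = sum(clustering(g, v) for v in g) / n
print(n, m, round(density, 3), round(avg_c, 3), round(avg_path(g), 3))
```

Only 21 of the 105 possible edges exist, yet the average clustering (0.84) is about four times the density, and any node reaches any other in at most 3 steps.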

Social networks have small-world properties. The small-world property is reminiscent of John Guare's famous play Six Degrees of Separation, whose premise is that essentially every human is connected to every other human by a chain of six or fewer acquaintances.

Properties of small-world networks. The small-world property emerges automatically when a network grows by the addition of new links preferentially to nodes that are already highly connected. It is not surprising that the hubs in a gene or protein network are often among the most important components and that their destruction or mutation is frequently fatal. Identification of hubs may be important for drug target identification.
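The simplest way to flag candidate hubs is to rank nodes by degree. This minimal sketch uses a hypothetical interaction network with placeholder node names (P1 to P7). The cutoff `n_hubs` is an arbitrary choice. Degree is only one possible criterion for what counts as a hub.

```python
def top_hubs(adj: dict, n_hubs: int = 3) -> list:
    """Return the n_hubs highest-degree nodes as (node, degree) pairs."""
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return [(v, len(adj[v])) for v in ranked[:n_hubs]]

# Hypothetical undirected interaction network with placeholder names
adj = {
    "P1": {"P2", "P3", "P4", "P5"},
    "P2": {"P1", "P6"},
    "P3": {"P1"},
    "P4": {"P1"},
    "P5": {"P1", "P6"},
    "P6": {"P2", "P5", "P7"},
    "P7": {"P6"},
}
print(top_hubs(adj, n_hubs=2))  # [('P1', 4), ('P6', 3)]
```

Python's `sorted` is stable, so nodes with equal degree keep their original order in the dictionary.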