Comparison of Centralities for Biological Networks

Similar documents
Special-topic lecture bioinformatics: Mathematics of Biological Networks

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

Response Network Emerging from Simple Perturbation

Coordinated Perspectives and Enhanced Force-Directed Layout for the Analysis of Network Motifs

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Exploration of biological network centralities with CentiBiN

Incoming, Outgoing Degree and Importance Analysis of Network Motifs

Centrality Measures to Identify Traffic Congestion on Road Networks: A Case Study of Sri Lanka

Graph Theory for Network Science

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

A Generating Function Approach to Analyze Random Graphs

V2: Measures and Metrics (II)

Graph Theory for Network Science

CentiBiN Centralities in Biological Networks

ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT

1 Homophily and assortative mixing

Localized Bridging Centrality for Distributed Network Analysis

Graphs. Pseudograph: multiple edges and loops allowed

Course Introduction / Review of Fundamentals of Graph Theory

Properties of Biological Networks

2. CONNECTIVITY Connectivity

FastA & the chaining problem

FastA and the chaining problem, Gunnar Klau, December 1, 2005, 10:

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Preliminaries: networks and graphs

Summary: What We Have Learned So Far

arxiv:cond-mat/ v1 [cond-mat.other] 2 Feb 2004

Graph Theory and Network Measurment

Structure of biological networks. Presentation by Atanas Kamburov

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Basics of Network Analysis

Centrality. Peter Hoff. 567 Statistical analysis of social networks. Statistics, University of Washington 1/36

Centralities for undirected graphs

Lectures by Volker Heun, Daniel Huson and Knut Reinert, in particular last years lectures

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY

Introduction aux Systèmes Collaboratifs Multi-Agents

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Why is a power law interesting? 2. it begs a question about mechanism: How do networks come to have power-law degree distributions in the first place?

Structured prediction using the network perceptron

Introduction to network metrics

THE KNOWLEDGE MANAGEMENT STRATEGY IN ORGANIZATIONS. Summer semester, 2016/2017

The Generalized Topological Overlap Matrix in Biological Network Analysis

Random Graph Model; parameterization 2

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

A New Measure of Linkage Between Two Sub-networks 1

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

Locality-sensitive hashing and biological network alignment

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4)

Networks in economics and finance. Lecture 1 - Measuring networks

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Graph. Vertex. edge. Directed Graph. Undirected Graph

Module 11. Directed Graphs. Contents

1 More configuration model

Varying Applications (examples)

Introduction to Mathematical Programming IE406. Lecture 16. Dr. Ted Ralphs

Centrality Estimation in Large Networks

CSE101: Design and Analysis of Algorithms. Ragesh Jaiswal, CSE, UCSD

24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, This lecture is based on the following papers, which are all recommended reading:

Critical Phenomena in Complex Networks

V 1 Introduction! Mon, Oct 15, 2012! Bioinformatics 3 Volkhard Helms!

V4 Matrix algorithms and graph partitioning

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

Centrality Book. cohesion.

Paths, Circuits, and Connected Graphs

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Signal Processing for Big Data

COS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1. Database Searching

Computational Genomics and Molecular Biology, Fall

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

Introduction to Graph Theory

Local higher-order graph clustering

Centralities (4) Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California Excellence Through Knowledge

CISC 889 Bioinformatics (Spring 2003) Multiple Sequence Alignment

Social Network Analysis

Unit 2: Graphs and Matrices. ICPSR University of Michigan, Ann Arbor Summer 2015 Instructor: Ann McCranie

Social Networks. Slides by : I. Koutsopoulos (AUEB), Source:L. Adamic, SN Analysis, Coursera course

Weighted networks. 9.1 The strength of weak ties

W4231: Analysis of Algorithms

A PRIME FACTOR THEOREM FOR A GENERALIZED DIRECT PRODUCT

Extracting Information from Complex Networks

Supplementary material to Epidemic spreading on complex networks with community structures

TELCOM2125: Network Science and Analysis

Data mining --- mining graphs

On subgraphs of Cartesian product graphs

On Page Rank. 1 Introduction

MCL. (and other clustering algorithms) 858L

Online Social Networks and Media

A graph is finite if its vertex set and edge set are finite. We call a graph with just one vertex trivial and all other graphs nontrivial.

A new general family of deterministic hierarchical networks

Bioinformatics explained: BLAST. March 8, 2007

Mathematics of Networks II

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department

3.1 Basic Definitions and Applications. Chapter 3. Graphs. Undirected Graphs. Some Graph Applications

Robustness of Centrality Measures for Small-World Networks Containing Systematic Error

Chordal deletion is fixed-parameter tractable

Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,

Simplicial Complexes of Networks and Their Statistical Properties

Graph and Digraph Glossary

Section 7.12: Similarity. By: Ralucca Gera, NPS

Transcription:

Comparison of Centralities for Biological Networks Dirk Koschützki and Falk Schreiber Bioinformatics Center Gatersleben-Halle Institute of Plant Genetics and Crop Plant Research Corrensstraße 3 06466 Gatersleben, Germany. koschuet,schreibe@ipk-gatersleben.de Abstract: The analysis of biological networks involves the evaluation of the vertices within the connection structure of the network. To support this analysis we discuss five centrality measures and demonstrate their applicability on two example networks, a protein-protein-interaction network and a transcriptional regulation network. We show that all five centrality measures result in different valuations of the vertices and that for the analysis of biological networks all five measures are of interest. 1 Introduction Centrality analysis is particularly useful in analyzing biological networks and hence in helping to understand the underlying biological processes. It has been shown that central vertices in protein-protein interaction networks are often functionally important and the deletion of such vertices is related to lethality [JMBO01]. In [WS03] three different types of centralities are defined and applied to metabolic, protein-protein interaction and domain sequence networks. Fell and Wagner discuss the possibility that metabolites with highest degree may belong to the oldest part of the metabolism [FW00]. However, it has also been shown that the degree of a vertex alone, as a specific centrality measure, is not sufficient to distinguish lethal proteins clearly from viable ones [Wu02], that in protein networks there is no relation between network connectivity and robustness against amino-acid substitutions [HCW02], and that for biological network analysis several centrality measures have to be considered [WS03]. To assist scientists in the exploration of biological networks, we discuss and compare five different centrality measures. Some of these are already known in biological sciences, others are transferred from different fields of sciences such as social network analysis. The application of these measures shows that some correlate strongly in one network and weakly in another. As a result, we conclude that for the analysis of biological networks several measures should be considered. This work was supported by the German Ministry of Education and Research (BMBF) under grant 0312706A. 199

This paper is organized as follows: in Sect. 2 we define the graph model on which we operate. Section 3 introduces the five centralities in networks, all are explained using one example graph. These measures are applied to typical biological networks in Sect. 4. 2 Definitions A undirected graph G = (V, E) consists of a finite set V of vertices (n = V ) and a finite set E V V of edges (m = E ). An edge e = (u, v) E connects two vertices u and v. The vertices u and v are said to be incident with the edge e and adjacent to each other. The set of all vertices which are adjacent to u is called the neighborhood N(u) of u (N(u) = {v : (u, v) E}). A graph is called loop-free if no edge connects a vertex to itself. An adjacency matrix A of a graph G = (V, E) is a (n n) matrix, where a ij = 1 if and only if (i, j) E and a ij = 0 otherwise. The adjacency matrix of any undirected graph is symmetric. The degree d(v) of a vertex v is the number of its incident edges. Let (e 1,..., e k ) be a sequence of edges in a graph G = (V, E). This sequence is called a walk if there are vertices v 0,..., v k such that e i = (v i 1, v i ) for i = 1,..., k. If the edges e i are pairwise distinct and the vertices v i are pairwise distinct the walk is called a path. The length of a walk or path is given by its number of edges, k = (e 1,..., e k ). A shortest path between two vertices u, v is a path with minimal length, all shortest paths between u, v are called geodesics. The distance (dist(u, v)) between two vertices u, v is the length of a shortest path between them. Two vertices u, v of a graph G = (V, E) are called connected if there exists a walk from vertex u to vertex v. If any pair of different vertices of the graph is connected, the graph G = (V, E) is called connected. If a walk starts at vertex u, chooses uniformly at random one of the incident edges of the current vertex until it finally reaches the target v then we call this walk a random walk between u and v. In the remainder of this paper we consider only non-trivial 1 undirected loop-free connected graphs. This restriction is required for a common definition of all centrality measures covered in this paper. Some of the centralities can easily be expanded to cover directed or unconnected graphs, even an extension towards weighted edges is possible. 3 Centralities in Networks Formally a centrality is a function C which assigns every vertex v V of a given graph G a value C(v) R. As we are interested in the ranking of the vertices of the given graph G we choose the convention that a vertex u is more important than another vertex v iff C(u) > C(v). In the following sections we explain five different centrality measures and show an example graph and the corresponding centrality values. 1 Graphs of at least two vertices and one edge. 200

3.1 Degree An obvious order of the vertices of a graph can be established by sorting them according to their degree. The corresponding centrality measure degree-centrality (C d ) is defined as C d (v) := d(v). See the work of Freeman [Fr79] for a long list of references to the usage of degree-centrality in social network analysis. For biological network analysis degreecentrality for example is used in [JMBO01] to correlate the degree of a protein in the network with the lethality of its removal. See Fig. 1 for an example graph and Table 1 for the corresponding centrality values. 3.2 Eccentricity The next three definitions of centralities all operate on the concepts of paths within the given graph. The simplest definition uses the distance between vertices. The eccentricity ecc of a vertex u is defined as ecc(u) := max v V dist(u, v) and the eccentricity-centrality. The reciprocal of ecc(u) is used to assure that more central vertices have a higher value of C e, because such central vertices are the ones with the smallest eccentricity value. An application of eccentricity within the biological context is shown in [WS03]. Again, as for degree-centrality, Fig. 1 and Table 1 show an example. (C e ) as C e (u) := 1 ecc(u) 3.3 Closeness In contrast to eccentricity, closeness-centrality uses not only the maximum distance between the vertex of interest and all other vertices but uses the sum of the distances of this 1 vertex and all other vertices. The closeness-centrality is defined as C c (u) := sumdist(u) with sumdist(u) = v V dist(u, v). Closeness-based centrality measures are cited extensively in the work of Freeman [Fr79]. Wuchty et al. [WS03] apply this centrality to different biological networks and show the correspondence with the service facility location problem. 3.4 Random Walk Betweenness Within networks a communication between two vertices u, v may be visible to a third vertex w if this vertex lies in the path of the communication between these vertices. To measure the centrality of a vertex the ability to observe communication is a feasible approach. Different methods to model communication are conceivable. There are for example communications over shortest paths, paths with maximum flow and random walks. All of these are potential models for betweenness and are covered in the literature [Fr77, FBW91, Ne03]. 201

1 13 14 15 2 3 9 4 5 6 7 8 10 11 12 Figure 1: A graph to show the five different centrality measures Newman s random walk approach models information transmission and therefore matches problems often modelled in biological networks. For the random-walk betweenness centrality (C r ) the centrality of a vertex w is equal to the number of times that a random walk from u to v goes through w, averaged over all u and v [Ne03]. A detailed coverage of the required calculations for the random-walk betweenness is beyond the scope of this paper and the reader is directed toward [Ne03]. 3.5 Bonacich s Eigenvector Centrality A different approach to order the vertices of a graph was suggested by Bonacich [Bo72]. His idea is based on the assumption that the value of a single vertex is determined by the values of the neighboring vertices. In contrast to the previous measures not only the position of a vertex within the graph is considered but also the centrality values of its neighbors. Bonacich suggested the following definition: C λ (u) := v N(u) C λ(v). If we consider the adjacency matrix representation of the graph, this is equivalent to C λ (v i ) := n j=1 a ijc λ (v j ). This leads directly to the well known problem of eigenvector computation λs = AS and the eigenvector of the largest eigenvalue is the eigenvector-centrality (C λ := S) [Bo72]. 3.6 Centralities shown on an example graph To demonstrate that all of the explained centrality measures usually give different results we use the example graph shown in Fig. 1. For this graph Table 1 shows all centrality values for the five centrality measures. 202

Vertex C d Vertex C e Vertex C c Vertex C r Vertex C λ 8 4 5 0.2500 7 0.0286 7 0.7429 7 0.5021 9 4 4 0.2000 6 0.0263 6 0.5619 8 0.4563 7 3 6 0.2000 8 0.0238 5 0.5143 9 0.4563 2 2 3 0.1667 9 0.0238 8 0.4762 6 0.2761 3 2 7 0.1667 5 0.0233 9 0.4762 10 0.1927 4 2 2 0.1429 4 0.0200 4 0.4476 11 0.1927 5 2 8 0.1429 10 0.0182 3 0.3619 12 0.1927 6 2 9 0.1429 11 0.0182 2 0.2571 13 0.1927 1 1 1 0.1250 12 0.0182 1 0.1333 14 0.1927 10 1 10 0.1250 13 0.0182 10 0.1333 15 0.1927 11 1 11 0.1250 14 0.0182 11 0.1333 5 0.1517 12 1 12 0.1250 15 0.0182 12 0.1333 4 0.0830 13 1 13 0.1250 3 0.0169 13 0.1333 3 0.0448 14 1 14 0.1250 2 0.0143 14 0.1333 2 0.0230 15 1 15 0.1250 1 0.0120 15 0.1333 1 0.0097 Table 1: The centrality values for the example graph. The vertices are ordered by descending centrality value 4 Application and Discussion We applied the presented centrality measures to two biological networks, a protein-proteininteraction (PPI) network and a transcriptional regulation (TR) network. The PPI network is based on the April 2004 release of the Homo sapiens network from the DIP- Database [SMS + 04]. It models proteins as vertices and interactions as edges. The TR network of Escherichia coli models operons as vertices and regulation between transcription factors and operons as edges, see [SOMMA02] for details and the link to the data. For the PPI-network we removed 51 self interactions as the graph has to be loop-free for the calculation of the eigenvector-centrality C λ. Furthermore, we removed vertices and edges not connected to the giant component, and the resulting graph consisted of 563 vertices and 870 edges. For the TR network we used the data from [SOMMA02] and applied the same strategy: we removed self regulation and used the giant component for the analysis. As the regulation data is directed we used the underlying undirected graph. The activation and repression information at the edges was not used as our framework is based on unweighted graphs. Finally, the graph for the TR network consisted of 325 vertices and 453 edges. The calculation of the five centrality measures for both graphs was done on a modern desktop PC (3 GHz, 2 GB Ram, Linux 2.6.x, Java 1.4.2) and took between less than a second (C d ) and several minutes (C r ). This was expected as the complexity of the different algorithms ranges from O(n) to O((n + m)n 2 ). To analyse a network we calculated the presented centralities for all vertices. Then for each centrality all vertices were ordered by descending centrality value and, for vertices with the same centrality value, by ascending vertex-label. The vertices were enumerated 203

from 1 to n, this number is the position of the vertex according to the centrality. Finally, we build a new table which for every vertex and every centrality contains the position of the vertex according to the centrality. With the calculated positions we made several analyses of the correlation. It shows that some of the measures, e.g. eccentricity C e and eigenvector C λ, are highly correlated in the PPI-network (see Fig. 2) while others are only weakly correlated. Within the TR network a strong correlation between eigenvector C λ and closeness C c was observed (see Fig. 3), but in contrast to the PPI network there is only a weak correlation between C e and C λ. Tables 2 and 3 show all correlation coefficients based on Pearson s method. Note that the strong correlation of degree C d and random-walk betweenness C r is discussed in [Ne03]. In conclusion, the analysis of biological networks clearly benefits from the application of several centrality measures. Our next step lies in the comparison of different centralities measures with biological information. 400 500 400 500 Degree Eccentricity Closeness RWB Eigenvector 400 500 400 500 400 500 Figure 2: Scatter plot matrix of the centrality positions for the PPI network C d C e C c C r C λ Degree Eccentricity Closeness RWB Eigenvector C d 1.0000 0.2794 0.3396 0.9534 0.2703 C e 0.2794 1.0000 0.4231 0.2776 0.9248 C c 0.3396 0.4231 1.0000 0.3843 0.4726 C r 0.9534 0.2776 0.3843 1.0000 0.2627 C λ 0.2703 0.9248 0.4726 0.2627 1.0000 Table 2: Correlation coefficients for the centrality positions for the PPI-network 204

0 5 0 5 Degree Eccentricity Closeness RWB Eigenvector 0 5 0 5 0 5 Figure 3: Scatter plot matrix of the centrality positions for the TR network C d C e C c C r C λ Degree Eccentricity Closeness RWB Eigenvector C d 1.0000 0.3974 0.5861 0.9700 0.5499 C e 0.3974 1.0000 0.2208 0.4172 0.0514 C c 0.5861 0.2208 1.0000 0.5856 0.9552 C r 0.9700 0.4172 0.5856 1.0000 0.5164 C λ 0.5499 0.0514 0.9552 0.5164 1.0000 Table 3: Correlation coefficients for the centrality positions for the TR network References [Bo72] Bonacich, P.: Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology. 2:113 120. 1972. [FBW91] Freeman, L. S., Borgatti, S. P., and White, D. R.: Centrality in valued graphs: a measure of betweenness based on network flow. Social Networks. 13:141 154. 1991. [Fr77] [Fr79] [FW00] Freeman, L. C.: A Set of Measures of Centrality Based on Betweenness. Sociometry. 40(6):35 41. 1977. Freeman, L. S.: Centrality in social networks: Conceptual clarification. Social Networks. 1:215 239. 1979. Fell, D. A. and Wagner, A.: The small world of metabolism. Nature Biotechnology. 18:1121 1122. 2000. 205

[HCW02] [JMBO01] Hahn, M. W., Conant, G., and Wagner, A.: Molecular evolution in large genetic networks: connectivity does not equal importance. Technical report. Santa Fe Institute. 2002. 02-08-039. Jeong, H., Mason, S. P., Barabási, A. L., and Oltvai, Z. N.: Lethality and centrality in protein networks. Nature. 411:44. 2001. [Ne03] Newman, M. E. J.: A measure of betweenness centrality based on random walks. arxiv cond-mat/0309045. 2003. [SMS + 04] Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., and Eisenberg, D.: The database of interacting proteins: 2004 update. Nucleic Acids Res. 32(1):449 451. 2004. [SOMMA02] Shen-Orr, S. S., Milo, R., Mangan, S., and Alon, U.: Network motifs in the transcriptional regulation network of escherichia coli. Nature Genetics. 31(1):64 68. 2002. [WS03] Wuchty, S. and Stadler, P. F.: Centers of complex networks. J Theor Biol. 223(1):45 53. 2003. [Wu02] Wuchty, S.: Interaction and domain networks of yeast. Proteomics. 2:1715 1723. 2002. 206