Advanced Algorithms and Models for Computational Biology -- a machine learning approach


Biological Networks & Network Evolution
Eric Xing
Lecture 22, April 10, 2006
Reading:

Molecular Networks
Interaction networks, regulatory networks, expression networks, metabolic networks.
Nodes: molecules. Links: interactions / relations.

Other types of networks
Disease spread [Krebs]; electronic circuit; food web; Internet [Burch & Cheswick]; social network.

Metabolic networks
KEGG database: http://www.genome.ad.jp/kegg/kegg2.html
Nodes: metabolites (0.5K). Edges: directed biochemical reactions (1K).
Reflect the cell's metabolic circuitry.

Graph theoretic description of metabolic networks
A graph theoretic description of a simple pathway (catalyzed by Mg2+-dependent enzymes) is illustrated in (a). In the most abstract approach (b), all interacting metabolites are considered equally.
Barabasi & Oltvai. NRG. (2004) 5 101-113

Protein Interaction Networks
Nodes: proteins (6K). Edges: interactions (15K).
Reflect the cell's machinery and signaling pathways.

Experimental approaches
Yeast two-hybrid; protein co-IP.

Graphs and Networks
Graph: a pair of sets G = {V, E}, where V is a set of nodes and E is a set of edges, each connecting 2 elements of V.
Graphs can be directed or undirected.
Large, complex networks are ubiquitous in the world: genetic networks, the nervous system, social interactions, the World Wide Web.

Global topological measures
Indicate the gross topological structure of the network: degree, path length, clustering coefficient. [Barabasi]

Connectivity measures
Node degree: the number of edges incident on the node (the number of network neighbors).
Undirected networks: in the example figure, the degree of node i is 5.
Degree distribution P(k): the probability that a node has degree k.
Directed networks, e.g., transcription regulation networks (TRNs):
Average incoming degree = 2.1: each gene is regulated by ~2 TFs.
Average outgoing degree = 49.8: each TF targets ~50 genes.
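
The degree measures above can be computed directly from an edge list. The following is a minimal sketch, not part of the lecture: the function names and the toy edges are made up for illustration, and nodes with no edges are simply absent from the result.

```python
from collections import Counter

def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(deg):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}

def in_out_degrees(directed_edges):
    """Return (in_degree, out_degree) dicts for (source, target) edges."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in directed_edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)

# Toy undirected graph (hypothetical): node "i" has five neighbors.
edges = [("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"), ("i", "e"), ("a", "b")]
deg = degrees(edges)
pk = degree_distribution(deg)

# Toy directed graph (hypothetical): TFs point at the genes they regulate.
indeg, outdeg = in_out_degrees([("TF1", "g1"), ("TF1", "g2"), ("TF2", "g1")])
```

In a TRN, averaging `indeg` over genes and `outdeg` over TFs would give the kind of incoming and outgoing degree figures quoted on the slide.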

Characteristic path length
$L_{ij}$ is the number of edges in the shortest path between vertices i and j (in the figure example, $L(i, j) = 2$).
The characteristic path length of a graph is the average of $L_{ij}$ over every possible pair (i, j).
Diameter: the maximal distance in the network.
Networks with small values of L are said to have the small-world property.
In a TRN, $L_{ij}$ represents the number of intermediate TFs between a starting TF i and its final target j, and indicates how immediate a regulatory response is.
Example: starting TF, 1 intermediate TF, final target: path length = 1. Average path length = 4.7.

Clustering coefficient
The clustering coefficient of node i is the ratio of the number $E_i$ of edges that exist among its neighbors to the number of edges that could exist among them:
$C_i = \frac{2 E_i}{n_i (n_i - 1)}$, where $n_i$ is the number of neighbors of node i.
It measures how inter-connected the network is.
Example: 4 neighbours, 1 existing link, 6 possible links, so $C_i = 1/6 = 0.17$.
The clustering coefficient for the entire network, C, is the average of all the $C_i$. Average coefficient = 0.11.
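
Both measures can be sketched in a few lines for an undirected, unweighted graph. This is an illustrative sketch under stated assumptions, not the lecture's code: path lengths count edges (not intermediate TFs), only connected pairs are averaged, and a node with fewer than two neighbors is given a clustering coefficient of 0. The toy graph reproduces the slide's example of a node with 4 neighbours and 1 link among them.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path_lengths(adj, source):
    """Breadth-first search: distance in edges from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Return (average shortest path over connected pairs, diameter)."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

def clustering_coefficient(adj, i):
    """C_i = 2 E_i / (n_i (n_i - 1)); taken as 0 when n_i < 2 (assumption)."""
    nbrs = adj[i]
    n = len(nbrs)
    if n < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (n * (n - 1))

# Toy graph (hypothetical): "i" has 4 neighbors and 1 link among them.
adj = build_adjacency([("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"), ("a", "b")])
c_i = clustering_coefficient(adj, "i")
c_network = sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
path_length, diameter = characteristic_path_length(adj)
```

Running one breadth-first search per node is adequate for small graphs; large networks would normally use a graph library instead.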

A Comparison of Global Network Statistics (Barabasi & Oltvai, 2004)
A. Random networks [Erdős and Rényi (1959, 1960)]
Degree distribution: $P(k) = \frac{\langle k \rangle^{k} e^{-\langle k \rangle}}{k!}$
Mean path length ~ ln(k) (in the path-length and connectivity relations, k stands for the network size rather than a node degree)
Phase transition: connected if $p \geq \ln(k)/k$
B. Scale-free networks [Price, 1965 & Barabasi, 1999]
$P(k) \sim k^{-\gamma}$, $k \gg 1$, $2 < \gamma$
Mean path length ~ ln ln(k)
Preferential attachment: add proportionally to connectedness.
C. Hierarchical networks
Copy smaller graphs and let them keep their connections.

Local network motifs
Regulatory modules within the network: SIM, MIM, FBL, FFL. [Alon]

SIM = Single input motifs
Genes shown in the example figure: HCM1, ECM22, STB1, SPO1, YPR013C.
[Alon; Horak, Luscombe et al (2002), Genes & Dev, 16: 3017]

MIM = Multiple input motifs
Genes shown in the example figure: SBF, MBF, SPT21, HCM1.
[Alon; Horak, Luscombe et al (2002), Genes & Dev, 16: 3017]

FFL = Feed-forward loops
Genes shown in the example figure: SBF, Yox1, Pog1, Tos8, Plm2.
[Alon; Horak, Luscombe et al (2002), Genes & Dev, 16: 3017]

FBL = Feed-back loops
Genes shown in the example figure: MBF, SBF, Tos4.
[Alon; Horak, Luscombe et al (2002), Genes & Dev, 16: 3017]
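
To make the feed-forward loop concrete: in a directed regulatory graph, an FFL is a triple (x, y, z) in which x regulates y, y regulates z, and x also regulates z directly. The sketch below simply enumerates such triples. It is an illustration with abstract, made-up node names, not the motif-detection procedure used in the cited work, which also compares the counts against randomized networks.

```python
def find_feed_forward_loops(edges):
    """Return all triples (x, y, z) with edges x->y, y->z and x->z."""
    targets = {}
    for src, dst in edges:
        if src != dst:  # self-regulation is ignored here
            targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # z must differ from x and be a direct target of x
                if z != x and z in x_targets:
                    loops.append((x, y, z))
    return loops

# Hypothetical directed edges: one FFL (X, Y, Z) plus an extra edge Z -> W.
ffl_example = find_feed_forward_loops([("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")])

# A pure feedback cycle A -> B -> C -> A contains no feed-forward loop.
cycle_example = find_feed_forward_loops([("A", "B"), ("B", "C"), ("C", "A")])
```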

What network structure should be used to model a biological network?
[Figure: lattice and random networks. Strogatz S.H., Nature (2001) 410 268]

Calculating the degree connectivity of a network
[Figure: example network with each node labeled by its degree, and the resulting degree connectivity distribution (x-axis: degree connectivity, 1 to 8; y-axis: frequency).]

Connectivity distributions for metabolic networks
[Figure panels: A. fulgidus (archaea); E. coli (bacterium); C. elegans (eukaryote); averaged over 43 organisms.]
Jeong et al. Nature (2000) 407 651-654

Protein-protein interaction networks
(The color of the nodes is explained later.)
Jeong et al. Nature 411, 41-42 (2001)
Wagner. RSL (2003) 270 457-466

Random versus scaled exponential degree distribution
Degree connectivity distributions differ between random and observed (metabolic and protein-protein interaction) networks.
[Figure: two plots of log frequency against log degree connectivity, one for $y = a^{x}$ and one for $y = x^{a}$. Strogatz S.H., Nature (2001) 410 268]

What is so scale-free about these networks?
No matter which scale is chosen, the same distribution of degrees is observed among nodes.
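
A small numerical check can make the contrast concrete. A power law $y = x^{a}$ is a straight line on log-log axes, while an exponential $y = a^{x}$ is a straight line only on semi-log axes; and for a power law the ratio P(2k)/P(k) is the same at every k, which is the sense in which no scale is singled out. The sketch below uses idealized, noise-free values with an exponent and base chosen purely for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

ks = list(range(1, 51))
gamma = 2.5  # illustrative exponent, not a value from the lecture
base = 0.5   # illustrative base for the exponential

power_law = [k ** -gamma for k in ks]
exponential = [base ** k for k in ks]

# Power law: log P against log k is a line with slope -gamma.
slope_power, _ = fit_line([math.log(k) for k in ks],
                          [math.log(p) for p in power_law])

# Exponential: log P against k (not log k) is a line with slope ln(base).
slope_exp, _ = fit_line(ks, [math.log(p) for p in exponential])

# Scale invariance: P(2k) / P(k) does not depend on k for a power law.
ratios = [(2 * k) ** -gamma / k ** -gamma for k in ks]
```

With real degree data the fit would be noisy, and a straight line on a log-log plot is only suggestive of a power law rather than proof of one.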

Models for networks of complex topology
Erdos-Renyi (1960), Watts-Strogatz (1998), Barabasi-Albert (1999)

Random Networks: The Erdős-Rényi [ER] model (1960)
N nodes.
Every pair of nodes is connected with probability p.
Mean degree: (N-1)p.
Degree distribution is binomial, concentrated around the mean.
Average distance (Np > 1): ~ log N.
Important result: many properties of these graphs appear quite suddenly, at a threshold value of $p_{ER}(N)$:
If $p_{ER} \sim c/N$ with c < 1, then almost all vertices belong to isolated trees.
Cycles of all orders appear at $p_{ER} \sim 1/N$.
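
The ER construction is short enough to sketch directly: visit every pair of nodes once and connect it with probability p. The code below is a minimal illustration (the function names and the `seed` parameter are choices made here, not from the lecture) and checks that the observed mean degree is close to (N-1)p.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """G(N, p): connect each of the N(N-1)/2 node pairs with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_degree(adj):
    """Average number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

n, p = 500, 0.02
graph = erdos_renyi(n, p, seed=0)
observed = mean_degree(graph)
expected = (n - 1) * p  # 9.98 for these illustrative values
```

Visiting all pairs costs O(N^2) time, which is fine for a demonstration but slow for very large sparse graphs.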

The Watts-Strogatz [WS] model (1998)
Start with a regular network with N vertices.
Rewire each edge with probability p.
For p = 0 (regular networks): high clustering coefficient, high characteristic path length.
For p = 1 (random networks): low clustering coefficient, low characteristic path length.
QUESTION: What happens for intermediate values of p?

WS model, cont.
There is a broad interval of p for which L is small but C remains large.
Small-world networks are common.
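
One way to sketch the rewiring procedure is shown below. It assumes the regular network is a ring in which each vertex is linked to its k nearest neighbours (k even, k < N), and that rewiring moves one end of an edge to a uniformly chosen vertex that is not already a neighbour; these details, and the parameter names, are assumptions for illustration rather than the lecture's specification.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes with k nearest-neighbour links per node,
    followed by rewiring of each lattice edge with probability p."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even with 0 < k < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Build the regular ring lattice.
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit each lattice edge (v, v + offset) once and maybe rewire it.
    for offset in range(1, k // 2 + 1):
        for v in range(n):
            if rng.random() >= p:
                continue
            w = (v + offset) % n
            # New endpoint: not v itself and not already a neighbour of v.
            candidates = [u for u in range(n) if u != v and u not in adj[v]]
            if candidates:
                new = rng.choice(candidates)
                adj[v].remove(w)
                adj[w].remove(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj

def edge_count(adj):
    """Number of undirected edges."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

lattice = watts_strogatz(20, 4, 0.0, seed=1)   # p = 0: unchanged ring
rewired = watts_strogatz(20, 4, 1.0, seed=1)   # p = 1: every edge rewired
```

Rewiring keeps the number of edges fixed, so changes in L and C across values of p reflect topology alone; measuring both for a range of p would reproduce the qualitative picture described on the slide.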

Scale-free networks: The Barabási-Albert [BA] model (1999)
The distribution of degrees:
In the ER and WS models, the probability of finding a highly connected node decreases exponentially with k.
In real networks (actors, power grid, WWW), the degree distribution instead follows a power law: $P(k) \sim k^{-\gamma}$.

BA model, cont.
Two problems with the previous models:
1. N does not vary.
2. The probability that two vertices are connected is uniform.
The BA model:
Evolution: networks expand continuously by the addition of new vertices.
Preferential attachment (rich get richer): new vertices attach preferentially to sites that are already well connected.

Scale-free network model
GROWTH: starting with a small number of vertices $m_0$, at every timestep add a new vertex with $m \leq m_0$ edges.
PREFERENTIAL ATTACHMENT: the probability $\Pi$ that a new vertex will be connected to vertex i depends on the connectivity $k_i$ of that vertex:
$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
Barabasi & Bonabeau Sci. Am. May 2003 60-69
Barabasi and Albert. Science (1999) 286 509-512

Scale Free Networks
a) Connectivity distribution with $N = m_0 + t = 300000$ and $m_0 = m = 1$ (circles), $m_0 = m = 3$ (squares), $m_0 = m = 5$ (diamonds), and $m_0 = m = 7$ (triangles).
b) P(k) for $m_0 = m = 5$ and system size N = 100000 (circles), N = 150000 (squares), and N = 200000 (diamonds).
Barabasi and Albert. Science (1999) 286 509-512
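
Growth with preferential attachment can be sketched with a list in which every node appears once per edge endpoint, so that a uniform draw from the list picks node i with probability $k_i / \sum_j k_j$. The sketch makes two assumptions that are not in the lecture: the seed network is a complete graph on m + 1 vertices (so that every starting degree is nonzero), and the m targets of a new vertex are made distinct by redrawing duplicates, which only approximates the stated probability.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node adds m edges to existing
    nodes drawn with probability proportional to their degree."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Seed network (assumption): complete graph on m + 1 nodes.
    adj = {v: set() for v in range(m + 1)}
    endpoints = []  # each node appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    # Growth: one new node per timestep.
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Uniform draw from endpoints = degree-proportional draw of a node.
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints += [new, t]
    return adj

network = barabasi_albert(200, 3, seed=0)
degree_of = {v: len(nbrs) for v, nbrs in network.items()}
total_edges = sum(degree_of.values()) // 2
```

Plotting the histogram of `degree_of` on log-log axes for a much larger n is how one would look for the power-law tail shown in the figure.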

Comparing Random vs. Scale-free Networks
Two networks, both with 130 nodes and 215 links.
[Figure legend: five nodes with most links (red); first neighbors of red nodes.]
The importance of the connected nodes in the scale-free network: in the random network, 27% of the nodes are reached by the five most connected nodes; in the scale-free network, more than 60% are reached.
Modified from Albert et al. Nature (2000) 406 378-382

Failure and Attack
Albert et al. Nature (2000) 406 378-382
Failure: removal of a random node.
Attack: the selection and removal of a few nodes that play a vital role in maintaining the network's connectivity.
[Figure: a macroscopic snapshot of Internet connectivity by K. C. Claffy.]

Failure and Attack, cont.
Random networks are homogeneous, so there is no difference between failure and attack.
[Figure: diameter of the network against the fraction of nodes removed from the network. Modified from Albert et al. Nature (2000) 406 378-382]

Failure and Attack, cont.
Scale-free networks are robust to failure but susceptible to attack.
[Figure: diameter of the network against the fraction of nodes removed from the network. Modified from Albert et al. Nature (2000) 406 378-382]
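
The difference between failure and attack can be illustrated on an extreme toy case: a single hub with nine leaves. Note that this sketch swaps in a different measure from the slides: it tracks the size of the largest connected component rather than the network diameter, because that is simpler to compute and makes fragmentation obvious. The network and function names are made up for illustration.

```python
def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def attack_targets(adj, count):
    """The `count` most connected nodes, i.e. the targets of an attack."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:count]

# Hypothetical hub-and-spoke network: hub "H" linked to leaves 0..8.
star = {"H": set(range(9))}
for leaf in range(9):
    star[leaf] = {"H"}

intact = largest_component_size(star)
after_failure = largest_component_size(star, removed={0})  # one leaf removed
after_attack = largest_component_size(star, removed=set(attack_targets(star, 1)))
```

Removing a leaf leaves the rest of the network in one piece, while removing the single most connected node isolates every remaining node.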

Yeast protein-protein interaction networks
[Figure: node colors give the phenotypic effect of removing the corresponding protein: lethal, slow-growth, non-lethal, unknown.]
Jeong et al. Nature 411, 41-42 (2001)

Lethality and connectivity are positively correlated
[Figure: % of essential proteins against number of links; average and standard deviation for the various clusters.]
Pearson's linear correlation coefficient = 0.75
Jeong et al. Nature 411, 41-42 (2001)

Genetic foundation of network evolution
Network expansion by gene duplication:
A gene duplicates.
The copy inherits its connections.
The connections can change.
Gene duplication is slow: ~$10^{-9}$ per year.
Connection evolution is fast: ~$10^{-6}$ per year.
Barabasi & Oltvai. NRG. (2004) 5 101-113

The transcriptional regulation network of Escherichia coli.
Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan & Uri Alon (2002) Nature Genetics 31 64-68
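
The duplication step can be written as a small graph operation: the copy starts with the same neighbours as the original, and each inherited connection may then be lost. The sketch below is an illustration only; `loss_prob` is a hypothetical parameter standing in for "the connections can change" and is not a rate taken from the lecture.

```python
import random

def duplicate_gene(adj, gene, copy_name, loss_prob=0.0, rng=None):
    """Add `copy_name` as a duplicate of `gene`, modifying adj in place.
    Each inherited connection is kept with probability 1 - loss_prob."""
    rng = rng or random.Random()
    adj[copy_name] = set()
    for neighbor in list(adj[gene]):
        if rng.random() >= loss_prob:
            adj[copy_name].add(neighbor)
            adj[neighbor].add(copy_name)
    return adj

# Hypothetical interaction network: A interacts with B and C.
network = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
duplicate_gene(network, "A", "A_copy")  # copy inherits every connection

# With loss_prob = 1.0 the copy loses all inherited connections.
diverged = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
duplicate_gene(diverged, "A", "A_copy", loss_prob=1.0)
```

After the duplication, B and C each have one more interaction partner than before, since they now link to both A and its copy.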

Motifs in the networks
A motif detection algorithm was deployed on the transcriptional regulation network.
It identified three recurring motifs (significant with respect to random graphs).
Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan & Uri Alon (2002) Nature Genetics 31 64-68

Convergent evolution of gene circuits
Are the components of, for example, the feed-forward loop homologous?
Circuit duplication is rare in the transcription network.
Conant and Wagner. Nature Genetics (2003) 34 264-266

Acknowledgements
Itai Yanai and Doron Lancet, Mark Gerstein, Roded Sharan, Jotun Hein, and Serafim Batzoglou, for some of the slides modified from their lectures or tutorials.

Reference
Barabási and Albert. Emergence of scaling in random networks. Science 286, 509-512 (1999).
Yook et al. Functional and topological characterization of protein interaction networks. Proteomics 4, 928-942 (2004).
Jeong et al. The large-scale organization of metabolic networks. Nature 407, 651-654 (2000).
Albert et al. Error and attack tolerance of complex networks. Nature 406, 378-382 (2000).
Barabási and Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics 5, 101-113 (2004).