Data mining --- mining graphs

Similar documents
Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach

The Coral Project: Defending against Large-scale Attacks on the Internet. Chenxi Wang

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Social Network Analysis

Models and Algorithms for Network Immunization

M.E.J. Newman: Models of the Small World

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Phase Transitions in Random Graphs- Outbreak of Epidemics to Network Robustness and fragility

Social Network Analysis

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Introduction to Engineering Systems, ESD.00. Networks. Lecturers: Professor Joseph Sussman Dr. Afreen Siddiqi TA: Regina Clewlow

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Diffusion and Clustering on Large Graphs

ECS 289 / MAE 298, Lecture 15 Mar 2, Diffusion, Cascades and Influence, Part II

Modeling and Simulating Social Systems with MATLAB

Small-World Models and Network Growth Models. Anastassia Semjonova Roman Tekhov

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

PPI Network Alignment Advanced Topics in Computa8onal Genomics

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Heuristics for the Critical Node Detection Problem in Large Complex Networks

Graph-theoretic Properties of Networks

A Generating Function Approach to Analyze Random Graphs

Link Analysis and Web Search

Week 10: DTMC Applications Randomized Routing. Network Performance 10-1

1 Random Walks on Graphs

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

Plan of the lecture I. INTRODUCTION II. DYNAMICAL PROCESSES. I. Networks: definitions, statistical characterization, examples II. Modeling frameworks

Varying Applications (examples)

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

What is a Network? Theory and Applications of Complex Networks. Network Example 1: High School Friendships

Networks in economics and finance. Lecture 1 - Measuring networks

Impact of Clustering on Epidemics in Random Networks

Course Introduction / Review of Fundamentals of Graph Theory

ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

Mathematics of Networks II

Theory and Applications of Complex Networks

Lecture 1: Examples, connectedness, paths and cycles

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CSCI5070 Advanced Topics in Social Computing

Supplementary material to Epidemic spreading on complex networks with community structures

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

TELCOM2125: Network Science and Analysis

Discrete Mathematics

Critical Phenomena in Complex Networks

Failure in Complex Social Networks

Graphs (MTAT , 6 EAP) Lectures: Mon 14-16, hall 404 Exercises: Wed 14-16, hall 402

Degree Distribution: The case of Citation Networks

L Modelling and Simulating Social Systems with MATLAB

TELCOM2125: Network Science and Analysis

Math 776 Graph Theory Lecture Note 1 Basic concepts

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Simplicial Complexes of Networks and Their Statistical Properties

Example 3: Viral Marketing and the vaccination policy problem

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Complex Networks: Ubiquity, Importance and Implications. Alessandro Vespignani

DNE: A Method for Extracting Cascaded Diffusion Networks from Social Networks

Tools for Large Graph Mining

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

AN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE GRAPHS

PageRank and related algorithms

Overlay (and P2P) Networks

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Theory of Computing. Lecture 10 MAS 714 Hartmut Klauck

Special-topic lecture bioinformatics: Mathematics of Biological Networks

ECS 289 / MAE 298, Lecture 9 April 29, Web search and decentralized search on small-worlds

Graph Theory Mini-course

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Overview of Network Theory, I

On Finding Power Method in Spreading Activation Search

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

CSCI5070 Advanced Topics in Social Computing

Collaborative filtering based on a random walk model on a graph

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Girls Talk Math Summer Camp

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

STA 4273H: Statistical Machine Learning

GRAPHS, GRAPH MODELS, GRAPH TERMINOLOGY, AND SPECIAL TYPES OF GRAPHS

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

Material handling and Transportation in Logistics. Paolo Detti Dipartimento di Ingegneria dell Informazione e Scienze Matematiche Università di Siena

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

University of Maryland. Tuesday, March 2, 2010

Extracting Information from Complex Networks

Link Analysis in the Cloud

L Modeling and Simulating Social Systems with MATLAB

Biological Networks Analysis

16 - Networks and PageRank

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Link Structure Analysis

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Lecture 3: Recap. Administrivia. Graph theory: Historical Motivation. COMP9020 Lecture 4 Session 2, 2017 Graphs and Trees

Lecture 27: Learning from relational data

Lecture 5: Graphs. Graphs! Euler Definitions: model. Fact! Euler Again!! Planar graphs. Euler Again!!!!

WUCT121. Discrete Mathematics. Graphs

How can we lay cable at minimum cost to make every telephone reachable from every other? What is the fastest route between two given cities?

Introduction III. Graphs. Motivations I. Introduction IV

Graph Data Management Systems in New Applications Domains. Mikko Halin

Analyzing Time-Series Data. Presentation by Colin Shea-Blymyer

Graph and Digraph Glossary

Transcription:

Data mining --- mining graphs University of South Florida Xiaoning Qian

Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank

New Science of Networks 1. M Faloutsos, P Faloutsos, C Faloutsos, On power-law relationships of the Internet topology. Comput. Commun. Rev. 29:251-262, 1999 2. JM Kleinberg, Navigation in a small world it is easier to find short chains between points in some networks than others. Nature, 406:845, 2000 3. AL Barabasi, Linked: The New Science of Networks. Cambridge, MA, 2002 4. DJ Watts, The new science of networks. Annu. Rev. Sociol. 30:243 70, 2004 5. DL Alderson, Catching the network science bug: Insight and opportunity for the operations researcher. Operations Research 56(5): 1047-1065, 2008 6. MEJ Newman, Networks: An Introduction. Oxford, 2010

Network Science The first network/graph problem Find a tour crossing every bridge just once, Euler, 1735 Bridges of Königsberg

Network Science 1. S Milgram, The small world problem. Psychol. Today 2:60 67, 1967 New Science Unprecedented number of empirical networks Much larger scale networks Visualization does not convey enough information Computer are much more powerful Highly interdisciplinary

Network Science Mining networks/graphs Topology/structure of complex networks Global: degrees, centrality, connectivity, etc. Scale-free (power-law) networks: 6 degree separation? Local: clustering (community), network motifs, etc. Dynamics/behavior of complex networks Global: the topological effect on dynamics How information, virus, disease, rumors, etc. propagate? Local: how individual nodes behave

Complex Networks (Yeast PPI)

Complex Networks (Yeast signaling)

Complex Networks (food web)

Complex Networks (friendship)

Complex Networks (romantic relation)

Complex Networks (author citation)

Complex Networks (Internet)

Complex Networks (Web)

Mathematics of Networks (Graphs) What is a network/graph? A collection of vertices/nodes joined by edges Different types of vertices and edges: Directed vs. Undirected Weighed vs. Binary Labeled vs. Nonlabeled Bipartite graphs Hypergraphs Mathematically, G = {V, E}

Mathematics of Networks (Graphs) Undirected network: <v i, v j > E => < v j, v i > E

Mathematics of Networks (Graphs) Adjacency matrix L Symmetric for undirected graphs Square matrix for (self-)graphs; rectangular for bipartite graphs L ij = e ij if <v i, v j > E Matrix analysis for graph mining! Simple graphs, connected graphs, complete graphs,

Mathematics of Networks (Graphs) Node degree c i The number of edges incident with vertex v i Neighbor set Input-, output-degrees Degree distributions (power-law) Trail (distinct edges), path (distinct nodes), cycle, cut,

Markov chain Sequential data

What is a Markov chain? Finite Markov chain -- (Q, P) Q = {q 1, q 2,, q s } : a finite set of states P : state transition probability matrix Given a sequence of observations: The probability of the sequence is: For first-order time-homogeneous Markov chain: Hence,

What is a Markov chain? Finite Markov chain -- (Q, P) Q = {B, q 1, q 2,, q s } : a finite set of states P : state transition probability matrix initial state probability: The probability of a sequence can be expressed with P: Note: The output are states at each time -- states are observable!!

An example 3-state Markov chain model for the weather: 0.4 0.3 0.8 R 0.1 S 0.2 0.3 0.2 0.1 C Q = {Rain (or snow), Cloudy, Sunny}; P is given in the figure; Initial state probability 0.6

Chapman-Kolmogorov Equation Chapman-Kolmogorov equation p(x n ) = P (n-1) p(x 1 ) Limiting distribution (stationary/steady-state distribution) Irreducibility, Periodicity, Ergocity p = P p How to solve p? Eigen-decomposition of P Power method

Random walk on graphs Random walk on graphs (network diffusion) is a Markov process.

What s behind Google? The algorithm of Google---PageRank

PageRank What is an important Webpage? There are many Webpages pointing to it Important Webpage point to more important Webpage Importance diffuses based on links between Webpages Vertices: Webpages; Edges: hyperlinks; HITS: JM Kleinberg

PageRank Diffusion (Random walk) on Web λ λ p i : importance for page i; L ij : link from page j to i; Hence, the problem becomes a Markov chain problem (diffusion process): λ λ

PageRank Diffusion (Random walk with restart) on Web λ λ p i : importance for page i; L ij : link from page j to i;

PageRank Diffusion (Random walk with restart) on Web Pseudocount λ λ Diffusion factor Matrix form: λ λ

PageRank How do we solve this? λ λ Note that p is simply for ranking and the absolute values are not critical! WLOG, we assume Hence, the problem becomes a Markov chain problem (diffusion process): λ λ

Viral propagation How does the virus spread over the network? Will it become an epidemic outbreak? How fast the virus will die out or become epidemic? How we should design robust networks to prevent cascading failures? * D Chakrabati, Tools for large graph mining. Ph.D. Thesis, CMU, 2005

Mathematical Epidemiology SIR (Susceptible-Infective-Recovered) model SIS (Susceptible-Infective-Susceptible) model Catching the disease from Infective neighbors (birth rate): β Recover rate: δ Epidemic threshold: τ

SIS model SIS model is again a Markov process!?? Sum and Product rules in probability!!

SIS model Sum and Product rules in probability!!

SIS model With appropriate approximations, we can derive p(v i t =susceptible) = p(v i t-1 =susceptible) ζ i + p(v i t-1 =infective) δ 1-p(v i t ) = [1-p(v i t-1 )] ζ i + p(v i t-1 ) δ and p t =[ βl + (1-δ)I ] p t-1 Sum and Product rules in probability!!

SIS model With appropriate approximations, we can derive p t =[ βl + (1-δ)I ] p t-1 Eigen-decomposition of the matrix S= [ βl + (1-δ)I ] p p 0 L Hence, L

SIS model With appropriate approximations, we can derive p t =[ βl + (1-δ)I ] p t-1 Eigen-decomposition of the matrix S= [ βl + (1-δ)I ] Hence, L Epidemic threshold: L

Summary Networks/graphs are everywhere and require new tools to study them efficiently and effectively. Random walk (Markov chain) on graphs and its extension can be a useful technique to mine complex networks/graphs PageRank Viral propagation Have you learned anything? :) I am teaching Biological Network Analysis, Spring 2012.