Fast Nearest Neighbor Search on Large Time-Evolving Graphs


1 Fast Nearest Neighbor Search on Large Time-Evolving Graphs
Leman Akoglu, Srinivasan Parthasarathy, Rohit Khandekar, Vibhore Kumar, Deepak Rajan, Kun-Lung Wu

2 Graphs are everywhere Leman Akoglu Fast Nearest Neighbor Search on Large Time-Evolving Graphs 3

3 ... and LARGE and TIME-evolving!
- 1.32 billion monthly active users (June 30, 2014)

4 Proximity problem on graphs
Also known as: NN-search, similarity, closeness, relevance
Q: Which nodes are close to A?
[Figure: example edge-weighted graph containing node A]

5 Application: Recommendations

6 Other applications
- Finding communities (e.g. co-authorship networks such as DBLP)
- Anomaly detection (e.g. infected hosts, potential suspects)
- Link prediction
- Keyword search
- Content-based image retrieval
- Fighting spam

7 Proximity measures for graphs
- Several metrics, e.g. shortest paths, commute time, hitting time, SimRank
- Prevalent (robust) metric: Personalized PageRank (PPR)
PPR captures: many, short, heavy-weighted paths
[Figure: example edge-weighted graph]

8 PPR is based on RWR (Random Walk with Restart)
[Slides adapted from a source not preserved in this transcription]
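As a rough illustration of how PPR scores arise from a random walk with restart, the sketch below runs power iteration on a small weighted graph. The function name `personalized_pagerank`, the dict-of-dicts graph layout, and the value alpha = 0.85 are assumptions made for this example. The only convention taken from the slides is that (1 - alpha) is the restart probability.

```python
def personalized_pagerank(graph, q, alpha=0.85, iters=100):
    """Approximate PPR scores with respect to query node q.

    graph: dict mapping node -> dict of neighbor -> edge weight.
    Assumes every node has at least one outgoing edge and that
    every neighbor also appears as a key of graph.
    """
    p = {u: 0.0 for u in graph}
    p[q] = 1.0  # the walk starts at the query node
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        nxt[q] += 1.0 - alpha  # restart: jump back to q
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # follow edge (u, v) with probability proportional to its weight
                nxt[v] += alpha * p[u] * w / total
        p = nxt
    return p


if __name__ == "__main__":
    star = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
    scores = personalized_pagerank(star, "b")
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Nodes with higher scores are "closer" to q in the PPR sense. A brute-force computation like this touches the whole graph for every query, which is the cost the clustering approach is meant to avoid.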

9 Problem Definition
Maintain a LARGE, time-varying, edge-weighted graph G(t), so that we can answer the following query efficiently:
Given a query node q in G(t) at time t, find vertices in G(t) that are close to q (w.r.t. the Personalized PageRank score).

10 Road Map
- Motivation
- Problem Definition
- Previous work
- Our Approach
  - Graph clustering
  - Intra-Cluster & Inter-Cluster Random Walks (baby steps & BIG steps)
- Time-Varying Graphs
- Experiments
- Conclusions

11 Previous Work on PPR
- D. Fogaras, B. Rácz, K. Csalogány, Tamás Sarlós. Towards scaling fully personalized PageRank. In Internet Mathematics.
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. In ICDM.
- Soumen Chakrabarti. Dynamic personalized PageRank in entity-relation graphs. In WWW.
- H. Tong, S. Papadimitriou, P. S. Yu, and C. Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs. In SDM.
- P. Sarkar, A. W. Moore. Fast nearest-neighbor search in disk-resident graphs. In KDD.
- Bahman Bahmani, Abdur Chowdhury, Ashish Goel. Fast Incremental and Personalized PageRank. In PVLDB.
- Bahman Bahmani, Kaushik Chakrabarti, Dong Xin. Fast personalized PageRank on MapReduce. In SIGMOD.
- P. A. Lofgren, S. Banerjee, A. Goel, C. Seshadhri. FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs. In KDD.
We consider both large AND time-varying graphs!

12 Our Method: ClusterRank
1) Pre-computation
   a. Graph clustering
   b. Compute meta-info for each cluster
2) Query processing
   a. Identify relevant clusters to consider
   b. Combine their meta-info to compute an answer

13 Graph Clustering
- We work with large graphs (that do not fit in main memory), so we cluster vertices such that each cluster is small enough to fit in main memory.
- Need good clusters: many intra-cluster edges, but few inter-cluster edges.
  - Random walks are more likely to stay within a cluster
  - A good cluster is already a good approximation of the close neighborhood of the vertices in the cluster
Note: in some cases, the graph may be clustered naturally (e.g. a Web graph spread across many servers)

14 Graph Clustering
- Many graph clustering algorithms exist, e.g. based on community detection, spectral partitioning, etc.
- Reid Andersen, Fan Chung, and Kevin Lang (ACL). Local Graph Partitioning using PageRank Vectors. FOCS 2006.
- Advantages:
  - Local algorithm: complexity depends on output cluster size
  - Gives clusters of different sizes, which can be overlapping
  - Can do clustering while the graph is on disk

15 What is good clustering?
ACL [FOCS06]'s measure is conductance: Φ ∈ [0, 1]
Example: Φ = 3 / ( ) = 0.17
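To make the conductance measure concrete, here is a minimal sketch using the standard definition Φ(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)), where vol is the sum of weighted degrees. The helper names `build_graph` and `conductance` and the two-triangle example are assumptions for illustration; they do not reproduce the graph drawn on the slide.

```python
def build_graph(edges):
    """Build an undirected weighted graph (dict of dicts) from (u, v, w) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def conductance(graph, cluster):
    """Conductance of a node set: cut weight over the smaller of the two volumes."""
    S = set(cluster)
    cut = sum(w for u in S for v, w in graph[u].items() if v not in S)
    vol_S = sum(w for u in S for w in graph[u].values())
    vol_rest = sum(w for u in graph if u not in S for w in graph[u].values())
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 1.0)]
    g = build_graph(edges)
    print(conductance(g, [0, 1, 2]))  # cut = 1, vol = 7
```

A low value means few edges leave the cluster relative to its size, which is the property that keeps random walks inside it.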

16 Graph Clustering
[Figure: clustering of graph G]

17 Our Method: ClusterRank
1) Pre-computation
   a. Graph clustering
   b. Compute meta-info for each cluster
2) Query processing
   a. Identify relevant clusters to consider
   b. Combine their meta-info to compute an answer

18 Compute meta-info for each cluster
C(u,v): the expected number of times (Count) a random walk (RW) starting at node u in cluster S hits node v before exiting S (it can exit by walking to another cluster or by restarting to q).
E(u,v): the probability that a RW starting at node u in cluster S Exits S to node v, an out-bound node in B (assuming the query (restart) vertex q is outside S).
Example: for a cluster S with 5 nodes and 2 out-bound nodes, the C matrix is 5x5 (|S| x |S|) and the E matrix is 5x3 (|S| x (|B| + 1): the 2 nodes in B plus q).

19 Compute meta-info for each cluster
Intra-cluster random walks → baby steps
[Figure: clusters S1, S2, S3, S4 and query node q]

20 Compute meta-info for each cluster
Recursive definition for C
T(u, v): transition probability from u to v
N(u): set of neighbor nodes of node u
(1 − α): restart probability
S: set of nodes in the given cluster

21 Compute meta-info for each cluster
Closed-form formulae for C and E [formulae and matrix symbols not preserved in this transcription], expressed in terms of:
- an |S| x |S| transition matrix
- an |S| x (|B| + 1) matrix with exit probabilities to nodes in B ∪ {q}
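Since the formulae on these slides did not survive, the following is only a sketch of one way to compute C and E that is consistent with their verbal definitions. It assumes the usual absorbing-walk recursion C(u,v) = [u = v] + α Σ_{w ∈ S} T(u,w) C(w,v), iterated to a fixed point, and then derives E from C. The function name `cluster_meta_info`, the data layout, and the key "q" for the restart exit are hypothetical.

```python
def cluster_meta_info(T, S, B, alpha, iters=200):
    """Sketch of per-cluster meta-info, assuming the query node q lies outside S.

    T: dict u -> dict v -> transition probability (rows over all neighbors).
    S: list of nodes in the cluster; B: list of out-bound nodes outside S.
    alpha: probability of continuing the walk; 1 - alpha is the restart probability.
    """
    # C[u][v]: expected number of visits to v before the walk leaves S.
    C = {u: {v: 0.0 for v in S} for u in S}
    for _ in range(iters):
        C = {u: {v: (1.0 if u == v else 0.0)
                    + alpha * sum(T[u].get(w, 0.0) * C[w][v] for w in S)
                 for v in S}
             for u in S}
    # E[u][b]: probability of exiting S to out-bound node b;
    # E[u]["q"]: probability of exiting S by restarting to q.
    E = {}
    for u in S:
        row = {b: alpha * sum(C[u][v] * T[v].get(b, 0.0) for v in S) for b in B}
        row["q"] = (1.0 - alpha) * sum(C[u][v] for v in S)
        E[u] = row
    return C, E


if __name__ == "__main__":
    T = {"a": {"b": 0.5, "x": 0.5}, "b": {"a": 1.0}}
    C, E = cluster_meta_info(T, S=["a", "b"], B=["x"], alpha=0.5)
    print(C)
    print(E)
```

Under these assumptions each row of E sums to 1, because the walk leaves the cluster with probability 1. The fixed point of the iteration equals the inverse of (I − α T_SS), where T_SS is T restricted to S, which is presumably the kind of closed form the slide refers to.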

22 Our Method: ClusterRank
1) Pre-computation (OFFLINE)
   a. Graph clustering
   b. Compute meta-info for each cluster
2) Query processing (ONLINE)
   a. Identify relevant clusters to consider
   b. Combine their meta-info to compute an answer

23 Query processing
Update meta-info for q's cluster:
- C_q (C given q): [formula not preserved]
- E_q (E given q): [formula not preserved]
K: |S| x |S| matrix of 0s with column q all 1s (rank 1!) → fast Sherman-Morrison matrix inverse update
Recall: closed-form formulae for C and E
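The Sherman-Morrison identity mentioned here, (A + u vᵀ)⁻¹ = A⁻¹ − (A⁻¹ u vᵀ A⁻¹) / (1 + vᵀ A⁻¹ u), can be sketched in a few lines. How it is wired into C_q below is an assumption, since the slide's formulae are missing: the example takes C = (I − α T_SS)⁻¹ and models restarts landing on q inside the cluster as the rank-1 term −(1 − α) · 1 · e_qᵀ, which matches the description of K as "column q all 1s". The function name `sherman_morrison` is hypothetical.

```python
def sherman_morrison(A_inv, u, v):
    """Return the inverse of (A + u v^T), given A_inv = A^{-1} as a list of rows."""
    n = len(A_inv)
    Au = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]  # A^{-1} u
    vA = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1.0 + sum(v[k] * Au[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-1 update makes the matrix singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    alpha = 0.5
    # C = (I - alpha * T_SS)^{-1} for a 2-node cluster with T_SS = [[0, 0.5], [1, 0]].
    C = [[8.0 / 7.0, 2.0 / 7.0],
         [4.0 / 7.0, 8.0 / 7.0]]
    q = 0                                 # index of the query node inside the cluster
    u = [-(1.0 - alpha)] * 2              # -(1 - alpha) times the all-ones vector
    e_q = [1.0 if j == q else 0.0 for j in range(2)]
    C_q = sherman_morrison(C, u, e_q)     # counts when restarts return to q
    print(C_q)
```

The update costs O(|S|²) arithmetic, essentially one outer product of an |S| x 1 and a 1 x |S| vector, instead of a fresh matrix inversion.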

24 Query processing
Inter-cluster graph M over relevant clusters
[Figure: clusters S1, S2, S3, S4 and query node q]

25 Query processing
Inter-cluster random walks → BIG steps
[Figure: inter-cluster graph M and query node q]

26 Query processing
Combine intra- and inter-cluster meta-info to compute the final answer ("lift" C matrices)
[Figure: clusters S1, S2, S3, S4]

27 Query processing
Combine intra- and inter-cluster meta-info to compute the final answer ("lift" C matrices)
[Figure: clusters S1, S2, S3, S4]
Theorem: ClusterRank gives exact PPR scores.

28 Road Map
- Motivation
- Problem Definition
- Previous work
- Our Approach
  - Graph clustering
  - Intra-Cluster & Inter-Cluster Random Walks (baby steps & BIG steps)
- Time-Varying Graphs
- Experiments
- Conclusions

29 Time-varying ClusterRank
- WLOG: assume a single edge (u,v) is added
- Observation: the changes in the transition and exit-probability matrices are low-rank → compute the new C & E by the Sherman-Morrison (SM) formula
- 4 cases studied in the paper:
  - Both u and v are new vertices
  - Either u or v is a new vertex
  - u and v are in the same cluster
  - u and v are in different clusters
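The low-rank observation can be checked on a toy example. The sketch below assumes an undirected graph with row-normalized transition probabilities (the slide does not spell out the model) and shows that adding one edge (u, v) alters only the rows of u and v, so the change to the transition matrix has rank at most 2 and could be absorbed by two successive rank-1 updates. All function names are hypothetical.

```python
def transition_rows(adj):
    """Row-normalize a weighted adjacency structure (dict of dicts)."""
    return {u: {v: w / sum(nbrs.values()) for v, w in nbrs.items()}
            for u, nbrs in adj.items()}


def add_edge(adj, u, v, w=1.0):
    """Return a copy of adj with undirected edge (u, v) of weight w added."""
    new = {x: dict(nbrs) for x, nbrs in adj.items()}
    new.setdefault(u, {})
    new.setdefault(v, {})
    new[u][v] = new[u].get(v, 0.0) + w
    new[v][u] = new[v].get(u, 0.0) + w
    return new


def changed_rows(adj, u, v, w=1.0):
    """Nodes whose transition row differs after adding edge (u, v)."""
    before = transition_rows(adj)
    after = transition_rows(add_edge(adj, u, v, w))
    return {x for x in after if before.get(x) != after[x]}


if __name__ == "__main__":
    # Path a - b - c with a pendant node d attached to c.
    adj = {"a": {"b": 1.0},
           "b": {"a": 1.0, "c": 1.0},
           "c": {"b": 1.0, "d": 1.0},
           "d": {"c": 1.0}}
    print(sorted(changed_rows(adj, "a", "c")))
```

Only the clusters containing u or v hold affected rows, so only their C and E matrices need to be reloaded and updated.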

30 Graph datasets

Dataset      #edges  #nodes  #clusters  description
Synthetic    909K    300K    100        Planted partitions
Amazon       900K    262K    3739       Product co-purchase
Web          1.1M    325K               links
DBLP         1.1M    329K    4670       Co-authorships
LiveJournal  21.5M   2.7M               Friendships

[#clusters for Web and LiveJournal not preserved in this transcription]

Second table, columns: Dataset, median Φ, avg. Φ, med. size, avg. size; rows: Amazon, Web, DBLP, LiveJournal [values not preserved]

31 Pre-computation
Pre-computation time depends on 1) graph size, 2) #clusters, 3) parallelization

32 Query Processing: setup
- Instead of all clusters, focus on a subset of relevant clusters: small neighborhoods around the query vertex (1 or 2 hops away)
- Allow for a maximum of B boundary vertices
- Sparsify the inter-cluster matrix: zero out entries close to zero
- 100 randomly chosen query vertices

33 Evaluation criterion
- We report accuracy and running time for k nearest neighbor (kNN) queries.
- Accuracy = Relative Average Goodness (RAG)
  RAG(@k) = (total true score of output) / (total true score of optimum)
Note: precision, i.e. overlap with the optimum, is *not* a good measure (due to ties/near-ties).
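The RAG ratio is simple to compute once true scores are available. The following minimal sketch (the function name `rag_at_k` and the toy scores are assumptions) also shows why RAG is preferred over precision when scores are tied.

```python
def rag_at_k(true_scores, output, k):
    """Relative Average Goodness: true score mass of the returned top-k
    divided by the true score mass of the optimal top-k."""
    returned = sum(true_scores[node] for node in output[:k])
    optimum = sum(sorted(true_scores.values(), reverse=True)[:k])
    return returned / optimum if optimum > 0 else 0.0


if __name__ == "__main__":
    true_scores = {"a": 0.4, "b": 0.3, "c": 0.3, "d": 0.0}
    # "b" and "c" are tied: returning either one alongside "a" is equally good.
    print(rag_at_k(true_scores, ["a", "c"], k=2))
    print(rag_at_k(true_scores, ["a", "d"], k=2))
```

In the first call, precision against an optimum of {a, b} would be only 0.5 even though the answer is just as good, while RAG does not penalize the tie.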

34 Synthetic graphs
[Figure: SYNTHETIC dataset, 1-hop and 2-hop neighborhoods. Panels: average RAG (50) score over 100 runs, and average response time (sec.) of ClusterRank, each for B = 1K and B = 5K. Brute-force: 5.16 sec.]

35 Real graphs

36 Dynamic updates
- 500K-edge DBLP graph + 1K new edges
Avg: (value not preserved) seconds; Avg: 2.78 clusters
Note: load/store time of C, E matrices included

37 Dynamic updates
[Figure: DBLP, edges added over time (+500K edges)]

38 Summary
- ClusterRank: k Nearest Neighbor queries based on Personalized PageRank scores
  - Works with large and time-evolving graphs
  - Fast query time: sub-linear computation on pre-computed meta-info
  - Efficient dynamic updates by low-rank matrices
  - Disk-based: query processing and dynamic updates only on the relevant subset of clusters
- Future directions
  - Cluster tracking and localized re-clustering
  - Extension to hitting / commute time

39 Thank You!

40 Back-up Slides

41 Recursive definition for E
T(u, v): transition probability from u to v
N(u): set of neighbor nodes of node u
(1 − α): restart probability
S: set of nodes in the given cluster

42 Closed formulations for C and E
C = [formula not preserved], where 1 is an identity matrix of size |S| x |S|
Similarly, E = [formula not preserved]

43 What if s (query vertex) ∈ S?

44 At query time, given the query vertex s, only the two matrices (C and E) of the cluster in which s resides are updated.
K is rank 1! Therefore, we use the Sherman-Morrison lemma to update C.
Complexity: multiplication of an |S| x 1 and a 1 x |S| vector.
Note that we do not need to run SVD, as K is only rank 1!


More information
