Similarity Ranking in Large- Scale Bipartite Graphs

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Similarity Ranking in Large- Scale Bipartite Graphs"

Transcription

1 Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March

2 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2

3 AdWords Ads Ads

4 Our Goal Tackling AdWords data to identify automatically, for each advertiser, its main competitors and suggest relevant queries to each advertiser. Goals: Useful business information. Improve advertisement. More relevant performance benchmarks.

5 The Data Query Nike store New York Soccer shoes Information Market Segment: Retailer, Geo: NY (USA), Stats: 10 clicks Market Segment: Apparel, Geo: London, UK, Stats: 4 clicks Soccer ball Market Segment: Equipment Geo: San Francisco (USA), Stats: 5 clicks. millions of other queries. Large advertisers (e.g., Amazon, Ask.com, etc) compete in several market segments with very different advertisers.

6 Modeling the Data as a Bipartite Graph Hundreds of Labels Millions of Advertisers Billions of Queries

7 Other Applications General approach applicable to several contexts: User, Movies, Categories: find similar users and suggest movies. Authors, Papers, Conferences: find related authors and suggest papers to read. Generally this bipartite graphs are lopsided: we want algorithms with complexity depending on the smaller side.

8 Semi-Formal Problem Definition Advertisers Queries

9 Semi-Formal Problem Definition A Advertisers Queries

10 Semi-Formal Problem Definition A Advertisers Labels: Queries

11 Semi-Formal Problem Definition A Advertisers Labels: Queries Goal: Find the nodes most similar to A.

12 How to Define Similarity? We address the computation of several node similarity measures: Neighborhood based: Common neighbors, Jaccard Coefficient, Adamic-Adar. Paths based: Katz. Random Walk based: Personalized PageRank. What is the accuracy? Can it scale to huge graphs? Can be computed in real-time?

13 Our Contribution Reduce and Aggregate: general approach to induce real-time similarity rankings in multicategorical bipartite graphs, that we apply to several similarity measures. Theoretical guarantees for the precision of the algorithms. Experimental evaluation with real world data.

14 The stationary distribution assigns a similarity score to each node in the graph w.r.t. node v. Personalized PageRank For a node v (the seed) and a probability alpha v u

15 Personalized PageRank Extensive algorithmic literature. Very good accuracy in our experimental evaluation compared to other similarities (Jaccard, Intersection, etc.). Efficient MapReduce algorithm scaling to large graphs (hundred of millions of nodes). However

16 Personalized PageRank Our graphs are too big (billions of nodes) even for large-scale systems. MapReduce is not real-time. We cannot pre-compute the rankings for each subset of labels.

17 Reduce and Aggregate Reduce: Given the bipartite and a category construct a graph with only A nodes that preserves the ranking on the entire graph. Aggregate: Given a node v in A and the reduced graphs of the subset of categories interested determine the ranking for v.

18 In practice First stage: Large-scale (but feasible) MapReduce pre-computation of the individual category reduced graphs. Second Stage: Fast real-time algorithm aggregation algorithm.

19 Reduce for Personalized PageRank Side A Side B Markov Chain state aggregation theory (Simon and Ado, 61; Meyer 89, etc.). 750x reduction in the number of node while preserving correctly the PPR distribution on the entire graph. Side A

20 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i

21 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i

22 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i

23 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i

24 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i

25 Stochastic Complementation Theorem [Meyer 89] For every irreducible aperiodic Markov Chain, i = t i s i i where is the stationary distribution of the nodes in and is the stationary distribution of C i s i Si

26 Stochastic Complementation Computing the stochastic complements is unfeasible in general for large matrices (matrix inversion). In our case we can exploit the properties of random walks on Bipartite graphs to invert the matrix analytically.

27 Reduce for PPR Side A x w(x, z) Side B z y w(y, z)

28 Reduce for PPR Side A x w(x, z) Side B w(x, y) z y w(y, z) w(x, y) = X z2n(x)[n(y) w(x, z)w(y, z) P h2n(z)w(z,h)

29 Reduce for PPR Side A x w(x, z) Side B w(x, y) z y w(y, z) One step in the reduced graph is equivalent to two steps in the bipartite graph.

30 Properties of the Reduced Graph Lemma 1: PPR(G,,a)[A] = 1 2 PPR(Ĝ, 2 2,a) Proof Sketch: Every path between nodes in A is even. Probability of not jumping for two steps. The probability of being in the A-Side at stationarity does not depend on the graph.

31 Properties of the Reduced Graph Similarly, we can reduce the process to a graph with B-Side nodes only. Lemma 2: PPR(G,,a)[B] = 1 2 P b2n(a) w(a, b)ppr(ĝb, 2 2,b) Finally, the stationary distribution of either side uniquely determines that of the other side.

32 Koury et al. Aggregation-Disaggregation Algorithm A B Step 1: Partition the Markov chain into disjoint subsets

33 Koury et al. Aggregation-Disaggregation Algorithm A A B B Step 2: Approximate the stationary distribution on each subset independently.

34 Koury et al. Aggregation-Disaggregation Algorithm P AA A P B AB B A P BA Step 3: Compute the k x k approximated transition matrix T between the subsets. P BB

35 Koury et al. Aggregation-Disaggregation Algorithm t A t B P AA A P B AB B A P BA Step 4: Compute the stationary distribution of T. P BB

36 Koury et al. Aggregation-Disaggregation Algorithm t A t B P AA A B 0 A P AB 0 B P BA Step 5: Based on the stationary distribution improve the estimation of A and. Repeat until convergence. B P BB

37 Aggregation in PPR X A Y A Precompute the stationary distributions individually

38 Aggregation in PPR X B Y B Precompute the stationary distributions individually

39 Aggregation in PPR A B The two subsets are not disjoint

40 Reduction to the Query Side X Y A B

41 Reduction to the Query Side X Y A B This is the larger side of the graph.

42 Our Approach A B X Y X Y We tackle the bijective relationships between the stationary distributions of the two sides. The algorithm is based only on the reduced graphs with Advertiser-Side nodes. The aggregation algorithm is scalable and converges to the correct distribution.

43 Experimental Evaluation We experimented with publicly available and proprietary datasets: Query-Ads graph from Google AdWords > 1.5 billions nodes, > 5 billions edges. DBLP Author-Papers and Patent Inventor- Inventions graphs. Ground-Truth clusters of competitors in Google AdWords.

44 Patent Graph Precision vs Recall Inter Jaccard Adamic-Adar Katz PPR Precision Precision Recall Recall

45 Google AdWords Precision Recall

46 Convergence after One Iteration

47 Convergence Approximation Error vs # Iterations DBLP (1 - Cosine) Patent (1 - Cosine) 1-Cosine Similarity 1-Cosine e-05 1e Iterations Iterations

48 Conclusions and Future Work Good accuracy and fast convergence. The framework can be applied to other problems and similarity measures. Future work could focus on the case where categories are not disjoint is relevant.

49 Thank you for your attention

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Graph Algorithms Matching

Graph Algorithms Matching Chapter 5 Graph Algorithms Matching Algorithm Theory WS 2012/13 Fabian Kuhn Circulation: Demands and Lower Bounds Given: Directed network, with Edge capacities 0and lower bounds l for Node demands for

More information

Ego-net Community Mining Applied to Friend Suggestion

Ego-net Community Mining Applied to Friend Suggestion Ego-net Community Mining Applied to Friend Suggestion Alessandro Epasto Brown University epasto@cs.brown.edu Ismail Oner Sebe Google, Mountain View sebe@google.com Silvio Lattanzi Google, New York silviol@google.com

More information

Lecture 8: Linkage algorithms and web search

Lecture 8: Linkage algorithms and web search Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk Lent

More information

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank Page rank computation HPC course project a.y. 2012-13 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet

More information

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur

More information

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems Roberto Tempo IEIIT-CNR Politecnico di Torino tempo@polito.it This talk The objective of this talk is to discuss

More information

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu G(B)enchmark GraphBench: Towards a Universal Graph Benchmark Khaled Ammar M. Tamer Özsu Bioinformatics Software Engineering Social Network Gene Co-expression Protein Structure Program Flow Big Graphs o

More information

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5) INTRODUCTION TO DATA SCIENCE Link Analysis (MMDS5) Introduction Motivation: accurate web search Spammers: want you to land on their pages Google s PageRank and variants TrustRank Hubs and Authorities (HITS)

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important

More information

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a !"#$ %#& ' Introduction ' Social network analysis ' Co-citation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,-/*,) Early search engines mainly compare content similarity of the query and

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 12: Link Analysis January 28 th, 2016 Wolf-Tilo Balke and Younes Ghammad Institut für Informationssysteme Technische Universität Braunschweig An Overview

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Chapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn

Chapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn Chapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn Graphs Extremely important concept in computer science Graph, : node (or vertex) set : edge set Simple graph: no self loops, no multiple

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

A graph is finite if its vertex set and edge set are finite. We call a graph with just one vertex trivial and all other graphs nontrivial.

A graph is finite if its vertex set and edge set are finite. We call a graph with just one vertex trivial and all other graphs nontrivial. 2301-670 Graph theory 1.1 What is a graph? 1 st semester 2550 1 1.1. What is a graph? 1.1.2. Definition. A graph G is a triple (V(G), E(G), ψ G ) consisting of V(G) of vertices, a set E(G), disjoint from

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

TODAY S LECTURE HYPERTEXT AND

TODAY S LECTURE HYPERTEXT AND LINK ANALYSIS TODAY S LECTURE HYPERTEXT AND LINKS We look beyond the content of documents We begin to look at the hyperlinks between them Address questions like Do the links represent a conferral of authority

More information

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 12, 2015 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

Local Higher-Order Graph Clustering

Local Higher-Order Graph Clustering Local Higher-Order Graph Clustering ABSTRACT Hao Yin Stanford University yinh@stanford.edu Jure Leskovec Stanford University jure@cs.stanford.edu Local graph clustering methods aim to find a cluster of

More information

CS 598CSC: Approximation Algorithms Lecture date: March 2, 2011 Instructor: Chandra Chekuri

CS 598CSC: Approximation Algorithms Lecture date: March 2, 2011 Instructor: Chandra Chekuri CS 598CSC: Approximation Algorithms Lecture date: March, 011 Instructor: Chandra Chekuri Scribe: CC Local search is a powerful and widely used heuristic method (with various extensions). In this lecture

More information

A Parallel Algorithm for Finding Sub-graph Isomorphism

A Parallel Algorithm for Finding Sub-graph Isomorphism CS420: Parallel Programming, Fall 2008 Final Project A Parallel Algorithm for Finding Sub-graph Isomorphism Ashish Sharma, Santosh Bahir, Sushant Narsale, Unmil Tambe Department of Computer Science, Johns

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 21: Link Analysis Hinrich Schütze Center for Information and Language Processing, University of Munich 2014-06-18 1/80 Overview

More information

Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni

Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni Mainly based on joint work with: Algorithms Research Group, Google Research, New York Hossein Bateni, Aditya Bhaskara, Hossein

More information

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Data Mining. Jeff M. Phillips. January 8, 2014

Data Mining. Jeff M. Phillips. January 8, 2014 Data Mining Jeff M. Phillips January 8, 2014 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data

More information

x = 12 x = 12 1x = 16

x = 12 x = 12 1x = 16 2.2 - The Inverse of a Matrix We've seen how to add matrices, multiply them by scalars, subtract them, and multiply one matrix by another. The question naturally arises: Can we divide one matrix by another?

More information

The PageRank Citation Ranking

The PageRank Citation Ranking October 17, 2012 Main Idea - Page Rank web page is important if it points to by other important web pages. *Note the recursive definition IR - course web page, Brian home page, Emily home page, Steven

More information

COMP Page Rank

COMP Page Rank COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper

More information

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola Mathematical Methods and Computational Algorithms for Complex Networks Benard Abola Division of Applied Mathematics, Mälardalen University Department of Mathematics, Makerere University Second Network

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling

More information

Modelling TCP with Markov chains

Modelling TCP with Markov chains Modelling TCP with Markov chains Keith Briggs Keith.Briggs@bt.com more.btexact.com/people/briggsk2/ 2002 June 17 typeset in L A T E X2e on a linux system Modelling TCP p.1/30 Outline Markov chain basics

More information

Bipartite graphs unique perfect matching.

Bipartite graphs unique perfect matching. Generation of graphs Bipartite graphs unique perfect matching. In this section, we assume G = (V, E) bipartite connected graph. The following theorem states that if G has unique perfect matching, then

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information

Sampling Large Graphs for Anticipatory Analysis

Sampling Large Graphs for Anticipatory Analysis Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015

More information

Local Community Detection in Dynamic Graphs Using Personalized Centrality

Local Community Detection in Dynamic Graphs Using Personalized Centrality algorithms Article Local Community Detection in Dynamic Graphs Using Personalized Centrality Eisha Nathan, Anita Zakrzewska, Jason Riedy and David A. Bader * School of Computational Science and Engineering,

More information

How to organize the Web?

How to organize the Web? How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper

More information

9 About Intersection Graphs

9 About Intersection Graphs 9 About Intersection Graphs Since this lecture we focus on selected detailed topics in Graph theory that are close to your teacher s heart... The first selected topic is that of intersection graphs, i.e.

More information

Social Network Analysis

Social Network Analysis Chirayu Wongchokprasitti, PhD University of Pittsburgh Center for Causal Discovery Department of Biomedical Informatics chw20@pitt.edu http://www.pitt.edu/~chw20 Overview Centrality Analysis techniques

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

PERSONALIZED TAG RECOMMENDATION

PERSONALIZED TAG RECOMMENDATION PERSONALIZED TAG RECOMMENDATION Ziyu Guan, Xiaofei He, Jiajun Bu, Qiaozhu Mei, Chun Chen, Can Wang Zhejiang University, China Univ. of Illinois/Univ. of Michigan 1 Booming of Social Tagging Applications

More information

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science. September 21, 2017

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science. September 21, 2017 E6893 Big Data Analytics Lecture 3: Big Data Storage and Analytics Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science September 21, 2017 1 E6893 Big Data Analytics

More information

Large-scale Graph Google NY

Large-scale Graph Google NY Large-scale Graph Mining @ Google NY Vahab Mirrokni Google Research New York, NY DIMACS Workshop Large-scale graph mining Many applications Friend suggestions Recommendation systems Security Advertising

More information

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)

More information

Fast Inbound Top- K Query for Random Walk with Restart

Fast Inbound Top- K Query for Random Walk with Restart Fast Inbound Top- K Query for Random Walk with Restart Chao Zhang, Shan Jiang, Yucheng Chen, Yidan Sun, Jiawei Han University of Illinois at Urbana Champaign czhang82@illinois.edu 1 Outline Background

More information

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page Agenda Math 104 1 Google PageRank algorithm 2 Developing a formula for ranking web pages 3 Interpretation 4 Computing the score of each page Google: background Mid nineties: many search engines often times

More information

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey

More information

The Structure of Bull-Free Perfect Graphs

The Structure of Bull-Free Perfect Graphs The Structure of Bull-Free Perfect Graphs Maria Chudnovsky and Irena Penev Columbia University, New York, NY 10027 USA May 18, 2012 Abstract The bull is a graph consisting of a triangle and two vertex-disjoint

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

CS7540 Spectral Algorithms, Spring 2017 Lecture #2. Matrix Tree Theorem. Presenter: Richard Peng Jan 12, 2017

CS7540 Spectral Algorithms, Spring 2017 Lecture #2. Matrix Tree Theorem. Presenter: Richard Peng Jan 12, 2017 CS7540 Spectral Algorithms, Spring 2017 Lecture #2 Matrix Tree Theorem Presenter: Richard Peng Jan 12, 2017 DISCLAIMER: These notes are not necessarily an accurate representation of what I said during

More information

Lec 8: Adaptive Information Retrieval 2

Lec 8: Adaptive Information Retrieval 2 Lec 8: Adaptive Information Retrieval 2 Advaith Siddharthan Introduction to Information Retrieval by Manning, Raghavan & Schütze. Website: http://nlp.stanford.edu/ir-book/ Linear Algebra Revision Vectors:

More information

Routing State Distance: A Path-based Metric for Network Analysis Gonca Gürsun

Routing State Distance: A Path-based Metric for Network Analysis Gonca Gürsun Routing State Distance: A Path-based Metric for Network Analysis Gonca Gürsun joint work with Natali Ruchansky, Evimaria Terzi, Mark Crovella Distance Metrics for Analyzing Routing Shortest Path Similar

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

CLASS-ROOM NOTES: OPTIMIZATION PROBLEM SOLVING - I

CLASS-ROOM NOTES: OPTIMIZATION PROBLEM SOLVING - I Sutra: International Journal of Mathematical Science Education, Technomathematics Research Foundation Vol. 1, No. 1, 30-35, 2008 CLASS-ROOM NOTES: OPTIMIZATION PROBLEM SOLVING - I R. Akerkar Technomathematics

More information

r=1 The Binomial Theorem. 4 MA095/98G Revision

r=1 The Binomial Theorem. 4 MA095/98G Revision Revision Read through the whole course once Make summary sheets of important definitions and results, you can use the following pages as a start and fill in more yourself Do all assignments again Do the

More information

modern database systems lecture 10 : large-scale graph processing

modern database systems lecture 10 : large-scale graph processing modern database systems lecture 1 : large-scale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 9-1 : final exam april : homework due graphs

More information

Data Mining. Jeff M. Phillips. January 9, 2013

Data Mining. Jeff M. Phillips. January 9, 2013 Data Mining Jeff M. Phillips January 9, 2013 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data

More information

Bayesian Networks and Decision Graphs

Bayesian Networks and Decision Graphs ayesian Networks and ecision raphs hapter 7 hapter 7 p. 1/27 Learning the structure of a ayesian network We have: complete database of cases over a set of variables. We want: ayesian network structure

More information

CSE 494: Information Retrieval, Mining and Integration on the Internet

CSE 494: Information Retrieval, Mining and Integration on the Internet CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18 th Oct 2011 (Instructor: Subbarao Kambhampati) In-class Duration: Duration of the class 1hr 15min (75min) Total points:

More information

Training Deep Neural Networks (in parallel)

Training Deep Neural Networks (in parallel) Lecture 9: Training Deep Neural Networks (in parallel) Visual Computing Systems How would you describe this professor? Easy? Mean? Boring? Nerdy? Professor classification task Classifies professors as

More information

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia WWW 2010

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework

More information

Models and Algorithms for Complex Networks

Models and Algorithms for Complex Networks Models and Algorithms for Complex Networks with network with parametrization, elements maintaining typically characteristic local profiles with categorical attributes [C. Faloutsos MMDS8] Milena Mihail

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Differentially Private Algorithm and Auction Configuration

Differentially Private Algorithm and Auction Configuration Differentially Private Algorithm and Auction Configuration Ellen Vitercik CMU, Theory Lunch October 11, 2017 Joint work with Nina Balcan and Travis Dick $1 Prices learned from purchase histories can reveal

More information

The Necessity of Mathematics

The Necessity of Mathematics The Necessity of Mathematics from Google to Counterterrorism to Sudoku Amy Langville langvillea@cofc.edu work supported by NSF-CAREER-0566, NSA, DOEd, SAS, Semandex Mathematics Department College of Charleston

More information

On 2-Subcolourings of Chordal Graphs

On 2-Subcolourings of Chordal Graphs On 2-Subcolourings of Chordal Graphs Juraj Stacho School of Computing Science, Simon Fraser University 8888 University Drive, Burnaby, B.C., Canada V5A 1S6 jstacho@cs.sfu.ca Abstract. A 2-subcolouring

More information

In class 75min: 2:55-4:10 Thu 9/30.

In class 75min: 2:55-4:10 Thu 9/30. MATH 4530 Topology. In class 75min: 2:55-4:10 Thu 9/30. Prelim I Solutions Problem 1: Consider the following topological spaces: (1) Z as a subspace of R with the finite complement topology (2) [0, π]

More information

Constant Contact. Responsyssy. VerticalResponse. Bronto. Monitor. Satisfaction

Constant Contact. Responsyssy. VerticalResponse. Bronto. Monitor. Satisfaction Contenders Leaders Marketing Cloud sy Scale Campaign aign Monitor Niche High Performers Satisfaction Email Marketing Products Products shown on the Grid for Email Marketing have received a minimum of 10

More information

Scaling Link-Based Similarity Search

Scaling Link-Based Similarity Search Scaling Link-Based Similarity Search Dániel Fogaras Budapest University of Technology and Economics Budapest, Hungary, H-1521 fd@cs.bme.hu Balázs Rácz Computer and Automation Research Institute of the

More information

Shortest Paths Algorithms and the Internet: The Distributed Bellman Ford Lecturer: Prof. Chiara Petrioli

Shortest Paths Algorithms and the Internet: The Distributed Bellman Ford Lecturer: Prof. Chiara Petrioli Shortest Paths Algorithms and the Internet: The Distributed Bellman Ford Lecturer: Prof. Chiara Petrioli Dipartimento di Informatica Rome University La Sapienza G205: Fundamentals of Computer Engineering

More information

Ranking on Data Manifolds

Ranking on Data Manifolds Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

More information

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Clustering Results. Result List Example. Clustering Results. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to

More information

PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

More information

Minimal Universal Bipartite Graphs

Minimal Universal Bipartite Graphs Minimal Universal Bipartite Graphs Vadim V. Lozin, Gábor Rudolf Abstract A graph U is (induced)-universal for a class of graphs X if every member of X is contained in U as an induced subgraph. We study

More information

Data Partitioning and MapReduce

Data Partitioning and MapReduce Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,

More information

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li Random Walks for Knowledge-Based Word Sense Disambiguation Qiuyu Li Word Sense Disambiguation 1 Supervised - using labeled training sets (features and proper sense label) 2 Unsupervised - only use unlabeled

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman M. S. Ramanujan Saket Saurabh Abstract It is well known that in a bipartite (and more generally in a König) graph, the size of the minimum vertex cover is

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

US Patent 6,658,423. William Pugh

US Patent 6,658,423. William Pugh US Patent 6,658,423 William Pugh Detecting duplicate and near - duplicate files Worked on this problem at Google in summer of 2000 I have no information whether this is currently being used I know that

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 5: Analyzing Graphs (2/2) February 2, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and

More information

Improving Collection Selection with Overlap Awareness in P2P Search Engines

Improving Collection Selection with Overlap Awareness in P2P Search Engines Improving Collection Selection with Overlap Awareness in P2P Search Engines Matthias Bender Peter Triantafillou Gerhard Weikum Christian Zimmer and Improving Collection Selection with Overlap Awareness

More information

Small Survey on Perfect Graphs

Small Survey on Perfect Graphs Small Survey on Perfect Graphs Michele Alberti ENS Lyon December 8, 2010 Abstract This is a small survey on the exciting world of Perfect Graphs. We will see when a graph is perfect and which are families

More information

Section 26. Compact Sets

Section 26. Compact Sets 26. Compact Sets 1 Section 26. Compact Sets Note. You encounter compact sets of real numbers in senior level analysis shortly after studying open and closed sets. Recall that, in the real setting, a continuous

More information