Similarity Ranking in Large Scale Bipartite Graphs


 Josephine Evans
 1 years ago
 Views:
Transcription
1 Similarity Ranking in Large Scale Bipartite Graphs Alessandro Epasto Brown University  20 th March
2 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2
3 AdWords Ads Ads
4 Our Goal Tackling AdWords data to identify automatically, for each advertiser, its main competitors and suggest relevant queries to each advertiser. Goals: Useful business information. Improve advertisement. More relevant performance benchmarks.
5 The Data Query Nike store New York Soccer shoes Information Market Segment: Retailer, Geo: NY (USA), Stats: 10 clicks Market Segment: Apparel, Geo: London, UK, Stats: 4 clicks Soccer ball Market Segment: Equipment Geo: San Francisco (USA), Stats: 5 clicks. millions of other queries. Large advertisers (e.g., Amazon, Ask.com, etc) compete in several market segments with very different advertisers.
6 Modeling the Data as a Bipartite Graph Hundreds of Labels Millions of Advertisers Billions of Queries
7 Other Applications General approach applicable to several contexts: User, Movies, Categories: find similar users and suggest movies. Authors, Papers, Conferences: find related authors and suggest papers to read. Generally this bipartite graphs are lopsided: we want algorithms with complexity depending on the smaller side.
8 SemiFormal Problem Definition Advertisers Queries
9 SemiFormal Problem Definition A Advertisers Queries
10 SemiFormal Problem Definition A Advertisers Labels: Queries
11 SemiFormal Problem Definition A Advertisers Labels: Queries Goal: Find the nodes most similar to A.
12 How to Define Similarity? We address the computation of several node similarity measures: Neighborhood based: Common neighbors, Jaccard Coefficient, AdamicAdar. Paths based: Katz. Random Walk based: Personalized PageRank. What is the accuracy? Can it scale to huge graphs? Can be computed in realtime?
13 Our Contribution Reduce and Aggregate: general approach to induce realtime similarity rankings in multicategorical bipartite graphs, that we apply to several similarity measures. Theoretical guarantees for the precision of the algorithms. Experimental evaluation with real world data.
14 The stationary distribution assigns a similarity score to each node in the graph w.r.t. node v. Personalized PageRank For a node v (the seed) and a probability alpha v u
15 Personalized PageRank Extensive algorithmic literature. Very good accuracy in our experimental evaluation compared to other similarities (Jaccard, Intersection, etc.). Efficient MapReduce algorithm scaling to large graphs (hundred of millions of nodes). However
16 Personalized PageRank Our graphs are too big (billions of nodes) even for largescale systems. MapReduce is not realtime. We cannot precompute the rankings for each subset of labels.
17 Reduce and Aggregate Reduce: Given the bipartite and a category construct a graph with only A nodes that preserves the ranking on the entire graph. Aggregate: Given a node v in A and the reduced graphs of the subset of categories interested determine the ranking for v.
18 In practice First stage: Largescale (but feasible) MapReduce precomputation of the individual category reduced graphs. Second Stage: Fast realtime algorithm aggregation algorithm.
19 Reduce for Personalized PageRank Side A Side B Markov Chain state aggregation theory (Simon and Ado, 61; Meyer 89, etc.). 750x reduction in the number of node while preserving correctly the PPR distribution on the entire graph. Side A
20 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i
21 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i
22 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i
23 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i
24 Stochastic Complementation P P 1i... P 1k..... P i1... P ii... P ik..... P k1... P ki... P kk The stochastic complement of following C i C i matrix C i is the S i = P ii + P i (1 P i ) 1 P i
25 Stochastic Complementation Theorem [Meyer 89] For every irreducible aperiodic Markov Chain, i = t i s i i where is the stationary distribution of the nodes in and is the stationary distribution of C i s i Si
26 Stochastic Complementation Computing the stochastic complements is unfeasible in general for large matrices (matrix inversion). In our case we can exploit the properties of random walks on Bipartite graphs to invert the matrix analytically.
27 Reduce for PPR Side A x w(x, z) Side B z y w(y, z)
28 Reduce for PPR Side A x w(x, z) Side B w(x, y) z y w(y, z) w(x, y) = X z2n(x)[n(y) w(x, z)w(y, z) P h2n(z)w(z,h)
29 Reduce for PPR Side A x w(x, z) Side B w(x, y) z y w(y, z) One step in the reduced graph is equivalent to two steps in the bipartite graph.
30 Properties of the Reduced Graph Lemma 1: PPR(G,,a)[A] = 1 2 PPR(Ĝ, 2 2,a) Proof Sketch: Every path between nodes in A is even. Probability of not jumping for two steps. The probability of being in the ASide at stationarity does not depend on the graph.
31 Properties of the Reduced Graph Similarly, we can reduce the process to a graph with BSide nodes only. Lemma 2: PPR(G,,a)[B] = 1 2 P b2n(a) w(a, b)ppr(ĝb, 2 2,b) Finally, the stationary distribution of either side uniquely determines that of the other side.
32 Koury et al. AggregationDisaggregation Algorithm A B Step 1: Partition the Markov chain into disjoint subsets
33 Koury et al. AggregationDisaggregation Algorithm A A B B Step 2: Approximate the stationary distribution on each subset independently.
34 Koury et al. AggregationDisaggregation Algorithm P AA A P B AB B A P BA Step 3: Compute the k x k approximated transition matrix T between the subsets. P BB
35 Koury et al. AggregationDisaggregation Algorithm t A t B P AA A P B AB B A P BA Step 4: Compute the stationary distribution of T. P BB
36 Koury et al. AggregationDisaggregation Algorithm t A t B P AA A B 0 A P AB 0 B P BA Step 5: Based on the stationary distribution improve the estimation of A and. Repeat until convergence. B P BB
37 Aggregation in PPR X A Y A Precompute the stationary distributions individually
38 Aggregation in PPR X B Y B Precompute the stationary distributions individually
39 Aggregation in PPR A B The two subsets are not disjoint
40 Reduction to the Query Side X Y A B
41 Reduction to the Query Side X Y A B This is the larger side of the graph.
42 Our Approach A B X Y X Y We tackle the bijective relationships between the stationary distributions of the two sides. The algorithm is based only on the reduced graphs with AdvertiserSide nodes. The aggregation algorithm is scalable and converges to the correct distribution.
43 Experimental Evaluation We experimented with publicly available and proprietary datasets: QueryAds graph from Google AdWords > 1.5 billions nodes, > 5 billions edges. DBLP AuthorPapers and Patent Inventor Inventions graphs. GroundTruth clusters of competitors in Google AdWords.
44 Patent Graph Precision vs Recall Inter Jaccard AdamicAdar Katz PPR Precision Precision Recall Recall
45 Google AdWords Precision Recall
46 Convergence after One Iteration
47 Convergence Approximation Error vs # Iterations DBLP (1  Cosine) Patent (1  Cosine) 1Cosine Similarity 1Cosine e05 1e Iterations Iterations
48 Conclusions and Future Work Good accuracy and fast convergence. The framework can be applied to other problems and similarity measures. Future work could focus on the case where categories are not disjoint is relevant.
49 Thank you for your attention
Reduce and Aggregate: Similarity Ranking in MultiCategorical Bipartite Graphs
Reduce and Aggregate: Similarity Ranking in MultiCategorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation
More informationCOMP 4601 Hubs and Authorities
COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one
More informationWeb consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page
Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information
More informationCOMP5331: Knowledge Discovery and Data Mining
COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4degrees of separation, BackstromBoldiRosaUganderVigna,
More informationGraph Algorithms Matching
Chapter 5 Graph Algorithms Matching Algorithm Theory WS 2012/13 Fabian Kuhn Circulation: Demands and Lower Bounds Given: Directed network, with Edge capacities 0and lower bounds l for Node demands for
More informationEgonet Community Mining Applied to Friend Suggestion
Egonet Community Mining Applied to Friend Suggestion Alessandro Epasto Brown University epasto@cs.brown.edu Ismail Oner Sebe Google, Mountain View sebe@google.com Silvio Lattanzi Google, New York silviol@google.com
More informationLecture 8: Linkage algorithms and web search
Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk Lent
More informationPage rank computation HPC course project a.y Compute efficient and scalable Pagerank
Page rank computation HPC course project a.y. 201213 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet
More informationLecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods
Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur
More informationThe PageRank Computation in Google, Randomized Algorithms and Consensus of MultiAgent Systems
The PageRank Computation in Google, Randomized Algorithms and Consensus of MultiAgent Systems Roberto Tempo IEIITCNR Politecnico di Torino tempo@polito.it This talk The objective of this talk is to discuss
More informationG(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu
G(B)enchmark GraphBench: Towards a Universal Graph Benchmark Khaled Ammar M. Tamer Özsu Bioinformatics Software Engineering Social Network Gene Coexpression Protein Structure Program Flow Big Graphs o
More informationINTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)
INTRODUCTION TO DATA SCIENCE Link Analysis (MMDS5) Introduction Motivation: accurate web search Spammers: want you to land on their pages Google s PageRank and variants TrustRank Hubs and Authorities (HITS)
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More information1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a
!"#$ %#& ' Introduction ' Social network analysis ' Cocitation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,/*,) Early search engines mainly compare content similarity of the query and
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 12: Link Analysis January 28 th, 2016 WolfTilo Balke and Younes Ghammad Institut für Informationssysteme Technische Universität Braunschweig An Overview
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationChapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn
Chapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn Graphs Extremely important concept in computer science Graph, : node (or vertex) set : edge set Simple graph: no self loops, no multiple
More informationBruno Martins. 1 st Semester 2012/2013
Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4
More informationLink Analysis in the Cloud
Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More informationJordan BoydGraber University of Maryland. Thursday, March 3, 2011
DataIntensive Information Processing Applications! Session #5 Graph Algorithms Jordan BoydGraber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons AttributionNoncommercialShare
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationA graph is finite if its vertex set and edge set are finite. We call a graph with just one vertex trivial and all other graphs nontrivial.
2301670 Graph theory 1.1 What is a graph? 1 st semester 2550 1 1.1. What is a graph? 1.1.2. Definition. A graph G is a triple (V(G), E(G), ψ G ) consisting of V(G) of vertices, a set E(G), disjoint from
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationTODAY S LECTURE HYPERTEXT AND
LINK ANALYSIS TODAY S LECTURE HYPERTEXT AND LINKS We look beyond the content of documents We begin to look at the hyperlinks between them Address questions like Do the links represent a conferral of authority
More informationData Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140
Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 12, 2015 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational
More informationLocal HigherOrder Graph Clustering
Local HigherOrder Graph Clustering ABSTRACT Hao Yin Stanford University yinh@stanford.edu Jure Leskovec Stanford University jure@cs.stanford.edu Local graph clustering methods aim to find a cluster of
More informationCS 598CSC: Approximation Algorithms Lecture date: March 2, 2011 Instructor: Chandra Chekuri
CS 598CSC: Approximation Algorithms Lecture date: March, 011 Instructor: Chandra Chekuri Scribe: CC Local search is a powerful and widely used heuristic method (with various extensions). In this lecture
More informationA Parallel Algorithm for Finding Subgraph Isomorphism
CS420: Parallel Programming, Fall 2008 Final Project A Parallel Algorithm for Finding Subgraph Isomorphism Ashish Sharma, Santosh Bahir, Sushant Narsale, Unmil Tambe Department of Computer Science, Johns
More informationDataIntensive Computing with MapReduce
DataIntensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons AttributionNoncommercialShare
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 21: Link Analysis Hinrich Schütze Center for Information and Language Processing, University of Munich 20140618 1/80 Overview
More informationRandomized Composable Coresets for Distributed Optimization Vahab Mirrokni
Randomized Composable Coresets for Distributed Optimization Vahab Mirrokni Mainly based on joint work with: Algorithms Research Group, Google Research, New York Hossein Bateni, Aditya Bhaskara, Hossein
More informationCS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10B Sangmi Lee Pallickara. FAQs. Collaborative filtering
W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES
More informationLeveraging Set Relations in Exact Set Similarity Join
Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,
More informationData Mining. Jeff M. Phillips. January 8, 2014
Data Mining Jeff M. Phillips January 8, 2014 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data
More informationx = 12 x = 12 1x = 16
2.2  The Inverse of a Matrix We've seen how to add matrices, multiply them by scalars, subtract them, and multiply one matrix by another. The question naturally arises: Can we divide one matrix by another?
More informationThe PageRank Citation Ranking
October 17, 2012 Main Idea  Page Rank web page is important if it points to by other important web pages. *Note the recursive definition IR  course web page, Brian home page, Emily home page, Steven
More informationCOMP Page Rank
COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper
More informationMathematical Methods and Computational Algorithms for Complex Networks. Benard Abola
Mathematical Methods and Computational Algorithms for Complex Networks Benard Abola Division of Applied Mathematics, Mälardalen University Department of Mathematics, Makerere University Second Network
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling
More informationModelling TCP with Markov chains
Modelling TCP with Markov chains Keith Briggs Keith.Briggs@bt.com more.btexact.com/people/briggsk2/ 2002 June 17 typeset in L A T E X2e on a linux system Modelling TCP p.1/30 Outline Markov chain basics
More informationBipartite graphs unique perfect matching.
Generation of graphs Bipartite graphs unique perfect matching. In this section, we assume G = (V, E) bipartite connected graph. The following theorem states that if G has unique perfect matching, then
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More informationSampling Large Graphs for Anticipatory Analysis
Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015
More informationLocal Community Detection in Dynamic Graphs Using Personalized Centrality
algorithms Article Local Community Detection in Dynamic Graphs Using Personalized Centrality Eisha Nathan, Anita Zakrzewska, Jason Riedy and David A. Bader * School of Computational Science and Engineering,
More informationHow to organize the Web?
How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper
More information9 About Intersection Graphs
9 About Intersection Graphs Since this lecture we focus on selected detailed topics in Graph theory that are close to your teacher s heart... The first selected topic is that of intersection graphs, i.e.
More informationSocial Network Analysis
Chirayu Wongchokprasitti, PhD University of Pittsburgh Center for Causal Discovery Department of Biomedical Informatics chw20@pitt.edu http://www.pitt.edu/~chw20 Overview Centrality Analysis techniques
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning ChoJui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationPERSONALIZED TAG RECOMMENDATION
PERSONALIZED TAG RECOMMENDATION Ziyu Guan, Xiaofei He, Jiajun Bu, Qiaozhu Mei, Chun Chen, Can Wang Zhejiang University, China Univ. of Illinois/Univ. of Michigan 1 Booming of Social Tagging Applications
More informationChingYung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science. September 21, 2017
E6893 Big Data Analytics Lecture 3: Big Data Storage and Analytics ChingYung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science September 21, 2017 1 E6893 Big Data Analytics
More informationLargescale Graph Google NY
Largescale Graph Mining @ Google NY Vahab Mirrokni Google Research New York, NY DIMACS Workshop Largescale graph mining Many applications Friend suggestions Recommendation systems Security Advertising
More informationInformation Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group
Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)
More informationFast Inbound Top K Query for Random Walk with Restart
Fast Inbound Top K Query for Random Walk with Restart Chao Zhang, Shan Jiang, Yucheng Chen, Yidan Sun, Jiawei Han University of Illinois at Urbana Champaign czhang82@illinois.edu 1 Outline Background
More informationAgenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page
Agenda Math 104 1 Google PageRank algorithm 2 Developing a formula for ranking web pages 3 Interpretation 4 Computing the score of each page Google: background Mid nineties: many search engines often times
More informationGraphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey
More informationThe Structure of BullFree Perfect Graphs
The Structure of BullFree Perfect Graphs Maria Chudnovsky and Irena Penev Columbia University, New York, NY 10027 USA May 18, 2012 Abstract The bull is a graph consisting of a triangle and two vertexdisjoint
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationLecture 17 November 7
CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has
More informationCS7540 Spectral Algorithms, Spring 2017 Lecture #2. Matrix Tree Theorem. Presenter: Richard Peng Jan 12, 2017
CS7540 Spectral Algorithms, Spring 2017 Lecture #2 Matrix Tree Theorem Presenter: Richard Peng Jan 12, 2017 DISCLAIMER: These notes are not necessarily an accurate representation of what I said during
More informationLec 8: Adaptive Information Retrieval 2
Lec 8: Adaptive Information Retrieval 2 Advaith Siddharthan Introduction to Information Retrieval by Manning, Raghavan & Schütze. Website: http://nlp.stanford.edu/irbook/ Linear Algebra Revision Vectors:
More informationRouting State Distance: A Pathbased Metric for Network Analysis Gonca Gürsun
Routing State Distance: A Pathbased Metric for Network Analysis Gonca Gürsun joint work with Natali Ruchansky, Evimaria Terzi, Mark Crovella Distance Metrics for Analyzing Routing Shortest Path Similar
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationCLASSROOM NOTES: OPTIMIZATION PROBLEM SOLVING  I
Sutra: International Journal of Mathematical Science Education, Technomathematics Research Foundation Vol. 1, No. 1, 3035, 2008 CLASSROOM NOTES: OPTIMIZATION PROBLEM SOLVING  I R. Akerkar Technomathematics
More informationr=1 The Binomial Theorem. 4 MA095/98G Revision
Revision Read through the whole course once Make summary sheets of important definitions and results, you can use the following pages as a start and fill in more yourself Do all assignments again Do the
More informationmodern database systems lecture 10 : largescale graph processing
modern database systems lecture 1 : largescale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 91 : final exam april : homework due graphs
More informationData Mining. Jeff M. Phillips. January 9, 2013
Data Mining Jeff M. Phillips January 9, 2013 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data
More informationBayesian Networks and Decision Graphs
ayesian Networks and ecision raphs hapter 7 hapter 7 p. 1/27 Learning the structure of a ayesian network We have: complete database of cases over a set of variables. We want: ayesian network structure
More informationCSE 494: Information Retrieval, Mining and Integration on the Internet
CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18 th Oct 2011 (Instructor: Subbarao Kambhampati) Inclass Duration: Duration of the class 1hr 15min (75min) Total points:
More informationTraining Deep Neural Networks (in parallel)
Lecture 9: Training Deep Neural Networks (in parallel) Visual Computing Systems How would you describe this professor? Easy? Mean? Boring? Nerdy? Professor classification task Classifies professors as
More informationCOLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA
COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia WWW 2010
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 WolfTilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationModels and Algorithms for Complex Networks
Models and Algorithms for Complex Networks with network with parametrization, elements maintaining typically characteristic local profiles with categorical attributes [C. Faloutsos MMDS8] Milena Mihail
More informationA Modified Algorithm to Handle Dangling Pages using Hypothetical Node
A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal
More informationDifferentially Private Algorithm and Auction Configuration
Differentially Private Algorithm and Auction Configuration Ellen Vitercik CMU, Theory Lunch October 11, 2017 Joint work with Nina Balcan and Travis Dick $1 Prices learned from purchase histories can reveal
More informationThe Necessity of Mathematics
The Necessity of Mathematics from Google to Counterterrorism to Sudoku Amy Langville langvillea@cofc.edu work supported by NSFCAREER0566, NSA, DOEd, SAS, Semandex Mathematics Department College of Charleston
More informationOn 2Subcolourings of Chordal Graphs
On 2Subcolourings of Chordal Graphs Juraj Stacho School of Computing Science, Simon Fraser University 8888 University Drive, Burnaby, B.C., Canada V5A 1S6 jstacho@cs.sfu.ca Abstract. A 2subcolouring
More informationIn class 75min: 2:554:10 Thu 9/30.
MATH 4530 Topology. In class 75min: 2:554:10 Thu 9/30. Prelim I Solutions Problem 1: Consider the following topological spaces: (1) Z as a subspace of R with the finite complement topology (2) [0, π]
More informationConstant Contact. Responsyssy. VerticalResponse. Bronto. Monitor. Satisfaction
Contenders Leaders Marketing Cloud sy Scale Campaign aign Monitor Niche High Performers Satisfaction Email Marketing Products Products shown on the Grid for Email Marketing have received a minimum of 10
More informationScaling LinkBased Similarity Search
Scaling LinkBased Similarity Search Dániel Fogaras Budapest University of Technology and Economics Budapest, Hungary, H1521 fd@cs.bme.hu Balázs Rácz Computer and Automation Research Institute of the
More informationShortest Paths Algorithms and the Internet: The Distributed Bellman Ford Lecturer: Prof. Chiara Petrioli
Shortest Paths Algorithms and the Internet: The Distributed Bellman Ford Lecturer: Prof. Chiara Petrioli Dipartimento di Informatica Rome University La Sapienza G205: Fundamentals of Computer Engineering
More informationRanking on Data Manifolds
Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname
More informationClustering Results. Result List Example. Clustering Results. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to
More informationPATTERN CLASSIFICATION AND SCENE ANALYSIS
PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEYINTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane
More informationMinimal Universal Bipartite Graphs
Minimal Universal Bipartite Graphs Vadim V. Lozin, Gábor Rudolf Abstract A graph U is (induced)universal for a class of graphs X if every member of X is contained in U as an induced subgraph. We study
More informationData Partitioning and MapReduce
Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,
More informationRandom Walks for KnowledgeBased Word Sense Disambiguation. Qiuyu Li
Random Walks for KnowledgeBased Word Sense Disambiguation Qiuyu Li Word Sense Disambiguation 1 Supervised  using labeled training sets (features and proper sense label) 2 Unsupervised  only use unlabeled
More informationPaths, Flowers and Vertex Cover
Paths, Flowers and Vertex Cover Venkatesh Raman M. S. Ramanujan Saket Saurabh Abstract It is well known that in a bipartite (and more generally in a König) graph, the size of the minimum vertex cover is
More information6. Lecture notes on matroid intersection
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm
More informationUS Patent 6,658,423. William Pugh
US Patent 6,658,423 William Pugh Detecting duplicate and near  duplicate files Worked on this problem at Google in summer of 2000 I have no information whether this is currently being used I know that
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 5: Analyzing Graphs (2/2) February 2, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationSpectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014
Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and
More informationImproving Collection Selection with Overlap Awareness in P2P Search Engines
Improving Collection Selection with Overlap Awareness in P2P Search Engines Matthias Bender Peter Triantafillou Gerhard Weikum Christian Zimmer and Improving Collection Selection with Overlap Awareness
More informationSmall Survey on Perfect Graphs
Small Survey on Perfect Graphs Michele Alberti ENS Lyon December 8, 2010 Abstract This is a small survey on the exciting world of Perfect Graphs. We will see when a graph is perfect and which are families
More informationSection 26. Compact Sets
26. Compact Sets 1 Section 26. Compact Sets Note. You encounter compact sets of real numbers in senior level analysis shortly after studying open and closed sets. Recall that, in the real setting, a continuous
More information