Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

Alessandro Epasto, J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research. Sapienza U. Rome.

Motivation Recommendation Systems: Bipartite graphs with Users and Items. Identify similar users and suggest relevant items. Concrete example: The AdWords case. Two key observations: Items belong to different categories. Graphs are often lopsided.

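Modeling the Data as a Bipartite Graph [Figure: bipartite graph with millions of advertisers (e.g. Nike Store New York) on one side and billions of queries (e.g. Soccer Shoes, Soccer Ball) on the other. Edges carry dollar amounts (1$ to 5$). Queries are grouped into hundreds of labels (e.g. Retailers, Apparel, Sport Equipment).]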

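Personalized PageRank For a node v (the seed) and a probability alpha, consider a random walk that, at each step, jumps back to v with probability alpha and otherwise moves to a random neighbor u of the current node. The stationary distribution of this walk assigns a similarity score to each node in the graph w.r.t. node v.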
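A minimal sketch of how such scores could be approximated by power iteration. It assumes the common convention in which alpha is the probability of jumping back to the seed at each step; the function name, the toy graph, and the default values are illustrative and not taken from the slides.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Approximate PPR scores w.r.t. `seed` on an undirected graph.

    adj: dict mapping each node to the list of its neighbors (no isolated nodes).
    alpha: probability of jumping back to the seed (assumed convention).
    """
    scores = {node: 0.0 for node in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        nxt[seed] += alpha  # restart mass always returns to the seed
        for node, mass in scores.items():
            share = (1.0 - alpha) * mass / len(adj[node])
            for neighbor in adj[node]:
                nxt[neighbor] += share
        scores = nxt
    return scores


# Hypothetical toy graph: advertisers a1..a3, queries q1..q3.
graph = {
    "a1": ["q1", "q2"],
    "a2": ["q2"],
    "a3": ["q3"],
    "q1": ["a1"],
    "q2": ["a1", "a2"],
    "q3": ["a3"],
}
ppr = personalized_pagerank(graph, "a1")
```

In this toy graph a2 shares a query with the seed a1 and so receives a positive score, while a3 is unreachable from a1 and stays at zero.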

The Problem [Figure: the advertiser-query bipartite graph, with millions of advertisers (e.g. Nike Store New York), billions of queries (e.g. Soccer Shoes, Soccer Ball), dollar amounts on the edges, and hundreds of labels (e.g. Retailers, Apparel, Sport Equipment).]

Other Applications The general approach is applicable to several contexts: Users, Movies, Genres: find similar users and suggest movies. Authors, Papers, Conferences: find related authors and suggest papers to read.

Semi-Formal Problem Definition [Figure: bipartite graph of Advertisers and Queries, with a seed advertiser A and a selected set of query labels.] Goal: Find the nodes most similar to A.

How to Define Similarity? We address the computation of several node similarity measures: Neighborhood-based: common neighbors, Jaccard coefficient, Adamic-Adar. Path-based: Katz. Random-walk-based: Personalized PageRank. Experimental question: which measure is useful? Algorithmic questions: Can it scale to huge graphs? Can we compute it in real time?
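As a rough illustration of the neighborhood-based measures listed above, the following sketch uses their standard textbook definitions on a tiny hypothetical graph. The function names and the `nbrs` data layout are assumptions made for the example.

```python
import math


def common_neighbors(nbrs, u, v):
    """Number of neighbors shared by u and v."""
    return len(nbrs[u] & nbrs[v])


def jaccard(nbrs, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0


def adamic_adar(nbrs, u, v):
    """Shared neighbors weighted by 1 / log(degree): rare neighbors count more."""
    return sum(
        1.0 / math.log(len(nbrs[w]))
        for w in nbrs[u] & nbrs[v]
        if len(nbrs[w]) > 1  # skip degree-1 nodes, where log(1) = 0
    )


# Hypothetical toy graph: advertisers a1..a3, queries q1..q3.
nbrs = {
    "a1": {"q1", "q2"},
    "a2": {"q2", "q3"},
    "a3": {"q3"},
    "q1": {"a1"},
    "q2": {"a1", "a2"},
    "q3": {"a2", "a3"},
}
cn = common_neighbors(nbrs, "a1", "a2")
jc = jaccard(nbrs, "a1", "a2")
aa = adamic_adar(nbrs, "a1", "a2")
```

Katz and Personalized PageRank need path or random-walk computations and are not covered by this sketch.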

Our Contribution Reduce and Aggregate: a general approach to induce real-time similarity rankings in multi-categorical bipartite graphs, which we apply to several similarity measures. Theoretical guarantees for the precision of the algorithms. Experimental evaluation with real-world data.


Challenges Our graphs are too big (billions of nodes) even for very large-scale MapReduce systems. MapReduce is not real-time. We cannot pre-compute the rankings for each subset of labels.

Reduce and Aggregate [Figure: three reduced graphs, numbered 1), 2), 3), over the nodes a, b, c.] Reduce: Given the bipartite graph and a category, construct a graph with only A-side nodes that preserves the ranking on the entire graph. Aggregate: Given a node v in A and the reduced graphs of the categories of interest, determine the ranking for v.
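The two phases can be sketched for the simplest measure, common neighbors, where (assuming each query belongs to exactly one category) the reduce step is a per-category count and the aggregate step is a sum. This is only an illustration of the pattern, not the authors' procedure for PPR; all names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def reduce_by_category(edges, category_of):
    """Precompute, per category, common-neighbor counts between A-side nodes.

    edges: iterable of (advertiser, query) pairs.
    category_of: dict mapping each query to its single category.
    Returns {category: {(a, b): count}} with a < b.
    """
    advertisers_of = defaultdict(set)
    for advertiser, query in edges:
        advertisers_of[query].add(advertiser)
    reduced = defaultdict(lambda: defaultdict(int))
    for query, advertisers in advertisers_of.items():
        for a, b in combinations(sorted(advertisers), 2):
            reduced[category_of[query]][(a, b)] += 1
    return reduced


def aggregate(reduced, seed, categories):
    """Rank A-side nodes by common neighbors with `seed` over the chosen categories."""
    totals = defaultdict(int)
    for category in categories:
        for (a, b), count in reduced.get(category, {}).items():
            if seed == a:
                totals[b] += count
            elif seed == b:
                totals[a] += count
    # Highest count first; ties broken by node name for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


edges = [
    ("A", "q1"), ("B", "q1"), ("A", "q2"), ("B", "q2"),
    ("A", "q3"), ("C", "q3"),
    ("A", "q4"), ("B", "q4"), ("C", "q4"),
]
category_of = {"q1": "shoes", "q2": "shoes", "q3": "apparel", "q4": "equipment"}
reduced = reduce_by_category(edges, category_of)
ranking_1 = aggregate(reduced, "A", ["shoes", "apparel"])
ranking_2 = aggregate(reduced, "A", ["apparel", "equipment"])
```

The query side is touched only in `reduce_by_category`; at run time `aggregate` reads just the small advertiser-side tables for the requested categories.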

Reduce (Precomputation) [Figure: the Advertisers-Queries graph is processed one category at a time, producing three sets of Precomputed Rankings.]

Aggregate (Run Time) [Figure: for the seed A, two sets of Precomputed Rankings are combined into the Ranking of Red + Yellow.]

Reduce for Personalized PageRank [Figure: nodes X and Y on Side A, connected through Side B, are reduced to a graph over Side A only.] Markov chain state aggregation theory (Simon and Ando, 61; Meyer 89, etc.). 750x reduction in the number of nodes while correctly preserving the PPR distribution on the entire graph.
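One way to see why a Side A-only graph can preserve PPR: for a seed on Side A, substituting the Side B equations into the Side A equations of the PPR fixed point gives a PPR problem on the two-step chain A -> B -> A with restart probability 2*alpha - alpha^2, whose solution equals the full-graph Side A scores up to the factor 1/(2 - alpha). The sketch below checks this numerically on a toy graph. It assumes alpha is the restart probability, and whether this matches the exact construction behind the slide is an assumption; all names are illustrative.

```python
def ppr(transition, seed, alpha, iters=300):
    """Power iteration for PPR; transition[u] maps each successor to its probability."""
    scores = {u: 0.0 for u in transition}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in transition}
        nxt[seed] += alpha
        for u, mass in scores.items():
            for w, p in transition[u].items():
                nxt[w] += (1.0 - alpha) * mass * p
        scores = nxt
    return scores


def uniform_walk(adj):
    """Transition probabilities of the simple random walk on `adj`."""
    return {u: {w: 1.0 / len(ns) for w in ns} for u, ns in adj.items()}


def reduce_to_side_a(adj, side_a):
    """Collapse the two-step walk A -> B -> A into a chain over Side A only."""
    walk = uniform_walk(adj)
    reduced = {}
    for a in side_a:
        row = {}
        for b, p_ab in walk[a].items():
            for a2, p_ba in walk[b].items():
                row[a2] = row.get(a2, 0.0) + p_ab * p_ba
        reduced[a] = row
    return reduced


# Hypothetical toy graph: Side A = {X, Y, Z}, Side B = {q1, q2, q3}.
adj = {
    "X": ["q1", "q2"], "Y": ["q2", "q3"], "Z": ["q3"],
    "q1": ["X"], "q2": ["X", "Y"], "q3": ["Y", "Z"],
}
side_a = ["X", "Y", "Z"]
alpha = 0.15

full = ppr(uniform_walk(adj), "X", alpha)
alpha_reduced = 2 * alpha - alpha ** 2
small = ppr(reduce_to_side_a(adj, side_a), "X", alpha_reduced)
rescaled = {a: small[a] / (2 - alpha) for a in side_a}
```

Here `rescaled` agrees with the Side A entries of `full` up to numerical error, although the reduced chain never stores a Side B node.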

Run-time Aggregation

Koury et al. Aggregation-Disaggregation Algorithm [Figure: a Markov chain split into subsets A and B.] Step 1: Partition the Markov chain into DISJOINT subsets.

Koury et al. Aggregation-Disaggregation Algorithm [Figure: subsets A and B treated separately.] Step 2: Approximate the stationary distribution on each subset independently.

Koury et al. Aggregation-Disaggregation Algorithm [Figure: transition matrix in block form, with blocks P_AA, P_AB, P_BA, P_BB.] Step 3: Consider the transitions between subsets.

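Koury et al. Aggregation-Disaggregation Algorithm [Figure: block transition matrix with blocks P_AA, P_AB, P_BA, P_BB, and the per-subset distributions of A and B padded with zeros.] Step 4: Aggregate the distributions. Repeat until convergence.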
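The four steps can be sketched for a two-block chain as follows. This is a simplified iterative aggregation-disaggregation loop: the disaggregation here is a single power step, which is an assumption made for brevity and not necessarily the update used by Koury et al. The matrix and all names are hypothetical.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def aggregate_disaggregate(P, block_a, block_b, iters=50):
    """Two-block aggregation-disaggregation sketch for the stationary distribution."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        # Step 2: distribution inside each block, normalised independently.
        mass_a = sum(dist[i] for i in block_a)
        mass_b = sum(dist[i] for i in block_b)
        phi = [0.0] * n
        for i in block_a:
            phi[i] = dist[i] / mass_a
        for i in block_b:
            phi[i] = dist[i] / mass_b
        # Step 3: transitions between the blocks form a 2-state chain.
        a_to_b = sum(phi[i] * P[i][j] for i in block_a for j in block_b)
        b_to_a = sum(phi[i] * P[i][j] for i in block_b for j in block_a)
        weight_a = b_to_a / (a_to_b + b_to_a)  # stationary mass of block A
        weight_b = a_to_b / (a_to_b + b_to_a)  # stationary mass of block B
        # Step 4: combine the block distributions, then smooth with one power step.
        combined = [0.0] * n
        for i in block_a:
            combined[i] = weight_a * phi[i]
        for i in block_b:
            combined[i] = weight_b * phi[i]
        dist = step(combined, P)
    return dist


# Step 1: a hypothetical 4-state chain with disjoint blocks {0, 1} and {2, 3}.
P = [
    [0.50, 0.45, 0.03, 0.02],
    [0.40, 0.50, 0.05, 0.05],
    [0.02, 0.03, 0.50, 0.45],
    [0.05, 0.05, 0.40, 0.50],
]
result = aggregate_disaggregate(P, [0, 1], [2, 3])
```

The scheme relies on the blocks being disjoint, so that every state has exactly one within-block distribution to rescale.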

Aggregation in PPR [Figure: nodes X and Y, shown first with subset A and then with subset B.] Precompute the stationary distributions of A and B individually.

Aggregation in PPR [Figure: subsets A and B sharing nodes.] The two subsets are not disjoint!

Our Approach [Figure: reduced graphs for A and B, each over the nodes X and Y.] The algorithm is based only on the reduced graphs with Advertiser-Side nodes. The aggregation algorithm is scalable and converges to the correct distribution.

Experimental Evaluation We experimented with publicly available and proprietary datasets: Query-Ads graph from Google AdWords, with more than 1.5 billion nodes and more than 5 billion edges. DBLP Author-Papers and Patent Inventor-Inventions graphs. Ground-truth clusters of competitors in Google AdWords.

Patent Graph [Figure: Precision vs Recall for Inter, Jaccard, Adamic-Adar, Katz, and PPR. x-axis: Recall (0 to 1); y-axis: Precision (0 to 1).]

Google AdWords [Figure: Precision vs Recall.]

Conclusions and Future Work It is possible to compute several similarity scores on very large bipartite graphs in real time with good accuracy. Future work could focus on the relevant case where categories are not disjoint.

Thank you for your attention

Reduction to the Query Side [Figure: graph with nodes X, Y and A, B.] This is the larger side of the graph.

Convergence after One Iteration [Figure: Kendall-Tau correlation vs position (k), for k = 10, 20, 30, 40, 50 and All, on DBLP, Patent, and Query-Ads (cost). y-axis: Kendall-Tau (0 to 1).]
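For readers unfamiliar with the metric, a minimal sketch of a Kendall-Tau correlation between two score assignments over the same items. It uses the tau-a variant (tied pairs count as neither concordant nor discordant) and ignores the top-k restriction of the plot; both choices are assumptions, since the slide does not say which variant was used.

```python
from itertools import combinations


def kendall_tau(scores_x, scores_y):
    """Kendall tau-a between two score dicts defined over the same items."""
    items = sorted(scores_x)
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        sign = (scores_x[i] - scores_x[j]) * (scores_y[i] - scores_y[j])
        if sign > 0:
            concordant += 1  # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical exact and approximate scores for three nodes.
exact = {"a": 0.5, "b": 0.3, "c": 0.2}
approx = {"a": 0.45, "b": 0.25, "c": 0.30}
tau = kendall_tau(exact, approx)
```

In this example the rankings agree on two of the three pairs and disagree on one, so the correlation is 1/3.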

Convergence [Figure: Approximation Error vs # Iterations for DBLP and Patent. x-axis: Iterations (0 to 20); y-axis: 1 - Cosine Similarity, with ticks from 1e-06 to 0.001.]
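The error measure on the y-axis can be sketched as follows, assuming it compares the approximate and exact score vectors over the same ordered set of nodes; the function name is illustrative.

```python
import math


def cosine_error(u, v):
    """1 - cosine similarity between two equal-length, non-zero score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


orthogonal = cosine_error([1.0, 0.0], [0.0, 1.0])
parallel = cosine_error([1.0, 2.0], [2.0, 4.0])
```

A value near 0 means the two vectors point in almost the same direction, regardless of their overall scale.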