Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces

Similar documents
Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

An Optimal Algorithm for Prufer Codes *

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Parallelism for Nested Loops with Non-uniform and Flow Dependences

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Cluster Analysis of Electrical Behavior

Performance Evaluation of Information Retrieval Systems

Smoothing Spline ANOVA for variable screening

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Classifier Selection Based on Data Complexity Measures *

Feature Reduction and Selection

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

S1 Note. Basis functions.

Learning-Based Top-N Selection Query Evaluation over Relational Databases

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

Module Management Tool in Software Development Organizations

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Research of Support Vector Machine in Agricultural Data Classification

Feature-Based Matrix Factorization

LECTURE : MANIFOLD LEARNING

Analysis of Continuous Beams in General

Query Clustering Using a Hybrid Query Similarity Measure

Support Vector Machines

A Binarization Algorithm specialized on Document Images and Photos

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Optimizing Document Scoring for Query Retrieval

Unsupervised Learning

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

The Codesign Challenge

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

UB at GeoCLEF Department of Geography Abstract

Laplacian Eigenmap for Image Retrieval

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Hermite Splines in Lie Groups as Products of Geodesics

Problem Set 3 Solutions

Private Information Retrieval (PIR)

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Classification / Regression Support Vector Machines

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Hierarchical clustering for gene expression data analysis

Programming in Fortran 90 : 2017/2018

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

Solving two-person zero-sum game by Matlab

Support Vector Machines

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

Machine Learning: Algorithms and Applications

Mathematics 256 a course in differential equations for engineering students

Parallel matrix-vector multiplication

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

LOCALIZING USERS AND ITEMS FROM PAIRED COMPARISONS. Matthew R. O Shaughnessy and Mark A. Davenport

Unsupervised Learning and Clustering

Meta-heuristics for Multidimensional Knapsack Problems

Constructing Minimum Connected Dominating Set: Algorithmic approach

A Facet Generation Procedure. for solving 0/1 integer programs

Detection of an Object by using Principal Component Analysis

Signed Distance-based Deep Memory Recommender

CS246: Mining Massive Datasets Jure Leskovec, Stanford University


A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Self-tuning Histograms: Building Histograms Without Looking at Data

Optimal Workload-based Weighted Wavelet Synopses

Lecture 4: Principal components

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Lecture 5: Multilayer Perceptrons

A New Approach For the Ranking of Fuzzy Sets With Different Heights

MULTI-VIEW ANCHOR GRAPH HASHING

GSLM Operations Research II Fall 13/14

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

Machine Learning 9. week

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Deep Classification in Large-scale Text Hierarchies

CSE 326: Data Structures Quicksort Comparison Sorting Bound

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

An Entropy-Based Approach to Integrated Information Needs Assessment

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Concurrent Apriori Data Mining Algorithms

Unsupervised Learning and Clustering

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Abstract. 1 Introduction

On the Efficiency of Swap-Based Clustering

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

User Authentication Based On Behavioral Mouse Dynamics Biometrics

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

Transcription:

Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces

Ran Gilad-Bachrach (Microsoft Research), Yoram Bachrach (Microsoft Research), Nir Nice (Microsoft R&D), Liran Katzir (Computer Science, Technion), Yehuda Finkelstein (Microsoft R&D), Ulrich Paquet (Microsoft Research), Noam Koenigstein (Microsoft R&D)

ABSTRACT
A prominent approach in collaborative filtering based recommender systems is using dimensionality reduction (matrix factorization) techniques to map users and items into low-dimensional vectors. In such systems, a higher inner product between a user vector and an item vector indicates that the item better suits the user's preference. Traditionally, retrieving the most suitable items is done by scoring and sorting all items. Real world online recommender systems must adhere to strict response-time constraints, so when the number of items is large, scoring all items is intractable. We propose a novel order preserving transformation, mapping the maximum inner product search problem to a Euclidean space nearest neighbor search problem. Utilizing this transformation, we study the efficiency of several (approximate) nearest neighbor data structures. Our final solution is based on a novel use of the PCA-Tree data structure in which results are augmented using paths one Hamming distance away from the query (neighborhood boosting). The end result is a system which allows approximate matches (items with relatively high inner product, but not necessarily the highest one). We evaluate our techniques on two large-scale recommendation datasets, Xbox Movies and Yahoo Music, and show that this technique allows trading off a slight degradation in the recommendation quality for a significant improvement in the retrieval time.
Categories and Subject Descriptors: H.5 [Information systems]: Information retrieval — retrieval models and ranking, retrieval tasks and goals

Keywords: Recommender systems, matrix factorization, inner product search, fast retrieval

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. RecSys'14, October 6-10, 2014, Foster City, Silicon Valley, CA, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2668-1/14/10 ...$15.00. http://dx.doi.org/10.1145/2645710.2645741

1. INTRODUCTION
The massive growth in online services data gives rise to the need for better information filtering techniques. In the context of recommender systems the data consists of (1) the item catalog; (2) the users; and (3) the user feedback (ratings). The goal of a recommender system is to find for every user a limited set of items that have the highest chance to be consumed. Modern recommender systems have two major parts. In the first part, the learning phase, a model is learned (offline) based on user feedback¹. In the second part, the retrieval phase, recommendations are issued per user (online). This paper studies the scalability of the retrieval phase (the second part) in massive recommender systems based on matrix factorization. Specifically, we introduce a new approach which offers a trade-off between running time and the quality of the results presented to a user. Matrix Factorization (MF) is one of the most popular approaches for collaborative filtering.
This method has repeatedly demonstrated better accuracy than other methods such as nearest neighbor models and restricted Boltzmann machines [2, 8]. In MF models, users and items are represented by latent feature vectors. A Bayesian MF model is also at the heart of the Xbox recommendation system [16], which serves games, movies, and music recommendations to millions of users daily. In this system, users and items are represented by (low-dimensional) vectors in R^50. The quality of the match between a user u represented by the vector x_u and the item i represented by the vector y_i is given by the inner product x_u · y_i between these two vectors. A higher inner product implies a higher chance of the user consuming the item.

The Retrieval Problem: Ideally, given a user u represented by a vector x_u, all the item vectors (y_1, ..., y_n) are examined. For each such item vector y_i, its match quality with the user (x_u · y_i) is computed, and the items are sorted according to their match quality. The items with the highest match quality in the list are then selected to form the final list of recommendations. However, the catalog of items is often too large to allow an exhaustive computation of all the inner products within a limited allowed retrieval time. The Xbox catalog consists of millions of items of various kinds. If a linear scan is used, millions of inner product computations are required for each single recommendation. The user vectors can take into account contextual information² that is only available during user engagement. Hence, the complete user vector is computed online (at runtime). As a result, the retrieval of the recommended items list can only be performed online, and cannot be pre-computed offline. This task constitutes the single most computationally intensive task imposed on the online servers. Thereby, having a fast alternative for this process is highly desirable.

Our Contribution: This paper shows how to significantly speed up the recommendation retrieval process. The optimal item-user match retrieval is relaxed to an approximate search: retrieving items that have a high inner product with the user vector, but not necessarily the highest one. The approach combines several building blocks. First, we define a novel transformation from the inner product problem to a Euclidean nearest neighbor problem (Section 3). As a pre-processing step, this transformation is applied to the item vectors. During item retrieval, another transformation is applied to the user vector. The item with the smallest Euclidean distance in the transformed space is then retrieved. To expedite the nearest neighbor search, the PCA-Tree [21] data structure is used together with a novel neighborhood boosting scheme (Section 4). To demonstrate the effectiveness of the proposed approach, it is applied to an Xbox recommendations dataset and the publicly available Yahoo! Music dataset [8]. Experiments show a trade-off curve of a slight degradation in the recommendation quality for a significant improvement in the retrieval time (Section 5). In addition, the achievable time-accuracy trade-offs are compared with two baseline approaches: an implementation based on Locality Sensitive Hashing [1] and the current state-of-the-art method for approximate recommendation in matrix-factorization based CF systems [13]. We show that for a given required recommendation quality (accuracy in picking the optimal items), our approach achieves a much higher speedup than these alternatives.

¹ This phase cannot be done entirely offline when a context is used to issue the recommended items.
Notation: We use lower-case fonts for scalars, bold lower-case fonts for vectors, and bold upper-case fonts for matrices. For example, x is a scalar, x is a vector, and X is a matrix. Given a vector x ∈ R^d, let x_i be its value in dimension i, with x = (x_1, x_2, ..., x_d)^T ∈ R^d. The norm is denoted by ‖·‖; in d-dimensional Euclidean space, ‖x‖ = sqrt(Σ_{i=1}^d x_i²). We denote by x·y the dot product (inner product) between x and y. Finally, we use (a, x^T)^T to denote the concatenation of a scalar a with a vector x.

2. BACKGROUND AND RELATED WORK
In this section we explain the problem of finding the best recommendations in MF models and review possible approaches for efficient retrieval of recommendations.

2.1 Matrix Factorization Based Recommender Systems
In MF models, each user u is associated with a user-traits vector x_u ∈ R^d, and each item i with an item-traits vector y_i ∈ R^d. The predicted rating of a user u for an item i is denoted by r̂_ui and obtained using the rule

  r̂_ui = μ + b_u + b_i + x_u · y_i,   (1)

where μ is the overall mean rating value and b_i and b_u represent the item and user biases respectively. The above model is a simple baseline model similar to [14]. It can be readily extended to form the core of a variety of more complex MF models, and adapted to different kinds of user feedback. While μ and b_u are important components of the model, they do not affect the ranking of items for any given user, and the rule r̄_ui = b_i + x_u · y_i will produce the same set of recommendations as that of Equation 1. We can also concatenate the item bias b_i to the item vector and reduce our prediction rule to a simple dot product: r̄_ui = x̄_u · ȳ_i, where x̄_u ≜ (1, x_u^T)^T and ȳ_i ≜ (b_i, y_i^T)^T. Hence, computing recommendations in MF models amounts to a simple search in an inner-product space: given a user vector x̄_u, we wish to find items with vectors ȳ_i that maximize the inner product x̄_u · ȳ_i. For the sake of readability, from this point onward we drop the bar and refer to x̄_u and ȳ_i as x_u and y_i.

² The contextual information may include the time of day, recent search queries, etc.
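The bias-folding step above can be sketched as follows (a minimal NumPy sketch; the function and array names are illustrative, not from the paper):

```python
import numpy as np

def fold_biases(item_vecs, item_biases):
    """y_bar_i = (b_i, y_i): prepend each item's bias to its traits vector."""
    return np.hstack([np.asarray(item_biases)[:, None], item_vecs])

def fold_user(user_vec):
    """x_bar_u = (1, x_u): prepend a constant 1 to the user vector."""
    return np.concatenate(([1.0], user_vec))

def naive_top_k(user_vec, item_vecs, item_biases, k):
    """Score every item with a single dot product, return the top-k indices."""
    scores = fold_biases(item_vecs, item_biases) @ fold_user(user_vec)
    return np.argsort(-scores)[:k]
```

This reproduces the ranking of Equation 1 (μ and b_u shift all scores equally, so they are omitted) while reducing each item's score to one inner product.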
We therefore focus on the problem of finding maximal inner-product matches as described above.

2.2 Retrieval of Recommendations in Inner-Product Spaces
The problem of efficient retrieval of recommendations in MF models is relatively new, but it has been discussed in the past [10, 11, 13]. In real-world large-scale systems such as the Xbox Recommender, this is a concrete problem, and we identified it as the main bottleneck that drains our online resources. Previous studies can be categorized into two basic approaches. The first approach is to propose new recommendation algorithms in which the prediction rule is not based on inner-product matches. This was the approach taken by Khoshneshin et al. [10], who were first to raise the problem of efficient retrieval of recommendations in MF models. In [10] a new model is proposed in which users and items are embedded based on their Euclidean similarity rather than their inner product. In a Euclidean space, the plethora of algorithms for nearest-neighbor search can be utilized for an efficient retrieval of recommendations. A similar approach was taken by [11], where an item-oriented model was designed to alleviate retrieval of recommendations by embedding items in a Euclidean space. While these methods show significant improvements in retrieval times, they deviate from the well-familiar MF framework. These approaches, which are based on new algorithms, do not benefit the core of existing MF based recommender systems in which the retrieval of recommendations is still based on inner products. The second approach to this problem is based on designing new algorithms to mitigate maximal inner-product search. These algorithms can be used in any existing MF based system and require only implementing a new data structure on top of the recommender to assist at the retrieval phase. For example, in [13] a new IP-Tree data structure was proposed that enables a branch-and-bound search scheme in inner-product spaces. In order to reach higher speedup values, the IP-Tree was combined with spherical user clustering that allows pre-computing and caching recommendations for similar users.
However, this approach requires prior knowledge of all the user vectors, which is not available in systems such as the Xbox recommender where ad-hoc contextual information is used to update the user vectors. This work was later continued in [18] for the general problem of maximal inner-product search, but these extensions showed effectiveness in high-dimensional sparse datasets, which is not the case for vectors generated by a MF process. This paper builds upon a novel transformation that reduces the maximal inner-product problem to simple nearest neighbor search in a Euclidean space. On one hand the proposed approach can be employed by any classical MF model, and on the other hand it enables using any of the existing algorithms for Euclidean spaces. Next, we review several alternatives for solving the problem in a Euclidean space.

2.2.1 Nearest Neighbor in Euclidean Spaces
Locality Sensitive Hashing (LSH) was recently popularized as an effective approximate retrieval algorithm. LSH was introduced by Broder et al. to find documents with high Jaccard similarity [4]. It was later extended to other metrics including the Euclidean distance [9], cosine similarity [5], and earth mover distance [5]. A different approach is based on space partitioning trees: a KD-tree [3] is a data structure that partitions R^d into hyper-rectangular (axis-parallel) cells. At construction time, nodes are split along one coordinate. At query time, one can efficiently search for all points in a rectangular box, and for nearest neighbors. Several augmented splits are used to improve the query time. For example, (1) Principal component axes trees (PCA-Trees) transform the original coordinates to the principal components [21]; (2) Principal Axis Trees (PAC-Trees) [15] use a principal component axis at every node; (3) Random Projection Trees (RPT) use a random axis at each node [6]; and (4) Maximum Margin Trees (MMT) use a maximum margin axis at every node [20]. A theoretical and empirical comparison of some variants can be found in [19]. Our approach makes use of PCA-Trees and combines them with a novel neighborhood boosting scheme. In Section 5 we compare to alternatives such as LSH, KD-Trees, and PAC-Trees. We do not compare against MMT and RPT as we do not see their advantage over the other methods for the particular problem at hand.

3. REDUCIBLE SEARCH PROBLEMS
A key contribution of this work is focused on the concept of efficient reductions between search problems. In this section we formalize the concept of a search problem and show efficient reductions between known variants. We define a search problem as:

Definition 1. A search problem S(I, Q, s) consists of an instance set of n items I = {i_1, i_2, ..., i_n}, a query q ∈ Q, and a search function s : I × Q → {1, 2, ..., n}. Function s retrieves the index of an item in I for a given query q.

The goal is to pre-process the items with g : I → I′ such that each query is answered efficiently. The preprocessing g can involve a transformation from one domain to another, so that a transformed search problem can operate on a different domain. The following definition formalizes the reduction concept between search problems:

Definition 2. A search problem S_1(I, Q, s_1) is reducible to a search problem S_2(I′, Q′, s_2), denoted by S_1 ≤ S_2, if there exist functions g : I → I′ and h : Q → Q′ such that j = s_1(I, q) if and only if j = s_2(g(I), h(q)).

This reduction does not apply any constraints on the running time of g and h. Note that g runs only once as a pre-processing step, while h is applied at query time. This yields a requirement that h has an O(1) running time. We formalize this with the following notation:

Definition 3. We say that S_1 ≤_{O(f(n))} S_2 if S_1 ≤ S_2 and the running times of g and h are O(f(n)) and O(1) respectively.

For a query vector in R^d, we consider three search problems in this paper: MIP, the maximum inner product from n vectors in R^d (MIP_{n,d}); NN, the nearest neighbor from n vectors in R^d (NN_{n,d}); and MCS, the maximum cosine similarity from n vectors in R^d (MCS_{n,d}). They are formally defined as follows:

Instance: A matrix of n vectors Y = [y_1, y_2, ..., y_n] such that y_i ∈ R^d; therefore I = R^{d×n}.
Query: A vector x ∈ R^d; hence Q = R^d.
Objective: Retrieve an index according to
  s(Y, x) = argmax_i x·y_i                    (MIP_{n,d})
  s(Y, x) = argmin_i ‖x − y_i‖                (NN_{n,d})
  s(Y, x) = argmax_i x·y_i / (‖x‖ ‖y_i‖)      (MCS_{n,d})
where i indicates column i of Y.

The following section shows how transformations between these three problems can be achieved, with MCS_{n,d} ≤_{O(n)} MIP_{n,d} ≤_{O(n)} NN_{n,d+1} and NN_{n,d} ≤_{O(n)} MCS_{n,d+1} ≤_{O(n)} MIP_{n,d+1}.

3.1 Order Preserving Transformations
The triangle inequality does not hold between vectors x, y_i, and y_j when an inner product compares them, as is the case in MIP. Many efficient search data structures rely on the triangle inequality, and if MIP can be transformed to NN with its Euclidean distance, these data structures would immediately become applicable. Our first theorem states that MIP can be reduced to NN by having a Euclidean metric in one more dimension than the original problem.

Theorem 1. MIP_{n,d} ≤_{O(n)} NN_{n,d+1}
Proof: Let φ ≜ max_i ‖y_i‖ and preprocess the input with ỹ_i = g(y_i) = (sqrt(φ² − ‖y_i‖²), y_i^T)^T. During query time: x̃ = h(x) = (0, x^T)^T. As we have
  ‖x̃‖² = ‖x‖²,
  ‖ỹ_i‖² = φ² − ‖y_i‖² + ‖y_i‖² = φ²,
  x̃·ỹ_i = sqrt(φ² − ‖y_i‖²)·0 + x·y_i = x·y_i,
it follows that
  ‖x̃ − ỹ_i‖² = ‖x̃‖² + ‖ỹ_i‖² − 2 x̃·ỹ_i = ‖x‖² + φ² − 2 x·y_i.
Finally, as φ and x are independent of the index i,
  j = argmin_i ‖x̃ − ỹ_i‖² = argmax_i x·y_i.
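Theorem 1's reduction can be checked numerically. A sketch (function names are mine; items are stored in rows rather than columns for convenience):

```python
import numpy as np

def g(Y):
    """Theorem 1 preprocessing: y~_i = (sqrt(phi^2 - ||y_i||^2), y_i),
    with phi = max_i ||y_i||.  Items are the rows of Y here."""
    norms = np.linalg.norm(Y, axis=1)
    phi = norms.max()
    return np.hstack([np.sqrt(phi**2 - norms**2)[:, None], Y])

def h(x):
    """Theorem 1 query transform: x~ = (0, x)."""
    return np.concatenate(([0.0], x))
```

For any query x, the maximum-inner-product item under (Y, x) is exactly the nearest-neighbor item under (g(Y), h(x)).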

Theorem 1 provides the main workhorse for our proposed approach (Section 4). In the remainder of this section, we present its properties as well as the related transformations. If it is known that the transformed Ỹ = [ỹ_1, ỹ_2, ..., ỹ_n] is in a manifold, as given above, we might expect to recover Y by reducing back with NN_{n,d} ≤_{O(n)} MIP_{n,d−1}. However, in the general case the transformation is only possible by increasing the dimensionality by one again:

Theorem 2. NN_{n,d} ≤_{O(n)} MIP_{n,d+1}
Proof: The preprocessing of the input: ỹ_i = g(y_i) = (‖y_i‖², y_i^T)^T. During query time: x̃ = h(x) = (−1, 2x^T)^T. We have x̃·ỹ_i = 2 x·y_i − ‖y_i‖². Finally, j = argmax_i x̃·ỹ_i = argmin_i (‖x‖² + ‖y_i‖² − 2 x·y_i) = argmin_i ‖x − y_i‖².

MIP search can also be embedded in a MCS search by increasing the dimensionality by one:

Theorem 3. MIP_{n,d} ≤_{O(n)} MCS_{n,d+1}
Proof: Preprocessing and query transformation are identical to Theorem 1. The preprocessing of the input: φ ≜ max_i ‖y_i‖ and ỹ_i = g(y_i) = (sqrt(φ² − ‖y_i‖²), y_i^T)^T. During query time: x̃ = h(x) = (0, x^T)^T. Finally, j = argmax_i x̃·ỹ_i / (‖x̃‖ ‖ỹ_i‖) = argmax_i x·y_i / (‖x̃‖ φ) = argmax_i x·y_i.

However, MCS is simply MIP searching over normalized vectors:

Theorem 4. MCS_{n,d} ≤_{O(n)} MIP_{n,d}
Proof: The preprocessing of the input: ỹ_i = g(y_i) = y_i / ‖y_i‖. During query time: x̃ = h(x) = x. Finally, j = argmax_i x̃·ỹ_i = argmax_i x·y_i / (‖x‖ ‖y_i‖).

Our final result states that a NN search can be transformed to a MCS search by increasing the dimensionality by one:

Theorem 5. NN_{n,d} ≤_{O(n)} MCS_{n,d+1}
Proof: Same reduction as in Theorem 1. The preprocessing of the input: φ ≜ max_i ‖y_i‖ and ỹ_i = g(y_i) = (sqrt(φ² − ‖y_i‖²), y_i^T)^T. During query time: x̃ = h(x) = (0, x^T)^T. Thus by Theorem 1, j = argmax_i x̃·ỹ_i / (‖x̃‖ ‖ỹ_i‖) = argmax_i x·y_i / (‖x̃‖ φ) = argmax_i x·y_i = argmin_i ‖x̃ − ỹ_i‖².

Next, we utilize Theorem 1 for speeding up retrieval of recommendations in Xbox and other MF based recommender systems.

Algorithm 1 TransformAndIndex(Y, d′)
input: item vectors Y, depth d′ ≤ d+1
output: tree t
  compute φ, μ, W
  S = ∅
  for i = 1 : n do
    ỹ_i = g(y_i); S = S ∪ {ỹ_i}
  end for
  return t ← PCA-Tree(S, d′)

4. AN OVERVIEW OF OUR APPROACH
Our solution is based on two components: a reduction to a Euclidean search problem, and a PCA-Tree to address it. The reduction is very similar to that defined in Theorem 1, but composed with an additional shift and rotation, so that the MIP search problem is reduced to NN search, with all vectors aligned to their principal components.

4.1 Reduction
We begin with defining the first reduction function following Theorem 1. Let φ ≜ max_i ‖y_i‖, and

  y′_i = g_1(y_i) = (sqrt(φ² − ‖y_i‖²), y_i^T)^T,
  x′ = h_1(x) = (0, x^T)^T,   (2)

which, when applied to Y, gives elements y′_i ∈ R^{d+1}. This reduces MIP to NN. As NN is invariant to shifts and rotations in the input space, we can compose the transformations with a PCA rotation and still keep an equivalent search problem. We mean-center and rotate the data: Let μ = (1/n) Σ_i y′_i be the mean after the first reduction, and M ∈ R^{(d+1)×n} a matrix with μ replicated along its columns. The SVD of the centered data matrix is (Y′ − M) = WΣU^T, where data items appear in the columns of Y′. Matrix W is a (d+1) by (d+1) matrix. Each of the columns of W = [w_1, ..., w_{d+1}] defines an orthogonal unit-length eigenvector, so that each w_j defines a hyperplane onto which each y′_i − μ is projected. Matrix W is a rotation matrix that aligns the vectors to their principal components.³ We define the centered rotation as our second transformation,

  ỹ_i = g_2(y′_i) = W^T(y′_i − μ),
  x̃ = h_2(x′) = W^T(x′ − μ).   (3)

The composition

  g(y_i) = g_2(g_1(y_i)), h(x) = h_2(h_1(x))   (4)

still defines a reduction from MIP to NN. Using ỹ_i = g(y_i) gives us a transformed set of input vectors Ỹ, over which a Euclidean search can be performed. Moreover, after this transformation, the points are rotated so that their components are in decreasing order of variance. Next, we index the transformed item vectors in Ỹ using a PCA-Tree data structure. We summarize the above logic in Algorithm 1.

³ Notice that Σ is not included, as the Euclidean metric is invariant under rotations of the space, but not shears.
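The composed transformation of Equations 2-4 might look like this in NumPy (a sketch under the paper's definitions; items are rows here, so the rotation W is taken from the right singular vectors instead of the left ones):

```python
import numpy as np

def transform_items(Y):
    """g = g2 o g1: Euclidean embedding (Eq. 2), then mean-centering and
    rotation onto the principal components (Eq. 3)."""
    norms = np.linalg.norm(Y, axis=1)
    phi = norms.max()
    Y1 = np.hstack([np.sqrt(phi**2 - norms**2)[:, None], Y])   # g1
    mu = Y1.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y1 - mu, full_matrices=False)
    W = Vt.T                           # columns = principal axes
    return (Y1 - mu) @ W, mu, W        # g2 applied row-wise

def transform_query(x, mu, W):
    """h = h2 o h1: pad the query with 0, then center and rotate it."""
    return (np.concatenate(([0.0], x)) - mu) @ W
```

Because a shift plus rotation preserves Euclidean distances, the nearest transformed item to the transformed query is still the maximum-inner-product item, and the transformed coordinates are ordered by decreasing variance.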

Algorithm 2 PCA-Tree(S, δ)
input: item vectors set S, depth δ
output: tree t
  if δ = 0 then
    return new leaf with S
  end if
  j = d′ + 1 − δ  // principal component at depth δ
  m = median({ỹ_j for all ỹ ∈ S})
  S_≤ = {ỹ ∈ S where ỹ_j ≤ m}
  S_> = {ỹ ∈ S where ỹ_j > m}
  t.leftChild = PCA-Tree(S_≤, δ − 1)
  t.rightChild = PCA-Tree(S_>, δ − 1)
  return t

4.2 Fast Retrieval with PCA-Trees
Building the PCA-Tree follows from the KD-Tree construction algorithm on Ỹ. Since the axes are aligned with the d+1 principal components of Y′, we can make use of a KD-Tree construction process to get a PCA-Tree data structure. The top d′ ≤ d+1 principal components are used, and each item vector is assigned to its representative leaf. Algorithm 2 defines this tree construction procedure. At retrieval time, the transformed user vector x̃ = h(x) is used to traverse the tree to the appropriate leaf. The leaf contains the item vectors in the neighborhood of x̃, hence vectors that are on the same side of all the splitting hyperplanes (the top principal components). The items in this leaf form an initial candidates set from which the top items or nearest neighbors are selected using a direct ranking by distance. The number of items in each leaf decays exponentially in the depth d′ of the tree. By increasing the depth we are left with fewer candidates, hence trading better speedup values for lower accuracy. The process allows achieving different trade-offs between the quality of the recommendations and an allotted running time: with a larger d′, a smaller proportion of candidates is examined, resulting in a larger speedup, but also a reduced accuracy. Our empirical analysis (Section 5) examines the trade-offs we can achieve using our PCA-Trees, and contrasts this with trade-offs achievable using other methods.

4.2.1 Boosting Candidates with Hamming Distance Neighborhoods
While the initial candidates set includes many nearby items, it is possible that some of the optimal top K vectors are indexed in other leafs, most likely the adjacent leafs.
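A compact sketch of Algorithm 2 plus the adjacent-leaf idea (all names are mine; leaves store row indices into the transformed item matrix, and `candidates` optionally flips one routing decision at a time to gather the Hamming-distance-1 leaves):

```python
import numpy as np

def build_tree(X, idx, depth, level=0):
    """Algorithm 2 sketch: median-split the indices `idx` on principal
    coordinate `level` of the transformed items X; leaves keep indices."""
    if depth == 0:
        return idx
    m = np.median(X[idx, level])
    left, right = idx[X[idx, level] <= m], idx[X[idx, level] > m]
    return (m, build_tree(X, left, depth - 1, level + 1),
               build_tree(X, right, depth - 1, level + 1))

def candidates(tree, xq, boost=True):
    """Collect the query's leaf; with `boost`, also every leaf reached by
    flipping exactly one routing decision (Hamming distance 1)."""
    def leaf(node, level, flip):
        if not isinstance(node, tuple):
            return node
        m, lo, hi = node
        go_left = xq[level] <= m
        if level == flip:
            go_left = not go_left
        return leaf(lo if go_left else hi, level + 1, flip)
    depth, node = 0, tree
    while isinstance(node, tuple):        # measure the tree height
        depth, node = depth + 1, node[1]
    leaves = [leaf(tree, 0, -1)]          # the query's own leaf
    if boost:
        leaves += [leaf(tree, 0, f) for f in range(depth)]
    return np.concatenate(leaves)
```

The final step, not shown, ranks the candidate indices by Euclidean distance to the transformed query and returns the top K.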
In our approach we propose boosting the candidates set with the item vectors in leafs that are on the wrong side of at most one of the median-shifted PCA hyperplanes compared to x̃. These vectors are likely to have a small Euclidean distance from the user vector. Our PCA-Tree is a complete binary tree of height d′, where each leaf corresponds to a binary vector of length d′. We supplement the initial candidates set from the leaf of the user vector with all the candidates of leafs at a Hamming distance of 1, and hence examine candidates from d′+1 of the 2^{d′} leafs. In Section 5.1.1 we show that this approach is instrumental in achieving the best balance between speedup and accuracy.

5. EMPIRICAL ANALYSIS OF SPEEDUP-ACCURACY TRADEOFFS
We use two large scale datasets to evaluate the speedup achieved by several methods:

1. Xbox Movies [12]: This is a Microsoft proprietary dataset consisting of 100 million binary {0, 1} ratings of more than 15K movies by 5.8 million users. We applied the method used in [12] to generate the vectors representing items and users.
2. Yahoo! Music [8]: This is a publicly available ratings dataset consisting of 252,800,275 ratings of 624,961 items by 1,000,990 users. The ratings are on a scale of 0-100. The user and item vectors were generated by the algorithm in [7].

From both datasets we created a set of item vectors and user vectors of dimensionality d = 50. The following evaluations are based on these vectors.

Measurements and Baselines: We quantify the improvement of an algorithm A over another (naive) algorithm A′ by the following term:

  Speedup_{A′}(A) = (Time taken by Algorithm A′) / (Time taken by Algorithm A).   (5)

In all of our evaluations we measure the speedup with respect to the same algorithm: a naive search algorithm that iterates over all items to find the best recommendations for every user (i.e., computes the inner product between the user vector and each of the item vectors, keeping track of the item with the highest inner product found so far). Thus, denoting by T_naive the time taken by the naive algorithm, we have: T_naive = Θ(#users · #items · d).
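The naive baseline and the speedup measure of Equation 5 can be sketched as follows (a minimal sketch; function names are illustrative):

```python
import numpy as np

def naive_argmax(Y_items, x_user):
    """The naive baseline: scan all items, keeping track of the item
    with the highest inner product found so far."""
    best, best_score = -1, -np.inf
    for i, y in enumerate(Y_items):
        s = float(np.dot(x_user, y))
        if s > best_score:
            best, best_score = i, s
    return best

def speedup(t_naive, t_alg):
    """Equation 5: ratio of the naive scan's time to the algorithm's time."""
    return t_naive / t_alg
```

One full scan costs Θ(#items · d) per user, which is what the approximate methods below aim to avoid.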
The state-of-the-art method for finding approximately optimal recommendations uses a combination of IP-Trees and user cones [13]. In the following evaluation we dubbed this method IP-Tree. The IP-Tree approach assumes all the user vectors (queries) are computed in advance and can be clustered into a structure of user cones. In many real-world systems like the Xbox recommender the user vectors are computed or updated online, so this approach cannot be used. In contrast, our method does not require having all the user vectors in advance, and is thus applicable in these settings. The IP-Tree method relies on an adaptation of the branch-and-bound search in metric trees [17] to handle nearest neighbor search in inner-product spaces. However, the construction of the underlying metric-tree data structure, which is a space partitioning tree, is not adapted to inner-product spaces (it partitions vectors according to Euclidean proximity). By using the Euclidean transformation of Theorem 1, we can utilize the data structures and algorithms designed for Euclidean spaces in their original form, without adaptations that may curb their effectiveness. Next, we show that our approach achieves a superior computation speedup, despite having no access to any prior knowledge about the user vectors or their distribution.⁴

⁴ We focus on online processing time, i.e., the time to choose an item to recommend for a target user. We ignore the computation time required by offline preprocessing steps.

Theorem 1 allows using various approximate nearest-neighbor algorithms for Euclidean spaces, whose performance depends on the specific dataset used. We propose using PCA-Trees as explained in Section 4.2, and show that they perform excellently on both the Xbox Movies and Yahoo! Music datasets, which consist of low-dimensional dense vectors obtained by matrix factorization. A different, and arguably more popular, approach for finding approximate nearest neighbors in Euclidean spaces is Locality-Sensitive Hashing (LSH) [1]. In the evaluations below we also include a comparison against LSH. We emphasize that both our PCA-Trees approach and the LSH techniques are only enabled by our Euclidean transformation (Theorem 1).

Our approximate retrieval algorithms introduce a tradeoff between accuracy and speedup. We use two measures to quantify the quality of the top K recommendations. The first measure, Precision@K, denotes how similar the approximate recommendations are to the optimal top K recommendations (as retrieved by the naive approach):

    Precision@K = |L_rec ∩ L_opt| / K,    (6)

where L_rec and L_opt are the lists of the top K approximate and the top K optimal recommendations, respectively. Our evaluation metrics only consider the items at the top of the approximate and optimal lists (footnote 5). A high value for Precision implies that the approximate recommendations are very similar to the optimal recommendations. In many practical applications (especially for large item catalogs), it is possible to have low Precision rates but still recommend very relevant items (with a high inner product between the user and item vectors). This motivates our second measure, RMSE@K, which examines the preference for the approximate items compared to the optimal items:

    RMSE@K = sqrt( (1/K) Σ_{k=1}^{K} (L_rec(k) − L_opt(k))² ),    (7)

where L_rec(k) and L_opt(k) are the scores (predicted ratings) of the k-th recommended item in the approximate list and the optimal list, respectively. Namely, L_rec(k) and L_opt(k) are the values of the inner products between the user vector and the k-th recommended item vector and the k-th optimal item vector, respectively.
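Both quality measures can be computed directly from the ranked score lists. A minimal sketch (function names are ours), with a crude "approximation" that only searches a subset of the catalog:

```python
import numpy as np

def top_k(user, item_vectors, K):
    """Item ids and scores of the K highest inner products, sorted descending."""
    scores = item_vectors @ user
    top = np.argpartition(-scores, K)[:K]          # unordered top-K
    order = top[np.argsort(-scores[top])]          # sort those K by score
    return order, scores[order]

def precision_at_k(rec_ids, opt_ids):
    """Eq. (6): fraction of the optimal top-K list recovered by the approximation."""
    return len(set(rec_ids) & set(opt_ids)) / len(opt_ids)

def rmse_at_k(rec_scores, opt_scores):
    """Eq. (7): root-mean-square gap between the two ranked score lists."""
    diff = np.asarray(rec_scores) - np.asarray(opt_scores)
    return float(np.sqrt(np.mean(diff ** 2)))

# Usage: the "approximate" retrieval only sees the first 400 of 500 items.
rng = np.random.default_rng(2)
item_vectors, user = rng.normal(size=(500, 50)), rng.normal(size=50)
opt_ids, opt_scores = top_k(user, item_vectors, 10)
rec_ids, rec_scores = top_k(user, item_vectors[:400], 10)
p, r = precision_at_k(rec_ids, opt_ids), rmse_at_k(rec_scores, opt_scores)
```

Because any approximate list draws from a subset of the catalog, opt_scores dominates rec_scores rank by rank, which is the non-negativity property noted after Eq. (7).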
Note that the quantity (L_opt(k) − L_rec(k)) is always non-negative, since both lists are ranked by score and L_opt contains the overall top-scoring items.

5.1 Results

Our initial evaluation considers three approximation algorithms: IP-Tree, LSH, and our approach (Section 4.2). Figure 1(a) depicts Precision@10 for the Xbox Movies dataset (higher values indicate better performance). The Precision values are plotted against the average speedup values they enable. At very low speedup values the LSH algorithm shows the best trade-off between precision and speedup, but when higher speedup values are considered the LSH performance drops significantly and becomes the worst. One possible reason for this is that our Euclidean transformation yields transformed vectors with one dimension that is very large compared with the other dimensions, which is a difficult input distribution for LSH approaches (footnote 6). In contrast, the tree-based approaches (IP-Tree and our approach) show similar behavior: a slow and steady decrease in Precision values as the speedup increases. Our approach offers a better precision-vs-speedup tradeoff than the IP-Tree approach, though their precision is almost the same at high speedup values. Figure 1(b) depicts RMSE@10 (lower values indicate better performance) vs. speedup for the three approaches. The trend shows significantly superior results for our PCA-Tree approach at all speedup values.

Footnote 5: For this evaluation, the recall is completely determined by the precision.

Table 1: A summary of the different tree approaches. IP-Tree is the baseline from [13], which requires prior knowledge of the user vectors. All other approaches (as well as LSH) were not feasible before Theorem 1 was introduced in this paper.

    Method    | Enabled by Theorem 1 | Prior knowledge | Neighborhood boosting
    ----------|----------------------|-----------------|----------------------
    IP-Tree   | no                   | user vectors    | not allowed
    KD-Tree   | yes                  | none            | allowed
    PCA-Tree  | yes                  | none            | allowed
    PAC-Tree  | yes                  | none            | not allowed
Similarly to Figure 1(a), we see a sharp degradation of the LSH approach as the speedup increases, while the tree-based approaches show a trend of a slow increase in RMSE values as the speedup increases. We note that even for high speedup values, which yield low precision rates in Figure 1(a), the RMSE values remain very low, indicating that very high-quality recommendations can be achieved at a fraction of the computational cost of the naive algorithm. In other words, the recommended items are still very relevant to the user, although the list of recommended items is quite different from the optimal list.

Figure 2 depicts Precision@10 and RMSE@10 for the Yahoo! Music dataset. The general trends of all three algorithms agree with those of Figure 1: LSH starts better but deteriorates quickly, and the tree-based approaches show similar trends. The scale of the RMSE errors in Figure 2(b) is different (larger) because the predicted scores are in the range 0-100, whereas in the Xbox Movies dataset the predictions are binary.

The empirical analysis on both the Xbox and Yahoo! datasets shows that it is possible to achieve excellent recommendations at very low computational cost by employing our Euclidean transformation and using an approximate Euclidean nearest-neighbor method. The results indicate that tree-based approaches are superior to an LSH-based approach (except when the required speedup is very small). Further, the results indicate that our method yields higher-quality recommendations than the IP-Tree approach [13]. Note that we also compared Precision@K and RMSE@K for other values of K. While the figures are not included in this paper, the trends are all similar to those presented above.

Footnote 6: The larger dimension is the auxiliary dimension (sqrt(φ² − ||y||²)) in Equation 2.

5.1.1 Comparing Different Tree Approaches

A key building block in our approach is aligning the item vectors with their principal components (Equation 3) and using PCA-Trees rather than KD-Trees. Another essential

[Figure 1: Performance against speedup values for the Xbox Movies dataset, top 10 recommendations. Panels: (a) Precision@10 vs. speedup; (b) RMSE@10 vs. speedup, for IP-Tree, LSH, and this paper's method.]

[Figure 2: Performance against speedup values for the Yahoo! Music dataset, top 10 recommendations. Panels: (a) Precision@10 vs. speedup; (b) RMSE@10 vs. speedup, for IP-Tree, LSH, and this paper's method.]

ingredient in our approach is the neighborhood boosting of Section 4.2.1. One may question how vital PCA-Trees or the neighborhood boosting are to our overall solution. We therefore present a detailed comparison of the different tree-based approaches. For the sake of completeness, we also include a comparison to PAC-Trees [15]. Table 1 summarizes the different data structures. Except for the IP-Tree approach, none of these approaches was feasible before Theorem 1 was introduced in this paper. Note that neighborhood boosting is possible only when the tree splits are all based on a single consistent axis system. It is therefore prohibited in IP-Trees and PAC-Trees, where the splitting hyperplanes are chosen ad hoc at every node.

We compare the approach proposed in this paper with simple KD-Trees, with PAC-Trees, and with PCA-Trees without neighborhood boosting (our approach minus the neighborhood boosting). Figure 3 depicts Precision@10 and RMSE@10 on the Yahoo! Music dataset. As the speedup levels increase, we notice an evident advantage in favor of PCA-aligned trees over KD-Trees. When comparing PCA-Trees without neighborhood boosting to PAC-Trees we see a mixed picture: for low speedup values PCA-Trees perform better, but for higher speedup values we notice an eminent advantage in favor of PAC-Trees. To conclude, we note the overall advantage of the method proposed in this paper over any of the other tree-based alternatives, both in terms of Precision and RMSE.

6.
CONCLUSIONS

We presented a novel transformation mapping a maximal inner-product search to a Euclidean nearest-neighbor search, and showed how it can be used to speed up the recommendation process in matrix-factorization-based recommenders such as the Xbox recommender system. We proposed a method for approximately solving the Euclidean nearest-neighbor problem using PCA-Trees, and empirically evaluated it on the Xbox Movies and Yahoo! Music datasets. Our analysis shows that our approach achieves excellent-quality recommendations at a fraction of the computational cost of a naive approach, and that it achieves superior quality-speedup tradeoffs compared with state-of-the-art methods.

[Figure 3: Comparing tree-based methods for the Yahoo! Music dataset, top 10 recommendations. Panels: (a) Precision@10 vs. speedup; (b) RMSE@10 vs. speedup, for KD-Tree, PAC-Tree, PCA-Tree without neighborhood boosting, and this paper's method.]

7. REFERENCES

[1] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459-468, 2006.
[2] Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explor. Newsl., 2007.
[3] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509-517, September 1975.
[4] Andrei Broder. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences 1997, pages 21-29, 1997.
[5] Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, pages 380-388, 2002.
[6] Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pages 537-546, 2008.
[7] Gideon Dror, Noam Koenigstein, and Yehuda Koren. Yahoo! music recommendations: Modeling music ratings with temporal dynamics and item taxonomy. In Proc. 5th ACM Conference on Recommender Systems, 2011.
[8] Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. The Yahoo! music dataset and KDD-Cup'11. Journal of Machine Learning Research, 17:1-12, 2011.
[9] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604-613, 1998.
[10] Mohammad Khoshneshin and W. Nick Street. Collaborative filtering via Euclidean embedding. In Proceedings of the Fourth ACM Conference on Recommender Systems, 2010.
[11] Noam Koenigstein and Yehuda Koren. Towards scalable and accurate item-oriented recommendations. In Proc.
7th ACM Conference on Recommender Systems, 2013.
[12] Noam Koenigstein and Ulrich Paquet. Xbox movies recommendations: Variational Bayes matrix factorization with embedded feature selection. In Proc. 7th ACM Conference on Recommender Systems, 2013.
[13] Noam Koenigstein, Parikshit Ram, and Yuval Shavitt. Efficient retrieval of recommendations in a matrix factorization framework. In CIKM, 2012.
[14] Yehuda Koren, Robert M. Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 2009.
[15] James McNames. A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE Trans. Pattern Anal. Mach. Intell., 23(9):964-976, September 2001.
[16] Ulrich Paquet and Noam Koenigstein. One-class collaborative filtering with random graphs. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13, pages 999-1008, 2013.
[17] Franco P. Preparata and Michael I. Shamos. Computational Geometry: An Introduction. Springer, 1985.
[18] Parikshit Ram and Alexander Gray. Maximum inner-product search using cone trees. In SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.
[19] Parikshit Ram and Alexander G. Gray. Which space partitioning tree to use for search? In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors, NIPS, pages 656-664, 2013.
[20] Parikshit Ram, Dongryeol Lee, and Alexander G. Gray. Nearest-neighbor search on a time budget via max-margin trees. In SDM, pages 1011-1022. SIAM / Omnipress, 2012.
[21] Robert F. Sproull. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6(4):579-589, 1991.