Reviewer Profiling Using Sparse Matrix Regression
Evangelos E. Papalexakis, Nicholas D. Sidiropoulos, Minos N. Garofalakis
Technical University of Crete, ECE Department
14 December 2010, OEDM 2010, Sydney, Australia
Motivation

Consider a typical conference or workshop with a pool of P papers and R reviewers. The TPC chair needs to assign the papers to reviewers: each reviewer should review a pre-determined number of papers, and those papers should fall within the reviewer's field of expertise.

Key idea: represent each paper and reviewer as a vector in a low-dimensional space. Choose as a basis a small number (around 40-50) of terms that concisely characterize the broad area of the conference, then match papers to reviewers using those profile vectors (C.J. Taylor et al., "On the Optimal Assignment of Conference Papers to Reviewers").

This talk is about deriving keyword profiles for reviewers and papers, using a common keyword list as a basis.
Main Idea

Express both entities (reviewers and papers) in a (usually) high-dimensional term space: use elementary text-mining tools to retrieve bulk terms that describe each reviewer and each paper. The union of those bulk terms is our starting point; we express reviewers and papers with respect to that basis.

Then use dimensionality reduction to keep only the essential terms. LSI-SVD comes to mind: factorize the data matrix M as M ≈ AB^T, where A = UΣ, B = V, and [U, Σ, V] is the Singular Value Decomposition of M, usually truncated to low rank. This factorization is optimal in the least-squares sense.

However, this approach has a significant drawback!
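As an illustrative sketch (the slides show no code), the LSI-SVD factorization M ≈ AB^T with A = UΣ, B = V can be written in a few lines of NumPy; the matrix M here is random placeholder data and the function name is ours:

```python
import numpy as np

def lsi_svd(M, k):
    """Rank-k LSI factorization M ~ A @ B.T via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :k] * s[:k]   # A = U_k * Sigma_k
    B = Vt[:k, :].T        # B = V_k
    return A, B

# Example: note that A and B may contain negative entries,
# which is exactly the drawback discussed on the next slide.
M = np.abs(np.random.default_rng(0).normal(size=(6, 8)))
A, B = lsi_svd(M, 2)
residual = np.linalg.norm(M - A @ B.T)  # best possible rank-2 residual
```

By the Eckart-Young theorem no other rank-2 factorization can achieve a smaller Frobenius residual, which is the "optimal in the least-squares sense" claim above.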
LSI-SVD drawbacks for our application

Beyond optimality in the least-squares sense, we desire model interpretability:
The optimal factorization (SVD) produces both negative and positive coefficients. Negative coefficients imply possible cancellations in a linear combination; we want strictly additive combinations, and a negative coefficient has little interpretive value.
When a reviewer/paper is not matched by a certain term, the corresponding coefficient should be exactly zero, or at least very small but positive.
Non-Negative Matrix Factorization

Non-negative Matrix Factorization offers model interpretability by imposing non-negativity on all coefficients:

    min_{A,B} ||M - AB^T||_F^2   subject to a_ij >= 0 and b_ij >= 0

The bilinear model makes the problem non-convex, so the algorithm may converge to a local minimum. The most popular algorithm for computing the NMF is the multiplicative-update method, with cost per iteration O(I J k̂).
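A minimal sketch of the multiplicative-update method mentioned above, assuming the standard Lee-Seung updates (the slides show no code; function names and parameters are ours):

```python
import numpy as np

def nmf_multiplicative(M, k, iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for
    min ||M - A B^T||_F^2  s.t.  A >= 0, B >= 0.
    Each iteration costs O(I*J*k)."""
    rng = np.random.default_rng(0)
    I, J = M.shape
    A = rng.random((I, k)) + eps
    B = rng.random((J, k)) + eps
    for _ in range(iters):
        # The updates multiply by nonnegative ratios, so A and B
        # stay nonnegative; the objective is non-increasing.
        A *= (M @ B) / (A @ (B.T @ B) + eps)
        B *= (M.T @ A) / (B @ (A.T @ A) + eps)
    return A, B
```

Because the objective is non-convex, different random initializations may reach different local minima, as noted above.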
Initial Formulation

As input, we get the list of prospective reviewers and the list of submitted papers. From all of these entities, we extract a set of raw terms of size T; this set of terms defines the basis for each reviewer/paper vector. The dimension T can be very large (e.g., 2000).

We then create the following matrices:
P: the P x T paper-by-term matrix, where P denotes the number of submitted papers and T the number of initially extracted terms.
R: the R x T reviewer-by-term matrix, where R denotes the number of reviewers.
Generic Algorithm

Our algorithm, regardless of the type of factorization, is:
Form the matrix M = [R; P] by stacking the reviewer and paper matrices.
Factor M ≈ AB^T at a lower rank k̂. Each column of B contains a topic/group of terms in the k̂-dimensional space; each row of A contains the corresponding weights of a reviewer or paper over those groups of terms.
Reconstruct each profile vector as a linear combination of the columns of B: M̂ = AB^T. The reconstructed profiles are R̂ and P̂.
The peaks of each row of M̂ indicate the highest-scoring terms for each reviewer and paper.
Assemble the highest-scoring reviewer terms and keep only those that also appear in paper titles. The final set of terms is called T_final.
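The steps above can be sketched as follows; `factorize` stands in for whichever factorization is plugged in (SVD, NMF, SMR), and all names are illustrative, not the authors' code:

```python
import numpy as np

def profile_terms(R, P, factorize, k, t=10):
    """Generic pipeline: stack R and P, factor at rank k,
    reconstruct, and read off the top-t term indices per row."""
    M = np.vstack([R, P])                    # M = [R; P]
    A, B = factorize(M, k)                   # M ~ A @ B.T
    M_hat = A @ B.T                          # reconstructed profiles
    R_hat = M_hat[:R.shape[0]]               # reviewer profiles
    P_hat = M_hat[R.shape[0]:]               # paper profiles
    top = np.argsort(-M_hat, axis=1)[:, :t]  # peak terms per row
    return R_hat, P_hat, top
```

The final restriction to terms that also appear in paper titles would be one more set-intersection step on the returned indices.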
NMF Issues

The factors A, B are dense. Each reconstructed profile vector exhibits peaks at the highest-scoring terms for each entity, but apart from the peaks there is a lot of noise, in the form of coefficients with small values. We cannot derive the highest-scoring terms directly from the profile vector!

Solution: we resort to post-processing of each profile vector: we sort each profile and keep the t̂ largest coefficients. But can we do something better?
Sparse Matrix Regression

Factorize M ≈ AB^T with additional l1 regularization on A and B (plus non-negativity constraints):

    min_{A,B} ||M - AB^T||_F^2 + λ||A||_1 + λ||B||_1   subject to a_ij >= 0 and b_ij >= 0

where ||A||_1 denotes the sum of the absolute values of the entries of A (sum(sum(abs(A))) in MATLAB notation).

It has been shown that l1 regularization leads to sparse solutions. In our case, we impose the sparsity penalty on both matrices, since we want both the latent reviewer/paper profiles and the latent term profiles to be sparse.

The above problem is non-convex and cannot be solved directly. Instead, we solve the following Lasso regression problems in an alternating fashion, with cost per iteration O(I J k̂^2):

    min_B ||M - AB^T||_F^2 + λ||B||_1
    min_A ||M^T - BA^T||_F^2 + λ||A||_1
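A sketch of the alternating scheme, using a simple non-negative proximal-gradient routine as a stand-in for the Lasso subproblem solver (the actual solver behind the slides may differ; all names are ours):

```python
import numpy as np

def _nn_lasso(X, Y, W, lam, iters=100):
    """Proximal gradient for min_W ||Y - X W||_F^2 + lam*||W||_1, W >= 0.
    A stand-in solver for each Lasso step of the alternation."""
    L = 2 * np.linalg.norm(X, 2) ** 2 + 1e-12  # Lipschitz const. of gradient
    for _ in range(iters):
        G = 2 * X.T @ (X @ W - Y)              # gradient of the LS term
        W = np.maximum(W - (G + lam) / L, 0.0) # nonneg soft-threshold
    return W

def smr(M, k, lam=0.5, outer=50):
    """Sparse Matrix Regression: alternate l1-penalized, nonnegative
    least-squares updates of A and B in M ~ A @ B.T."""
    rng = np.random.default_rng(0)
    I, J = M.shape
    A, B = rng.random((I, k)), rng.random((J, k))
    for _ in range(outer):
        B = _nn_lasso(A, M, B.T, lam).T    # fix A, solve for B
        A = _nn_lasso(B, M.T, A.T, lam).T  # fix B, solve for A
    return A, B
```

The soft-threshold step is what produces exact zeros, so no top-t̂ post-processing of the profiles is needed, in contrast to plain NMF.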
How do we collect data for each reviewer?

For submitted papers, we already have each paper's title. For the reviewers, on the other hand, the only thing available is a list of their names. We would ideally like a list of each reviewer's publications, so a good strategy is to look for this information on-line. Google Scholar gives us this opportunity!
GoogleScholar Miner

A Java-based tool we developed. It queries Google Scholar with a reviewer's name, browses through the results, and retrieves the list of publications with associated citation counts and publication dates. We up-weight papers that are highly cited and/or recent:
A highly cited paper indicates the reviewer's field of expertise.
A recent paper indicates a current research interest of the reviewer.
GoogleScholar Miner outputs a set of terms influenced by each paper's weight.
GoogleScholar Miner Example

[Figure: weight vector over ~80 retrieved papers for N. Sidiropoulos.]

Paper titles retrieved (excerpt):
1. fast nearest neighbor search in medical image databases
2. blind parafac receivers for ds-cdma systems
3. on the uniqueness of multilinear decomposition of n-way arrays
4. parallel factor analysis in sensor array processing
5. fast and effective retrieval of medical tumor shapes
6. online data mining for co-evolving time sequences
7. on downlink beamforming with greedy user selection: performance analysis and a simple new algorithm
8. transmit beamforming for physical-layer multicasting
9. medium access control-physical cross-layer design
10. almost-sure identifiability of multidimensional harmonic retrieval
11. collision resolution in packet radio networks using rotational invariance techniques
12. cramer-rao lower bounds for low-rank decomposition of multidimensional arrays

Terms retrieved (w.r.t. the weights): multilinear, robust iterative, iterative fitting, multilinear models, beamforming, multidimensional, harmonic retrieval, access control-physical, control-physical cross-layer, collision resolution, fitting, user selection, physical-layer multicasting, ...
Why focus on paper titles?

Even though our methods can be extended to full-text indexing, we focus exclusively on paper titles:
A paper title contains the distilled essence of the full text, as the authors themselves decided best to present it.
The title hopefully summarizes the full text more succinctly than automated tools would.
For confidentiality/accessibility reasons, the full text might not be available.
Profile Precision vs Rank

For quantitative evaluation, we used data from a real conference. We asked the TPC chair, a domain expert, to mark the extracted terms as relevant or not. Precision is the fraction of retrieved terms that are also relevant:

    Precision = |Relevant ∩ Retrieved| / |Retrieved|

[Figure: Precision vs k̂ (20-40) for NMF and SMR, for T = 1251, 1844, 2431; SMR uses λ = 0.6 for T = 1251 and λ = 1.3 for T = 1844, 2431. Precision ranges roughly from 0.4 to 0.9.]
Reviewing Assignments Evaluation

We used SMR profiles to produce reviewing assignments for a real conference. We also asked reviewers and authors to choose, from a list of terms, those that best represented their bio/paper. With the aid of the TPC chair, we measured the probability of a bad assignment for each of the two assignments. We define a bad assignment as one where more than half of the papers assigned to a reviewer are unsuitable with respect to his expertise.

Some simplifying assumptions: 1) each reviewer's expertise covers 1/7-th of the broad scientific field of the conference; 2) each assignment consists of 4 papers per reviewer.

The probability of a bad assignment under random assignment is:

    Pr{bad} = C(4,3) (6/7)^3 (1/7) + (6/7)^4 ≈ 0.9
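The arithmetic can be checked directly; this snippet just reproduces the binomial computation from the slide (at least 3 of the 4 randomly assigned papers miss the reviewer's 1/7 slice of the field):

```python
from math import comb

p_miss = 6 / 7  # probability one random paper falls outside the expertise
pr_bad = comb(4, 3) * p_miss**3 * (1 / 7) + p_miss**4
print(round(pr_bad, 3))  # 0.9
```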
Take-Home Point

Profiles created for our testbed conference:

               Manual     Custom profiles   SMR profiles   Random
TPC chair      7 days     2-4 hrs           0 hrs          10 min
Reviewer       0 hrs      2 min             0 hrs          0 hrs
Author         0 hrs      2 min             0 hrs          0 hrs
Pr{bad}        0.109      0.047             0.1875         0.9

Conclusions & Future Work
SMR eliminates noise that NMF allows, yielding clearer profiles.
Our approach yields relatively good assignments (w.r.t. Pr{bad}) while requiring zero effort from everyone!
We are currently working on a modification of the algorithm that allows for imbalanced sparsity penalties.
The End!
Thank you for your attention! Any questions?