Reviewer Profiling Using Sparse Matrix Regression


Reviewer Profiling Using Sparse Matrix Regression
Evangelos E. Papalexakis, Nicholas D. Sidiropoulos, Minos N. Garofalakis
Technical University of Crete, ECE Department
14 December 2010, OEDM 2010, Sydney, Australia

Motivation

Consider a typical conference/workshop. We have a pool of P papers and R reviewers. The TPC chair needs to assign those papers to reviewers in a meaningful manner:
- Each reviewer should review a (pre-determined) number of papers.
- Those papers should fall under the field of expertise of each reviewer.

Key idea (C.J. Taylor et al., "On the Optimal Assignment of Conference Papers to Reviewers"):
- Represent each paper and reviewer as a vector in a low-dimensional space.
- Choose as a basis a small number (roughly 40-50) of terms that concisely characterize the broad area of the conference.
- Match papers to reviewers using those profile vectors.

This talk is about deriving keyword profiles for reviewers and papers, using a common keyword list as a basis.

Main Idea

Express both entities (reviewers and papers) in a (usually) high-dimensional space:
- Use elementary text-mining tools to retrieve bulk terms that describe each reviewer and each paper.
- The union of those bulk terms is our starting point: we express reviewers and papers according to that basis.

Use dimensionality reduction to keep only the essential terms. LSI-SVD comes to mind!
- Factorize a data matrix M as $M = AB^T$, where $A = U\Sigma$, $B = V$, and $[U, \Sigma, V]$ is the Singular Value Decomposition of M, usually truncated to a low rank.
- This factorization is optimal in the least-squares sense.

However, this approach has a significant drawback!
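
To make the LSI-SVD step concrete, here is a minimal sketch assuming M is a dense entity-by-term numpy array (rows = reviewers/papers, columns = terms); the function name lsi_factors and the rank parameter k_hat are illustrative, not from the talk:

```python
import numpy as np

def lsi_factors(M, k_hat):
    """Truncated SVD factorization M ~= A @ B.T with A = U*Sigma, B = V."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :k_hat] * s[:k_hat]   # entity loadings, scaled by the singular values
    B = Vt[:k_hat, :].T            # term loadings; generally dense and signed
    return A, B
```

Here $\hat{M} = AB^T$ is the best rank-$\hat{k}$ approximation in the Frobenius norm, but A and B contain signed entries, which leads to the drawback discussed next.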

LSI-SVD drawbacks regarding our application

Apart from optimality in the least-squares sense, we desire model interpretability:
- The optimal factorization (SVD) produces both negative and positive coefficients.
- Negative coefficients imply possible cancellations in a linear combination. We want strictly additive combinations.
- A negative coefficient has little interpretation value.
- When a reviewer/paper is not matched by a certain term, the corresponding coefficient should be exactly zero, or at least very small but positive.

Non-Negative Matrix Factorization

Non-negative Matrix Factorization offers model interpretability by imposing non-negativity on all coefficients:

$\min_{A,B} \|M - AB^T\|_F^2$ subject to $a_{i,j} \ge 0$ and $b_{i,j} \ge 0$

- The bilinearity of the model makes the problem non-convex, so the algorithm may converge to a local minimum.
- The most popular algorithm for computing the NMF is the multiplicative-update method.
- Cost per iteration: $O(IJ\hat{k})$.
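
A hedged sketch of the multiplicative-update NMF referenced above (Lee-Seung style updates; the random initialization, iteration count, and small constant eps are assumptions for illustration):

```python
import numpy as np

def nmf_multiplicative(M, k_hat, n_iter=200, eps=1e-9):
    """Approximate a non-negative I x J matrix M as A @ B.T with A, B >= 0."""
    I, J = M.shape
    rng = np.random.default_rng(0)
    A = rng.random((I, k_hat))
    B = rng.random((J, k_hat))
    for _ in range(n_iter):
        A *= (M @ B) / (A @ (B.T @ B) + eps)    # update entity factors
        B *= (M.T @ A) / (B @ (A.T @ A) + eps)  # update term factors
    return A, B
```

Each iteration costs roughly $O(IJ\hat{k})$, matching the figure quoted on the slide, and the multiplicative form keeps A and B non-negative as long as they start non-negative.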

Initial Formulation

As input, we get the list of prospective reviewers and the list of submitted papers. From all of the above entities, we extract a set of raw terms of size T. This set of terms defines the basis in which each reviewer/paper vector is expressed. The dimension T can be very large (e.g., 2000).

Then, we create the following matrices:
- P: the P x T paper-by-term matrix, where P denotes the number of submitted papers and T the number of all initially extracted terms.
- R: the R x T reviewer-by-term matrix, where R denotes the number of reviewers.
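
One way the P and R matrices could be assembled from raw title strings is sketched below; the use of scikit-learn's CountVectorizer and the stop-word filtering are assumptions for illustration, not the original extraction pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer

# One string per submitted paper and one per reviewer (e.g., concatenated
# publication titles mined for that reviewer); contents here are placeholders.
paper_titles = ["sparse matrix regression for reviewer profiling", "..."]
reviewer_docs = ["concatenated publication titles of reviewer 1", "..."]

vectorizer = CountVectorizer(stop_words="english")
vectorizer.fit(paper_titles + reviewer_docs)        # shared term basis of size T
P = vectorizer.transform(paper_titles).toarray()    # P x T paper-by-term matrix
R = vectorizer.transform(reviewer_docs).toarray()   # R x T reviewer-by-term matrix
```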

Generic Algorithm

Our algorithm, regardless of the type of factorization, is:
- Form the matrix $M = \begin{bmatrix} R \\ P \end{bmatrix}$ (reviewers stacked on top of papers).
- Factor $M \approx AB^T$ at a lower rank $\hat{k}$. Each column of B contains a topic/group of terms in the $\hat{k}$-dimensional space; each row of A contains the corresponding weights of a reviewer or paper with respect to those groups of terms.
- Reconstruct each profile vector as a linear combination of the columns of B: $\hat{M} = AB^T$. The reconstructed profiles are $\hat{R}$ and $\hat{P}$.
- The peaks of each row of $\hat{M}$ indicate the highest-scoring terms for each reviewer and paper.
- Assemble the highest-scoring reviewer terms and limit them to terms that also appear in paper titles. The final set of terms is called $T_{final}$.
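
The whole pipeline can be sketched as below, independent of the particular factorization used (pass in, e.g., the SVD or NMF routines sketched earlier); the helper names and the top-$\hat{t}$ selection are illustrative assumptions:

```python
import numpy as np

def build_profiles(R, P, factorize, k_hat, t_hat, terms):
    """Stack reviewers and papers, factorize at rank k_hat, return top terms per row."""
    M = np.vstack([R, P])                # M = [R; P]
    A, B = factorize(M, k_hat)           # M ~= A @ B.T; columns of B are term groups
    M_hat = A @ B.T                      # reconstructed profiles
    R_hat, P_hat = M_hat[:R.shape[0]], M_hat[R.shape[0]:]

    def top_terms(row):
        return [terms[i] for i in np.argsort(row)[::-1][:t_hat]]

    return [top_terms(r) for r in R_hat], [top_terms(p) for p in P_hat]
```

Restricting the reviewer terms to those that also occur in paper titles (to obtain $T_{final}$) is then a simple set intersection over the returned lists.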

NMF Issues

The factors A, B are dense. Each reconstructed profile vector exhibits peaks at the highest-scoring terms for each entity, but apart from the peaks there is a lot of noise, in the form of coefficients with small values. We cannot derive the highest-scoring terms directly from the profile vector!

Solution: we resort to post-processing of each profile vector: we sort each profile and keep the $\hat{t}$ largest coefficients.

But can we do something better?

Sparse Matrix Regression

Factorize $M \approx AB^T$ with additional $\ell_1$ regularization on A and B (plus non-negativity constraints):

$\min_{A,B} \left\{ \|M - AB^T\|_F^2 + \lambda \|A\|_1 + \lambda \|B\|_1 \right\}$ subject to $a_{i,j} \ge 0$ and $b_{i,j} \ge 0$,

where $\|A\|_1 = \sum_{i,j} |a_{i,j}|$ (i.e., sum(sum(abs(A))) in MATLAB notation).

- It has been shown that $\ell_1$ regularization leads to sparse solutions.
- In our case, we impose the sparsity penalty on both matrices, since we want both the latent reviewer/paper profiles and the latent term profiles to be sparse.
- The above problem is non-linear and cannot be solved directly. Instead, we solve the following Lasso regression problems in an alternating fashion, with cost per iteration $O(IJ\hat{k}^2)$:

$\min_{B} \left\{ \|M - AB^T\|_F^2 + \lambda \|B\|_1 \right\}$
$\min_{A} \left\{ \|M^T - BA^T\|_F^2 + \lambda \|A\|_1 \right\}$
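
A hedged sketch of this alternating step, reusing scikit-learn's coordinate-descent Lasso with a positivity constraint; the mapping between $\lambda$ and scikit-learn's alpha, the initialization, and the outer iteration count are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def _nn_lasso(X, Y, lam):
    """Solve min_W ||Y - X @ W.T||_F^2 + lam*||W||_1 with W >= 0, one row of W per column of Y."""
    # scikit-learn's Lasso minimizes (1/(2n))*||y - Xw||^2 + alpha*||w||_1,
    # so alpha = lam / (2 * n_samples) reproduces the objective above.
    reg = Lasso(alpha=lam / (2 * X.shape[0]), positive=True,
                fit_intercept=False, max_iter=5000)
    return np.vstack([reg.fit(X, y).coef_ for y in Y.T])

def smr(M, k_hat, lam, n_outer=20):
    """Alternating sparse, non-negative factorization M ~= A @ B.T."""
    I, J = M.shape
    rng = np.random.default_rng(0)
    A, B = rng.random((I, k_hat)), rng.random((J, k_hat))
    for _ in range(n_outer):
        B = _nn_lasso(A, M, lam)      # min_B ||M - A B^T||_F^2 + lam*||B||_1, B >= 0
        A = _nn_lasso(B, M.T, lam)    # min_A ||M^T - B A^T||_F^2 + lam*||A||_1, A >= 0
    return A, B
```

Each inner call solves one non-negative Lasso per column of the target matrix, so the profiles come out sparse by construction rather than through post-hoc thresholding.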

How do we collect data for each reviewer?

When it comes to submitted papers, we already have the title of each paper. On the other hand, the only thing available regarding the reviewers is a list of their names. We would ideally like a list of each reviewer's publications. A good tactic is to look on-line for this piece of information. Google Scholar gives us this opportunity!

GoogleScholar Miner

A Java-based piece of software we developed:
- Queries Google Scholar for a particular reviewer.
- Browses through the results and retrieves the list of publications, with associated citation counts and dates of publication.
- We push up papers that are highly cited and/or recent:
  - A highly cited paper indicates the field of expertise of the reviewer.
  - A recent paper indicates a current research interest of the reviewer.
- GoogleScholar Miner outputs a set of terms influenced by each paper's weight.
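
The talk does not spell out the exact weighting scheme, so the snippet below is purely an illustrative assumption of how citation count and recency could be combined into a per-paper weight:

```python
import math

def paper_weight(citations, year, current_year=2010):
    """Illustrative (assumed) weight: log-damped citations plus a recency bonus."""
    impact = math.log1p(citations)                    # highly cited -> field of expertise
    recency = math.exp(-(current_year - year) / 5.0)  # recent -> current research interest
    return impact + recency
```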

GoogleScholar Miner Example

[Figure: weight vector for N. Sidiropoulos, one bar per retrieved paper (roughly 80 papers on the x-axis), weights on a 0-10 scale.]

Paper titles retrieved:
1. fast nearest neighbor search in medical image databases
2. blind parafac receivers for ds-cdma systems
3. on the uniqueness of multilinear decomposition of n-way arrays
4. parallel factor analysis in sensor array processing
5. fast and effective retrieval of medical tumor shapes
6. online data mining for co-evolving time sequences
7. on downlink beamforming with greedy user selection: performance analysis and a simple new algorithm
8. transmit beamforming for physical-layer multicasting
9. medium access control-physical cross-layer design
10. almost-sure identifiability of multidimensional harmonic retrieval
11. collision resolution in packet radio networks using rotational invariance techniques
12. cramer-rao lower bounds for low-rank decomposition of multidimensional arrays

Terms retrieved (with respect to the weights): multilinear, robust iterative, iterative fitting, multilinear models, beamforming, multidimensional, harmonic retrieval, access control-physical, control-physical cross-layer, collision resolution, fitting, user selection, physical-layer multicasting, ...

Why focus on paper titles?

Even though the methods we developed can be extended to full-text indexing, we focus exclusively on paper titles:
- A paper title contains the distilled essence of the full text, as the author decided best to convey it.
- Hopefully, the title summarizes the full text in a more succinct manner than automated tools would.
- Due to confidentiality/accessibility reasons, the full text might not be available.

Profile Precision vs Rank

For quantitative evaluation, we used data from a real conference. We asked the TPC chair, who is a domain expert, to mark the extracted terms as relevant or not.

Precision is the fraction of the retrieved terms that are also relevant:

$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$

Also note that $\text{Relevant} \subseteq \text{Retrieved}$.

[Figure: Precision vs $\hat{k}$ for NMF and SMR, with $\hat{k}$ from 20 to 40; curves for NMF at T = 1251, 1844, 2431 and SMR at T = 1251 (λ = 0.6), T = 1844 (λ = 1.3), T = 2431 (λ = 1.3); precision values range roughly from 0.4 to 0.9.]

Reviewing Assignments Evaluation

We used SMR profiles to produce reviewing assignments for a real conference. We also asked reviewers and authors to choose, from a list of terms, those that best represented their bio/paper. With the aid of the TPC chair, we measured the probability of a bad assignment for each of the two assignments.

We define a bad assignment as one where more than half of the papers assigned to a reviewer are not suitable with respect to his expertise.

Some simplifying assumptions:
1. Each reviewer's expertise covers 1/7th of the broad scientific field of the conference.
2. Each assignment consists of 4 papers per reviewer.

The probability of a bad assignment in a set of random assignments is:

$\Pr\{\text{bad}\} = \binom{4}{3}\left(\frac{6}{7}\right)^3\left(\frac{1}{7}\right) + \left(\frac{6}{7}\right)^4 \approx 0.9$
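
As a quick sanity check of the arithmetic under the same two assumptions: a bad random assignment needs 3 or 4 unsuitable papers out of 4, each paper being unsuitable with probability 6/7:

```python
from math import comb

p_unsuit = 6 / 7
pr_bad = comb(4, 3) * p_unsuit**3 * (1 / 7) + p_unsuit**4
print(round(pr_bad, 3))   # ~0.9 (2160/2401)
```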

Take Home Point

Profiles created for our testbed conference:

            Manual    Custom profiles   SMR profiles   Random
TPC chair   7 days    2-4 hrs           0 hrs          10 min
Reviewer    0 hrs     2 min             0 hrs          0 hrs
Author      0 hrs     2 min             0 hrs          0 hrs
Pr{bad}     0.109     0.047             0.1875         0.9

Conclusions & Future Work
- SMR eliminates noise that NMF allows, yielding cleaner profiles.
- Our approach yields relatively good assignments (with respect to Pr{bad}), requiring zero effort from everyone!
- We are currently working on a modification of the algorithm that allows for imbalanced sparsity penalties.

The End! Thank you for your attention! Any questions?