Reviewer Profiling Using Sparse Matrix Regression
1 Reviewer Profiling Using Sparse Matrix Regression
Evangelos E. Papalexakis, Nicholas D. Sidiropoulos, Minos N. Garofalakis
Technical University of Crete, ECE department
14 December 2010, OEDM 2010, Sydney, Australia
Papalexakis, Sidiropoulos, Garofalakis (TUC ECE), Reviewer Profiling Using Sparse Matrix Regression, 14 Dec. 2010, OEDM
2 Motivation
Consider a typical conference or workshop. We have a pool of P papers and R reviewers. The TPC chair needs to assign those papers to reviewers in a succinct manner. Each reviewer should review a pre-determined number of papers, and those papers should fall within the reviewer's field of expertise.
Key idea: represent each paper and each reviewer as a vector in a low-dimensional space. Choose as a basis a small number (about 40 to 50) of terms that concisely characterize the broad area of the conference. Match the papers to reviewers using those profile vectors.
Reference: C.J. Taylor et al., "On the Optimal Assignment of Conference Papers to Reviewers".
This talk is about deriving keyword profiles for reviewers and papers, using a common keyword list as a basis.
3 Main Idea
Express both entities (reviewers and papers) in a (usually) high-dimensional space. Use elementary text mining tools to retrieve bulk terms that describe each reviewer and each paper. The union of those bulk terms is our starting point: we express reviewers and papers in that basis.
Use dimensionality reduction techniques to keep only the essential terms. LSI-SVD comes to mind! Factorize a data matrix $M$ as $M = AB^T$, where $A = U\Sigma$, $B = V$, and $[U, \Sigma, V]$ is the Singular Value Decomposition of $M$, usually truncated to a low rank. This factorization is optimal in the least squares sense.
However, this approach has a significant drawback!
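The factorization on this slide can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the helper name `lsi_svd` and the toy matrix values are assumptions made up for the example. It builds $A = U\Sigma$ and $B = V$ from a truncated SVD and checks whether any coefficient is negative.

```python
import numpy as np

def lsi_svd(M, k):
    """Rank-k factorization M ~ A @ B.T with A = U * Sigma and B = V (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :k] * s[:k]   # scale each kept left singular vector by its singular value
    B = Vt[:k, :].T        # kept right singular vectors, stored as columns
    return A, B

# Toy entity-by-term count matrix (made-up values, for illustration only).
M = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 2.0],
])

A, B = lsi_svd(M, k=2)
residual = np.linalg.norm(M - A @ B.T)               # Frobenius norm of the error
has_negative = bool((A < 0).any() or (B < 0).any())  # mixed signs appear in the factors
print(f"rank-2 residual: {residual:.3f}, negative coefficients: {has_negative}")
```

The residual equals the energy in the discarded singular values, which is the least squares optimality mentioned above; the mixed-sign factors are what the next slide objects to.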
4 LSI-SVD drawbacks regarding our application
Apart from optimality in the least squares sense, we desire model interpretability. The optimal factorization (SVD) produces both negative and positive coefficients. Negative coefficients imply possible cancellations in a linear combination, whereas we want strictly additive combinations. A negative coefficient has little interpretation value. When a reviewer or paper is not matched by a certain term, the corresponding coefficient should be exactly zero, or at least very small but positive.
5 Non-Negative Matrix Factorization
Non-negative Matrix Factorization (NMF) offers model interpretability by imposing non-negativity on all coefficients:
$$\min_{A,B} \|M - AB^T\|_F^2 \quad \text{subject to } a_{i,j} \ge 0 \text{ and } b_{i,j} \ge 0$$
Non-linearity of the bilinear model leads to non-convexity, so the algorithm may converge to a local minimum. The most popular algorithm for computing the NMF is the Multiplicative Method, with cost per iteration $O(IJ\hat{k})$.
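A minimal sketch of multiplicative updates for this problem is given below. It assumes the slide's "Multiplicative Method" refers to the standard Lee and Seung style update rules for the Frobenius objective; the function name, the random initialization, the small constant `eps` and the toy matrix are illustrative choices, not taken from the talk.

```python
import numpy as np

def nmf_multiplicative(M, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate M ~ A @ B.T with A, B >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    I, J = M.shape
    A = rng.random((I, k)) + eps   # strictly positive start keeps updates well defined
    B = rng.random((J, k)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        A *= (M @ B) / (A @ (B.T @ B) + eps)
        B *= (M.T @ A) / (B @ (A.T @ A) + eps)
    return A, B

# Toy non-negative matrix with two obvious groups of columns (made-up values).
M = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 3.0],
    [0.0, 0.0, 3.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])

A, B = nmf_multiplicative(M, k=2)
err = np.linalg.norm(M - A @ B.T)
print(f"reconstruction error after 200 iterations: {err:.4f}")
```

The dominant cost in each iteration is the products `M @ B` and `M.T @ A`, which is consistent with the $O(IJ\hat{k})$ per-iteration cost stated on the slide. Different seeds can end in different local minima.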
6 Initial Formulation
As input, we get the list of prospective reviewers and the list of submitted papers. From all of these entities, we extract a set of raw terms of size T. This set of terms defines the basis for each reviewer and paper vector. The dimension T can be very large (e.g., 2000). Then, we create the following matrices:
$\mathbf{P}$: the $P \times T$ paper-by-term matrix, where P denotes the number of submitted papers and T denotes the number of all initially extracted terms.
$\mathbf{R}$: the $R \times T$ reviewer-by-term matrix, where R denotes the number of reviewers.
7 Generic Algorithm
Our algorithm, regardless of the type of factorization, is:
Form the matrix $M = \begin{bmatrix} \mathbf{R} \\ \mathbf{P} \end{bmatrix}$.
Factor $M \approx AB^T$ at a lower rank $\hat{k}$. Each column of $B$ contains a topic (a group of terms) in the $\hat{k}$-dimensional space. Each row of $A$ contains the corresponding weights of a reviewer or paper on the groups of terms.
Reconstruct each profile vector as a linear combination of the columns of $B$: $\hat{M} = AB^T$. The reconstructed profiles are $\hat{\mathbf{R}}$ and $\hat{\mathbf{P}}$.
The peaks of each row of $\hat{M}$ indicate the highest scoring terms for each reviewer and paper.
Assemble the highest scoring reviewer terms and limit them to terms that also appear in paper titles. The final set of terms is called $T_{final}$.
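The steps of the generic algorithm can be sketched end to end as follows. Everything named here is hypothetical: `build_profiles`, the stand-in `nmf` factorizer, the five-term vocabulary and the count matrices are invented for illustration, and any factorization that returns $A$ and $B$ with $M \approx AB^T$ could be passed in instead.

```python
import numpy as np

def nmf(M, k, n_iter=300, eps=1e-9, seed=0):
    """Stand-in factorization (multiplicative-update NMF); any M ~ A @ B.T method fits here."""
    rng = np.random.default_rng(seed)
    A = rng.random((M.shape[0], k)) + eps
    B = rng.random((M.shape[1], k)) + eps
    for _ in range(n_iter):
        A *= (M @ B) / (A @ (B.T @ B) + eps)
        B *= (M.T @ A) / (B @ (A.T @ A) + eps)
    return A, B

def build_profiles(R, P, terms, k, t_hat, factorize=nmf):
    n_reviewers = R.shape[0]
    M = np.vstack([R, P])                      # reviewers stacked on top of papers
    A, B = factorize(M, k)                     # M ~ A @ B.T at rank k
    M_hat = A @ B.T                            # reconstructed profiles
    R_hat, P_hat = M_hat[:n_reviewers], M_hat[n_reviewers:]
    # Peaks of each reviewer row: indices of the t_hat largest coefficients.
    top = np.argsort(-R_hat, axis=1)[:, :t_hat]
    reviewer_terms = {terms[j] for j in top.ravel()}
    # Keep only terms that occur in at least one paper title.
    in_titles = {terms[j] for j in np.flatnonzero(P.sum(axis=0) > 0)}
    return R_hat, P_hat, sorted(reviewer_terms & in_titles)

# Made-up vocabulary and term counts; "wavelets" never occurs in a paper title.
terms = ["beamforming", "multicasting", "clustering", "indexing", "wavelets"]
R = np.array([[3.0, 2.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 2.0, 3.0, 1.0]])
P = np.array([[2.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0, 0.0],
              [1.0, 1.0, 0.0, 0.0, 0.0]])

R_hat, P_hat, t_final = build_profiles(R, P, terms, k=2, t_hat=2)
print("T_final:", t_final)
```

Passing the factorizer as a parameter mirrors the slide's point that the pipeline is independent of the type of factorization.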
8 NMF Issues
The factors $A$, $B$ are dense. Each reconstructed profile vector exhibits peaks at the highest scoring terms for each entity. Apart from the peaks, there is a lot of noise, in the form of coefficients with small values. We cannot derive the highest scoring terms directly from the profile vector!
Solution: we resort to post-processing of each profile vector. We sort each profile and keep the $\hat{t}$ largest coefficients.
But can we do something better?
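The post-processing step described here amounts to a top-$\hat{t}$ selection on each profile. A small sketch, with a hypothetical helper name and a made-up noisy profile:

```python
import numpy as np

def keep_top(profile, t_hat):
    """Zero out everything except the t_hat largest coefficients of a profile vector."""
    profile = np.asarray(profile, dtype=float)
    out = np.zeros_like(profile)
    idx = np.argsort(-profile, kind="stable")[:t_hat]  # positions of the largest values
    out[idx] = profile[idx]
    return out

# Made-up dense profile: three peaks plus small "noise" coefficients.
noisy = [0.02, 0.90, 0.01, 0.75, 0.03, 0.60]
print(keep_top(noisy, 3))  # keeps 0.90, 0.75 and 0.60, zeros elsewhere
```

Note that $\hat{t}$ has to be chosen by hand for this to work, which is part of what motivates looking for a factorization that is sparse by construction.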
9 Sparse Matrix Regression
Factorize $M \approx AB^T$ with additional $\ell_1$ regularization on $A$, $B$ (plus non-negativity constraints):
$$\min_{A,B} \left\{ \|M - AB^T\|_F^2 + \lambda\|A\|_1 + \lambda\|B\|_1 \right\} \quad \text{subject to } a_{i,j} \ge 0 \text{ and } b_{i,j} \ge 0$$
Here $\|A\|_1$ is defined as sum(sum(abs(A))), i.e. $\|A\|_1 = \sum_{i,j} |a_{i,j}|$.
It has been shown that $\ell_1$ regularization leads to sparse solutions. In our case, we impose a sparsity penalty on both matrices, since we want both the latent reviewer/paper profiles and the latent term profiles to be sparse.
The above problem is non-linear and cannot be solved directly. Instead, we solve the following Lasso regression problems in an alternating fashion, with cost per iteration $O(IJ\hat{k}^2)$:
$$\min_{B} \left\{ \|M - AB^T\|_F^2 + \lambda\|B\|_1 \right\}$$
$$\min_{A} \left\{ \|M^T - BA^T\|_F^2 + \lambda\|A\|_1 \right\}$$
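The alternating scheme can be sketched as below. The slides do not say which Lasso solver is used, so this example swaps in a simple column-wise coordinate descent with a non-negative soft threshold; the function names, the initialization, `eps` and the toy matrix are all assumptions for illustration. For a fixed $A$, the update of column $r$ of $B$ is the exact minimizer $b_r = \max\big(0, (E_r^T a_r - \lambda/2) / \|a_r\|^2\big)$, where $E_r$ is the residual without component $r$.

```python
import numpy as np

def nn_lasso_sweep(M, A, B, lam, eps=1e-12):
    """One coordinate-descent sweep over the columns of B for
    min_B ||M - A B^T||_F^2 + lam * ||B||_1 subject to B >= 0, with A fixed."""
    G = A.T @ A      # k x k Gram matrix of the fixed factor
    MtA = M.T @ A    # correlations between data columns and the fixed factor
    for r in range(B.shape[1]):
        # E_r^T a_r, where E_r is the residual with component r left out.
        num = MtA[:, r] - B @ G[:, r] + B[:, r] * G[r, r]
        # Non-negative soft threshold: small correlations become exact zeros.
        B[:, r] = np.maximum(0.0, (num - lam / 2.0) / (G[r, r] + eps))
    return B

def smr(M, k, lam, n_iter=100, seed=0):
    """Alternate the two Lasso subproblems: update B with A fixed, then A with B fixed."""
    rng = np.random.default_rng(seed)
    A = rng.random((M.shape[0], k))
    B = rng.random((M.shape[1], k))
    for _ in range(n_iter):
        B = nn_lasso_sweep(M, A, B, lam)     # min over B of ||M - A B^T||^2 + lam ||B||_1
        A = nn_lasso_sweep(M.T, B, A, lam)   # min over A of ||M^T - B A^T||^2 + lam ||A||_1
    return A, B

def objective(M, A, B, lam):
    return np.linalg.norm(M - A @ B.T) ** 2 + lam * (np.abs(A).sum() + np.abs(B).sum())

# Toy non-negative matrix with two groups of columns (made-up values).
M = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 3.0],
    [0.0, 0.0, 3.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])

A, B = smr(M, k=2, lam=0.5)
print("objective:", round(objective(M, A, B, 0.5), 4))
print("exact zeros in A and B:", int((A == 0).sum()), int((B == 0).sum()))
```

Because the threshold clips at zero, both the non-negativity constraint and the $\ell_1$ penalty are handled in the same step, and entries are driven to exactly zero rather than merely small.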
10 How do we collect data from each reviewer?
When it comes to submitted papers, we already have the title of each paper. For the reviewers, on the other hand, the only thing available is a list of their names. Ideally, we would like a list of each reviewer's publications. A good tactic is to look online for this information. Google Scholar gives us this opportunity!
11 GoogleScholar Miner
A Java-based tool we developed. It queries Google Scholar for a particular reviewer, browses through the results, and retrieves the list of publications with associated citations and dates of publication.
We push up papers that are highly cited and/or recent: a highly cited paper indicates the reviewer's field of expertise, and a recent paper indicates a current research interest of the reviewer.
GoogleScholar Miner outputs a set of terms influenced by each paper's weight.
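The slides do not give the weighting formula, so the following is only a hypothetical sketch of how citation count and recency could be combined into a paper weight and then propagated to title terms. The formula, the stopword list, the function names, and the citation and year values are all invented placeholders; only the two titles come from the example slide. The original tool is Java-based; Python is used here for brevity.

```python
from collections import Counter

STOPWORDS = frozenset({"in", "of", "for", "the", "on", "and", "a"})

def paper_weight(citations, year, max_citations, oldest_year, newest_year):
    """Hypothetical weight in [0, 2]: normalized citation count plus normalized recency."""
    cite_score = citations / max_citations if max_citations else 0.0
    span = newest_year - oldest_year
    recency_score = (year - oldest_year) / span if span else 1.0
    return cite_score + recency_score

def weighted_terms(papers):
    """Score title unigrams and bigrams by the summed weight of the papers they occur in."""
    max_c = max(p["citations"] for p in papers)
    years = [p["year"] for p in papers]
    scores = Counter()
    for p in papers:
        w = paper_weight(p["citations"], p["year"], max_c, min(years), max(years))
        raw = p["title"].lower().split()
        unigrams = [t for t in raw if t not in STOPWORDS]
        bigrams = [f"{a} {b}" for a, b in zip(raw, raw[1:])
                   if a not in STOPWORDS and b not in STOPWORDS]
        for term in set(unigrams + bigrams):
            scores[term] += w
    return scores

# Citation counts and years below are invented placeholders, not real data.
papers = [
    {"title": "blind parafac receivers for ds-cdma systems", "citations": 50, "year": 2000},
    {"title": "transmit beamforming for physical-layer multicasting", "citations": 100, "year": 2006},
]

scores = weighted_terms(papers)
for term, score in scores.most_common(3):
    print(f"{term}: {score:.2f}")
```

With these placeholder numbers, terms from the more cited and more recent title rank first, which is the "push up" behavior the slide describes.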
12 GoogleScholar Miner Example
Paper titles retrieved:
1 fast nearest neighbor search in medical image databases
2 blind parafac receivers for ds-cdma systems
3 on the uniqueness of multilinear decomposition of n-way arrays
4 parallel factor analysis in sensor array processing
5 fast and effective retrieval of medical tumor shapes
6 online data mining for co-evolving time sequences
7 on downlink beamforming with greedy user selection: performance analysis and a simple new algorithm
8 transmit beamforming for physical-layer multicasting
9 medium access control-physical cross-layer design
10 almost-sure identifiability of multidimensional harmonic retrieval
11 collision resolution in packet radio networks using rotational invariance techniq
12 cramer-rao lower bounds for low-rank decomposition of multidimensional array.
Terms retrieved (wrt the weights): multilinear, robust iterative, iterative fitting, multilinear models, beamforming, multidimensional, harmonic retrieval, access control-physical, control-physical cross-layer, collision resolution, fitting, user selection, physical-layer multicasting
[Figure: Weight vector for N Sidiropoulos; x-axis: paper]
13 Why focus on paper titles?
Even though the methods we developed can be extended to full-text indexing, we focus exclusively on paper titles:
A paper title contains the distilled essence of the full text, as the authors themselves chose to express it.
Hopefully, the title summarizes the full text more succinctly than automated tools would.
For confidentiality or accessibility reasons, the full text might not be available.
14 Profile Precision vs Rank
For quantitative evaluation, we used data from a real conference. We asked the TPC chair, who is a domain expert, to mark the extracted terms as relevant or not. Precision is the fraction of the retrieved terms that are also relevant:
$$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$$
Also note that $\text{Relevant} \subseteq \text{Retrieved}$.
[Figure: Precision vs $\hat{k}$ for NMF and SMR. Axes: $\hat{k}$ (x), Precision (y). Curves: NMF, T=1251; NMF, T=1844; NMF, T=2431; SMR, T=1251, λ=0.6; SMR, T=1844, λ=1.3; SMR, T=2431, λ=]
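The precision measure defined on this slide is straightforward to compute once the chair's relevance marks are available. A short sketch, where the function name and the term lists are made up for illustration:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved terms that are also marked relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        raise ValueError("precision is undefined when nothing is retrieved")
    return len(retrieved & relevant) / len(retrieved)

# Made-up example: the expert marks 3 of the 4 extracted terms as relevant.
retrieved = ["beamforming", "multilinear", "fitting", "user selection"]
relevant = ["beamforming", "multilinear", "user selection"]
print(precision(retrieved, relevant))  # 0.75
```

Since the expert only judges terms that were extracted, the relevant set is contained in the retrieved set and the intersection is simply the relevant set.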
15 Reviewing Assignments Evaluation
We used SMR profiles to produce reviewing assignments for a real conference. We also asked reviewers and authors to choose, from a list of terms, those that best represented their bio or paper. With aid from the TPC chair, we measured the probability of a bad assignment for each of the two assignments. We define a bad assignment as one where more than half of the papers assigned to a reviewer are not suitable given the reviewer's expertise.
Some simplifying assumptions: 1) Each reviewer's expertise covers 1/7th of the broad scientific field of the conference. 2) Each assignment consists of 4 papers per reviewer.
The probability of a bad assignment in a set of random assignments is:
$$\Pr\{\text{bad}\} = \binom{4}{3}\left(\frac{6}{7}\right)^{3}\left(\frac{1}{7}\right) + \left(\frac{6}{7}\right)^{4} \approx 0.9$$
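Under the two simplifying assumptions on this slide, and additionally assuming that each randomly assigned paper is independently suitable with probability 1/7, the number of unsuitable papers among the 4 assigned is binomial. A bad assignment means more than half (3 or 4) are unsuitable. The helper name below is hypothetical; the sketch just evaluates that binomial tail.

```python
from math import comb

def prob_bad_assignment(n_papers=4, p_suitable=1 / 7):
    """Probability that more than half of n_papers random papers are unsuitable."""
    q = 1 - p_suitable           # probability that a single paper is unsuitable
    min_bad = n_papers // 2 + 1  # "more than half" of the assigned papers
    return sum(
        comb(n_papers, j) * q**j * p_suitable ** (n_papers - j)
        for j in range(min_bad, n_papers + 1)
    )

print(round(prob_bad_assignment(), 4))  # 0.8996
```

The exact value is 2160/2401, which rounds to the 0.9 reported for random assignments.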
16 Take Home Point
Profiles created for our testbed conference:
Manual: TPC chair 7 days; Reviewer 0 hrs; Author 0 hrs; Pr{bad} = [missing]
Custom profiles: TPC chair 2-4 hrs; Reviewer 2 min; Author 2 min; Pr{bad} = [missing]
SMR profiles: TPC chair 0 hrs; Reviewer 0 hrs; Author 0 hrs; Pr{bad} = [missing]
Random: TPC chair 10 min; Reviewer 0 hrs; Author 0 hrs; Pr{bad} = 0.9
Conclusions & Future Work
SMR eliminates noise that NMF allows, yielding clearer profiles. Our approach yields relatively good assignments (wrt Pr{bad}), requiring zero effort from everyone! We are currently working on modifying the algorithm to allow for imbalanced sparsity penalties.
17 The End!
Thank you for your attention! Any questions?
More informationLeast Squares and SLAM Pose-SLAM
Least Squares and SLAM Pose-SLAM Giorgio Grisetti Part of the material of this course is taken from the Robotics 2 lectures given by G.Grisetti, W.Burgard, C.Stachniss, K.Arras, D. Tipaldi and M.Bennewitz
More informationCompression, Clustering and Pattern Discovery in Very High Dimensional Discrete-Attribute Datasets
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 Compression, Clustering and Pattern Discovery in Very High Dimensional Discrete-Attribute Datasets Mehmet Koyutürk, Ananth Grama, and Naren Ramakrishnan
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationUsing Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam
Presented by Based on work by, Gilad Lerman, and Arthur Szlam What is Tracking? Broad Definition Tracking, or Object tracking, is a general term for following some thing through multiple frames of a video
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationAlgebraic Iterative Methods for Computed Tomography
Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic
More informationBeyond Mere Pixels: How Can Computers Interpret and Compare Digital Images? Nicholas R. Howe Cornell University
Beyond Mere Pixels: How Can Computers Interpret and Compare Digital Images? Nicholas R. Howe Cornell University Why Image Retrieval? World Wide Web: Millions of hosts Billions of images Growth of video
More informationPractical Guidance for Machine Learning Applications
Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering
More informationChapter 7: Computation of the Camera Matrix P
Chapter 7: Computation of the Camera Matrix P Arco Nederveen Eagle Vision March 18, 2008 Arco Nederveen (Eagle Vision) The Camera Matrix P March 18, 2008 1 / 25 1 Chapter 7: Computation of the camera Matrix
More informationNew user profile learning for extremely sparse data sets
New user profile learning for extremely sparse data sets Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe Institute of Control and Information Engineering, Poznan University of Technology, pl.
More information56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997
56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997 Answer #1 and any five of the remaining six problems! possible score 1. Multiple Choice 25 2. Traveling Salesman Problem 15 3.
More informationCS 664 Segmentation. Daniel Huttenlocher
CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical
More informationCS 231A: Computer Vision (Winter 2018) Problem Set 2
CS 231A: Computer Vision (Winter 2018) Problem Set 2 Due Date: Feb 09 2018, 11:59pm Note: In this PS, using python2 is recommended, as the data files are dumped with python2. Using python3 might cause
More informationClustering. Bruno Martins. 1 st Semester 2012/2013
Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts
More informationFacial Expression Recognition Using Non-negative Matrix Factorization
Facial Expression Recognition Using Non-negative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,
More informationData fusion and multi-cue data matching using diffusion maps
Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by
More informationChapter 4: Text Clustering
4.1 Introduction to Text Clustering Clustering is an unsupervised method of grouping texts / documents in such a way that in spite of having little knowledge about the content of the documents, we can
More informationProf. Noah Snavely CS Administrivia. A4 due on Friday (please sign up for demo slots)
Robust fitting Prof. Noah Snavely CS111 http://www.cs.cornell.edu/courses/cs111 Administrivia A due on Friday (please sign up for demo slots) A5 will be out soon Prelim is coming up, Tuesday, / Roadmap
More informationNetwork Lasso: Clustering and Optimization in Large Graphs
Network Lasso: Clustering and Optimization in Large Graphs David Hallac, Jure Leskovec, Stephen Boyd Stanford University September 28, 2015 Convex optimization Convex optimization is everywhere Introduction
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationLearning a Manifold as an Atlas Supplementary Material
Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationGeneralized trace ratio optimization and applications
Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO
More information