SpaceNet: Multivariate brain decoding and segmentation
Elvis DOHMATOB (joint work with M. EICKENBERG, B. THIRION, and G. VAROQUAUX)
2 1 Introducing the model
3 1 Brain decoding
We are given:
- $n$ = number of scans; $p$ = number of voxels in the mask
- design matrix: $X \in \mathbb{R}^{n \times p}$ (brain images)
- response vector: $y \in \mathbb{R}^{n}$ (external covariates)
We need to predict $y$ on new data.
Linear model assumption: $y \approx Xw$
We seek to estimate the weights map $w$ that ensures the best prediction / classification scores.
4 1 The need for regularization
The problem is ill-posed: it is high-dimensional, with typically far fewer scans than voxels ($n \ll p$).
We need regularization to reduce dimensions and to encode the practitioner's priors on the weights $w$.
5 1 Why spatial priors?
3D spatial gradient (a linear operator): $\nabla : w \in \mathbb{R}^{p} \mapsto (\nabla_x w, \nabla_y w, \nabla_z w) \in \mathbb{R}^{p \times 3}$
Penalizing the image gradient $\nabla w$ favors weights maps made of spatially coherent regions.
Such priors are reasonable since brain activity is spatially correlated.
They give more stable maps and are more predictive than unstructured priors (e.g. SVM) [Hebiri 2011, Michel 2011, Baldassare 2012, Grosenick 2013, Gramfort 2013].
6 1 SpaceNet
SpaceNet is a family of structure + sparsity priors for regularizing brain decoding models.
SpaceNet generalizes TV [Michel 2011], Smooth-Lasso / GraphNet [Hebiri 2011, Grosenick 2013], and TV-L1 [Baldassare 2012, Gramfort 2013].
7 2 Methods
8 2 The SpaceNet regularized model
Model: $y = Xw + \text{error}$
Optimization problem (regularized model): minimize $\frac{1}{2}\|y - Xw\|_2^2 + \text{penalty}(w)$
Here $\frac{1}{2}\|y - Xw\|_2^2$ is the loss term for the squared loss; the loss term changes with the chosen loss (squared loss, logistic loss, ...).
9 2 The SpaceNet regularized model
$\text{penalty}(w) = \alpha\,\Omega_\rho(w)$, where
$$
\Omega_\rho(w) := \rho\,\|w\|_1 + (1 - \rho)
\begin{cases}
\frac{1}{2}\|\nabla w\|_2^2, & \text{for GraphNet} \\
\|w\|_{TV}, & \text{for TV-L1} \\
\dots
\end{cases}
$$
- $\alpha$ ($0 < \alpha < +\infty$) is the total amount of regularization
- $\rho$ ($0 < \rho \le 1$) is a mixing constant called the $\ell_1$-ratio
- $\rho = 1$ for Lasso
The problem is convex, non-smooth, and heavily ill-conditioned.
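As a rough illustration of the penalty defined above, the following NumPy sketch evaluates $\alpha\,\Omega_\rho(w)$ on a 3D weights image. The function names (`spatial_gradient`, `spacenet_penalty`), the forward-difference discretization, and the isotropic form of the TV norm are assumptions made for this example; they are not taken from the slides.

```python
import numpy as np


def spatial_gradient(w_img):
    """Forward finite differences along x, y, z.

    Returns an array of shape (3,) + w_img.shape; the last slice along
    each axis is left at zero (an assumed boundary convention).
    """
    grad = np.zeros((3,) + w_img.shape)
    grad[0, :-1, :, :] = np.diff(w_img, axis=0)
    grad[1, :, :-1, :] = np.diff(w_img, axis=1)
    grad[2, :, :, :-1] = np.diff(w_img, axis=2)
    return grad


def spacenet_penalty(w_img, alpha, rho, kind="graph-net"):
    """alpha * (rho * ||w||_1 + (1 - rho) * structure term)."""
    grad = spatial_gradient(w_img)
    l1 = np.abs(w_img).sum()
    if kind == "graph-net":
        # 0.5 * squared l2 norm of the spatial gradient
        structure = 0.5 * np.sum(grad ** 2)
    elif kind == "tv-l1":
        # isotropic total variation (assumed): sum of per-voxel gradient norms
        structure = np.sum(np.sqrt(np.sum(grad ** 2, axis=0)))
    else:
        raise ValueError("kind must be 'graph-net' or 'tv-l1'")
    return alpha * (rho * l1 + (1.0 - rho) * structure)


# A constant map has zero gradient, so only the l1 term contributes.
flat = np.ones((4, 4, 4))
# A small cube of active voxels: both terms contribute.
blob = np.zeros((4, 4, 4))
blob[1:3, 1:3, 1:3] = 1.0

for kind in ("graph-net", "tv-l1"):
    print(kind, spacenet_penalty(flat, 1.0, 0.5, kind),
          spacenet_penalty(blob, 1.0, 0.5, kind))
```

With $\rho = 1$ the structure term vanishes and both variants reduce to the Lasso penalty, as stated on the slide.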
10 2 Interlude: zoom on ISTA-based algorithms
Setting: $\min f + g$, with $f$ smooth and $g$ non-smooth; both $f$ and $g$ convex; $\nabla f$ is $L_{\nabla f}$-Lipschitz.
ISTA: $O(L_{\nabla f}/\epsilon)$ iterations to reach accuracy $\epsilon$ [Daubechies 2004]
- Step 1: gradient descent on $f$
- Step 2: proximal operator of $g$
FISTA: $O(\sqrt{L_{\nabla f}/\epsilon})$ iterations [Beck Teboulle 2009] = ISTA with a Nesterov acceleration trick!
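The two steps above, and the acceleration trick, can be sketched on the simplest non-smooth case, the Lasso ($g = \alpha\|\cdot\|_1$, whose proximal operator is soft-thresholding). This is a minimal sketch on synthetic data; the function names and the problem sizes are choices made for the example, not the authors' implementation.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(X, y, w, alpha):
    return 0.5 * np.sum((y - X @ w) ** 2) + alpha * np.sum(np.abs(w))


def ista(X, y, alpha, n_iter=200, accelerate=False):
    """ISTA (or FISTA when accelerate=True) for the Lasso objective."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient of f
    w = np.zeros(X.shape[1])
    z, t = w.copy(), 1.0
    history = []
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)                          # Step 1: gradient of f
        w_new = soft_threshold(z - grad / L, alpha / L)   # Step 2: prox of g
        if accelerate:
            # Nesterov momentum: extrapolate from the last two iterates
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
            z = w_new + ((t - 1.0) / t_new) * (w_new - w)
            t = t_new
        else:
            z = w_new
        w = w_new
        history.append(lasso_objective(X, y, w, alpha))
    return w, history


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
w_true = np.zeros(200)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(50)
alpha = 1.0

w_ista, hist_ista = ista(X, y, alpha)
w_fista, hist_fista = ista(X, y, alpha, accelerate=True)
print("final objective, ISTA :", hist_ista[-1])
print("final objective, FISTA:", hist_fista[-1])
```

ISTA with step $1/L$ decreases the objective at every iteration; FISTA is not monotone but usually reaches a given accuracy in far fewer iterations.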
11 2 FISTA: Implementation for TV-L1
[Table: convergence time in seconds, by $\alpha$ and $\ell_1$ ratio] [DOHMATOB 2014 (PRNI)]
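12 2 FISTA: Implementation for GraphNet
Augment $X$: $\tilde{X} := [X \;\; c_{\alpha,\rho}\nabla]^T \in \mathbb{R}^{(n+3p) \times p}$, so that $\tilde{X} z^{(t)}$ stacks $X z^{(t)}$ and $c_{\alpha,\rho}\nabla z^{(t)}$ (with $y$ zero-padded to $\tilde{y} \in \mathbb{R}^{n+3p}$).
1. Gradient descent step (datafit term): $w^{(t+1)} \leftarrow z^{(t)} - \gamma \tilde{X}^T(\tilde{X} z^{(t)} - \tilde{y})$
2. Prox step (penalty term): $w^{(t+1)} \leftarrow \mathrm{soft}_{\alpha\rho\gamma}(w^{(t+1)})$
3. Nesterov acceleration: $z^{(t+1)} \leftarrow (1 + \theta^{(t)})\,w^{(t+1)} - \theta^{(t)} w^{(t)}$
Bottleneck: 80% of the runtime is spent computing $\tilde{X} z^{(t)}$! We badly need a speedup!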
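The three steps above can be sketched as follows. To keep the example short, the voxels are arranged on a 1D chain, so a first-difference matrix `D` stands in for the 3D spatial gradient $\nabla$ (which would have $3p$ rows). The value $c_{\alpha,\rho} = \sqrt{\alpha(1-\rho)}$ and the zero-padded target are assumptions chosen so that the augmented squared loss reproduces the GraphNet smooth term; the slides do not spell them out.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def graphnet_objective(X, y, w, alpha, rho):
    """Squared loss + alpha * (rho * l1 + (1 - rho) * 0.5 * ||D w||^2), 1D chain."""
    smooth = 0.5 * np.sum(np.diff(w) ** 2)
    return (0.5 * np.sum((y - X @ w) ** 2)
            + alpha * (rho * np.sum(np.abs(w)) + (1.0 - rho) * smooth))


def augment(X, y, alpha, rho):
    """Stack X on top of c * D and pad y with zeros (assumed c = sqrt(alpha * (1 - rho)))."""
    p = X.shape[1]
    D = np.diff(np.eye(p), axis=0)  # 1D stand-in for the spatial gradient, shape (p - 1, p)
    c = np.sqrt(alpha * (1.0 - rho))
    return np.vstack([X, c * D]), np.concatenate([y, np.zeros(p - 1)])


def graphnet_fista(X, y, alpha, rho, n_iter=300):
    X_aug, y_aug = augment(X, y, alpha, rho)
    gamma = 1.0 / np.linalg.norm(X_aug, 2) ** 2  # step size 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        # 1. gradient descent step on the augmented datafit term
        w_new = z - gamma * X_aug.T @ (X_aug @ z - y_aug)
        # 2. prox step on the l1 part of the penalty
        w_new = soft_threshold(w_new, alpha * rho * gamma)
        # 3. Nesterov acceleration
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        theta = (t - 1.0) / t_new
        z = (1.0 + theta) * w_new - theta * w
        w, t = w_new, t_new
    return w


rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[30:40] = 1.0  # one contiguous active "region"
y = X @ w_true + 0.1 * rng.standard_normal(n)
alpha, rho = 1.0, 0.5

w_hat = graphnet_fista(X, y, alpha, rho)
print("objective at 0    :", graphnet_objective(X, y, np.zeros(p), alpha, rho))
print("objective at w_hat:", graphnet_objective(X, y, w_hat, alpha, rho))
```

Each iteration multiplies by the augmented matrix twice, which is why the product $\tilde{X} z^{(t)}$ dominates the runtime when $p$ is large.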
13 2 Speedup via univariate screening
We detect and remove irrelevant voxels before the optimization problem is even entered!
14 2 $X^T y$ maps: relevant voxels stick out
[Figure: $X^T y$ maps at y=20, for masks keeping 100%, 50%, and 20% of the brain volume]
The 20% mask has the 3 bright blobs we would expect to get, but contains far fewer voxels, hence less run-time.
15 2 Our screening heuristic
$t_p$ := $p$th percentile of the vector $|X^T y|$.
Discard the $j$th voxel if $|X_j^T y| < t_p$.
[Figure: masks at y=20 for k = 100%, k = 50%, and k = 20% of the voxels]
This is marginal screening [Lee 2014], but without the (invertibility) restriction $k \le \min(n, p)$.
The regularization will do the rest...
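A minimal NumPy sketch of this percentile rule follows, on synthetic data. The function name `univariate_screening` and the use of absolute values are assumptions of the example; keeping the top 20% of voxels corresponds to thresholding at the 80th percentile.

```python
import numpy as np


def univariate_screening(X, y, percentile):
    """Keep voxel j when |X_j^T y| is at or above the given percentile of |X^T y|."""
    scores = np.abs(X.T @ y)
    threshold = np.percentile(scores, percentile)
    support = scores >= threshold
    return support, scores


rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
w = np.zeros(p)
w[:3] = 5.0  # a few truly relevant voxels
y = X @ w + rng.standard_normal(n)

# Keep the top 20% of voxels, i.e. discard everything below the 80th percentile.
support, scores = univariate_screening(X, y, percentile=80)
X_screened = X[:, support]
print("voxels kept:", int(support.sum()), "of", p)
```

The solver is then run on `X_screened` only, so each matrix product costs a fraction of the original.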
16 2 Our screening heuristic
See [DOHMATOB 2015 (PRNI)] for a more detailed exposition of the speedup heuristics developed.
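17 2 Automatic model selection via cross-validation
Regularization parameters: $0 < \alpha_L < \dots < \alpha_3 < \alpha_2 < \alpha_1 = \alpha_{max}$
Mixing constants: $0 < \rho_M < \dots < \rho_3 < \rho_2 < \rho_1 \le 1$
Thus there is an $L \times M$ grid to search over for the best parameters:
$$
\begin{array}{ccccc}
(\alpha_1, \rho_1) & (\alpha_1, \rho_2) & (\alpha_1, \rho_3) & \cdots & (\alpha_1, \rho_M) \\
(\alpha_2, \rho_1) & (\alpha_2, \rho_2) & (\alpha_2, \rho_3) & \cdots & (\alpha_2, \rho_M) \\
(\alpha_3, \rho_1) & (\alpha_3, \rho_2) & (\alpha_3, \rho_3) & \cdots & (\alpha_3, \rho_M) \\
\vdots & \vdots & \vdots & & \vdots \\
(\alpha_L, \rho_1) & (\alpha_L, \rho_2) & (\alpha_L, \rho_3) & \cdots & (\alpha_L, \rho_M)
\end{array}
$$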
18 2 Automatic model selection via cross-validation
The final model uses the average of the per-fold best weights maps (bagging).
This bagging strategy ensures more stable and robust weights maps.
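The grid search and the per-fold averaging can be sketched as follows. The inner solver `fit_sparse_model` is a deliberately simple stand-in (plain ISTA on an elastic-net penalty), not the SpaceNet solver; the grid values, the fold layout, and the held-out mean squared error criterion are likewise assumptions of this example.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_sparse_model(X, y, alpha, rho, n_iter=200):
    """Stand-in solver: ISTA on 0.5*||y - Xw||^2 + alpha*(rho*||w||_1 + (1-rho)*0.5*||w||^2)."""
    L = np.linalg.norm(X, 2) ** 2 + alpha * (1.0 - rho)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + alpha * (1.0 - rho) * w
        w = soft_threshold(w - grad / L, alpha * rho / L)
    return w


def cv_bagged_weights(X, y, alphas, rhos, n_folds=4):
    """For each fold, pick the best (alpha, rho) on held-out data; average the winners."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    best_maps = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        best_score, best_w = np.inf, None
        for alpha in alphas:          # L values of alpha
            for rho in rhos:          # M values of rho
                w = fit_sparse_model(X[train], y[train], alpha, rho)
                mse = np.mean((y[test_idx] - X[test_idx] @ w) ** 2)
                if mse < best_score:
                    best_score, best_w = mse, w
        best_maps.append(best_w)
    # Bagging: the final map is the average of the per-fold best maps.
    return np.mean(best_maps, axis=0)


rng = np.random.default_rng(0)
n, p = 80, 100
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = 3.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

alphas = np.logspace(1, -2, 4)  # decreasing grid, largest value first
rhos = [1.0, 0.5, 0.1]
w_bagged = cv_bagged_weights(X, y, alphas, rhos)
print("correlation with the true weights:", np.corrcoef(w_bagged, w_true)[0, 1])
```

Averaging the per-fold winners, rather than refitting once with a single best pair, is what the slide calls bagging: each fold may select a different $(\alpha, \rho)$, and the average smooths out that variability.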
19 3 Some experimental results
20 3 Weights: SpaceNet versus SVM
Faces vs objects classification on [Haxby 2001]
[Figure: weights maps; panels: Smooth-Lasso weights, TV-L1 weights, SVM weights]
21 3 Classification scores: SpaceNet versus SVM
22 3 Concluding remarks
SpaceNet enforces both sparsity and structure, leading to better prediction / classification scores and more interpretable brain maps.
The code runs in 15 minutes for simple datasets, and 30 minutes for very difficult datasets.
In the next release, SpaceNet will feature as part of Nilearn [Abraham et al. 2014].
23 3 Why do $X^T y$ maps give a good relevance measure?
In an orthogonal design, the least-squares solution is $\hat{w}_{LS} = (X^T X)^{-1} X^T y = (I)^{-1} X^T y = X^T y$.
Intuition: $X^T y$ bears some information on the optimal solution even for general $X$.
Marginal screening: set $S$ = indices of the top $k$ voxels $j$ in terms of $|X_j^T y|$ values.
In [Lee 2014], $k \le \min(n, p)$, so that the least-squares fit on the selected voxels, $(X_S^T X_S)^{-1} X_S^T y$, exists.
We don't require the invertibility condition $k \le \min(n, p)$. Our spatial regularization will do the rest!
Lots of screening rules out there: [El Ghaoui 2010, Liu 2014, Wang 2015, Tibshirani 2010, Fercoq 2015]
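The orthogonal-design identity above is easy to check numerically. In this small sketch, a design with orthonormal columns is built with a QR factorization of a random matrix; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10

# The Q factor of a reduced QR factorization has orthonormal columns: X^T X = I.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
y = rng.standard_normal(n)

# Least-squares solution via the normal equations.
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

print("X^T X is the identity:", np.allclose(X.T @ X, np.eye(p)))
print("w_LS equals X^T y    :", np.allclose(w_ls, X.T @ y))
```

For a general, correlated design the equality no longer holds, which is why $X^T y$ is used only as a screening score and the regularized fit is left to decide the final weights.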