
1 SpaceNet: Multivariate brain decoding and segmentation. Elvis DOHMATOB (Joint work with: M. EICKENBERG, B. THIRION, & G. VAROQUAUX). [Figure: example weights map, axial slice y=20.]

2 1 Introducing the model

3 1 Brain decoding
We are given: $n$ = number of scans; $p$ = number of voxels in the mask; a design matrix $X \in \mathbb{R}^{n \times p}$ (brain images); a response vector $y \in \mathbb{R}^n$ (external covariates). We need to predict $y$ on new data. Linear model assumption: $y \approx Xw$. We seek to estimate the weights map $w$ that ensures the best prediction / classification scores.
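To make the setup concrete, here is a hedged toy version in numpy (all sizes and values below are illustrative, not from the talk; the shapes are the point):

```python
import numpy as np

rng = np.random.RandomState(42)
n, p = 50, 1000                      # n << p, as is typical in fMRI decoding
X = rng.randn(n, p)                  # "brain images": n scans x p voxels
w_true = np.zeros(p)
w_true[:20] = 1.                     # a sparse ground-truth weights map
y = X @ w_true + 0.1 * rng.randn(n)  # linear model: y ~ X w + noise

# Plain least-squares is hopeless here (underdetermined system), which is
# exactly why the next slide introduces regularization.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```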

4 1 The need for regularization
Ill-posed problem: high-dimensional ($n \ll p$); typically $n$ is small while $p$ is very large. We need regularization to reduce dimensions and to encode the practitioner's priors on the weights $w$.

5 1 Why spatial priors?
3D spatial gradient (a linear operator): $\nabla : w \in \mathbb{R}^p \mapsto (\nabla_x w, \nabla_y w, \nabla_z w) \in \mathbb{R}^{p \times 3}$. Penalizing the image gradient $\nabla w$ encourages piecewise-constant regions. Such priors are reasonable since brain activity is spatially correlated; they yield more stable weights maps that are also more predictive than unstructured priors (e.g. SVM) [Hebiri 2011, Michel 2011, Baldassare 2012, Grosenick 2013, Gramfort 2013].
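A minimal sketch of this gradient operator, assuming simple forward differences on a 3D volume (the exact discretization used in SpaceNet may differ):

```python
import numpy as np

def spatial_gradient(w_vol):
    """Forward-difference 3D gradient: maps a volume to its (x, y, z) derivatives."""
    grad = np.zeros((3,) + w_vol.shape)
    grad[0, :-1, :, :] = np.diff(w_vol, axis=0)
    grad[1, :, :-1, :] = np.diff(w_vol, axis=1)
    grad[2, :, :, :-1] = np.diff(w_vol, axis=2)
    return grad

w_vol = np.random.randn(10, 12, 8)   # toy 3D weights map
# Isotropic total variation: sum over voxels of the gradient magnitude.
tv = np.sqrt((spatial_gradient(w_vol) ** 2).sum(axis=0)).sum()
```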

6 1 SpaceNet
SpaceNet is a family of structure + sparsity priors for regularizing brain-decoding models. SpaceNet generalizes TV [Michel 2011], Smooth-Lasso / GraphNet [Hebiri 2011, Grosenick 2013], and TV-L1 [Baldassare 2012, Gramfort 2013].

7 2 Methods

8 2 The SpaceNet regularized model
$y = Xw + \text{error}$. Optimization problem (regularized model): minimize $\frac{1}{2}\|y - Xw\|_2^2 + \text{penalty}(w)$. Here $\frac{1}{2}\|y - Xw\|_2^2$ is the loss term, and will be different for squared loss, logistic loss, ...

9 2 The SpaceNet regularized model
$\text{penalty}(w) = \alpha\,\Omega_\rho(w)$, where
$\Omega_\rho(w) := \rho\,\|w\|_1 + (1 - \rho)\,\frac{1}{2}\|\nabla w\|_2^2$ for GraphNet,
$\Omega_\rho(w) := \rho\,\|w\|_1 + (1 - \rho)\,\|w\|_{TV}$ for TV-L1, ...
$\alpha$ ($0 < \alpha < +\infty$) is the total amount of regularization; $\rho$ ($0 < \rho \le 1$) is a mixing constant called the $\ell_1$-ratio; $\rho = 1$ gives the Lasso.

10 2 The SpaceNet regularized model (cont.)
The problem is convex, non-smooth, and heavily ill-conditioned.
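As an illustration, both penalties can be written down directly. The sketch below uses numpy's central-difference `np.gradient` as a stand-in for the operator $\nabla$ above; it is an assumption for readability, not the package's exact code:

```python
import numpy as np

def omega_graphnet(w_vol, rho):
    """rho * ||w||_1 + (1 - rho) * 0.5 * ||grad w||^2 (GraphNet)."""
    gx, gy, gz = np.gradient(w_vol)          # central-difference gradient
    smooth = 0.5 * (gx ** 2 + gy ** 2 + gz ** 2).sum()
    return rho * np.abs(w_vol).sum() + (1. - rho) * smooth

def omega_tv_l1(w_vol, rho):
    """rho * ||w||_1 + (1 - rho) * ||w||_TV (TV-L1)."""
    gx, gy, gz = np.gradient(w_vol)
    tv = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2).sum()   # isotropic TV
    return rho * np.abs(w_vol).sum() + (1. - rho) * tv

w_vol = np.random.randn(10, 12, 8)
penalty = 0.5 * omega_tv_l1(w_vol, rho=0.3)  # alpha * Omega_rho(w), with alpha = 0.5
```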

11 2 Interlude: zoom on ISTA-based algorithms
Setting: $\min f + g$, with $f$ smooth and $g$ non-smooth, both convex, and $\nabla f$ $L$-Lipschitz.
ISTA: $O(L_f/\epsilon)$ iterations [Daubechies 2004]. Step 1: gradient descent on $f$. Step 2: proximal operator of $g$.
FISTA: $O(\sqrt{L_f/\epsilon})$ iterations [Beck & Teboulle 2009] = ISTA with a Nesterov acceleration trick!
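For concreteness, here is a minimal FISTA sketch for the lasso instance, $f(w) = \frac{1}{2}\|y - Xw\|_2^2$ and $g(w) = \lambda\|w\|_1$, whose prox is the soft-thresholding operator. This is a generic textbook version, not the SpaceNet solver itself:

```python
import numpy as np

def soft_threshold(w, t):
    """Prox of t * ||.||_1: shrink each coordinate toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.)

def fista(X, y, lam, n_iter=200):
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of grad f
    w = z = np.zeros(X.shape[1])
    t = 1.
    for _ in range(n_iter):
        # ISTA step taken at the auxiliary point z: gradient descent, then prox.
        w_new = soft_threshold(z - X.T @ (X @ z - y) / L, lam / L)
        # Nesterov acceleration: extrapolate using the previous iterate.
        t_new = (1. + np.sqrt(1. + 4. * t ** 2)) / 2.
        z = w_new + ((t - 1.) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w

# Usage: w_hat = fista(X, y, lam=0.1)
```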

12 2 FISTA: Implementation for TV-L1
[Table: convergence times in seconds as a function of $\alpha$ and the $\ell_1$-ratio; see DOHMATOB 2014 (PRNI).]

13 2 FISTA: Implementation for GraphNet
Augment $X$: $\tilde{X} := [X;\, c_{\alpha,\rho}\nabla] \in \mathbb{R}^{(n+3p) \times p}$, so that $\tilde{X}z^{(t)}$ stacks $Xz^{(t)}$ on top of $c_{\alpha,\rho}\nabla z^{(t)}$; set $\tilde{y} := [y; 0]$.
1. Gradient-descent step (datafit term): $w^{(t+1)} \leftarrow z^{(t)} - \gamma\,\tilde{X}^T(\tilde{X}z^{(t)} - \tilde{y})$
2. Prox step (penalty term): $w^{(t+1)} \leftarrow \text{soft}_{\alpha\rho\gamma}(w^{(t+1)})$
3. Nesterov acceleration: $z^{(t+1)} \leftarrow (1 + \theta^{(t)})\,w^{(t+1)} - \theta^{(t)}\,w^{(t)}$
Bottleneck: 80% of the runtime is spent computing $\tilde{X}z^{(t)}$! We badly need a speedup!
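A hedged sketch of this augmentation trick on a toy grid, reusing `fista()` and `soft_threshold()` from the previous snippet. The slide does not spell out $c_{\alpha,\rho}$, so the form $c = \sqrt{\alpha(1-\rho)}$ below is an assumption, chosen so the augmented data-fit reproduces the smooth $\alpha(1-\rho)\frac{1}{2}\|\nabla w\|^2$ part of the GraphNet penalty:

```python
import numpy as np

def grad_op(v):
    """Forward-difference 3D gradient of a volume, shape (3, *v.shape)."""
    g = np.zeros((3,) + v.shape)
    g[0, :-1] = np.diff(v, axis=0)
    g[1, :, :-1] = np.diff(v, axis=1)
    g[2, :, :, :-1] = np.diff(v, axis=2)
    return g

def operator_to_matrix(op, shape):
    """Materialize a linear operator on volumes as a dense matrix (toy sizes only;
    on real data X~z is applied implicitly, hence the 80% runtime bottleneck)."""
    p = int(np.prod(shape))
    return np.array([op(e.reshape(shape)).ravel() for e in np.eye(p)]).T

rng = np.random.RandomState(0)
shape, n = (4, 4, 3), 30
p = int(np.prod(shape))
X, y = rng.randn(n, p), rng.randn(n)
alpha, rho = 0.1, 0.5

c = np.sqrt(alpha * (1. - rho))          # assumed form of c_{alpha, rho}
X_aug = np.vstack([X, c * operator_to_matrix(grad_op, shape)])  # (n + 3p, p)
y_aug = np.concatenate([y, np.zeros(3 * p)])
# 0.5*||y_aug - X_aug w||^2 = 0.5*||y - Xw||^2 + alpha*(1-rho)*0.5*||grad w||^2,
# so only the l1 part remains in the prox: run plain FISTA with lam = alpha*rho.
w_hat = fista(X_aug, y_aug, lam=alpha * rho)
```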

14 2 Speedup via univariate screening
Whereby we detect and remove irrelevant voxels before the optimization problem is even entered!

15–18 2 $X^Ty$ maps: relevant voxels stick out
[Figure: $X^Ty$ maps (axial slice y=20, L/R marked), restricted to 100%, 50%, and 20% of the brain volume.]
The 20% mask has the 3 bright blobs we would expect to get... but contains far fewer voxels, hence less run-time.

19 2 Our screening heuristic
Let $t_p$ := $p$th percentile of the vector $|X^Ty|$. Discard the $j$th voxel if $|X_j^Ty| < t_p$.
[Figure: screened maps at y=20, keeping k = 100%, 50%, and 20% of the voxels.]
This is marginal screening [Lee 2014], but without the (invertibility) restriction $k \le \min(n, p)$. The regularization will do the rest...
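In code, the heuristic is a few lines of numpy (a sketch, not the package implementation):

```python
import numpy as np

def screen_voxels(X, y, percentile=80.):
    """Return indices of voxels surviving univariate (marginal) screening."""
    relevance = np.abs(X.T @ y)                 # |X_j^T y|, one value per voxel
    t_p = np.percentile(relevance, percentile)  # keep the top (100 - percentile)%
    return np.where(relevance >= t_p)[0]

# e.g. keep the top 20% of voxels, then solve the (much smaller) problem:
# support = screen_voxels(X, y, percentile=80.); X_small = X[:, support]
```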

20 2 Our screening heuristic (cont.)
See [DOHMATOB 2015 (PRNI)] for a more detailed exposition of the speedup heuristics developed.

21 2 Automatic model selection via cross-validation
Regularization parameters: $0 < \alpha_L < \dots < \alpha_3 < \alpha_2 < \alpha_1 = \alpha_{\max}$; mixing constants: $0 < \rho_M < \dots < \rho_3 < \rho_2 < \rho_1 \le 1$. Thus an $L \times M$ grid to search over for the best parameters:
$(\alpha_1, \rho_1)$ $(\alpha_1, \rho_2)$ $(\alpha_1, \rho_3)$ ... $(\alpha_1, \rho_M)$
$(\alpha_2, \rho_1)$ $(\alpha_2, \rho_2)$ $(\alpha_2, \rho_3)$ ... $(\alpha_2, \rho_M)$
$(\alpha_3, \rho_1)$ $(\alpha_3, \rho_2)$ $(\alpha_3, \rho_3)$ ... $(\alpha_3, \rho_M)$
...
$(\alpha_L, \rho_1)$ $(\alpha_L, \rho_2)$ $(\alpha_L, \rho_3)$ ... $(\alpha_L, \rho_M)$
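Building such a grid explicitly (illustrative values; in practice $\alpha_{\max}$ is computed from the data):

```python
import itertools
import numpy as np

alphas = np.logspace(0, -3, 4)      # alpha_1 = alpha_max > ... > alpha_L
rhos = [0.25, 0.5, 0.75, 1.0]       # rho = 1 recovers the Lasso
grid = list(itertools.product(alphas, rhos))  # L*M candidate (alpha, rho) pairs
```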

22 2 Automatic model selection via cross-validation (cont.)
The final model uses the average of the per-fold best weights maps (bagging). This bagging strategy yields more stable and robust weights maps.
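A minimal stand-in for this select-then-average strategy, using scikit-learn's `Lasso` in place of the SpaceNet estimator (the real implementation lives in Nilearn):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.randn(100, 500)                 # n scans x p voxels (synthetic)
w_true = np.zeros(500)
w_true[:10] = 1.
y = X @ w_true + .1 * rng.randn(100)

alphas = np.logspace(-3, 0, 10)         # grid of regularization strengths
fold_weights = []
for train, test in KFold(n_splits=5).split(X):
    best_score, best_w = -np.inf, None
    for alpha in alphas:                # grid-search the penalty on this fold
        model = Lasso(alpha=alpha).fit(X[train], y[train])
        score = model.score(X[test], y[test])
        if score > best_score:
            best_score, best_w = score, model.coef_
    fold_weights.append(best_w)

# Final map: average of the per-fold winners (bagging).
w_bagged = np.mean(fold_weights, axis=0)
```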

23 3 Some experimental results

24 3 Weights: SpaceNet versus SVM
Faces vs objects classification on [Haxby 2001]. [Figure: weights maps for Smooth-Lasso, TV-L1, and SVM.]

25 3 Classification scores: SpaceNet versus SVM

26 3 Concluding remarks
SpaceNet enforces both sparsity and structure, leading to better prediction / classification scores and more interpretable brain maps. The code runs in 15 minutes on simple datasets, and 30 minutes on very difficult ones.
27 3 Concluding remarks (cont.)
In the next release, SpaceNet will feature as part of Nilearn [Abraham et al. 2014].

28 3 Why do $X^Ty$ maps give a good relevance measure?
In an orthogonal design, the least-squares solution is $\hat{w}_{LS} = (X^TX)^{-1}X^Ty = I^{-1}X^Ty = X^Ty$. (Intuition) $X^Ty$ bears some information on the optimal solution even for general $X$.
29 3 (cont.) Marginal screening: set $S$ := indices of the top $k$ voxels $j$ in terms of $|X_j^Ty|$. In [Lee 2014], $k \le \min(n, p)$, so that $\hat{w}_{LS} \approx (X_S^TX_S)^{-1}X_S^Ty$. We don't require the invertibility condition $k \le \min(n, p)$; our spatial regularization will do the rest!
30 3 (cont.) Lots of screening rules out there: [El Ghaoui 2010, Liu 2014, Wang 2015, Tibshirani 2010, Fercoq 2015].
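A quick numerical check of the orthogonal-design identity, on synthetic data:

```python
import numpy as np

rng = np.random.RandomState(0)
Q, _ = np.linalg.qr(rng.randn(50, 20))   # design with orthonormal columns: Q^T Q = I
w_true = rng.randn(20)
y = Q @ w_true                            # noiseless, for an exact check

w_ls = np.linalg.lstsq(Q, y, rcond=None)[0]
assert np.allclose(w_ls, Q.T @ y)         # X^T y equals the least-squares solution
```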
