Divide and Conquer Kernel Ridge Regression
1 Divide and Conquer Kernel Ridge Regression
Yuchen Zhang, John Duchi, Martin Wainwright
University of California, Berkeley
COLT 2013
2 Problem set-up
Goal: solve the following problem:
    minimize E[(f(x) − y)²]  subject to f ∈ H,
where (x, y) is sampled from a joint distribution P, and H is a Reproducing Kernel Hilbert Space (RKHS).
H is defined by a kernel function k : X × X → R. Formally:
    H = { f : f = Σ_{i=1}^∞ α_i k(x_i, ·), x_i ∈ X }.
3 Kernel trick review
1 Linear regression f(x) = θᵀx only fits linear problems.
2 For non-linear problems, the trick is to map x onto a high-dimensional feature space x ↦ φ(x), then learn a model f(x) = θᵀφ(x), so that f is a non-linear function of x.
4 Kernel trick review (continued)
3 Usually, φ(x) is an infinite-dimensional vector or is expensive to compute. We can reformulate the learning algorithm so that the input vector enters only through the inner product k(x, x′) = φ(x)ᵀφ(x′).
4 k is called the kernel function and should be easy to compute. Examples:
    Polynomial kernel: k(x, x′) = (1 + xᵀx′)^d.
    Gaussian kernel: k(x, x′) = exp(−‖x − x′‖² / (2σ²)).
    Sobolev kernel on R¹: k(x, x′) = 1 + min(x, x′).
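The three example kernels above are straightforward to write down. A minimal numpy sketch (the function names are ours, not from the talk):

```python
import numpy as np

def polynomial_kernel(x, z, d=3):
    """Polynomial kernel k(x, x') = (1 + x^T x')^d."""
    return (1.0 + x @ z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def sobolev_kernel(x, z):
    """First-order Sobolev kernel on R^1: k(x, x') = 1 + min(x, x')."""
    return 1.0 + min(x, z)
```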
5 Kernel ridge regression
Given N samples (x₁, y₁), …, (x_N, y_N), we want to compute the empirical minimizer
    f̂ = argmin_{f ∈ H} (1/N) Σ_{i=1}^N (f(x_i) − y_i)² + λ‖f‖²_H
as an estimate of f* = argmin_{f ∈ H} E[(f(x) − y)²].
This minimization problem has a closed-form solution:
    f̂ = Σ_{i=1}^N α_i k(x_i, ·),  where α = (K + λN I)⁻¹ y.
K is the N × N kernel matrix defined by K_ij = k(x_i, x_j).
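The closed-form solution translates directly into a few lines of numpy. A sketch with our own helper names; it solves the linear system rather than forming the inverse explicitly:

```python
import numpy as np

def krr_fit(X, y, kernel, lam):
    """Solve (K + lam*N*I) alpha = y for the KRR coefficients alpha."""
    N = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    return np.linalg.solve(K + lam * N * np.eye(N), y)

def krr_predict(X_train, alpha, kernel, x):
    """Evaluate f_hat(x) = sum_i alpha_i k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))
```

Using `np.linalg.solve` avoids the numerical instability of an explicit inverse, but the cost is still cubic in N, which motivates what follows.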
6 Think about large datasets
The matrix inversion α = (K + λN I)⁻¹ y takes O(N³) time and O(N²) memory, which can be prohibitively expensive when N is large.
Fast approaches to computing kernel ridge regression:
1 Low-rank matrix approximation:
    Kernel PCA.
    Incomplete Cholesky decomposition.
    Nyström sampling.
2 Iterative optimization algorithms:
    Gradient descent.
    Conjugate gradient methods.
However, none of these methods has been shown to achieve the same level of accuracy as the exact algorithm.
7 Our main idea
Only keep the diagonal blocks of the kernel matrix, so that the matrix inversion is fast. First randomly shuffle the rows and columns of K (equivalently, the samples), then block-diagonalize.
[figure: a 6 × 6 block kernel matrix (K_ij), its randomly shuffled version, and the block-diagonal matrix obtained by keeping only the diagonal blocks]
8 Fast kernel ridge regression (Fast-KRR)
We propose a divide-and-conquer approach:
1 Divide the set of samples {(x₁, y₁), …, (x_N, y_N)} evenly and uniformly at random into m disjoint subsets S₁, …, S_m ⊂ X × R.
2 For each i = 1, 2, …, m, compute the local KRR estimate
    f̂_i := argmin_{f ∈ H} (1/|S_i|) Σ_{(x,y) ∈ S_i} (f(x) − y)²  [local risk]  + λ‖f‖²_H  [under-regularized].
3 Average the local estimates and output f̄ = (1/m) Σ_{i=1}^m f̂_i.
Computation time: O(N³/m²); memory: O(N²/m²).
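The three steps above can be sketched in a few lines of numpy. This is our own serial sketch (function names are assumptions); note the m local fits are embarrassingly parallel:

```python
import numpy as np

def fast_krr_fit(X, y, kernel, lam, m, seed=0):
    """Divide-and-conquer KRR: randomly partition the N samples into
    m subsets and fit a local KRR estimate on each."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    models = []
    for block in np.array_split(perm, m):
        Xb, yb = X[block], y[block]
        n = len(Xb)
        K = np.array([[kernel(a, b) for b in Xb] for a in Xb])
        # Each subset uses the same (under-regularized) lambda.
        alpha = np.linalg.solve(K + lam * n * np.eye(n), yb)
        models.append((Xb, alpha))
    return models

def fast_krr_predict(models, kernel, x):
    """Average the m local predictions: f_bar(x) = (1/m) sum_i f_i(x)."""
    preds = [sum(a * kernel(xi, x) for a, xi in zip(alpha, Xb))
             for Xb, alpha in models]
    return float(np.mean(preds))
```

With m = 1 this reduces to exact KRR; each local solve touches only an n × n system with n = N/m, which is where the O(N³/m²) time and O(N²/m²) memory come from.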
9 Theoretical result
Theorem. With m splits, Fast-KRR achieves the mean squared error
    E[‖f̄ − f*‖²₂] ≤ C ( λ‖f*‖²_H [squared bias] + γ(λ)/N [variance] ) + T(λ, m).
λ‖f*‖²_H is the bias introduced by regularization.
T(λ, m) becomes a higher-order, negligible term when m is below the threshold m ≲ N/γ²(λ).
γ(λ) is the effective dimensionality: let μ₁ ≥ μ₂ ≥ … be the sequence of eigenvalues in kernel k's eigen-expansion; then
    γ(λ) = Σ_{k=1}^∞ μ_k / (λ + μ_k).
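The effective dimensionality is easy to evaluate numerically given (a truncation of) the eigenvalue sequence. A small sketch; the geometric eigendecay below is an illustrative assumption of ours, not a claim from the talk:

```python
import numpy as np

def effective_dimensionality(mu, lam):
    """gamma(lam) = sum_k mu_k / (lam + mu_k) over kernel eigenvalues mu_k."""
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(mu / (lam + mu)))

# Illustrative eigendecay mu_k = 2^{-k} (truncated; an assumption for demo).
mu = 2.0 ** -np.arange(1, 60)
```

As λ shrinks, each ratio μ_k/(λ + μ_k) grows toward 1, so γ(λ) increases: weaker regularization effectively "uses" more eigendirections, which in turn lowers the admissible number of partitions m ≲ N/γ²(λ).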
10 Apply to specific kernels
Corollary. For a polynomial kernel, if m ≤ cN / log N, then
    E[‖f̄ − f*‖²₂] = O(1/N)  (minimax optimal rate).
Time: O(N³) → O(N log² N). Space: O(N²) → O(log² N).
Corollary. For a Gaussian kernel, if m ≤ cN / log² N, then
    E[‖f̄ − f*‖²₂] = O(log N / N)  (minimax optimal rate).
Time: O(N³) → O(N log⁴ N). Space: O(N²) → O(log⁴ N).
11 Apply to specific kernels (continued)
Corollary. For a Sobolev kernel of smoothness ν, if m ≤ c N^{(2ν−1)/(2ν+1)} / log N, then
    E[‖f̄ − f*‖²₂] = O(N^{−2ν/(2ν+1)})  (minimax optimal rate).
Time: O(N³) → O(N^{(2ν+5)/(2ν+1)} log² N). Space: O(N²) → O(N^{4/(2ν+1)} log² N).
12 Simulation study
Data (x, y) is generated by y = min(x, 1 − x) + ε, where ε ~ N(0, 0.2).
[figure: scatter plot of y against x]
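The simulated regression problem is simple to reproduce. A sketch; the slide's N(0, 0.2) leaves the second argument ambiguous, and we read 0.2 as the standard deviation (an assumption):

```python
import numpy as np

def make_data(N, seed=0):
    """x ~ Uniform[0, 1], y = min(x, 1 - x) + eps.
    eps is Gaussian noise; we take std = 0.2 (reading of N(0, 0.2) is assumed)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=N)
    y = np.minimum(x, 1.0 - x) + rng.normal(0.0, 0.2, size=N)
    return x, y
```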
13 Compare Fast-KRR and exact KRR
We use a Sobolev kernel of smoothness 1 to fit the data.
[figure: mean squared error against total number of samples N, for m = 1, 4, 16, 64]
Fast-KRR's performance is very close to exact KRR for m ≤ 16.
14 Threshold for data partitioning
Mean squared error is plotted for varied choices of m.
[figure: mean squared error against log(# of partitions)/log(# of samples), for N = 256, 512, 1024, 2048, 4096, …]
As long as m ≤ N^0.45, the accuracy is not hurt.
15 Conclusion
We propose a divide-and-conquer approach for kernel ridge regression that leads to a substantial reduction in computation time and memory.
The proposed algorithm achieves the optimal convergence rate for the full sample size N, as long as the partition number m is not too large.
As concrete examples, our theory guarantees that m may grow polynomially in N for Sobolev spaces, and nearly linearly in N for finite-rank kernels and Gaussian kernels.
So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal
More informationA Polygonal Line Algorithm for Constructing Principal Curves
A Polygonal Line Algorithm for Constructing s Balázs Kégl, Adam Krzyżak Dept. of Computer Science Concordia University 45 de Maisonneuve Blvd. W. Montreal, Canada H3G M8 keglcs.concordia.ca krzyzakcs.concordia.ca
More informationCPSC 340: Machine Learning and Data Mining. Robust Regression Fall 2015
CPSC 340: Machine Learning and Data Mining Robust Regression Fall 2015 Admin Can you see Assignment 1 grades on UBC connect? Auditors, don t worry about it. You should already be working on Assignment
More informationFeature selection. Term 2011/2012 LSI - FIB. Javier Béjar cbea (LSI - FIB) Feature selection Term 2011/ / 22
Feature selection Javier Béjar cbea LSI - FIB Term 2011/2012 Javier Béjar cbea (LSI - FIB) Feature selection Term 2011/2012 1 / 22 Outline 1 Dimensionality reduction 2 Projections 3 Attribute selection
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationSecond Order SMO Improves SVM Online and Active Learning
Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms
More informationTransductive and Inductive Methods for Approximate Gaussian Process Regression
Transductive and Inductive Methods for Approximate Gaussian Process Regression Anton Schwaighofer 1,2 1 TU Graz, Institute for Theoretical Computer Science Inffeldgasse 16b, 8010 Graz, Austria http://www.igi.tugraz.at/aschwaig
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationContents. I The Basic Framework for Stationary Problems 1
page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other
More informationAutoencoder Using Kernel Method
2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) Banff Center, Banff, Canada, October 5-8, 2017 Autoencoder Using Kernel Method Yan Pei Computer Science Division University of
More informationSupport Vector Machines
Support Vector Machines About the Name... A Support Vector A training sample used to define classification boundaries in SVMs located near class boundaries Support Vector Machines Binary classifiers whose
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationKernel spectral clustering: model representations, sparsity and out-of-sample extensions
Kernel spectral clustering: model representations, sparsity and out-of-sample extensions Johan Suykens and Carlos Alzate K.U. Leuven, ESAT-SCD/SISTA Kasteelpark Arenberg B-3 Leuven (Heverlee), Belgium
More informationClustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015
Clustering Subhransu Maji CMPSCI 689: Machine Learning 2 April 2015 7 April 2015 So far in the course Supervised learning: learning with a teacher You had training data which was (feature, label) pairs
More informationCS8803: Statistical Techniques in Robotics Byron Boots. Predicting With Hilbert Space Embeddings
CS8803: Statistical Techniques in Robotics Byron Boots Predicting With Hilbert Space Embeddings 1 HSE: Gram/Kernel Matrices bc YX = 1 N bc XX = 1 N NX (y i )'(x i ) > = 1 N Y > X 2 R 1 1 i=1 NX i=1 '(x
More informationConvex or non-convex: which is better?
Sparsity Amplified Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York March 7 / 4 Convex or non-convex: which is better? (for sparse-regularized
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Conjugate Direction Methods Barnabás Póczos & Ryan Tibshirani Conjugate Direction Methods 2 Books to Read David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming Nesterov:
More informationmodel order p weights The solution to this optimization problem is obtained by solving the linear system
CS 189 Introduction to Machine Learning Fall 2017 Note 3 1 Regression and hyperparameters Recall the supervised regression setting in which we attempt to learn a mapping f : R d R from labeled examples
More informationLab 2: Support vector machines
Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2
More informationChap.12 Kernel methods [Book, Chap.7]
Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first
More informationStanford University. A Distributed Solver for Kernalized SVM
Stanford University CME 323 Final Project A Distributed Solver for Kernalized SVM Haoming Li Bangzheng He haoming@stanford.edu bzhe@stanford.edu GitHub Repository https://github.com/cme323project/spark_kernel_svm.git
More informationModule 4. Non-linear machine learning econometrics: Support Vector Machine
Module 4. Non-linear machine learning econometrics: Support Vector Machine THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction When the assumption of linearity
More information