random fourier features for kernel ridge regression: approximation bounds and statistical guarantees
Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh
Tel Aviv University, EPFL, and MIT. ICML
overview

Our Contributions:
- Analyze the random Fourier features method (Rahimi & Recht '07) for kernel approximation using leverage score-based techniques.
- Concrete: introduce a new sampling distribution that gives statistical guarantees for kernel ridge regression when used to approximate the Gaussian kernel.
- High level: we hope that Fourier leverage scores will have further applications in kernel approximation, function approximation, and sparse Fourier transform methods.
kernel approximation

Kernel methods are expensive.
- Even writing down the kernel matrix $K$ requires $\Omega(n^2)$ time.
- Other operations require even more: a single iteration of a linear system solver takes $\Omega(n^2)$ time.
- For $n = 100{,}000$, $K$ has 10 billion entries, which takes 80 GB of storage if each entry is a double.
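The storage figure above is simple arithmetic; a quick check:

```python
n = 100_000
entries = n ** 2               # number of entries in the dense kernel matrix K
bytes_total = entries * 8      # 8 bytes per double-precision entry
print(entries)                 # 10_000_000_000 entries
print(bytes_total / 1e9)       # 80.0 GB
```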
solution: low-rank approximation

Employ the classic solution: a low-rank approximation $K \approx ZZ^T$ with a tall-and-thin factor $Z$ with $n$ rows and $s \ll n$ columns.
- Storing $Z$ uses $O(ns)$ space and computing $ZZ^T x$ takes $O(ns)$ time.
- Orthogonalization, eigendecomposition, and pseudo-inversion of $ZZ^T$ all take just $O(ns^2)$ time.
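A minimal sketch of why the factored form is cheap (the factor $Z$ here is an arbitrary random matrix, purely illustrative): applying $ZZ^T$ to a vector as $Z(Z^T x)$ uses two thin matrix-vector products in $O(ns)$ time and never forms the $n \times n$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 1000, 20
Z = rng.standard_normal((n, s))   # low-rank factor: K is approximated by Z Z^T
x = rng.standard_normal(n)

# O(ns) route: two thin matrix-vector products; no n x n matrix is formed.
y_fast = Z @ (Z.T @ x)

# O(n^2) route, for comparison only.
y_slow = (Z @ Z.T) @ x

print(np.allclose(y_fast, y_slow))  # True
```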
efficient low-rank approximation?

Low-rank approximation is itself an expensive task.
- Optimal low-rank approximation via a direct eigendecomposition, or even approximation via Krylov subspace methods, is out of the question, since these at least require fully forming $K$.
- Many faster methods have been studied: incomplete Cholesky factorization (Fine & Scheinberg '02, Bach & Jordan '02), entrywise sampling (Achlioptas, McSherry, & Schölkopf '01), Nyström approximation (Williams & Seeger '01), and random Fourier features (Rahimi & Recht '07).
random fourier features

Rahimi & Recht, NIPS '07:
- For any shift-invariant kernel $k(x_i, x_j) = k(x_i - x_j)$, let $p(\cdot)$ be the Fourier transform of $k(\cdot)$. By Bochner's theorem, $p(\eta) \geq 0$ for all $\eta$.
- Sample $\eta_1, \ldots, \eta_s \in \mathbb{R}^d$ with probabilities proportional to $p(\eta)$.
- Set $z_i = \frac{1}{\sqrt{s}}\left[e^{2\pi i \eta_1^T x_i}, \ldots, e^{2\pi i \eta_s^T x_i}\right]$.
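A hedged sketch of this scheme for the Gaussian kernel $k(z) = e^{-\|z\|^2/2}$, whose Fourier transform is itself Gaussian, so sampling with density proportional to $p(\eta)$ amounts to drawing $\eta \sim N(0, I_d)$ (we absorb the $2\pi$ factor into the frequency variable; all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 5, 2, 40_000
X = rng.standard_normal((n, d))

# Exact Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / 2).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / 2)

# For this kernel, frequencies eta_1, ..., eta_s ~ N(0, I_d): the characteristic
# function of a standard Gaussian is exactly exp(-||z||^2 / 2).
etas = rng.standard_normal((s, d))

# Row i of Z is z_i = (1/sqrt(s)) [e^{i eta_1^T x_i}, ..., e^{i eta_s^T x_i}].
Z = np.exp(1j * X @ etas.T) / np.sqrt(s)

err = np.abs(Z @ Z.conj().T - K).max()
print(err)   # small, shrinking like 1/sqrt(s)
```

Each entry of $ZZ^*$ is an average of $s$ i.i.d. unbiased estimates of the corresponding kernel value, which is exactly the matrix-sampling view developed on the next slides.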
view as matrix sampling method

The Fourier transform $k(z) = \int_{\eta \in \mathbb{R}^d} p(\eta)\, e^{2\pi i \eta^T z}\, d\eta$ gives:

$\int_\eta \Phi_i(\eta)\, p(\eta)\, \Phi_j(\eta)^*\, d\eta = \int_\eta e^{2\pi i \eta^T (x_i - x_j)}\, p(\eta)\, d\eta = k(x_i - x_j) = K_{i,j},$

where $\Phi$ has entries $\Phi_i(\eta) = e^{2\pi i \eta^T x_i}$. Set $\bar\Phi = \Phi P^{1/2}$, with $P$ the diagonal operator with weights $p(\eta)$. So $K = \bar\Phi \bar\Phi^*$.
view as matrix sampling method

Sample $s$ scaled columns of $\bar\Phi$: set $Z^{(j)} = \frac{1}{\sqrt{s\, p(\eta)}}\, \bar\Phi(\eta)$ with probability $p(\eta)$. So $\mathbb{E}[ZZ^T] = K$.

This recovers $z_i = \frac{1}{\sqrt{s}}\left[e^{2\pi i \eta_1^T x_i}, \ldots, e^{2\pi i \eta_s^T x_i}\right]$ for $\eta_1, \ldots, \eta_s$ sampled according to $p(\eta)$.
view as matrix sampling method

$Z$ is a column sample of $\bar\Phi = \Phi P^{1/2}$. Columns are sampled with probability $p(\eta)$, i.e., proportional to their squared column norms.

It is well known from work on randomized methods in linear algebra that there are better sampling probabilities (in both theory and practice): the column leverage scores. This was also noted by Bach '17, and is implicit in Rudi et al.
comparison of approximation guarantees

Column norm sampling: $s = \tilde{O}(d/\epsilon^2)$ samples ensure that $(ZZ^T)_{i,j} = K_{i,j} \pm \epsilon$ for all $i, j$ with high probability [RR07].

Ridge leverage score sampling: $s = \tilde{O}(s_\lambda/\epsilon^2)$ samples give a spectral approximation:

$(1-\epsilon)(ZZ^T + \lambda I) \preceq K + \lambda I \preceq (1+\epsilon)(ZZ^T + \lambda I),$

where $s_\lambda = \operatorname{tr}(K(K+\lambda I)^{-1})$ is the statistical dimension.

Spectral approximation gives statistical guarantees for kernel ridge regression (this work), and approximation bounds for kernel PCA and $k$-means clustering (Cohen, Musco, Musco '16, '17).
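The statistical dimension $s_\lambda = \operatorname{tr}(K(K+\lambda I)^{-1})$ is easy to compute from the eigenvalues of $K$; a small illustration (the identity kernel matrix here is arbitrary, chosen so the answer is checkable by hand):

```python
import numpy as np

def statistical_dimension(K, lam):
    """s_lambda = tr(K (K + lam I)^{-1}) = sum_i mu_i / (mu_i + lam),
    where mu_i are the eigenvalues of the PSD matrix K."""
    mu = np.linalg.eigvalsh(K)
    return float((mu / (mu + lam)).sum())

# Sanity check: for K = I_n every eigenvalue is 1, so s_lambda = n / (1 + lam).
print(statistical_dimension(np.eye(4), 1.0))  # 2.0
```

Note that $s_\lambda \le n$ always, and it shrinks as $\lambda$ grows, which is why the $\tilde{O}(s_\lambda/\epsilon^2)$ sample bound can be far below $n$.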
how to leverage score sample

The ridge leverage score for frequency $\eta$ is given by:

$\tau_\lambda(\eta) = \Phi(\eta)^T (K + \lambda I)^{-1} \Phi(\eta).$

It is expensive to invert $K + \lambda I$. But even if you could do that efficiently, it is not at all clear that you could efficiently sample from the leverage score distribution.
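On a discretized frequency grid the scores can be computed directly, which also illustrates the identity that the scores sum to the statistical dimension. This is our own toy discretization for illustration, not the paper's algorithm; we use the 1-d self-dual Gaussian kernel $k(z) = e^{-\pi z^2}$, whose Fourier transform is $p(\eta) = e^{-\pi \eta^2}$.

```python
import numpy as np

x = np.array([0.0, 0.3, 0.7])            # 1-d data points
lam = 0.1

# Discretized frequency grid and kernel Fourier transform p(eta).
etas = np.linspace(-6, 6, 1201)
d_eta = etas[1] - etas[0]
p = np.exp(-np.pi * etas ** 2)

# Weighted feature matrix: column l is sqrt(p(eta_l) d_eta) * e^{2 pi i eta_l x}.
Phi_bar = np.sqrt(p * d_eta) * np.exp(2j * np.pi * np.outer(x, etas))
K = Phi_bar @ Phi_bar.conj().T           # discretized kernel matrix

# tau_lambda(eta_l) = Phi_bar(eta_l)^* (K + lam I)^{-1} Phi_bar(eta_l) per column.
M = np.linalg.solve(K + lam * np.eye(len(x)), Phi_bar)
tau = np.real(np.sum(Phi_bar.conj() * M, axis=0))

s_lam = np.real(np.trace(np.linalg.solve(K + lam * np.eye(len(x)), K)))
print(np.allclose(tau.sum(), s_lam))     # scores sum to the statistical dimension
```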
bounding the fourier leverage scores

Main goal: get a handle on the Fourier ridge leverage scores for common kernels by upper bounding them with simple distributions.

1. Improve random Fourier features.
2. Bound the statistical dimension by the sum of the leverage scores.
3. Connections with sparse Fourier transforms, Fourier interpolation, and other problems.
alternative leverage score characterization

The ridge leverage score $\tau_\lambda(\eta) = \Phi(\eta)^T (K + \lambda I)^{-1} \Phi(\eta)$ also equals:

$\tau_\lambda(\eta) = \min_y\; \lambda^{-1} \left\|\Phi y - \Phi(\eta)\right\|_2^2 + \|y\|_2^2.$

- Intuition: $\tau_\lambda(\eta)$ is small iff there exists a function $y : \mathbb{R}^d \to \mathbb{C}$ with low energy ($\|y\|_2^2$ small) whose ($\sqrt{p(\eta)}$-weighted) Fourier transform is close to the frequency $e^{2\pi i x_j^T \eta}$ at each data point.
- $y$ reconstructs frequency $\eta$ from other frequencies. The easier it is to reconstruct, the less important it is to sample.
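The two characterizations can be checked numerically on a discretized frequency grid, since the variational problem is a ridge regression with closed-form minimizer $y^* = (A^*A + \lambda I)^{-1} A^* b$. The discretization and the self-dual Gaussian kernel $k(z) = e^{-\pi z^2}$ below are our own toy choices:

```python
import numpy as np

x = np.array([0.0, 0.3, 0.7])            # 1-d data
lam = 0.1
etas = np.linspace(-6, 6, 401)           # discretized frequencies
d_eta = etas[1] - etas[0]
p = np.exp(-np.pi * etas ** 2)           # Fourier transform of k(z) = e^{-pi z^2}

A = np.sqrt(p * d_eta) * np.exp(2j * np.pi * np.outer(x, etas))  # weighted Phi (n x m)
K = A @ A.conj().T
b = A[:, 200]                            # weighted column at eta = 0

# Quadratic-form definition: tau = b^* (K + lam I)^{-1} b.
tau_qf = np.real(b.conj() @ np.linalg.solve(K + lam * np.eye(3), b))

# Variational definition: minimize lam^{-1} ||A y - b||^2 + ||y||^2,
# whose minimizer is y* = (A^* A + lam I)^{-1} A^* b.
m = A.shape[1]
y = np.linalg.solve(A.conj().T @ A + lam * np.eye(m), A.conj().T @ b)
resid = A @ y - b
tau_var = np.real(resid.conj() @ resid) / lam + np.real(y.conj() @ y)

print(np.isclose(tau_qf, tau_var))       # True: the two definitions agree
```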
frequency reconstruction for bounded data

Assume the data points are 1-dimensional and bounded: $x_1, \ldots, x_n \in [-\delta, \delta]$.
- One possibility is to choose $y$ whose ($\sqrt{p(\eta)}$-weighted) Fourier transform equals $e^{2\pi i x \eta}$ for all $x \in [-\delta, \delta]$.
- Achieved by the shifted sinc function weighted by $1/\sqrt{p(\eta)}$.

[Figure: the target function for bounding the leverage score $\tau_\lambda(\eta)$ — a pure wave with frequency $\eta$ (real component shown) — and a candidate test function $\mathrm{sinc}(2\delta[\xi - \eta])$ that reconstructs the target.]
improved test function

Unfortunately, the sinc function falls off too slowly.
- For the Gaussian kernel, the $\frac{1}{\sqrt{p(\eta)}} = e^{\eta^2/4}$ weighting grows faster than $\mathrm{sinc}(2\delta\eta) = \frac{\sin(2\delta\eta)}{\eta}$ falls off. So $\|y\|_2$ is unbounded.
- Solution: dampen the sinc by multiplying it with a Gaussian, which keeps the Fourier transform nearly identical.

[Figure: a smoothed approximation to the target function — the sinc and Gaussian components, and the dampened test function $g_\eta(\xi) = y_\eta(\xi)\sqrt{p(\xi)}$, which has lower energy.]
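A quick numerical illustration of this failure and its fix, using the slide's weighting $e^{\xi^2/4}$ (the grid and damping width are our own illustrative choices): the inverse-weighted sinc blows up at large frequencies, while multiplying the sinc by a Gaussian makes the weighted test function decay.

```python
import numpy as np

delta = 1.0
xi = np.linspace(0.1, 40, 4000)                   # frequency grid (avoiding xi = 0)
inv_weight = np.exp(xi ** 2 / 4)                  # 1/sqrt(p(xi)) for the Gaussian kernel
sinc = np.sin(2 * delta * xi) / xi                # the slide's sinc(2 delta xi)

plain = np.abs(sinc * inv_weight)                 # undampened: sinc alone
damped = np.abs(sinc * np.exp(-xi ** 2 / 2) * inv_weight)  # sinc times a Gaussian

# The undampened weighted function grows without bound; the dampened one decays.
print(plain[-1] > plain[0], damped[-1] < 1e-10)
```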
ultimate gaussian kernel leverage score bound

Upshot: it is easy to sample from an approximate leverage score distribution for the Gaussian kernel with $x_1, \ldots, x_n \in [-\delta, \delta]^d$:

$\tau_\lambda(\eta) \lesssim \begin{cases} \tilde{O}(\delta^d) & \text{when } \|\eta\| \leq \sqrt{\log(n/\lambda)}, \\ p(\eta) = e^{-\|\eta\|_2^2/2} & \text{otherwise.} \end{cases}$

[Figure: relative sampling probability as a function of frequency for the ridge leverage function, classical random Fourier features, and the proposed modified random Fourier features.]
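A sketch of sampling from such a capped distribution, in 1-d with the self-dual Gaussian kernel $k(z) = e^{-\pi z^2}$: the proposal mixes a uniform density over low frequencies $[-R, R]$ with the kernel's own Gaussian tail, and importance weights $\sqrt{p(\eta)/(s\,q(\eta))}$ keep the estimator unbiased. The mixture weights and the cutoff $R$ (standing in for the $\sqrt{\log(n/\lambda)}$ threshold) are our own illustrative choices, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([-0.4, 0.0, 0.25, 0.5])      # data in [-delta, delta]
s, R = 40_000, 3.0                        # R plays the role of the frequency cutoff

def p(eta):                               # Fourier transform of k(z) = exp(-pi z^2)
    return np.exp(-np.pi * eta ** 2)

def q(eta):                               # proposal: half uniform on [-R, R], half Gaussian tail
    return 0.5 / (2 * R) * (np.abs(eta) <= R) + 0.5 * p(eta)

coin = rng.random(s) < 0.5
etas = np.where(coin,
                rng.uniform(-R, R, s),                     # low-frequency component
                rng.normal(0, 1 / np.sqrt(2 * np.pi), s))  # density exactly p(eta)

# Importance-weighted features: column j is sqrt(p / (s q)) * e^{2 pi i eta_j x}.
Z = np.exp(2j * np.pi * np.outer(x, etas)) * np.sqrt(p(etas) / (s * q(etas)))

K = np.exp(-np.pi * (x[:, None] - x[None, :]) ** 2)        # exact kernel matrix
print(np.abs(Z @ Z.conj().T - K).max())                    # small sampling error
```

Since $q \ge p/2$ everywhere, the importance weights are bounded by 2, so no single sample can dominate the estimate.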
experimental results

Example of approximating a synthetic "wiggly" function. [Figure: the data, the true function, the exact KRR estimator, and the MRF and CRF estimators.] CRF = classic random Fourier features (column norm sampling); MRF = our modified sampling distribution.
Questions?
Announcements Edge and Corner Detection HW3 assigned CSE252A Lecture 13 Efficient Implementation Both, the Box filter and the Gaussian filter are separable: First convolve each row of input image I with
More informationDiscriminative Clustering for Image Co-segmentation
Discriminative Clustering for Image Co-segmentation Armand Joulin 1,2 Francis Bach 1,2 Jean Ponce 2 1 INRIA 2 Ecole Normale Supérieure January 15, 2010 Introduction Introduction problem: dividing simultaneously
More informationAnalyzing Stochastic Gradient Descent for Some Non- Convex Problems
Analyzing Stochastic Gradient Descent for Some Non- Convex Problems Christopher De Sa Soon at Cornell University cdesa@stanford.edu stanford.edu/~cdesa Kunle Olukotun Christopher Ré Stanford University
More informationLTI Thesis Defense: Riemannian Geometry and Statistical Machine Learning
Outline LTI Thesis Defense: Riemannian Geometry and Statistical Machine Learning Guy Lebanon Motivation Introduction Previous Work Committee:John Lafferty Geoff Gordon, Michael I. Jordan, Larry Wasserman
More informationEdge and corner detection
Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements
More informationA Weighted Kernel PCA Approach to Graph-Based Image Segmentation
A Weighted Kernel PCA Approach to Graph-Based Image Segmentation Carlos Alzate Johan A. K. Suykens ESAT-SCD-SISTA Katholieke Universiteit Leuven Leuven, Belgium January 25, 2007 International Conference
More informationProblem Set 4. Assigned: March 23, 2006 Due: April 17, (6.882) Belief Propagation for Segmentation
6.098/6.882 Computational Photography 1 Problem Set 4 Assigned: March 23, 2006 Due: April 17, 2006 Problem 1 (6.882) Belief Propagation for Segmentation In this problem you will set-up a Markov Random
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More informationGRAPH LEARNING UNDER SPARSITY PRIORS. Hermina Petric Maretic, Dorina Thanou and Pascal Frossard
GRAPH LEARNING UNDER SPARSITY PRIORS Hermina Petric Maretic, Dorina Thanou and Pascal Frossard Signal Processing Laboratory (LTS4), EPFL, Switzerland ABSTRACT Graph signals offer a very generic and natural
More informationCommunication Efficient Distributed Kernel Principal Component Analysis
Communication Efficient Distributed Kernel Principal Component Analysis Maria-Florina Balcan School of Computer Science Carnegie Mellon University ninamf@cs.cmu.edu David Woodruff Almaden Research Center
More informationOne-class Problems and Outlier Detection. 陶卿 中国科学院自动化研究所
One-class Problems and Outlier Detection 陶卿 Qing.tao@mail.ia.ac.cn 中国科学院自动化研究所 Application-driven Various kinds of detection problems: unexpected conditions in engineering; abnormalities in medical data,
More informationarxiv: v4 [cs.lg] 13 Feb 2016
Communication Efficient Distributed Kernel Principal Component Analysis Maria-Florina Balcan, Yingyu Liang, Le Song, David Woodruff, Bo Xie arxiv:1503.06858v4 [cs.lg] 13 Feb 016 Abstract Kernel Principal
More informationSubsampling for Ridge Regression via Regularized Volume Sampling
Subsampling for Ridge Regression via Regularized Volume Sampling Michał Dereziński Department of Computer Science University of California Santa Cruz mderezin@ucsc.edu Manfred K. Warmuth Department of
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationA Local Learning Approach for Clustering
A Local Learning Approach for Clustering Mingrui Wu, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics 72076 Tübingen, Germany {mingrui.wu, bernhard.schoelkopf}@tuebingen.mpg.de Abstract
More informationParallel Hybrid Monte Carlo Algorithms for Matrix Computations
Parallel Hybrid Monte Carlo Algorithms for Matrix Computations V. Alexandrov 1, E. Atanassov 2, I. Dimov 2, S.Branford 1, A. Thandavan 1 and C. Weihrauch 1 1 Department of Computer Science, University
More informationNumerical Analysis I - Final Exam Matrikelnummer:
Dr. Behrens Center for Mathematical Sciences Technische Universität München Winter Term 2005/2006 Name: Numerical Analysis I - Final Exam Matrikelnummer: I agree to the publication of the results of this
More informationBlind Identification of Graph Filters with Multiple Sparse Inputs
Blind Identification of Graph Filters with Multiple Sparse Inputs Santiago Segarra, Antonio G. Marques, Gonzalo Mateos & Alejandro Ribeiro Dept. of Electrical and Systems Engineering University of Pennsylvania
More informationSeismic data interpolation beyond aliasing using regularized nonstationary autoregression a
Seismic data interpolation beyond aliasing using regularized nonstationary autoregression a a Published in Geophysics, 76, V69-V77, (2011) Yang Liu, Sergey Fomel ABSTRACT Seismic data are often inadequately
More informationAn Empirical Comparison of Sampling Techniques for Matrix Column Subset Selection
An Empirical Comparison of Sampling Techniques for Matrix Column Subset Selection Yining Wang and Aarti Singh Machine Learning Department, Carnegie Mellonn University {yiningwa,aartisingh}@cs.cmu.edu Abstract
More informationGeometric Registration for Deformable Shapes 3.3 Advanced Global Matching
Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical
More informationSpectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,
Spectral Clustering XIAO ZENG + ELHAM TABASSI CSE 902 CLASS PRESENTATION MARCH 16, 2017 1 Presentation based on 1. Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4
More informationModern Signal Processing and Sparse Coding
Modern Signal Processing and Sparse Coding School of Electrical and Computer Engineering Georgia Institute of Technology March 22 2011 Reason D etre Modern signal processing Signals without cosines? Sparse
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationIncremental Learning of Robot Dynamics using Random Features
Incremental Learning of Robot Dynamics using Random Features Arjan Gijsberts, Giorgio Metta Cognitive Humanoids Laboratory Dept. of Robotics, Brain and Cognitive Sciences Italian Institute of Technology
More information