Adaptive Dropout Training for SVMs
1 Adaptive Dropout Training for SVMs. Jun Zhu, joint work with Ning Chen, Jingwei Zhuo, Jianfei Chen, and Bo Zhang. Department of Computer Science and Technology, Tsinghua University. ShanghaiTech Symposium on Data Science, June 23-26, 2015.
2 Outline: Overfitting in big data; Dropout training for SVMs; Adaptive dropout rates; Big learning with Bayesian methods; Conclusions.
3 Overfitting: the bias-variance tradeoff. [Figure: risk, variance (estimation error), and bias (approximation error) plotted against the complexity of the function class.]
4 Overfitting in Big Data. Big Model + Big Data + Big/Super Cluster = Big ML. Example (Le et al., 2012): a 9-layer sparse autoencoder with local receptive fields to scale up, and with local L2 pooling and local contrast normalization for invariant features; 1B parameters (connections); trained on 10M 200x200 images with 1K machines (16K cores) for 3 days. It is able to build high-level concepts, e.g., cat faces and human bodies, and reaches 15.8% accuracy in recognizing 22K objects (a 70% relative improvement).
5 Overfitting in Big Data. Relevant information grows slower than linearly in the data size, while model capacity may grow faster than the amount of relevant information!
6 Overfitting in Big Data. Relevant information grows slower than linearly (Bialek et al., 2001).
7 Overfitting in Big Data. Regularization to prevent overfitting is increasingly important, rather than increasingly irrelevant! It is receiving increasing attention, e.g., dropout training (Hinton, 2012), along with more theoretical understanding and extensions: MCF (van der Maaten et al., 2013); logistic loss (Wager et al., 2013); generalization error (Wager et al., 2014); dropout SVM (Chen et al., 2014); adaptive dropout (Zhuo et al., 2015).
8 Amazon Reviews Classification: Positive or Negative?
"I love the deeper meaning behind this movie. I had watched it years ago when it first came out but never understood it til now. Great spillberg film"
"One of the best sci-fi/adventure movies I have ever seen. Great movie about robots and ones yearning to know ones creator. The ending will stick in your mind forever in a good way."
"a massive, manipulative tear jerker which did nothing to illuminate me on the subjects of love or parenting or relationship or science or sibling rivalry. stock characters"
"Very long. Boring stretches made this movie hard to finish"
9 Amazon Reviews Classification: Positive or Negative? Regularized empirical risk minimization. Instead of regularizing the parameters, can we incorporate knowledge directly from the data to do regularization? Idea: regularization by corrupting data.
10 Regularization by Corruptions. [Slide shows the four example reviews in an "Original Features" column next to two "Corrupted Features" columns, each holding randomly corrupted copies of the same reviews.]
11 Feature Noising Models. Define a label-invariant corrupting distribution $p(\tilde{\mathbf{x}} \mid \mathbf{x}) = \prod_{d=1}^{D} p(\tilde{x}_d \mid x_d)$, i.e., assume the corruption is independent across features. Corrupting distributions: dropout, Gaussian, Laplace, Poisson.
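As a concrete illustration (my own sketch, not from the talk), the four corrupting distributions under their common parameterizations: blank-out dropout with rate delta, additive Gaussian or Laplace noise, and Poisson resampling of counts.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, kind="dropout", delta=0.2, sigma=0.1):
    """Sample x_tilde ~ p(x_tilde | x), independently per feature."""
    if kind == "dropout":    # blank-out noise: zero each feature w.p. delta
        mask = rng.random(x.shape) >= delta
        return x * mask
    if kind == "gaussian":   # additive Gaussian noise
        return x + rng.normal(0.0, sigma, x.shape)
    if kind == "laplace":    # additive Laplace noise
        return x + rng.laplace(0.0, sigma, x.shape)
    if kind == "poisson":    # resample counts around the observed value
        return rng.poisson(x).astype(float)
    raise ValueError(kind)

x = np.array([3.0, 0.0, 1.0, 2.0])   # e.g., word counts of a review
print(corrupt(x, "dropout"), corrupt(x, "poisson"))
```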
12 Feature Noising Models. Feature noising controls over-fitting (Hinton et al., 2012). Explicit corruption: augment the data by corrupting each training example with a fixed noise distribution. Downside: this gets computationally prohibitive, unless the corruption is marginalized out analytically (implicit corruption, as in MCF), as sketched below.
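To make the explicit-vs-implicit distinction concrete, a small numerical sketch (my own illustration, assuming unbiased blank-out dropout and the quadratic loss): the average loss over many explicitly corrupted copies matches a closed-form expectation that costs no more than one ordinary loss evaluation.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.3                                  # dropout rate
x = rng.normal(size=5); y = 1.0; w = rng.normal(size=5)

# Explicit corruption: average the loss over M corrupted copies of x.
M = 200_000
mask = rng.random((M, x.size)) >= delta
x_tilde = x * mask / (1.0 - delta)           # unbiased: E[x_tilde] = x
explicit = np.mean((y - x_tilde @ w) ** 2)

# Implicit (marginalized) corruption: E[(y - w.x_tilde)^2] in closed form,
# since Var(x_tilde_d) = x_d^2 * delta / (1 - delta).
implicit = (y - w @ x) ** 2 + np.sum(w**2 * x**2) * delta / (1.0 - delta)

print(explicit, implicit)   # the two agree up to Monte Carlo error
```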
14 Regularization by Corrupting Data. Feature noising amounts to minimizing the expected loss under the corrupting distribution. Theoretical understanding: L2-regularization for additive Gaussian noise (Bishop, 1995); adaptive regularization for dropout logistic regression (Wager et al., 2013); generalization bounds (Wager et al., 2014). Empirical results in various applications: document classification (van der Maaten et al., 2013); entity recognition (Wang et al., 2013); image classification (Wang & Manning, 2013). Expected losses studied so far: quadratic loss (Bishop, 1995); exponential loss (van der Maaten et al., 2013); logistic loss (van der Maaten et al., 2013; Wager et al., 2013).
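Bishop's (1995) result is easy to verify numerically: under additive Gaussian noise $\tilde{\mathbf{x}} = \mathbf{x} + \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 I)$, the expected quadratic loss equals the clean loss plus an L2 penalty $\sigma^2 \|\mathbf{w}\|^2$. A quick Monte Carlo check (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5
x = rng.normal(size=4); y = 2.0; w = rng.normal(size=4)

eps = rng.normal(0.0, sigma, size=(500_000, x.size))
expected = np.mean((y - (x + eps) @ w) ** 2)           # E[(y - w.(x+eps))^2]
closed   = (y - w @ x) ** 2 + sigma**2 * np.sum(w**2)  # clean loss + L2 penalty

print(expected, closed)   # agree up to Monte Carlo error
```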
15 Losses in Machine Learning. [Figure comparing common loss functions.]
16 Dropout Training for Support Vector Machines. Explicit corruption for SVMs dates back to (Burges & Scholkopf, 1997). One technical challenge: the non-smoothness of the hinge loss makes the expected loss intractable to compute. Our work: develop an iteratively re-weighted least squares (IRLS) algorithm to minimize a variational bound; apply the same ideas to develop IRLS for dropout logistic regression; derive an adaptive learning rule to decide the noise level. [Chen et al., AAAI 2014; Zhuo et al., IJCAI 2015]
17 Variational Bound with Data Augmentation. Theorem: Let $\zeta_n = \ell - y_n \mathbf{w}^\top \tilde{\mathbf{x}}_n$, and let $\varphi(y_n \mid \tilde{\mathbf{x}}_n, \mathbf{w}) = \exp\{-2c \max(0, \zeta_n)\}$ be the pseudo-likelihood of the response variable for sample n. The pseudo-likelihood can be expressed as a scale-location mixture of Gaussians,
$\varphi(y_n \mid \tilde{\mathbf{x}}_n, \mathbf{w}) = \int_0^\infty \frac{1}{\sqrt{2\pi\lambda_n}} \exp\left(-\frac{(\lambda_n + c\,\zeta_n)^2}{2\lambda_n}\right) d\lambda_n,$
where $\lambda_n$ is a generalized inverse Gaussian variable. The proof follows (Polson & Scott, 2011) with some careful treatments.
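A quick numerical sanity check of this scale-mixture identity (my own sketch; the integrand is taken directly from the theorem above):

```python
import numpy as np
from scipy.integrate import quad

def mixture(zeta, c=1.0):
    """Integrate the scale-location mixture of Gaussians over lambda in (0, inf)."""
    f = lambda lam: np.exp(-(lam + c * zeta) ** 2 / (2 * lam)) / np.sqrt(2 * np.pi * lam)
    val, _ = quad(f, 0, np.inf)
    return val

for zeta in [-1.5, 0.0, 0.7]:
    print(mixture(zeta), np.exp(-2 * max(0.0, zeta)))  # the two columns should match
```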
18 Variational Bound with Data Augmentation. The expected hinge loss is $R(\mathbf{w}) = 2c \sum_n \mathbb{E}_{p(\tilde{\mathbf{x}}_n \mid \mathbf{x}_n)}[\max(0, \zeta_n)]$ (c is the regularization parameter). Using the ideas of data augmentation, we obtain a variational upper bound on this expected loss.
19 Iteratively Re-weighted Least Squares Algorithm. The variational optimization problem is solved by coordinate descent (variational EM). For $q(\lambda_n)$ (i.e., the E-step): the optimal variational distribution over each augmented variable $\lambda_n$ is a generalized inverse Gaussian.
20 Iteratively Re-weighted Least Squares Algorithm. Coordinate descent (variational EM). For $\mathbf{w}$ (i.e., the M-step): a re-weighted least squares problem under feature noising, with adaptive weights
$h_n := \mathbb{E}_q[\lambda_n^{-1}] = \frac{1}{c\sqrt{\mathbb{E}_p[\zeta_n^2]}}$
and re-weighted labels
$y_n^h = \left(\ell + \frac{1}{c\,h_n}\right) y_n.$
This reduces to the square loss by setting $h_n = \frac{1}{c}$ and $\ell = 0$.
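A rough sketch of the resulting IRLS loop (my own illustration, assuming a linear model and unbiased blank-out dropout with rate delta; the constants and the exact M-step objective follow my reading of the slides, not the paper verbatim):

```python
import numpy as np

def dropout_svm_irls(X, y, c=1.0, delta=0.2, ell=1.0, iters=20):
    """IRLS sketch for dropout-SVM; y has entries in {-1, +1}."""
    n, d = X.shape
    var = X**2 * delta / (1.0 - delta)   # per-feature dropout variance of x_tilde
    w = np.zeros(d)
    for _ in range(iters):
        # E-step: noise moments of zeta_n = ell - y_n * w.x_tilde_n.
        mean_zeta = ell - y * (X @ w)
        e_zeta2 = mean_zeta**2 + var @ (w**2)
        h = 1.0 / (c * np.sqrt(e_zeta2) + 1e-12)   # adaptive weights
        y_h = (ell + 1.0 / (c * h)) * y            # re-weighted labels
        # M-step: weighted least squares with the dropout-variance penalty.
        A = np.eye(d) + c**2 * ((X * h[:, None]).T @ X + np.diag(h @ var))
        b = c**2 * (X.T @ (h * y_h))
        w = np.linalg.solve(A, b)
    return w

# Toy usage on separable data
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)); w_true = rng.normal(size=5)
y = np.sign(X @ w_true)
w = dropout_svm_irls(X, y)
print(np.mean(np.sign(X @ w) == y))   # training accuracy
```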
21 Variational Bound for Expected Logistic Loss. The expected logistic loss function is $R(\mathbf{w}) = \sum_n \mathbb{E}_p[\log(1 + e^{-y_n \psi_n})]$, with discriminant $\psi_n = \mathbf{w}^\top \tilde{\mathbf{x}}_n$. Theorem: Let $p(y_n \mid \tilde{\mathbf{x}}_n, \mathbf{w}) = (1 + e^{-y_n \psi_n})^{-1}$ be the pseudo-likelihood of the response variables. Then
$p(y_n \mid \tilde{\mathbf{x}}_n, \mathbf{w}) = \frac{1}{2}\, e^{y_n \psi_n / 2} \int_0^\infty e^{-\omega_n \psi_n^2 / 2}\, p(\omega_n)\, d\omega_n,$
where $\omega_n \sim \mathcal{PG}(1, 0)$ is a Polya-Gamma variable. The proof follows (Polson & Scott, 2012) with some careful treatments.
22 Variational Bound for Expected Logistic Loss. The variational optimization problem is solved by coordinate descent (variational EM). For $q(\omega_n)$ (i.e., the E-step): the optimal variational distribution is a Polya-Gamma distribution (Polson et al., 2012).
23 Iteratively Re-weighted Least Squares Algorithm. Coordinate descent (variational EM). For $\mathbf{w}$ (i.e., the M-step): a re-weighted least squares problem under feature noising, with adaptive weights
$l_n := \frac{1}{c}\,\mathbb{E}_q[\omega_n], \quad \mathbb{E}_q[\omega_n] = \frac{1}{2\sqrt{\mathbb{E}_p[\psi_n^2]}} \cdot \frac{e^{\sqrt{\mathbb{E}_p[\psi_n^2]}} - 1}{e^{\sqrt{\mathbb{E}_p[\psi_n^2]}} + 1},$
and re-weighted labels
$y_n^l = \frac{c}{2\,l_n}\, y_n.$
This reduces to the square loss by setting $l_n = \frac{c}{2}$.
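The Polya-Gamma mean driving these weights has the closed form $\tanh(\tilde{c}/2)/(2\tilde{c})$, equivalent to the ratio above. A tiny helper (my own sketch, again assuming unbiased blank-out dropout) that computes the adaptive weight from the noise moments of $\psi_n$:

```python
import numpy as np

def pg_mean(c_tilde):
    """E[omega] for omega ~ PG(1, c_tilde): tanh(c_tilde/2) / (2*c_tilde)."""
    return np.tanh(c_tilde / 2.0) / (2.0 * c_tilde)

def logistic_weight(w, x, delta, c=1.0):
    # Under unbiased dropout: E_p[psi^2] = (w.x)^2 + sum_d w_d^2 x_d^2 * delta/(1-delta).
    mean_psi = w @ x
    var_psi = np.sum(w**2 * x**2) * delta / (1.0 - delta)
    c_tilde = np.sqrt(mean_psi**2 + var_psi)
    return pg_mean(c_tilde) / c      # l_n = E_q[omega_n] / c

w = np.array([0.5, -1.0, 2.0]); x = np.array([1.0, 3.0, 0.0])
print(logistic_weight(w, x, delta=0.2))
```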
24 Comparison of Hinge / Logistic Loss under IRLS. Comparison of the hinge and logistic losses under the IRLS framework:

Loss     | Parameters | Weight update | Label update
Hinge    | c, ℓ       | h_n           | y_n^h
Logistic | c          | l_n           | y_n^l

Both losses iteratively minimize the expectation of a re-weighted quadratic loss, but differ in the update rules for the weights and the labels at each iteration. The quadratic loss is a special case with a single iteration.
25 Experiments. We compare Dropout-SVM and Dropout-Logistic with state-of-the-art models: MCF-logistic and MCF-quadratic. All our predictors use L2-regularization, with parameters set by cross-validation. Single-parameter noising model (blank-out dropout): $\tilde{x}_d = 0$ with probability $\delta$, and $\tilde{x}_d = \frac{x_d}{1 - \delta}$ with probability $1 - \delta$, for $d = 1, \dots, D$.
26 Experiment 1: Review Classification (Positive / Negative). [Figure: results on four review datasets; each panel marks the "No Corruption" baseline, and lower error is better.]
27 Experiment 1: Review Classification (Positive / Negative). Comparing explicit and implicit dropout corruption (Amazon Books; hinge loss).
28 Experiment 2: Nightmare at Test Time. In some settings, features may be randomly unobserved at test time. We experiment with this "nightmare at test time" scenario on MNIST digits: train regular and dropout classifiers on the original training set; then randomly delete features from the test images and measure classification error.
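A minimal sketch of this evaluation protocol (my own illustration; the classifier and data are placeholders, not from the talk):

```python
import numpy as np

def nightmare_error(clf, X_test, y_test, delete_frac, rng=None):
    """Delete a random fraction of test features, then measure error.

    `clf` is any trained classifier exposing predict(); X_test, y_test
    are placeholder test arrays (e.g., MNIST pixels and labels).
    """
    rng = rng or np.random.default_rng(0)
    mask = rng.random(X_test.shape) >= delete_frac   # keep each pixel w.p. 1 - frac
    preds = clf.predict(X_test * mask)
    return np.mean(preds != y_test)

# for frac in [0.0, 0.25, 0.5, 0.75]:
#     print(frac, nightmare_error(clf, X_test, y_test, frac))
```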
29 Experiment 2: Nightmare at Test Time. [Figure: classification error on test images with randomly deleted features; error grows with more test corruption, and lower is better.]
30 Adaptive Dropout Rates. A Bayesian feature noising model: $\tilde{\mathbf{x}}_i = \mathbf{x}_i \odot \boldsymbol{\delta}_i$, with $p(\boldsymbol{\delta}_i \mid \boldsymbol{\mu}) = \prod_{d=1}^{D} (1 - \mu_d)^{\delta_{id}}\, \mu_d^{1 - \delta_{id}}$. This allows various dimensions to have different dropout rates $\mu_d$, which are automatically inferred under a non-informative prior:
$\hat{\mu}_d = \frac{\sum_{i=1}^N \mathbb{I}(y_i \theta_d x_{id} < 0)}{\sum_{i=1}^N \mathbb{I}(x_{id} \neq 0)},$
where $\theta_d$ is the d-th classifier weight. Group-wise structure among the features is also allowed. [Zhuo, Zhu, & Zhang, 2015]
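In code, this closed-form update is a couple of lines (my own sketch; theta stands for the current classifier weight vector):

```python
import numpy as np

def adaptive_dropout_rates(X, y, theta):
    """mu_d = fraction of active occurrences of feature d that disagree with the label."""
    disagree = (y[:, None] * theta[None, :] * X) < 0   # I(y_i * theta_d * x_id < 0)
    active = X != 0                                    # I(x_id != 0)
    return disagree.sum(axis=0) / np.maximum(active.sum(axis=0), 1)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6)); theta = rng.normal(size=6)
y = np.sign(X @ theta)
print(adaptive_dropout_rates(X, y, theta))
```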
31 Adaptive Dropout Rates. Some results on the Amazon Books and Kitchen review data: adaptive rates can improve the performance.
32 Big Learning with Bayesian Methods. Why Bayes? Robust to overfitting; flexible in modeling; avoids (heavy) parameter tuning; generic algorithms for inference. Why not Bayes? Computationally too slow; not scalable to big data. Good news: much recent progress on scalable Bayesian methods.
33 Big Learning with Bayesian Methods. Stochastic/online methods (variational, MCMC) and distributed methods (variational, MCMC): data-parallel, graph-parallel, and model-parallel architectures. [Diagram: master/slave, map/reduce, and server/client layouts.]
37 Big Learning with Bayesian Methods. Online/stochastic learning: online Bayesian PA (Shi & Zhu, ICML 2014); stochastic subgradient MCMC (Hu et al., arXiv preprint, 2015); deep generative models (Li et al., arXiv preprint, 2015; Du et al., arXiv preprint, 2015). Distributed learning: distributed Bayesian inference (Xu et al., NIPS 2014); scalable topic graph learning (Chen et al., NIPS 2013); scalable dynamic LDA (Bhadury et al., 2015, preprint). A comprehensive survey: Big Learning with Bayesian Methods, Zhu et al., arXiv preprint.
38 Conclusions. Feature noising controls over-fitting. We developed dropout training for SVMs via an iteratively re-weighted least squares (IRLS) algorithm, applied the same ideas to develop IRLS for dropout logistic regression, and derived an adaptive update rule for the dropout levels. Future work: the kernel trick in dropout learning; Dropout-SVM in deep architectures; big learning with Bayesian methods.
39 Department of Computer Science and Technology Thank you!