Bayesian Optimization Nando de Freitas
1 Bayesian Optimization Nando de Freitas Eric Brochu, Ruben Martinez-Cantin, Matt Hoffman, Ziyu Wang,
2 More resources This talk follows the presentation in the following review paper, available from the Oxford website.
3 Outline
- Some applications
- Parametric Bayesian optimization: Beta-Bernoulli bandit models; linear bandit models; neural networks and other feature-based models
- Non-parametric Bayesian optimization: Gaussian processes; random forests
- Acquisition functions
- A huge bag of problems: hyper-parameters and robustness; optimizing acquisition functions; conditional spaces; non-stationarity; parallelization; constraints and cost sensitivity; high dimensions; multi-task / context; freeze-thaw / early stopping; unknown optimization regions; empirical hardness models and variants
4 Black-box optimization / design
5
6 Automatic machine learning [Hoffman, Shahriari & df, 2013]
7 Information extraction / concept learning "I love Silence of the Lambs. It's a scary movie." "The confit de canard was delish!" Expert tuning by Hector Garcia-Molina et al: Towards The Web Of Concepts: Extracting Concepts from Large Datasets [Ziyu Wang et al, 2014]
8 Automatic (Adaptive) Monte Carlo samplers [Wang, Mohamed & df, 2013]
9 Analytics, dynamic creative content and A/B testing [Steve Scott on Bayesian bandits at Google]
10 Animation (session, target) [Brochu, Ghosh & df; Brochu, Brochu & df, 2010] Winner of the SRC competition - SIGGRAPH
11 Tuning NP-hard problem solvers lpsolve is a mixed integer programming solver, downloaded over 40,000 times last year. It has 47 discrete parameters (choices).
12 [Denil et al 2012]
13 GP policy for tracking. Digits experiment; face experiment.
14 Hierarchical reinforcement learning High-level model-based learning for deciding when to navigate, park, pickup and dropoff passengers. Mid-level active path learning for navigating a topological map. Low-level active policy optimizer to learn control of continuous non-linear vehicle dynamics.
15 Active Path Finding in Middle Level Navigate policy generates sequence of waypoints on a topological map to navigate from a location to a destination.
16 Low-Level: Trajectory following TORCS: 3D game engine that implements complex vehicle dynamics complete with manual and automatic transmission, engine, clutch, tire, and suspension models.
17 Bayesian optimization was used to find the neural net low-level policy and the value function for the upper levels
18 Many other applications Robotics, control, reinforcement learning, SAT solvers, scheduling, planning. Configuration of ad-centers, compilers, hardware, software. Programming by optimization.
19 Outline (recap)
20 Bayesian parametric optimization
21 Beta-Bernoulli Bayesian optimization Consider the following K-armed bandit setting.
22 Beta-Bernoulli Bayesian optimization Specify a Beta prior on each arm:
23 Beta-Bernoulli Bayesian optimization
24 Thompson sampling At each iteration, draw a sample from the posterior, then pick the action with the highest expected return under that sample.
25 Thompson sampling
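The Beta-Bernoulli Thompson sampling loop above can be sketched as follows. The arm probabilities and the uniform Beta(1, 1) priors here are illustrative choices, not values from the talk.

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds, rng):
    """Thompson sampling for a K-armed Bernoulli bandit with Beta(1, 1) priors."""
    k = len(true_probs)
    alpha = np.ones(k)  # 1 + number of observed successes per arm
    beta = np.ones(k)   # 1 + number of observed failures per arm
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)   # one draw from each arm's Beta posterior
        arm = int(np.argmax(theta))     # play the arm whose draw is largest
        reward = float(rng.random() < true_probs[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
    return alpha, beta

rng = np.random.default_rng(0)
alpha, beta = thompson_bernoulli([0.3, 0.5, 0.7], 2000, rng)
pulls = alpha + beta - 2.0  # pulls per arm
```

Exploration happens automatically: an arm is played roughly as often as it is likely to be the best under the current posterior, so pulls concentrate on the best arm over time.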
26 Linear bandits Introduce correlations among the arms. Use the standard conjugate Normal-Inverse-Gamma prior.
27 Linear bandits The posterior mean and covariance are analytical. Once again, it is straightforward to do Thompson sampling.
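A minimal sketch of the analytical posterior and the Thompson step. To keep it short it assumes a known noise variance and a spherical Gaussian prior N(0, tau2 I), rather than the full Normal-Inverse-Gamma treatment from the slide.

```python
import numpy as np

def posterior(X, y, sigma2, tau2=1.0):
    """Conjugate Gaussian posterior over weights with known noise variance sigma2.

    Prior: theta ~ N(0, tau2 * I). Returns (theta_n, V_n)."""
    d = X.shape[1]
    V_n = np.linalg.inv(np.eye(d) / tau2 + X.T @ X / sigma2)  # posterior covariance
    theta_n = V_n @ (X.T @ y) / sigma2                        # posterior mean
    return theta_n, V_n

def thompson_action(arms, theta_n, V_n, rng):
    """Sample theta from the posterior; pick the arm with highest sampled reward."""
    theta = rng.multivariate_normal(theta_n, V_n)
    return int(np.argmax(arms @ theta))
```

With a tight posterior, the sampled theta is close to the mean and the greedy arm is chosen; with a diffuse posterior, the randomness in theta drives exploration.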
28 Being linear in features It is trivial to extend the linear model using features. Features can be RBFs, sinusoids, or even popular deep networks.
29 One reason for Bayes The predictive distribution at a test point x is:

p(y | x, D, σ²) = ∫ N(y | xθ, σ²) N(θ | θₙ, Vₙ) dθ = N(y | xθₙ, σ² + x Vₙ xᵀ)

The variance of this prediction, σₙ²(x), depends on two terms: the observation noise, with variance σ², and the parameter uncertainty, x Vₙ xᵀ. This latter term varies depending on how close x is to the training data D. The error bars get larger as we move away from the training points, representing increased uncertainty. By contrast, the plug-in approximation has constant-sized error bars:

p(y | x, D, σ²) ≈ ∫ N(y | xθ, σ²) δ_θ̂(θ) dθ = N(y | xθ̂, σ²)
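The effect described on this slide can be checked numerically. This toy fit (all constants illustrative) trains a Bayesian linear model on inputs near [0, 1] and compares the predictive variance inside and far outside the training range.

```python
import numpy as np

sigma2, tau2 = 0.05, 10.0  # noise variance and prior variance (illustrative)
rng = np.random.default_rng(2)
t = rng.uniform(0.0, 1.0, size=50)
X = np.column_stack([np.ones_like(t), t])  # features [1, t]
y = 2.0 + 3.0 * t + np.sqrt(sigma2) * rng.normal(size=50)

V_n = np.linalg.inv(np.eye(2) / tau2 + X.T @ X / sigma2)  # posterior covariance
theta_n = V_n @ (X.T @ y) / sigma2                        # posterior mean

def predictive_var(tq):
    """sigma_n^2(x) = sigma^2 + x V_n x^T at the test point x = [1, tq]."""
    x = np.array([1.0, tq])
    return sigma2 + x @ V_n @ x
```

predictive_var grows as tq moves away from the training points, while the plug-in approximation would report the constant sigma2 everywhere.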
30 Outline (recap)
31 From linear models to Gaussian processes
32 Gaussian processes
33 Gaussian processes
34 GP log marginal likelihood and GP hyper-parameters
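As a sketch, the GP log marginal likelihood used to set the hyper-parameters can be computed stably with a Cholesky factorization. The squared-exponential kernel and the parameter values below are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s * exp(-(a - b)^2 / (2 l^2)) for 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_log_marginal_likelihood(x, y, length_scale, signal_var, noise_var):
    """log p(y | x, theta) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(x)
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)  # K = L L^T, so log|K| = 2 * sum(log diag L)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))
```

Maximizing (or sampling from) this quantity over length_scale, signal_var, and noise_var is how the GP hyper-parameters are set.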
35 Trees for regression [Criminisi et al, 2011]
36 Regression trees [Criminisi et al, 2011]
37 Regression forests [Criminisi et al, 2011]
38 Outline (recap)
39 Bayesian experimental design Uses the principle of maximum expected utility. Example 1: active learning. Example 2: sequential decision making with discounted rewards.
40 Some acquisition functions
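For concreteness, here are two of the standard acquisition functions, written for minimization as pure-Python scalar versions; xi and kappa are the usual exploration parameters.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: E[max(best - f(x) - xi, 0)] with f(x) ~ N(mu, sigma^2)."""
    z = (best - mu - xi) / sigma
    return (best - mu - xi) * normal_cdf(z) + sigma * normal_pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB for minimization: smaller values are more promising."""
    return mu - kappa * sigma
```

Both trade off the posterior mean against the posterior uncertainty: EI rewards points that are either promising or uncertain, and LCB does the same with an explicit kappa knob.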
41 Thompson sampling with GPs Posterior over location of the minimum: Thompson: Mechanism:
42 Randomized Thompson sampling Bochner's lemma tells us that: With sampling and this lemma, we can construct a feature-based approximation that is amenable to computation. Then do linear Bayesian optimization with sinusoidal features.
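A sketch of the construction: for the squared-exponential kernel, Bochner's lemma gives a Gaussian spectral density, so randomly sampled sinusoidal features approximate the kernel. The sizes below are illustrative.

```python
import numpy as np

def random_fourier_features(x, n_features, length_scale, rng):
    """phi(x) such that phi(x) . phi(x') ~= exp(-(x - x')^2 / (2 * length_scale^2))."""
    w = rng.normal(0.0, 1.0 / length_scale, size=n_features)  # spectral frequency samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)        # random phases
    return np.sqrt(2.0 / n_features) * np.cos(np.outer(x, w) + b)
```

Bayesian linear regression (and hence Thompson sampling) in these features then approximates sampling from the GP, which is what makes the randomized Thompson mechanism tractable.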
43 Entropy search and variants Posterior over location of the minimum: Utility: minimize uncertainty of location of minimum
44 The choice of utility in practice [Hoffman, Shahriari & df, 2013]
45 The choice of utility in practice
46 Outline (recap)
47 Outline (recap)
48 Hyper-parameters and robustness Learning the hyper-parameters of the GP is important. The tuning method must be automatic! One of the best ways to manage the GP hyper-parameters is to integrate them out, e.g. with slice sampling as in Spearmint. But this is still dangerous!
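Operationally, integrating out the hyper-parameters amounts to averaging the acquisition function over posterior hyper-parameter samples (e.g. from slice sampling) instead of trusting a single point estimate. A generic sketch, where the per-sample acquisitions below are hypothetical stand-ins for EI under three sampled GP hyper-parameter settings:

```python
import numpy as np

def marginalized_acquisition(acq_per_sample):
    """Given one acquisition function per hyper-parameter sample, return their average."""
    def acq(x):
        return float(np.mean([a(x) for a in acq_per_sample]))
    return acq

# Hypothetical per-sample acquisition functions (one per slice-sampled setting):
samples = [lambda x: 1.0 * x, lambda x: 2.0 * x, lambda x: 3.0 * x]
acq = marginalized_acquisition(samples)
```

The averaged acquisition hedges against any single bad hyper-parameter estimate, which is the robustness argument made on this slide.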
49 Hyper-parameters and robustness
50 Spearmint Slice sampling Ziyu Wang & NdF 2014 Theory bounds
51 Spearmint Slice sampling Ziyu Wang & NdF 2014 Theory bounds
52 Outline (recap)
53 Analysis of Bayes Opt Proposition 1 (Variance Bound): Theorem 2 (Vanishing regret): [df, Zoghi & Smola, 2012]
54
55 Multi-scale optimistic optimization [Remi Munos - SOO, UCT] [Ziyu Wang et al, 2014]
56 Outline (recap)
57 Conditional parameters A big problem in deep learning (see Torch 7)
58 Outline (recap)
59 Input warping [Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams, 2014]
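The idea is to learn a monotone warping of each input dimension so that a stationary kernel acts on the warped inputs. As a closed-form stand-in for the Beta CDF used in the input-warping paper, the Kumaraswamy CDF is a common choice; the shape parameters a and b below are illustrative (in practice they get priors and are inferred).

```python
import numpy as np

def kumaraswamy_warp(x, a, b):
    """Monotone warp of x in [0, 1]; a, b > 0 control where input resolution concentrates."""
    x = np.asarray(x, dtype=float)
    return 1.0 - (1.0 - x**a) ** b
```

The GP kernel is then evaluated on kumaraswamy_warp(x, a, b) instead of x, letting a stationary kernel model non-stationary objectives.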
60 Treed BayesOpt [Assael, Wang & NdF. Rob Gramacy has many papers using trees and GPs.]
61
62 Treed BayesOpt
63 Treed BayesOpt
64 Treed BayesOpt Aggregate data across leaves to deal with the paucity of data in each leaf when estimating the hyper-parameters.
65 Warping + trees
66 Outline (recap)
67 Parallelization Talk to David Ginsbourger. Essentially, augment the observations for finished runs with predicted observations for unfinished runs. E.g.,
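One simple instance of this idea is the "constant liar" heuristic from Ginsbourger and colleagues: after each point is chosen, pretend its still-pending evaluation returned some fixed value, refit, and choose again. The sketch below uses a deliberately toy surrogate (nearest-neighbour mean, distance-to-data as uncertainty) so it runs standalone; a real implementation would refit the GP instead.

```python
import numpy as np

def nn_fit_predict(X, y, cand):
    """Toy surrogate: predict the nearest neighbour's value; uncertainty = distance to data."""
    d = np.abs(cand[:, None] - X[None, :])
    j = d.argmin(axis=1)
    return y[j], d.min(axis=1)

def constant_liar_batch(candidates, X, y, fit_predict, acquisition, batch_size, lie):
    """Choose a batch for parallel evaluation, imputing `lie` for each pending point."""
    X, y = list(X), list(y)
    batch = []
    for _ in range(batch_size):
        mu, sigma = fit_predict(np.array(X), np.array(y), candidates)
        best = int(np.argmax(acquisition(mu, sigma)))
        batch.append(float(candidates[best]))
        X.append(float(candidates[best]))  # pending run enters the data set...
        y.append(lie)                      # ...with a fantasized observation
    return batch
```

With a pure-exploration acquisition (score = sigma) and one observation at 0, the first pick is the farthest candidate and the second lands between the two, so the batch spreads out instead of proposing the same point twice.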
68 Outline (recap)
69 Constraints and cost sensitivity There are many approaches (Gramacy, Snoek, ...). Could use a GP with binary observations h(.) that indicate the probability of the constraint being satisfied. Often a trivial scaling is used to deal with time.
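With such a feasibility model h(.), the usual recipe is to weight the acquisition by the modeled probability of feasibility, so infeasible-looking regions are discounted. A scalar sketch for minimization; the p_feasible values are assumed to come from the binary-observation GP.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def constrained_ei(mu, sigma, best, p_feasible, xi=0.01):
    """Expected improvement weighted by the probability the constraint is satisfied."""
    z = (best - mu - xi) / sigma
    ei = (best - mu - xi) * normal_cdf(z) + sigma * normal_pdf(z)
    return p_feasible * ei
```

A point predicted to violate the constraint (p_feasible near 0) scores near zero no matter how promising its objective value looks.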
70 Outline (recap)
71 Random Embedding Bayesian Optimization [Wang, Zoghi, Matheson, Hutter & df, IJCAI 2013 Distinguished paper award]
72
73 Scaling to over a billion dimensions
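The trick behind random embeddings: draw a random matrix A in R^(D x d) and optimize g(z) = f(Az) over the low-dimensional z, projecting Az back into the feasible box. A sketch in which the box [-1, 1]^D, the dimensions, and the test objective are all illustrative:

```python
import numpy as np

def make_rembo_objective(f_high, D, d, rng):
    """Wrap a D-dimensional objective for optimization in a d-dimensional embedding."""
    A = rng.normal(size=(D, d))        # random embedding matrix
    def f_low(z):
        x = np.clip(A @ z, -1.0, 1.0)  # map back into the box [-1, 1]^D
        return f_high(x)
    return f_low, A
```

If f_high only varies along a few effective dimensions, running Bayesian optimization on f_low in R^d suffices, which is how the approach scales to the billion-dimensional problems mentioned above.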
74 Outline (recap)
75 Multi-task Define a kernel over tasks (outputs) and inputs. Can also do BayesOpt with many parametric approaches to contextual / multi-task regression. E.g. the context could be features of
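One standard construction is the intrinsic coregionalization kernel: a positive semi-definite matrix over tasks multiplied by an input kernel, K((x, t), (x', t')) = B[t, t'] * k(x, x'). A sketch in which the task matrix B and the RBF input kernel are illustrative:

```python
import numpy as np

def multitask_kernel(B, k_input, X, tasks):
    """Gram matrix for points X with task labels: K[i, j] = B[t_i, t_j] * k(x_i, x_j)."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = B[tasks[i], tasks[j]] * k_input(X[i], X[j])
    return K

B = np.array([[1.0, 0.5], [0.5, 1.0]])             # task similarities
k_rbf = lambda a, b: np.exp(-0.5 * (a - b) ** 2)   # input kernel
K = multitask_kernel(B, k_rbf, [0.0, 0.0, 1.0], [0, 1, 0])
```

Observations from one task then inform predictions on the others in proportion to their entry in B, which is what makes multi-task and contextual Bayesian optimization transfer information.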
76 Outline (recap)
77 Freeze-thaw learning curve prediction [Kevin Swersky et al. See also the work of David Ginsbourger and Frank Hutter.]
78 Outline (recap)
79 Unknown optimization boundary [Bobak Shahriari et al, 2015]
80 Unknown optimization boundary [Bobak Shahriari et al, 2015]
81 Outline (recap)
82 Empirical Hardness Models Formally, given a set of particular problem instances, p ∈ P, and an algorithm, a ∈ A (with free parameters), the objective of performance prediction is to build a model that predicts the performance of a when run on an arbitrary p ∈ P. See the work of Kevin Leyton-Brown and Holger Hoos.
83 Thank you
More informationLast week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints
Last week Multi-Frame Structure from Motion: Multi-View Stereo Unknown camera viewpoints Last week PCA Today Recognition Today Recognition Recognition problems What is it? Object detection Who is it? Recognizing
More informationTraditional Acquisition Functions for Bayesian Optimization
Traditional Acquisition Functions for Bayesian Optimization Jungtaek Kim (jtkim@postech.edu) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si
More informationBayesian Adaptive Direct Search: Hybrid Bayesian Optimization for Model Fitting
Bayesian Adaptive Direct Search: Hybrid Bayesian Optimization for Model Fitting Luigi Acerbi Center for Neural Science New Yor University luigi.acerbi@nyu.edu Wei Ji Ma Center for Neural Science & Dept.
More informationAutomated Configuration of MIP solvers
Automated Configuration of MIP solvers Frank Hutter, Holger Hoos, and Kevin Leyton-Brown Department of Computer Science University of British Columbia Vancouver, Canada {hutter,hoos,kevinlb}@cs.ubc.ca
More informationMondrian Forests: Efficient Online Random Forests
Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan Joint work with Daniel M. Roy and Yee Whye Teh 1 Outline Background and Motivation Mondrian Forests Randomization mechanism Online
More informationarxiv: v2 [cs.lg] 8 Nov 2017
ACCELERATING NEURAL ARCHITECTURE SEARCH US- ING PERFORMANCE PREDICTION Bowen Baker 1, Otkrist Gupta 1, Ramesh Raskar 1, Nikhil Naik 2 (* - indicates equal contribution) 1 MIT Media Laboratory, Cambridge,
More informationWhere we are. Exploratory Graph Analysis (40 min) Focused Graph Mining (40 min) Refinement of Query Results (40 min)
Where we are Background (15 min) Graph models, subgraph isomorphism, subgraph mining, graph clustering Eploratory Graph Analysis (40 min) Focused Graph Mining (40 min) Refinement of Query Results (40 min)
More informationUnsupervised Learning
Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationBuilding Classifiers using Bayesian Networks
Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance
More informationACCELERATING NEURAL ARCHITECTURE SEARCH US-
ACCELERATING NEURAL ARCHITECTURE SEARCH US- ING PERFORMANCE PREDICTION Bowen Baker, Otkrist Gupta, Ramesh Raskar Media Laboratory Massachusetts Institute of Technology Cambridge, MA 02142, USA {bowen,otkrist,raskar}@mit.edu
More informationProbabilistic Robotics
Probabilistic Robotics FastSLAM Sebastian Thrun (abridged and adapted by Rodrigo Ventura in Oct-2008) The SLAM Problem SLAM stands for simultaneous localization and mapping The task of building a map while
More informationCombining Hyperband and Bayesian Optimization
Combining Hyperband and Bayesian Optimization Stefan Falkner Aaron Klein Frank Hutter Department of Computer Science University of Freiburg {sfalkner, kleinaa, fh}@cs.uni-freiburg.de Abstract Proper hyperparameter
More informationSupplementary material for: BO-HB: Robust and Efficient Hyperparameter Optimization at Scale
Supplementary material for: BO-: Robust and Efficient Hyperparameter Optimization at Scale Stefan Falkner 1 Aaron Klein 1 Frank Hutter 1 A. Available Software To promote reproducible science and enable
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationCSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3
CSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3 What I have in mind for our last programming project is to do something with either graphical models or random sampling. A few ideas
More informationCSC411 Fall 2014 Machine Learning & Data Mining. Ensemble Methods. Slides by Rich Zemel
CSC411 Fall 2014 Machine Learning & Data Mining Ensemble Methods Slides by Rich Zemel Ensemble methods Typical application: classi.ication Ensemble of classi.iers is a set of classi.iers whose individual
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationNeural Networks for Machine Learning. Lecture 15a From Principal Components Analysis to Autoencoders
Neural Networks for Machine Learning Lecture 15a From Principal Components Analysis to Autoencoders Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Principal Components
More informationCSC 411 Lecture 4: Ensembles I
CSC 411 Lecture 4: Ensembles I Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 04-Ensembles I 1 / 22 Overview We ve seen two particular classification algorithms:
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationarxiv: v1 [cs.cv] 5 Jan 2018
Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning arxiv:1801.01596v1 [cs.cv] 5 Jan 2018 Jiazhuo Wang Blippar jiazhuo.wang@blippar.com Jason Xu Blippar
More informationDeep Reinforcement Learning
Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationSAS High-Performance Analytics Products
Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products
More informationarxiv: v1 [stat.ml] 17 Nov 2015
Bayesian Optimization with Dimension Scheduling: Application to Biological Systems Doniyor Ulmasov a, Caroline Baroukh b, Benoit Chachuat c, Marc Peter Deisenroth a,* and Ruth Misener a,* arxiv:1511.05385v1
More informationCSE 490R P1 - Localization using Particle Filters Due date: Sun, Jan 28-11:59 PM
CSE 490R P1 - Localization using Particle Filters Due date: Sun, Jan 28-11:59 PM 1 Introduction In this assignment you will implement a particle filter to localize your car within a known map. This will
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Implementation: Real machine learning schemes Decision trees Classification
More informationOverview. EECS 124, UC Berkeley, Spring 2008 Lecture 23: Localization and Mapping. Statistical Models
Introduction ti to Embedded dsystems EECS 124, UC Berkeley, Spring 2008 Lecture 23: Localization and Mapping Gabe Hoffmann Ph.D. Candidate, Aero/Astro Engineering Stanford University Statistical Models
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationLearning algorithms for physical systems: challenges and solutions
Learning algorithms for physical systems: challenges and solutions Ion Matei Palo Alto Research Center 2018 PARC 1 All Rights Reserved System analytics: how things are done Use of models (physics) to inform
More informationApplied Statistics for Neuroscientists Part IIa: Machine Learning
Applied Statistics for Neuroscientists Part IIa: Machine Learning Dr. Seyed-Ahmad Ahmadi 04.04.2017 16.11.2017 Outline Machine Learning Difference between statistics and machine learning Modeling the problem
More informationWarped Mixture Models
Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable
More informationBAYESIAN GLOBAL OPTIMIZATION
BAYESIAN GLOBAL OPTIMIZATION Using Optimal Learning to Tune Deep Learning Pipelines Scott Clark scott@sigopt.com OUTLINE 1. Why is Tuning AI Models Hard? 2. Comparison of Tuning Methods 3. Bayesian Global
More information