What is machine learning?

Machine learning, pattern recognition and statistical data modelling
Lecture 12: The last lecture
Coryn Bailer-Jones

What is machine learning?
Data description and interpretation: finding simpler relationships between variables (predictors and responses); finding natural groups or classes in data; relating observables to physical quantities.
Prediction: capturing the relationship between inputs and outputs for a set of labelled data, with the goal of predicting outputs for unlabelled data ("pattern recognition").
Learning from data: dealing with noise; coping with high dimensions (many potentially relevant variables); fitting models to data; generalizing.

Concepts: types of problems
Supervised learning: predictors (x) and responses (y); infer P(y|x), perhaps modelled as f(x; w). Discrete y gives a classification problem; real-valued y gives regression.
Unsupervised learning: no distinction between predictors and responses; infer P(x), or things about it, e.g. the number of modes/classes (mixture modelling, peak finding), low-dimensional projections/descriptions (PCA, SOM, MDS), outlier detection (discovery).

Concepts: probabilities and Bayes
Bayes' theorem relates the likelihood of x given y, the posterior probability of y given x, the prior over y, and the evidence for the model.
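The slide lists only the names of the four terms; written out in this notation, the relation is Bayes' theorem:

$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}, \qquad P(x) = \sum_y P(x \mid y)\, P(y),$$

where P(x|y) is the likelihood, P(y) the prior, P(y|x) the posterior, and the evidence P(x) normalizes the posterior.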

Concepts: solution procedure
1. Need some kind of expression for P(y|x) or P(x), e.g. f(x; w) = P(y|x).
2. Choose a parametric, semi-parametric, or non-parametric form. E.g. for density estimation and nonlinear regression: parametric: Gaussian distribution for P(x), spline for f(x); semi-parametric: sum of several Gaussians, additive model, local regression; non-parametric: k-nn, kernel estimate.
3. Parametric models: fit to data: (a) infer the adjustable parameters, w, from the data; (b) generally, minimize a loss function on a labelled data set with respect to w.
4. Compare different models.

Concepts: objective function
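The objective-function slide's equation is not in the transcript; a standard way to write the fitting step 3(b) is

$$\hat{w} = \arg\min_w \sum_{i=1}^{N} L\big(y_i, f(x_i; w)\big),$$

where L is the loss function and the sum runs over the N labelled training vectors.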

Loss functions

Models: linear modelling (linear least squares)
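The equations on these two slides did not survive the transcription; as a stand-in, here is a minimal numpy sketch of linear least squares, i.e. fitting f(x; w) = w0 + w1 x by minimizing the squared-error loss (the data and coefficients are illustrative):

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||y - Xw||^2 over w.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", w)
```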

Concepts: maximum likelihood (as a loss function)

Concepts: generalization and regularization
Given a specific set of data, we nonetheless want a general solution; therefore we must make some kind of assumption(s): smoothness in functions, priors on model parameters (or on functions, or predictions), or restricting the model space. Regularization involves a free parameter, although this too can be inferred from the data.
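The maximum-likelihood slide's formulas are likewise missing; the standard connection it alludes to is that, for Gaussian noise of fixed variance σ², maximizing the likelihood is equivalent to minimizing the squared-error loss:

$$-\ln P(y \mid x, w) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(y_i - f(x_i; w)\big)^2 + \text{const}.$$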

Models: penalized linear modelling (ridge regression)

Models: ridge regression (as regularization)
The regularization projects the data onto the principal components and downweights ("shrinks") them in inverse proportion to their variance, which limits the model space. There is one free parameter, λ: large λ implies a large degree of regularization, and the effective degrees of freedom df(λ) is small.
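A minimal numpy sketch of ridge regression, using the closed-form solution and the effective degrees of freedom df(λ) as given in Hastie et al. (2001) (the function names are mine):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def ridge_df(X, lam):
    """Effective degrees of freedom: df(lam) = sum_j d_j^2 / (d_j^2 + lam),
    where d_j are the singular values of X. As lam grows, df shrinks."""
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + lam))
```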

Models: ridge regression vs. df(λ)
[Figure from Hastie, Tibshirani & Friedman (2001).]

Models: splines
[Figure from Hastie, Tibshirani & Friedman (2001).]

Concepts: regularization (in splines)
Avoid knot selection by taking all data points as knots, and avoid overfitting via regularization; that is, minimise a penalized sum of squares.

Concepts: regularization (in smoothing splines)
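The penalized sum of squares referred to here is, in the standard smoothing-spline formulation (e.g. Hastie et al. 2001):

$$\text{RSS}(f, \lambda) = \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 + \lambda \int f''(t)^2 \, dt,$$

where the first term measures fit, the second penalizes curvature, and λ controls the trade-off.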

Concepts: regularization in ANNs and SVMs

Concepts: model comparison and selection
Cross-validation: n-fold, leave-one-out, generalized; compare and select models using just the training set; accounts for model complexity plus the bias from a finite-sized training set.
Bayes Information Criterion (BIC) and Akaike Information Criterion (AIC): k is the number of parameters and N the number of training vectors; the smallest BIC or AIC corresponds to the optimal model.
Bayesian evidence for a model (hypothesis) H, P(D|H): the probability that the data arise from the model, marginalized over all model parameters.

Concepts: Occam's razor and Bayesian evidence
D = data, H = hypothesis (model), w = model parameters. A simpler model, H1, predicts less of the data space, so the evidence naturally penalizes more complex models (after MacKay 1992).
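The slide gives only the symbols; the standard definitions consistent with it (with L̂ the maximized likelihood) are

$$\text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{BIC} = k\ln N - 2\ln\hat{L},$$

and the Bayesian evidence is the likelihood marginalized over the parameters,

$$P(D \mid H) = \int P(D \mid w, H)\, P(w \mid H)\, dw.$$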

Concepts: curse of dimensionality
To retain a given data density, the number of vectors must grow exponentially with the number of dimensions, which is generally impossible. The curse can be overcome in various ways: make assumptions (structured regression) or limit the model space (generalized additive models; basis functions and kernels).

Models: basis expansions
Start from the linear model and add quadratic terms, higher-order terms, or other transformations, e.g. splitting the range with an indicator function; generalized additive models build on the same idea.
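A small numpy sketch of the idea: the model stays linear in the weights w, but the inputs are expanded with quadratic and indicator-function terms (the particular basis and threshold c are illustrative choices):

```python
import numpy as np

def design_matrix(x, c=0.5):
    """Basis expansion of a scalar input: constant, linear, quadratic,
    and an indicator term 1(x > c) that splits the range at c."""
    return np.column_stack([np.ones_like(x), x, x**2, (x > c).astype(float)])

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Still ordinary linear least squares, now in the expanded basis.
w, *_ = np.linalg.lstsq(design_matrix(x), y, rcond=None)
```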

Models: MLP neural network basis functions

Models: radial basis function neural networks
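Only the slide titles survive; the standard forms of the two models (with weights w_j, centres μ_j, widths σ_j, and h a sigmoid) are

$$f_{\text{MLP}}(x) = \sum_j w_j\, h\big(a_j^{T} x + b_j\big), \qquad f_{\text{RBF}}(x) = \sum_j w_j \exp\!\left(-\frac{\lVert x - \mu_j \rVert^2}{2\sigma_j^2}\right),$$

i.e. sigmoidal basis functions of linear projections in the MLP, and radially symmetric Gaussian basis functions in the RBF network.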

Concepts: optimization
With gradient information: gradient descent; adding second-derivative (Hessian) information gives Newton, quasi-Newton, Levenberg-Marquardt, and conjugate-gradient methods. Pure gradient methods get stuck in local minima; remedies include random restarts, a committee/ensemble of models, and momentum terms (non-gradient information).
Without gradient information: expectation-maximization (EM) algorithm, simulated annealing, genetic algorithms.

Concepts: marginalization (Bayes again)
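As a minimal illustration of the first of these, a plain gradient-descent loop in numpy (the learning rate, step count, and toy objective are illustrative choices):

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, steps=1000):
    """Plain gradient descent: w <- w - lr * grad(w).
    A minimal sketch; in practice one adds momentum, line search,
    or second-order (Hessian) information as listed above."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Example: minimize f(w) = ||w - 3||^2, whose gradient is 2*(w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3.0), w0=[0.0])
print(w_min)  # approximately [3.0]
```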