FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

Size: px
Start display at page:

Download "FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu"

Transcription

1 FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu

2 Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s) determined by an estimator (ML, MAP, etc.) Analysis (e.g. the bias variance decomposition) is realized by computing expectations over all possible datasets Empirically, uncertainty (error bars) for the estimate is obtained by approximating the distribution of possible datasets e.g. by bootstrap sampling the original, observed data, with replacement In a Bayesian perspective, there is a single dataset, the one actually observed, and uncertainty is expressed through a probability distribution over all model parameters and variables of interest In this lecture we will analyze linear models both from a frequentist and from a Bayesian perspective

3 Linear Models It is mathematically easy to fit linear models to data and we can get insight by analyzing this relatively simple case. We already studied the polynomial curve fitting problem in the previous lectures There are many ways to make linear models more powerful while retaining their attractive mathematical properties By using non linear, non adaptive basis functions, we obtain generalized linear models. These offer non linear mappings from inputs to outputs but are linear in their parameters. Typically, only the linear part of the model is learnt By using kernel methods we can handle expansions of the raw data that use a huge number of non linear, non adaptive basis functions. By using large margin loss functions, we can avoid overfitting even when we use huge numbers of basis functions adapted from MLG@Toronto

4 The Loss Function Once a model class is chosen, fitting the model to data is typically done by finding the parameter values that minimize a loss function There are many possible loss functions. What criterion should we use in selecting one? Loss that makes analytic calculations easy (squared error) Loss that makes the fitting correspond to maximizing the likelihood of the training data, given some noise model for the observed outputs Loss that makes it easy to interpret the learned coefficients (easy if mostly zeros) Loss that corresponds to the real loss on a practical application (losses are often asymmetric) adapted from MLG@Toronto

5 Linear Basis Function Models (1) Consider the polynomial curve fitting. The model is linear in the parameters and non linear in inputs Slides adapted from textbook (Bishop)

6 Linear Basis Function Models (2) Generally where are known as basis functions. Typically, = 1 (dummy basis function) so that acts as a bias (allows for any fixed offset in the data do not confuse with statistical bias) In the simplest case, we use linear basis functions:

7 Linear Basis Function Models (3) Polynomial basis functions: These are global; a small change in affects all basis functions. The issue can be resolved by dividing input space into regions and fitting a different polynomial in each region. This leads to spline functions.

8 Linear Basis Function Models (4) Gaussian basis functions: These are local; a small change in only affect nearby basis functions. and control location and scale (width).

9 Linear Basis Function Models (5) Sigmoidal basis functions: where Also these are local; a small change in only affect nearby basis functions. and control location and scale (slope).

10 Other Basis Function Models (6) In a Fourier representation, each basis function represents a given frequency and has infinite spatial extent Wavelets are localized in both space and frequency, and by definition are linearly orthogonal

11 Maximum Likelihood and Least Squares (1) Assume observations from a deterministic function with added Gaussian noise: where which is the same as saying, /2 exp 2, Given observed inputs,, and targets,, we obtain the likelihood function

12 Maximum Likelihood and Least Squares (2) Taking the logarithm, we get where is the sum of squares error

13 Maximum Likelihood and Least Squares (3) Computing the gradient and setting it to zero yields Solving for where, we get normal equations The Moore Penrose pseudo inverse,. design matrix (NxM)

14 Maximum Likelihood and Least Squares (4) Maximizing with respect to the bias,, we get Bias compensates for the difference between averages of the target values and weighted sum of averages of basis function values We can also maximize with respect to, giving Inverse noise precision is given by residual variance of targets around regression function

15 Geometry of Least Squares The space has one axis for each training case. The vector of target values,, is a point in space Each vector of the values of one component of the input is also a point in this space. The input component vectors span a subspace,. A weighted sum of the input component vectors must lie in. The optimal solution is the orthogonal projection of the vector of target values onto.

16 Sequential Learning Data items considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent: This is known as the least mean squares (LMS) algorithm. Issue: how to choose?

17 Regularized Least Squares (1) Consider the error function: Data term + Regularization term With the sum of squares error function and a quadratic regularizer, we get which is minimized by is called the regularization coefficient.

18 Regularized Least Squares (2) With a more general regularizer, we have Lasso Quadratic

19 Effect of a Quadratic Regularizer The overall cost function is the sum of two quadratic functions The sum is also quadratic The combined minimum lies on the line between the minimum of the squared error and the origin The regularizer shrinks the weights, but does not make them exactly zero

20 Lasso: penalizing the absolute values of weights Finding the minimum requires quadratic programming but it is still unique because the cost function is convex Sum of two convex functions: a bowl plus an inverted pyramid As is increased, many of the weights go exactly to zero This is good for interpretation, e.g. identify factors causing a given disease, given input vectors containing many measurement components (temperature, blood pressure, sugar level, etc.) It is also good in preventing overfitting

21 Penalty over square weights vs. Lasso Lasso tends to generate sparser solutions than a quadratic regularizer Notice that 0at the optimum

22 1 d loss functions with different powers Negative log of Gaussian Negative log of Laplacian

23 Multiple Outputs If there are multiple outputs we can often treat the learning problem as a set of independent problems, one per output This approach is suboptimal if the output noise is correlated and changes across datapoints Even though they are independent problems we can save work by only multiplying the input vectors by the inverse covariance of the input components once

24 Multiple Outputs (1) Analogously to the single output case we have: Given observed inputs,, and targets,, we obtain the log likelihood function

25 Multiple Outputs (2) Maximizing with respect to, we obtain If we consider a single target variable,, we see that where, which is identical with the single output case.

26 Decision Theory for Regression (L2) Inference step Determine Decision step For given, make optimal prediction for Loss function:

27 The Squared Loss Function (L2) substitute and integrate over, w. r. t where regression function Optimal least squares predictor given by the conditional average First term in gives prediction error according to conditional mean estimates Second term is independent of. It measures the intrinsic variability of the target data, hence it represents the irreducible minimum value of loss, independent of

28 The Bias Variance Decomposition (1) Expected squared loss, where The second term of corresponds to the noise inherent in the random variable. What about the first term?

29 The Bias Variance Decomposition (2) Suppose we were given multiple data sets, each of size. Any particular data set,, will give a particular function. We then have

30 The Bias Variance Decomposition (3) Taking the expectation over yields

31 The Bias Variance Decomposition (4) Thus we can write where

32 The Bias Variance Decomposition (5) Example: 25 data sets from the sinusoidal, varying the degree of regularization,

33 The Bias Variance Decomposition (6) Example: 25 data sets from the sinusoidal, varying the degree of regularization,

34 The Bias Variance Decomposition (7) Example: 25 data sets from the sinusoidal, varying the degree of regularization,

35 The Bias Variance Trade off From these plots, we note that an over regularized model (large will have a high bias, while an under regularized model (small ) will have a high variance.

36 Recall: The Gaussian Distribution (L2) Can also use precision, Λ Σ

37 Linear Gaussian Models (L2) Given we have where Many applications for linear Gaussian models: time series models linear dynamical systems (Kalman filtering), Bayesian modeling

38 Predictive Distribution

39 Bayesian Regression (1) Determine by minimizing regularized sum of squares error, with regularizer corresponding to /

40 Bayesian Regression (2) In compact form, for,

41 Bayesian Linear Regression (3) 0 data points observed Red lines: samples from the model prior Prior Data Space Fit,, assume known. 25 Ground truth,, ; 0.3, 0.5, 1,1, ~ 0,0.2

42 Bayesian Linear Regression (4) 1 data point observed Likelihood, Posterior Red lines: samples from the model posterior Data Space Fit,, assume known. 25 Ground truth,, ; 0.3, 0.5, 1,1, ~ 0,0.2

43 Bayesian Linear Regression (5) 2 data points observed, second Likelihood, Posterior Red lines: samples from the model posterior Data Space Fit,, assume known. 25 Ground truth,, ; 0.3, 0.5, 1,1, ~ 0,0.2

44 Bayesian Linear Regression (6) 20 data points observed, last being Likelihood, Posterior Red lines: samples from the model posterior Data Space Fit,, assume known. 25 Ground truth,, ; 0.3, 0.5, 1,1, ~ 0,0.2

45 Predictive Distribution (1) Predict for new values of over : by integrating where Noise in Uncertainty for the data parameters Can prove:, 0

46 Predictive Distribution (2) Example: Sinusoidal data, 9 Gaussian basis functions, 1 data point

47 Predictive Distribution (3) Example: Sinusoidal data, 9 Gaussian basis functions, 2 data points

48 Predictive Distribution (4) Example: Sinusoidal data, 9 Gaussian basis functions, 4 data points

49 Predictive Distribution (5) Example: Sinusoidal data, 9 Gaussian basis functions, 25 data points

50 Equivalent Kernel (1) The predictive mean can be written Equivalent kernel or smoother matrix. This is a weighted sum of the training data target values,

51 Equivalent Kernel (2) Weight of depends on distance between nearby carry more weight. and ;

52 Bayesian Model Comparison (1) How do we choose the right model? Assume we want to compare models, using data ; this requires computing Posterior Prior Model evidence or marginal likelihood Bayes Factor: ratio of evidence for two models

53 Bayesian Model Comparison (2) Having computed we can compute the predictive (mixture) distribution A simpler approximation, known as model selection, is to use the model with the highest evidence.

54 Bayesian Model Comparison (3) For a model with parameters, we get the model evidence by marginalizing over Note that

55 Bayesian Model Comparison (4) For a given model with a single parameter,, consider the approximation where the posterior is assumed to be sharply peaked.

56 Bayesian Model Comparison (5) Taking logarithms, we obtain Negative With parameters, all assumed to have the same ratio, we get Negative and linear in M.

57 Bayesian Model Comparison (6) Matching data and model complexity A simple model (e.g. ) has little variability, so it will generate datasets that are fairly similar to each other, with distribution confined narrowly on the axis Complex models (e.g. ) can generate a large variety of datasets and assign a relatively low probability to any one of them

58 The Evidence Approximation (1) The full Bayesian predictive distribution is given by but this integral is intractable. Approximate with where is the mode of, which is assumed to be sharply peaked; a.k.a. empirical Bayes, type II or generalized maximum likelihood, or evidence approximation.

59 The Evidence Approximation (2) From Bayes theorem we have and if we assume to be flat we see that General results for Gaussian integrals give

60 The Evidence Approximation (3) Example: sinusoidal data, degree polynomial,

61 Readings Bishop Ch. 3, sections (optional 3.5)

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Linear regression Linear Basis FuncDon Models (1) Example: Polynomial Curve FiLng Linear Basis FuncDon Models (2) Generally

More information

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures: Homework Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression 3.0-3.2 Pod-cast lecture on-line Next lectures: I posted a rough plan. It is flexible though so please come with suggestions Bayes

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 3 Parametric Distribu>ons We want model the probability

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

FMA901F: Machine Learning Lecture 6: Graphical Models. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 6: Graphical Models. Cristian Sminchisescu FMA901F: Machine Learning Lecture 6: Graphical Models Cristian Sminchisescu Graphical Models Provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate

More information

COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions

COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions LINEAR REGRESSION MOTIVATION Why Linear Regression? Regression

More information

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions LINEAR REGRESSION MOTIVATION Why Linear Regression? Simplest

More information

Linear Regression & Gradient Descent

Linear Regression & Gradient Descent Linear Regression & Gradient Descent These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online.

More information

Support vector machines

Support vector machines Support vector machines When the data is linearly separable, which of the many possible solutions should we prefer? SVM criterion: maximize the margin, or distance between the hyperplane and the closest

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach Truth Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 2.04.205 Discriminative Approaches (5 weeks)

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14

More information

Lecture 26: Missing data

Lecture 26: Missing data Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM) School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov.

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. EE613 Machine Learning for Engineers LINEAR REGRESSION Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. 9, 2017 1 Outline Multivariate ordinary least squares Matlab code:

More information

Building Classifiers using Bayesian Networks

Building Classifiers using Bayesian Networks Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

More information

Perceptron as a graph

Perceptron as a graph Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2

More information

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov.

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. EE613 Machine Learning for Engineers LINEAR REGRESSION Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. 4, 2015 1 Outline Multivariate ordinary least squares Singular value

More information

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017 CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

More information

Lecture 13: Model selection and regularization

Lecture 13: Model selection and regularization Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

Theoretical Concepts of Machine Learning

Theoretical Concepts of Machine Learning Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process

More information

Support Vector Machines.

Support Vector Machines. Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Learning via Optimization

Learning via Optimization Lecture 7 1 Outline 1. Optimization Convexity 2. Linear regression in depth Locally weighted linear regression 3. Brief dips Logistic Regression [Stochastic] gradient ascent/descent Support Vector Machines

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

All lecture slides will be available at CSC2515_Winter15.html

All lecture slides will be available at  CSC2515_Winter15.html CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

More information

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

3 Nonlinear Regression

3 Nonlinear Regression 3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear

More information

Machine Learning. Chao Lan

Machine Learning. Chao Lan Machine Learning Chao Lan Machine Learning Prediction Models Regression Model - linear regression (least square, ridge regression, Lasso) Classification Model - naive Bayes, logistic regression, Gaussian

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to

More information

Linear Regression QuadraJc Regression Year

Linear Regression QuadraJc Regression Year Linear Regression Regression Given: n Data X = x (1),...,x (n)o where x (i) 2 R d n Corresponding labels y = y (1),...,y (n)o where y (i) 2 R September Arc+c Sea Ice Extent (1,000,000 sq km) 9 8 7 6 5

More information

6 Model selection and kernels

6 Model selection and kernels 6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)

More information

Last time... Bias-Variance decomposition. This week

Last time... Bias-Variance decomposition. This week Machine learning, pattern recognition and statistical data modelling Lecture 4. Going nonlinear: basis expansions and splines Last time... Coryn Bailer-Jones linear regression methods for high dimensional

More information

Simple Model Selection Cross Validation Regularization Neural Networks

Simple Model Selection Cross Validation Regularization Neural Networks Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017 CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after

More information

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016 CPSC 340: Machine Learning and Data Mining Regularization Fall 2016 Assignment 2: Admin 2 late days to hand it in Friday, 3 for Monday. Assignment 3 is out. Due next Wednesday (so we can release solutions

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Topics in Machine Learning-EE 5359 Model Assessment and Selection

Topics in Machine Learning-EE 5359 Model Assessment and Selection Topics in Machine Learning-EE 5359 Model Assessment and Selection Ioannis D. Schizas Electrical Engineering Department University of Texas at Arlington 1 Training and Generalization Training stage: Utilizing

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

Machine Learning. Topic 4: Linear Regression Models

Machine Learning. Topic 4: Linear Regression Models Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There

More information

Data Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017

Data Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017 Data Analysis 3 Support Vector Machines Jan Platoš October 30, 2017 Department of Computer Science Faculty of Electrical Engineering and Computer Science VŠB - Technical University of Ostrava Table of

More information

CSE446: Linear Regression. Spring 2017

CSE446: Linear Regression. Spring 2017 CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill

More information

Performance Evaluation of Various Classification Algorithms

Performance Evaluation of Various Classification Algorithms Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Multiple Model Estimation : The EM Algorithm & Applications

Multiple Model Estimation : The EM Algorithm & Applications Multiple Model Estimation : The EM Algorithm & Applications Princeton University COS 429 Lecture Dec. 4, 2008 Harpreet S. Sawhney hsawhney@sarnoff.com Plan IBR / Rendering applications of motion / pose

More information

CSC 411: Lecture 02: Linear Regression

CSC 411: Lecture 02: Linear Regression CSC 411: Lecture 02: Linear Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 16, 2015 Urtasun & Zemel (UofT) CSC 411: 02-Regression Sep 16, 2015 1 / 16 Today Linear regression problem continuous

More information

EE 511 Linear Regression

EE 511 Linear Regression EE 511 Linear Regression Instructor: Hanna Hajishirzi hannaneh@washington.edu Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Announcements Hw1 due

More information

Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version)

Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version) Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version) Note: All source materials and diagrams are taken from the Coursera s lectures created by Dr Andrew Ng. Everything I have

More information

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13 CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Leveling Up as a Data Scientist. ds/2014/10/level-up-ds.jpg

Leveling Up as a Data Scientist.   ds/2014/10/level-up-ds.jpg Model Optimization Leveling Up as a Data Scientist http://shorelinechurch.org/wp-content/uploa ds/2014/10/level-up-ds.jpg Bias and Variance Error = (expected loss of accuracy) 2 + flexibility of model

More information

Support Vector Machines

Support Vector Machines Support Vector Machines . Importance of SVM SVM is a discriminative method that brings together:. computational learning theory. previously known methods in linear discriminant functions 3. optimization

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

Chapter 7: Numerical Prediction

Chapter 7: Numerical Prediction Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.

More information

Support Vector Machines

Support Vector Machines Support Vector Machines SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions 6. Dealing

More information

Introduction to Automated Text Analysis. bit.ly/poir599

Introduction to Automated Text Analysis. bit.ly/poir599 Introduction to Automated Text Analysis Pablo Barberá School of International Relations University of Southern California pablobarbera.com Lecture materials: bit.ly/poir599 Today 1. Solutions for last

More information

Optimization Methods for Machine Learning (OMML)

Optimization Methods for Machine Learning (OMML) Optimization Methods for Machine Learning (OMML) 2nd lecture Prof. L. Palagi References: 1. Bishop Pattern Recognition and Machine Learning, Springer, 2006 (Chap 1) 2. V. Cherlassky, F. Mulier - Learning

More information

Linear Model Selection and Regularization. especially usefull in high dimensions p>>100.

Linear Model Selection and Regularization. especially usefull in high dimensions p>>100. Linear Model Selection and Regularization especially usefull in high dimensions p>>100. 1 Why Linear Model Regularization? Linear models are simple, BUT consider p>>n, we have more features than data records

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007,

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks

CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks Part IV 1 Function approximation MLP is both a pattern classifier and a function approximator As a function approximator,

More information

Support Vector Machines.

Support Vector Machines. Support Vector Machines srihari@buffalo.edu SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions

More information

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation

More information

Module 4. Non-linear machine learning econometrics: Support Vector Machine

Module 4. Non-linear machine learning econometrics: Support Vector Machine Module 4. Non-linear machine learning econometrics: Support Vector Machine THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction When the assumption of linearity

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

Introduction to ANSYS DesignXplorer

Introduction to ANSYS DesignXplorer Lecture 4 14. 5 Release Introduction to ANSYS DesignXplorer 1 2013 ANSYS, Inc. September 27, 2013 s are functions of different nature where the output parameters are described in terms of the input parameters

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. http://en.wikipedia.org/wiki/local_regression Local regression

More information

Learning from Data: Adaptive Basis Functions

Learning from Data: Adaptive Basis Functions Learning from Data: Adaptive Basis Functions November 21, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

06: Logistic Regression

06: Logistic Regression 06_Logistic_Regression 06: Logistic Regression Previous Next Index Classification Where y is a discrete value Develop the logistic regression algorithm to determine what class a new input should fall into

More information

Supplementary Figure 1. Decoding results broken down for different ROIs

Supplementary Figure 1. Decoding results broken down for different ROIs Supplementary Figure 1 Decoding results broken down for different ROIs Decoding results for areas V1, V2, V3, and V1 V3 combined. (a) Decoded and presented orientations are strongly correlated in areas

More information

Non-linearity and spatial correlation in landslide susceptibility mapping

Non-linearity and spatial correlation in landslide susceptibility mapping Non-linearity and spatial correlation in landslide susceptibility mapping C. Ballabio, J. Blahut, S. Sterlacchini University of Milano-Bicocca GIT 2009 15/09/2009 1 Summary Landslide susceptibility modeling

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October

More information

Multicollinearity and Validation CIVL 7012/8012

Multicollinearity and Validation CIVL 7012/8012 Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Interactive Graphics. Lecture 9: Introduction to Spline Curves. Interactive Graphics Lecture 9: Slide 1

Interactive Graphics. Lecture 9: Introduction to Spline Curves. Interactive Graphics Lecture 9: Slide 1 Interactive Graphics Lecture 9: Introduction to Spline Curves Interactive Graphics Lecture 9: Slide 1 Interactive Graphics Lecture 13: Slide 2 Splines The word spline comes from the ship building trade

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information