# 3 Nonlinear Regression


Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic form, i.e.,

$$y = f(x) \qquad (1)$$

In linear regression, we have $f(x) = Wx + b$; the parameters $W$ and $b$ must be fit to data. What nonlinear function do we choose? In principle, $f(x)$ could be anything: it could involve linear functions, sines and cosines, summations, and so on. However, the form we choose will make a big difference on the effectiveness of the regression: a more general model will require more data to fit, and different models are more appropriate for different problems. Ideally, the form of the model would be matched exactly to the underlying phenomenon. If we're modeling a linear process, we'd use a linear regression; if we were modeling a physical process, we could, in principle, model $f(x)$ by the equations of physics.

In many situations, we do not know much about the underlying nature of the process being modeled, or else modeling it precisely is too difficult. In these cases, we typically turn to a few models in machine learning that are widely used and quite effective for many problems. These methods include basis function regression (including Radial Basis Functions), Artificial Neural Networks, and k-Nearest Neighbors.

There is one other important choice to be made, namely, the choice of objective function for learning, or, equivalently, the underlying noise model. In this section we extend the LS estimators introduced in the previous chapter to include one or more terms to encourage smoothness in the estimated models. It is hoped that smoother models will tend to overfit the training data less and therefore generalize somewhat better.

## 3.1 Basis function regression

A common choice for the function $f(x)$ is a basis function representation:

$$y = f(x) = \sum_k w_k b_k(x) \qquad (2)$$

for the 1D case. The functions $b_k(x)$ are called basis functions. Often it will be convenient to express this model in vector form, for which we define $\mathbf{b}(x) = [b_1(x), \dots, b_M(x)]^T$ and $\mathbf{w} = [w_1, \dots, w_M]^T$, where $M$ is the number of basis functions. We can then rewrite the model as

$$y = f(x) = \mathbf{b}(x)^T \mathbf{w} \qquad (3)$$

Two common choices of basis functions are polynomials and Radial Basis Functions (RBF). A simple, common basis for polynomials are the monomials, i.e.,

$$b_0(x) = 1, \quad b_1(x) = x, \quad b_2(x) = x^2, \quad b_3(x) = x^3, \ldots \qquad (4)$$

(In the machine learning and statistics literature, these representations are often referred to as linear regression, since they are linear functions of the "features" $b_k(x)$.)
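To make the basis function representation concrete, here is a minimal NumPy sketch of equations (2)-(4); the weight values are arbitrary, chosen only for illustration:

```python
import numpy as np

def monomial_basis(x, M):
    """b(x) = [1, x, x^2, ..., x^(M-1)]^T, the monomials of equation (4)."""
    return np.array([x ** k for k in range(M)])

# f(x) = b(x)^T w, equation (3); the weights here are arbitrary, not fitted.
w = np.array([0.5, -1.0, 0.25])
print(monomial_basis(2.0, len(w)) @ w)   # 0.5 - 2.0 + 1.0 = -0.5
```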

*Figure 1: The first three basis functions of a polynomial basis, and Radial Basis Functions.*

With a monomial basis, the regression model has the form

$$f(x) = \sum_k w_k x^k, \qquad (5)$$

Radial Basis Functions, and the resulting regression model, are given by

$$b_k(x) = e^{-\frac{(x - c_k)^2}{2\sigma^2}}, \qquad (6)$$

$$f(x) = \sum_k w_k e^{-\frac{(x - c_k)^2}{2\sigma^2}}, \qquad (7)$$

where $c_k$ is the center (i.e., the location) of the basis function and $\sigma^2$ determines the width of the basis function. Both of these are parameters of the model that must be determined somehow.

In practice there are many other possible choices for basis functions, including sinusoidal functions and other types of polynomials. Also, basis functions from different families, such as monomials and RBFs, can be combined. We might, for example, form a basis using the first few polynomials and a collection of RBFs. In general we ideally want to choose a family of basis functions such that we get a good fit to the data with a small basis set, so that the number of weights to be estimated is not too large.

To fit these models, we can again use least-squares regression, by minimizing the sum of squared residual error between model predictions and the training data outputs:

$$E(\mathbf{w}) = \sum_i (y_i - f(x_i))^2 = \sum_i \left( y_i - \sum_k w_k b_k(x_i) \right)^2 \qquad (8)$$

To minimize this function with respect to $\mathbf{w}$, we note that this objective function has the same form as that for linear regression in the previous chapter, except that the inputs are now the $b_k(x)$ values.

In particular, $E$ is still quadratic in the weights $\mathbf{w}$, and hence the weights $\mathbf{w}$ can be estimated the same way. That is, we can rewrite the objective function in matrix-vector form to produce

$$E(\mathbf{w}) = \|\mathbf{y} - B\mathbf{w}\|^2 \qquad (9)$$

where $\|\cdot\|$ denotes the Euclidean norm, and the elements of the matrix $B$ are given by $B_{i,j} = b_j(x_i)$ (for row $i$ and column $j$). In Matlab the least-squares estimate can be computed as `w = B\y`.

**Picking the other parameters.** The positions of the centers and the widths of the RBF basis functions cannot be solved for directly in closed form. So we need some other criteria to select them. If we optimize these parameters for the squared-error, then we will end up with one basis center at each data point, and with tiny widths that exactly fit the data. This is a problem, as such a model will not usually provide good predictions for inputs other than those in the training set.

The following heuristics instead are commonly used to determine these parameters without overfitting the training data. To pick the basis centers:

1. Place the centers uniformly spaced in the region containing the data. This is quite simple, but can lead to basis functions in empty regions, and will require an impractical number of basis functions in higher-dimensional input spaces.

2. Place one center at each data point. This is used more often, since it limits the number of centers needed, although it can also be expensive if the number of data points is large.

3. Cluster the data, and use one center for each cluster. We will cover clustering methods later in the course.

To pick the width parameter:

1. Manually try different values of the width and pick the best by trial-and-error.

2. Use the average squared distances (or median distances) to neighboring centers, scaled by a constant, to be the width. This approach also allows you to use different widths for different basis functions, and it allows the basis functions to be spaced non-uniformly.

In later chapters we will discuss other methods for determining these and other parameters of models. A sketch combining these heuristics follows.
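The sketch below uses NumPy's `lstsq` in place of Matlab's backslash; the toy data, the one-center-per-point placement, and the width's scale factor are illustrative assumptions, not values from the notes:

```python
import numpy as np

def rbf_design_matrix(x, centers, sigma2):
    """B[i, j] = b_j(x_i) = exp(-(x_i - c_j)^2 / (2 sigma2)), equation (6)."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * sigma2))

# Toy 1D training set (assumed data, for illustration).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 10.0, 30))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(30)

# Center heuristic 2: one center per data point.
centers = x_train.copy()
# Width heuristic 2: median distance between neighboring centers, scaled
# by a constant (the factor 2 here is an arbitrary choice).
sigma2 = (2.0 * np.median(np.diff(centers))) ** 2

# Least-squares estimate of w -- the NumPy analogue of Matlab's w = B\y.
B = rbf_design_matrix(x_train, centers, sigma2)
w, *_ = np.linalg.lstsq(B, y_train, rcond=None)

# Predict at new inputs.
x_test = np.linspace(0.0, 10.0, 200)
y_pred = rbf_design_matrix(x_test, centers, sigma2) @ w
```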

## 3.2 Overfitting and Regularization

Directly minimizing squared-error can lead to an effect called overfitting, wherein we fit the training data extremely well (i.e., with low error), yet we obtain a model that produces very poor predictions on future test data whenever the test inputs differ from the training inputs (Figure 2(b)). Overfitting can be understood in many ways, all of which are variations on the same underlying pathology:

1. The problem is insufficiently constrained: for example, if we have ten measurements and ten model parameters, then we can often obtain a perfect fit to the data.

2. Fitting noise: overfitting can occur when the model is so powerful that it can fit the data and also the random noise in the data.

3. Discarding uncertainty: the posterior probability distribution of the unknowns is insufficiently peaked to pick a single estimate. (We will explain what this means in more detail later.)

There are two important solutions to the overfitting problem: adding prior knowledge and handling uncertainty. The latter we will discuss later in the course.

In many cases, there is some sort of prior knowledge we can leverage. A very common assumption is that the underlying function is likely to be smooth, for example, having small derivatives. Smoothness distinguishes the examples in Figure 2. There is also a practical reason to prefer smoothness, in that assuming smoothness reduces model complexity: it is easier to estimate smooth models from small datasets. In the extreme, if we make no prior assumptions about the nature of the fit then it is impossible to learn and generalize at all; smoothness assumptions are one way of constraining the space of models so that we have any hope of learning from small datasets.

One way to add smoothness is to parameterize the model in a smooth way (e.g., making the width parameter for RBFs larger; using only low-order polynomial basis functions), but this limits the expressiveness of the model. In particular, when we have lots and lots of data, we would like the data to be able to overrule the smoothness assumptions. With large widths, it is impossible to get highly-curved models no matter what the data says.

Instead, we can add regularization: an extra term to the learning objective function that prefers smooth models. For example, for RBF regression with scalar outputs, and with many other types of basis functions or multi-dimensional outputs, this can be done with an objective function of the form:

$$E(\mathbf{w}) = \underbrace{\|\mathbf{y} - B\mathbf{w}\|^2}_{\text{data term}} + \underbrace{\lambda \|\mathbf{w}\|^2}_{\text{smoothness term}} \qquad (10)$$

This objective function has two terms. The first term, called the data term, measures the model fit to the training data. The second term, often called the smoothness term, penalizes non-smoothness (rapid changes in $f(x)$). This particular smoothness term ($\|\mathbf{w}\|^2$) is called weight decay, because it tends to make the weights smaller. (Estimation with this objective function is sometimes called Ridge Regression in statistics.) Weight decay implicitly leads to smoothness with RBF basis functions because the basis functions themselves are smooth, so rapid changes in the slope of $f$ (i.e., high curvature) can only be created in RBFs by adding and subtracting basis functions with large weights. (Ideally, we might directly penalize smoothness, e.g., using an objective term that directly penalizes the integral of the squared curvature of $f(x)$, but this is usually impractical.)
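In code, regularization changes only the solve. A minimal sketch using the closed-form solution of equation (14), derived just below; the commented usage reuses the hypothetical `B` and `y_train` from the RBF sketch above:

```python
import numpy as np

def ridge_weights(B, y, lam):
    """Regularized LS estimate w = (B^T B + lam I)^{-1} B^T y, equation (14)."""
    return np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)

# Reusing the hypothetical B and y_train from the RBF sketch above:
#   w = ridge_weights(B, y_train, lam=0.1)   # the lam value is an arbitrary choice
# lam = 0 recovers ordinary least squares; larger lam gives smoother fits.
```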

This regularized least-squares objective function is still quadratic with respect to $\mathbf{w}$ and can be optimized in closed form. To see this, we can rewrite it as follows:

$$E(\mathbf{w}) = (\mathbf{y} - B\mathbf{w})^T (\mathbf{y} - B\mathbf{w}) + \lambda \mathbf{w}^T \mathbf{w} \qquad (11)$$
$$= \mathbf{w}^T B^T B \mathbf{w} - 2 \mathbf{w}^T B^T \mathbf{y} + \lambda \mathbf{w}^T \mathbf{w} + \mathbf{y}^T \mathbf{y} \qquad (12)$$
$$= \mathbf{w}^T (B^T B + \lambda I) \mathbf{w} - 2 \mathbf{w}^T B^T \mathbf{y} + \mathbf{y}^T \mathbf{y} \qquad (13)$$

To minimize $E(\mathbf{w})$, as above, we solve the normal equations $\nabla E(\mathbf{w}) = 0$ (i.e., $\partial E / \partial w_i = 0$ for all $i$). This yields the following regularized LS estimate for $\mathbf{w}$:

$$\mathbf{w} = (B^T B + \lambda I)^{-1} B^T \mathbf{y} \qquad (14)$$

## 3.3 Artificial Neural Networks

Another choice of basis function is the sigmoid function. "Sigmoid" literally means "s-shaped." The most common choice of sigmoid is:

$$g(a) = \frac{1}{1 + e^{-a}} \qquad (15)$$

Sigmoids can be combined to create a model called an Artificial Neural Network (ANN). For regression with multi-dimensional inputs $\mathbf{x} \in \mathbb{R}^{K_2}$ and multi-dimensional outputs $\mathbf{y} \in \mathbb{R}^{K_1}$:

$$\mathbf{y} = f(\mathbf{x}) = \sum_j \mathbf{w}^{(1)}_j \, g\!\left( \sum_k w^{(2)}_{k,j} x_k + b^{(2)}_j \right) + \mathbf{b}^{(1)} \qquad (16)$$

This equation describes a process whereby a linear regressor with weights $\mathbf{w}^{(2)}$ is applied to $\mathbf{x}$. The output of this regressor is then put through the nonlinear sigmoid function, the outputs of which act as features to another linear regressor. Thus, note that the inner weights $\mathbf{w}^{(2)}$ are distinct parameters from the outer weights $\mathbf{w}^{(1)}_j$. As usual, it is easiest to interpret this model in the 1D case, i.e.,

$$y = f(x) = \sum_j w^{(1)}_j \, g\!\left( w^{(2)}_j x + b^{(2)}_j \right) + b^{(1)} \qquad (17)$$

Figure 3 (left) shows plots of $g(wx)$ for different values of $w$, and Figure 3 (right) shows $g(x+b)$ for different values of $b$. As can be seen from the figures, the sigmoid function acts more or less like a step function for large values of $w$, and more like a linear ramp for small values of $w$. The bias $b$ shifts the function left or right. Hence, the neural network is a linear combination of shifted (smoothed) step functions, linear ramps, and the bias term.

To learn an artificial neural network, we can again write a regularized squared-error objective function:

$$E(\mathbf{w}, \mathbf{b}) = \|\mathbf{y} - f(\mathbf{x})\|^2 + \lambda \|\mathbf{w}\|^2 \qquad (18)$$

where $\mathbf{w}$ comprises the weights at both levels for all $j$. Note that we regularize by applying weight decay to the weights (both inner and outer), but not the biases, since only the weights affect the smoothness of the resulting function (why?). Unfortunately, this objective function cannot be optimized in closed form, and numerical optimization procedures must be used. We will study one such method, gradient descent, in a later chapter.
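To make the model concrete, a minimal NumPy sketch of the 1D forward pass of equation (17); the parameters here are hand-picked for illustration, not learned:

```python
import numpy as np

def sigmoid(a):
    """g(a) = 1 / (1 + exp(-a)), equation (15)."""
    return 1.0 / (1.0 + np.exp(-a))

def ann_1d(x, w1, b1, w2, b2):
    """f(x) = sum_j w1[j] * g(w2[j] * x + b2[j]) + b1, equation (17)."""
    return w1 @ sigmoid(w2 * x + b2) + b1

# Three hidden units with hand-picked (not learned) parameters:
w2 = np.array([10.0, 0.5, -4.0])  # inner weights: large -> step-like, small -> ramp-like
b2 = np.array([0.0, 1.0, -2.0])   # inner biases shift each sigmoid
w1 = np.array([1.0, -0.5, 2.0])   # outer weights combine the hidden units
b1 = 0.2                          # outer bias

print(ann_1d(0.5, w1, b1, w2, b2))
```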

*Figure 2: Least-squares curve fitting of an RBF. (a) Point data (blue circles) was taken from a sine curve, and a curve was fit to the points by a least-squares fit. The horizontal axis is $x$, the vertical axis is $y$, and the red curve is the estimated $f(x)$. In this case, the fit is essentially perfect. The curve representation is a sum of Gaussian basis functions. (b) Overfitting. Random noise was added to the data points, and the curve was fit again. The curve exactly fits the data points, which does not reproduce the original curve (a green, dashed line) very well. (c) Underfitting. Adding a smoothness term makes the resulting curve too smooth. (In this case, weight decay was used, along with reducing the number of basis functions.) (d) Reducing the strength of the smoothness term yields a better fit.*
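The experiment in Figure 2 is straightforward to reproduce in a few lines. The sketch below is an approximation under assumed settings; the noise level, basis width, and λ values are guesses, not the settings used to produce the figure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples from a sine curve, as in Figure 2(b).
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 20))
y = np.sin(x) + 0.15 * rng.standard_normal(x.size)

# Gaussian RBF basis with one center per data point (width is a guess).
sigma2 = 0.1
B = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma2))

# Sweep the regularizer: nearly none (overfits, like panel b), too much
# (oversmooths, like panel c), moderate (a reasonable fit, like panel d).
for lam in (1e-6, 10.0, 0.1):
    w = np.linalg.solve(B.T @ B + lam * np.eye(x.size), B.T @ y)
    print(f"lambda={lam:g}  training MSE={np.mean((B @ w - y) ** 2):.5f}")
```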

*Figure 3: Left: Sigmoids $g(wx) = 1/(1+e^{-wx})$ for various values of $w$, ranging from linear ramps to smooth steps to nearly hard steps. Right: Sigmoids $g(x+b) = 1/(1+e^{-(x+b)})$ with different shifts $b$.*

## 3.4 K-Nearest Neighbors

At their heart, many learning procedures, especially when our prior knowledge is weak, amount to smoothing the training data. RBF fitting is an example of this. However, many of these fitting procedures require making a number of decisions, such as the locations of the basis functions, and can be sensitive to these choices. This raises the question: why not cut out the middleman, and smooth the data directly? This is the idea behind K-Nearest Neighbors regression.

The idea is simple. We first select a parameter $K$, which is the only parameter to the algorithm. Then, for a new input $x$, we find the $K$ nearest neighbors to $x$ in the training set, based on their Euclidean distance $\|x - x_i\|^2$. Then, our new output $y$ is simply an average of the training outputs for those nearest neighbors. This can be expressed as:

$$y = \frac{1}{K} \sum_{i \in N_K(x)} y_i \qquad (19)$$

where the set $N_K(x)$ contains the indices of the $K$ training points closest to $x$. Alternatively, we might take a weighted average of the $K$ nearest neighbors to give more influence to training points close to $x$ than to those further away:

$$y = \frac{\sum_{i \in N_K(x)} w(x_i)\, y_i}{\sum_{i \in N_K(x)} w(x_i)}, \qquad w(x_i) = e^{-\|x_i - x\|^2 / 2\sigma^2} \qquad (20)$$

where $\sigma^2$ is an additional parameter to the algorithm. The parameters $K$ and $\sigma$ control the degree of smoothing performed by the algorithm. In the extreme case of $K = 1$, the algorithm produces a piecewise-constant function.

K-Nearest Neighbors is simple and easy to implement; it doesn't require us to muck about at all with different choices of basis functions or regularizations. However, it doesn't compress the data at all: we have to keep around the entire training set in order to use it, which could be very expensive, and we must search the whole data set to make predictions. (The cost of searching can be mitigated with spatial data structures designed for searching, such as k-d trees and locality-sensitive hashing. We will not cover these methods here.)
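A brute-force NumPy sketch of both variants, equations (19) and (20); the toy data is assumed for illustration, and no spatial data structure is used:

```python
import numpy as np

def knn_regress(x, X_train, y_train, K, sigma2=None):
    """Predict y at x by (weighted) averaging the K nearest training outputs.
    sigma2=None gives the unweighted average of equation (19); otherwise the
    Gaussian-weighted average of equation (20) is used."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:K]                # indices N_K(x)
    if sigma2 is None:
        return np.mean(y_train[nn])           # equation (19)
    w = np.exp(-dists[nn] ** 2 / (2 * sigma2))  # equation (20) weights
    return np.sum(w * y_train[nn]) / np.sum(w)

# Toy example: 1D inputs stored as an (N, 1) array.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 10.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
print(knn_regress(np.array([3.0]), X, y, K=5))              # unweighted
print(knn_regress(np.array([3.0]), X, y, K=5, sigma2=0.5))  # weighted
```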

Copyright © 2015 Aaron Hertzmann, David J. Fleet and Marcus Brubaker.