Linear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines
- Raymond Randall
1 Linear Models Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines. Reading: Chapter 4.6 of Witten and Frank, 2nd ed.; Chapter 4 of Mitchell; Solving Least Squares Problems, C.L. Lawson & R.J. Hanson, SIAM; An Introduction to Support Vector Machines, N. Cristianini & J. Shawe-Taylor, Cambridge. COM3250 /
5 Numeric Prediction So far we have primarily focused on concept learning, i.e. binary classification. For example: credit-worthy loan applications; mushroom data: edible vs poisonous (= edible vs non-edible). However, most algorithms extend easily to n-ary classification, e.g. the zoo data (7 classes). The key characteristic of these problems is that the target attribute is a nominal attribute. In most cases the non-target attributes have also been nominal, and we have seen how numeric attributes can be converted to nominal attributes using a variety of discretization approaches. What if the target attribute is numeric? For example: heuristic evaluation functions for board games, such as checkers; numeric functions relating one physical quantity to others: temperature/pressure, lean body mass/muscle strength, etc. In such cases the non-target attributes are usually also numeric.
9 Linear Regression If target and non-target attributes are numeric then a classic technique to consider is linear regression. The output class/target attribute x is expressed as a linear combination of the other attributes a_1,...,a_n with predetermined weights w_0,...,w_n:

x = w_0 + w_1 a_1 + w_2 a_2 + ... + w_n a_n

The machine learning challenge is to compute the weights from the training data, i.e. view an assignment to the weights w_i as a hypothesis, and pick the hypothesis that best fits the training data. In linear regression the technique used to do this chooses the w_i so as to minimize the sum of the squares of the differences between the actual and predicted values of the target attribute over the training data; this is called least squares approximation. Note that if the difference between the actual and predicted target value is viewed as an error, then least squares approximation minimizes the total squared error across the training data.
10 Linear Regression: Example 1 Estimating the pressure of a fixed amount of gas in a tank, given its temperature. Under these conditions (fixed volume) the pressure of a gas is proportional to its temperature, which can be used to determine a line based on the true parameters. Given a set of temperature/pressure data points, linear regression can be used to derive a line based on estimated parameters. Source: NIST Engineering Statistics Handbook, section on Least Squares.
11 Linear Regression (cont) While linear regression is frequently thought of as fitting a line/plane to a set of data points, it can be used to fit the data with any function of the form

f(x; β) = β_0 + β_1 x_1 + β_2 x_2 + ...

in which 1. each explanatory variable (x_i) in the function is multiplied by an unknown parameter (β_i), 2. there is at most one unknown parameter with no corresponding explanatory variable (β_0), and 3. all of the individual terms are summed to produce the final function value. So quadratic curves, straight-line models in log(x), and polynomials in sin(x) are linear in the statistical sense so long as they are linear in the parameters β_i, even though they are not linear in the explanatory variables.
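As a quick illustration of "linear in the parameters", a quadratic curve can be fitted by ordinary least squares simply by treating x and x^2 as two separate explanatory variables. The data below are hypothetical (generated from known coefficients plus noise), and this is a sketch rather than part of the lecture's examples:

```python
import numpy as np

# Hypothetical data: a quadratic trend y = 1 + 2x - 0.5x^2 with small noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)

# f(x; beta) = b0 + b1*x + b2*x^2 is nonlinear in x but linear in the
# parameters beta, so least squares applies: each column of the design
# matrix holds one explanatory term.
A = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta)  # approximately [1.0, 2.0, -0.5]
```

The same pattern handles log(x) or sin(x) terms: only the columns of the design matrix change, never the fitting procedure.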
12 Linear Regression: Example 2 Detecting craters (ellipses/circles) on Mars from 2D image data. Randomly sample dark points in images and estimate linear parameters a, b, c, d, e for conic sections: ax^2 + bxy + cy^2 + dx + ey = 1
15 Least Squares Approximation Suppose there are m training examples, where each instance is represented by values for n numeric non-target attributes a_1,...,a_n, where the value of the j-th attribute for the i-th example is denoted a_{i,j} (with a_{i,0} = 1 for 1 <= i <= m), and a value for the target attribute x, denoted x_i for the i-th example. We wish to learn weights w_0, w_1, ..., w_n so as to minimize

sum_{i=1}^{m} ( x_i - sum_{j=0}^{n} w_j a_{i,j} )^2

The problem is naturally represented in matrix notation. Ideally we would like to find a column vector of weights w_0,...,w_n such that:

| a_{1,0} a_{1,1} a_{1,2} ... a_{1,n} |  | w_0 |     | x_1 |
| a_{2,0} a_{2,1} a_{2,2} ... a_{2,n} |  | w_1 |     | x_2 |
|   ...                               |  | ... |  =  | ... |
| a_{m,0} a_{m,1} a_{m,2} ... a_{m,n} |  | w_n |     | x_m |

i.e. such that Aw = x. Failing this, we want a vector of weights w that minimizes ||Aw - x||.
18 Least Squares Approximation (cont) A vector w that minimizes ||Aw - x|| is called a least squares solution of Aw = x. Such a solution is given by:

w = (A^T A)^{-1} A^T x    (1)

A proof of (1) can be arrived at in various ways: by reasoning about projections onto the column space of A (i.e. using linear algebra), or by differentiating the sum of squares error expression with respect to the weights w and computing the value of w for which this derivative is 0. Consider the latter. The error function Err(w) = sum_{i=1}^{m} (x_i - sum_{j=0}^{n} w_j a_{i,j})^2 can be written:

Err(w) = (x - Aw)^T (x - Aw)                  (2)
       = x^T x - 2 w^T A^T x + w^T A^T A w    (3)

So, differentiating with respect to w:

dErr(w)/dw = -2 A^T x + 2 A^T A w             (4)

Setting (4) = 0 yields

A^T A w = A^T x                               (5)

So, if the inverse of A^T A exists we have:

w = (A^T A)^{-1} A^T x                        (6)
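The closed-form solution of equation (6) can be sketched in a few lines of NumPy. The data here are hypothetical (generated from known weights plus noise); the second solver, np.linalg.lstsq, is shown for comparison since it avoids explicitly inverting A^T A:

```python
import numpy as np

# Hypothetical training data: m = 100 examples, n = 2 attributes,
# with a_{i,0} = 1 supplying the intercept column.
rng = np.random.default_rng(1)
A = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_w = np.array([0.5, 2.0, -1.0])
x = A @ true_w + rng.normal(0, 0.05, 100)   # targets with small noise

# Normal equations, as in (6): w = (A^T A)^{-1} A^T x
w_normal = np.linalg.inv(A.T @ A) @ A.T @ x

# In practice an orthogonal-factorisation solver is preferred: it gives
# the same least squares solution but is stable even when A^T A is
# ill-conditioned.
w_lstsq, *_ = np.linalg.lstsq(A, x, rcond=None)

print(w_normal, w_lstsq)  # both close to [0.5, 2.0, -1.0]
```

This mirrors the slide's remark about Matlab: the normal-equation form is the textbook derivation, while library solvers choose a factorisation suited to the characteristics of A.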
19 Least Squares Approximation (cont) How do we compute a least squares solution? Considerable work has been put into developing efficient solutions, given the wide range of applications. We can compute (6) using matrix manipulation packages such as Matlab, using the \, inverse, pseudo-inverse, or QR decomposition operators, depending on the characteristics of A and A^T A. An extensive treatment of algorithms for least squares can be found in Lawson & Hanson. A simple algorithm which converges to the least squares solution is the Widrow-Hoff algorithm (here w = w_1,...,w_n and the bias b = w_0 is explicit):

Given training set S and learning rate η ∈ R+
  w ← 0; b ← 0
  repeat
    for i = 1 to n
      (w, b) ← (w, b) - η(⟨w · a_i⟩ + b - x_i)(a_i, 1)
    end for
  until convergence criterion satisfied
  return (w, b)

(Cristianini & Shawe-Taylor, p. 23)
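The Widrow-Hoff loop above can be sketched directly in Python. The data are hypothetical (noiseless targets from known weights) and the learning rate and epoch count are illustrative choices, not values from the lecture:

```python
import numpy as np

def widrow_hoff(examples, targets, eta=0.01, epochs=500):
    """Iterative least squares as sketched above: repeatedly nudge
    (w, b) against the error on one example at a time."""
    w = np.zeros(examples.shape[1])
    b = 0.0
    for _ in range(epochs):
        for a_i, x_i in zip(examples, targets):
            err = w @ a_i + b - x_i   # prediction error on example i
            w -= eta * err * a_i      # (w, b) <- (w, b) - eta*err*(a_i, 1)
            b -= eta * err
    return w, b

# Hypothetical data drawn exactly from x = 3*a1 - 2*a2 + 1.
rng = np.random.default_rng(2)
A = rng.normal(size=(50, 2))
x = A @ np.array([3.0, -2.0]) + 1.0
w, b = widrow_hoff(A, x)
print(w, b)  # converges towards w = [3, -2], b = 1
```

A fixed epoch count stands in for the slide's "convergence criterion satisfied"; a practical implementation would instead stop when the weight updates become negligibly small.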
21 Linear Classification Linear regression (or any regression technique) can be used for classification in domains with numeric attributes: perform a regression for each class, setting the output to 1 for training instances in the class and 0 for the others. The result is a linear expression for each class. For a test instance, calculate the value of each linear expression and assign the class whose linear expression yields the largest value. This approach is called multiresponse linear regression. Another technique for multiclass classification (i.e. more than two classes) is pairwise classification: build a classifier for every pair of classes, using only training instances from those classes; the output for a test instance is the class which receives the most votes (across classifiers). If there are k classes this method results in k(k-1)/2 classifiers, but it is not overly computationally expensive, since each classifier is trained on just the subset of instances belonging to its two classes. Note this technique can be used with any classification algorithm.
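Multiresponse linear regression can be sketched as one least squares fit per class against 0/1 indicator targets. The three-class blob data below are hypothetical, invented purely to make the argmax step visible:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 3-class data: Gaussian blobs around distinct centres.
centres = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
X = np.vstack([c + rng.normal(0, 0.5, (30, 2)) for c in centres])
y = np.repeat([0, 1, 2], 30)

# One regression per class: target 1 for members of the class,
# 0 for everything else.
A = np.column_stack([np.ones(len(X)), X])   # bias column
T = np.eye(3)[y]                            # 0/1 indicator targets
W, *_ = np.linalg.lstsq(A, T, rcond=None)   # one weight column per class

# Classify by evaluating each class's linear expression and assigning
# the class whose expression yields the largest value.
pred = np.argmax(A @ W, axis=1)
print((pred == y).mean())  # well-separated blobs: accuracy near 1.0
```

Solving all three regressions in a single lstsq call (a matrix of targets) is just a convenience; it is equivalent to fitting each class's linear expression separately.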
22 Other Linear Classifiers: The Perceptron If the training instances are linearly separable into two classes, i.e. there is a hyperplane that separates them, then a simple algorithm that separates them is the perceptron learning rule. The perceptron, ancestor of the neural net, can be pictured as a two-layer network (graph) of neurons (nodes): the input layer has one node per attribute plus an extra node (the bias) fixed at 1; the output layer is a single node; each input node is linked to the output node via a weighted connection. When an instance is presented to the input layer its attribute values activate the input layer; the input activations are multiplied by the weights and summed; if the weighted sum > 0 then the output signal is 1, otherwise the output is -1. [Figure: input layer of attribute nodes a_1,...,a_n plus a bias node fixed at 1, connected by weights w_1,...,w_n and b (= w_0) to a single output-layer node.]
23 The Perceptron Learning Rule Basic idea: incorrectly classified +ve examples lead to a small increase in the weights; incorrectly classified -ve examples lead to a small decrease in the weights.

Given a linearly separable training set S and learning rate η ∈ R+
  w_0 ← 0; b_0 ← 0; k ← 0
  R ← max_{1≤i≤n} ||a_i||
  repeat
    for i = 1 to n
      if x_i(⟨w_k · a_i⟩ + b_k) ≤ 0 then    (incorrect classification)
        w_{k+1} ← w_k + η x_i a_i
        b_{k+1} ← b_k + η x_i R^2
        k ← k + 1
      end if
    end for
  until no mistakes made within the for loop
  return (w_k, b_k), where k is the number of mistakes

(Cristianini & Shawe-Taylor, p. 12) More on this in the next lecture...
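The learning rule above translates almost line for line into Python. The two blobs of data are hypothetical and deliberately well separated, since the loop only terminates on linearly separable data:

```python
import numpy as np

def perceptron(examples, labels, eta=0.1):
    """Perceptron learning rule as given above; labels are +1/-1.
    Terminates only if the data really are linearly separable."""
    w = np.zeros(examples.shape[1])
    b = 0.0
    R = max(np.linalg.norm(a) for a in examples)
    mistakes = 0
    while True:
        errors = False
        for a_i, x_i in zip(examples, labels):
            if x_i * (w @ a_i + b) <= 0:   # incorrect classification
                w = w + eta * x_i * a_i
                b = b + eta * x_i * R**2
                mistakes += 1
                errors = True
        if not errors:                     # a full pass with no mistakes
            return w, b, mistakes

# Hypothetical linearly separable data: two shifted blobs.
rng = np.random.default_rng(4)
pos = rng.normal(0, 0.5, (20, 2)) + [2, 2]
neg = rng.normal(0, 0.5, (20, 2)) + [-2, -2]
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [-1] * 20)
w, b, k = perceptron(X, y)
print(w, b, k)  # a separating hyperplane, found after k mistakes
```

Note the returned hyperplane is merely *a* separator, not the best one; the maximum margin hyperplane of the SVM slides that follow addresses exactly this.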
24 Support Vector Machines A limitation of the simple linear classifiers above (the perceptron, linear regression with lines) is that they can only represent linear class boundaries, which makes them too simple for many applications. Support vector machines (SVMs) use linear models to implement non-linear class boundaries by transforming the input using a non-linear mapping: the instance space is mapped into a new space, and a non-linear class boundary in the original space maps onto a linear boundary in the new space.
25 Support Vector Machines (cont) For example, suppose we replace the original set of n attributes by a set including all products of k factors that can be constructed from these attributes, i.e. we move from a linear expression in n variables to a multivariate polynomial of degree k. So, if we started with a linear model with two attributes and two weights,

x = w_1 a_1 + w_2 a_2

we would move to one with four synthetic attributes and four weights:

x = w_1 a_1^3 + w_2 a_1^2 a_2 + w_3 a_1 a_2^2 + w_4 a_2^3

To generate a linear model in the space spanned by these products of factors, each training instance is mapped into the new space by computing all possible 3-factor products of its two attribute values, and the learning algorithm is applied to the transformed instances. To classify a test instance, it is transformed prior to classification. Problems: computational complexity: 5 factors of 10 attributes give > 2000 coefficients; overfitting: if the number of coefficients is large relative to the number of training instances, the model will overfit the training data (it is too nonlinear).
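The synthetic-attribute mapping and its combinatorial blow-up can be sketched with the standard library alone. The helper name is ours, not the lecture's:

```python
from itertools import combinations_with_replacement
from math import prod, comb

def product_features(a, k):
    """All products of k factors drawn (with repetition) from the
    attributes of instance a -- the synthetic attributes described above."""
    return [prod(t) for t in combinations_with_replacement(a, k)]

# Two attributes, k = 3: the four synthetic attributes
# a1^3, a1^2*a2, a1*a2^2, a2^3.
print(product_features([2.0, 3.0], 3))   # [8.0, 12.0, 18.0, 27.0]

# The combinatorial blow-up: the number of products of 5 factors
# drawn from 10 attributes is a multiset count, C(10+5-1, 5).
print(comb(10 + 5 - 1, 5))               # 2002 coefficients
```

The 2002 figure matches the slide's "> 2000 coefficients" for 5 factors of 10 attributes, and makes concrete why computing the mapping explicitly quickly becomes infeasible.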
26 Support Vector Machines (cont) SVMs solve both problems using a linear model called the maximum margin hyperplane. The maximum margin hyperplane gives the greatest separation between the classes: it is the perpendicular bisector of the shortest line connecting the convex hulls (the tightest enclosing convex polygons) of the sets of points in each class. The instances closest to the maximum margin hyperplane are called support vectors (there is at least one per class). The support vectors uniquely define the maximum margin hyperplane: given them we can construct the maximum margin hyperplane, and all other instances can be discarded.
27 Support Vector Machines (cont) SVMs are unlikely to overfit, as overfitting is caused by too much flexibility in the decision boundary. The maximum margin hyperplane is relatively stable: it only changes if training instances that are support vectors are added or removed, and there are usually few support vectors (they can be thought of as global representatives of the training set), which gives little flexibility. Nor are SVMs computationally infeasible. To classify a test instance, the vector dot product of the test instance with all support vectors must be calculated. A dot product involves one multiplication and one addition per attribute, which is expensive in the new high-dimensional space resulting from the nonlinear mapping. However, the dot product can be computed on the original attribute set before mapping: e.g. if using a high-dimensional feature space based on products of k factors, take the dot product of the vectors in the low-dimensional space and raise it to the power k. A function doing this is called a polynomial kernel. Choosing k: usually start with k = 1 (a linear model) and increase k until there is no reduction in the estimated error.
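The kernel shortcut can be checked numerically for k = 2 and two attributes. The explicit feature map below (with its sqrt(2) weighting on the cross term) is one standard choice that makes the identity exact; the vectors u and v are arbitrary hypothetical instances:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-attribute instance; the
    sqrt(2) weighting makes the mapped dot product equal the kernel."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

def poly_kernel(u, v, k=2):
    """Polynomial kernel: the dot product in the original space,
    raised to the power k."""
    return (u @ v) ** k

u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])

# The kernel computed in the low-dimensional space...
print(poly_kernel(u, v))   # (1*3 + 2*0.5)^2 = 16.0
# ...equals the dot product in the high-dimensional space.
print(phi(u) @ phi(v))     # also 16.0
```

The point of the trick is that poly_kernel never constructs the mapped vectors: the cost stays proportional to the original number of attributes, regardless of how large the implicit feature space is.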
28 Support Vector Machines (cont) Other kernel functions can be used to implement different nonlinear mappings: the radial basis function (RBF) kernel yields an RBF neural network; the sigmoid kernel yields a multilayer perceptron with one hidden layer. The choice of kernel depends on the application, though there may not be much difference in practice. SVMs can be generalised to cases where the training data is not linearly separable. SVMs are slow during training compared to other algorithms, such as decision trees; however, they can produce very accurate classifiers. The best results on text classification tasks are now typically obtained using SVMs.
32 Summary In learning from examples where the attributes are numeric, it is natural to start with linear models. When target and non-target attributes are numeric, i.e. the task is numeric prediction, the problem is referred to as linear regression: the goal is to fit a line to the training instances and then predict a value for a test instance using the induced linear equation. A common computational technique is least squares approximation, which selects the line that minimizes the sum of the squared errors. Linear models can be used for classification by finding lines that separate the classes: linear regression can be used to perform linear classification (multiresponse linear regression), and binary linear classification can also be performed using the perceptron. In cases where the classes are not linearly separable in the initial attribute space, a linear model may be found in a higher-dimensional space arrived at by a nonlinear mapping from the initial space: support vector machines are computationally efficient algorithms for mapping instances into higher-dimensional feature spaces and finding hyperplanes in these spaces to perform classification.
Lecture 4 14. 5 Release Introduction to ANSYS DesignXplorer 1 2013 ANSYS, Inc. September 27, 2013 s are functions of different nature where the output parameters are described in terms of the input parameters
More informationSupport Vector Machines (a brief introduction) Adrian Bevan.
Support Vector Machines (a brief introduction) Adrian Bevan email: a.j.bevan@qmul.ac.uk Outline! Overview:! Introduce the problem and review the various aspects that underpin the SVM concept.! Hard margin
More informationAM 221: Advanced Optimization Spring 2016
AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 2 Wednesday, January 27th 1 Overview In our previous lecture we discussed several applications of optimization, introduced basic terminology,
More informationKernel SVM. Course: Machine Learning MAHDI YAZDIAN-DEHKORDI FALL 2017
Kernel SVM Course: MAHDI YAZDIAN-DEHKORDI FALL 2017 1 Outlines SVM Lagrangian Primal & Dual Problem Non-linear SVM & Kernel SVM SVM Advantages Toolboxes 2 SVM Lagrangian Primal/DualProblem 3 SVM LagrangianPrimalProblem
More informationSVM Classification in Multiclass Letter Recognition System
Global Journal of Computer Science and Technology Software & Data Engineering Volume 13 Issue 9 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More informationOptimization Methods for Machine Learning (OMML)
Optimization Methods for Machine Learning (OMML) 2nd lecture Prof. L. Palagi References: 1. Bishop Pattern Recognition and Machine Learning, Springer, 2006 (Chap 1) 2. V. Cherlassky, F. Mulier - Learning
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More information5 Machine Learning Abstractions and Numerical Optimization
Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer
More informationMachine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013
Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationSUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES Today Reading AIMA 18.9 Goals (Naïve Bayes classifiers) Support vector machines 1 Support Vector Machines (SVMs) SVMs are probably the most popular off-the-shelf classifier! Software
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More information6. Linear Discriminant Functions
6. Linear Discriminant Functions Linear Discriminant Functions Assumption: we know the proper forms for the discriminant functions, and use the samples to estimate the values of parameters of the classifier
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationLecture 10: SVM Lecture Overview Support Vector Machines The binary classification problem
Computational Learning Theory Fall Semester, 2012/13 Lecture 10: SVM Lecturer: Yishay Mansour Scribe: Gitit Kehat, Yogev Vaknin and Ezra Levin 1 10.1 Lecture Overview In this lecture we present in detail
More informationLECTURE 5: DUAL PROBLEMS AND KERNELS. * Most of the slides in this lecture are from
LECTURE 5: DUAL PROBLEMS AND KERNELS * Most of the slides in this lecture are from http://www.robots.ox.ac.uk/~az/lectures/ml Optimization Loss function Loss functions SVM review PRIMAL-DUAL PROBLEM Max-min
More informationSupport Vector Machines
Support Vector Machines About the Name... A Support Vector A training sample used to define classification boundaries in SVMs located near class boundaries Support Vector Machines Binary classifiers whose
More informationNeural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick
Neural Networks Theory And Practice Marco Del Vecchio marco@delvecchiomarco.com Warwick Manufacturing Group University of Warwick 19/07/2017 Outline I 1 Introduction 2 Linear Regression Models 3 Linear
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationAssignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions
ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationSUPPORT VECTOR MACHINE ACTIVE LEARNING
SUPPORT VECTOR MACHINE ACTIVE LEARNING CS 101.2 Caltech, 03 Feb 2009 Paper by S. Tong, D. Koller Presented by Krzysztof Chalupka OUTLINE SVM intro Geometric interpretation Primal and dual form Convexity,
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationLARGE MARGIN CLASSIFIERS
Admin Assignment 5 LARGE MARGIN CLASSIFIERS David Kauchak CS 451 Fall 2013 Midterm Download from course web page when you re ready to take it 2 hours to complete Must hand-in (or e-mail in) by 11:59pm
More informationCS229 Final Project: Predicting Expected Response Times
CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationBagging for One-Class Learning
Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationUnivariate and Multivariate Decision Trees
Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each
More information11/14/2010 Intelligent Systems and Soft Computing 1
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationEquation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.
Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted for this course
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationIntroduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition
More informationLarge Margin Classification Using the Perceptron Algorithm
Large Margin Classification Using the Perceptron Algorithm Yoav Freund Robert E. Schapire Presented by Amit Bose March 23, 2006 Goals of the Paper Enhance Rosenblatt s Perceptron algorithm so that it can
More informationPerceptron Introduction to Machine Learning. Matt Gormley Lecture 5 Jan. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron Matt Gormley Lecture 5 Jan. 31, 2018 1 Q&A Q: We pick the best hyperparameters
More informationLecture 7: Support Vector Machine
Lecture 7: Support Vector Machine Hien Van Nguyen University of Houston 9/28/2017 Separating hyperplane Red and green dots can be separated by a separating hyperplane Two classes are separable, i.e., each
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationPerceptron Learning Algorithm
Perceptron Learning Algorithm An iterative learning algorithm that can find linear threshold function to partition linearly separable set of points. Assume zero threshold value. 1) w(0) = arbitrary, j=1,
More informationLearning and Generalization in Single Layer Perceptrons
Learning and Generalization in Single Layer Perceptrons Neural Computation : Lecture 4 John A. Bullinaria, 2015 1. What Can Perceptrons do? 2. Decision Boundaries The Two Dimensional Case 3. Decision Boundaries
More informationParallel & Scalable Machine Learning Introduction to Machine Learning Algorithms
Parallel & Scalable Machine Learning Introduction to Machine Learning Algorithms Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research
More informationPractice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
Practice EXAM: SPRING 0 CS 6375 INSTRUCTOR: VIBHAV GOGATE The exam is closed book. You are allowed four pages of double sided cheat sheets. Answer the questions in the spaces provided on the question sheets.
More informationAnnouncements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.
CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted
More information