Combine the PA Algorithm with a Proximal Classifier
|
|
- Angela Wilson
- 5 years ago
- Views:
Transcription
1 Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU Sept. 29, 2011
2 Outline 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
3 Outline 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
4 Supervised Learning Problems Assumption: training instances are drawn from an unknown but fixed probability distribution P(x, y) independently. Our learning task: Given a training set T = {(x 1, y 1 ), (x 2, y 2 ),..., (x m, y m )} We would like to construct a rule, f (x) that can correctly predict the label y given unseen x If f (x) y then we get some loss or penalty For example: l(f (x), y) = 1 2 f (x) y Learning examples: classification, regression and sequence labeling If y is drawn from a finite set it will be a classification problem. The simplest case: y { 1, +1} called binary classification problem If y is a real number it becomes a regression problem More general case, y can be a vector and each element is drawn from a finite set. This is the sequence labeling problem
5 Binary Classification Problem Given a training set T = {(x j, y j ) x j R d, y j { 1, 1}, j = 1,..., m} Main Goal: x j P y j = 1 & x j N y j = 1 Predict the unseen class label for new data Find a function f : R d R by learning from data f (x) 0 x P and f (x) < 0 x N The simplest function is linear: f (x) = w, x + b
6 Expected Risk vs. Empirical Risk Assumption: training instances are drawn from an unknown but fixed probability distribution P(x, y) independently. Ideally, we would like to have the optimal rule f that minimizes the Expected Risk: E(f ) = l(f (x), y)dp(x, y) among all functions Unfortunately, we can not do it. P(x, y) is unknown and we have to restrict ourselves in a certain hypothesis space, F How about compute fm F that minimizes the Empirical Risk: E m (f ) = 1 m l(f (x j ), y j ) j Only minimizing the empirical risk will be in danger of overfitting
7 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
8 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
9 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
10 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
11 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
12 Approximation Optimization Approach Most of learning algorithms can be formulated as an optimization problem The objective function consists of two parts: E m (f )+ controls on VC-error bound Controlling the VC-error bound will avoid the overfitting risk It can be achieved via adding the regularization term into the objective function Note that: We have made lots of approximations when formulate a learning task as an optimization problem Why bother to find the optimal solution for the problem? One could stop the optimization iteration before its convergence
13 Gradient Descent: Batch Learning For an optimization problem min f (w) = min r(w) + 1 m m l(w; (x j, y j )) GD tries to find a direction and the learning rate decreasing the objective function value. j=1 w t+1 = w t η f (w t ) where η is the learning rate, f (w t ) is the steepest direction f (w t ) = r(w t ) + 1 m m l(w t ; (x i, y i )) When m is large, computing m l(w t ; (x j, y j )) may cost much time. j=1 i=1
14 Stochastic Gradient Descent: Online Learning In GD, we compute the gradient using the entire training set. In stochastic gradient descent(sgd), we use l(w t ; (x t, y t )) So the gradient of f (w t ) instead of 1 m m l(w t ; (x j, y j )) j=1 f (w t ) = r(w t ) + l(w t ; (x t, y t )) SGD computes the gradient using only one instance. In experiment, SGD is significantly faster than GD when m is large.
15 Large-Scale Problems Collecting and distributing large-scale datasets are no longer expensive tasks. Two definitions of large-scale problems, It consists of problems where the main computational constraint is the amount of time available, rather than the number of instances [Bottou, 2008]. Training set may not be stored in modern computer s memory [Langford, 2008]. We are in a need of learning algorithms that scale linearly with the size of datasets The performance of the algorithms should be better than processing a random subset of the data via conventional learning algorithms
16 Large-Scale Problems Collecting and distributing large-scale datasets are no longer expensive tasks. Two definitions of large-scale problems, It consists of problems where the main computational constraint is the amount of time available, rather than the number of instances [Bottou, 2008]. Training set may not be stored in modern computer s memory [Langford, 2008]. We are in a need of learning algorithms that scale linearly with the size of datasets The performance of the algorithms should be better than processing a random subset of the data via conventional learning algorithms
17 Outline Perceptron Algorithm Passive and Aggressive (PA) Algorithm 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
18 Online Learning Perceptron Algorithm Passive and Aggressive (PA) Algorithm Definition of online learning Given a set of new training data, Online learner can update its model without reading old data while improving its performance. In contrast, off-line learner must combine old and new data and start the learning all over again, otherwise the performance will suffer. Online is considered as a solution of large learning tasks Usually require several passes (or epochs) through the training instances Need to keep all instances unless we only run the algorithm one single pass
19 Outline Perceptron Algorithm Passive and Aggressive (PA) Algorithm 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
20 Perceptron Algorithm Passive and Aggressive (PA) Algorithm Perceptron Algorithm [Rosenblatt, 1956] An online learning algorithm and a mistake-driven procedure The current classifier is updated whenever the new arriving instance is misclassified Initiation: k = 0, R = max 1 j m xj 2 repeat for t = 1 : m if y t( w k, x t + b k ) 0 w k+1 = w k + ηy tx t b k+1 = b k + ηy tr 2 k = k + 1 end end until no mistake made within the for-loop k is number of mistakes. η > 0 is the learning rate.
21 Perceptron Algorithm Passive and Aggressive (PA) Algorithm Perceptron Algorithm [Rosenblatt, 1956] The Perceptron is considered as a SGD method. The underlying optimization problem of the algorithm min (w,b) R d+1 m ( y j ( w, x j + b)) + j=1 In the linearly separable case, the Perceptron alg. will be terminated in finite steps no matter what learning rate is chosen In the nonseparable case, how to decide the appropriate learning rate that will make the least mistake is very difficult Learning rate can be a nonnegative number. More general case, it can be a positive definite matrix
22 Outline Perceptron Algorithm Passive and Aggressive (PA) Algorithm 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
23 Perceptron Algorithm Passive and Aggressive (PA) Algorithm Key Idea of PA Algorithm [Crammer, 2006] An online learning algorithm and a mistake-driven procedure The PA algorithm suggests that the new classifier should not only classify the new arriving data correctly but also as close to the current classifier as possible It can be formulated the problem as follows: w t+1 1 arg min w R d 2 w wt 2 2 (1) s.t. l(w; (x t, y t )) = 0 where l(w; (x t, y t )) is a hinge loss function
24 Simplify the PA Algorithm Perceptron Algorithm Passive and Aggressive (PA) Algorithm We can simplify Eq.(1) as follows: w t+1 1 arg min w R d 2 w 2 2 w, wt (2) s.t. l(w; (x t, y t )) = 0 In the Eq.(2), the PA algorithm implicitly minimizes the regularization term, 1 2 w 2 2 Minimizing w, w t is also expressed that the new updated classifier must be similar to the current classifier
25 Notes on PA Algorithm Perceptron Algorithm Passive and Aggressive (PA) Algorithm If (x t, y t ) is correctly classified then w t is the solution of the optimization problem It is called the Passive phase If (x t, y t ) is misclassified then w t should be updated to correctly classify (x t, y t ) It is called the Aggressive phase We may treat the classifier learned by the PA algorithm as a hard-margin classifier However, if an instance comes with noise, the current classifier may be led to the wrong direction seriously
26 PA-1 and PA-2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm Based on the concept of the soft-margin classifier, the non-negative slack variable ξ was introduced into the PA algorithm in two different ways Linear scale with ξ (called PA-1) w t+1 1 arg min w R d 2 w wt Cξ s.t. l(w; (x t, y t )) ξ and ξ 0 Square scale with ξ (called PA-2) w t+1 1 arg min w R d 2 w wt Cξ 2 s.t. l(w; (x t, y t )) ξ
27 Perceptron Algorithm Passive and Aggressive (PA) Algorithm PA Algorithm Closed Form Updating It seems that we have to solve an optimization problem for each instance. Fortunately, PA, PA-1 and PA-2 come with the closed form of updating schemes They share the same closed form w t+1 = w t + τ t y t x t where τ t > 0 is defined as l(w t ; (x t,y t )) (PA) x t 2 2 τ t = min{c, l(wt ; (x t,y t )) } (PA-1) x t 2 2 (PA-2) l(w t ; (x t,y t )) x t C It replaced the fixed learning rate such as Perceptron algorithm with the dynamic learning rate depending on the current instance
28 Outline 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
29 Motivation for a Proximal Classifier In online learning framework, we do not keep the previous instances in the memory A lack of memory for previous instances might hurt the learning efficiency The previous instances which have been classified correctly may be misclassified again We keep some simple statistical information (mean and variance) to summarize the previous instances Have to take the cost into account, fixed and small size memory and linear CPU time
30 Illustrations of Simple Proximal Models If the dataset is easy to separate, the means difference suggests a good proximal classifier (left figure) For the more complicated dataset, the performance of the LDA direction is better (right figure) The proximal classifier would provide a good suggestion for our classifier updating
31 Proximal Classifier: Quasi-LDA However, the cost of computing the LDA is expensive when the input space is in the high dimensional space We proposed the quasi-lda direction as our proximal classifier (w t p) i = (mt + ) i (m t ) i (s t + ) i +(s t ) i, i = 1, 2,..., d where w t p m t +/ s t +/ : the proximal classifier on round t : mean vector of Pos./Neg. class on round t : variance vector of Pos./Neg. class on round t It is very cheap both in CPU time and in memory usage
32 The Details of Proximal Classifier Small Trick: Var(X) = E((X E(X)) 2 ) = E(X 2 ) (E(X)) 2 The details about the statistical information we maintained P t = {j y j positive, j = 1, 2,..., t} N t = {j y j negative, j = 1, 2,..., t} (m t 1 +) i = (x j ) P t i i = 1, 2,..., d j P t (m t 1 ) i = (x j ) N t i i = 1, 2,..., d j N t (s t 1 +) i = (x j ) 2 P t 1 i P t j P t P t 1 (mt +) 2 i i = 1, 2,..., d (s t 1 ) i = (x j ) 2 N t 1 i N t j N t N t 1 (mt ) 2 i i = 1, 2,..., d All we need are (m t +) i, (m t ) i, (x j ) 2 i and (x j ) 2 i for each attribute in j P t j N t online fashion
33 Key Idea of Our Proposed Method How to combine the PA algorithm with the proximal classifier? Remember, in the PA algorithm, w, w t is expressed as the similarity between w and w t Therefore, we can formulate our idea by adding γ w, w t p into the Eq.(2), called PAm: w t+1 1 arg min w R d 2 w 2 2 w, w t γ w, w t p s.t. l(w; (x t, y t )) = 0 Minimizing w, w t γ w, w t p is the way to keep the new classifier w to be close to w t and w t p γ w, w t p also can be introduced into the PA-1 and PA-2, called PAm-1 and PAm-2, respectively
34 Closed Form Updating Rule of PAm Algorithm We derive the closed form updating rules for PAm, PAm-1 and PAm-2 as follows, they also shared the same closed form where α t > 0 is defined as α t = w t+1 = w t + γw t p + α t y t x t l(w t ;(x t,y t )) γy t w t p,xt x t 2 2 (PAm) min{c, l(wt ;(x t,y t )) γy t w t p,xt } (PAm-1) x t 2 2 l(w t ;(x t,y t )) γy t w t p,xt x t C (PAm-2)
35 PAm Algorithm (Single Pass) Given a training set T = {(x t, y t ) x t R d, y t { 1, 1}, t = 1, 2,..., m} for t = 1,2,...,m Update the statistical information if 1 y t w t, x t > 0 Obtain the proximal classifier w t p Compute the learning rate α t w t+1 = w t + γw t p + α t y t x t end end
36 PAm Algorithm After a single pass, we will not update the statistical information. Updating scheme is very similar to PA algorithm The input data order is less affected on the final classifier after single pass. In our experiments, we show that classifiers between two consecutive passes are very similar when PAm algorithm is applied
37 Extension of PAm to Sliding Window of Previous Classifiers In each updating of the PA algorithm, it only considers one previous classifier. We suggest that taking more than one previous classifiers into account will be more robust. We maintain a queue which saves previous k classifiers and k is small, we can formulate as follow, w t+1 1 arg min w R d 2 w 2 2 w, w t γ w, w t p + Cξ s.t. l(w; (x t, y t )) ξ and ξ 0 where w t is the average of k previous classifiers on round t We call it as PAmQ-1.
38 Global and Local Information We can also derive the formulations and closed form updating rules for PAmQ and PAmQ-2 All we need to do is replacing w t with w t PAm algorithm takes global and local information into account in online learning framework The statistical information we kept can be treated as the global information The previous classifier w t or w t can be treated as the local information
39 Outline 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
40 Numerical Experiments Setting In the first part of our experiments, we would like to investigate the effect of the input order of instances We run a single epoch for PA algorithm and PAm algorithm 10 times with different input order at each time The second part of our experiments, we would like to observe the convergence behavior of our proposed methods How good it could be if we only run a single pass of the PAm family algorithms?
41 Benchmark Datasets Used in Numerical Experiments The sources of benchmark datasets 3 medium-size datasets from LIBSVM library site 3 large-scale datasets from Pascal Large Scale Learning Challenge in 2008 Dataset Training Testing Features Svmguide Ijcnn Cod-rna Alpha Delta Gamma
42 Comparison in the Training (Single Pass) Training Error Rate and Standard Deviation (%) Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 LDA Svmguide1 7.5(3.2) 5.0(1.0) 6.9(3.0) 4.9(0.3) 6.4(1.1) 4.9(0.3) 5.0(0.4) 12.2 Ijcnn 10.1(2.2) 9.5(1.0) 8.9(0.4) 8.5(0.1) 9.0(0.3) 9.0(0.1) 8.7(0.2) 13.9 Cod-rna 10.5(10.5) 7.7(1.7) 6.8(0.5) 6.3(0.1) 7.0(0.5) 6.9(0.6) 6.2(0.1) 6.4 Alpha 31.7(3.9) 26.9(1.7) 24.3(1.4) 26.3(0.7) 24.2(1.8) 26.1(0.6) 26.4(0.3) 21.7 Delta 35.0(0.7) 25.2(0.3) 30.7(0.2) 21.5(0.01) 33.5(0.5) 24.4(0.2) 21.5(0.02) 21.5 Gamma 32.2(1.0) 23.8(0.4) 25.8(0.7) 19.9(0.01) 30.0(1.0) 22.3(0.2) 19.9(0.01) 19.9 Min/Max Training Error Rate (%) Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 LDA Svmguide1 5.0/ / / / / / / Ijcnn 8.8/ / / / / / / Cod-rna 6.5/ / / / / / / Alpha 27.6/ / / / / / / Delta 33.7/ / / / / / / Gamma 30.8/ / / / / / /
43 Comparison in the Testing Test Error Rate and Standard Deviation (%) Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 LDA Svmguide1 7.4(3.2) 5.1(0.6) 7.2(4.5) 4.4(0.2) 6.6(2.0) 4.9(0.6) 4.8(0.3) 11.8 Ijcnn 9.7(1.8) 11.0(2.9) 8.8(0.3) 8.7(0.2) 8.7(0.4) 9.3(0.4) 8.9(0.2) 18.6 Cod-rna 9.2(6.9) 6.3(1.7) 5.4(0.7) 4.7(0.2) 5.8(0.7) 5.3(0.67) 4.6(0.1) 4.7 Alpha 31.7(4.9) 29.1(3.0) 23.8(1.0) 23.7(1.3) 23.6(0.7) 24.0(0.9) 23.7(0.4) 21.8 Delta 35.0(0.8) 25.3(0.3) 29.7(0.7) 21.7(0.01) 33.0(0.5) 24.5(0.2) 21.7(0.01) 21.7 Gamma 32.2(1.0) 23.9(0.4) 25.8(0.6) 20.0(0.01) 30.0(1.1) 22.3(0.2) 20.0(0.01) 20.0 Min/Max Test Error Rate (%) Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 LDA Svmguide1 4.7/ / / / / / / Ijcnn 8.4/ / / / / / / Cod-rna 4.8/ / / / / / / Alpha 26.8/ / / / / / / Delta 33.7/ / / / / / / Gamma 30.9/ / / / / / /
44 Comparison of Average Running Time of 10 Runs Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 LDA Svmguide Ijcnn Cod-rna Alpha Delta Gamma The unit of these results is Sec. The running times of PA family and PAm family are comparable Keep simple statistical information may not produce extra CPU time burden
45 Comparison of Average Number of Updating Dataset PA PAm PA-1 PAm-1 PA-2 PAm-2 PAmQ-1 Svmguide Ijcnn Cod-rna Alpha Delta Gamma Our method has a less number of updating than conventional PA algorithm Less number of updating indicates less aggressive phases in an epoch
46 Observation on Convergence Behavior Run 10 epochs with the same input order for each methods and record the classifier when an epoch is completed The proximal classifier will not be changed once the first epoch is finished We compute the cosine similarity between two consecutive epoch classifiers
47 Test Error Rate of each Epoch It becomes a constant after 2 to 3 epochs
48 Number of Updating in these 10 Epochs It becomes a constant after 2 to 4 epochs
49 Outline 1 Machine Learning vs. Optimization 2 Perceptron Algorithm Passive and Aggressive (PA) Algorithm 3 4 5
50 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
51 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
52 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
53 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
54 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
55 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
56 Conclusion We used the online learning framework to solve large-scale binary classification problem Similar to PA algorithm, we also derived a set of update schemes in closed form for PAm, PAm-1 and PAm-2 Compared to the conventional PA algorithm, utilizing the quasi-lda direction as a proximal classifier will have Less sensitive to the input order Less number of updating made in a single pass Higher similarity between two consecutive epochs resulting classifiers We can have a near-optimal classifier in a single pass
57 Future Work Try to explore the statistical information in a single pass to build a better proximal classifier In PAmQ, we can use weighted average instead of simple average and the weights can be determined by the performance of each classifier on a certain validation set We might collect multiple misclassified instances, then use these instances to approximate the Hessian inverse into the updating rule Extend the linear classifier to nonlinear classifier via reduced kernel trick Find an efficient updating scheme for small batch input instances. Similar to SMO vs. Decomposition method for SVMs
58 Reference Léon Bottou and Olivier Bousquet. The Tradeoffs of Large Scale Learning. Advances in Neural Information Processing Systems, 20: , 2008 John Langford. Concerns about the Large Scale Learning Challenge, 2008 K. Crammer et al. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7: , K. Hiraoka et al. Convergence analysis of online linear discriminant analysis. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, 3: , Como, Italy, L. I. Kuncheva and C. O. Plumpton. Adaptive learning rate for online linear discriminant classifiers. Structural, Syntactic, and Statistical Pattern Recognition, 5342: , 2008.
59
60
All lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationClassification: Linear Discriminant Functions
Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions
More informationSUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES Today Reading AIMA 8.9 (SVMs) Goals Finish Backpropagation Support vector machines Backpropagation. Begin with randomly initialized weights 2. Apply the neural network to each training
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationBehavioral Data Mining. Lecture 10 Kernel methods and SVMs
Behavioral Data Mining Lecture 10 Kernel methods and SVMs Outline SVMs as large-margin linear classifiers Kernel methods SVM algorithms SVMs as large-margin classifiers margin The separating plane maximizes
More informationHow Learning Differs from Optimization. Sargur N. Srihari
How Learning Differs from Optimization Sargur N. srihari@cedar.buffalo.edu 1 Topics in Optimization Optimization for Training Deep Models: Overview How learning differs from optimization Risk, empirical
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationOnline Learning for URL classification
Online Learning for URL classification Onkar Dalal 496 Lomita Mall, Palo Alto, CA 94306 USA Devdatta Gangal Yahoo!, 701 First Ave, Sunnyvale, CA 94089 USA onkar@stanford.edu devdatta@gmail.com Abstract
More information.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar..
.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar.. Machine Learning: Support Vector Machines: Linear Kernel Support Vector Machines Extending Perceptron Classifiers. There are two ways to
More informationEfficient Iterative Semi-supervised Classification on Manifold
. Efficient Iterative Semi-supervised Classification on Manifold... M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani Sharif University of Technology, Tehran, Iran. Presented by Pooria Joulani
More informationTheoretical Concepts of Machine Learning
Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationPerceptron Introduction to Machine Learning. Matt Gormley Lecture 5 Jan. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron Matt Gormley Lecture 5 Jan. 31, 2018 1 Q&A Q: We pick the best hyperparameters
More information6. Linear Discriminant Functions
6. Linear Discriminant Functions Linear Discriminant Functions Assumption: we know the proper forms for the discriminant functions, and use the samples to estimate the values of parameters of the classifier
More informationSVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1
SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used
More informationData Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)
Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationDM6 Support Vector Machines
DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationSupervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.
Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce
More informationMachine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013
Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork
More informationLecture 1 Notes. Outline. Machine Learning. What is it? Instructors: Parth Shah, Riju Pahwa
Instructors: Parth Shah, Riju Pahwa Lecture 1 Notes Outline 1. Machine Learning What is it? Classification vs. Regression Error Training Error vs. Test Error 2. Linear Classifiers Goals and Motivations
More informationCOMS 4771 Support Vector Machines. Nakul Verma
COMS 4771 Support Vector Machines Nakul Verma Last time Decision boundaries for classification Linear decision boundary (linear classification) The Perceptron algorithm Mistake bound for the perceptron
More informationClass 6 Large-Scale Image Classification
Class 6 Large-Scale Image Classification Liangliang Cao, March 7, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual
More informationCPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016
CPSC 340: Machine Learning and Data Mining Logistic Regression Fall 2016 Admin Assignment 1: Marks visible on UBC Connect. Assignment 2: Solution posted after class. Assignment 3: Due Wednesday (at any
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More informationSupervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)
Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel
More informationLarge Margin Classification Using the Perceptron Algorithm
Large Margin Classification Using the Perceptron Algorithm Yoav Freund Robert E. Schapire Presented by Amit Bose March 23, 2006 Goals of the Paper Enhance Rosenblatt s Perceptron algorithm so that it can
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More informationCPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017
CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationLinear Discriminant Functions: Gradient Descent and Perceptron Convergence
Linear Discriminant Functions: Gradient Descent and Perceptron Convergence The Two-Category Linearly Separable Case (5.4) Minimizing the Perceptron Criterion Function (5.5) Role of Linear Discriminant
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description
More informationCSE 417T: Introduction to Machine Learning. Lecture 22: The Kernel Trick. Henry Chai 11/15/18
CSE 417T: Introduction to Machine Learning Lecture 22: The Kernel Trick Henry Chai 11/15/18 Linearly Inseparable Data What can we do if the data is not linearly separable? Accept some non-zero in-sample
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.
CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationBagging and Boosting Algorithms for Support Vector Machine Classifiers
Bagging and Boosting Algorithms for Support Vector Machine Classifiers Noritaka SHIGEI and Hiromi MIYAJIMA Dept. of Electrical and Electronics Engineering, Kagoshima University 1-21-40, Korimoto, Kagoshima
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationOptimization Plugin for RapidMiner. Venkatesh Umaashankar Sangkyun Lee. Technical Report 04/2012. technische universität dortmund
Optimization Plugin for RapidMiner Technical Report Venkatesh Umaashankar Sangkyun Lee 04/2012 technische universität dortmund Part of the work on this technical report has been supported by Deutsche Forschungsgemeinschaft
More informationIntroduction to Machine Learning
Introduction to Machine Learning Maximum Margin Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing
More informationMachine Learning for NLP
Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationLogistic Regression
Logistic Regression ddebarr@uw.edu 2016-05-26 Agenda Model Specification Model Fitting Bayesian Logistic Regression Online Learning and Stochastic Optimization Generative versus Discriminative Classifiers
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationSVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines
SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines Boriana Milenova, Joseph Yarmus, Marcos Campos Data Mining Technologies Oracle Overview Support Vector
More informationscikit-learn (Machine Learning in Python)
scikit-learn (Machine Learning in Python) (PB13007115) 2016-07-12 (PB13007115) scikit-learn (Machine Learning in Python) 2016-07-12 1 / 29 Outline 1 Introduction 2 scikit-learn examples 3 Captcha recognize
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationParallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade
Parallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 23 Announcements...
More informationLearning via Optimization
Lecture 7 1 Outline 1. Optimization Convexity 2. Linear regression in depth Locally weighted linear regression 3. Brief dips Logistic Regression [Stochastic] gradient ascent/descent Support Vector Machines
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationImproving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah
Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com
More informationLinear Classification and Perceptron
Linear Classification and Perceptron INFO-4604, Applied Machine Learning University of Colorado Boulder September 7, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationMachine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari
Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationCS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes
1 CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview
More informationOne-class Problems and Outlier Detection. 陶卿 中国科学院自动化研究所
One-class Problems and Outlier Detection 陶卿 Qing.tao@mail.ia.ac.cn 中国科学院自动化研究所 Application-driven Various kinds of detection problems: unexpected conditions in engineering; abnormalities in medical data,
More informationSupervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning
Overview T7 - SVM and s Christian Vögeli cvoegeli@inf.ethz.ch Supervised/ s Support Vector Machines Kernels Based on slides by P. Orbanz & J. Keuchel Task: Apply some machine learning method to data from
More informationPerceptron Learning Algorithm (PLA)
Review: Lecture 4 Perceptron Learning Algorithm (PLA) Learning algorithm for linear threshold functions (LTF) (iterative) Energy function: PLA implements a stochastic gradient algorithm Novikoff s theorem
More informationPart 5: Structured Support Vector Machines
Part 5: Structured Support Vector Machines Sebastian Nowozin and Christoph H. Lampert Providence, 21st June 2012 1 / 34 Problem (Loss-Minimizing Parameter Learning) Let d(x, y) be the (unknown) true data
More informationPredict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry
Predict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry Jincheng Cao, SCPD Jincheng@stanford.edu 1. INTRODUCTION When running a direct mail campaign, it s common practice
More informationDATA MINING LECTURE 10B. Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines
DATA MINING LECTURE 10B Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines NEAREST NEIGHBOR CLASSIFICATION 10 10 Illustrating Classification Task Tid Attrib1
More informationPractice Questions for Midterm
Practice Questions for Midterm - 10-605 Oct 14, 2015 (version 1) 10-605 Name: Fall 2015 Sample Questions Andrew ID: Time Limit: n/a Grade Table (for teacher use only) Question Points Score 1 6 2 6 3 15
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationNeural Network Optimization and Tuning / Spring 2018 / Recitation 3
Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.
More informationOptimization Methods for Machine Learning (OMML)
Optimization Methods for Machine Learning (OMML) 2nd lecture Prof. L. Palagi References: 1. Bishop Pattern Recognition and Machine Learning, Springer, 2006 (Chap 1) 2. V. Cherlassky, F. Mulier - Learning
More informationComparison of different preprocessing techniques and feature selection algorithms in cancer datasets
Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets Konstantinos Sechidis School of Computer Science University of Manchester sechidik@cs.man.ac.uk Abstract
More informationCS 179 Lecture 16. Logistic Regression & Parallel SGD
CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.
CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your
More informationGradient Descent. Wed Sept 20th, James McInenrey Adapted from slides by Francisco J. R. Ruiz
Gradient Descent Wed Sept 20th, 2017 James McInenrey Adapted from slides by Francisco J. R. Ruiz Housekeeping A few clarifications of and adjustments to the course schedule: No more breaks at the midpoint
More informationSoftware Documentation of the Potential Support Vector Machine
Software Documentation of the Potential Support Vector Machine Tilman Knebel and Sepp Hochreiter Department of Electrical Engineering and Computer Science Technische Universität Berlin 10587 Berlin, Germany
More informationRobust 1-Norm Soft Margin Smooth Support Vector Machine
Robust -Norm Soft Margin Smooth Support Vector Machine Li-Jen Chien, Yuh-Jye Lee, Zhi-Peng Kao, and Chih-Cheng Chang Department of Computer Science and Information Engineering National Taiwan University
More informationIn stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time,
Chapter 2 Although stochastic gradient descent can be considered as an approximation of gradient descent, it typically reaches convergence much faster because of the more frequent weight updates. Since
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Machine Learning: Perceptron Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer and Dan Klein. 1 Generative vs. Discriminative Generative classifiers:
More informationKernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:
More informationMachine Learning. Chao Lan
Machine Learning Chao Lan Machine Learning Prediction Models Regression Model - linear regression (least square, ridge regression, Lasso) Classification Model - naive Bayes, logistic regression, Gaussian
More information5 Machine Learning Abstractions and Numerical Optimization
Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer
More informationGenerating the Reduced Set by Systematic Sampling
Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan
More informationCPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Multi-Class Classification Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More informationFace Recognition using SURF Features and SVM Classifier
International Journal of Electronics Engineering Research. ISSN 0975-6450 Volume 8, Number 1 (016) pp. 1-8 Research India Publications http://www.ripublication.com Face Recognition using SURF Features
More informationSupervised Learning. CS 586 Machine Learning. Prepared by Jugal Kalita. With help from Alpaydin s Introduction to Machine Learning, Chapter 2.
Supervised Learning CS 586 Machine Learning Prepared by Jugal Kalita With help from Alpaydin s Introduction to Machine Learning, Chapter 2. Topics What is classification? Hypothesis classes and learning
More informationChakra Chennubhotla and David Koes
MSCBIO/CMPBIO 2065: Support Vector Machines Chakra Chennubhotla and David Koes Nov 15, 2017 Sources mmds.org chapter 12 Bishop s book Ch. 7 Notes from Toronto, Mark Schmidt (UBC) 2 SVM SVMs and Logistic
More information6.034 Quiz 2, Spring 2005
6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13
More information3 Perceptron Learning; Maximum Margin Classifiers
Perceptron Learning; Maximum Margin lassifiers Perceptron Learning; Maximum Margin lassifiers Perceptron Algorithm (cont d) Recall: linear decision fn f (x) = w x (for simplicity, no ) decision boundary
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More information