Advanced Video Content Analysis and Video Imaging (5LSH0), Module 09: Semantic-level content analysis and classification II


Semantic-level content analysis and classification II
Sveta Zinger & Fons van der Sommen
Video Coding and Architectures Research group, TU/e (s.zinger@tue.nl)

Sequential data / Introduction (1)
When do we encounter sequential data? In measurements of time series: rainfall measurements on successive days, daily values of a currency exchange rate, acoustic features used for speech recognition, sequences of DNA elements, sequences of characters in a language.

Sequential data / Introduction (2), (3)
[Figures: example of sequential data, the spectrogram of the spoken words "Bayes' theorem".]

Sequential data / Introduction (4)
Sequential distributions. Stationary: the data evolves in time, but the distribution from which it is generated remains the same. Nonstationary (not treated here): the distribution itself evolves with time.

Markov models (1)
Prediction of the next value in a time series: recent observations are likely to be more informative than older ones. It is impractical to consider a general dependence of future observations on all previous observations, because the complexity of the model grows as the number of observations increases. Markov models assume that future predictions are independent of all but the most recent observations.
If we ignore the sequential aspects of the data and treat the observations as i.i.d. (independent and identically distributed), this corresponds to a graph without links and fails to exploit the sequential patterns in the data, i.e. the correlations between observations that are close in the sequence.

Markov models (2)
Example of sequential patterns in data: we observe a binary variable denoting whether it rained on a particular day, and we want to predict whether it will rain on the next day. If we treat the data as i.i.d., we only have the relative frequency of rainy days. In practice the weather exhibits trends that may last for several days, so knowing that it rains today helps to predict rain for tomorrow.

Markov models (3)
The product rule applied to the joint distribution of a sequence of observations gives
p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1}).
If each of the conditional distributions on the right-hand side is independent of all previous observations except the most recent one, we obtain a first-order Markov chain.

Markov models (4)
A first-order Markov chain of observations {x_n}, in which the distribution p(x_n \mid x_{n-1}) of a particular observation x_n is conditioned on the value of the previous observation x_{n-1}.

Markov models (5)
A first-order Markov chain defines the joint distribution of a sequence of N observations as
p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1}).
Given all previous observations, the conditional distribution of an observation is
p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1}),
so the distribution of predictions depends only on the value of the immediately preceding observation and is independent of all earlier observations.

Markov models (6)
A homogeneous Markov chain assumes a stationary time series, so the conditional distributions are constrained to be equal. If the conditional distributions depend on adjustable parameters, then all of the conditional distributions in the chain share the same values of those parameters.

Markov models (7)
Higher-order Markov chains: trends in the data over several successive observations provide important information for predicting the next value, and a first-order Markov chain is still very restrictive, so we can move to higher-order Markov chains. An M-th order Markov chain increases flexibility, but also increases the number of parameters in the model, which becomes impractical for large values of M.
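As a concrete illustration of the first-order joint distribution above, the following minimal numpy sketch evaluates p(x_1, ..., x_N) for the rain example. The transition matrix A, the initial distribution pi and the observed sequence are made-up values, not taken from the slides.

```python
import numpy as np

# States: 0 = dry, 1 = rain. Hypothetical parameters for illustration only:
# A[j, k] = p(x_n = k | x_{n-1} = j), pi[k] = p(x_1 = k).
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
pi = np.array([0.7, 0.3])

def joint_log_prob(sequence, pi, A):
    """log p(x_1, ..., x_N) = log p(x_1) + sum_n log p(x_n | x_{n-1})
    for a first-order homogeneous Markov chain."""
    logp = np.log(pi[sequence[0]])
    for prev, curr in zip(sequence[:-1], sequence[1:]):
        logp += np.log(A[prev, curr])
    return logp

# Probability of observing dry, dry, rain, rain, dry
seq = [0, 0, 1, 1, 0]
print(np.exp(joint_log_prob(seq, pi, A)))   # 0.7 * 0.8 * 0.2 * 0.6 * 0.4
```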

Markov models (8)
Second-order Markov chain: the joint distribution is given by
p(x_1, \ldots, x_N) = p(x_1) p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2}),
so the conditional distribution of a particular observation depends on the values of the two previous observations.

Markov models (9)
How to build a model that is not limited by the Markov assumption and still has a limited number of parameters? Latent variables permit a rich class of models to be constructed from simple components: for each observation x_n we introduce a corresponding latent variable z_n, and we now assume that it is the latent variables that form a Markov chain.

Markov models (10)
Representation of sequential data using a Markov chain of latent variables, with each observation conditioned on the state of the corresponding latent variable: this is the foundation for the HMM (Hidden Markov Model) and for linear dynamical systems.

Markov models (11)
In a Markov chain with latent variables there is always a path connecting any two observed variables via the latent variables, and this path is never blocked. Predictions therefore depend on all previous observations, and the observed variables do not satisfy the Markov property of any order.

Hidden Markov Models (1)
A Hidden Markov Model (HMM) can be viewed as a Markov chain with discrete latent variables. Examining a single slice of the HMM, it corresponds to a mixture distribution with component densities given by p(x \mid z); the choice of mixture component depends on the choice made for the previous observation.

Hidden Markov Models (2)
Applications of HMMs: speech recognition, natural language modeling, on-line handwriting recognition, analysis of biological sequences (DNA).

Hidden Markov Models (3)
Definition of the transition probabilities: we allow the probability distribution of z_n to depend on the state of the previous latent variable z_{n-1} through a conditional distribution p(z_n \mid z_{n-1}). Since the latent variables are K-dimensional binary variables, this conditional distribution corresponds to a table of transition probabilities with elements
A_{jk} = p(z_{nk} = 1 \mid z_{n-1,j} = 1).

Hidden Markov Models (4)
The transition matrix can be illustrated diagrammatically by drawing the states as nodes. The transition diagram shows a model whose latent variables have three possible states corresponding to the three boxes; the black lines denote the elements of the transition matrix.

Hidden Markov Models (5)
If we unfold the state transition diagram over time, we obtain a lattice, or trellis, representation of the latent states. Each column of this diagram corresponds to one of the latent variables z_n.

Hidden Markov Models (6)
The joint probability distribution over both latent and observed variables is
p(X, Z \mid \Theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m \mid z_m, \phi),
where p(z_1 \mid \pi) is the marginal distribution of the initial latent variable, with \pi the vector of probabilities for the initial latent variable, and p(x_m \mid z_m, \phi) are the conditional distributions of the observed variables, the emission probabilities, governed by the distribution parameters \phi.

Hidden Markov Models (7)
Variants of the standard HMM are obtained, for instance, by imposing constraints on the form of the transition matrix. The left-to-right HMM is of particular practical importance: it sets the elements A_{jk} of the transition matrix to zero if k < j.

Hidden Markov Models (8)
Example of the state transition diagram for a three-state left-to-right hidden Markov model. Left-to-right HMMs are used for speech recognition and on-line character recognition.
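To make the joint distribution above operational, the sketch below sums p(X, Z | Θ) over all latent paths with the standard forward recursion, giving the likelihood of an observation sequence under a discrete-emission HMM. The parameters pi, A and B are hypothetical values chosen purely for illustration.

```python
import numpy as np

# Minimal forward-algorithm sketch for an HMM with discrete emissions.
# A[j, k] = p(z_n = k | z_{n-1} = j), B[k, s] = p(x_n = s | z_n = k), pi[k] = p(z_1 = k).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],
               [0.3, 0.7]])

def forward_likelihood(obs, pi, A, B):
    """Return p(x_1, ..., x_N) by summing the joint p(X, Z) over all latent paths."""
    alpha = pi * B[:, obs[0]]              # alpha_1(k) = pi_k * p(x_1 | z_1 = k)
    for x in obs[1:]:
        alpha = B[:, x] * (alpha @ A)      # recursion over the trellis columns
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 0], pi, A, B))
```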

Hidden Markov Models (9)
Lattice diagram for a three-state left-to-right hidden Markov model, in which the state index k is allowed to increase by at most 1 at each transition.

Hidden Markov Models (10)
Example of a left-to-right HMM: handwritten digits. In on-line data a digit is represented as the trajectory of a pen as a function of time. An HMM is trained on 45 examples of the digit 2, with 16 states corresponding to line segments having one of 16 possible angles; the model parameters are optimized using 25 iterations of EM.

Hidden Markov Models (11)
Top row: examples of on-line handwritten digits. Bottom row: synthetic digits sampled generatively from a left-to-right HMM that has been trained on 45 handwritten digits.

Summary and conclusions
Markov chains assume dependency within a fixed neighbourhood; combined with latent variables they lead to the widely used hidden Markov fields. Hidden Markov models are a powerful method to model sequential data, but require their parameters to be estimated. They are invariant, to some degree, to local warping (compression and stretching) of the time axis: in speech recognition, variations in the speed of speech warp the time axis, and an HMM can accommodate such a distortion without penalizing it too heavily.

References
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006; Chapters 8 and 13.

Supervised learning
Categorized / labeled data: objects in a picture (chair, desk, person, ...), handwritten digits (3, 6, 5), medical diagnosis (OCT image, T2 colon cancer). Goal: identify the class of a new data point using statistical modeling and machine learning, based on distinctive properties (features).

Supervised learning: example (1/5)
Separate lemons from oranges. An orange: color orange, spherical shape, diameter ± 8 cm, weight ± 0.1 kg. A lemon: color yellow, ellipsoid shape, diameter ± 8 cm, weight ± 0.1 kg.

Supervised learning: example (2/5)
Use color and shape as features. [Figure: the fruits plotted in the color versus shape feature space.]

Supervised learning: example (3/5)
Model the given (training) data. [Figure: a decision boundary separating the orange and lemon clusters in the color versus shape plane.]

Supervised learning: example (4/5)
A new data point arrives and falls on the orange side of the boundary; the classifier outputs: it's an orange! [Figure: the new point in the color versus shape plane.]

Supervised learning: example (5/5)
What if we had chosen the wrong features? [Figure: with diameter and weight as features the two classes overlap, and the class of a new data point cannot be determined.]

Supervised learning: summary
Choose distinctive features. Make a model based on labeled data (a.k.a. supervised learning). Use the learned model to predict the class of new, unseen data points.

Models for classification
Support Vector Machines (SVM), Random Forests, k-Nearest Neighbours (k-NN), Boosting, Neural Networks, Convolutional Neural Networks (CNN) and deep learning. For the latter, find the Stanford CS231n lectures on YouTube (19 videos)!

Support Vector Machines (1)
Find a hyperplane that separates the classes with a maximum margin. [Figure: two classes separated by a hyperplane, with the margin indicated.]

Support Vector Machines (2)
Based on empirical risk minimization (1960s); non-linearity was added in 1992 (Boser, Guyon & Vapnik); the soft-margin SVM was introduced in 1995 (Cortes & Vapnik) and has become very popular since then. Easy to use, with many open libraries available; fast learning and very fast classification; good generalization properties.

Support Vector Machines (3)
How to find the optimal hyperplane? [Figure: a separating hyperplane that lies close to the classes; optimal? No!]

Support Vector Machines (4)
Consider the hyperplanes w^T x + b = -1, w^T x + b = 0 and w^T x + b = 1. The width of the margin is the distance between the planes w^T x + b = 1 and w^T x + b = -1,
m = \frac{2}{\|w\|},
and we maximize this margin. Support vectors are the vectors for which the constraint is exactly met; these vectors support the dividing hyperplane, and removal of one of them typically leads to a different optimal hyperplane.
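The margin width can be read back from a trained linear SVM. The sketch below (toy data, not from the slides) fits a linear SVM with scikit-learn and computes 2/||w|| from the learned hyperplane; the large C value is an assumption made to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs as illustrative toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[-2, -2], size=(50, 2)),
               rng.normal(loc=[+2, +2], size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1e3).fit(X, y)   # large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("number of support vectors:", len(clf.support_vectors_))
```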

Support Vector Machines (5)
We can rewrite the optimization problem as a Quadratic Programming problem: minimize \frac{1}{2}\|w\|^2 subject to y_i (w^T x_i + b) \ge 1 for all training samples. Efficient methods are available to solve this problem.

Support Vector Machines (6)
The data is usually not linearly separable, so we introduce slack variables \xi_i and put a cost C on crossing the margin. The optimization problem then becomes: minimize \frac{1}{2}\|w\|^2 + C \sum_i \xi_i subject to y_i (w^T x_i + b) \ge 1 - \xi_i and \xi_i \ge 0.

Non-linear SVMs (1)
A more complex extension: non-linear SVMs. Basic idea: map the data to a higher-dimensional space, in which we can apply a linear SVM.

Non-linear SVMs (2)
Map the data to a higher dimension. Example: add a new dimension as a function of the data. [Figure: data that is not linearly separable in the original space becomes separable after the mapping.]

Non-linear SVMs (3)
Problems with the mapping \phi: how to find a good mapping? The number of dimensions can blow up, it is computationally expensive, and the data becomes sparser in a higher-dimensional space. Solution: kernel functions! The mapping only occurs in the dual problem as an inner product.

Non-linear SVMs (4)
Kernel functions: do not define \phi explicitly, only define the inner product K(x_i, x_j) = \phi(x_i)^T \phi(x_j) (the kernel function). Note that \phi can be infinite-dimensional, but since we only use the inner product, it is not more computationally expensive. Can we choose any kernel function we like? No, it needs to satisfy Mercer's condition; typically you pick your kernel from a set of commonly used options.

[Presenter's note on slide 45 (F. van der Sommen, 24 Aug 2016): if possible, add a more intuitive example, e.g. circular data and a cone to raise the surrounding ring-shaped cluster.]
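In the spirit of the note above, here is a minimal sketch (made-up toy data, not part of the slides) of exactly that idea: a ring around a central cluster is not linearly separable in 2-D, but appending the "cone" feature r^2 = x_1^2 + x_2^2 lifts the ring above the cluster, so a linear SVM can separate them in the lifted space.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
inner = rng.normal(scale=0.5, size=(100, 2))                     # central cluster
angles = rng.uniform(0, 2 * np.pi, 100)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)] \
        + rng.normal(scale=0.2, size=(100, 2))                   # surrounding ring
X = np.vstack([inner, outer])
y = np.array([0] * 100 + [1] * 100)

X_lifted = np.c_[X, (X ** 2).sum(axis=1)]                        # explicit mapping phi(x)

print(SVC(kernel="linear").fit(X, y).score(X, y))                # noticeably worse
print(SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y))  # close to 1.0 after lifting
```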

Non-linear SVMs (5)
Popular kernel functions:
Polynomial: K(x_i, x_j) = (x_i^T x_j + 1)^d
Radial Basis Functions (RBF): K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))
Sigmoid: K(x_i, x_j) = \tanh(\kappa\, x_i^T x_j + c)
Find the optimal parameters using cross-validation on the training data.

Classification of new data: linear SVM
Straightforward: we have the function f(x) = w^T x + b, and from training we know w and b. The constraints used during training enforce that f(x_i) \ge 1 for positive samples and f(x_i) \le -1 for negative samples; for a soft-margin SVM these constraints can be violated. Hence, we can use the sign to predict the label:
\hat{y} = \operatorname{sign}(w^T x + b).

Classification of new data: non-linear SVM
Classification function: f(x) = w^T \phi(x) + b. Problem: w exists in some high-dimensional space, and typically the mapping \phi(x) to this space is unknown, since we use a kernel function K(x_i, x_j) = \phi(x_i)^T \phi(x_j). Solution: by the representer theorem, w can be written as w = \sum_i \alpha_i y_i \phi(x_i), hence
f(x) = \sum_i \alpha_i y_i \phi(x_i)^T \phi(x) + b = \sum_i \alpha_i y_i K(x_i, x) + b.
We classify a new point using the Lagrange multipliers \alpha_i, which are only non-zero for the support vectors; required at test time are only the \alpha_i and the support vectors:
\hat{y} = \operatorname{sign}\left( \sum_i \alpha_i y_i K(x_i, x) + b \right).

Cost parameter & generalization (1), (2)
[Figures: the optimal hyperplane and the SVM decision for C = 100 and for C = 10.]
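The non-linear decision function above can be checked numerically. In the sketch below (assumed toy data), the kernel expansion over the support vectors reproduces scikit-learn's own prediction; note that dual_coef_ stores the products alpha_i * y_i.

```python
import numpy as np
from sklearn.svm import SVC

# Ring-shaped toy classes, chosen only to exercise the RBF kernel.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def rbf(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2), scikit-learn's RBF parameterization."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

x_new = np.array([0.2, -0.3])
# f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over the support vectors only.
f = np.sum(clf.dual_coef_[0] * rbf(clf.support_vectors_, x_new, gamma)) + clf.intercept_[0]
print(np.sign(f), clf.predict(x_new[None, :])[0])   # same sign as the predicted label
```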

Cost parameter & generalization (3), (4)
[Figures: the optimal hyperplane and the SVM decision for C = 1 and for C = 0.1.]

Non-linear SVM examples
[Figures: non-linear SVM decision boundaries on more complex class distributions.]

Summary (SVM)
A fast and efficient method for binary classification. It splits the classes by maximizing the margin; the optimal hyperplane can be computed using Quadratic Programming. A cost parameter penalizes points crossing the margin. A non-linear SVM can also handle more complex class distributions by mapping the data to another space using kernel functions, which typically increases complexity.

Random Forest (1)
Build decision trees on subsets of the data and let the trees vote on the class of a new sample. [Figure: five trees voting, e.g. Orange (60%) / Lemon (40%), Orange (95%) / Lemon (5%), Orange (72%) / Lemon (28%), Orange (35%) / Lemon (65%), Orange (84%) / Lemon (16%).]

Random Forest (2)
A general model for machine learning: density estimation, regression, classification, ... Robustness is obtained through randomness: a random subset is used to train each tree, and for training a tree each node receives a random set of split options. Probabilistic output: the model expresses uncertainty. Automatic feature selection, naturally multi-class, and it runs efficiently since the trees can run in parallel.

A general tree structure
A forest consists of trees. Start at the root node, answer a true/false question at each split node, and stop when a leaf node is reached: the prediction. Terminology: root node, internal (split) node, terminal (leaf) node.

Example: GUESS WHO (credits to Mark Janse)
"Is it a male?", "Does he have a beard?", "Does he wear glasses?" lead to Jake, Joshua, Mike or Justin.

How to build a tree?
A tree is a special type of graph: a collection of nodes and edges forming a Directed Acyclic Graph (DAG), with internal (split) nodes and terminal (leaf) nodes. The upper/start node is called the root. Each internal node has one incoming edge and two outgoing edges, and all nodes (except the root) have exactly one incoming edge.

Mathematical notation
Data point: v = (x_1, \ldots, x_d) \in R^d, with label c; features x_i, dimensionality d. Binary split function: h(v, \theta): R^d \to \{0, 1\}. Split parameters of node j: \theta_j \in T, with T the set of possible parameters. Training points reaching node j: S_j, split into a left subset S_j^L and a right subset S_j^R. Complete training set: S_0.

How to split the data?
Axis-aligned hyperplane: h(v, \theta) = [\tau_1 > \phi(v) \cdot \psi > \tau_2]; for a 2-D example \phi(v) = (x_1, x_2, 1)^T and e.g. \psi = (1, 0, 0)^T, so the test applies two thresholds to a single feature. The split function h(v, \theta) depends on the parameters \theta = (\phi, \psi, \tau): the feature selection function \phi(v), a geometric primitive \psi (e.g. a line), and the thresholds \tau = (\tau_1, \tau_2). Note that setting either \tau_1 = \infty or \tau_2 = -\infty corresponds to using only one threshold. The parameter space T contains the options that we have for \phi(v), \psi and \tau.

Other weak learners: an oriented hyperplane, h(v, \theta) = [\tau_1 > \phi(v) \cdot \psi > \tau_2] with \phi(v) = (x_1, x_2, 1)^T and a general \psi \in R^3; and a quadratic surface, h(v, \theta) = [\tau_1 > \phi(v)^T \psi\, \phi(v) > \tau_2] with \psi \in R^{3 \times 3} representing a conic.
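A minimal sketch of the axis-aligned weak learner with two thresholds; the feature index, threshold values and data below are hypothetical, chosen only to show how a node partitions S_j into S_j^L and S_j^R.

```python
import numpy as np

def axis_aligned_split(V, feature, tau_lo, tau_hi):
    """Boolean mask: True if tau_hi > v[feature] > tau_lo (goes right), else left."""
    x = V[:, feature]
    return (x > tau_lo) & (x < tau_hi)

S_j = np.array([[0.2, 1.5],
                [0.9, 0.3],
                [1.7, 2.2],
                [2.5, 0.1]])
goes_right = axis_aligned_split(S_j, feature=0, tau_lo=0.5, tau_hi=2.0)
S_L, S_R = S_j[~goes_right], S_j[goes_right]    # left and right child sets
print(S_L, S_R, sep="\n")
# Setting tau_hi = np.inf (or tau_lo = -np.inf) reduces this to a single threshold.
```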

How to determine the best split?
Maximize the information gain*:
I(S, \theta) = H(S) - \sum_{i \in \{L, R\}} \frac{|S^i|}{|S|} H(S^i),
where the entropy is computed over the class labels (here: oranges vs. lemons). Shannon's entropy:
H(S) = -\sum_{c} p(c) \log p(c).
Node training: \theta_j = \arg\max_{\theta \in T_j} I(S_j, \theta).
*This is one of many options; other popular choices are (1) Gini's diversity index and (2) the misclassification error.

What is the best split?
[Figures: two candidate splits of the same node, one producing |S^L| = 48 and |S^R| = 52 and the other |S^L| = 50 and |S^R| = 50; their child entropies H(S^L), H(S^R) and information gains I(S, \theta) are compared.] The best split is the one that yields the highest information gain from a given set of candidate splits: node training selects \theta_j = \arg\max_{\theta \in T_j} I(S_j, \theta), where the split-function parameters \theta are drawn from a limited set of parameter settings T_j.

How to train a decision tree? (Bagging and Randomized Node Optimization)
Start with a random subset of all the data at the root node. Find the split parameters \theta, from a set of randomly chosen options T_j \subset T, that maximize some split metric. Repeat this for the outgoing nodes and stop growing a branch when one of the following criteria holds: a pre-defined tree depth D is reached (the number of node levels along a branch), or alternatively a pre-defined total number of nodes is reached, or all training samples in the node are from the same class.
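The entropy and information-gain computation can be written in a few lines of numpy. The label arrays below are invented examples: one candidate split produces fairly pure children, the other leaves them as mixed as the parent.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S) = -sum_c p(c) log2 p(c) of a label set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """I(S, theta) = H(S) - |S_L|/|S| H(S_L) - |S_R|/|S| H(S_R)."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

S = np.array([0] * 50 + [1] * 50)                                        # 50 oranges, 50 lemons
split_a = (np.array([0] * 40 + [1] * 8), np.array([0] * 10 + [1] * 42))  # |S_L|=48, |S_R|=52
split_b = (np.array([0] * 25 + [1] * 25), np.array([0] * 25 + [1] * 25)) # |S_L|=50, |S_R|=50
print(information_gain(S, *split_a))   # fairly pure children -> higher gain
print(information_gain(S, *split_b))   # children as mixed as the parent -> zero gain
```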

Example: growing a tree (1)-(5)
Let's grow a tree with depth D = 2. Take a subset of all the available data and start at the root node. At the root, a few candidate splits (Option 1, Option 2, Option 3) are evaluated and the best one is kept; the same is then done for each of the two child nodes with their own candidate options, which yields the resulting tree. [Figures: the candidate splits at the root and at the left and right child nodes, and the resulting depth-2 tree.] A compact code sketch of this greedy procedure follows below.

Example: classify a new data point (1)
A new data point v is passed down the trained tree: at every split node the split function sends it left or right, until it reaches a leaf. [Figure: the left/right path of v through the tree.]
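A compact sketch of the greedy growing procedure, under stated assumptions: axis-aligned splits, an entropy criterion, and a handful of random candidate splits per node. All function and variable names are mine, not from the slides or the Sherwood library.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y, n_candidates=3):
    """Try a few random (feature, threshold) candidates and keep the best one (RNO)."""
    best, best_gain = None, -1.0
    for _ in range(n_candidates):
        f = rng.integers(X.shape[1])
        t = rng.uniform(X[:, f].min(), X[:, f].max())
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best, best_gain = (f, t), gain
    return best

def grow(X, y, depth=2):
    """Greedily grow a tree of at most `depth` levels; leaves store class posteriors."""
    if depth == 0 or len(np.unique(y)) == 1 or (split := best_split(X, y)) is None:
        classes, counts = np.unique(y, return_counts=True)
        return {"leaf": dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))}
    f, t = split
    mask = X[:, f] <= t
    return {"feature": int(f), "threshold": float(t),
            "left": grow(X[mask], y[mask], depth - 1),
            "right": grow(X[~mask], y[~mask], depth - 1)}

X = np.vstack([rng.normal([0, 0], 1, (30, 2)), rng.normal([3, 3], 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(grow(X, y, depth=2))
```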

Example: classify a new data point (2), (3)
[Figures: the new data point v is routed through each tree of the forest, following a left/right decision at every split node, and ends up in one leaf per tree.]

Decision forest model
The ingredients of the model:
Node test parameters \theta \in T: features / split function / thresholds.
Node objective function, e.g. the information gain I(S, \theta): the (energy) function optimized at each node.
Node weak learner, e.g. h(v, \theta) \in \{false, true\}: the split-node test function.
Leaf predictor model, e.g. p(c \mid v): a point estimate or the full distribution.
Randomness model, e.g. bagging, RNO: the methods for inserting randomness.
Stopping criteria, e.g. maximum tree depth D: when to stop splitting the data.
Forest size T: the number of trees in the forest; a collection of trees is a forest!
Ensemble model, e.g. p(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid v): how to combine the output of all the trees in the forest.

How to add randomness? (Randomness model)
1. Bagging (randomized training set): a subset of all data points per tree.
2. Randomized Node Optimization (RNO): the features chosen with the selection function \phi, the split function depending on the weak-learner orientation \psi, and the thresholds given in \tau.

How to add randomness? (1) Bagging
S_0 is the full training set and S_0^t \subset S_0 is the randomly sampled subset used for training tree t; forest training uses the subsets S_0^1, \ldots, S_0^T.
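The following bagging sketch is illustrative only (it is not the Sherwood implementation referenced later in the slides): each tree is trained on a random subset S_0^t of the full training set, and the forest averages the per-tree class posteriors.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

T, subset_size = 25, 100                      # forest size and |S_0^t|
forest = []
for t in range(T):
    idx = rng.choice(len(X), size=subset_size, replace=True)   # bagging: sample S_0^t
    tree = DecisionTreeClassifier(max_depth=3, max_features=2, random_state=t)
    forest.append(tree.fit(X[idx], y[idx]))   # max_features limits the split options per node

posterior = np.mean([tree.predict_proba(X) for tree in forest], axis=0)
print("forest accuracy:", np.mean(posterior.argmax(axis=1) == y))
```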

How to add randomness? (2) Randomized Node Optimization (RNO)
T is the full set of all possible node-test parameter values, and T_j \subset T is the set of randomly sampled parameter values used to train node j. The randomness control parameter is \rho = |T_j|: \rho = |T| means low randomness, \rho = 1 means high randomness. The node test parameter is \theta = (\phi, \psi, \tau).

How to compute a prediction from a trained tree?
The probability distribution at the leaf is p(c \mid v); a point estimate is obtained, e.g., as the MAP solution c^* = \arg\max_c p(c \mid v). Generally the full distribution is preserved until the decision moment, to incorporate uncertainty.

How to combine the tree outputs?
Averaging: p(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid v).
Multiplication: p(c \mid v) = \frac{1}{Z} \prod_{t=1}^{T} p_t(c \mid v), where Z is a partition function that ensures probabilistic normalization; the product is overconfident and less robust to noise.

Example: generalization / the effect of randomness
[Figures: training points and the forest decision boundaries for the weak learners (axis aligned, oriented line, conic section) at tree depths D = 13 and D = 5, with the randomness parameter set to \rho = |T|.]
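A tiny numerical illustration of the two ensemble rules above, with made-up per-tree posteriors over three classes (one row per tree):

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.2, 0.7, 0.1]])

averaged = P.mean(axis=0)
product = P.prod(axis=0)
product /= product.sum()          # Z: renormalize the product of posteriors

print("averaging:     ", averaged)    # stays moderate
print("multiplication:", product)     # sharper, i.e. more (over)confident
```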

Example: the effect of randomness
[Figures: forest decision boundaries for the three weak learners (axis aligned, oriented line, conic section) at tree depths D = 13 and D = 5, for different amounts of randomness \rho.]

Random Forests Classification example (1)
[Figure: 2 classes in feature space and the random forest decision.] N = 100 trees, maximum number of nodes = 5, 3 candidate splits per node.

Random Forests Classification example (2)
[Figure: 4 classes in feature space and the random forest decision.] N = 100 trees, maximum number of nodes = 4, 3 candidate splits per node.

Random Forests Classification example (3)
[Figure: 4 classes in feature space and the random forest decision.] N = 100 trees, maximum number of nodes = 10, 8 candidate splits per node.

Example: handwritten digit classification
[Figure: sample images from the MNIST database.] Reference: The MNIST Database of Handwritten Digit Images for Machine Learning Research, DOI: /MSP.

Example: handwritten digit classification
Task: classify the handwritten digits 1 to 5, with 250 samples per digit for training and 250 for testing. HOG* features are used as data points v \in R^d with d = 144. Forest parameters: forest size T = 300 trees, an axis-aligned weak learner \psi, randomness parameter \rho = 5 (out of |T_j| = 10 candidate settings per node), and a selection function \phi(v) that randomly samples a subset of the dimensions of v.
*Histogram of Oriented Gradients, Dalal & Triggs, CVPR 2005.

Results, with forest predictions per class sorted by prediction confidence (the inverted digits in the figures are the wrongly classified ones):
Class 1: precision = 0.96, recall = 0.97.
Class 2: precision = 0.99, recall = 0.92.
Class 3: precision = 0.96, recall = 0.97.
Class 4: precision = 0.97, recall = 1.00.
Class 5: precision = 0.95, recall = 0.97.
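A rough analogue of this experiment can be run in a few lines; the setup below is an assumption for illustration (scikit-learn's 8x8 digits with raw pixels instead of the HOG features used on the slides), not a reproduction of the original results.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
mask = np.isin(y, [1, 2, 3, 4, 5])                  # restrict to digits 1-5 as in the example
X_tr, X_te, y_tr, y_te = train_test_split(X[mask], y[mask], test_size=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)
forest.fit(X_tr, y_tr)

print(classification_report(y_te, forest.predict(X_te)))   # per-class precision / recall
confidence = forest.predict_proba(X_te).max(axis=1)        # prediction confidence per sample
print("mean prediction confidence:", confidence.mean())
```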

Recommended literature
A. Criminisi, Decision Forests for Computer Vision and Medical Image Analysis, 2013; Ch. 3: Introduction, Ch. 4: Classification Forests; C++ library: Sherwood.
Breiman, L., "Random forests", Mach. Learn. 45(1).

Conclusions (Random Forests)
Random Forests offer an attractive method for classification: inherently multi-class, probabilistic output, and efficient implementations are available. A forest is a collection of decision trees; each tree t is trained with a different subset S_0^t of the training data (bagging). A tree is a collection of nodes and edges; each internal node splits the incoming data using the node split function h(v, \theta), where \theta encompasses the selection function \phi(v), the geometric primitive \psi and the thresholds \tau. Each node j receives a random subset T_j of the parameter space T for training (RNO). Randomness increases robustness: the randomness control parameter \rho determines the amount of randomness, with maximum randomness when \rho = 1 and minimum randomness when \rho = |T|. The tree depth D controls the forest confidence, hence a high D can lead to overfitting.

So, now we have a model; how good is it?
We have labeled data (ground truth), so we can validate. Model validation: use separate sets for training and testing the model, train the model using the training set, use the test set to evaluate the performance, and compute figures of merit that indicate the performance. What is a good performance metric, and how should we split the data?

Some popular figures of merit
Accuracy = (#TP + #TN) / (#TP + #FN + #TN + #FP)
Sensitivity = #TP / (#TP + #FN), a.k.a. True Positive Rate
Specificity = #TN / (#TN + #FP), a.k.a. True Negative Rate
where the numbers of samples are: True Positive (TP), a positive sample classified as positive; True Negative (TN), a negative sample classified as negative; False Positive (FP), a negative sample classified as positive; False Negative (FN), a positive sample classified as negative.

Receiver Operating Characteristic (ROC)
Sensitivity and specificity give the performance for just one possible setting (i.e. decision threshold) of the model. We can vary this threshold and recompute these performance metrics, which yields a curve of possible combinations of sensitivity and specificity, called the ROC curve. Generally, increasing the sensitivity decreases the specificity and vice versa.

How to compute the ROC curve?
For each sample we have a predicted class and a score. Sort the samples according to their score and move the threshold. [Figure: ten samples sorted by predicted score, with the model predicting negative below and positive above the threshold; at this threshold Sensitivity = 5 / (5+0) = 1.00 and Specificity = 3 / (3+2) = 0.60, giving one point in the sensitivity versus 1-specificity plot.]
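The figures of merit above follow directly from the confusion counts; a minimal sketch with made-up labels (1 = positive, 0 = negative):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])

TP = np.sum((y_pred == 1) & (y_true == 1))
TN = np.sum((y_pred == 0) & (y_true == 0))
FP = np.sum((y_pred == 1) & (y_true == 0))
FN = np.sum((y_pred == 0) & (y_true == 1))

accuracy    = (TP + TN) / (TP + FN + TN + FP)
sensitivity = TP / (TP + FN)                  # true positive rate
specificity = TN / (TN + FP)                  # true negative rate
print(accuracy, sensitivity, specificity)     # 0.7, 0.8, 0.6
```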

How to compute the ROC curve? (continued)
Moving the threshold to other positions yields further points, e.g. Sensitivity = 4 / (4+1) = 0.80 with Specificity = 3 / (3+2) = 0.60, Sensitivity = 4 / (4+1) = 0.80 with Specificity = 4 / (4+1) = 0.80, and finally Sensitivity = 0 / (0+5) = 0.00 with Specificity = 5 / (5+0) = 1.00. The Area Under the Curve (AUC) summarizes the ROC curve in a single number; for this example AUC = 0.84. [Figures: the sorted samples at several threshold positions and the resulting ROC curve.]

How should we split the data?
For a large data set: randomly sample half of the samples for training and half for testing. Training and testing is time consuming for large datasets, and the test set is probably a good reflection of the training set. [Figure: evaluation pipeline: the labeled data set is split into training data and test data; the model is trained on the training data, its predicted labels for the test data are compared with the ground-truth labels, and this gives the performance.]

Different choices of split might lead to different results. K-fold cross-validation: split the data into K equally sized parts, use K-1 parts for training and the left-out part for testing, repeat this for each part and average the performance. [Figure: the data set divided into K equal parts, each part in turn used for testing while the others are used for training.]

Leave-One-Out Cross-Validation: leave one sample out of the complete set and use the remaining set to train the model; test the model on the left-out sample and repeat this for all samples. This gives the best performance indication for a small data set, because you want to use as much of the little data you have for training the model.
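The threshold sweep can be written explicitly; the scores and labels below are made up (5 positives, 5 negatives) and only loosely mirror the example above.

```python
import numpy as np

scores = np.array([0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1,    1,    1,    0,    1,    0,    1,    0,    0,    0])

tpr, fpr = [], []
for t in np.concatenate(([np.inf], np.sort(scores)[::-1])):   # move the threshold down
    pred = scores >= t
    tpr.append(np.sum(pred & (labels == 1)) / np.sum(labels == 1))  # sensitivity
    fpr.append(np.sum(pred & (labels == 0)) / np.sum(labels == 0))  # 1 - specificity

tpr, fpr = np.array(tpr), np.array(fpr)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal area under the curve
print(list(zip(fpr.round(2), tpr.round(2))))
print("AUC =", auc)
```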

EXAMPLE: 4-fold cross-validation
Split the data into 4 equally sized partitions. Fold 1: train on the other three partitions and test on the held-out one, accuracy = 0.86. Fold 2: accuracy = 0.86. Fold 3: accuracy = 0.84. Fold 4: accuracy = 0.88. The 4-fold cross-validation accuracy is then approximately 0.86 ± 0.02 (mean ± stdev). [Figures: the four train/test splits and the per-fold accuracies.]
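A minimal 4-fold cross-validation sketch; the dataset and model are chosen only for illustration, not to reproduce the numbers above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize the features

# Train on 3 folds, test on the held-out fold, repeat for each fold and average.
scores = cross_val_score(SVC(kernel="linear", C=1.0), X, y, cv=4, scoring="accuracy")

print("per-fold accuracy:", np.round(scores, 3))
print("4-fold CV accuracy: %.2f +/- %.2f (mean +/- stdev)" % (scores.mean(), scores.std()))
```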

Generalization: under- and overfitting
Why don't we evaluate on the training set? Example: is this a good classifier? It makes no errors on the training set, i.e. 100% accuracy, but NO: it has very poor generalization; on new, identically distributed data it reaches only 81% accuracy. This is overfitting!

A second example makes many errors on the training set (86% accuracy). Also NO: the model complexity is too low, which is underfitting; on new, identically distributed data it reaches 84% accuracy (approximately the training accuracy).

A third example has 94% accuracy on the training set and 95% on the test set: approximately equal train and test error means good generalization. YES!

Model complexity: what is a good model?
A model with good generalization, i.e. good prediction accuracy on both the training set and the test set. [Figure: prediction error versus model complexity; the training error keeps decreasing with complexity, while the prediction (test) error reaches a minimum at sufficient complexity and then rises again.]

Example: a non-linear SVM with a fixed cost parameter C, where the complexity increases when the size of the kernel scale is reduced (more flexibility). Use 10-fold cross-validation to estimate the test error and validate on the training set to compute the train error. [Figure: prediction error (%) versus model complexity, from low to high complexity.]
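A sketch of this complexity experiment under assumed choices of dataset and parameter grid: for an RBF SVM, a larger gamma corresponds to a smaller kernel scale, i.e. a more flexible model, and the training accuracy can be compared with a cross-validated estimate of the test accuracy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize the features

for gamma in [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    model = SVC(kernel="rbf", C=1.0, gamma=gamma)
    train_acc = model.fit(X, y).score(X, y)       # error on the training set itself
    cv_acc = cross_val_score(model, X, y, cv=10).mean()   # 10-fold estimate of test accuracy
    print(f"gamma={gamma:g}  train={train_acc:.3f}  cv={cv_acc:.3f}")
# Small gamma gives a smoother, less flexible model; at large gamma the training
# accuracy approaches 1.0 while the cross-validated accuracy degrades: overfitting.
```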

Summary
In supervised learning the ground truth is available, so we can evaluate the prediction performance of the model. Split the data into two sets (training set and test set). Use figures of merit for measuring the performance: accuracy, sensitivity, specificity, AUC, ... Use K-fold cross-validation for reliable evaluation. Increasing the model complexity may lead to overfitting, i.e. poor generalization: low training-set error but high test-set error.
