Advanced Video Content Analysis and Video Imaging (5LSH0), Module 09
- Bertram Barnett
Semantic-level content analysis and classification II
Sveta Zinger & Fons van der Sommen
Video Coding and Architectures Research group, TU/e (s.zinger@tue.nl)

Sequential data / Introduction

When do we encounter sequential data?
- Measurements of time series
  - Rainfall measurements on successive days
  - Daily values of a currency exchange rate
  - Acoustic features used for speech recognition
- Sequence of DNA elements
- Sequence of characters in a language

[Figure: example of sequential data, spectrogram of the spoken words "Bayes' theorem"]

Sequential distributions
- Stationary: the data evolves in time, but the distribution from which it is generated remains the same
- Nonstationary (not treated here): the distribution itself evolves with time

Markov models

Prediction of the next value in a time series
- Recent observations are likely to be more informative than more historical observations
- It is impractical to consider a general dependence of future observations on all previous observations: as the number of observations increases, the complexity of the model grows
- Markov models assume that future predictions are independent of all but the most recent observations

If we ignore the sequential aspects of the data
- We treat the observations as i.i.d. (independent and identically distributed)
- This corresponds to a graph without links
- We fail to exploit sequential patterns in the data, i.e. correlations between observations that are close in the sequence
Example: sequential patterns in data
- We observe a binary variable denoting whether it rained on a particular day
- We want to predict whether it will rain on the next day
- If we treat the data as i.i.d., the only information we have is the relative frequency of rainy days
- In practice, weather exhibits trends that may last for several days, so knowing that it rains today helps to predict rain for tomorrow

The product rule applied to the joint distribution of a sequence of observations:
$$p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1})$$
If each of the conditional distributions on the right-hand side is independent of all previous observations except the most recent, we obtain the first-order Markov chain.

[Figure: a first-order Markov chain of observations $\{x_n\}$, in which the distribution $p(x_n \mid x_{n-1})$ of a particular observation $x_n$ is conditioned on the value of the previous observation $x_{n-1}$]

First-order Markov chain
- Defines the joint distribution for a sequence of N observations:
  $$p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})$$
- Conditional distribution of one observation given all previous observations:
  $$p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1})$$
- The distribution of predictions depends only on the value of the immediately preceding observation and is independent of all earlier observations

Homogeneous Markov chain
- Assumes a stationary time series, so the conditional distributions are constrained to be equal
- If the conditional distributions depend on adjustable parameters, then all of the conditional distributions in the chain share the same values of those parameters

Higher-order Markov chains
- Trends in the data over several successive observations provide important information for predicting the next value
- A first-order Markov chain is still very restrictive, so we move to higher-order Markov chains
- M-th order Markov chains increase flexibility, but also increase the number of parameters in the model, which makes them impractical for large values of M
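The rain example and the first-order factorization above can be sketched in a few lines. This is a minimal illustration, not part of the slides: the rain record, the function names and the uniform fallback are assumptions made for the example. It estimates the transition table by counting, evaluates the log of the first-order joint distribution, and counts the free parameters of the conditional tables of an M-th order chain over K discrete states.

```python
import math

def fit_transitions(seq, n_states=2):
    """Estimate p(x_n | x_{n-1}) by counting transitions in one sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1
    rows = []
    for row in counts:
        total = sum(row)
        # Assumption: uniform row for a state with no observed outgoing transition.
        rows.append([c / total if total else 1.0 / n_states for c in row])
    return rows

def log_joint(seq, initial, trans):
    """log p(x_1..x_N) = log p(x_1) + sum over n >= 2 of log p(x_n | x_{n-1})."""
    logp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(trans[prev][cur])
    return logp

def n_free_parameters(n_states, order):
    """Free parameters in the conditional tables of an M-th order chain."""
    return n_states ** order * (n_states - 1)

rain = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]  # hypothetical record, 1 = rain
A = fit_transitions(rain)
p_rain_iid = sum(rain) / len(rain)   # i.i.d. view: relative frequency of rain
p_rain_after_rain = A[1][1]          # Markov view: p(rain tomorrow | rain today)
print(p_rain_iid, p_rain_after_rain)
print(n_free_parameters(2, 1), n_free_parameters(2, 5))
```

On this toy record the Markov estimate of rain after a rainy day is higher than the overall relative frequency of rain, which is the kind of trend the i.i.d. treatment cannot express. The parameter count grows exponentially with the order M.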
Second-order Markov chain
- The joint distribution is given by
  $$p(x_1, \ldots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})$$
- The conditional distribution of a particular observation depends on the values of the two previous observations

How to build a model that is not limited by the Markov assumption and still has a limited number of parameters?
- Latent variables permit a rich class of models to be constructed from simple components
- For each observation $x_n$, introduce a corresponding latent variable $z_n$, and assume that it is the latent variables that form a Markov chain

[Figure: representation of sequential data using a Markov chain of latent variables, with each observation conditioned on the state of the corresponding latent variable. This is the foundation for the HMM (Hidden Markov Model) and for linear dynamical systems]

Markov chain with latent variables
- There is always a path connecting any two observed variables via the latent variables
- This path is never blocked
- Predictions depend on all previous observations
- The observed variables do not satisfy the Markov property of any order

Hidden Markov Models

Hidden Markov Model (HMM)
- Can be viewed as a Markov chain with discrete latent variables
- Examine a single time slice of the HMM: it corresponds to a mixture distribution with component densities given by $p(x \mid z)$
- The choice of mixture component for each observation depends on the choice made for the previous observation

Applications of HMM
- Speech recognition
- Natural language modeling
- On-line handwriting recognition
- Analysis of biological sequences (DNA)
Definition of transition probabilities
- We allow the probability distribution of $z_n$ to depend on the state of the previous latent variable $z_{n-1}$ through a conditional distribution $p(z_n \mid z_{n-1})$
- The latent variables are K-dimensional binary variables, so this conditional distribution corresponds to a table of transition probabilities, with elements
  $$A_{jk} \equiv p(z_{nk} = 1 \mid z_{n-1,j} = 1)$$

Transition diagram
- The transition matrix can be illustrated diagrammatically by drawing the states as nodes
- [Figure: transition diagram of a model whose latent variables have three possible states, corresponding to the three boxes. The black lines denote the elements of the transition matrix]

Lattice (trellis) diagram
- If we unfold the state transition diagram over time, we obtain a lattice, or trellis, representation of the latent states
- Each column of this diagram corresponds to one of the latent variables $z_n$

Joint probability distribution over both latent and observed variables
$$p(X, Z \mid \Theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m \mid z_m, \phi)$$
where
- $p(z_1 \mid \pi)$ is the marginal distribution of the initial latent variable $z_1$, with $\pi$ the vector of probabilities for the initial latent variable
- $p(x_m \mid z_m, \phi)$ is the conditional distribution of the observed variables (the emission probabilities), governed by the distribution parameters $\phi$

Variants of the standard HMM
- Obtained, for instance, by imposing constraints on the form of the transition matrix
- The left-to-right HMM, a model of particular practical importance, sets the elements $A_{jk}$ of the transition matrix to zero if $k < j$
- [Figure: example of the state transition diagram for a three-state left-to-right hidden Markov model]
- The left-to-right HMM is used for speech recognition and on-line character recognition
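The joint distribution and the left-to-right constraint can be made concrete with a small sampler. The sketch below is an assumption-laden illustration rather than the model used in the slides: it uses three states, unit-variance Gaussian emissions with hypothetical means, a fixed "stay" probability, and a transition matrix in which the state index may increase by at most 1 per step.

```python
import math
import random

def left_to_right_transitions(n_states, stay=0.7):
    """Transition matrix with A[j][k] = 0 for k < j and for k > j + 1."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for j in range(n_states - 1):
        A[j][j] = stay
        A[j][j + 1] = 1.0 - stay
    A[-1][-1] = 1.0  # the last state can only be kept
    return A

def sample_hmm(pi, A, means, n_steps, rng):
    """Ancestral sampling: z_1 ~ pi, z_n ~ A[z_{n-1}], x_n ~ N(means[z_n], 1)."""
    indices = list(range(len(pi)))
    z = rng.choices(indices, weights=pi)[0]
    states, obs = [], []
    for _ in range(n_steps):
        states.append(z)
        obs.append(rng.gauss(means[z], 1.0))
        z = rng.choices(indices, weights=A[z])[0]
    return states, obs

def log_joint(states, obs, pi, A, means):
    """log p(X, Z) = log p(z_1) + sum log p(z_n | z_{n-1}) + sum log p(x_m | z_m)."""
    logp = math.log(pi[states[0]])
    for prev, cur in zip(states, states[1:]):
        logp += math.log(A[prev][cur])
    for z, x in zip(states, obs):
        logp += -0.5 * math.log(2 * math.pi) - 0.5 * (x - means[z]) ** 2
    return logp

pi = [1.0, 0.0, 0.0]        # always start in the first state (assumption)
means = [0.0, 5.0, 10.0]    # hypothetical emission means, one per state
A = left_to_right_transitions(3)
states, obs = sample_hmm(pi, A, means, 20, random.Random(0))
print(states)
print(log_joint(states, obs, pi, A, means))
```

Because every entry below the diagonal is zero, a sampled state sequence can never move back to an earlier state, which matches the left-to-right structure described above.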
[Figure: lattice diagram for a three-state left-to-right hidden Markov model in which the state index k is allowed to increase by at most 1 at each transition]

Example of left-to-right HMM: handwritten digits
- On-line data: a digit is represented as the trajectory of a pen as a function of time
- Train the HMM on 45 examples of the digit 2
- 16 states: line segments having one of 16 possible angles
- Model parameters are optimized using 25 iterations of EM
- [Figure: top row, examples of on-line handwritten digits; bottom row, synthetic digits sampled generatively from a left-to-right HMM that has been trained on 45 handwritten digits]

Summary and conclusions
- Markov chains: assume dependency inside a fixed neighborhood; with latent variables they lead to the widely used hidden Markov fields
- Hidden Markov models
  - Powerful method to model sequential data
  - Require their parameters to be estimated
  - Invariant to some degree to local warping (compression and stretching) of the time axis
  - Speech recognition: variations in the speed of speech warp the time axis; an HMM can accommodate such a distortion and does not penalize it too heavily

References
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
  - Chapter 8
  - Chapter 13

Supervised learning
- Categorized / labeled data
  - Objects in a picture: chair, desk, person, ...
  - Handwritten digits: 3, 6, 5
  - Medical diagnosis: OCT image, T2 colon cancer
- Goal: identify the class of a new data point
- Statistical modeling and machine learning
- Distinctive properties (features)
Supervised learning: example

Separate lemons from oranges
- Orange: color: orange; shape: sphere; Ø: ± 8 cm; weight: ± 0.1 kg
- Lemon: color: yellow; shape: ellipsoid; Ø: ± 8 cm; weight: ± 0.1 kg
- Use color and shape as features
- [Figure: feature space with axes Shape and Color]

Model the given (training) data
- [Figure: Oranges and Lemons clusters in the Shape/Color feature space]

New data point
- Classifier: "It's an orange!"
- [Figure: new data point in the Shape/Color feature space]

What if we had chosen the wrong features?
- [Figure: feature space with axes Weight and Diameter; new data point: ???]

Supervised learning: summary
- Choose distinctive features
- Make a model based on labeled data (a.k.a. supervised learning)
- Use the learned model to predict the class of new, unseen data points
Models for classification
- Random Forests
- k Nearest Neighbours (k-NN)
- Boosting
- Neural Networks
- Convolutional Neural Networks (CNN) & Deep learning: find the Stanford lectures CS231n on YouTube (19 videos)!

Support Vector Machines (SVM)

Basic idea
- Find a hyperplane that separates the classes with a maximum margin
- [Figure: separating hyperplane and margin]

Background
- Based on empirical risk minimization (1960s)
- Non-linearity added in 1992 (Boser, Guyon & Vapnik)
- Soft-margin SVM introduced in 1995 (Cortes & Vapnik)
- Has become very popular since then
  - Easy to use, a lot of open libraries available
  - Fast learning and very fast classification
  - Good generalization properties

How to find the optimal hyperplane?
- [Figure: a separating hyperplane. Optimal? No!]
- The margin is bounded by the hyperplanes $w \cdot x + b = 1$ and $w \cdot x + b = -1$, with the decision boundary $w \cdot x + b = 0$ in between
- Width of the margin:
  $$m = \frac{b + 1}{\lVert w \rVert} - \frac{b - 1}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}$$
- Maximize the margin [equation not preserved in the transcription]

Support vectors
- Vectors for which the constraint is exactly met
- These vectors support the dividing hyperplane
- Removal of one of those vectors typically leads to a different optimal hyperplane
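The optimization problems on these slides did not survive the transcription. As a hedged reconstruction, the textbook hard-margin problem and the soft-margin variant of Cortes & Vapnik are usually written as below. The labels $y_i \in \{-1, +1\}$, the sample index $i$ and the slack variables $\xi_i$ are notation assumed here, and the slides may have used a different but equivalent form.

```latex
\begin{align*}
% Hard margin: maximize the margin width 2 / ||w||
&\max_{w,\,b}\ \frac{2}{\lVert w \rVert}
  \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 \quad \forall i \\[4pt]
% Equivalent quadratic program
&\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
  \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 \quad \forall i \\[4pt]
% Soft margin: slack variables with cost C for crossing the margin
&\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_i \xi_i
  \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \quad \forall i
\end{align*}
```

The first two problems have the same solution, because maximizing $2 / \lVert w \rVert$ is equivalent to minimizing $\tfrac{1}{2} \lVert w \rVert^2$. The support vectors are the samples for which the constraint holds with equality.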
Rewriting the optimization problem
- We can rewrite the optimization problem and formulate it as a Quadratic Programming problem [equations not preserved in the transcription]
- Efficient methods are available to solve this problem!

Soft margin
- The data is usually not linearly separable
- Introduce slack variables
- Put a cost C on crossing the margin, so that the optimization problem becomes a penalized version of the original one [equation not preserved in the transcription]

Non-linear SVMs
- A more complex extension: non-linear SVMs
- Basic idea: map the data to a higher-dimensional space, in which we can apply a linear SVM
- Example: add a new dimension as a function of the data
- [Figure: data that is not linearly separable, mapped to a higher dimension]

Problems with the mapping $\phi$
- How to find a good mapping?
- The number of dimensions can blow up!
  - Computationally expensive
  - Data becomes sparser in a higher-dimensional space
- Solution: kernel functions! The mapping only occurs in the dual problem as an inner product

Kernel functions
- Do not define $\phi$ explicitly, only define the inner product $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ (the kernel function)
- Note that $\phi$ can be infinite dimensional, but since we only use the inner product $K(x_i, x_j)$, it is not more computationally expensive!
- Can we choose any kernel function we like? No! It needs to satisfy Mercer's condition
- Typically you pick your kernel from a set of commonly used options
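The claim that only the inner product is needed can be checked numerically for a small case. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen because its explicit feature map is short enough to write down; the kernel and the test points are illustrative and not taken from the slides.

```python
def dot(a, b):
    """Plain inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2 with 2-D input."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Same inner product, computed without ever forming phi."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)   # hypothetical data points
explicit = dot(phi(x), phi(y))   # inner product in the mapped 3-D space
implicit = poly2_kernel(x, y)    # kernel evaluation in the original 2-D space
print(explicit, implicit)
```

Both numbers agree up to floating-point rounding. For kernels such as the RBF the explicit map is infinite dimensional, so only the implicit route is available.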
Reviewer comment SFvd1 on slide 45, Non-linear SVMs (1) (Sommen, F. van der, 8/24/2016): If possible, add a more intuitive example, e.g. circular data and a cone to raise the surrounding ring-shaped cluster.
Popular kernel functions
- Polynomial: $K(x_i, x_j) = (x_i \cdot x_j + 1)^p$
- Radial Basis Functions (RBF): $K(x_i, x_j) = \exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma^2\right)$
- Sigmoid: $K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + \delta)$
- Find optimal parameters using cross-validation on the training data

Classification of new data: linear SVM
- Straightforward: we have the function $f(x) = w \cdot x + b$, and from training we know $w$ and $b$
- The constraints used in training enforce that $f(x) \ge 1$ for positive samples and $f(x) \le -1$ for negative samples
- For the soft-margin SVM, these constraints can be violated
- Hence, we can use the sign of $f$ to predict the label: $\hat{y} = \operatorname{sign}(w \cdot x + b)$

Classification of new data: non-linear SVM
- Classification function: $f(x) = w \cdot \phi(x) + b$
- Problem: $w$ exists in some high-dimensional space. Typically the mapping $\phi$ to this space is unknown, since we use a kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$
- Solution: $w$ can be written as $w = \sum_i \alpha_i y_i \phi(x_i)$ (representer theorem), hence
  $$f(x) = \sum_i \alpha_i y_i\, \phi(x_i) \cdot \phi(x) + b = \sum_i \alpha_i y_i\, K(x_i, x) + b$$
- Classify a new point using $\hat{y} = \operatorname{sign}\left(\sum_i \alpha_i y_i\, K(x_i, x) + b\right)$
- The Lagrange multipliers $\alpha_i$ are only non-zero for support vectors
- Required at test time: only the $\alpha_i$ and the support vectors

Cost parameter & generalization
- [Figures: optimal hyperplane and SVM decision for C = 100 and for C = 10]
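The test-time rule for the non-linear SVM needs only the support vectors, their labels, their multipliers and the bias. A minimal sketch under stated assumptions: the RBF kernel is parameterized here by a hypothetical `gamma`, and the "trained" values below are picked by hand for illustration instead of coming from a solver.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over support vectors only."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + bias

def svm_predict(x, support_vectors, labels, alphas, bias):
    """Predicted label is the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, labels, alphas, bias) >= 0 else -1

# Hand-picked values standing in for the result of training (illustration only).
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
b = 0.0

print(svm_predict((0.1, 0.1), svs, ys, alphas, b))
print(svm_predict((1.9, 1.9), svs, ys, alphas, b))
```

Training samples that are not support vectors have a multiplier of zero and therefore never appear in the sum, which is why classification stays fast even for large training sets.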
- [Figures: optimal hyperplane and SVM decision for C = 1 and for C = 0.1]

[Figure: non-linear SVM examples]

Support Vector Machines: summary
- Fast and efficient method for binary classification
- Splits the classes based on maximizing the margin
- The optimal hyperplane can be computed using Quadratic Programming
- Cost parameter for points crossing the margin
- Non-linear SVMs can also handle more complex class distributions by mapping the data to another space
- Kernel functions: typically increase complexity

Random Forests

Basic idea
- Build decision trees on subsets of the data
- Let the trees vote on the class of a new sample
- Example outputs of five trees:
  - Orange (60%), Lemon (40%)
  - Orange (95%), Lemon (5%)
  - Orange (72%), Lemon (28%)
  - Orange (35%), Lemon (65%)
  - Orange (84%), Lemon (16%)

Properties
- General model for machine learning: density estimation, regression, classification, ...
- Robustness through randomness
  - A random subset is used to train each tree
  - For training a tree, each node receives a random set of split options
- Probabilistic output: model uncertainty!
- Automatic feature selection
- Naturally multi-class
- Runs efficiently: trees can run in parallel
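The five tree outputs listed above can be combined in two simple ways. The sketch below uses the percentages from the slide; the two combination functions are illustrative choices, and the slide itself does not say which of them its "vote" refers to.

```python
tree_outputs = [
    {"orange": 0.60, "lemon": 0.40},
    {"orange": 0.95, "lemon": 0.05},
    {"orange": 0.72, "lemon": 0.28},
    {"orange": 0.35, "lemon": 0.65},
    {"orange": 0.84, "lemon": 0.16},
]

def average_posteriors(outputs):
    """Average the per-tree class probabilities."""
    classes = list(outputs[0])
    return {c: sum(o[c] for o in outputs) / len(outputs) for c in classes}

def majority_vote(outputs):
    """Each tree votes for its most probable class; the most frequent vote wins."""
    votes = [max(o, key=o.get) for o in outputs]
    return max(set(votes), key=votes.count)

forest_posterior = average_posteriors(tree_outputs)
print(forest_posterior)
print(majority_vote(tree_outputs))
```

Both rules pick "orange" here, but the averaged posterior also keeps a measure of uncertainty, which the hard vote discards.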
A forest consists of trees
- Start at the root node
- True/false question at each split node
- Stop when a leaf node is reached: prediction
- [Figure: a general tree structure with root node, internal (split) nodes and terminal (leaf) nodes]
- Example: GUESS WHO*. Is it a male? Does he have a beard? Does he wear glasses? Jake, Joshua, Mike or Justin
- *Credits to Mark Janse

How to build a tree?
- Special type of graph: a collection of nodes and edges
- Directed Acyclic Graph (DAG)
- Internal (split) nodes and terminal (leaf) nodes
- The upper/start node is called the root
- Each internal node has one incoming edge and two outgoing edges
- All nodes (except the root) have exactly one incoming edge

Mathematical notation
- Data point: $v = (x_1, \ldots, x_d) \in \mathbb{R}^d$, label: $c$
- Features: $x_i$, dimensionality: $d$
- Binary split function: $h(v, \theta): \mathbb{R}^d \to \{0, 1\}$
- Split parameters of node $j$: $\theta_j \in \mathcal{T}$, with $\mathcal{T}$ the set of possible parameters
- Training points reaching node $j$: $S_j = S_j^L \cup S_j^R$ (left and right child)
- Complete training set: $S_0$

How to split the data?
- The split function $h(v, \theta)$ depends on the parameters $\theta = (\phi, \psi, \tau)$
  - Feature selection function $\phi$
  - Geometric primitive $\psi$ (e.g. a line)
  - Thresholds $\tau$
- The parameter space $\mathcal{T}$ contains the options that we have for the parameters $\phi$, $\psi$ and $\tau$
- Axis-aligned hyperplane: $h(v, \theta) = [\tau_1 > \phi(v) \cdot \psi > \tau_2]$. For a 2D example: $\phi(v) = (x_1\ x_2\ 1)^T$, e.g. $\psi = (1\ 0\ \psi_3)$
- Oriented hyperplane: $h(v, \theta) = [\tau_1 > \phi(v) \cdot \psi > \tau_2]$ with $\phi(v) = (x_1\ x_2\ 1)^T$ and a general $\psi \in \mathbb{R}^3$
- Quadratic surface: $h(v, \theta) = [\tau_1 > \phi^T(v)\, \psi\, \phi(v) > \tau_2]$ with $\phi(v) = (x_1\ x_2\ 1)^T$ and $\psi \in \mathbb{R}^{3 \times 3}$ representing a conic
- Note that setting either $\tau_1 = \infty$ or $\tau_2 = -\infty$ corresponds to using only one threshold
How to determine the best split?
- Maximize the information gain*:
  $$I(S, \theta) = H(S) - \sum_{i \in \{L, R\}} \frac{|S^i|}{|S|}\, H(S^i)$$
- Shannon's entropy:
  $$H(S) = -\sum_{c \in C} p(c) \log p(c)$$
- Node training:
  $$\theta_j = \arg\max_{\theta \in \mathcal{T}_j} I(S_j, \theta)$$
- *One of many options. Other popular choices: (1) Gini's diversity index, (2) misclassification error

What is the best split?
- [Figure: two candidate splits of the same node, one with $|S^L| = 48$, $|S^R| = 52$ and one with $|S^L| = 50$, $|S^R| = 50$, each annotated with $H(S)$, $H(S^L)$, $H(S^R)$ and the information gain $I(S, \theta)$; the second is marked BEST SPLIT]
- The best split is the one that yields the highest information gain from a given set of candidate splits
- Split function parameters $\theta$ are chosen from a limited set of parameter settings $\mathcal{T}_j \subset \mathcal{T}$

How to train a decision tree?
- Start with a random subset of all the data at the root node (Bagging)
- Find the split parameters $\theta_j$, from a set of randomly chosen options $\mathcal{T}_j \subset \mathcal{T}$, that maximize some split metric (Randomized Node Optimization, RNO)
- Repeat this for the outgoing nodes, and stop growing a certain branch until one of the following two criteria holds:
  - A pre-defined tree depth D is reached (number of nodes of a branch); alternatively, until a pre-defined total number of nodes is reached
  - All training samples in the node are from the same class
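The information gain criterion can be computed directly from the class labels on each side of a candidate split. In this minimal sketch the base-2 logarithm and the class compositions of the two candidate splits are assumptions for illustration; only the child sizes 48/52 and 50/50 echo the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum_c p(c) log2 p(c) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(parent, left, right):
    """I = H(S) - sum over children of |S^i| / |S| * H(S^i)."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in (left, right) if child)
    return entropy(parent) - weighted

parent = ["orange"] * 50 + ["lemon"] * 50

# Candidate 1: children of size 48 and 52 that are still mixed.
left1 = ["orange"] * 25 + ["lemon"] * 23
right1 = ["orange"] * 25 + ["lemon"] * 27

# Candidate 2: children of size 50 and 50 that are pure.
left2 = ["orange"] * 50
right2 = ["lemon"] * 50

gain1 = information_gain(parent, left1, right1)
gain2 = information_gain(parent, left2, right2)
best = max([(gain1, "candidate 1"), (gain2, "candidate 2")])[1]
print(gain1, gain2, best)
```

Node training then amounts to evaluating this gain for every candidate in the randomly drawn set and keeping the maximizer.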
Example: growing a tree
- Let's grow a tree with depth D = 2
- Start at the root node with a subset of all available data
- [Figures: at each node three candidate splits (Option 1, Option 2, Option 3) are shown, followed by the resulting tree with its left and right branches]

Example: classify a new data point
- [Figure: new data point v is passed down the tree, going left or right at each split node]
- [Figures: the new data point v follows the left/right decisions of each tree until it reaches a leaf]

Decision forest model: components
- Node test parameters, $\theta \in \mathcal{T}$: features / split function / thresholds
- Node objective function, e.g. $I = I(S, \theta)$: (energy) function to minimize
- Node weak learner, e.g. $h(v, \theta) \in \{\text{true}, \text{false}\}$: split node test function
- Leaf predictor model, e.g. $p(c \mid v)$: point estimate / full distribution
- Randomness model, e.g. Bagging, RNO: methods for inserting randomness
- Stopping criteria, e.g. maximum tree depth $D$: when to stop splitting the data
- Forest size $T$: number of trees in the forest. A collection of trees: a forest!
- Ensemble model, e.g. $p(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid v)$: how to combine the output of all the trees in the forest

How to add randomness? (Randomness model)
1. Bagging (randomized training set)
   - Subset of all data points per tree
2. Randomized Node Optimization (RNO)
   - Features chosen with selection function $\phi$
   - Split function depending on weak learner orientation $\psi$
   - Thresholds given in $\tau$

How to add randomness? (1) Bagging
- $S_0$: full training set
- $S_0^t \subset S_0$: randomly sampled subset for training tree $t$
- [Figure: forest training, with subsets $S_0^1, S_0^2, \ldots, S_0^T$ drawn from $S_0$]
How to add randomness? (2) Randomized Node Optimization (RNO)
- $\mathcal{T}$: full set of all possible node test parameter values
- $\mathcal{T}_j \subset \mathcal{T}$: set of randomly sampled parameter values used to train node $j$
- Node test parameter: $\theta = (\phi, \psi, \tau) \in \mathcal{T}$
- Randomness control parameter: $\rho = |\mathcal{T}_j|$
  - $\rho = |\mathcal{T}|$: low randomness
  - $\rho = 1$: high randomness

How to compute a prediction from a trained tree?
- Probability distribution at the leaf: $p(c \mid v)$
- Point estimate, e.g. M.A.P.: $c^* = \arg\max_c p(c \mid v)$
- Generally the full distribution is preserved until the decision moment, to incorporate uncertainty

How to combine tree output?
- [Figure: Tree 1, Tree 2, ..., Tree T, each producing $p_t(c \mid v)$]
- Averaging:
  $$p(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid v)$$
- Multiplication:
  $$p(c \mid v) = \frac{1}{Z} \prod_{t=1}^{T} p_t(c \mid v)$$
  where $Z$ is a partitioning function to ensure probabilistic normalization. Multiplication is overconfident and less robust to noise

Example: generalization
- [Figure: training points and forest output for three weak learners: axis aligned, oriented line, conic section]

Example: the effect of randomness
- [Figure: forest output for the weak learners axis aligned, oriented line and conic section, for tree depths D = 13 and D = 5 at a given value of $\rho$]
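The difference between the two ensemble rules is easy to see numerically. A minimal sketch with three hypothetical tree posteriors over two classes (the numbers are made up for illustration):

```python
def combine_average(posteriors):
    """p(c|v) = 1/T * sum_t p_t(c|v)."""
    n_trees = len(posteriors)
    return [sum(column) / n_trees for column in zip(*posteriors)]

def combine_product(posteriors):
    """p(c|v) = 1/Z * prod_t p_t(c|v), with Z chosen so the result sums to 1."""
    product = [1.0] * len(posteriors[0])
    for p in posteriors:
        product = [a * b for a, b in zip(product, p)]
    z = sum(product)
    return [value / z for value in product]

trees = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2]]  # hypothetical p_t(c|v) per tree
averaged = combine_average(trees)
multiplied = combine_product(trees)
print(averaged, multiplied)
```

Although no single tree is more than 80% sure, the product rule reports over 93% for the first class, which illustrates why multiplication is described as overconfident.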
- [Figures: further examples of the effect of randomness for the weak learners axis aligned, oriented line and conic section, for tree depths D = 13 and D = 5 at different values of $\rho$]

Random Forests: classification examples
- [Figure: 2 classes in feature space and the random forest decision. N = 100 trees, max number of nodes = 5, # candidate splits per node = 3]
- [Figure: 4 classes in feature space and the random forest decision. N = 100 trees, max number of nodes = 4, # candidate splits per node = 3]
- [Figure: 4 classes in feature space and the random forest decision. N = 100 trees, max number of nodes = 10, # candidate splits per node = 8]

Example: handwritten digit classification
- The MNIST Database of Handwritten Digit Images for Machine Learning Research, DOI: /MSP
- Task: classify handwritten digits 1, 2, 3, 4, 5
- Samples per digit: 250 for training, 250 for testing
- HOG* features as data points: $v \in \mathbb{R}^d$, $d = 144$
- Forest parameters:
  - Forest size $T = 300$ trees
  - Axis-aligned weak learner $\psi$
  - Randomness parameter $\rho = 5$, with $|\mathcal{T}| = 10$
  - Selection function $\phi$ randomly samples 8 dimensions from $v$
- *Histogram of Oriented Gradients, Dalal & Triggs, CVPR 2005

Results per class
- [Figures: forest predictions for each class, sorted by prediction confidence. Inverted digits are wrongly classified]
- Class 1: Precision = 0.96 / Recall = 0.97
- Class 2: Precision = 0.99 / Recall = 0.92
- Class 3: Precision = 0.96 / Recall = 0.97
- Class 4: Precision = 0.97 / Recall = 1.00
- Class 5: Precision = 0.95 / Recall = 0.97
Recommended literature
- A. Criminisi, Decision Forests for Computer Vision and Medical Image Analysis, 2013
  - Ch. 3: Introduction
  - Ch. 4: Classification Forests
  - C++ library: Sherwood
- Breiman L., "Random forests", Mach. Learn. 45(1), doi: /a: doi: /

Random Forests: conclusions
- Random Forests offer an attractive method for classification
  - Inherently multi-class, probabilistic output, efficient implementations available, ...
- A forest is a collection of decision trees
  - Each tree $t$ is trained with a different subset $S_0^t$ of the training data (Bagging)
- A tree is a collection of nodes and edges
  - Each internal node splits the incoming data using the node split function $h(v, \theta)$
  - $\theta$ encompasses the selection function $\phi$, the geometric primitive $\psi$ and the thresholds $\tau$
  - Each node $j$ receives a random subset $\mathcal{T}_j$ of the parameter space $\mathcal{T}$ for training (RNO)
- Randomness increases robustness
  - The randomness control parameter $\rho$ determines the amount of randomness
  - Maximum randomness when $\rho = 1$, minimum randomness when $\rho = |\mathcal{T}|$
- The tree depth $D$ controls the forest confidence, hence a high $D$ can lead to overfitting

Model validation
- So, now we have a model. How good is it?
- We have labeled data (ground truth), so we can validate!
- Model validation:
  - Separate sets for training and testing the model
  - Train the model using the training set
  - Use the test set to evaluate the performance
  - Compute figures of merit, which indicate the performance
- What is a good performance metric? And how should we split the data?

Some popular figures of merit
- Accuracy = (#TP + #TN) / (#TP + #FN + #TN + #FP)
- Sensitivity = #TP / (#TP + #FN), a.k.a. True Positive Rate
- Specificity = #TN / (#TN + #FP), a.k.a. True Negative Rate
- Where
  - True Positive (TP): positive sample classified as positive
  - True Negative (TN): negative sample classified as negative
  - False Positive (FP): negative sample classified as positive
  - False Negative (FN): positive sample classified as negative
  - #: number of samples

Receiver Operating Characteristic (ROC)
- Sensitivity and specificity give the performance for just one possible setting (i.e. decision threshold) of the model
- We can vary this threshold and recompute these performance metrics
- This yields a curve of possible combinations of sensitivity and specificity, called the ROC curve
- Generally true: a higher sensitivity comes with a lower specificity, and vice versa

How to compute the ROC curve?
- For each sample we have a predicted class and a score
- Sort the samples according to score and move the threshold
- [Figure: samples sorted by predicted score with the threshold separating negative from positive model predictions, next to the ROC plot of Sensitivity against 1 - Specificity]
- Threshold example: Sensitivity = 5 / (5+0) = 1.00, Specificity = 3 / (3+2) = 0.60
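The threshold sweep can be written out in a few lines. The labels and scores below are hypothetical (the actual scores on the slides are not preserved); they are chosen so that the threshold 0.55 reproduces the example of sensitivity 1.00 and specificity 0.60. The area under the resulting curve is computed with the trapezoidal rule, which is one common choice and an assumption here.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FN, TN, FP when score >= threshold is predicted positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def roc_points(labels, scores):
    """(1 - specificity, sensitivity) for every threshold, from strict to lenient."""
    points = [(0.0, 0.0)]  # threshold above all scores: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp, fn, tn, fp = confusion_counts(labels, scores, t)
        points.append((1 - specificity(tn, fp), sensitivity(tp, fn)))
    return points

def area_under_curve(points):
    """Trapezoidal rule over consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test set: 5 positive and 5 negative samples, sorted by score.
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20]

tp, fn, tn, fp = confusion_counts(labels, scores, 0.55)
print(sensitivity(tp, fn), specificity(tn, fp))
print(area_under_curve(roc_points(labels, scores)))
```

Moving the threshold up trades sensitivity for specificity, and each threshold contributes one point to the curve. For this toy data the area under the curve evaluates to 0.84.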
- Threshold example: Sensitivity = 4 / (4+1) = 0.80, Specificity = 3 / (3+2) = 0.60
- Threshold example: Sensitivity = 4 / (4+1) = 0.80, Specificity = 4 / (4+1) = 0.80
- Threshold example: Sensitivity = 0 / (0+5) = 0.00, Specificity = 5 / (5+0) = 1.00
- Area Under the Curve (AUC): AUC = 0.84

[Figure: validation scheme. The data set with labels is split into training data and test data; the MODEL produces predicted labels for the test data, which are compared with the ground truth labels to obtain the performance]

How should we split the data?
- Different choices might lead to different results
- Large data set: randomly sample half the samples for training and half for testing
  - Training and testing is time consuming for large data sets
  - The test set is probably a good reflection of the training set
- K-fold cross-validation
  - Split the data in K equally sized parts
  - Use K-1 parts for training and use the left-out part of the data for testing; repeat this for each part and average the performance
- Leave-One-Out Cross-Validation
  - Leave one sample out of the complete set and use the remaining set to train the model
  - Test the model on the left-out sample
  - Repeat this for all samples
  - Best performance indication for a small data set: you want to use as much of the little data you have for training the model
EXAMPLE: 4-fold cross-validation
- Split the data set in 4 equally-sized partitions
- In each fold, one partition is the test set and the remaining partitions form the training set [figures not preserved]
- Fold 1: Accuracy = 0.86
- Fold 2: Accuracy = 0.86
- Fold 3: Accuracy = 0.84
- Fold 4: Accuracy = 0.88
- 4-fold cross-validation accuracy = 0.86 ± [value not preserved] (mean ± stdev)
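The fold bookkeeping and the final "mean ± stdev" summary can be sketched as follows. The contiguous, unshuffled folds and the use of the sample standard deviation are assumptions of this sketch; the slides do not state which standard deviation was reported, and in practice the data is usually shuffled before splitting. The four accuracies are the ones listed above.

```python
import statistics

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(10, 4))
for train, test in folds:
    print("train:", train, "test:", test)

fold_accuracies = [0.86, 0.86, 0.84, 0.88]       # fold 1 to fold 4
mean_acc = statistics.mean(fold_accuracies)
std_acc = statistics.stdev(fold_accuracies)      # sample standard deviation (assumption)
print(round(mean_acc, 2), round(std_acc, 3))
```

Every sample is used for testing exactly once and for training in the other K-1 folds, which is what makes the averaged accuracy a more reliable estimate than a single split.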
Generalization: under- and overfitting

Why don't we evaluate on the training set?
- Example 1: is this a good classifier? [figure not preserved]
  - No errors on the training set: 100% accuracy
  - On new, identically distributed data: 81% accuracy
  - NO! Very poor generalization: overfitting!
- Example 2: is this a good classifier? [figure not preserved]
  - Many errors on the training set: 86% accuracy
  - On new, identically distributed data: 84% accuracy (close to the training accuracy!)
  - NO! Model complexity too low: underfitting!
- Example 3: is this a good classifier? [figure not preserved]
  - Accuracy on the training set: 94%
  - Accuracy on the test set: 95%
  - Approximately equal train and test error
  - YES! Good generalization!

Model complexity: what is a good model?
- A model with good generalization!
- Good prediction accuracy on both the training and the test set!
- [Figure: prediction error against model complexity, showing the training error curve and the point of sufficient complexity]

Example: non-linear SVM
- Fixed cost parameter C
- Complexity increases when the size of the kernel scale is reduced (flexibility)
- 10-fold cross-validation to estimate the test error
- Validate on the training set to compute the train error
- [Figure: prediction error (%) against model complexity, from low complexity to high complexity]
Summary
- In supervised learning the ground truth is available, so we can evaluate the prediction performance of the model
- Split the data in two sets (training set and test set)
- Use figures of merit for measuring the performance: Accuracy, Sensitivity, Specificity, AUC, ...
- Use K-fold cross-validation for reliable evaluation
- Increasing the model complexity may lead to overfitting!
  - Poor generalization: low training set error, high test set error
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationSupervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning
Overview T7 - SVM and s Christian Vögeli cvoegeli@inf.ethz.ch Supervised/ s Support Vector Machines Kernels Based on slides by P. Orbanz & J. Keuchel Task: Apply some machine learning method to data from
More informationINTRODUCTION TO MACHINE LEARNING. Measuring model performance or error
INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov
ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationEnsemble Methods, Decision Trees
CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More informationStatistics 202: Statistical Aspects of Data Mining
Statistics 202: Statistical Aspects of Data Mining Professor Rajan Patel Lecture 9 = More of Chapter 5 Agenda: 1) Lecture over more of Chapter 5 1 Introduction to Data Mining by Tan, Steinbach, Kumar Chapter
More informationDM6 Support Vector Machines
DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR
More informationPartitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning
Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning