LSML19: Introduction to Large Scale Machine Learning
1 LSML19: Introduction to Large Scale Machine Learning. MINES ParisTech, March 2019. Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech
2 Practical information. Course website > Teaching > LSML 19. Schedule: morning sessions 09:30-12:30; afternoon sessions 14:00-17:00. Lectures are here in L.118. Practicals are in L.117, L.119, L.120. Grade: 60% exam (Friday, 14:00), 40% RAMP.
3 Today's goals. Review what machine learning is. Understand scalability issues. Discover how to accelerate gradient descent with stochastic gradient descent.
4 Acknowledgements. Slides inspired by Ala Al-Fuqaha, Ethem Alpaydin, Matthew Blaschko, Léon Bottou, Sanjiv Kumar, Trevor Hastie, Rob Tibshirani and Jean-Philippe Vert.
5 Why machine learning?
6 Business Insider: 2017 is the year of machine learning. Improving doctors; assisting lawyers; making cars drive themselves.
7 Perception
8 Communication. Chatbot example from Text courtesy of an Eddie Izzard show from 1999 called Dressed to Kill.
9 Reasoning
10 Diagnosis
11 Scientific discovery. LHC image: Anna Pantelia/CERN
12 A common thread: ML. Using algorithms to build models from example (training) data. Statistics + optimization + computer science.
13 Empirical risk minimization. Ingredients: data; a hypothesis class (the shape of the decision function f); a loss function (the cost/error of f on one data point). Recipe: find, among all functions of the hypothesis class, one that minimizes the loss on the training data (the empirical risk).
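The recipe above can be sketched on a toy problem. This is a minimal illustration, not the course's own code: the hypothesis class (1-D threshold classifiers on a small grid), the 0/1 loss, the data and all function names are assumptions made for the example.

```python
def zero_one_loss(y_true, y_pred):
    """Loss of f on one data point: 1 if the prediction is wrong, 0 otherwise."""
    return 0 if y_true == y_pred else 1

def empirical_risk(f, data):
    """Average loss of the decision function f over the training data."""
    return sum(zero_one_loss(y, f(x)) for x, y in data) / len(data)

def make_threshold_classifier(t):
    """One member of the (assumed) hypothesis class: predict +1 iff x > t."""
    return lambda x: 1 if x > t else -1

# Toy training data: (feature, label) pairs.
train = [(0.5, -1), (1.0, -1), (1.5, -1), (2.5, 1), (3.0, 1), (3.5, 1)]

# Hypothesis class: thresholds taken on a small grid.
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]

# ERM: keep the hypothesis with the smallest empirical risk.
best_t = min(
    thresholds,
    key=lambda t: empirical_risk(make_threshold_classifier(t), train),
)
best_risk = empirical_risk(make_threshold_classifier(best_t), train)
print(best_t, best_risk)
```

Changing the grid of thresholds changes the hypothesis class; changing `zero_one_loss` changes the loss; the minimization step stays the same.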
14 (Un)supervised learning setting. Data matrix (design matrix) X: n observations (samples, data points) described by p variables (features, descriptors, attributes). Supervision y: t outcomes (targets, labels). Settings: binary classification; multi-class classification; regression. Orders of magnitude: Iris dataset: n=150, p=4, t=1; Cancer drug sensitivity: n=10^3, p=10^6, t=100; ImageNet: n=14·10^6, p=6·10^3, t= ; Shopping, e-marketing...: n=O(10^6), p=O(10^9), t=O(10^8); Astronomy, GAFAs, web...: n=O(10^9), p=O(10^9), t=O(10^9).
15 Scaling ML algorithms. What is large scale? Data does not fit in RAM; data streams; algorithms do not run in a reasonable time on a single machine. Important considerations: performance increases with the number of samples; the likelihood to overfit increases with the number of features. Orders of magnitude: Iris dataset: n=150, p=4, t=1; Cancer drug sensitivity: n=10^3, p=10^6, t=100; ImageNet: n=14·10^6, p=6·10^3, t= ; Shopping, e-marketing...: n=O(10^6), p=O(10^9), t=O(10^8); Astronomy, GAFAs, web...: n=O(10^9), p=O(10^9), t=O(10^9).
16 A brief zoo of ML problems
17 Unsupervised learning. Learn a new representation of the data. [Diagram: data X (n x p: images, text, measurements, omics data...) -> ML algo -> new representation of the data.]
18 Dimensionality reduction. Find a lower-dimensional representation. [Diagram: data X (n x p: images, text, measurements, omics data...) -> ML algo -> data (n x m).]
19 Clustering. Group similar data points together. [Diagram: data -> ML algo -> groups of data points.]
20 Unsupervised learning. Dimensionality reduction: PCA. Clustering: k-means. Density estimation. Feature learning.
21 Supervised learning. Make predictions. [Diagram: data X (n x p) and labels y (n) -> ML algo -> predictor (decision function).]
22 Classification. Make discrete predictions. Binary classification; multi-class classification. [Diagram: data and labels -> ML algo -> predictor.]
23 Regression. Make continuous predictions. [Diagram: data and labels -> ML algo -> predictor.]
24 Supervised learning. Regression: ordinary least squares, ridge regression. Classification: logistic regression, SVM. Structured output prediction.
25 Main ML paradigms. Unsupervised learning: dimensionality reduction; clustering; density estimation; feature learning. Supervised learning: regression; classification; structured output prediction. Semi-supervised learning. Reinforcement learning.
26 Dimensionality reduction: Principal Components Analysis
27 Principal Components Analysis. Objective: reduce the dimension without losing the variability in the data, i.e. find a low-dimensional space that maximizes the variance of the data projected onto it. The k-th principal component is orthogonal to all previous components and, under that constraint, captures the largest amount of variance. Solution: w_k is the k-th eigenvector of X^T X (X centered, eigenvalues sorted in decreasing order).
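The eigenvector characterization can be sketched with NumPy as follows. This is a small-scale illustration under assumptions: the data are synthetic, the function name `pca` is hypothetical, and the full p x p covariance matrix is formed and decomposed with `numpy.linalg.eigh`, which is only reasonable when p is small.

```python
import numpy as np

def pca(X, k):
    """Top-k principal directions, projected data and explained variances."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = Xc.T @ Xc / (len(Xc) - 1)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    W = eigvecs[:, order]                    # p x k orthonormal directions
    return W, Xc @ W, eigvals[order]

rng = np.random.default_rng(0)
# Synthetic data whose variance is concentrated along the first axis.
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
W, Z, var = pca(X, k=2)
print(W.shape, Z.shape, var)
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is the sense in which each component captures the largest remaining amount of variance.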
28 PCA example: Population genetics Genetic data of 1387 Europeans Novembre et al,
29 Algorithmic complexity of PCA. Memory: store the data X (n x p) and the covariance matrix X^T X (p x p): O(np + p^2). Runtime: computing X^T X: O(np^2); computing K eigenvectors by Lanczos iterations: O(Kp^2). Computing the covariance matrix is more expensive than computing the K first principal components! Example with n=10^9, p=10^8. Computing the covariance matrix: 10^25 FLOPs; fastest computer in the world (Nov 2018): 75 petaFLOPS; more than 2 years. Storing the covariance matrix: 10^16 B = (10^16 / 2^50) PB ≈ 8.9 PB.
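The orders of magnitude in this example can be checked with a few lines of arithmetic. The sketch below assumes a cost of n·p^2 operations for X^T X, a sustained rate of 75 petaFLOPS (the figure quoted on the slide), and one byte per matrix entry (which is what the slide's 10^16 B implies); variable names are illustrative.

```python
n, p = 10**9, 10**8

# Cost of forming the p x p matrix X^T X: about n * p^2 operations.
flops_cov = n * p**2

# Assumed machine speed: 75 petaFLOPS, as quoted on the slide.
machine_flops_per_s = 75e15
seconds = flops_cov / machine_flops_per_s
years = seconds / (365 * 24 * 3600)

# Storage of X^T X, assuming 1 byte per entry (p^2 entries).
cov_bytes = p**2
cov_pib = cov_bytes / 2**50   # 2^50 bytes per (binary) petabyte

print(f"{flops_cov:.0e} operations, {years:.1f} years, {cov_pib:.1f} PB")
```

Under these assumptions the computation alone takes roughly four years, consistent with the slide's lower bound of two years, and the matrix occupies close to 9 binary petabytes.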
30 Clustering: k-means
31 K-means clustering. Goal: find a cluster assignment (C_1, ..., C_K) that minimizes the intra-cluster variance: the sum over k = 1..K and over x in C_k of ||x - μ_k||^2, with centroids μ_k = (1/|C_k|) times the sum of the x in C_k.
32 K-means clustering. Goal and centroids as on slide 31. Each point belongs to the cluster of its nearest centroid, so the clusters form a Voronoi tessellation of the space.
33 K-means clustering. Goal and centroids as on slide 31. The problem is NP-hard, hence an iterative algorithm. Assignment step: fix the centroids, update the assignments. Update step: fix the assignments, update the centroids.
34 K-means clustering
35 K-means clustering. Pick 3 centroids at random.
36 K-means clustering. Assign each observation to the nearest centroid.
37 K-means clustering. Recompute the centroids.
38 K-means clustering. Re-assign each observation to the nearest centroid.
39 K-means clustering. Recompute the centroids, and iterate the process until convergence.
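The alternation of assignment and update steps walked through above can be sketched as follows. This is a minimal NumPy version under assumptions: initial centroids are K distinct observations drawn at random, an empty cluster keeps its previous centroid, and the function name `kmeans` and the toy data are illustrative.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain k-means: alternate assignment and update steps until stable."""
    rng = np.random.default_rng(seed)
    # Pick K distinct observations as initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: n x K squared distances, nearest centroid wins.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break                      # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Toy data: three blobs in the plane.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
centroids, labels = kmeans(X, K=3)
print(centroids)
```

Because the problem is NP-hard, this procedure only reaches a local optimum that depends on the initial centroids; running it from several seeds and keeping the best result is a common remedy.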
40 k-means complexity. Assignment step (fix the centroids, update the assignments): compute n x K distances in p dimensions, O(npK). Update step (update the centroids): sum n values in p dimensions, O(np). With T iterations: O(TnpK). Storage: store n cluster assignments + K centroids, O(n + Kp); store X, O(np).
41 Ridge regression
42 Linear regression. Least-squares fit (equivalent to MLE under the assumption of Gaussian noise): minimize ||y - Xβ||_2^2 over β. Solution: β = (X^T X)^{-1} X^T y, uniquely defined when X^T X is invertible.
43 Ridge regression (Hoerl & Kennard 1970). Goodness-of-fit + ridge regularization: minimize ||y - Xβ||_2^2 + λ||β||_2^2 over β, where the first term is the empirical risk and the second the ridge regularization. Solution: β = (X^T X + λI)^{-1} X^T y; for λ > 0 it is unique and always exists. Limit cases: λ → 0: OLS (non-regularized) solution (low bias, high variance); λ → +∞: β = 0 (high bias, low variance). Correlated features get similar weights.
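The closed-form ridge solution and its two limit cases can be illustrated numerically. The sketch below assumes the standard objective ||y - Xβ||^2 + λ||β||^2, synthetic data, and a hypothetical helper name `ridge_fit`; it solves the linear system instead of forming the inverse explicitly.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge solution (X^T X + lam I)^{-1} X^T y, via a linear solve."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_small = ridge_fit(X, y, lam=1e-8)   # lam -> 0: close to the OLS solution
beta_large = ridge_fit(X, y, lam=1e8)    # lam -> infinity: shrunk towards 0
print(beta_small)
print(beta_large)
```

Forming X^T X costs O(np^2) and solving the p x p system costs O(p^3), which is where the complexity figures for ridge regression come from.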
44 Complexity of ridge regression. Computing X^T X + λI: O(np^2). Inverting X^T X + λI: O(p^3). When n >> p, computing X^T X is more expensive than inverting it!
45 Regularization path. [Figure: regression weights and error (MSE) as functions of the regularization.]
46 Ridge regression: hyperparameter setting
47 Setting λ. [Figure: prediction error versus model complexity, on training data and on new data, from underfitting to overfitting.]
48 Setting λ. Data splitting strategy: cross-validation. Cut the training set into K equally-sized chunks. K folds: one chunk to test (validation), the (K-1) others for training. [Figure: the validation chunk rotates across the folds.] Cross-validation score: performance averaged over the K folds. Compute it for a grid of values of λ: λ1, λ2, ..., λM.
49 Setting λ. Cross-validation as on slide 48. Choose the λ with the best cross-validation score. With K folds and M values of λ, this multiplies the time complexity by KM.
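The selection of λ by K-fold cross-validation can be sketched as follows. This is a minimal version under assumptions: ridge regression is fit in closed form, performance is measured by the mean squared error, and the data, the grid of λ values and the helper names `ridge_fit` and `cv_score` are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_score(X, y, lam, K=5, seed=0):
    """Mean squared error of ridge regression, averaged over K folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)               # K (nearly) equal chunks
    errors = []
    for k in range(K):
        valid = folds[k]                          # one chunk for validation
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[valid] @ beta - y[valid]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]                   # grid of M values
scores = {lam: cv_score(X, y, lam) for lam in lambdas}    # K * M fits in total
best_lam = min(scores, key=scores.get)                    # lowest CV error
print(best_lam, scores[best_lam])
```

The dictionary comprehension makes the KM factor visible: each of the M candidate values triggers K model fits.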
50 Ridge regression: ℓ2-regularized learning
51 ℓ2-regularized learning. Objective: empirical risk (the loss accumulated over the training points) + ℓ2 regularization. Generalization of the ridge regression to any loss. If the loss is convex, then the problem is strictly convex (for λ > 0) and has a unique global solution, which can be found numerically.
52 ℓ2-regularized learning. Objective and convexity as on slide 51. Examples of losses: absolute loss; quadratic loss; ε-insensitive loss; Huber loss (mix of quadratic and linear).
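The four losses named above can be written down explicitly. The parametrizations below (the default ε, the Huber threshold `delta` and the factor 1/2 in the Huber loss) are common textbook conventions assumed for this sketch; they are not taken from the slides.

```python
def quadratic_loss(y, f):
    """Squared residual."""
    return (y - f) ** 2

def absolute_loss(y, f):
    """Absolute residual."""
    return abs(y - f)

def eps_insensitive_loss(y, f, eps=0.5):
    """Zero inside a tube of half-width eps, linear outside."""
    return max(0.0, abs(y - f) - eps)

def huber_loss(y, f, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    r = abs(y - f)
    return 0.5 * r**2 if r <= delta else delta * (r - 0.5 * delta)

for residual in (0.2, 1.0, 3.0):
    print(
        residual,
        quadratic_loss(residual, 0.0),
        absolute_loss(residual, 0.0),
        eps_insensitive_loss(residual, 0.0),
        huber_loss(residual, 0.0),
    )
```

All four are convex in the prediction, so plugging any of them into the ℓ2-regularized objective keeps the problem strictly convex.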
53 Gradient descent
54 Gradient descent. If the loss is convex, then the problem is strictly convex and has a unique global solution, which can be found numerically. Suppose the function J to minimize is differentiable. First-order Taylor expansion of J at u: J(v) ≈ J(u) + J'(u)(v - u). [Figure: the points (u, J(u)) and (v, J(v)) on the curve, and (v, J(u) + J'(u)(v - u)) on the tangent at u.]
55 Gradient descent. Minimize a differentiable, strictly convex function J by finding where its gradient is 0. Set a0 randomly. Update: a1 = a0 - α J'(a0). Repeat. Stop when |J'(ak)| < ε. [Figure: curve J(a) with J'(a0) < 0, so that a1 lies to the right of a0.]
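The update rule can be sketched in one dimension. The function minimized, the step size α = 0.1, the starting point and the stopping test on the size of the derivative are assumptions chosen for this example.

```python
def gradient_descent(grad, a0, alpha=0.1, eps=1e-6, max_iter=10_000):
    """Fixed-step gradient descent; stops when the derivative is nearly zero."""
    a = a0
    for k in range(max_iter):
        g = grad(a)
        if abs(g) < eps:       # stationarity reached
            break
        a = a - alpha * g      # step against the gradient
    return a, k

# J(a) = (a - 3)^2 + 1 is strictly convex, with J'(a) = 2 (a - 3).
a_star, n_steps = gradient_descent(lambda a: 2 * (a - 3), a0=-5.0)
print(a_star, n_steps)
```

Here each step multiplies the distance to the minimizer by 0.8, so the iterate reaches a = 3 to within the tolerance in fewer than a hundred steps.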
56 Classification
57 Logistic regression. Model y as a linear function of x? No.
58 Logistic regression. Model y as a linear function of x? Model the log-odds ratio as a linear function of x instead.
59 Logistic regression. With p = P(y = 1 | x), the log-odds ratio is log(p / (1 - p)) = β^T x, i.e. p = 1 / (1 + exp(-β^T x)).
60 Ridge logistic regression (Le Cessie and van Houwelingen, 1992). Goodness-of-fit + ridge regularization: the empirical risk is measured with the logistic loss (negative conditional log-likelihood), to which the ridge regularization is added. No explicit solution. Smooth convex optimization problem that can be solved numerically.
61 Newton-Raphson iterations. Minimize J convex, differentiable. Gradient descent: u ← u - α ∇J(u). Suppose J is twice differentiable. Second-order Taylor expansion: J(v) ≈ g(v) = J(u) + ∇J(u)^T (v - u) + (1/2) (v - u)^T ∇²J(u) (v - u). Minimize g in v instead of J. Hence use the update u ← u - [∇²J(u)]^{-1} ∇J(u).
62 Solving the logistic regression. Can be solved with Newton-Raphson iterations. Each step is equivalent to solving a weighted ridge regression. IRLS: Iteratively Reweighted Least Squares. Complexity: O(np^2 + p^3) per iteration, times the number of iterations.
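A Newton-Raphson solver for ridge logistic regression can be sketched as below. The sketch assumes labels in {0, 1}, the objective "sum of logistic losses + λ||β||^2", pure Newton steps without line search, and synthetic data; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logistic_newton(X, y, lam=1.0, n_iter=20, tol=1e-8):
    """Newton-Raphson for sum_i logloss(y_i, x_i.beta) + lam * ||beta||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = sigmoid(X @ beta)
        grad = X.T @ (prob - y) + 2 * lam * beta            # gradient, O(np)
        w = prob * (1 - prob)                                # per-sample weights
        hess = X.T @ (X * w[:, None]) + 2 * lam * np.eye(p)  # X^T W X + 2 lam I
        step = np.linalg.solve(hess, grad)                   # weighted ridge system
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ beta_true)).astype(float)
beta_hat = ridge_logistic_newton(X, y, lam=1.0)
print(beta_hat)
```

The linear system solved at each iteration has the same form as a ridge regression with sample weights `w`, which is why the method is also called iteratively reweighted least squares.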
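63 Large margin classifiers. Margin: yf(x), with y in {-1, +1}. Classify x as positive if f(x) > 0 and negative otherwise; ideally, yf(x) > 0. Large margin classifier: maximize yf(x), i.e. minimize the sum over the training points of φ(yf(x)) for a convex, non-increasing function φ. Logistic regression: φ(u) = log(1 + exp(-u)).
64 Large margin classifiers. Linear Support Vector Machine (SVM): hinge loss φ(u) = max(0, 1 - u). [Figure: the hinge loss as a function of the margin.]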
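The two margin-based losses can be compared directly. The formulas below are the usual definitions of the logistic and hinge losses as functions of the margin yf(x); the list of margins is an arbitrary choice for illustration.

```python
import math

def logistic_loss(margin):
    """Smooth, convex, non-increasing in the margin y * f(x)."""
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    """Convex, non-increasing, non-smooth at margin = 1; zero beyond it."""
    return max(0.0, 1.0 - margin)

margins = [-2.0, 0.0, 0.5, 1.0, 3.0]
for m in margins:
    print(m, round(logistic_loss(m), 4), hinge_loss(m))
```

Both losses penalize negative margins roughly linearly; the hinge loss is exactly zero once the margin exceeds 1, whereas the logistic loss only tends to zero.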
65 Linear SVM. Non-smooth, convex optimization problem; quadratic program. Equivalent to the dual problem. Complexity (training): storing; optimization. Complexity (prediction): primal; dual.
66 Kernel methods
67 Motivation
68 Non-linear mapping to a feature space. [Figure: mapping from R to R^2.]
69 SVM in the feature space. Train: solve the SVM problem on the mapped samples φ(x_i). Predict with the decision function computed on φ(x).
70 Kernel SVM. Train and predict as on slide 69, with every dot product ⟨φ(x), φ(x')⟩ replaced by the kernel k(x, x').
71 Kernel trick. k may be quite efficient to compute, even if H is a very high-dimensional or even infinite-dimensional space. For any positive semi-definite function k, there exists a feature space H and a feature map φ such that k(x, x') = ⟨φ(x), φ(x')⟩_H. Hence you can define mappings implicitly. Kernel trick: algorithms that only involve the samples through their dot products can be rewritten using kernels in such a way that they can be applied in the initial space without ever computing the mapping.
72 Non-linear mapping to a feature space. [Figure: mapping from R to R^2.]
73 Polynomial kernels. More generally, k(x, x') = (⟨x, x'⟩ + 1)^d is an inner product in a feature space of all monomials of degree up to d.
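The kernel trick can be checked numerically on a small case. The example below uses the homogeneous degree-2 kernel k(x, z) = ⟨x, z⟩^2 on R^2 (a simpler relative of the kernel above, chosen as an assumption because its feature map is short to write): the explicit map `phi` lists all degree-2 monomials, and the kernel gives the same inner product without computing them.

```python
import numpy as np

def phi(x):
    """Explicit feature map for k(x, z) = (x . z)^2 on R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Same inner product, computed in the input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(phi(x) @ phi(z))   # dot product in the feature space
implicit = poly2_kernel(x, z)       # kernel evaluation in the input space
print(explicit, implicit)
```

The feature space here has dimension 3; for degree d in p dimensions its size grows combinatorially, while the kernel evaluation stays O(p).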
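74 Gaussian kernel. k(x, x') = exp(-||x - x'||^2 / (2σ^2)). The feature space has infinite dimension.
75 Kernel ridge regression. Ridge regression in input space: β = (X^T X + λI)^{-1} X^T y, with a p x p matrix to invert. In a feature space of dimension d, with Φ the n x d matrix of mapped samples: β = (Φ^T Φ + λI)^{-1} Φ^T y, with a d x d matrix to invert. Ridge regression in sample space: β = Φ^T (ΦΦ^T + λI)^{-1} y = Φ^T (K + λI)^{-1} y, with an n x n matrix to invert.
76 Complexity of KRR. Computing K: O(n^2 p) when one kernel evaluation costs O(p). Storing K: O(n^2). Inverting K + λI: O(n^3). Computing a prediction for one sample: computing κ, the vector of the k(x, x_i): O(np); computing the products: O(n).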
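Kernel ridge regression with a Gaussian kernel can be sketched as follows, with the cost of each step noted in comments. The bandwidth σ = 1, the value λ = 0.1, the synthetic data and the helper names are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Matrix of exp(-||a - b||^2 / (2 sigma^2)) for rows a of A and b of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma**2))

def krr_fit(X, y, lam, sigma=1.0):
    K = gaussian_kernel(X, X, sigma)                      # O(n^2 p) time, O(n^2) memory
    return np.linalg.solve(K + lam * np.eye(len(X)), y)   # O(n^3)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    kappa = gaussian_kernel(X_new, X_train, sigma)        # O(np) per new sample
    return kappa @ alpha                                  # O(n) per new sample

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

alpha = krr_fit(X, y, lam=0.1)
X_test = np.linspace(-2.5, 2.5, 20)[:, None]
y_pred = krr_predict(X, alpha, X_test)
print(np.mean(np.abs(y_pred - np.sin(X_test[:, 0]))))
```

Prediction needs all n training points, which is why kernel methods are expensive at test time compared with linear models.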
77 Algorithmic complexity recap
78 Summary
Method                 Memory    Train time   Predict time
PCA                    O(p^2)    O(np^2)      O(p)
k-means                O(np)     O(Knp)       O(Kp)
Ridge regression       O(p^2)    O(np^2)      O(p)
Logistic regression    O(np)     O(np^2)      O(p)
SVM, kernel methods    O(np)     O(n^3)       O(np)
Training can take place offline unless data is streaming. Prediction should be fast!
79 Techniques for large-scale ML. Use the deep learning tricks: Deep learning (March 26, F. Moutarde); Natural Language Processing (March 27, E. Grave). Distribute data & computation on modern architectures: Systems for large-scale ML (March 28, C.-A. Azencott). Trade optimization accuracy for speed.
80 Gradient descent convergence rates
81 Flavours of convexity. J differentiable is convex iff for all u, v: J(v) ≥ J(u) + ∇J(u)^T (v - u). J convex and twice differentiable is L-smooth (has an L-Lipschitz gradient) iff for all u: ∇²J(u) ⪯ L·I; J doesn't vary sharply. J is m-strongly convex iff for all u, v: J(v) ≥ J(u) + ∇J(u)^T (v - u) + (m/2) ||v - u||^2; J is not too flat, it is lower-bounded by a quadratic.
82 Gradient descent convergence rates. Gradient descent with fixed step size: J convex, L-smooth: converges in O(1/ε) iterations, i.e. O(np/ε) operations. J m-strongly convex, L-smooth: converges in O(κ log(1/ε)) iterations, i.e. O(np κ log(1/ε)) operations, with κ = L/m. Newton-Raphson iterations: J convex, L-smooth: converges in O(log log(1/ε)) iterations, i.e. O((np^2 + p^3) log log(1/ε)) operations.
83 Gradient descent convergence rates. Same rates as on slide 82. The constants involved are usually unknown...
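84 Stochastic gradient descent. Gradient descent: each step uses the gradient of the loss accumulated over all n training points. Stochastic gradient descent: each step uses the gradient of the loss at a single training point l. Sampling with replacement: choose l uniformly at random in {1, 2, ..., n}. J convex, L-smooth: converges in O(κ/ε^2) iterations, i.e. O(κp/ε^2) operations. J m-strongly convex, L-smooth: converges in O(κ/ε) iterations, i.e. O(κp/ε) operations. We got rid of n!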
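Stochastic gradient descent can be sketched on a least-squares problem. This is an illustration under assumptions: a constant step size α = 0.01 (the rates above may rely on a different step-size schedule), a fixed number of steps, synthetic data, and a hypothetical function name.

```python
import numpy as np

def sgd_least_squares(X, y, alpha=0.01, n_steps=20_000, seed=0):
    """SGD on the squared loss, one randomly drawn sample per step."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        i = rng.integers(n)                          # sample with replacement
        grad_i = 2 * (X[i] @ beta - y[i]) * X[i]     # gradient on sample i: O(p)
        beta -= alpha * grad_i
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=1000)

beta_sgd = sgd_least_squares(X, y)
print(beta_sgd)
```

Each step costs O(p) regardless of n, at the price of a noisy trajectory: with a constant step size the iterate hovers around the solution instead of converging to it exactly.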
85 Convex optimization. A number of ML algorithms can be formulated as convex optimization problems. They can be solved numerically thanks to variants of gradient descent. Trading optimization accuracy for speed: necessary when reaching high accuracy is too resource-consuming; does not necessarily have a large impact on performance, since test error is more important than training error.
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationDevelopment in Object Detection. Junyuan Lin May 4th
Development in Object Detection Junyuan Lin May 4th Line of Research [1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, CVPR 2005. HOG Feature template [2] P. Felzenszwalb,
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationObject Detection with Discriminatively Trained Part Based Models
Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwelb, Ross B. Girshick, David McAllester and Deva Ramanan Presented by Fabricio Santolin da Silva Kaustav Basu Some slides
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationLecture 7: Support Vector Machine
Lecture 7: Support Vector Machine Hien Van Nguyen University of Houston 9/28/2017 Separating hyperplane Red and green dots can be separated by a separating hyperplane Two classes are separable, i.e., each
More informationData Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Search & Optimization Search and Optimization method deals with
More informationImage Processing. Image Features
Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching
More informationMachine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016
Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the
More informationIntroduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition
More informationGAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.
GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential
More informationLast time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression
Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationRobust Kernel Methods in Clustering and Dimensionality Reduction Problems
Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationSVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines
SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines Boriana Milenova, Joseph Yarmus, Marcos Campos Data Mining Technologies Oracle Overview Support Vector
More informationGradient LASSO algoithm
Gradient LASSO algoithm Yongdai Kim Seoul National University, Korea jointly with Yuwon Kim University of Minnesota, USA and Jinseog Kim Statistical Research Center for Complex Systems, Korea Contents
More informationLinear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines
Linear Models Lecture Outline: Numeric Prediction: Linear Regression Linear Classification The Perceptron Support Vector Machines Reading: Chapter 4.6 Witten and Frank, 2nd ed. Chapter 4 of Mitchell Solving
More information7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech
Foundations of Machine Learning CentraleSupélec Paris Fall 2016 7. Nearest neighbors Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning
More informationPrediction of Dialysis Length. Adrian Loy, Antje Schubotz 2 February 2017
, 2 February 2017 Agenda 1. Introduction Dialysis Research Questions and Objectives 2. Methodology MIMIC-III Algorithms SVR and LPR Preprocessing with rapidminer Optimization Challenges 3. Preliminary
More informationChap.12 Kernel methods [Book, Chap.7]
Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first
More informationCross-validation and the Bootstrap
Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. These methods refit a model of interest to samples formed from the training set,
More informationOrange3 Educational Add-on Documentation
Orange3 Educational Add-on Documentation Release 0.1 Biolab Jun 01, 2018 Contents 1 Widgets 3 2 Indices and tables 27 i ii Widgets in Educational Add-on demonstrate several key data mining and machine
More informationCSE 417T: Introduction to Machine Learning. Lecture 22: The Kernel Trick. Henry Chai 11/15/18
CSE 417T: Introduction to Machine Learning Lecture 22: The Kernel Trick Henry Chai 11/15/18 Linearly Inseparable Data What can we do if the data is not linearly separable? Accept some non-zero in-sample
More information