Distributed Logistic Model Trees, Stratio Intelligence Mateo Álvarez and Antonio
|
|
- Miles Ross
- 5 years ago
- Views:
Transcription
1 Distributed Logistic Model Trees, Stratio Intelligence Mateo Álvarez and Antonio
2 Aerospace Engineer, MSc in Propulsion Systems (UPM), Master in Data Science (URJC). Working as data scientist and Big Data developer at Stratio Big Data in the data science department mateo-alvarez
3 Ph.D. in Telecommunications, MSc in Electronic Systems Engineering and Telecommunication Technologies, Systems and Networks (UPV), and MSc Big Data Expert (UTAD). Working as data scientist and Big Data developer at at Stratio Big Data in the data science
4 Why using interpretable algorithms instead of black boxes Logistic Regression Metrics Demo Decision Trees Variance-Bias tradeoff Logistic Model Trees Distributed implementation Cost function & configuration params Demo
5 Why use interpretable algorithms instead of black boxes Logistic Regression Decision Trees Variance-Bias
6 Accuracy VS Explainability Medical Studies Power management Financial environment Criminal activity
7 Probability Threshold Feature
8 Probability Local bad adjust Threshold Feature
9 Probability Threshold Feature
10 Root node Feature 1 Leaf node Leaf node
11 Root node Feature 1 Feature 2 Feature 2 Leaf node Leaf node Leaf node Leaf node
12 Root node Feature 1 Local bad adjust Feature 2 Feature 2 Leaf node Leaf node Leaf node Leaf node
13 Mean Error Error Total error Test error Underfitting Overfitting Training error Variance Bias 2 Model complexity Model complexity
14 Variance Bias
15 Missing important variables for the problem to make the predictions Bias Variance
16 Overfitting to the sample/training data Bias Variance
17 Irreducible error on prediction Bias Variance
18 Logistic Model Trees Distributed implementation Cost function & configuration parametres
19 Performance metrics Root node
20 Root node Feature 1 Performance metrics Performance metrics Performance metrics
21 Root node Feature 1 Performance metrics Performance metrics Performance metrics Performance metrics Performance metrics Performance metrics
22 Root node Feature 1 Feature 2 Feature 2
23 Root node Feature 1 Feature 2
24 Root node Feature 1 Feature 2
25 DISTRIBUTED IMPLEMENTATION Spark s Decision Tree (distributed implementation of random forests) Spark s Logistic Regression / weka s Logistic Regression on the nodes
26 LMT Cost function to fix the logistic regression threshold AccuracyCostFunction ConfusionMatrix PrecisionCostFunction PrecisionRecallCostFunction RocCostFunction The same cost function for pruning criteria Performance metrics Performance metrics Performance metrics
27 ADVANTAGES OF THIS IMPLEMENTATION Big datasets Power of spark to distribute building the tree and logistic regressions Medium datasets Distributed tree growth and weka s logistic regression Small datasets Although it can be slow to distribute the data for the decision tree, cost functions can be still used and specific optimization for particular cases
28 Example of DLMT algorithm in a synthetic dataset
29 Metrics
30 PREDICTION Positive Negative TRUE CONDITION Positive True Positives False Negatives Negative False Positives True Negatives True Positive Rate (Recall) False Positive Rate Precission TPR = TP/(TP+FN) Insensitive to unbalance FPR = FP/(FP+TN) Insensitive to unbalance Precision = TP/(TP+FP) Sensitive to unbalance Accuracy = (TP+TN)/(TP+TN+FP+FN) Sensitive to unbalance
31 PREDICTION Positive Negative TRUE CONDITION Positive True Positives False Negatives Negative False Positives True Negatives True Positive Rate (Recall) False Positive Rate Best performance AUROC (AUC): TPR/FPR -> Insensitive to unbalance! TPR FPR
32 PREDICTION Positive Negative TRUE CONDITION Positive True Positives False Negatives Negative False Positives True Negatives True Positive Rate (Recall) Precission Best performance AUPRC: Precision/TPR -> Sensitive to unbalance! Precision Recall
33 Data Algorithms f f 1 n ABF Benchmark
34 @StratioBD
35 1 Accuracy VS Explainability 2 3 Performance Metrics: 4 AUROC, AUPRC, ACCURACY Automatic Benchmarking Framework f f 1 n ABF Benchmark
36 THANK YOU UNITED STATES Tel: (+1) EUROPE Tel: (+34)
37 WE ARE
INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error
INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering
More informationCS4491/CS 7265 BIG DATA ANALYTICS
CS4491/CS 7265 BIG DATA ANALYTICS EVALUATION * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Dr. Mingon Kang Computer Science, Kennesaw State University Evaluation for
More informationEvaluation Metrics. (Classifiers) CS229 Section Anand Avati
Evaluation Metrics (Classifiers) CS Section Anand Avati Topics Why? Binary classifiers Metrics Rank view Thresholding Confusion Matrix Point metrics: Accuracy, Precision, Recall / Sensitivity, Specificity,
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationData Mining and Knowledge Discovery Practice notes 2
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationLeveling Up as a Data Scientist. ds/2014/10/level-up-ds.jpg
Model Optimization Leveling Up as a Data Scientist http://shorelinechurch.org/wp-content/uploa ds/2014/10/level-up-ds.jpg Bias and Variance Error = (expected loss of accuracy) 2 + flexibility of model
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationGlobal Probability of Boundary
Global Probability of Boundary Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues Martin, Fowlkes, Malik Using Contours to Detect and Localize Junctions in Natural
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationClassification. Instructor: Wei Ding
Classification Part II Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/004 1 Practical Issues of Classification Underfitting and Overfitting Missing Values Costs of Classification
More informationProbabilistic Classifiers DWML, /27
Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/11/16 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationCS381V Experiment Presentation. Chun-Chen Kuo
CS381V Experiment Presentation Chun-Chen Kuo The Paper Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. ECCV 2012. 50 100 150 200 250 300 350
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationDATA MINING LECTURE 9. Classification Decision Trees Evaluation
DATA MINING LECTURE 9 Classification Decision Trees Evaluation 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium
More informationModel s Performance Measures
Model s Performance Measures Evaluating the performance of a classifier Section 4.5 of course book. Taking into account misclassification costs Class imbalance problem Section 5.7 of course book. TNM033:
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationMIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on
More informationCS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.
CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationDATA MINING LECTURE 11. Classification Basic Concepts Decision Trees Evaluation Nearest-Neighbor Classifier
DATA MINING LECTURE 11 Classification Basic Concepts Decision Trees Evaluation Nearest-Neighbor Classifier What is a hipster? Examples of hipster look A hipster is defined by facial hair Hipster or Hippie?
More informationMetrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?
Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to
More informationUsing Machine Learning to Identify Security Issues in Open-Source Libraries. Asankhaya Sharma Yaqin Zhou SourceClear
Using Machine Learning to Identify Security Issues in Open-Source Libraries Asankhaya Sharma Yaqin Zhou SourceClear Outline - Overview of problem space Unidentified security issues How Machine Learning
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationISyE 6416 Basic Statistical Methods - Spring 2016 Bonus Project: Big Data Analytics Final Report. Team Member Names: Xi Yang, Yi Wen, Xue Zhang
ISyE 6416 Basic Statistical Methods - Spring 2016 Bonus Project: Big Data Analytics Final Report Team Member Names: Xi Yang, Yi Wen, Xue Zhang Project Title: Improve Room Utilization Introduction Problem
More informationMachine Learning in Python. Rohith Mohan GradQuant Spring 2018
Machine Learning in Python Rohith Mohan GradQuant Spring 2018 What is Machine Learning? https://twitter.com/myusuf3/status/995425049170489344 Traditional Programming Data Computer Program Output Getting
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data
More informationData Mining Lecture 8: Decision Trees
Data Mining Lecture 8: Decision Trees Jo Houghton ECS Southampton March 8, 2019 1 / 30 Decision Trees - Introduction A decision tree is like a flow chart. E. g. I need to buy a new car Can I afford it?
More informationCSC411/2515 Tutorial: K-NN and Decision Tree
CSC411/2515 Tutorial: K-NN and Decision Tree Mengye Ren csc{411,2515}ta@cs.toronto.edu September 25, 2016 Cross-validation K-nearest-neighbours Decision Trees Review: Motivation for Validation Framework:
More informationClassification. Slide sources:
Classification Slide sources: Gideon Dror, Academic College of TA Yaffo Nathan Ifill, Leicester MA4102 Data Mining and Neural Networks Andrew Moore, CMU : http://www.cs.cmu.edu/~awm/tutorials 1 Outline
More informationIntroduction to Automated Text Analysis. bit.ly/poir599
Introduction to Automated Text Analysis Pablo Barberá School of International Relations University of Southern California pablobarbera.com Lecture materials: bit.ly/poir599 Today 1. Solutions for last
More informationPart II: A broader view
Part II: A broader view Understanding ML metrics: isometrics, basic types of linear isometric plots linear metrics and equivalences between them skew-sensitivity non-linear metrics Model manipulation:
More informationDecision Trees: Representation
Decision Trees: Representation Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning
More informationDATA MINING OVERFITTING AND EVALUATION
DATA MINING OVERFITTING AND EVALUATION 1 Overfitting Will cover mechanisms for preventing overfitting in decision trees But some of the mechanisms and concepts will apply to other algorithms 2 Occam s
More informationLecture 13: Model selection and regularization
Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always
More informationData Science Training
Data Science Training R, Predictive Modeling, Machine Learning, Python, Bigdata & Spark 9886760678 Introduction: This is a comprehensive course which builds on the knowledge and experience a business analyst
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationDATA MINING LECTURE 9. Classification Basic Concepts Decision Trees Evaluation
DATA MINING LECTURE 9 Classification Basic Concepts Decision Trees Evaluation What is a hipster? Examples of hipster look A hipster is defined by facial hair Hipster or Hippie? Facial hair alone is not
More informationData Mining Classification: Alternative Techniques. Imbalanced Class Problem
Data Mining Classification: Alternative Techniques Imbalanced Class Problem Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Class Imbalance Problem Lots of classification problems
More informationEvaluating Machine-Learning Methods. Goals for the lecture
Evaluating Machine-Learning Methods Mark Craven and David Page Computer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from
More informationK-fold cross validation in the Tidyverse Stephanie J. Spielman 11/7/2017
K-fold cross validation in the Tidyverse Stephanie J. Spielman 11/7/2017 Requirements This demo requires several packages: tidyverse (dplyr, tidyr, tibble, ggplot2) modelr broom proc Background K-fold
More informationProactive Prediction of Hard Disk Drive Failure
Proactive Prediction of Hard Disk Drive Failure Wendy Li (liwendy) - CS221, Ivan Suarez (isuarezr) - CS221, Juan Camacho (jcamach2) - CS229 Introduction In data center environments, hard disk drive (HDD)
More information1 Machine Learning System Design
Machine Learning System Design Prioritizing what to work on: Spam classification example Say you want to build a spam classifier Spam messages often have misspelled words We ll have a labeled training
More information7 Techniques for Data Dimensionality Reduction
7 Techniques for Data Dimensionality Reduction Rosaria Silipo KNIME.com The 2009 KDD Challenge Prediction Targets: Churn (contract renewals), Appetency (likelihood to buy specific product), Upselling (likelihood
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationMetrics Overfitting Model Evaluation Research directions. Classification. Practical Issues. Huiping Cao. lassification-issues, Slide 1/57
lassification-issues, Slide 1/57 Classification Practical Issues Huiping Cao lassification-issues, Slide 2/57 Outline Criteria to evaluate a classifier Underfitting and overfitting Model evaluation lassification-issues,
More informationData Mining D E C I S I O N T R E E. Matteo Golfarelli
Data Mining D E C I S I O N T R E E Matteo Golfarelli Decision Tree It is one of the most widely used classification techniques that allows you to represent a set of classification rules with a tree. Tree:
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationLogistic Regression: Probabilistic Interpretation
Logistic Regression: Probabilistic Interpretation Approximate 0/1 Loss Logistic Regression Adaboost (z) SVM Solution: Approximate 0/1 loss with convex loss ( surrogate loss) 0-1 z = y w x SVM (hinge),
More informationCSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13
CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit
More informationTutorials Case studies
1. Subject Three curves for the evaluation of supervised learning methods. Evaluation of classifiers is an important step of the supervised learning process. We want to measure the performance of the classifier.
More informationMachine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm
Machine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 17 Review 1 / 17 Decision Tree: Making a
More informationBayes Net Learning. EECS 474 Fall 2016
Bayes Net Learning EECS 474 Fall 2016 Homework Remaining Homework #3 assigned Homework #4 will be about semi-supervised learning and expectation-maximization Homeworks #3-#4: the how of Graphical Models
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationIntroduction to Machine Learning
Introduction to Machine Learning Eric Medvet 16/3/2017 1/77 Outline Machine Learning: what and why? Motivating example Tree-based methods Regression trees Trees aggregation 2/77 Teachers Eric Medvet Dipartimento
More informationCOMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization
COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 6: k-nn Cross-validation Regularization LEARNING METHODS Lazy vs eager learning Eager learning generalizes training data before
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationFast or furious? - User analysis of SF Express Inc
CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text
More informationClassification Part 4
Classification Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Model Evaluation Metrics for Performance Evaluation How to evaluate
More informationEvaluating Machine Learning Methods: Part 1
Evaluating Machine Learning Methods: Part 1 CS 760@UW-Madison Goals for the lecture you should understand the following concepts bias of an estimator learning curves stratified sampling cross validation
More informationPartitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning
Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning
More informationCrossing the AI Chasm. What We Learned from building Apache PredictionIO (incubating)
Crossing the AI Chasm What We Learned from building Apache PredictionIO (incubating) Simon Chan Sr. Director, Product Management, Salesforce Co-founder, PredictionIO PhD, University College London simon@salesforce.com
More informationBusiness Data Analytics
MTAT.03.319 Business Data Analytics Lecture 9 The slides are available under creative common license. The original owner of these slides is the University of Tartu Fraud Detection Wrongful act for financial
More informationEnsemble Methods, Decision Trees
CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationMachine Learning and Bioinformatics 機器學習與生物資訊學
Molecular Biomedical Informatics 分子生醫資訊實驗室 機器學習與生物資訊學 Machine Learning & Bioinformatics 1 Evaluation The key to success 2 Three datasets of which the answers must be known 3 Note on parameter tuning It
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationSUPERVISED LEARNING WITH SCIKIT-LEARN. How good is your model?
SUPERVISED LEARNING WITH SCIKIT-LEARN How good is your model? Classification metrics Measuring model performance with accuracy: Fraction of correctly classified samples Not always a useful metric Class
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationCISC 4631 Data Mining
CISC 4631 Data Mining Lecture 05: Overfitting Evaluation: accuracy, precision, recall, ROC Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside)
More informationModel Complexity and Generalization
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Generalization Learning Curves Underfit Generalization
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationPractical Guidance for Machine Learning Applications
Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationMachine Learning nearest neighbors classification. Luigi Cerulo Department of Science and Technology University of Sannio
Machine Learning nearest neighbors classification Luigi Cerulo Department of Science and Technology University of Sannio Nearest Neighbors Classification The idea is based on the hypothesis that things
More informationCross-validation. Cross-validation is a resampling method.
Cross-validation Cross-validation is a resampling method. It refits a model of interest to samples formed from the training set, in order to obtain additional information about the fitted model. For example,
More informationCPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016
CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Admin Course add/drop deadline tomorrow. Assignment 1 is due Friday. Setup your CS undergrad account ASAP to use Handin: https://www.cs.ubc.ca/getacct
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationCSE 417T: Introduction to Machine Learning. Lecture 6: Bias-Variance Trade-off. Henry Chai 09/13/18
CSE 417T: Introduction to Machine Learning Lecture 6: Bias-Variance Trade-off Henry Chai 09/13/18 Let! ", $ = the maximum number of dichotomies on " points s.t. no subset of $ points is shattered Recall
More informationGradient Descent. Wed Sept 20th, James McInenrey Adapted from slides by Francisco J. R. Ruiz
Gradient Descent Wed Sept 20th, 2017 James McInenrey Adapted from slides by Francisco J. R. Ruiz Housekeeping A few clarifications of and adjustments to the course schedule: No more breaks at the midpoint
More informationDeep Learning Approach to Network Intrusion Detection
Deep Learning Approach to Network Intrusion Detection Paper By : Nathan Shone, Tran Nguyen Ngoc, Vu Dinh Phai, Qi Shi Presented by : Romi Bajracharya Overview Introduction Limitation with NIDS Proposed
More informationEvaluation. Evaluate what? For really large amounts of data... A: Use a validation set.
Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?
More informationMachine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari
Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets
More informationClassifiers and Detection. D.A. Forsyth
Classifiers and Detection D.A. Forsyth Classifiers Take a measurement x, predict a bit (yes/no; 1/-1; 1/0; etc) Detection with a classifier Search all windows at relevant scales Prepare features Classify
More informationLecture 19: Decision trees
Lecture 19: Decision trees Reading: Section 8.1 STATS 202: Data mining and analysis November 10, 2017 1 / 17 Decision trees, 10,000 foot view R2 R5 t4 1. Find a partition of the space of predictors. X2
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees colour=green? size>20cm? colour=red? watermelon size>5cm? size>5cm? colour=yellow? apple
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar (modified by Predrag Radivojac, 2017) Classification:
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More information