Similarity-Binning Averaging: A Generalisation of Binning Calibration
1 10th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2009). Similarity-Binning Averaging: A Generalisation of Binning Calibration. Antonio Bella, Cèsar Ferri, José Hernández-Orallo and María José Ramírez-Quintana. Universitat Politècnica de València, Spain
2 Outline: Introduction. Traditional Calibration Methods. Calibration by Multivariate Similarity-Binning Averaging. Experimental Results. Conclusions and Future Work.
3 Introduction
4 [Figure: training data with the attributes Supplier, Product, Quantity and Price and the class "Delivered on time?" (YES/NO) is used to learn a data mining model, shown as a decision tree that splits on Product, Quantity (<=75 / >75) and Supplier. The model is applied to new data to predict "Delivered on time?" together with an estimated probability Prob.(Yes). The numeric cell values were not preserved in the transcription.]
6 A classifier is calibrated if, for a sample of examples with predicted probability p, the expected proportion of positives is close to p. [Figure: two plots, "Uncalibrated Model" and "Calibrated Model", each showing Proportion of Positives against Predicted Probability.]
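The definition above can be checked empirically by grouping predictions and comparing, per group, the mean predicted probability with the observed proportion of positives. The following is a minimal sketch, not taken from the slides: the function name `reliability_table`, the equal-width bins and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def reliability_table(probs, labels, n_bins=10):
    """Return (bin index, count, mean predicted prob, proportion of positives)
    for every non-empty equal-width bin of the predicted probability."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # min() keeps p == 1.0 inside the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        members = bins[idx]
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        rows.append((idx, len(members), mean_p, frac_pos))
    return rows

# Toy data, made up for illustration (1 = positive, 0 = negative)
probs = [0.15, 0.15, 0.15, 0.15, 0.85, 0.85, 0.85, 0.85]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
table = reliability_table(probs, labels)
for idx, count, mean_p, frac_pos in table:
    print(idx, count, round(mean_p, 3), round(frac_pos, 3))
```

For a well-calibrated model the last two columns should be close to each other in every bin; large gaps correspond to the "Uncalibrated Model" plot.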
7 Traditional Calibration Methods
8 Binning averaging method. Pair-adjacent violators algorithm (PAV). Platt's method.
9 Traditional methods: Based on ordering instances. Directly applicable only to binary problems. The problem attributes are used only to compute the estimated probability. The estimated probability (of the positive class) is used only to order the instances. All examples in a bin receive the same calibrated probability.
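As an illustration of these properties, here is a minimal sketch of binning averaging for a binary problem. It assumes equal-frequency bins built on a labelled validation sample and assigns a new instance to the first bin whose upper boundary is not below its estimated probability; the names `fit_binning` and `calibrate_binning` and the toy data are hypothetical, not from the slides.

```python
from bisect import bisect_left

def fit_binning(probs, labels, n_bins=10):
    """Sort examples by estimated probability, cut them into n_bins groups of
    (nearly) equal size, and keep for each bin its upper boundary and the
    observed proportion of positives."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    n_bins = min(n_bins, n)  # never create empty bins
    uppers, rates = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        uppers.append(chunk[-1][0])
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return uppers, rates

def calibrate_binning(p, uppers, rates):
    """Every instance that falls in the same bin gets the same calibrated value."""
    idx = min(bisect_left(uppers, p), len(rates) - 1)
    return rates[idx]

# Toy validation sample, made up for illustration
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
uppers, rates = fit_binning(probs, labels, n_bins=2)
print(calibrate_binning(0.35, uppers, rates))
print(calibrate_binning(0.95, uppers, rates))
```

Note that only the estimated probability enters the procedure: the original attributes play no role once the model has produced its score.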
10 Proposed method: probability calibration by similarity (k most similar instances). Applicable to multiclass problems. Uses the estimated probabilities (of all the classes) and also the problem attributes to compute the similarity between instances. More information can improve the calibrated probability. Each example has its own calibrated probability.
11 Calibration by Multivariate Similarity-Binning Averaging
12 [Diagram: overview of the method in three stages. Model Generation Stage: a Classification Technique is applied to the Training Dataset (rows X_11, X_12, ..., X_1n, Y_1 up to X_m1, X_m2, ..., X_mn, Y_m) to obtain the Probabilistic Classification Model M. Probability Estimation Stage: M is applied to the Validation Dataset VD (rows X_11, X_12, ..., X_1n, Y_1 up to X_r1, X_r2, ..., X_rn, Y_r) to obtain the Validation Dataset with Estimated Probabilities VDP (rows X_j1, X_j2, ..., X_jn, p(j,1), p(j,2), ..., p(j,c), Y_j for j = 1..r). Calibration Stage: M is applied to a New Instance I (X_I1, X_I2, ..., X_In) to obtain the New Instance with Estimated Probabilities IP (X_I1, X_I2, ..., X_In, p(I,1), p(I,2), ..., p(I,c)); the k most similar instances in VDP (SB) give the Calibrated Probabilities p*(I,1), p*(I,2), ..., p*(I,c).]
13 Model generation stage: the typical learning process. A classification technique is applied to a training dataset (rows X_11, X_12, ..., X_1n, Y_1 up to X_m1, X_m2, ..., X_mn, Y_m) to learn a probabilistic classification model (M). This stage may not exist if the model is given beforehand (a hand-made model or an old model).
14 Probability estimation stage. The trained model M gives the estimated probabilities associated with a dataset. This dataset can be the same one used for training, or an additional validation dataset (VD). The estimated probability for each class is appended as a new attribute, creating a new dataset, VDP (Validation Dataset with Estimated Probabilities), whose rows have the form X_j1, X_j2, ..., X_jn, p(j,1), p(j,2), ..., p(j,c), Y_j for j = 1..r.
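The construction of VDP amounts to appending the model's class probabilities to each validation row. A small sketch follows; `build_vdp` and `toy_model` are hypothetical names, and `toy_model` is only a stand-in for the model M (any function mapping attributes to a list of class probabilities would do).

```python
def build_vdp(validation, predict_proba):
    """Append the estimated class probabilities to each validation row.

    validation    -- list of (attributes, label) pairs, i.e. the dataset VD
    predict_proba -- stand-in for model M: attributes -> [p_1, ..., p_c]
    """
    vdp = []
    for attributes, label in validation:
        probs = predict_proba(attributes)
        vdp.append((list(attributes) + list(probs), label))
    return vdp

def toy_model(attributes):
    # Assumption for illustration: the first attribute, clipped to [0, 1],
    # is read as the probability of class "yes".
    p_yes = min(max(attributes[0], 0.0), 1.0)
    return [1.0 - p_yes, p_yes]

vd = [([0.2, 5.0], "no"), ([0.9, 1.0], "yes")]
vdp = build_vdp(vd, toy_model)
print(vdp[0])  # two original attributes, then two probabilities, then the label
```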
15 Calibration stage. To calibrate a new instance I (X_I1, X_I2, ..., X_In):
1. Obtain the estimated probabilities p(I,1), p(I,2), ..., p(I,c) from the classification model M.
2. Add these probabilities to the instance, creating a new instance IP (X_I1, X_I2, ..., X_In, p(I,1), p(I,2), ..., p(I,c)).
3. Select the k most similar instances to IP from the dataset VDP; these form the bin (SB).
4. The calibrated probability p*(I,j) of the instance I for each class j is the predicted class probability given by the k most similar instances, using all attributes.
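The four steps can be sketched in a few lines. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions that the slides do not state: attributes are numeric, similarity is plain Euclidean distance without normalisation, and the "predicted class probability" in step 4 is taken to be the fraction of the k neighbours that carry each true label. The names `sba_calibrate`, `euclidean` and `toy_model` are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def sba_calibrate(x, predict_proba, vdp, classes, k=10):
    """vdp: list of (attributes + estimated probabilities, label) rows."""
    # Steps 1-2: estimate the probabilities and append them to the instance (IP)
    ip = list(x) + list(predict_proba(x))
    # Step 3: the k most similar rows of VDP form the bin
    neighbours = sorted(vdp, key=lambda row: euclidean(ip, row[0]))[:k]
    # Step 4 (assumption): class frequencies among the neighbours
    return {c: sum(1 for _, y in neighbours if y == c) / len(neighbours)
            for c in classes}

def toy_model(attributes):
    # Stand-in for model M, made up for illustration
    p_yes = min(max(attributes[0], 0.0), 1.0)
    return [1.0 - p_yes, p_yes]

validation = [([0.1], "no"), ([0.2], "no"), ([0.3], "yes"),
              ([0.7], "yes"), ([0.8], "yes"), ([0.9], "no")]
vdp = [(x + toy_model(x), y) for x, y in validation]
calibrated = sba_calibrate([0.15], toy_model, vdp, classes=["no", "yes"], k=3)
print(calibrated)
```

Because the bin is built per instance, two instances with the same estimated probability but different attributes can receive different calibrated probabilities, and the same code works unchanged with more than two classes.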
16 Experimental Results
17 20 binary datasets from the UCI repository.
2 different settings:
o Training and test sets (75% / 25%)
o Training, validation and test sets (56% / 19% / 25%)
Classification techniques (WEKA):
o Naïve Bayes, J48, IBk (k=10) and Logistic Regression
Baseline methods:
o Class: classification techniques without calibration
o 10-NN: 10 most similar instances with the original attributes
18 Calibration methods:
o Binning averaging (10 bins)
o PAV algorithm
o Platt's method
o Similarity-Binning Averaging (SBA) (k=10)
Calibration measures:
o Calibration by overlapping bins (CalBin): pure calibration measure
o Mean Squared Error (MSE): hybrid measure. Brier score decomposition: calibration loss and refinement loss
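To make the "hybrid" nature of MSE concrete, the sketch below computes the binary MSE (Brier score) and splits it into a calibration loss and a refinement loss. It assumes a binary problem with labels 0/1 and groups examples by identical predicted value, in which case the two terms add up exactly to the MSE; the experiments may use a different (for example per-class averaged) variant, and the function names and toy data are made up for illustration.

```python
from collections import defaultdict

def mse(probs, labels):
    """Mean squared error between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def brier_decomposition(probs, labels):
    """Return (calibration loss, refinement loss), grouping by predicted value."""
    groups = defaultdict(list)
    for p, y in zip(probs, labels):
        groups[p].append(y)
    n = len(probs)
    cal = 0.0
    ref = 0.0
    for p, ys in groups.items():
        observed = sum(ys) / len(ys)               # proportion of positives
        cal += len(ys) * (p - observed) ** 2       # distance from perfect calibration
        ref += len(ys) * observed * (1 - observed) # impurity inside the group
    return cal / n, ref / n

# Toy predictions, made up for illustration
probs = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
total = mse(probs, labels)
cal_loss, ref_loss = brier_decomposition(probs, labels)
print(round(total, 4), round(cal_loss, 4), round(ref_loss, 4))
```

A calibration method can only reduce the first term; the second depends on how well the model separates the classes, which is why MSE mixes both aspects.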
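19 [Table: per-dataset results with the columns Dataset, ClassT, 10-NNT, BinT, PAVT, PlattT, SBAT, BinV, PAVV, PlattV and SBAV, plus an AVG row. The values were not preserved in the transcription.]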
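20 [Table: per-dataset results with the columns Dataset, ClassT, 10-NNT, BinT, PAVT, PlattT, SBAT, BinV, PAVV, PlattV and SBAV, plus an AVG row. The values were not preserved in the transcription.]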
21 [Tables: pairwise comparison of the methods (ClassT, 10-NNT, BinT, PAVT, PlattT, SBAT, BinV, PAVV, PlattV, SBAV), one table for CalBin and one for MSE. Each cell indicates whether the column method wins, the two tie (=), or the row method wins. The cell values were not preserved in the transcription.]
22 Conclusions and Future Work
23 New calibration method: Similarity-Binning Averaging. The bins are constructed by similarity, selecting the k most similar instances using both the estimated probabilities and the problem attributes. Experimental results show a significant improvement in calibration for both measures considered, over three traditional calibration techniques. The method can be applied to multiclass problems.
25 Thanks for your attention! Antonio Bella