An Empirical Study on Lazy Multilabel Classification Algorithms
1 An Empirical Study on Lazy Multilabel Classification Algorithms
Eleftherios Spyromitros, Grigorios Tsoumakas and Ioannis Vlahavas
Machine Learning & Knowledge Discovery Group, Department of Informatics, Aristotle University of Thessaloniki, Greece
2 What is Multilabel Classification?
Single-label classification
  Examples are associated with a single label $\lambda$ from a set of disjoint labels $L$
  If $|L| = 2$: binary classification
  If $|L| > 2$: multi-class classification
Multilabel classification
  Examples are associated with a set of labels $Y \subseteq L$
3 Data With Multilabel Nature
Traditional
  Text classification: a web article concerning the Antikythera Mechanism Research Project can be categorized into both categories { Science_Technology, History_Culture }
  Medical diagnosis: multiple diseases for a patient { Obesity, Hypertension }
Modern
  Gene function classification: a gene usually has multiple functions { Protein Synthesis, Cellular Biogenesis, Cellular Transport }
  Classification of music into emotions: a song can make you feel { Sad_Lonely, Quiet_Still }
  Semantic scene analysis: { Mountain, Trees, Lake }
4 Types of Multilabel Classification Methods
Problem transformation methods
  They transform the learning problem into one (LP) or more (BR) single-label classification or label ranking problems
  Algorithm independent
Algorithm adaptation methods
  They extend specific algorithms to handle multi-label data
  SVM, decision tree, neural network, lazy, Bayesian, boosting
5 The Binary Relevance (BR) Method
How it works
  Learns one binary classifier $h_\lambda : X \to \{\neg\lambda, \lambda\}$ for each different label $\lambda \in L$
  The original dataset $D$ is transformed into $|L|$ datasets $D_\lambda$
  $D_\lambda$ contains all examples of $D$, labeled as $\lambda$ if they are associated with $\lambda$ and as $\neg\lambda$ otherwise
Criticism
  Label correlations are not considered
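The BR transformation above can be sketched in a few lines of Python. This is only an illustration: the function name `binary_relevance_transform`, the toy dataset, and the use of `True`/`False` in place of $\lambda$ / $\neg\lambda$ are assumptions made for the example, not part of the presentation.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A multi-label example: (feature vector, set of labels).
Example = Tuple[Sequence[float], FrozenSet[str]]


def binary_relevance_transform(
    dataset: List[Example], labels: Sequence[str]
) -> Dict[str, List[Tuple[Sequence[float], bool]]]:
    """Build one binary dataset per label.

    In the dataset for label `lam`, every example keeps its features and
    is marked True if it carries `lam`, False otherwise.
    """
    return {
        lam: [(x, lam in y) for x, y in dataset]
        for lam in labels
    }


# Hypothetical toy data, loosely modelled on the scene-analysis labels.
toy = [
    ((0.1, 0.9), frozenset({"Mountain", "Lake"})),
    ((0.8, 0.2), frozenset({"Trees"})),
    ((0.5, 0.5), frozenset()),
]
per_label = binary_relevance_transform(toy, ["Mountain", "Trees", "Lake"])
print(per_label["Lake"])
```

Each of the resulting datasets can be handed to any binary learner, which is why the transformation is algorithm independent; it also shows why label correlations are lost, since every label is treated in isolation.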
6 The Label Powerset (LP) Method
How it works
  Considers each different subset of $L$ as a single label
  It learns one single-label classifier $h : X \to P(L)$
Criticism
  Large number of label subsets (up to $2^{|L|}$)
  Most of these are associated with very few examples
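A minimal sketch of the LP transformation, assuming each distinct label set is simply mapped to an integer class id. The helper name `label_powerset_transform` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Example = Tuple[Sequence[float], FrozenSet[str]]


def label_powerset_transform(
    dataset: List[Example],
) -> Tuple[List[Tuple[Sequence[float], int]], Dict[FrozenSet[str], int]]:
    """Map every distinct label set to one class id of a single-label problem."""
    class_of: Dict[FrozenSet[str], int] = {}
    transformed = []
    for x, y in dataset:
        # A new class id is allocated the first time a label set is seen.
        cls = class_of.setdefault(frozenset(y), len(class_of))
        transformed.append((x, cls))
    return transformed, class_of


toy = [
    ((0.1, 0.9), frozenset({"Mountain", "Lake"})),
    ((0.2, 0.8), frozenset({"Mountain", "Lake"})),
    ((0.8, 0.2), frozenset({"Trees"})),
    ((0.5, 0.5), frozenset({"Mountain"})),
]
single_label, class_of = label_powerset_transform(toy)

# Examples per class: with many labels, most classes end up with few examples.
examples_per_class = Counter(cls for _, cls in single_label)
print(len(class_of), "distinct label sets out of at most", 2 ** 3)
```

Counting the examples per class, as in the last lines, makes the criticism on the slide concrete: the number of classes can grow towards $2^{|L|}$ while each class stays sparsely populated.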
7 The BRkNN Algorithm
Origin
  Equivalent to using the BR method in conjunction with the kNN algorithm
Refinement
  Avoids the redundant calculations of the k nearest neighbors in each one of the transformed datasets $D_\lambda$
  A single k nearest neighbors search is followed by independent predictions for each label
Benefit
  $|L|$ times faster than BR + kNN in prediction
  Better suited to domains with a large number of labels and examples that require low response times
8 How it works
Confidence scores
  BRkNN is based on the calculation of a confidence score $c_\lambda$ for each label $\lambda \in L$
  The confidence of a label is the percentage of the k nearest neighbors that include it
  A label is included in the predicted label set when the percentage is higher than or equal to 50%
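The confidence rule above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain (not normalized) Euclidean distance, a brute-force neighbor search, and hypothetical names (`brknn_confidences`, `brknn_predict`); it is not the implementation used in the study.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def brknn_confidences(
    train: List[Example], labels: Sequence[str], x: Sequence[float], k: int
) -> Dict[str, float]:
    """One neighbor search, then one confidence score per label."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], x))[:k]
    return {
        lam: sum(lam in y for _, y in neighbors) / len(neighbors)
        for lam in labels
    }


def brknn_predict(
    train: List[Example], labels: Sequence[str], x: Sequence[float],
    k: int, threshold: float = 0.5,
) -> Set[str]:
    """Keep every label whose confidence reaches the threshold."""
    conf = brknn_confidences(train, labels, x, k)
    return {lam for lam, c in conf.items() if c >= threshold}


# Hypothetical toy training set.
train = [
    ((0.0, 0.0), {"a"}),
    ((0.0, 1.0), {"a", "b"}),
    ((1.0, 0.0), {"b"}),
    ((5.0, 5.0), {"c"}),
]
labels = ["a", "b", "c"]
conf = brknn_confidences(train, labels, (0.1, 0.1), k=3)
predicted = brknn_predict(train, labels, (0.1, 0.1), k=3)
print(conf, predicted)
```

The point of the design is visible in `brknn_confidences`: the neighbors are found once and reused for every label, instead of repeating the search in each of the $|L|$ binary datasets.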
9 Independent Predictions
The weakness
  The empty set is a possible overall output
  Arises when none of the labels reaches the 50% confidence threshold
The reason
  Independent predictions for each label, a general disadvantage of the BR method
Is this common in BRkNN?
  [Figure: percentage of instances where the empty set is output (y-axis, 0% to 35%) against the number of nearest neighbors (x-axis), for the scene, yeast and emotions datasets]
10 The Proposed Extensions
Aimed at resolving the aforementioned problem
BRkNN-a
  Checks whether BRkNN outputs the empty set
  In that case, outputs the label with the highest confidence
BRkNN-b
  1st step: calculates the average size $s$ of the label sets of the k nearest neighbors, $s = \frac{1}{k} \sum_{j=1}^{k} |Y_j|$
  2nd step: outputs the $[s]$ (nearest integer of $s$) labels with the highest confidence
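Both extensions can be sketched on top of the per-label confidences. The code below is an illustrative reading of the slide, with assumptions labeled in the comments: the function names are hypothetical, ties between equally confident labels are broken arbitrarily, and "nearest integer" is taken to round halves up.

```python
from typing import Dict, List, Sequence, Set


def confidences(neighbor_label_sets: List[Set[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of the k neighbors that carry each label."""
    k = len(neighbor_label_sets)
    return {lam: sum(lam in y for y in neighbor_label_sets) / k for lam in labels}


def brknn_a(conf: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """BRkNN-a: fall back to the most confident label if BRkNN outputs nothing."""
    predicted = {lam for lam, c in conf.items() if c >= threshold}
    if not predicted:
        predicted = {max(conf, key=conf.get)}  # ties broken arbitrarily (assumption)
    return predicted


def brknn_b(conf: Dict[str, float], neighbor_label_sets: List[Set[str]]) -> Set[str]:
    """BRkNN-b: output the [s] most confident labels, s = average neighbor label-set size."""
    s = sum(len(y) for y in neighbor_label_sets) / len(neighbor_label_sets)
    n = int(s + 0.5)  # nearest integer, halves rounded up (assumption)
    ranked = sorted(conf, key=conf.get, reverse=True)
    return set(ranked[:n])


# Hypothetical label sets of k = 7 nearest neighbors.
neighbors = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}, {"f"}, {"g"}, {"h"}, {"i"}]
conf = confidences(neighbors, list("abcdefghi"))

plain = {lam for lam, c in conf.items() if c >= 0.5}  # plain BRkNN: empty here
print(plain, brknn_a(conf), brknn_b(conf, neighbors))
```

In this toy case no label reaches 50%, so plain BRkNN returns the empty set, BRkNN-a returns the single most confident label, and BRkNN-b returns two labels because the neighbors carry 12/7 labels on average.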
11 The MLkNN and LPkNN Algorithms
Two more lazy multi-label classification methods
LPkNN
  The pairing of the LP problem transformation method with the kNN algorithm
  Little discussed in the past
MLkNN
  An adaptation of kNN for multi-label data
  Main difference from BRkNN: prior and posterior probabilities are estimated from the training set
  Extended with an option for min-max normalization
12 Evaluation Measures
Example-based
  Calculate the difference between the actual and predicted label sets for each example
  Average the results over all examples of the test set
Label-based
  Calculate a binary evaluation measure separately for each label
  Micro/macro averaging operations over all labels
13 Example Based Measures
Notation
  Let $(x, Y)$ be a multi-label example, $Y \subseteq L$
  Let $h$ be a multi-label classifier
  Let $Z = h(x)$ be the set of labels predicted by $h$ for $x$
Hamming loss
  $\frac{|Y \,\Delta\, Z|}{|L|}$, where $\Delta$ is the symmetric difference of two sets
Classification accuracy or subset accuracy
  $1$ if $Y = Z$, $0$ if $Y \neq Z$
IR-inspired measures
  Precision $= \frac{|Y \cap Z|}{|Z|}$, Recall $= \frac{|Y \cap Z|}{|Y|}$, F-measure $= \frac{2\,|Y \cap Z|}{|Z| + |Y|}$
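The example-based measures translate directly into set operations. The sketch below computes them for a single example; averaging over the test set is a plain mean. Returning 0 when a denominator is empty is an assumption made here, since the slide does not state a convention, and the label names are only illustrative.

```python
from typing import Set


def hamming_loss(Y: Set[str], Z: Set[str], num_labels: int) -> float:
    """Size of the symmetric difference, divided by |L|."""
    return len(Y ^ Z) / num_labels


def subset_accuracy(Y: Set[str], Z: Set[str]) -> float:
    """1 only when the predicted set matches the actual set exactly."""
    return 1.0 if Y == Z else 0.0


def precision(Y: Set[str], Z: Set[str]) -> float:
    return len(Y & Z) / len(Z) if Z else 0.0  # empty prediction: 0 by assumption


def recall(Y: Set[str], Z: Set[str]) -> float:
    return len(Y & Z) / len(Y) if Y else 0.0  # no actual labels: 0 by assumption


def f_measure(Y: Set[str], Z: Set[str]) -> float:
    denom = len(Y) + len(Z)
    return 2 * len(Y & Z) / denom if denom else 0.0


# Actual labels Y, predicted labels Z, over a hypothetical label set of size 4.
Y = {"Mountain", "Lake"}
Z = {"Mountain", "Trees"}
print(hamming_loss(Y, Z, 4), subset_accuracy(Y, Z), precision(Y, Z), recall(Y, Z), f_measure(Y, Z))
```

Here one label is missed and one is wrongly added, so the Hamming loss is 2/4, precision, recall and F-measure are all 0.5, and subset accuracy is 0 because the match is not exact.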
14 Label Based Measures
Any binary evaluation measure $M$ can be used
  Accuracy, area under ROC curve, precision, recall, etc.
Operations for averaging across all labels
  Macro-averaging: $M_{macro} = \frac{1}{|L|} \sum_{\lambda=1}^{|L|} M(tp_\lambda, fp_\lambda, tn_\lambda, fn_\lambda)$
  Micro-averaging: $M_{micro} = M\left(\sum_{\lambda=1}^{|L|} tp_\lambda, \sum_{\lambda=1}^{|L|} fp_\lambda, \sum_{\lambda=1}^{|L|} tn_\lambda, \sum_{\lambda=1}^{|L|} fn_\lambda\right)$
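A short sketch of the two averaging operations, using the F-measure as the binary measure $M$. The per-label counts are invented for illustration, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Per-label confusion counts: (tp, fp, tn, fn).
Counts = Tuple[int, int, int, int]


def f1(tp: int, fp: int, tn: int, fn: int) -> float:
    """Binary F-measure; tn is accepted for a uniform signature but unused."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # no positives at all: 0 by assumption


def macro_average(counts: Dict[str, Counts]) -> float:
    """Compute the measure per label, then average the results."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_average(counts: Dict[str, Counts]) -> float:
    """Sum the counts over all labels, then compute the measure once."""
    totals = [sum(c[i] for c in counts.values()) for i in range(4)]
    return f1(*totals)


# Hypothetical counts: one frequent label predicted well, one rare label predicted poorly.
counts = {
    "frequent": (8, 2, 88, 2),
    "rare": (1, 1, 95, 3),
}
print(macro_average(counts), micro_average(counts))
```

With these counts the macro average (about 0.567) is pulled down by the rare label, while the micro average (about 0.692) is dominated by the frequent one, which is why the study reports both.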
15 Datasets
[Table: Dataset, Examples, Attributes (Numeric, Discrete), Labels, Distinct Subsets, Label Cardinality and Label Density for Scene, Emotions and Yeast; the cell values are not preserved in this transcription]
Datasets
  Scene: semantic indexing of still images
  Emotions: classification of songs into 6 classes of emotion
  Yeast: gene function classification
Multi-label statistics
  Distinct Subsets is the number of different label sets
  Label Cardinality is the average number of labels per example
  Label Density is equal to Label Cardinality divided by $|L|$
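The three multi-label statistics defined above are simple to compute from the label sets of a dataset. The following sketch uses a made-up list of label sets; the function name `multilabel_statistics` is hypothetical.

```python
from typing import Dict, List, Set


def multilabel_statistics(label_sets: List[Set[str]], num_labels: int) -> Dict[str, float]:
    """Distinct subsets, label cardinality and label density of a dataset."""
    distinct = len({frozenset(y) for y in label_sets})
    cardinality = sum(len(y) for y in label_sets) / len(label_sets)
    return {
        "distinct_subsets": distinct,
        "label_cardinality": cardinality,
        "label_density": cardinality / num_labels,
    }


# Hypothetical dataset with 4 examples over a label set of size 4.
stats = multilabel_statistics([{"a"}, {"a", "b"}, {"b"}, {"a"}], num_labels=4)
print(stats)
```

For this toy input there are 3 distinct label sets, the cardinality is 5/4 = 1.25 labels per example, and the density is 1.25/4 = 0.3125.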
16 Evaluation Methodology
Multi-label algorithms evaluated
  BRkNN
  BRkNN-a / BRkNN-b
  MLkNN
  LPkNN
Varying number of nearest neighbors
  k ranged from 1 to 30
Distance function: normalized Euclidean
Evaluation
  Example-based: Hamming loss, accuracy, F-measure, subset accuracy
  Label-based: micro and macro versions of F-measure
  10-fold cross-validation
17 Do the Proposed Extensions Improve BRkNN?
BRkNN against its extensions BRkNN-a and BRkNN-b
Average performance across all 30 values of k

                   scene                    emotions                 yeast
Metric             base    ext-a   ext-b    base    ext-a   ext-b    base    ext-a   ext-b
Hamming loss       0.0950  0.0938  0.0941   0.1976  0.1982  0.2175   0.1974  0.1975  0.2082
Accuracy           0.6256  0.7226  0.7218   0.5215  0.5441  0.5430   0.5062  0.5080  0.5346
F-measure          0.6495  0.7539  0.7538   0.6275  0.6576  0.6590   0.5777  0.5795  0.6652
Subset accuracy    0.6281  0.7251  0.7230   0.2895  0.2971  0.2759   0.1958  0.1959  0.1766
micro F-measure    0.6386  0.7392  0.7381   0.6499  0.6577  0.6509   ...     ...     ...
macro F-measure    0.5993  0.6889  0.6886   0.6224  0.6303  ...      ...     ...     ...
#wins (#better)    0       6 (6)   0 (6)    1       4 (5)   1 (4)    1       1 (5)   4 (4)
18 Do the Proposed Extensions Improve BRkNN?
Remarks
  Both extensions outperform the base algorithm in more than half of the metrics in all datasets
  The performance pattern correlates with dataset label cardinality
  BRkNN-a dominates in scene and emotions (cardinality 1.074 and 1.868)
    Increased probability for BRkNN to output the empty set
  BRkNN-b dominates in yeast (cardinality 4.237)
    A mechanism to predict the number of labels
19 Comparison of BRkNN, LPkNN and MLkNN
Best extension of BRkNN against LPkNN and MLkNN
Average performance across all 30 values of k

                   scene                    emotions                 yeast
Metric             ext-a   LPkNN   MLkNN    ext-a   LPkNN   MLkNN    ext-b   LPkNN   MLkNN
Hamming loss       0.0938  0.0955  0.0884   0.1982  0.2094  0.2003   0.2082  0.2143  0.1950
Accuracy           0.7226  0.7181  0.6720   0.5441  0.5600  0.5233   0.5346  0.5280  0.5105
F-measure          0.7392  0.7343  0.6944   0.6576  0.6662  0.6352   0.6652  0.6375  0.5823
Subset accuracy    0.6889  0.6854  0.6272   0.2971  0.3287  0.2780   0.1766  0.2452  0.1780
micro F-measure    0.7296  0.7249  0.7316   0.6577  0.6649  0.6509   0.6567  0.6415  0.6422
macro F-measure    0.7363  0.7323  0.7341   0.6303  0.6505  0.6110   0.4261  0.4322  0.3701
#wins              ...     ...     ...      ...     ...     ...      ...     ...     ...
20 Comparison of BRkNN, LPkNN and MLkNN
Remarks
  BRkNN-a dominates in scene
  LPkNN dominates in emotions
  BRkNN-b performs slightly better in yeast
  Possible correlation between LPkNN performance and label density
21 Summary and Future Work
Use of kNN for multi-label classification
  BRkNN: an efficient implementation of BR plus kNN
  Extensions that enhance BRkNN's performance
  Additional comparative experiments with LPkNN and MLkNN
Main contribution
  Indicates which method is most suitable for a dataset, depending on certain dataset characteristics
Future work
  Additional lazy multi-label classification approaches
  Experiments with additional multi-label datasets
22 Resources
The MUlti-LAbel classification (MULAN) library
  Open source software for multi-label classification
  Several problem transformation and algorithm adaptation methods
  Example/label/ranking based measures
  Multi-label statistics
  Built on top of Weka
  Also hosted by Sourceforge (integrated with SVN)
Multi-label classification datasets (.arff format)
  delicious, emotions, genbase, mediamill, rcv1v2, scene, tmc2007, yeast
Active multi-label classification bibliography
23 End of Presentation
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationAutomated Selection and Configuration of Multi-Label Classification Algorithms with. Grammar-based Genetic Programming.
Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-based Genetic Programming Alex G. C. de Sá 1, Alex A. Freitas 2, and Gisele L. Pappa 1 1 Computer Science Department,
More informationNaïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others
Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict
More informationAssignment 1: CS Machine Learning
Assignment 1: CS7641 - Machine Learning Saad Khan September 18, 2015 1 Introduction I intend to apply supervised learning algorithms to classify the quality of wine samples as being of high or low quality
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationThe Set Classification Problem and Solution Methods
The Set Classification Problem and Solution Methods Xia Ning xning@cs.umn.edu Computer Science & Engineering University of Miesota, Twin Cities George Karypis karypis@cs.umn.edu Computer Science & Engineering
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationClassification of Procedurally Generated Textures
Classification of Procedurally Generated Textures Emily Ye, Jason Rogers December 14, 2013 1 Introduction Textures are essential assets for 3D rendering, but they require a significant amount time and
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationAdvanced Video Content Analysis and Video Compression (5LSH0), Module 8B
Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B 1 Supervised learning Catogarized / labeled data Objects in a picture: chair, desk, person, 2 Classification Fons van der Sommen
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationCS4491/CS 7265 BIG DATA ANALYTICS
CS4491/CS 7265 BIG DATA ANALYTICS EVALUATION * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Dr. Mingon Kang Computer Science, Kennesaw State University Evaluation for
More informationNoise-based Feature Perturbation as a Selection Method for Microarray Data
Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering
More information