Contents. ACE Presentation. Comparison with existing frameworks. Technical aspects. ACE 2.0 and future work. 24 October 2009 ACE 2

Similar documents
ADDITIONS AND IMPROVEMENTS TO THE ACE 2.0 MUSIC CLASSIFIER

IEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde

Overview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8

Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms

3 Virtual attribute subsetting

Applying Supervised Learning

Weka ( )

Slides for Data Mining by I. H. Witten and E. Frank

Machine Learning Practical NITP Summer Course Pamela K. Douglas UCLA Semel Institute

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya

Performance Analysis of Data Mining Classification Techniques

Machine Learning Techniques for Data Mining

A Comparison of Decision Tree Algorithms For UCI Repository Classification

CHAPTER 6 EXPERIMENTS

Naïve Bayes for text classification

Topics In Feature Selection

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Chapter 8 The C 4.5*stat algorithm

Data Mining: An experimental approach with WEKA on UCI Dataset

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

USING ACE XML 2.0 TO STORE AND SHARE FEATURE, INSTANCE AND CLASS DATA FOR MUSICAL CLASSIFICATION

Data Mining With Weka A Short Tutorial

Classification using Weka (Brain, Computation, and Neural Learning)

Using Google s PageRank Algorithm to Identify Important Attributes of Genes

6.034 Design Assignment 2

Weka: Practical machine learning tools and techniques with Java implementations

Tutorial on Machine Learning Tools

WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov

Using Decision Boundary to Analyze Classifiers

Prototyping DM Techniques with WEKA and YALE Open-Source Software

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Data Mining Practical Machine Learning Tools and Techniques

Comparison Study of Different Pattern Classifiers

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA

Bias-Variance Analysis of Ensemble Learning

CS229 Final Project: Predicting Expected Response Times

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Why MultiLayer Perceptron/Neural Network? Objective: Attributes:

COMP s1 - Getting started with the Weka Machine Learning Toolkit

A Wrapper for Reweighting Training Instances for Handling Imbalanced Data Sets

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Efficient Pruning Method for Ensemble Self-Generating Neural Networks

Overview of OMEN. 1. Introduction

A Comparative Study of Selected Classification Algorithms of Data Mining

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Scheme:weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S E 20 -H a

A Lazy Approach for Machine Learning Algorithms

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods

Subject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.

A Survey of Distance Metrics for Nominal Attributes

Tanagra: An Evaluation

ESERCITAZIONE PIATTAFORMA WEKA. Croce Danilo Web Mining & Retrieval 2015/2016

A Fast Decision Tree Learning Algorithm

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

Classifier Inspired Scaling for Training Set Selection

Perceptron-Based Oblique Tree (P-BOT)

A Soft-Computing Approach to Knowledge Flow Synthesis and Optimization

User Guide Written By Yasser EL-Manzalawy

Neural Network Weight Selection Using Genetic Algorithms

Available online at ScienceDirect. Procedia Computer Science 35 (2014 )

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis

Data Mining: STATISTICA

PSOk-NN: A Particle Swarm Optimization Approach to Optimize k-nearest Neighbor Classifier

TUBE: Command Line Program Calls

Comparative Study of Data Mining Classification Techniques over Soybean Disease by Implementing PCA-GA

DATA ANALYSIS WITH WEKA. Author: Nagamani Mutteni Asst.Professor MERI

DUE By 11:59 PM on Thursday March 15 via make turnitin on acad. The standard 10% per day deduction for late assignments applies.

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).

HALF&HALF BAGGING AND HARD BOUNDARY POINTS. Leo Breiman Statistics Department University of California Berkeley, CA

Network Traffic Measurements and Analysis

WEKA Waikato Environment for Knowledge Analysis Performing Classification Experiments Prof. Pietro Ducange

Supervised vs unsupervised clustering

Computer Vision. Exercise Session 10 Image Categorization

A Novel Feature Selection Framework for Automatic Web Page Classification

Hands on Datamining & Machine Learning with Weka

Unsupervised Learning

Cost-sensitive C4.5 with post-pruning and competition

MODULE 7 Nearest Neighbour Classifier and its variants LESSON 11. Nearest Neighbour Classifier. Keywords: K Neighbours, Weighted, Nearest Neighbour

10-701/15-781, Fall 2006, Final

Attribute Discretization and Selection. Clustering. NIKOLA MILIKIĆ UROŠ KRČADINAC

Novel Initialisation and Updating Mechanisms in PSO for Feature Selection in Classification

The Role of Biomedical Dataset in Classification

Rotation Perturbation Technique for Privacy Preserving in Data Stream Mining

Dimension reduction : PCA and Clustering

Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

Locally Weighted Naive Bayes

Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets

Comparative Study of Instance Based Learning and Back Propagation for Classification Problems

Performance Evaluation of Various Classification Algorithms

organizing maps generating abilities of as possible the accuracy of for practical classifier propagation and C4.5) [7]. systems.

Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset

Transcription:

ACE

Contents ACE Presentation Comparison with existing frameworks Technical aspects ACE 2.0 and future work 24 October 2009 ACE 2

ACE Presentation 24 October 2009 ACE 3

ACE Presentation Framework for using and optimizing classifiers Experiments a variety of classifiers, classifier parameters, classifier ensembles and dimensionality reduction Designed to facilitate classification Meta-learning component 24 October 2009 ACE 4

ACE Configurations ACE 1.1 works with command line interface ACE 2.0 works with command line or GUI interface ACE XML Files Training data set Taxonomy Testing data set Taxonomy ACEFileConverter ACE InstanceLabeller ACE ClassificationTester ACE Weka ARFF Files Trained classifiers Classified data set 24 October 2009 ACE 5

Example (Devaney 2008) DIMENSIONALITY REDUCTION: Genetic search using naive Bayesian classifier TIME TAKEN: 0.40363333333333334 minutes SELECTED FEATURES (12 of 24): Spectral_Centroid_Overall_Standard_Deviation Spectral_Rolloff_Point_Overall_Standard_Deviation Spectral_Flux_Overall_Standard_Deviation Compactness_Overall_Average Root_Mean_Square_Derivative_Overall_Average Zero_Crossings_Overall_Standard_Deviation Zero_Crossings_Derivative_Overall_Average Zero_Crossings_Derivative_Overall_Standard_Deviation Strongest_Frequency_Via_Zero_Crossings_Overall_Average Strongest_Frequency_Via_Spectral_Centroid_Overall_Average Strongest_Frequency_Via_Spectral_Centroid_Overall_Standard_Deviation Strongest_Frequency_Via_FFT_Maximum_Overall_Average 24 October 2009 ACE 6

Example (Devaney 2008) BEST CLASSIFIER: AdaBoost with C4.5 Decision Trees BEST AVERAGE ERRORRATE: 6.2080536% CORRESPONDING CROSS-VALIDATION TIME: 0.23671666666666666 minutes FOLDS: 10 CROSS-VALIDATION STATISTICS: CONFUSION MATRIX: Correctly Classified Instances 1118 93.7919 % a b c d e <-- classified as Incorrectly Classified Instances 74 6.2081 % 308 0 2 0 1 a = kick Kappa statistic 0.9204 0 263 26 0 1 b = open Mean absolute error 0.0257 2 23 264 7 2 c = closed Root mean squared error 0.1553 0 0 4 150 2 d = k-snare Relative absolute error 8.2325 % 2 1 1 0 133 e = p-snare Root relative squared error 39.3154 % Total Number of Instances 1192 Ignored Class Unknown Instances 49 24 October 2009 ACE 7

Comparison with existing frameworks 24 October 2009 ACE 8

Existing general frameworks PRTTools (MATLAB Toolbox) Van der Heijden et al. 2004 Reliant upon a proprietary software Non redistributable Little portability Weka Witten and Frank 2000 Freely distributable Open source Relatively well documented Good portability (implemented in Java) Basis of the ACE engine 24 October 2009 ACE 9

Existing specific frameworks Marsyas - Tzanetakis and Cook 1999 C++ based system little portability No MIDI classification Designed as feature extractor M2K - Downie 2004 Feature extraction and classification framework based on D2K License issues Not open source 24 October 2009 ACE 10

Technical aspects 24 October 2009 ACE 11

Feature file formats (McKay et al. 2005) ACE is designed to be used with any feature extractor and data set s correctly formatted Weka ARFF file format 1 class = 1 instance Grouping of features Labeling or structuring of the instance Structure of class labels <!ELEMENT feature_vector_file (comments, data_set+)> <!ELEMENT comments (#PCDATA)> <!ELEMENT data_set (data_set_id, section*, feature*)> <!ELEMENT data_set_id (#PCDATA)> <!ELEMENT section (feature+)> <!ATTLIST section start CDATA "" stop CDATA ""> <!ELEMENT feature (name, v+)> <!ELEMENT name (#PCDATA)> <!ELEMENT v (#PCDATA)> ACE XML file format Easily readable (verbose format) and standardized 1 file = 1 task (features, feature definitions, classifications ) reusable Metadata storage full independence between extractors and classifiers 24 October 2009 ACE 12

Meta-learning - Classifier ensembles Combine the answer of several classification models Advantages (Dietterich( 2000) Statistical performance Improved results for models working with local optima Can improve significantly results from a non adapted format classifier sifier Classifier ensemble methods Voting (merging the results) Dynamic selection (selecting in each case the best classifier) Stacking (weighted voting) Bagging (sampling all available training instances) Boosting (emphasize poorly classified trainings sets) 24 October 2009 ACE 13

Feedforward neural networks ACE Classifiers Support vector machines Nearest neighbour classifiers Decision tree classifiers Bayesian classifiers 24 October 2009 ACE 14

Feature dimension reduction ACE implements a variety of dimensionality reduction and feature weighting techniques Iterative evaluation of the techniques Techniques used: Principal component analysis Genetic algorithms Tree searches Forward-backward algorithms 24 October 2009 ACE 15

Feature weighting with GA s (Fiebrink et al. 2005) Optimize a classification by assigned weights to the features according to their accuracy GA software: JGAP (Java package) Genes selection: leave-one one-out out cross-validation with k-nn k classifier Parallel computation (M2K/D2K or Grid Weka) 24 October 2009 ACE 16

ACE 2.0 and future work 24 October 2009 ACE 17

(Thompson et al. 2009) Architectural restructuring ACE 2.0 facilitate integration with other softwares Redesigned cross-validation Implemented in ACE rather using Weka More transparent processing ACE XML 2.0 Zip and Project files Combine the different ACE XML files Improved command-line interface Graphical User Interface viewing and editing ACE XML files 24 October 2009 ACE 18

GUI ACE 2.0 (Thompson et al. 2009) 24 October 2009 ACE 19

(Thompson et al. 2009) Fully functional GUI Future Work on ACE Distributed work load Expanding learning algorithms HMM Recurrent neural networks Weka s unsupervised learning 24 October 2009 ACE 20

Conclusion User-friendly and very complete framework Allows the user to test his/her own classifiers, data reduction techniques Encourage experimentation in classification, in particular with classifier ensembles Table 1. (McKay et al. 2005) ACE s classification success rate on ten UCI datasets using ten-fold cross-validation com-pared to a published baseline (Kotsiantis and Pintelas 2004). Data Set ACE s Selected Classifier Kotsiantis Success Rate ACE Success Rate anneal AdaBoost -- 99,6% audiology AdaBoost -- 85,0% autos AdaBoost 81,7% 86,3% balance scale Naïve Bayes -- 91,4% diabetes Naïve Bayes 76,6% 78,0% ionosphere AdaBoost 90,7% 94,3% iris FF Neural Net 95,6% 97,3% labor k-nn 93,4% 93,0% vote Decision Tree 96,2% 96,3% zoo Decision Tree -- 97,0% 24 October 2009 ACE 21