Application Presentation - SASECA. Sistema Asociativo para la Selección de Características

1 Instituto Politécnico Nacional - Centro de Investigación en Computación. Application Presentation: SASECA, Sistema Asociativo para la Selección de Características (Associative System for Feature Selection). Mario Aldape Pérez (CIC-IPN), Manuel Alejandro Soto-Ramos (CIC-IPN). Joint CHAIN/GISELA/EPIKH School for Application Porting to Science Gateways, June 2012, Mexico City (Mexico).

2 Project Objectives 2

3 A new approach to feature selection using associative memories. A method for estimating the quality of learning in an associative memory. A method that reduces space complexity by eliminating redundant information from the patterns that form the fundamental set. A method for reducing the size of the patterns that form the fundamental set. 3

5 Associative Memories 5

6 An associative memory can be seen as a system that relates input patterns with output patterns. [Diagram: input pattern → M → output pattern] 6

7 [Figures: a heteroassociative memory M and an autoassociative memory M, illustrated with the elaine.512 test image] 7

8 [Figures: a heteroassociative memory M and an autoassociative memory M, illustrated with the elaine.512 test image] 8

9 Let $A = \{0,1\}$ and let $x \in A^n$, $y \in A^u$ be two column vectors, with
$x = (x_1, x_2, \ldots, x_n)^T \in A^n$ and $y = (y_1, y_2, \ldots, y_u)^T \in A^u$.
The fundamental set of associations is denoted by $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$. 9
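
To make the notation concrete, a fundamental set of $p$ binary associations can be stored as two NumPy arrays, one row per association (a toy illustration, not taken from the slides):

```python
import numpy as np

# Hypothetical fundamental set: p = 3 associations, n = 4 input components, u = 2 output components
X = np.array([[1, 0, 1, 0],   # x^1
              [1, 1, 1, 0],   # x^2
              [0, 0, 1, 1]])  # x^3
Y = np.array([[1, 0],         # y^1
              [1, 0],         # y^2
              [0, 1]])        # y^3
```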

11 Lernmatrix. The memory is a $u \times n$ array whose rows correspond to the output components $y_1, \ldots, y_i, \ldots, y_u$ and whose columns correspond to the input components $x_1, \ldots, x_j, \ldots, x_n$:
$$M = \begin{pmatrix} m_{11} & \cdots & m_{1j} & \cdots & m_{1n} \\ \vdots & & \vdots & & \vdots \\ m_{i1} & \cdots & m_{ij} & \cdots & m_{in} \\ \vdots & & \vdots & & \vdots \\ m_{u1} & \cdots & m_{uj} & \cdots & m_{un} \end{pmatrix}$$
[Steinbuch, K. (1961). Die Lernmatrix. Biological Cybernetics, 1.]

12 Learning Phase. Each entry of $M$ is updated as $m_{ij} \leftarrow m_{ij} + \Delta m_{ij}$, where
$$\Delta m_{ij} = \begin{cases} +\varepsilon & \text{if } y_i = 1 \text{ and } x_j = 1 \\ -\varepsilon & \text{if } y_i = 1 \text{ and } x_j = 0 \\ 0 & \text{otherwise} \end{cases}$$
with $\varepsilon$ a positive constant. 12
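
A minimal NumPy sketch of this update rule (function name and array layout are illustrative assumptions, not the authors' code): binary inputs are stored row-wise in X, one-hot outputs in Y, and epsilon plays the role of the positive constant $\varepsilon$.

```python
import numpy as np

def lernmatrix_learn(X, Y, epsilon=1.0):
    """Build a u x n Lernmatrix from binary inputs X (p x n) and one-hot outputs Y (p x u)."""
    u, n = Y.shape[1], X.shape[1]
    M = np.zeros((u, n))
    for x, y in zip(X, Y):
        # Delta m_ij = +eps if y_i = 1 and x_j = 1; -eps if y_i = 1 and x_j = 0; 0 otherwise
        M += epsilon * np.outer(y, 2 * x - 1)
    return M
```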

13 Classification Phase. Given an unknown input pattern $x \in A^n$, find the class $y \in A^u$ to which it belongs, according to the following rule:
$$y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} m_{ij} x_j = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} m_{hj} x_j \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator. 13
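
The maximum-operator rule can be sketched in the same style; the hard-coded memory in the toy usage is what the learning sketch above would produce on the toy fundamental set shown earlier.

```python
import numpy as np

def lernmatrix_classify(M, x):
    """y_i = 1 iff row i of M attains the maximum response to x (the maximum operator)."""
    responses = M @ x                                      # sum_j m_ij * x_j for each class i
    return (responses == responses.max()).astype(int)

# Toy usage: a 2-class, 4-feature memory
M = np.array([[ 2.0,  0.0, 2.0, -2.0],
              [-1.0, -1.0, 1.0,  1.0]])
print(lernmatrix_classify(M, np.array([1, 0, 1, 0])))      # -> [1 0]
```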

15 Linear Associator - Learning Phase. For each association $(x^\mu, y^\mu)$, with $y^\mu = (y_1^\mu, y_2^\mu, \ldots, y_u^\mu)^T$ and $x^\mu = (x_1^\mu, x_2^\mu, \ldots, x_n^\mu)^T$, the outer products are accumulated:
$$M = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T, \qquad m_{ij} = \sum_{\mu=1}^{p} y_i^\mu x_j^\mu \qquad (\text{a } u \times n \text{ matrix})$$
[Yáñez Márquez, C. & Díaz de León Santiago, J.L. (2001). Linear Associator de Anderson-Kohonen, IT-50, Serie Verde, CIC-IPN, México.] 15

16 Classification Phase. Given an unknown input pattern $x^\omega \in A^n$, find the class $y \in A^u$ to which it belongs by operating $M$ on it:
$$y = M x^\omega = \left[ \sum_{\mu=1}^{p} y^\mu (x^\mu)^T \right] x^\omega$$ 16
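
Both phases of the Linear Associator fit in a few lines of NumPy (illustrative function names, not from the cited report):

```python
import numpy as np

def linear_associator_learn(X, Y):
    """Accumulate the outer products y^mu (x^mu)^T over the fundamental set; result has shape (u, n)."""
    return sum(np.outer(y, x) for x, y in zip(X, Y))

def linear_associator_recall(M, x):
    """Classification phase: y = M x (the predicted class is usually read off the largest component)."""
    return M @ x
```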

17 CHAT. Learning Phase (as in the Linear Associator):
$$M = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$$
Classification Phase (as in the Lernmatrix):
$$y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} m_{ij} x_j = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} m_{hj} x_j \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator.
[Santiago Montero, R. (2003). Clasificador Híbrido de Patrones basado en la Lernmatrix de Steinbuch y el Linear Associator de Anderson-Kohonen. Tesis de Maestría en Ciencias de la Computación, Centro de Investigación en Computación, México.] 17
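
Since CHAT pairs the Linear Associator's learning rule with the Lernmatrix's winner-take-all recall, it can be sketched by composing the two ideas directly (a self-contained toy, not the author's implementation):

```python
import numpy as np

def chat(X, Y, x_unknown):
    """CHAT sketch: learn as a Linear Associator, classify as a Lernmatrix (maximum operator)."""
    M = sum(np.outer(y, x) for x, y in zip(X, Y))   # learning phase: sum of outer products
    responses = M @ x_unknown                       # classification phase: row responses
    return (responses == responses.max()).astype(int)

# Toy usage on the fundamental set from the earlier sketch
X = np.array([[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1]])
Y = np.array([[1, 0], [1, 0], [0, 1]])
print(chat(X, Y, np.array([0, 0, 0, 1])))           # -> [0 1]
```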

18 Disadvantages: It is not possible to identify redundant or irrelevant information. It is not possible to reduce the dimension of the training patterns. 18

20 Feature Selection 20

21 Feature Selection consists of choosing those features that preserve or maximize the separation between classes. Two main approaches exist: Filter and Wrapper. 21

22 Filter approach. [Diagram: Data acquisition → Original data → Preprocessing → Preprocessed data → Dimensionality reduction → Reduced data → Supervised learning / Classification → Prediction] 22

23 Advantages: Low computational cost. Disadvantages: No feedback from the predictor that will use the lower-dimensional feature set. 23

24 Wrapper approach. [Diagram: Data acquisition → Original data → Preprocessing → Preprocessed data → Dimensionality reduction ⇄ Supervised learning / Classification → Reduced data → Prediction; the selection loop takes up to $2^n$ iterations] 24

25 Advantages: Makes it possible to obtain the optimal subset of features. Disadvantages: Implies high computational costs; not applicable to high-dimensional data sets. 25
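
To see where that cost comes from, a bare-bones exhaustive wrapper can be sketched as follows; `train_and_score` is a hypothetical stand-in for whatever predictor is being wrapped, and the loop visits every one of the $2^n - 1$ non-empty feature subsets.

```python
from itertools import combinations

def wrapper_search(features, train_and_score):
    """Exhaustive wrapper: evaluate the predictor on every non-empty subset of features."""
    best_subset, best_score = None, float("-inf")
    n = len(features)
    for k in range(1, n + 1):
        for subset in combinations(features, k):   # 2^n - 1 subsets in total
            score = train_and_score(subset)        # feedback comes from the predictor itself
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```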

26 HCM. Learning Phase:
$$C = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$$
Classification Phase, for each constraint (masking) vector $e^r$, $r = 1, 2, \ldots, (2^f - 1)$:
$$y_i^r = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} c_{ij}\, x_j\, e_j^r = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} c_{hj}\, x_j\, e_j^r \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator.
[Aldape-Pérez, M., Yáñez-Márquez, C., & Argüelles-Cruz, A. J. (2007). Optimized Associative Memories for Feature Selection. Lecture Notes in Computer Science, 4477.]
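
A minimal NumPy sketch of the masked classification rule (names are illustrative, not the authors' code): C is the learned memory and e is one binary constraint vector out of the $2^f - 1$ candidates.

```python
import numpy as np

def masked_classify(C, x, e):
    """Winner-take-all classification of x using only the features enabled by the binary mask e."""
    responses = C @ (x * e)                        # sum_j c_ij * x_j * e_j for every class i
    return (responses == responses.max()).astype(int)

# Example: a memory over 4 features, with features 1 and 3 masked out
C = np.array([[2.0, 1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
e = np.array([1, 0, 1, 0])                         # one candidate constraint vector
print(masked_classify(C, np.array([1, 0, 1, 1]), e))   # -> [1 0]
```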

27 Advantages: Reduces the dimensionality of the problem. Makes it possible to obtain the optimal subset of features. Increases the predictive accuracy in pattern classification problems. 27

28 Disadvantages: Requires an exploration of the entire solution space. Implies high computational costs. 28

29 Proposed Model 29

30 [Diagram of the proposed model, combining the blocks Data acquisition, Preprocessing, Supervised learning (Associative Memory), Feature Selection (Constraint vector), and Prediction, with Original data → Preprocessed data → Masked data → Reduced data flowing between them] 30

31 Previous Results 31

32 Experiments. During the experimental phase, four databases taken from the UCI Machine Learning Repository were used: Breast, Heart, Credit, and Hepatitis. [Table: number of features, number of classes, and number of patterns for each database] [*] 32

33 Breast Cancer [*]. Problem: Classification. Type of attributes: Integer. Number of patterns: 699. Number of attributes: …, class label. [*] 33

37 Heart Disease Database [*]. Problem: Classification. Type of attributes: Real. Number of patterns: 270. Number of attributes: …, class label. [*] 37

41 Australian Credit Approval Database [*]. Problem: Classification. Type of attributes: Real. Number of patterns: 690. Number of attributes: …, class label. [*] 41

44 Hepatitis Database [*]. Problem: Classification. Type of attributes: Integer. Number of patterns: 155. Number of attributes: …, class label. [*] 44

49 [Table: Number of features vs. time; the reported times grow from seconds to minutes, and then to a day, a month, and a year as the number of features increases] 49
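
The growth behind this table is simply the number of candidate feature subsets, $2^f - 1$ for $f$ features; a two-line check of how fast that explodes:

```python
for f in (10, 15, 20, 25, 30):
    print(f, 2 ** f - 1)   # non-empty subsets an exhaustive search over f features must visit
```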

51 Our Proposal 51

52 [Diagram - same model overview as slide 30: Data acquisition, Preprocessing, Supervised learning (Associative Memory), Feature Selection (Constraint vector), and Prediction, with Original data → Preprocessed data → Masked data → Reduced data flowing between them] 52

53 [Diagram: the Associative Memory and the Feature Selection block exchange the Constraint vector, the Masked data, and the Reduced data] 53

54 QUESTIONS 54
