Mostafa Salama Abdel-hady


1 By Mostafa Salama Abdel-hady, British University in Egypt. Supervised by Professor Aly A. Fahmy, Cairo University, and Professor Aboul Ella Hassanien.

2 Outline: Introduction; Motivation; Problem definition; Data mining scheme (preprocessing stage, machine learning stage, visualization stage); Experimental work and analysis; Conclusion and future work.

3 Assist decision making over large amounts of data. Perform automated, low-cost and effective data analysis. Reduce the need for high-cost specialized tests. Predict how patients will react to therapy.

4 Data mining techniques and their assumptions:
Decision tree: performs better on discrete data sets.
Bayesian belief network: assumes univariate attributes.
PCA: assumes normally distributed data.
Formal concept analysis: represents binary data only.

5 Sources of medical data: hospital data, contacts, test data, diagnoses and medical referrals. Medical data is not tailored to any specific machine learning technique.

6 This study aims to review data mining techniques at each of their stages, and to propose solutions and techniques for each stage that handle the challenges posed by the varied characteristics of the data.

7 Knowledge Discovery in Databases (KDD): raw data → data preprocessing (cleaning, transformation, feature reduction) → data mining by machine learning (classification, clustering, rule generation) → knowledge interpretation (knowledge extraction, knowledge evaluation) → useful knowledge.

8 KDD scheme with an added visualization stage: raw data → data preprocessing (cleaning, transformation, feature reduction) → machine learning (classification, clustering, rule generation) → visualization (formal concept analysis) → knowledge interpretation (knowledge extraction, knowledge evaluation) → useful knowledge.

9 Thesis content

10 KDD scheme repeated from slide 8, introducing the data preprocessing stage.

11 Data preprocessing stage (raw data → cleaning, transformation, feature reduction).
Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
Data transformation: normalization (scaling to a specific range) and discretization (particularly important for numerical data).
Data reduction: feature selection and feature extraction.
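The two transformation steps can be illustrated with a minimal sketch. It assumes min-max scaling to [0, 1] for normalization and equal-width binning for discretization; the slides do not say which methods the thesis uses, and the glucose readings are made-up values.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map every value to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def equal_width_discretize(values, n_bins):
    """Discretization: map each value to the index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last interval
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


glucose = [85.0, 120.0, 148.0, 183.0, 89.0]  # hypothetical readings
print(min_max_normalize(glucose))
print(equal_width_discretize(glucose, 3))
```

Supervised discretizers such as ChiMerge, mentioned later in the deck, would choose the interval boundaries from the class labels instead of using fixed widths.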

12 Feature subset selection: from the training data, a search algorithm proposes feature subsets and an objective function evaluates them; the final subset of features is passed to the classification algorithm, whose predictive accuracy is measured.
Objective function: in a wrapper, features are evaluated by a pattern classifier (high computational cost); in a filter, they are evaluated by a feature evaluation measure that relies only on properties of the data.
Search algorithm: selects a subset of features based on the evaluation from the objective function.

13 Sort the features by the ranks produced by feature evaluation techniques such as chi-square, ChiMerge and mutual information. The feature selection algorithm then selects the features that give the highest classification accuracy.
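One common reading of this rank-then-select procedure is to evaluate the top-1, top-2, ... ranked features and keep the prefix with the best accuracy. The sketch below assumes that reading; `evaluate` is a hypothetical callback (in practice, for example, the cross-validated accuracy of a classifier), and the ranks and accuracies are made up.

```python
from typing import Callable, Dict, List, Tuple


def ranked_forward_selection(
    ranks: Dict[str, float],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Try the top-1, top-2, ... ranked features and keep the prefix with the best accuracy."""
    ordered = sorted(ranks, key=ranks.get, reverse=True)  # highest rank first
    best_subset: List[str] = []
    best_accuracy = float("-inf")
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]
        accuracy = evaluate(subset)  # hypothetical: accuracy of a classifier on this subset
        if accuracy > best_accuracy:
            best_subset, best_accuracy = subset, accuracy
    return best_subset, best_accuracy


# Hypothetical ranks (e.g. chi-square scores) and a stand-in evaluator with made-up accuracies.
ranks = {"glucose": 0.9, "bmi": 0.6, "age": 0.3}
fake_accuracy = {1: 0.70, 2: 0.76, 3: 0.74}
subset, acc = ranked_forward_selection(ranks, lambda s: fake_accuracy[len(s)])
print(subset, acc)
```

Because only prefixes of the ranking are tried, this needs D classifier runs for D features rather than the 2^D of an exhaustive wrapper search.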

14 Most feature evaluation functions assume that the data are discrete, so a discretization step must be applied to continuous data sets: continuous data → discretization (data transformation) → discrete data → feature evaluation (data reduction). The drawback is a deterioration in the internal structure of the data.

15 The evaluation of the attributes is based on the following hypothesis: the smaller the overlap between the value intervals of the different class labels, the more important the attribute. The rank of each feature is y/x.
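The hypothesis can be made concrete with a small sketch. The slide does not define x and y, so this assumes x is the full value range of the feature and y is the part of that range not shared by both classes; it handles only two classes and is one plausible reading, not necessarily the thesis's exact rank formula.

```python
def interval_rank(values, labels):
    """Hypothetical two-class interval score: (range - class overlap) / range."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    (a_lo, a_hi), (b_lo, b_hi) = [
        (min(v for v, y in zip(values, labels) if y == c),
         max(v for v, y in zip(values, labels) if y == c))
        for c in classes
    ]
    x = max(a_hi, b_hi) - min(a_lo, b_lo)  # assumed x: full value range of the feature
    if x == 0:
        return 0.0  # constant feature: no discriminating power
    overlap = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    y = x - overlap  # assumed y: part of the range not shared by both classes
    return y / x


labels = [0, 0, 0, 1, 1, 1]
print(interval_rank([1, 2, 3, 10, 11, 12], labels))  # class intervals do not overlap
print(interval_rank([1, 5, 9, 3, 7, 11], labels))    # class intervals overlap heavily
```

A fully separating feature scores 1 and the score falls as the class intervals overlap more, which matches the stated hypothesis.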

16 Figure: classification accuracy (%) of features selected by different feature evaluation techniques (FFS, IG, SVMB, IB) on the Pima Indian diabetes data set. Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien and Aly A. Fahmy, Interval-based attribute evaluation algorithm, The 6th IEEE International Symposium Advances in Artificial Intelligence and Applications, Szczecin, Poland, Sep 18-21.

17 PCA depends on the degree to which the variables are linearly correlated, measured by the covariance $C_{ij}$ between attributes i and j. PCA assumes that the input data follow a normal distribution, so the data are normalized first: data → normalization (data transformation) → PCA (data reduction). The drawback is a deterioration in the internal structure of the data.

18 If correlation is used instead of covariance in PCA, normalization can be avoided. Correlation can be calculated from the variances and the covariance: $r_{ij} = C_{ij} / \sqrt{V_i V_j}$, where $r_{ij}$ is the correlation between variables i and j, $C_{ij}$ is their covariance, and $V_i$ and $V_j$ are the variances of variables i and j.
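Why correlation removes the need for normalization can be checked numerically: $r_{ij} = C_{ij} / \sqrt{V_i V_j}$ is unchanged by any positive linear rescaling of either variable, so PCA on the correlation matrix gives the same result as PCA on z-score standardized data. The sketch below computes the formula directly; the measurements are made-up values on deliberately different scales.

```python
import math
from statistics import mean


def covariance(a, b):
    """Sample covariance C_ij between two attributes."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def correlation(a, b):
    """r_ij = C_ij / sqrt(V_i * V_j), where the variance V_i is C_ii."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))


def correlation_matrix(columns):
    return [[correlation(a, b) for b in columns] for a in columns]


# Made-up measurements of two attributes on different scales.
length_mm = [455.0, 350.0, 530.0, 440.0, 330.0]
weight_g = [514.0, 225.0, 677.0, 516.0, 205.0]
raw = correlation_matrix([length_mm, weight_g])

# Min-max normalize one attribute: the correlation does not change.
lo, hi = min(length_mm), max(length_mm)
scaled = [(v - lo) / (hi - lo) for v in length_mm]
normalized = correlation_matrix([scaled, weight_g])
print(raw[0][1], normalized[0][1])
```

The covariance matrix, by contrast, changes with the units of each attribute, which is why covariance-based PCA is normally preceded by a normalization step.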

19 Figure: Abalone data set. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Reducing the Influence of Normalization on Data Classification, The 6th International Conference on Next Generation Web Services Practices (NWeSP 2010), Gwalior, India, Nov.

20 KDD scheme repeated from slide 8, introducing the machine learning stage.

21 Design a classifier that avoids dependence on assumptions that may not hold in real-life medical data sets, avoids dependence on user-defined parameters, and avoids sensitivity problems. The proposed technique turns a clustering technique, frequent pattern-based clustering, into a classification technique.

22 Figure: values of features a, b, c and d for a set of five different objects (object 1 to object 5).

23 Figure: values of features a, b, c and d for objects 1, 2 and 5, a set of objects with similar patterns (a cluster).

24 All objects whose differences between attributes i and j lie within the user-defined δ of each other are in the same cluster, as they share the pattern [ij].

25 Steps of the pattern-based classifier:
1. Calculate the δ value for each pair of features (i, j) over every two objects u and v in the same class: $\delta_{ij} = \max_{u,v}\left[\,|u_i - u_j| - |v_i - v_j|\,\right]$.
2. Search for the frequent patterns in the set of objects of the same class.
3. Use these patterns to classify objects of unknown class.
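A heavily simplified sketch of these steps follows, with assumptions stated up front. It computes, per class and per feature pair, the range of $|u_i - u_j|$ (whose width equals the maximum of $|u_i - u_j| - |v_i - v_j|$ over object pairs, i.e. $\delta_{ij}$); it replaces the frequent-pattern search of step 2 by using every feature pair; and it classifies with a hypothetical rule, picking the class whose pair ranges the unknown object falls inside most often. The training rows are a few Iris-style measurements.

```python
from itertools import combinations


def class_patterns(objects):
    """For one class, record for every feature pair (i, j) the range of |u_i - u_j|."""
    patterns = {}
    for i, j in combinations(range(len(objects[0])), 2):
        diffs = [abs(u[i] - u[j]) for u in objects]
        lo, hi = min(diffs), max(diffs)
        # delta_ij = max over u, v of (|u_i - u_j| - |v_i - v_j|) = hi - lo
        patterns[(i, j)] = (lo, hi, hi - lo)
    return patterns


def fit(X, y):
    """Group training objects by class and compute each class's pair patterns."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: class_patterns(objs) for label, objs in by_class.items()}


def predict(model, x):
    """Hypothetical rule: choose the class whose pair patterns x matches most often."""
    def n_matches(patterns):
        return sum(lo <= abs(x[i] - x[j]) <= hi for (i, j), (lo, hi, _) in patterns.items())
    return max(model, key=lambda label: n_matches(model[label]))


X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
     [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5]]
y = ["setosa"] * 3 + ["versicolor"] * 3
model = fit(X, y)
print(predict(model, [5.0, 3.4, 1.5, 0.2]), predict(model, [6.7, 3.1, 4.6, 1.4]))
```

Note that δ here is derived from the training data rather than supplied by the user, in line with the stated aim of avoiding user-defined parameters.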

26 Table: classification accuracy (%) on the Iris data set of the proposed model (PBC) compared with other classification techniques (IB, MLP, NB, CC, DT, BN).
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Uni-Class Pattern-based Classification Model, The 10th IEEE International Conference on Intelligent Systems Design and Applications (ISDA2010), Cairo, Egypt, Nov.
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Pattern-based Subspace Classification Model, The second World Congress on Nature and Biologically Inspired Computing (NaBIC2010), Kitakyushu, Japan, Dec.

27 Fuzzy logic is applied at two levels: at the level of the features of each object (the degree of membership of features to the class labels), and at the level of objects (the degree of membership of an object to the target class). It targets machine learning techniques that depend on Euclidean space calculations: fuzzy C-means and support vector machines.
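Object-level membership in fuzzy C-means is computed from Euclidean distances, which is the dependence this slide targets. The sketch below implements the standard fuzzy C-means membership formula with fuzzifier m (m = 2 is a common default, assumed here); it shows the conventional computation, not the thesis's feature-level fuzzification.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy C-means membership u_c of point x in each cluster centre.

    u_c = 1 / sum_k (d_c / d_k) ** (2 / (m - 1)), with d the Euclidean distance
    and m > 1 the fuzzifier; the memberships of x sum to 1.
    """
    dists = [math.dist(x, c) for c in centers]
    for idx, d in enumerate(dists):
        if d == 0.0:  # x sits on a centre: give it crisp membership there
            return [1.0 if k == idx else 0.0 for k in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dc / dk) ** exponent for dk in dists) for dc in dists]


print(fcm_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]]))
```

Replacing `math.dist` with a rank-weighted distance is the point at which feature evaluation results could enter this computation.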

28 Workflow: the input data set is divided into training and testing sets by 10-fold division. Training: feature selection (e.g. ChiMerge) → rank generation → discrimination factor calculation → Euclidean calculation on the input features. Testing: Euclidean calculation on the input features of the testing set → classify data.

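29 Euclidean distance: $d_p(x_i, x_j) = \left(\sum_{k=1}^{D} |x_{ik} - x_{jk}|^p\right)^{1/p}$
Modified Euclidean distance: $d_p(x_i, x_j) = \left(\sum_{k=1}^{D} r_k \, |x_{ik} - x_{jk}|^p\right)^{1/p}$
where the rank value $r_k = \mathrm{rank}_k / \sum_{i=1}^{D} \mathrm{rank}_i$ is the normalized rank of feature k.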
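A minimal sketch of the two distances, assuming the rank values are normalized feature evaluation scores that sum to 1 ($r_k = \mathrm{rank}_k / \sum_i \mathrm{rank}_i$) and that each rank multiplies its feature's term inside the sum. The scores passed in are hypothetical.

```python
def normalized_ranks(scores):
    """r_k = score_k / sum_i score_i, so the rank values sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def minkowski(xi, xj, p=2):
    """d_p(x_i, x_j) = (sum_k |x_ik - x_jk|^p)^(1/p); p = 2 is the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(xi, xj)) ** (1.0 / p)


def rank_weighted_minkowski(xi, xj, scores, p=2):
    """Modified distance: each feature's term is multiplied by its rank value r_k."""
    r = normalized_ranks(scores)
    return sum(rk * abs(a - b) ** p for rk, a, b in zip(r, xi, xj)) ** (1.0 / p)


print(minkowski([0, 0], [3, 4]))                        # plain Euclidean distance
print(rank_weighted_minkowski([0, 0], [3, 4], [1, 0]))  # second feature ranked as irrelevant
```

A feature with rank 0 drops out of the distance entirely, while equal ranks reduce to the plain distance scaled by a constant.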

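30 Kernel functions with rank values:
Linear kernel: $k(x_i, x_j) = \sum_{k=1}^{D} r_k \, x_{ik} x_{jk}$
Polynomial kernel: $k(x_i, x_j) = \left(1 + \sum_{k=1}^{D} r_k \, x_{ik} x_{jk}\right)^p$
Sigmoidal kernel: $k(x_i, x_j) = \tanh\left(\kappa \sum_{k=1}^{D} r_k \, x_{ik} x_{jk} - 1\right)$
where the rank value is $r_k = \mathrm{rank}_k^2 / \sum_{i=1}^{D} \mathrm{rank}_i^2$.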
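The rank-weighted kernels can be sketched as ordinary kernels built on a weighted inner product $\sum_k r_k x_{ik} x_{jk}$. This assumes the squared normalization $r_k = \mathrm{rank}_k^2 / \sum_i \mathrm{rank}_i^2$, a sigmoid offset of −1 and a scale `kappa` (both hypothetical, since the slide's sign and symbol did not survive); the scores are made up.

```python
import math


def squared_normalized_ranks(scores):
    """r_k = score_k^2 / sum_i score_i^2."""
    denom = sum(s * s for s in scores)
    return [s * s / denom for s in scores]


def weighted_dot(xi, xj, r):
    """Rank-weighted inner product sum_k r_k * x_ik * x_jk."""
    return sum(rk * a * b for rk, a, b in zip(r, xi, xj))


def linear_kernel(xi, xj, r):
    return weighted_dot(xi, xj, r)


def polynomial_kernel(xi, xj, r, p=2):
    return (1.0 + weighted_dot(xi, xj, r)) ** p


def sigmoid_kernel(xi, xj, r, kappa=1.0):
    # kappa and the -1 offset are assumptions, not taken from the thesis
    return math.tanh(kappa * weighted_dot(xi, xj, r) - 1.0)


r = squared_normalized_ranks([1.0, 1.0])
xi, xj = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, xj, r), polynomial_kernel(xi, xj, r), sigmoid_kernel(xi, xj, r))
```

Since the r_k are non-negative, the weighted inner product equals a plain inner product of features rescaled by $\sqrt{r_k}$, so the linear and polynomial kernels stay valid (positive semi-definite) kernels.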

31 Figure: classification accuracy (%) of the conventional and fuzzified techniques on the Indian diabetes, Yeast, WDBC, Hepatitis, Heart sound and Waveform data sets. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Feature Evaluation Based Fuzzy C-Mean Classification, The IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Taipei, Taiwan, Jun 30.

32 Figure: classification accuracy (%) of the conventional and fuzzified techniques on the Indian diabetes, Yeast, WDBC, Hepatitis, Heart sound and Waveform data sets. Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Fuzzification of Euclidean space in machine learning techniques, International Journal of Approximate Reasoning, [Submitted]

33 KDD scheme repeated from slide 8, introducing the visualization stage.

34 Definition: Formal Concept Analysis (FCA) is a method for data analysis and knowledge representation in visual form. Figure: an input table of objects and attributes (a, b, c, d, e, f) is visualized by FCA as a concept lattice, with attributes labeled above each node and objects below it.
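The nodes of the concept lattice are formal concepts: pairs (extent, intent) where the extent is exactly the set of objects having every attribute in the intent, and vice versa. A minimal, deliberately naive sketch enumerates them by closing every attribute subset; it is exponential in the number of attributes and is meant only for small tables like the hypothetical one below.

```python
from itertools import combinations


def common_objects(context, attrs):
    """Derivation A': the objects that have every attribute in attrs."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def common_attributes(context, objs, all_attrs):
    """Derivation B': the attributes shared by every object in objs."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), size):
            extent = common_objects(context, frozenset(attrs))
            concepts.add((extent, common_attributes(context, extent, all_attrs)))
    return concepts


# A small hypothetical input table: object -> set of attributes it has.
table = {"obj1": {"a", "b"}, "obj2": {"a", "c"}, "obj3": {"a", "b", "c"}}
for extent, intent in sorted(formal_concepts(table), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Practical FCA tools use faster algorithms such as NextClosure, but the concepts they produce are the same pairs.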

35 Scaling methods: nominal scale, ordinal scale, biordinal scale.

36 For data sets with continuous features, the number of features increases after applying the scaling method, which increases the complexity of the generated concept lattice.
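The growth in features can be seen by scaling one continuous attribute. The sketch below uses one common convention for each scale: nominal scaling creates one binary attribute per distinct value, and ordinal scaling creates one "≤ t" attribute per distinct threshold (omitting the largest value, which every object would have). Biordinal scaling is not shown, and the BMI values are made up.

```python
def nominal_scale(values):
    """Nominal scale: one binary attribute 'value == c' for each distinct value c."""
    cats = sorted(set(values))
    names = [f"=={c}" for c in cats]
    rows = [[v == c for c in cats] for v in values]
    return names, rows


def ordinal_scale(values):
    """Ordinal scale: one binary attribute 'value <= t' per distinct value t except the largest."""
    thresholds = sorted(set(values))[:-1]
    names = [f"<={t}" for t in thresholds]
    rows = [[v <= t for t in thresholds] for v in values]
    return names, rows


bmi = [33.6, 26.6, 23.3, 28.1, 43.1, 25.6]  # made-up continuous values
for scale in (nominal_scale, ordinal_scale):
    names, rows = scale(bmi)
    print(scale.__name__, len(names), "binary attributes")
```

For a continuous attribute almost every value is distinct, so one original attribute becomes roughly as many binary attributes as there are objects.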

37 Apply binary cuts to each feature using the ChiMerge method, then validate the generated cuts using a feature evaluation method. Flow: attribute selection (chi-square values) → binarization → attribute evaluation (e); if the evaluation is true, the lattice is validated; if false, the lattice is wrong and a different binarization algorithm is tried.
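A binary cut replaces a continuous feature with a single yes/no attribute, so each feature adds only one column to the formal context. ChiMerge itself merges adjacent intervals bottom-up using the chi-square statistic; as a simplified stand-in, the sketch below picks the single midpoint threshold whose two-sided split has the largest chi-square with respect to the class labels. The example values are made up.

```python
from collections import Counter


def chi_square(left, right):
    """Pearson chi-square of the 2 x C table formed by two Counters of class labels."""
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    chi = 0.0
    for label in set(left) | set(right):
        column = left[label] + right[label]
        for row_total, counts in ((n_left, left), (n_right, right)):
            expected = row_total * column / n
            if expected > 0:
                chi += (counts[label] - expected) ** 2 / expected
    return chi


def best_binary_cut(values, labels):
    """Return (cut, chi-square) for the midpoint threshold with the largest chi-square."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = Counter(y for v, y in zip(values, labels) if v <= cut)
        right = Counter(y for v, y in zip(values, labels) if v > cut)
        score = chi_square(left, right)
        if best is None or score > best[1]:
            best = (cut, score)
    return best


print(best_binary_cut([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
```

The validation step on the slide (attribute evaluation e) would then decide whether the resulting binary attributes are accepted or another binarization is tried.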

38 Example: the first node covers objects obj3, obj4 and obj5 and attributes 1 and 2. Let $n_0$ be the number of objects in class 0, $n_1$ the number of objects in class 1, and $n$ the total number of objects; the evaluation e of each attribute is computed from these counts. If obj3 and obj4 are in class 0 and obj5 is in class 1 ($n_0 = 2$, $n_1 = 1$, $n = 3$), the e value of attribute 1 is 1/3.

39 Figure: concept lattice generated for the Breast Cancer data set.

40 Let n be the number of attributes and m the average number of scaled features per attribute. Generating the lattice after applying a scaling algorithm costs O(n · m); since m is usually greater than n, this is at least O(n²). Generating the lattice after applying the proposed technique costs O(n). Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Binarization and validation in formal concept analysis, International Journal of Machine Learning and Cybernetics, [Submitted]

41 The main problem in machine learning remains the dependence on assumptions that do not hold in many real-life medical data sets. This study addresses the problem through a set of proposed techniques: classifiers that depend less on specific data characteristics can produce better results.

42 In future work, we need to define a methodology for selecting the classifier that reliably produces the highest classification accuracy. Egypt is in real need of advanced multidisciplinary research to combat the hepatitis C epidemic. Other directions are an automatic data integration web interface, and a social network capable of bringing together different groups, including patients, doctors, and medical or health institutes.

43 Journal papers:
1. Mostafa A. Salama, O.S. Soliman, I. Maglogiannis, A.E. Hassanien, Aly A. Fahmy, Frequent pattern-based classification model without data presumptions, Computers and Artificial Intelligence, [Submitted]
2. Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Binarization and validation in formal concept analysis, International Journal of Machine Learning and Cybernetics, [Submitted]
3. Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Fuzzification of Euclidean space in machine learning techniques, International Journal of Approximate Reasoning, [Submitted]
4. Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien, Aly A. Fahmy, An investigation on mapping classifiers onto data sets, Journal of Intelligent Information Systems, [Submitted]
Peer-reviewed book chapters:
5. Mostafa A. Salama, O.S. Soliman, I. Maglogiannis, A.E. Hassanien and Aly A. Fahmy, Rough set-based identification of heart valve diseases using heart sounds, Intelligent Systems Reference Library, ISRL series, [In press]

44 Peer-reviewed international conferences:
6. Mostafa A. Salama, Aboul Ella Hassanien, Aly A. Fahmy, Jan Platos and Vaclav Snasel, Fuzzification of Euclidian Space in Fuzzy C-mean and Support Vector Machine Techniques, The 3rd International Conference on Intelligent Human Computer Interaction (IHCI2011), Prague, published by Springer as part of their Advances in Soft Computing series, Aug.
7. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Feature Evaluation Based Fuzzy C-Mean Classification, The IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Taipei, Taiwan, Jun 30.
8. Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien and Aly A. Fahmy, Interval-based attribute evaluation algorithm, The 6th IEEE International Symposium Advances in Artificial Intelligence and Applications, Szczecin, Poland, Sep 18-21.
9. Mostafa A. Salama, Aboul Ella Hassanien, Aly A. Fahmy, Tai-hoon Kim, Heart Sound Feature Reduction Approach for Improving the Heart Valve Diseases Identification, The 2nd International Conference on Signal Processing, Image Processing and Pattern Recognition (SIP 2011), Dec. 8-10, 2011, International Convention Center Jeju, Jeju Island, Korea, CCIS/LNCS Springer series, vol. 260 (indexed by SCOPUS).

45 10. Mostafa A. Salama, Aboul Ella Hassanien, Jan Platos, Aly A. Fahmy and Vaclav Snasel, Rough Sets-based Identification of Heart Valve Diseases using Heart Sounds, The 3rd International Conference on Intelligent Human Computer Interaction (IHCI2011), Prague, published by Springer as part of their Advances in Soft Computing series, Aug.
11. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Uni-Class Pattern-based Classification Model, The 10th IEEE International Conference on Intelligent Systems Design and Applications (ISDA2010), Cairo, Egypt, Dec.
12. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Pattern-based Subspace Classification Model, The second World Congress on Nature and Biologically Inspired Computing (NaBIC2010), Kitakyushu, Japan, Dec.
13. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Reducing the Influence of Normalization on Data Classification, The 6th International Conference on Next Generation Web Services Practices (NWeSP 2010), Gwalior, India, Nov.
14. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Deep Belief Network for clustering and classification of a continuous data, The IEEE International Symposium on Signal Processing and Information Technology (SSPT2010), Luxor, Egypt.

46 Papers outside the medical data scope:
15. Mostafa A. Salama, Heba F. Eid, Rabie A. Ramadan, Ashraf Darwish and Aboul Ella Hassanien, Hybrid Intelligent Intrusion Detection Scheme, Advances in Intelligent and Soft Computing, vol. 96.


More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Deep Belief Network for Clustering and Classification of a Continuous Data

Deep Belief Network for Clustering and Classification of a Continuous Data Deep Belief Network for Clustering and Classification of a Continuous Data Mostafa A. SalamaI, Aboul Ella Hassanien" Aly A. Fahmy2 'Department of Computer Science, British University in Egypt, Cairo, Egypt

More information

AMOL MUKUND LONDHE, DR.CHELPA LINGAM

AMOL MUKUND LONDHE, DR.CHELPA LINGAM International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 4, Dec 2015, 53-58 IIST COMPARATIVE ANALYSIS OF ANN WITH TRADITIONAL

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Discriminate Analysis

Discriminate Analysis Discriminate Analysis Outline Introduction Linear Discriminant Analysis Examples 1 Introduction What is Discriminant Analysis? Statistical technique to classify objects into mutually exclusive and exhaustive

More information

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Partition Based Perturbation for Privacy Preserving Distributed Data Mining BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation

More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

PCA-NB Algorithm to Enhance the Predictive Accuracy

PCA-NB Algorithm to Enhance the Predictive Accuracy PCA-NB Algorithm to Enhance the Predictive Accuracy T.Karthikeyan 1, P.Thangaraju 2 1 Associate Professor, Dept. of Computer Science, P.S.G Arts and Science College, Coimbatore, India 2 Research Scholar,

More information

Idris Mala. Qualification. Area of Specialization. Contact Information: Ext.: 3035

Idris Mala. Qualification. Area of Specialization. Contact Information:   Ext.: 3035 Idris Mala Contact Information: Email: imala@uit.edu Ext.: 3035 Qualification ME (Telecom) UIT HU 2006 MS (Info. Tech.) Hamdard University 2002 BE (Electrical) NEDUET 1984 Area of Specialization Database

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Rough Set Approaches to Rule Induction from Incomplete Data

Rough Set Approaches to Rule Induction from Incomplete Data Proceedings of the IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4 9, 2004, vol. 2, 923 930 Rough

More information

Data Collection, Preprocessing and Implementation

Data Collection, Preprocessing and Implementation Chapter 6 Data Collection, Preprocessing and Implementation 6.1 Introduction Data collection is the loosely controlled method of gathering the data. Such data are mostly out of range, impossible data combinations,

More information

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering World Journal of Computer Application and Technology 5(2): 24-29, 2017 DOI: 10.13189/wjcat.2017.050202 http://www.hrpub.org Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

More information

5.2.1 Principal Component Analysis Kernel Principal Component Analysis Fuzzy Roughset Feature Selection

5.2.1 Principal Component Analysis Kernel Principal Component Analysis Fuzzy Roughset Feature Selection ENHANCED FUZZY ROUGHSET BASED FEATURE SELECTION 5 TECHNIQUE USING DIFFERENTIAL EVOLUTION 5.1 Data Reduction 5.1.1 Dimensionality Reduction 5.2 Feature Transformation 5.2.1 Principal Component Analysis

More information

Predicting Diabetes and Heart Disease Using Diagnostic Measurements and Supervised Learning Classification Models

Predicting Diabetes and Heart Disease Using Diagnostic Measurements and Supervised Learning Classification Models Predicting Diabetes and Heart Disease Using Diagnostic Measurements and Supervised Learning Classification Models Kunal Sharma CS 4641 Machine Learning Abstract Supervised learning classification algorithms

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management ADVANCED K-MEANS ALGORITHM FOR BRAIN TUMOR DETECTION USING NAIVE BAYES CLASSIFIER Veena Bai K*, Dr. Niharika Kumar * MTech CSE, Department of Computer Science and Engineering, B.N.M. Institute of Technology,

More information

Available online at ScienceDirect. Procedia Computer Science 35 (2014 )

Available online at  ScienceDirect. Procedia Computer Science 35 (2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 35 (2014 ) 388 396 18 th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

More information

Prognosis of Lung Cancer Using Data Mining Techniques

Prognosis of Lung Cancer Using Data Mining Techniques Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,

More information

Double Sort Algorithm Resulting in Reference Set of the Desired Size

Double Sort Algorithm Resulting in Reference Set of the Desired Size Biocybernetics and Biomedical Engineering 2008, Volume 28, Number 4, pp. 43 50 Double Sort Algorithm Resulting in Reference Set of the Desired Size MARCIN RANISZEWSKI* Technical University of Łódź, Computer

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Data Preprocessing. Data Preprocessing

Data Preprocessing. Data Preprocessing Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Supervised classification exercice

Supervised classification exercice Universitat Politècnica de Catalunya Master in Artificial Intelligence Computational Intelligence Supervised classification exercice Authors: Miquel Perelló Nieto Marc Albert Garcia Gonzalo Date: December

More information

Data Mining and Analytics

Data Mining and Analytics Data Mining and Analytics Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/22/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/bsbt6111/

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Support Vector Machines for visualization and dimensionality reduction

Support Vector Machines for visualization and dimensionality reduction Support Vector Machines for visualization and dimensionality reduction Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland tmaszczyk@is.umk.pl;google:w.duch

More information

Analyzing Outlier Detection Techniques with Hybrid Method

Analyzing Outlier Detection Techniques with Hybrid Method Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,

More information

REMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD

REMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended

More information

AN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION TECHNIQUE IN DIABETES DATA SET

AN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION TECHNIQUE IN DIABETES DATA SET Volume 119 No. 16 2018, 411-420 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ AN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION

More information

Web Data mining-a Research area in Web usage mining

Web Data mining-a Research area in Web usage mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.

More information

Statistical dependence measure for feature selection in microarray datasets

Statistical dependence measure for feature selection in microarray datasets Statistical dependence measure for feature selection in microarray datasets Verónica Bolón-Canedo 1, Sohan Seth 2, Noelia Sánchez-Maroño 1, Amparo Alonso-Betanzos 1 and José C. Príncipe 2 1- Department

More information

A Study on Association Rule Mining Using ACO Algorithm for Generating Optimized ResultSet

A Study on Association Rule Mining Using ACO Algorithm for Generating Optimized ResultSet Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 11, November 2013,

More information

Data Engineering. Data preprocessing and transformation

Data Engineering. Data preprocessing and transformation Data Engineering Data preprocessing and transformation Just apply a learner? NO! Algorithms are biased No free lunch theorem: considering all possible data distributions, no algorithm is better than another

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

Modified-MCA Based Feature Selection Model for Preprocessing Step of Classification

Modified-MCA Based Feature Selection Model for Preprocessing Step of Classification Modified- Based Feature Selection Model for Preprocessing Step of Classification Myo Khaing and Nang Saing Moon Kham, Member IACSIT Abstract Feature subset selection is a technique for reducing the attribute

More information