Mostafa Salama Abdel-hady
1 By Mostafa Salama Abdel-hady, British University in Egypt. Supervised by Professor Aly A. Fahmy (Cairo University) and Professor Aboul Ella Hassanien.
2 Introduction, Motivation, Problem definition, Data mining scheme, Preprocessing stage, Machine learning stage, Visualization stage, Experimental work and analysis, Conclusion and future work
3 Assist decision making over large volumes of data. Perform automated, low-cost and effective analysis of data. Reduce the need for high-cost specialized tests. Predict how patients react to therapy.
4 Data mining technique: assumption
Decision tree: performs better on discrete data sets.
Bayes belief: univariate attributes assumption.
PCA: normal distribution assumption.
Formal concept analysis: representation of binary data only.
5 Resources of medical data: hospital data, contacts, test data, diagnosis, medical referrals. Medical data is not adjusted to a specific machine learning technique.
6 This study aims to: review the data mining techniques in their different stages; propose solutions and techniques in each stage of data mining to handle the challenges of various data characteristics.
7 [Diagram] Knowledge Discovery in Databases (KDD): raw data → data pre-processing (cleaning, transformation, feature reduction) → machine learning (classification, clustering, rule generation) → knowledge interpretation (knowledge extraction, knowledge evaluation) → useful knowledge. Data mining spans these stages.
8 [Diagram] Knowledge Discovery in Databases (KDD) with a visualization stage: raw data → data pre-processing (cleaning, transformation, feature reduction) → machine learning (classification, clustering, rule generation) → visualization (formal concept analysis) → knowledge interpretation (knowledge extraction, knowledge evaluation) → useful knowledge.
9 Thesis content
11 Data pre-processing: raw data passes through cleaning, transformation and feature reduction.
Data cleaning: fill in missing values; smooth noisy data; identify or remove outliers; resolve inconsistencies.
Data transformation: normalization (scaling to a specific range); discretization (of particular importance for numerical data).
Data reduction: feature selection; feature extraction.
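The two transformations named on this slide can be sketched as follows. This is a minimal illustration, assuming min-max scaling for normalization and equal-width binning for discretization; the slide does not say which variants the thesis uses, and the function names and the `ages` sample are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_discretize(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [20, 30, 40, 60]  # hypothetical sample feature
normalized = min_max_normalize(ages)
binned = equal_width_discretize(ages, bins=2)
```

Normalization keeps the relative spacing of the values, while discretization collapses them into interval labels, which is why later slides treat discretization as a lossy step.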
12 Feature subset selection.
Objective function: evaluates features. In a wrapper it is a pattern classifier, which has a high computation cost; in a filter it is a feature evaluation measure, which relies only on the properties of the data.
Search algorithm: selects a subset of features based on the evaluation from the objective function.
[Diagram] Training data → feature subset selection (the search algorithm proposes feature subsets, the objective function scores them) → final subset of features → classification algorithm → predictive accuracy.
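The split between a search algorithm and an objective function can be sketched as below. This is an illustration under assumptions, not the thesis implementation: the search is a greedy forward selection, and the wrapper objective is the leave-one-out accuracy of a 1-nearest-neighbour classifier. Both choices, the function names and the toy data are hypothetical.

```python
def loo_1nn_accuracy(rows, labels, features):
    """Wrapper-style objective: leave-one-out accuracy of a 1-nearest-neighbour
    classifier that only looks at the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, objective):
    """Search algorithm: greedily add the feature that most improves the objective."""
    remaining = set(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        score, feature = max(
            (objective(rows, labels, selected + [f]), f) for f in sorted(remaining)
        )
        if score <= best_score:
            break  # no candidate improves the objective any further
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.8], [1.0, 1.2]]
labels = [0, 0, 1, 1]
subset, accuracy = forward_selection(rows, labels, loo_1nn_accuracy)
```

Swapping `loo_1nn_accuracy` for a function that scores features from the data alone would turn the same search into a filter approach, at a much lower computation cost.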
13 Sort the features according to the ranks produced by feature evaluation techniques such as chi-square, ChiMerge and mutual information. The feature selection algorithm then selects the features that give the highest classification accuracy.
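The ranking step can be sketched with the chi-square statistic, one of the evaluation techniques named on the slide. The sketch assumes features that are already discrete and covers only the ranking; choosing how many top-ranked features to keep by classification accuracy is left out. Function names and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Chi-square statistic of a discrete feature against the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_totals = Counter(feature_values)
    class_totals = Counter(labels)
    stat = 0.0
    for value, value_count in value_totals.items():
        for cls, class_count in class_totals.items():
            expected = value_count * class_count / n
            stat += (observed[(value, cls)] - expected) ** 2 / expected
    return stat


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest chi-square score."""
    columns = list(zip(*rows))
    scores = [chi_square(column, labels) for column in columns]
    order = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return order, scores


# Toy data: feature 0 matches the class label, feature 1 is independent of it.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels)
```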
14 Most feature evaluation functions assume discrete data, so a discretization process must be applied to continuous data sets.
[Diagram] Continuous data → discretization (data transformation) → discrete data → feature evaluation (data reduction). The discretization step causes a deterioration in the internal structure of the data.
15 The evaluation of the attributes is based on the following hypothesis: the smaller the overlapped interval of values between the class labels, the more important the attribute. The rank of each feature is [y/x].
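The hypothesis can be illustrated with a small overlap measure. This is only a sketch of the idea, not the interval-based algorithm of the cited paper: it assumes exactly two classes, takes each class's interval to be its [min, max] range, and reports the overlapped length as a fraction of the total range. The slide's [y/x] rank is not reproduced, since x and y are not defined in the transcript.

```python
def class_ranges(values, labels):
    """Map each class label to the (min, max) interval of its feature values."""
    ranges = {}
    for value, cls in zip(values, labels):
        lo, hi = ranges.get(cls, (value, value))
        ranges[cls] = (min(lo, value), max(hi, value))
    return ranges


def overlap_fraction(values, labels):
    """Fraction of the feature's total range where the two class intervals overlap.
    Assumes exactly two distinct class labels (an assumption of this sketch)."""
    (lo_a, hi_a), (lo_b, hi_b) = class_ranges(values, labels).values()
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    total = max(hi_a, hi_b) - min(lo_a, lo_b)
    return overlap / total if total > 0 else 1.0


labels = [0, 0, 1, 1]
separable = overlap_fraction([1, 2, 3, 4], labels)    # class intervals do not overlap
overlapping = overlap_fraction([1, 3, 2, 4], labels)  # intervals [1, 3] and [2, 4]
```

Under the hypothesis, the first feature (no overlap) would be ranked as more important than the second.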
16 [Figure] Comparison between different feature evaluation techniques: classification accuracy percentage for features selected using FFS, IG, SVMB and IB on the Pima Indian diabetes data set.
Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien and Aly A. Fahmy, Interval-based attribute evaluation algorithm, The 6th IEEE International Symposium Advances in Artificial Intelligence and Applications, Szczecin, Poland, Sep 18-21, pp ,
17 PCA depends on the degree to which the variables are linearly correlated; this is measured by the covariance C_ij between attributes i and j. PCA assumes that the input data is normally distributed.
[Diagram] Data → normalization (data transformation) → PCA (data reduction). Normalization causes a deterioration in the internal structure of the data.
18 If correlation is used instead of covariance in PCA, normalization can be avoided. Correlations can also be calculated from the variances and covariances:
$r_{ij} = \frac{C_{ij}}{\sqrt{V_i V_j}}$
where $r_{ij}$ is the correlation between variables i and j, $C_{ij}$ is the covariance of variables i and j, $V_i$ is the variance of variable i, and $V_j$ is the variance of variable j.
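A short sketch of why correlation removes the need for normalization: rescaling a variable changes its covariance with other variables but leaves the correlation unchanged. The sample covariance (dividing by n - 1) and the height and weight values are assumptions made for the illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance C_ij of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """r_ij = C_ij / sqrt(V_i * V_j), where V_i is the covariance of i with itself."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical measurements; the same heights expressed in two units.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_m = [h / 100 for h in height_cm]
weight_kg = [50.0, 58.0, 69.0, 80.0]

cov_cm = covariance(height_cm, weight_kg)
cov_m = covariance(height_m, weight_kg)   # 100 times smaller than cov_cm
r_cm = correlation(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)    # equal to r_cm up to rounding
```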
19 [Figure] Results on the Abalone data set.
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Reducing the Influence of Normalization on Data Classification, The 6th International Conference on Next Generation Web Services Practices (NWeSP 2010), Gwalior, India, pp , Nov
21 Design a classifier that avoids dependency on assumptions that may not hold in real-life medical data sets, avoids dependency on user-defined parameters, and avoids the sensitivity obstacles. The proposed technique is a transition from a clustering technique, frequent pattern-based clustering, to a classification technique.
22 [Figure] Values of features a, b, c and d plotted for a set of five different objects (object 1 to object 5).
23 [Figure] Values of features a, b, c and d for object 1, object 2 and object 5: a set of objects of similar patterns (cluster).
24 All objects whose difference between attributes i and j is less than the user-defined δ are in the same cluster, as they share the pattern [ij]. [Figure axis: attributes]
25 Steps of the pattern-based classifier:
1. Calculate the delta value $\delta_{ij}$ for each pair of features (i, j) over every two objects u and v in the same class: $\delta_{ij} = \max\left[\operatorname{abs}(u_i - u_j) - \operatorname{abs}(v_i - v_j)\right]$
2. Search for the frequent patterns in a set of objects in the same class.
3. Use these patterns to classify objects of unknown class.
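Steps 1 and 2 can be sketched as below. This is a hedged reading rather than the authors' implementation: the delta formula is taken as the maximum, over ordered pairs of same-class objects (u, v), of abs(u_i - u_j) - abs(v_i - v_j), and a feature pair is treated as a pattern when its delta stays within a hypothetical `max_delta` threshold. Step 3 (classifying unknown objects) is not covered.

```python
from itertools import combinations, permutations


def delta_values(objects):
    """For each feature pair (i, j), the maximum over ordered object pairs (u, v)
    of abs(u[i] - u[j]) - abs(v[i] - v[j]). Needs at least two objects."""
    n_features = len(objects[0])
    deltas = {}
    for i, j in combinations(range(n_features), 2):
        deltas[(i, j)] = max(
            abs(u[i] - u[j]) - abs(v[i] - v[j])
            for u, v in permutations(objects, 2)
        )
    return deltas


def frequent_patterns(objects, max_delta):
    """Feature pairs whose gap is nearly the same for all objects of the class.
    `max_delta` is a hypothetical threshold introduced for this sketch."""
    return [pair for pair, d in delta_values(objects).items() if d <= max_delta]


# Three objects of one class: features 0 and 1 always differ by exactly 1.
same_class = [[1.0, 2.0, 9.0], [2.0, 3.0, 4.0], [5.0, 6.0, 6.5]]
deltas = delta_values(same_class)
patterns = frequent_patterns(same_class, max_delta=0.5)
```

A small delta means the objects of the class keep a stable gap between the two features even when their absolute values differ, which is the kind of shared pattern slide 23 illustrates.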
26 Comparison of the proposed model with other classification techniques according to classification accuracy on the Iris data set.
[Table] Accuracy in % per classifier: PBC, IB, MLP, NB, CC, DT, BN.
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Uni-Class Pattern-based Classification Model, The 10th IEEE International Conference on Intelligent Systems Design and Applications (ISDA2010), Cairo, Egypt, pp , Nov
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Pattern-based Subspace Classification Model, The second World Congress on Nature and Biologically Inspired Computing (NaBIC2010), Kitakyushu, Japan, pp , Dec
27 Apply fuzzy logic at the level of the features of each object: the degree of membership of features to the class labels.
Apply fuzzy logic at the level of objects: the degree of membership of an object to the target class.
Machine learning techniques that depend on Euclidean space calculations: fuzzy c-means, support vector machine.
28 [Diagram] The input data set is split by 10-fold division into a training data set and a testing data set. Training: feature selection (e.g. ChiMerge) → ranks generation → discrimination factors calculation, combined with the input features in the Euclidean calculation. Testing: input features → Euclidean calculation → classify data.
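29 Euclidean distance: $d_p(x_i, x_j) = \left( \sum_{k=1}^{D} (x_{ik} - x_{jk})^p \right)^{1/p}$
Modified Euclidean distance: $d_p(x_i, x_j) = \left( \sum_{k=1}^{D} \left( r_k \cdot (x_{ik} - x_{jk}) \right)^p \right)^{1/p}$
where the rank value $r_k$ is the evaluation score of feature k normalized by the sum of the scores of all D features.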
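The modified distance can be sketched as a rank-weighted Minkowski distance. The function names and sample scores are hypothetical, and the absolute value inside the sum is an assumption that keeps each term non-negative for odd p; for p = 2 it is identical to the plain difference.

```python
def normalize_ranks(scores):
    """Turn raw feature scores into rank values r_k that sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def weighted_distance(x, y, ranks, p=2):
    """(sum_k (r_k * |x_k - y_k|)^p)^(1/p); with all r_k = 1 and p = 2
    this is the ordinary Euclidean distance."""
    return sum((r * abs(a - b)) ** p for a, b, r in zip(x, y, ranks)) ** (1 / p)


x, y = [0.0, 0.0], [3.0, 4.0]
plain = weighted_distance(x, y, [1.0, 1.0])      # ordinary Euclidean distance
ranks = normalize_ranks([3.0, 1.0])              # hypothetical feature scores
weighted = weighted_distance(x, y, ranks)        # feature 0 now dominates
```

Weighting each coordinate by its rank shrinks the contribution of weakly discriminating features, so distance-based learners such as fuzzy c-means lean on the features that the evaluation step scored highest.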
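30 Kernel functions:
Linear kernel: $k(x_i, x_j) = \sum_{k=1}^{D} r_k \cdot x_{ik} \cdot x_{jk}$
Polynomial kernel: $k(x_i, x_j) = \left( 1 + \sum_{k=1}^{D} r_k \cdot x_{ik} \cdot x_{jk} \right)^p$
Sigmoidal kernel: $k(x_i, x_j) = \tanh\left( o \sum_{k=1}^{D} r_k \cdot x_{ik} \cdot x_{jk} + 1 \right)$
where the rank value is $r_k = \frac{\chi_k^2}{\sum_{i=1}^{D} \chi_i^2}$.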
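The three kernels share one building block, a rank-weighted dot product, which the sketch below makes explicit. The `scale` and `offset` parameters of the sigmoidal kernel are assumptions, since the slide's constants are not fully recoverable from the transcript; function names and sample vectors are hypothetical.

```python
from math import tanh


def weighted_dot(x, y, ranks):
    """sum_k r_k * x_k * y_k: the dot product with each feature scaled by its rank."""
    return sum(r * a * b for a, b, r in zip(x, y, ranks))


def linear_kernel(x, y, ranks):
    return weighted_dot(x, y, ranks)


def polynomial_kernel(x, y, ranks, p=2):
    return (1 + weighted_dot(x, y, ranks)) ** p


def sigmoid_kernel(x, y, ranks, scale=1.0, offset=1.0):
    # scale and offset are assumed defaults, not values taken from the slides.
    return tanh(scale * weighted_dot(x, y, ranks) + offset)


x, y = [1.0, 2.0], [3.0, 4.0]
ranks = [0.5, 0.5]
k_lin = linear_kernel(x, y, ranks)
k_poly = polynomial_kernel(x, y, ranks, p=2)
k_sig = sigmoid_kernel(x, y, ranks)
```

Setting every rank to 1 recovers the conventional kernels, so the weighting can be switched off without changing the rest of a support vector machine pipeline.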
31 [Table] Classification accuracy %, conventional vs. fuzzified, per data set used: Indian diabetes, Yeast, WDBC, Hepatitis, Heart Sound Waveform.
Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Feature Evaluation Based Fuzzy C-Mean Classification, The IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Taipei, Taiwan, Jun 30, pp ,
32 [Table] Classification accuracy %, conventional vs. fuzzified, per data set used: Indian diabetes, Yeast, WDBC, Hepatitis, Heart Sound Waveform.
Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Fuzzification of Euclidean space in machine learning techniques, International Journal of Approximate Reasoning, [Submitted]
34 Definition: Formal Concept Analysis (FCA) is a method for data analysis and knowledge representation in a visual form.
[Diagram] An input table of objects against attributes (a to f) is visualized by formal concept analysis as a concept lattice. In the lattice, attributes are written above a node and objects below it.
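How a binary input table turns into the nodes of a concept lattice can be sketched by enumerating formal concepts, that is, pairs of an object set (extent) and the attribute set those objects share (intent). The brute-force enumeration below is an assumption chosen for brevity and is only practical for very small tables; the context itself is hypothetical.

```python
from itertools import combinations


def common_attributes(context, objects):
    """Attributes shared by all given objects (every attribute if none are given)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return frozenset(result)


def objects_having(context, attributes):
    """Objects that possess every one of the given attributes."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((extent, intent))
    return concepts


# Hypothetical binary context: which object has which attribute.
context = {"obj1": {"a", "b"}, "obj2": {"a"}, "obj3": {"b", "c"}}
concepts = formal_concepts(context)
```

Each resulting pair corresponds to one lattice node; ordering the nodes by inclusion of their extents gives the lattice drawn on the slide.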
35 [Figure] Scaling methods: nominal scale, ordinal scale, biordinal scale.
36 For data sets with continuous features, the number of features increases after applying the scaling method. This increases the complexity of the generated concept lattice.
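The growth in the number of features can be seen in a small sketch of ordinal scaling, where one continuous feature becomes one binary attribute per threshold. The attribute naming scheme, thresholds and sample values are hypothetical.

```python
def ordinal_scale(name, values, thresholds):
    """Replace one continuous feature by one binary attribute '<name><=t' per threshold."""
    return [{f"{name}<={t}": v <= t for t in thresholds} for v in values]


lengths = [4.9, 6.1, 7.3]  # hypothetical continuous feature
scaled = ordinal_scale("len", lengths, thresholds=[5, 6, 7])  # 3 binary attributes
single_cut = ordinal_scale("len", lengths, thresholds=[6])    # 1 binary attribute
```

With three thresholds the lattice has to accommodate three attributes for this feature alone, whereas a single binary cut keeps the attribute count equal to the original feature count.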
37 Apply binary cuts on each feature using the ChiMerge method, then validate the generated cuts using any feature evaluation method.
[Diagram] Attribute selection (chi-square values) → binarization → attribute evaluation (e). If true: validated lattice. If false: wrong lattice, try a different binarization algorithm.
38 For the first node: objects obj3, obj4, obj5; attributes 1, 2.
n_0: number of objects in class 0. n_1: number of objects in class 1. n: total number of objects.
Evaluation e of each attribute: $e = \frac{\lvert n_0 - n_1 \rvert}{n}$
If obj3 and obj4 are in class 0 and obj5 is in class 1, then the e value of attribute 1 is 1/3.
[Figure] Lattice with attributes Att 1 and Att 2 and objects obj1 to obj6.
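The worked example on this slide can be checked with a few lines of code. The sketch assumes binary class labels 0 and 1 and reads the evaluation as the absolute difference of the class counts divided by the number of objects at the node, which reproduces the 1/3 value stated on the slide; the function name is hypothetical.

```python
def attribute_evaluation(object_classes):
    """e = |n0 - n1| / n for the class labels (0 or 1) of the objects at a node."""
    n0 = sum(1 for c in object_classes if c == 0)
    n1 = sum(1 for c in object_classes if c == 1)
    return abs(n0 - n1) / len(object_classes)


# obj3 and obj4 are in class 0, obj5 is in class 1.
e_first_node = attribute_evaluation([0, 0, 1])
```

A value near 1 means the node's objects belong almost entirely to one class, while a value near 0 means the attribute does not separate the classes at that node.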
39 [Figure] Lattice generated for the breast cancer data set.
40 Let n be the number of attributes and m the average number of scaled features per attribute. Generating the lattice after applying a scaling algorithm costs O(n * m). The value of m is usually greater than n {m > n}, so this is at least O(n^2). Generating the lattice after applying the proposed technique costs O(n).
Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Binarization and validation in formal concept analysis, International Journal of Machine Learning and Cybernetics, [Submitted]
41 The main problem in machine learning is still the dependence on assumptions that do not hold in many real-life medical data sets. This study addresses the problem through a set of proposed techniques. If these assumptions are avoided by building classifiers that are less dependent on specific data characteristics, better results can be obtained.
42 In future, we need to define a methodology for selecting the appropriate classifier, the one that can successfully produce the highest classification accuracy. Egypt is in real need of advanced multidisciplinary research in order to combat the hepatitis C epidemic. Build an automatic data integration web interface. Implement a kind of social network that can bring together different types of groups, including patients, doctors and medical or health institutes.
43 Journal Papers:
1. Mostafa A. Salama, O.S. Soliman, I. Maglogiannisa, A.E. Hassanien, Aly A. Fahmy, Frequent pattern-based classification model without data presumptions, Computers and artificial intelligence, [Submitted]
2. Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Binarization and validation in formal concept analysis, International Journal of Machine Learning and Cybernetics, [Submitted]
3. Mostafa A. Salama, A.E. Hassanien, Aly A. Fahmy, Fuzzification of Euclidean space in machine learning techniques, International Journal of Approximate Reasoning, [Submitted]
4. Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien, Aly A. Fahmy, An investigation on mapping classifiers onto data sets, Journal of Intelligent Information Systems, [Submitted]
Peer Reviewed Book Chapters:
5. Mostafa A. Salama, O.S. Soliman, I. Maglogiannisa, A.E. Hassanien and Aly A. Fahmy, Rough set-based identification of heart valve diseases using heart sounds, Intelligent Systems Reference Library, ISRL series, [In press]
44 Peer Reviewed International Conference:
6. Mostafa A. Salama, Aboul Ella Hassanien, Aly A. Fahmy, Jan Platos and Vaclav Snasel, Fuzzification of Euclidian Space in Fuzzy C-mean and Support Vector Machine Techniques, The 3rd International Conference on Intelligent Human Computer Interaction (IHCI2011), Prague, published by Springer as part of their Advances in Soft Computing series, Aug ,
7. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Feature Evaluation Based Fuzzy C-Mean Classification, The IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Taipei, Taiwan, Jun 30, pp ,
8. Mostafa A. Salama, Kenneth Revett, Aboul Ella Hassanien and Aly A. Fahmy, Interval-based attribute evaluation algorithm, The 6th IEEE International Symposium Advances in Artificial Intelligence and Applications, Szczecin, Poland, Sep 18-21, pp ,
9. Mostafa A. Salama, Aboul Ella Hassanien, Aly A. Fahmy, Tai-hoon Kim, Heart Sound Feature Reduction Approach for Improving the Heart Valve Diseases Identification, The 2nd International Conference on Signal Processing, Image Processing and Pattern Recognition (SIP 2011), Dec. 8-10, 2011, International Convention Center Jeju, Jeju Island, Korea, CCIS/LNCS Springer series, vol. 260 (Indexed by SCOPUS),
45
10. Mostafa A. Salama, Aboul Ella Hassanien, Jan Platos, Aly A. Fahmy and Vaclav Snasel, Rough Sets-based Identification of Heart Valve Diseases using Heart Sounds, The 3rd International Conference on Intelligent Human Computer Interaction (IHCI2011), Prague, published by Springer as part of their Advances in Soft Computing series, Aug ,
11. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Uni-Class Pattern-based Classification Model, The 10th IEEE International Conference on Intelligent Systems Design and Applications (ISDA2010), Cairo, Egypt, pp , Dec
12. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Pattern-based Subspace Classification Model, The second World Congress on Nature and Biologically Inspired Computing (NaBIC2010), Kitakyushu, Japan, pp , Dec
13. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Reducing the Influence of Normalization on Data Classification, The 6th International Conference on Next Generation Web Services Practices (NWeSP 2010), Gwalior, India, pp , Nov
14. Mostafa A. Salama, Aboul Ella Hassanien and Aly A. Fahmy, Deep Belief Network for clustering and classification of a continuous data, The IEEE International Symposium on Signal Processing and Information Technology (SSPT2010), Luxor, Egypt, pp ,
46 Papers outside the medical data scope:
15. Mostafa A. Salama, Heba F. Eid, Rabie A. Ramadan, Ashraf Darwish and Aboul Ella Hassanien, Hybrid Intelligent Intrusion Detection Scheme, Advances in Intelligent and Soft Computing, vol. 96, pp ,
Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,
More informationPCA-NB Algorithm to Enhance the Predictive Accuracy
PCA-NB Algorithm to Enhance the Predictive Accuracy T.Karthikeyan 1, P.Thangaraju 2 1 Associate Professor, Dept. of Computer Science, P.S.G Arts and Science College, Coimbatore, India 2 Research Scholar,
More informationIdris Mala. Qualification. Area of Specialization. Contact Information: Ext.: 3035
Idris Mala Contact Information: Email: imala@uit.edu Ext.: 3035 Qualification ME (Telecom) UIT HU 2006 MS (Info. Tech.) Hamdard University 2002 BE (Electrical) NEDUET 1984 Area of Specialization Database
More information9. Conclusions. 9.1 Definition KDD
9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]
More informationRough Set Approaches to Rule Induction from Incomplete Data
Proceedings of the IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4 9, 2004, vol. 2, 923 930 Rough
More informationData Collection, Preprocessing and Implementation
Chapter 6 Data Collection, Preprocessing and Implementation 6.1 Introduction Data collection is the loosely controlled method of gathering the data. Such data are mostly out of range, impossible data combinations,
More informationUniversity of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka
Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationOutlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering
World Journal of Computer Application and Technology 5(2): 24-29, 2017 DOI: 10.13189/wjcat.2017.050202 http://www.hrpub.org Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering
More information5.2.1 Principal Component Analysis Kernel Principal Component Analysis Fuzzy Roughset Feature Selection
ENHANCED FUZZY ROUGHSET BASED FEATURE SELECTION 5 TECHNIQUE USING DIFFERENTIAL EVOLUTION 5.1 Data Reduction 5.1.1 Dimensionality Reduction 5.2 Feature Transformation 5.2.1 Principal Component Analysis
More informationPredicting Diabetes and Heart Disease Using Diagnostic Measurements and Supervised Learning Classification Models
Predicting Diabetes and Heart Disease Using Diagnostic Measurements and Supervised Learning Classification Models Kunal Sharma CS 4641 Machine Learning Abstract Supervised learning classification algorithms
More informationGlobal Journal of Engineering Science and Research Management
ADVANCED K-MEANS ALGORITHM FOR BRAIN TUMOR DETECTION USING NAIVE BAYES CLASSIFIER Veena Bai K*, Dr. Niharika Kumar * MTech CSE, Department of Computer Science and Engineering, B.N.M. Institute of Technology,
More informationAvailable online at ScienceDirect. Procedia Computer Science 35 (2014 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 35 (2014 ) 388 396 18 th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems
More informationPrognosis of Lung Cancer Using Data Mining Techniques
Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,
More informationDouble Sort Algorithm Resulting in Reference Set of the Desired Size
Biocybernetics and Biomedical Engineering 2008, Volume 28, Number 4, pp. 43 50 Double Sort Algorithm Resulting in Reference Set of the Desired Size MARCIN RANISZEWSKI* Technical University of Łódź, Computer
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationData Preprocessing. Data Preprocessing
Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationSupervised classification exercice
Universitat Politècnica de Catalunya Master in Artificial Intelligence Computational Intelligence Supervised classification exercice Authors: Miquel Perelló Nieto Marc Albert Garcia Gonzalo Date: December
More informationData Mining and Analytics
Data Mining and Analytics Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/22/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/bsbt6111/
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationSupport Vector Machines for visualization and dimensionality reduction
Support Vector Machines for visualization and dimensionality reduction Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland tmaszczyk@is.umk.pl;google:w.duch
More informationAnalyzing Outlier Detection Techniques with Hybrid Method
Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,
More informationREMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationKeywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.
Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering
More informationName of the lecturer Doç. Dr. Selma Ayşe ÖZEL
Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended
More informationAN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION TECHNIQUE IN DIABETES DATA SET
Volume 119 No. 16 2018, 411-420 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ AN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationStatistical dependence measure for feature selection in microarray datasets
Statistical dependence measure for feature selection in microarray datasets Verónica Bolón-Canedo 1, Sohan Seth 2, Noelia Sánchez-Maroño 1, Amparo Alonso-Betanzos 1 and José C. Príncipe 2 1- Department
More informationA Study on Association Rule Mining Using ACO Algorithm for Generating Optimized ResultSet
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 11, November 2013,
More informationData Engineering. Data preprocessing and transformation
Data Engineering Data preprocessing and transformation Just apply a learner? NO! Algorithms are biased No free lunch theorem: considering all possible data distributions, no algorithm is better than another
More informationDATA WAREHOUING UNIT I
BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More informationModified-MCA Based Feature Selection Model for Preprocessing Step of Classification
Modified- Based Feature Selection Model for Preprocessing Step of Classification Myo Khaing and Nang Saing Moon Kham, Member IACSIT Abstract Feature subset selection is a technique for reducing the attribute
More information