Improving Classifier Performance by Imputing Missing Values using Discretization Method
E. CHANDRA BLESSIE
Assistant Professor, Department of Computer Science, D.J. Academy for Managerial Excellence, Coimbatore, Tamil Nadu, India

DR. E. KARTHIKEYAN
Assistant Professor, Department of Computer Science, Government Arts College, Udumalpet, Tamil Nadu, India

DR. V. THAVAVEL
HOD and Assistant Professor (SG), Department of Computer Application, School of Computer Science and Technology, Karunya University, Tamil Nadu, India

Abstract
The presence of missing values in a dataset can affect the performance of a classifier. Missing values can be replaced with estimates based on information available in the dataset. Several methods have been proposed to deal with missing values. In this paper, six different approaches are presented to fill the missing values. We also propose a discretization-based method which can increase the relevancy between the instances and attributes. An experimental analysis is made with four datasets to evaluate the performance of the C4.5 classifier, measured by its accuracy. The datasets are taken from the UCI ML repository.

Keywords: Data Mining, C4.5, Discretization, Preprocessing, Classifier

ISSN : Vol. 4 No.03 March

1. Introduction
Many learning algorithms perform poorly when the training data are incomplete [Kalton and Kasprzyk (1986)][Mundfrom and Whitcomb (1998)]. Missing attribute values are common in real-world datasets. They may arise from the data collection process, redundant diagnostic tests, unknown data, and so on. One standard approach is to impute the missing values and then give the completed data to the learning algorithm. In general, the methods for treating missing values can be divided into three categories [Mehala, et al. (2009)]:
1) Ignoring or discarding the data, which is the easiest and most commonly applied approach.
2) Parameter estimation, where maximum likelihood procedures are used to estimate the parameters of a model.
3) Imputation techniques, where missing values are replaced with estimated ones. The objective is to employ known relationships that can be identified in the valid values of the dataset to assist in estimating the missing values.

The rest of the paper is organized as follows. Section 2 discusses the previous work. Section 3 explains the proposed discretization-based method. The experimental analysis and the comparison results are described in Section 4. The conclusion and a discussion of the results are given in Section 5.

2. Review of the previous work
This section surveys [Jerzy, et al. (2005)] some commonly and widely used imputation methods. Mean imputation is one of the most frequently used methods [6]. It consists of replacing the missing data for a given feature (attribute) by the mean of all known values of that attribute in the class to which the instance with the missing attribute belongs. Suppose that the value $x_{ij}$ of the k-th class, $C_k$, is missing. Then it is replaced by

$$\hat{x}_{ij} = \sum_{i:\, x_{ij} \in C_k} \frac{x_{ij}}{n_k} \tag{1}$$

where $n_k$ represents the number of non-missing values in the j-th attribute of the k-th class.

Two other methods discard the data having missing values. The first is known as complete case analysis and discards all instances having missing values [Tresp, et al. (1998)]. The second determines the extent of the missing values before deleting the data.

The CN2 algorithm [Clark and Niblett (1989)] fills the missing values of an attribute with the most frequently occurring value of that attribute. This most common attribute value method pays no attention to the relationship between the attributes and the decision. The concept most common attribute value method is a restriction of the first method to the concept, i.e., to all examples with the same value of the decision as the example with the missing attribute value. CART replaces a missing value of a given attribute using the corresponding value of a surrogate attribute, namely the one that has the highest correlation with the original attribute. C4.5 uses a probabilistic approach to handle missing data in both the training and the test sample [Quinlan (1993)].

3. Proposed system

3.1 Discretization
Discretization [Liu and Setiono (1997)] is a technique that partitions continuous attributes into a finite set of adjacent intervals in order to generate attributes with a small number of distinct values. Each interval can then be treated as one value of a new discrete attribute. Discretization of attributes can reduce the learning complexity and help to understand the dependencies between the attributes and the target class.

Definition: Given a dataset consisting of N instances and S target classes, a discretization algorithm discretizes the continuous attribute F in the dataset into n discrete intervals $\{[d_0,d_1],(d_1,d_2],\ldots,(d_{n-1},d_n]\}$, where $d_0$ is the minimal value and $d_n$ is the maximal value of attribute F.
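To make the interval definition above concrete, here is a minimal Python sketch. The helper name `interval_index` and the cut points are illustrative assumptions and do not come from the paper. Given the boundaries d_0, ..., d_n, it returns the index of the interval that contains a value of F, treating the first interval as closed on both sides and every later interval as open on the left.

```python
from bisect import bisect_left

def interval_index(value, cuts):
    """Index of the interval of {[d0,d1], (d1,d2], ..., (d_{n-1},d_n]}
    that contains `value`, where cuts = [d0, d1, ..., dn] is sorted."""
    if not cuts[0] <= value <= cuts[-1]:
        raise ValueError("value lies outside [d0, dn]")
    # bisect_left gives the first position p with cuts[p] >= value, so a
    # value in (d_{p-1}, d_p] maps to interval p - 1; d0 itself maps to 0.
    return max(bisect_left(cuts, value) - 1, 0)

cuts = [1.0, 2.5, 4.0, 6.0]           # hypothetical d0..d3, three intervals
values = [1.0, 2.5, 2.6, 4.0, 6.0]
print([interval_index(v, cuts) for v in values])  # [0, 0, 1, 1, 2]
```

Each boundary value falls into the interval on its left, which matches the half-open form of the scheme.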
Such a discrete result $\{[d_0,d_1],(d_1,d_2],\ldots,(d_{n-1},d_n]\}$ is called a discretization scheme D on attribute F. CAIM [Kurgan and Cios (2004)] and CACC [Tsai, et al. (2008)] find the cutting points by taking the midpoint between each pair of consecutive values and initializing these midpoints as the boundary points of the intervals. NAD [Blessie, et al. (2010)], in contrast, takes the midpoint only between pairs of consecutive values that have different class values and initializes those as boundary points. This reduces the time complexity.

3.2 Imputation using Discretization
Let $D=\{d_1,d_2,d_3,\ldots,d_n\}$ be the dataset and let $A=\{A_1,A_2,A_3,\ldots,A_m\}$ be the attributes, where m is the number of attributes. The proposed system consists of two phases. In the first phase, the data of each attribute are sorted. Initial cutting points are placed between each pair of consecutive values in the attribute that have different class values [Blessie, et al. (2010)]. The next step is to find the mean value within each interval for each class, instead of the mean of all non-missing values in the dataset. Then, for each class, the minimum of its interval means is used to fill the missing values of that class. This increases the relevancy between the instances and attributes. In the second phase, the dataset with the filled-in missing values is classified using the C4.5 classifier and the accuracy of the classifier is analyzed.

3.3 Pseudocode
Let D be the training data set with continuous features $F_i$ and S classes.
For every $F_i$ do:
Phase 1
Step 1
  1.1 Find the maximum ($d_n$) and minimum ($d_0$) values.
  1.2 Sort all distinct values of $F_i$ in ascending order.
  1.3 Initialize all possible interval boundaries, B, with the minimum, the maximum and the midpoints between consecutive values that have different classes, giving the set $B=\{[d_0,d_1],(d_1,d_2],\ldots,(d_{n-1},d_n]\}$.
Step 2
  2.1 For every interval $[d_i,d_j]$, where i is the lower bound and j is the upper bound, find the mean value corresponding to a single class value:

$$\hat{x}_{ij} = \sum_{i:\, x_{ij} \in C_k} \frac{x_{ij}}{n_k} \tag{2}$$

  2.2 Find the minimum value of all the mean values corresponding to each class $C_k$.
  2.3 Fill the missing values of each class $C_k$ with the minimum mean value of the same class $C_k$.
Phase 2
Step
  Calculate the misclassification rate and accuracy by giving the filled-in complete dataset to a classifier.
End

4. Experimental Analysis
Our experiments were carried out using four datasets taken from the UCI Machine Learning Repository: Diabetes, Breast Cancer, Lung Cancer and Iris. Table 1 gives the number of instances and the number of attributes of the datasets used in this paper. The main objective of the experiments conducted in this work is to analyze the efficiency of the C4.5 classification algorithm. In these experiments, missing values are artificially introduced at different rates in different attributes: datasets without missing values are taken and a few values are removed from them at random. The rates of removed values range from 2% to 4%.

Datasets        Instances   Attributes
Diabetes        7           9
Iris
Breast Cancer
Lung Cancer
Table 1. Datasets used for analysis

A. Performance comparison of Diabetes dataset
The original dataset without missing values yields a classification accuracy of 73.83%, and the proposed method increases the accuracy rate to .22%. The performance comparison of five different methods, together with the time taken to execute, is shown in Table 2.

Methods               Time   Misclassification rate
Discend (Proposed)
Table 2: Comparison using the Diabetes dataset

B. Performance comparison of Breast Cancer dataset
The original dataset without missing values yields a classification accuracy of 94.56%, and the proposed method increases the accuracy rate to 94.71%. The performance comparison of five different methods, together with the time taken to execute, is shown in Table 3.
Methods               Time   Misclassification rate
Discend (Proposed)
Table 3: Comparison using the Breast Cancer dataset

C. Performance comparison of IRIS dataset
The original dataset without missing values yields a classification accuracy of 96%, while the most-often method and the proposed method both yield an accuracy rate of 95.33%. The performance comparison of five different methods, together with the time taken to execute, is shown in Table 4.

Methods               Time   Misclassification rate
Discend (Proposed)
Table 4: Comparison using the IRIS dataset

D. Performance comparison of Lung Cancer dataset
The original dataset without missing values yields a classification accuracy of .13%, and the proposed method increases the accuracy rate to 79.42%. The performance comparison of five different methods, together with the time taken to execute, is shown in Table 5.

Methods               Time   Misclassification rate
Discend (Proposed)
Table 5: Comparison using the Lung Cancer dataset
Fig. 1a-1d: Comparison result of C4.5 for 6 methods using 4 datasets. Panels: percentage of accuracy for the Diabetes dataset (1a), the Breast Cancer dataset (1b), the IRIS dataset (1c) and the Lung Cancer dataset (1d).

5. Conclusion and Discussion
From the comparison above, the classification rate of the C4.5 classifier using the proposed method seems to be better than that of the remaining methods for three datasets, the exception being the IRIS dataset. Our experiment for filling the missing values was conducted using MatLab, and the classifier performance was analyzed using Weka 3.6. The missing value problem must be solved before using a dataset, as incomplete data may lead to a high misclassification rate. This work analyses the classification performance of the C4.5 classifier. The proposed approach uses only the numerical attributes to impute the missing values. In future work, it can be extended to handle categorical attributes. From the above comparison, the proposed method seems to be better than the other methods on three of the four datasets, where the accuracy rate is increased. Also, because the missing values are filled with values found within the same class, the relevancy between the instances and the attributes can be increased, which gives a better result.

References
[1] Acuna, E.; Rodriguez, C. (2004): The treatment of missing values and its effect in the classifier accuracy. In: W. Gaul, D. Banks, L. House, F.R. McMorris, P. Arabie (Eds.) Classification, Clustering and Data Mining Applications, Springer-Verlag Berlin-Heidelberg.
[2] Blessie, C.E.; Karthikeyan, E.; Selvaraj, B. (2010): NAD - A Discretization approach for improving interdependency, Journal of Advanced Research in Computer Science, 2(1).
[3] Clark, P.; Niblett, T. (1989): The CN2 induction algorithm. Machine Learning 3.
[4] Jerzy W. Grzymala-Busse; Ming Hu (2005): A Comparison of Several Approaches to Missing Attribute Values in Data Mining, W. Ziarko and Y. Yao (Eds.): RSCTC 2000, LNAI, Springer-Verlag Berlin Heidelberg.
[5] Kalton, G.; Kasprzyk, D. (1986): The treatment of missing survey data. Survey Methodology 12.
[6] Kurgan, L.; Cios, K.J. (2004): CAIM discretization algorithm, IEEE Transactions on Knowledge and Data Engineering 16(2).
[7] Liu, H.; Setiono, R. (1997): Feature selection via discretization, IEEE Transactions on Knowledge and Data Engineering 9(4).
[8] Mehala, B.; Ranjit Jeba Thangaiah, P.; Vivekanandan, K. (2009): Selecting Scalable Algorithms to Deal with Missing Values, International Journal of Recent Trends in Engineering, 1(2).
[9] Mundfrom, D.J.; Whitcomb, A. (1998): Imputing missing values: The effect on the accuracy of classification. Multiple Linear Regression Viewpoints, 25(1).
[10] Quinlan, J.R. (1993): C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo CA.
[11] Tresp, V.; Neuneier, R.; Ahmad, S. (1998): Efficient methods for dealing with missing data in supervised learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in NIPS 7. MIT Press.
[12] Tsai, C.J.; Lee, C.; Yang, W.P. (2008): A Discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences, 1(3).
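Phase 1 of the pseudocode in Section 3.3 can be sketched in Python for a single attribute as follows. This is a minimal sketch under stated assumptions, not the authors' MatLab implementation: the function names `class_change_cuts` and `impute_attribute` are hypothetical, missing values are represented by `None`, the class label of every instance is assumed to be known, and the sample data are invented for illustration. Phase 2 (running C4.5 on the completed data) is not shown.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

def class_change_cuts(values, labels):
    """Step 1: boundaries are the minimum, the maximum, and the midpoints
    between consecutive sorted values whose class labels differ."""
    known = sorted(((v, c) for v, c in zip(values, labels) if v is not None),
                   key=lambda pair: pair[0])
    cuts = [known[0][0]]
    for (v1, c1), (v2, c2) in zip(known, known[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2)
    cuts.append(known[-1][0])
    return cuts

def impute_attribute(values, labels):
    """Steps 2.1 to 2.3: per-class interval means, then fill each missing
    value with the minimum interval mean of its own class."""
    cuts = class_change_cuts(values, labels)
    groups = defaultdict(list)          # (class, interval index) -> values
    for v, c in zip(values, labels):
        if v is not None:
            groups[(c, max(bisect_left(cuts, v) - 1, 0))].append(v)
    fill = {}                           # class -> minimum interval mean
    for (c, _), vs in groups.items():
        m = mean(vs)
        fill[c] = min(fill.get(c, m), m)
    # A class with no known value has no entry in `fill`; its gaps stay None.
    return [fill.get(c) if v is None else v for v, c in zip(values, labels)]

values = [1.0, 2.0, None, 5.0, 6.0, None, 9.0]   # invented sample attribute
labels = ["a", "a", "a", "b", "b", "b", "a"]
print(class_change_cuts(values, labels))  # [1.0, 3.5, 7.5, 9.0]
print(impute_attribute(values, labels))   # [1.0, 2.0, 1.5, 5.0, 6.0, 5.5, 9.0]
```

In this example class "a" has interval means 1.5 and 9.0, so its missing value is filled with 1.5, while class "b" has the single interval mean 5.5.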
More informationClassifying Twitter Data in Multiple Classes Based On Sentiment Class Labels
Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),
More informationComparison of Various Feature Selection Methods in Application to Prototype Best Rules
Comparison of Various Feature Selection Methods in Application to Prototype Best Rules Marcin Blachnik Silesian University of Technology, Electrotechnology Department,Katowice Krasinskiego 8, Poland marcin.blachnik@polsl.pl
More informationFeature Selection Based on Relative Attribute Dependency: An Experimental Study
Feature Selection Based on Relative Attribute Dependency: An Experimental Study Jianchao Han, Ricardo Sanchez, Xiaohua Hu, T.Y. Lin Department of Computer Science, California State University Dominguez
More informationChallenges and Interesting Research Directions in Associative Classification
Challenges and Interesting Research Directions in Associative Classification Fadi Thabtah Department of Management Information Systems Philadelphia University Amman, Jordan Email: FFayez@philadelphia.edu.jo
More informationPerformance Based Study of Association Rule Algorithms On Voter DB
Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,
More informationLOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION TECHNIQUES
8 th International Conference on DEVELOPMENT AND APPLICATION SYSTEMS S u c e a v a, R o m a n i a, M a y 25 27, 2 0 0 6 LOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION
More informationDiscretization and Grouping: Preprocessing Steps for Data Mining
Discretization and Grouping: Preprocessing Steps for Data Mining Petr Berka 1 and Ivan Bruha 2 z Laboratory of Intelligent Systems Prague University of Economic W. Churchill Sq. 4, Prague CZ-13067, Czech
More informationFeature-weighted k-nearest Neighbor Classifier
Proceedings of the 27 IEEE Symposium on Foundations of Computational Intelligence (FOCI 27) Feature-weighted k-nearest Neighbor Classifier Diego P. Vivencio vivencio@comp.uf scar.br Estevam R. Hruschka
More informationA Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection
A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection S. Revathi Ph.D. Research Scholar PG and Research, Department of Computer Science Government Arts
More informationK-means clustering based filter feature selection on high dimensional data
International Journal of Advances in Intelligent Informatics ISSN: 2442-6571 Vol 2, No 1, March 2016, pp. 38-45 38 K-means clustering based filter feature selection on high dimensional data Dewi Pramudi
More informationA Closest Fit Approach to Missing Attribute Values in Preterm Birth Data
A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data Jerzy W. Grzymala-Busse 1, Witold J. Grzymala-Busse 2, and Linda K. Goodwin 3 1 Department of Electrical Engineering and Computer
More informationA Lazy Approach for Machine Learning Algorithms
A Lazy Approach for Machine Learning Algorithms Inés M. Galván, José M. Valls, Nicolas Lecomte and Pedro Isasi Abstract Most machine learning algorithms are eager methods in the sense that a model is generated
More informationLeave-One-Out Support Vector Machines
Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm
More informationComparative Study of Instance Based Learning and Back Propagation for Classification Problems
Comparative Study of Instance Based Learning and Back Propagation for Classification Problems 1 Nadia Kanwal, 2 Erkan Bostanci 1 Department of Computer Science, Lahore College for Women University, Lahore,
More informationRipple Down Rule learner (RIDOR) Classifier for IRIS Dataset
Ripple Down Rule learner (RIDOR) Classifier for IRIS Dataset V.Veeralakshmi Department of Computer Science Bharathiar University, Coimbatore, Tamilnadu veeralakshmi13@gmail.com Dr.D.Ramyachitra Department
More informationSOFTWARE DEFECT PREDICTION USING IMPROVED SUPPORT VECTOR MACHINE CLASSIFIER
International Journal of Mechanical Engineering and Technology (IJMET) Volume 7, Issue 5, September October 2016, pp.417 421, Article ID: IJMET_07_05_041 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=7&itype=5
More informationData with Missing Attribute Values: Generalization of Indiscernibility Relation and Rule Induction
Data with Missing Attribute Values: Generalization of Indiscernibility Relation and Rule Induction Jerzy W. Grzymala-Busse 1,2 1 Department of Electrical Engineering and Computer Science, University of
More informationInternational Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17, ISSN
International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17, www.ijcea.com ISSN 2321-3469 A COMPARATIVE STUDY OF CLASSIFICATION VIA CLUSTERING WITH K-MEANS AND J48
More informationPackage discretization
Type Package Package discretization December 22, 2014 Title Data preprocessing, discretization for classification. Version 1.0-1 Date 2010-12-02 Author HyunJi Kim Maintainer This package is a collection
More informationPartition Based Perturbation for Privacy Preserving Distributed Data Mining
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN
International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 2321-3469 PERFORMANCE ANALYSIS OF CLASSIFICATION ALGORITHMS IN DATA MINING Srikanth Bethu
More informationAMOL MUKUND LONDHE, DR.CHELPA LINGAM
International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 4, Dec 2015, 53-58 IIST COMPARATIVE ANALYSIS OF ANN WITH TRADITIONAL
More informationA Study on Clustering Method by Self-Organizing Map and Information Criteria
A Study on Clustering Method by Self-Organizing Map and Information Criteria Satoru Kato, Tadashi Horiuchi,andYoshioItoh Matsue College of Technology, 4-4 Nishi-ikuma, Matsue, Shimane 90-88, JAPAN, kato@matsue-ct.ac.jp
More informationNORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM
NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM Saroj 1, Ms. Kavita2 1 Student of Masters of Technology, 2 Assistant Professor Department of Computer Science and Engineering JCDM college
More informationPREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY
PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY T.Ramya 1, A.Mithra 2, J.Sathiya 3, T.Abirami 4 1 Assistant Professor, 2,3,4 Nadar Saraswathi college of Arts and Science, Theni, Tamil Nadu (India)
More informationWEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW
ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer
More informationIndex Terms Data Mining, Classification, Rapid Miner. Fig.1. RapidMiner User Interface
A Comparative Study of Classification Methods in Data Mining using RapidMiner Studio Vishnu Kumar Goyal Dept. of Computer Engineering Govt. R.C. Khaitan Polytechnic College, Jaipur, India vishnugoyal_jaipur@yahoo.co.in
More informationConcept Tree Based Clustering Visualization with Shaded Similarity Matrices
Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices
More informationCLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More information