Domain Independent Prediction with Evolutionary Nearest Neighbors.
- Benedict Chandler
Research Summary

Introduction

In January of 1848, a few tiny gold nuggets were discovered on the American River at Coloma, near Sacramento. This triggered one of the largest human migrations in history, as half a million people from around the world descended upon California in search of instant wealth [2]. We live in a data-rich, information-poor environment [1], which calls for a similar migration of computational tools to suit the data and extract valuable information nuggets. Data mining is a multidisciplinary field whose primary objective is to support knowledge workers in extracting information from large volumes of data. In most practical data mining applications, the tool is heavily customized for the specific application domain. There is a need for a generalized data mining tool for at least one major area of data mining. This work is an attempt to investigate, build, and evaluate a generalized prediction framework that can be migrated easily to a new application in order to do meaningful data mining. We propose a nearest neighbor prediction approach with genetic algorithm (GA) based relevance tuning for the particular application domain. Generalizing the tool to enable easy migration with a GA may be computationally prohibitive for large data sets. We therefore propose the use of a vertical, data-mining-ready data structure (P-trees¹) that would enable the tool framework to be computationally efficient in the generalized setting.

¹ P-tree technology is patent pending.

Background

The proposed work falls into the area of research categorized as data mining and knowledge discovery with evolutionary algorithms. The main motivation for applying evolutionary algorithms to data mining tasks is that they are robust and adaptive. Classification (prediction) is probably the most widely studied data mining task [3]. K Nearest Neighbor (KNN) classification is well explored in the literature and has been shown to have good classification (prediction) performance on a wide range of real-world data sets [4]. KNN is simple and straightforward to implement. The use of a distance metric in KNN opens a wide array of opportunities to use evolutionary techniques to tune the metric to a particular application domain. Most existing research uses evolutionary techniques for dimensionality reduction and attribute relevance [4],[5],[6],[7]. In other cases, evolutionary techniques are used to optimize other parameters, such as the optimum k in KNN [4]. Genetic algorithms [8] are parallel, iterative optimizers and have been successfully applied to a broad spectrum of optimization problems [4]. Attribute dimensions can be scaled, using a genetic algorithm, to optimize the classification accuracy of a separate algorithm such as KNN [7]. This artificial tuning process requires the prediction model to be evaluated iteratively, which could be computationally expensive for large data sets.

P-trees are a lossless, compressed, data-mining-ready data structure. This data structure has been successfully applied in data mining applications for real-world data [6],[9],[10],[11]. Efficient computation of the required neighborhood counts leads to a low-cost solution for the iterative evaluation required by the GA-based evolution (tuning). Two major obstacles to quick and easy migration of a tool framework are diversity in data and diversity in domain knowledge. These could be addressed with the use of P-trees and a GA, respectively.

Proposed approach

The main objective of the proposed work is a generalized prediction framework. This should allow easy migration of the tool framework to different application domains.
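The attribute scaling described in the Background, where a GA adjusts per-attribute weights so that a separate KNN classifier becomes more accurate, can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the design proposed here: the function names, the leave-one-out accuracy used as fitness, the truncation selection with one-point crossover and Gaussian mutation, and the toy data set in which attribute 0 separates the classes and attribute 1 is noise.

```python
import random

def weighted_distance(a, b, weights):
    # Squared Euclidean distance with one (hypothetical) relevance weight per attribute.
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def knn_predict(train, query, weights, k=3):
    # train is a list of (features, label); majority vote among the k nearest rows.
    nearest = sorted(train, key=lambda row: weighted_distance(row[0], query, weights))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def fitness(weights, train, k=3):
    # Leave-one-out accuracy of the weighted KNN: the quantity the GA maximizes.
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, x, weights, k) == y
    return hits / len(train)

def evolve_weights(train, n_attrs, pop_size=20, generations=30, k=3, seed=0):
    # Assumes pop_size >= 4 so that two distinct parents can always be sampled.
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, train, k), reverse=True)
        elite = ranked[: pop_size // 2]          # truncation selection; elite survive
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_attrs) if n_attrs > 1 else 0
            child = p1[:cut] + p2[cut:]          # one-point crossover
            j = rng.randrange(n_attrs)           # mutate one weight, clamped to [0, 1]
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, train, k))

# Toy data: attribute 0 separates the classes, attribute 1 is misleading noise.
toy = [((0.0, 5.0), "A"), ((0.1, 0.0), "A"), ((0.2, 9.0), "A"),
       ((1.0, 5.1), "B"), ((0.9, 0.2), "B"), ((1.1, 9.1), "B")]

print(fitness([1.0, 0.0], toy, k=1))  # 1.0: only the relevant attribute counts
print(fitness([0.0, 1.0], toy, k=1))  # 0.0: only the noisy attribute counts
best = evolve_weights(toy, n_attrs=2, pop_size=8, generations=5, k=1)
print(best, fitness(best, toy, k=1))
```

Each fitness evaluation rescans the training data, which is the iterative cost the Background identifies as the bottleneck for large data sets.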
We propose the use of an artificially tuned nearest neighbor type prediction model. We propose exploring all possible tuning parameters for the nearest neighbor prediction model with the use of a GA. For example, the neighborhood search need not be restricted to k neighbors when a GA-optimized influence function is used in the similarity metric (Figure 2). The use of the P-tree data structure allows this work to go beyond classical nearest neighbor classification. In the classical approach the similarity counting is done through expensive database scans; here it is replaced by a collection of logical operations on compressed bit vectors in P-trees.

An outline of the proposed architecture is shown in Figure 1. Training data from the application domain will first be converted to P-trees. These will be used for neighborhood counting in the predictor. The GA will be used to tune the predictor. Finally, the input samples will be predicted with the tuned predictor.

[Figure 1: Proposed outline of architecture. Diagram labels: Application Training Data; Genetic Algorithm; P-tree Engine & Data Repository; Nearest Neighbor Predictor; Data to be predicted; Tuned Predictor; Prediction.]

[Figure 2: Example of two parameters that could be tuned on the similarity metric of the nearest neighbor predictor with the use of the genetic algorithm. Panel labels: Attr. Relevance; Neighborhood influence.]

Proposed Evaluation

With respect to the main objective of this work, the proposed tool framework should be evaluated in at least two diverse application domains to show that it migrates easily regardless of the application domain. It would also be an advantage to choose an application domain with a high potential for return from the use of data mining. Prediction applications in bioinformatics and software project cost estimation are proposed as two initial application areas. In bioinformatics there is an abundance of data [12], with some specific applications, such as protein function prediction, and other less specific classification and prediction applications. In software engineering there is a specific need for good software cost predictions [13] for the mere survival of the industry, along with a general intuition that data mining can provide a reasonable solution. Two major criteria for evaluating the quality of the solution are accuracy and computational cost. In this work more emphasis will be placed on accuracy. The proposed evaluation will test the tool framework against published results of existing solutions. Each selected application domain will be tested with only the migration enabled by the artificial tuning proposed in this work.

Conclusion

As with the human migration in the Gold Rush, we are proposing a tool framework with quick and easy migration across application domains to find valuable information. The two major obstacles to the migration of data mining tools are diversity in application data and diversity in domain knowledge. Data diversity is handled by the use of a uniform and computationally efficient data structure, the P-tree. Diversity in domain knowledge is handled by the use of an evolutionary algorithm, the GA. Successful completion of the proposed work will contribute to the body of knowledge by demonstrating the feasibility of an enabling technology for a computationally intelligent gold rush for information in diverse application domains.
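The replacement of database scans by logical operations on bit vectors, described under the proposed approach, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain, uncompressed Python integers as vertical bit vectors (one per bit position of an attribute), whereas actual P-trees store such vectors in compressed form. The function names, the toy values, and the choice to define a neighborhood as "rows sharing the query's high-order bits" are all hypothetical.

```python
def bit_slices(values, n_bits):
    # Vertical decomposition: one bitmask per bit position, most significant first.
    # Bit i of each mask is set when row i has a 1 at that bit position.
    slices = []
    for b in range(n_bits - 1, -1, -1):
        mask = 0
        for row, v in enumerate(values):
            if (v >> b) & 1:
                mask |= 1 << row
        slices.append(mask)
    return slices

def match_prefix(slices, query, n_bits, prefix_bits, n_rows):
    # Bitmask of rows whose top `prefix_bits` bits equal those of `query`,
    # built only from AND and NOT on the bit slices (no row-by-row scan).
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for i in range(prefix_bits):
        b = n_bits - 1 - i
        s = slices[i]
        result &= s if (query >> b) & 1 else (all_rows & ~s)
    return result

def class_counts(neighborhood, class_masks):
    # Neighborhood count per class: AND with the class mask, then count set bits.
    return {label: bin(neighborhood & mask).count("1")
            for label, mask in class_masks.items()}

# Toy 4-bit attribute for six training rows, with one bitmask per class label.
values = [12, 13, 3, 14, 2, 15]
labels = ["A", "A", "B", "A", "B", "B"]
class_masks = {}
for row, label in enumerate(labels):
    class_masks[label] = class_masks.get(label, 0) | (1 << row)

slices = bit_slices(values, n_bits=4)
wide = match_prefix(slices, query=13, n_bits=4, prefix_bits=2, n_rows=6)
narrow = match_prefix(slices, query=13, n_bits=4, prefix_bits=3, n_rows=6)
print(class_counts(wide, class_masks))    # {'A': 3, 'B': 1}
print(class_counts(narrow, class_masks))  # {'A': 2, 'B': 0}
```

Widening or narrowing the prefix changes the neighborhood without touching individual rows, so counts at several neighborhood sizes could in principle be combined with tuned influence weights rather than a fixed k.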
References

[1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Academic Press, Morgan Kaufmann Publishers.
[2]
[3] A.A. Freitas, A survey of evolutionary algorithms for data mining and knowledge discovery, Advances in Evolutionary Computation, Springer-Verlag, August.
[4] M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and L.C. Jain, Dimensionality reduction using genetic algorithms, IEEE Trans. on Evolutionary Computation, 4(2).
[5] J. Yang and V. Honavar, Feature subset selection using a genetic algorithm, in: H. Liu and H. Motoda (Eds.), Feature Extraction, Construction and Selection: A Data Mining Perspective, Kluwer.
[6] Amal Perera, Anne Denton, Pratap Kotala, William Jockheck, Willy Valdivia Granda, and William Perrizo, P-tree Classification of Yeast Gene Deletion Data, SIGKDD Explorations, Vol. 4, Issue 2, January 2003.
[7] W.F. Punch, E.D. Goodman, M. Pei, L. Chia-Shun, P. Hovland, and R. Enbody, Further research on feature selection and classification using genetic algorithms, Proc. of the Fifth Int. Conf. on Genetic Algorithms, San Mateo, CA.
[8] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley.
[9] Q. Ding, Q. Ding, and W. Perrizo, ARM on RSI Using P-trees, Pacific-Asia KDD Conf., Taipei, May.
[10] Q. Ding, Q. Ding, and W. Perrizo, Decision Tree Classification of Spatial Data Streams Using Peano Count Trees, ACM SAC, Madrid, Spain, March.
[11] M. Khan, Q. Ding, and W. Perrizo, KNN on Data Stream Using P-trees, Pacific-Asia KDD, Taipei, May.
[12] S. Beck and P. Sterk, Genome-scale DNA sequencing: where are we?, Curr. Opin. Biotechnol. 9.
[13] S. Chulani, B. Boehm, and B. Steece, Bayesian analysis of empirical software engineering cost models, IEEE Transactions on Software Engineering, 25(4), July/August 1999.
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationFeature-weighted k-nearest Neighbor Classifier
Proceedings of the 27 IEEE Symposium on Foundations of Computational Intelligence (FOCI 27) Feature-weighted k-nearest Neighbor Classifier Diego P. Vivencio vivencio@comp.uf scar.br Estevam R. Hruschka
More informationClustering: An art of grouping related objects
Clustering: An art of grouping related objects Sumit Kumar, Sunil Verma Abstract- In today s world, clustering has seen many applications due to its ability of binding related data together but there are
More informationA New Genetic Clustering Based Approach in Aspect Mining
Proc. of the 8th WSEAS Int. Conf. on Mathematical Methods and Computational Techniques in Electrical Engineering, Bucharest, October 16-17, 2006 135 A New Genetic Clustering Based Approach in Aspect Mining
More informationOptimization of Association Rule Mining through Genetic Algorithm
Optimization of Association Rule Mining through Genetic Algorithm RUPALI HALDULAKAR School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Prof. JITENDRA
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue:
More informationEfficient Case Based Feature Construction
Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de
More informationMulti-objective Optimization Algorithm based on Magnetotactic Bacterium
Vol.78 (MulGrab 24), pp.6-64 http://dx.doi.org/.4257/astl.24.78. Multi-obective Optimization Algorithm based on Magnetotactic Bacterium Zhidan Xu Institute of Basic Science, Harbin University of Commerce,
More informationEstimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees
Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,
More informationFREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING
FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING Neha V. Sonparote, Professor Vijay B. More. Neha V. Sonparote, Dept. of computer Engineering, MET s Institute of Engineering Nashik, Maharashtra,
More informationWEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1
WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey
More informationEfficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points
Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,
More informationIteration Reduction K Means Clustering Algorithm
Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department
More informationBRACE: A Paradigm For the Discretization of Continuously Valued Data
Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 1
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 1 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationImproved Frequent Pattern Mining Algorithm with Indexing
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.
More information