A genetic algorithm based focused Web crawler for automatic webpage classification

Nancy Goyal, Rajesh Bhatia, Manish Kumar
Computer Science and Engineering, PEC University of Technology, Chandigarh, India

Keywords: Genetic algorithm; webpage classification; feature extraction; focused Web crawler; Java.

Abstract

The rapid increase in the amount of information on the World Wide Web makes it difficult for a user to find information of interest. Search engines use focused Web crawlers to gather information about a particular topic. A focused Web crawler seeks, gathers and maintains webpages relevant to a pre-defined set of topics rather than downloading all webpages. During focused crawling, an automatic webpage classification method is used to determine whether a webpage is on-topic or not. This paper discusses a genetic algorithm based automatic webpage classification technique. In this method, tags and terms are considered as features, and the classifier learns from the webpages in the training set. The best features are selected by a genetic algorithm based fitness optimization technique. Using both tags and terms as features, high precision on test data is achieved.

1. Introduction

The growing popularity of the World Wide Web has led to a very large amount of information being placed on the Web. Processing information obtained from the Web requires large storage and is time consuming; it also loads the processor and degrades its performance. As a result, hardware and software resources are over-utilized without yielding much useful information. To overcome this problem, a focused Web crawler is used. A focused Web crawler crawls the Web to gather webpages based on a predefined set of topics [1]. The relevance of a webpage to a topic can be determined manually or automatically. Manual classification is a process in which webpages are classified by hand into a pre-defined set of categories.
Various open source projects such as DMOZ.org maintain their directories manually and provide a way to find information through a pre-defined set of categories. As the Web grows, the manual approach becomes less effective. During the focused Web crawling process, an automatic webpage classification method is used to ascertain whether the webpage under consideration is on-topic or not [7]. In automatic webpage classification, a set of labelled documents is used to train a classifier, and the classifier is then employed to assign class labels to webpages. The information present on the Web is so large that it presents the classifier with a very large number of terms and categories, which results in high-dimensional data to process. Feature selection can be used to improve the efficiency, scalability and accuracy of the classifier [3]. Feature selection is the process of selecting a subset of features from the webpages.

Machine learning techniques used for classification problems include decision trees, Bayesian classifiers, support vector machines, K-nearest neighbor and genetic algorithms. K-nearest neighbor is the simplest approach but has a long classification time. Support vector machines and decision trees are suitable for classification problems in which the number of features is small [5]. Webpage classification problems are high dimensional and may contain a large number of features, including features formed as combinations such as <tag, term>. In this study, a genetic algorithm is used to select the best features from the large feature set. A genetic algorithm is a fitness optimisation technique based on heredity and evolution. By imitating the natural selection process, a genetic algorithm tries to find an optimal solution by sampling the region of the search space that has a high probability of producing one. It is useful for various applications such as scheduling and ordering [4]. Here it is used for webpage classification.
In this study, a genetic algorithm is proposed that finds the best features for a webpage; the learned features can then be used to classify webpages.

This paper is organised as follows. In section 2, various webpage classifiers are discussed. The proposed genetic algorithm based webpage classification system is described in section 3. In section 4, the data set, the experimental results and a discussion of those results are presented.

2. Related work

A number of approaches to webpage classification are present in the literature. In this section, we discuss approaches that have been studied for automatic webpage classification in focused Web crawlers.

Selamat and Omatu [9] proposed a webpage classification method that uses a neural network for classification. The method is based on principal component analysis and class profile-based features for feature selection. Principal component analysis helps to reduce the feature vector obtained from the document-term matrix. These reduced features, combined with manually selected words, act as the complete feature set that is fed to the neural network for classification.

Another webpage classification method, described in Qi and Davison [7], uses latent semantic analysis and a webpage feature selection process to extract semantic and text features. Two support vector machine classifiers are used so that the two classification results can be checked to vote on the category in which the webpage should be placed.

Saraç and Özel [8] proposed the use of ant colony optimization to select the best features from the feature set obtained from the webpages in the training dataset. C4.5, naïve Bayes and k-nearest neighbor classifiers were used to assign class labels to webpages. Based on the pheromone value, the best feature group is selected using TF-IDF, and the webpages are classified using these classifiers.

Qi and Sun [6] developed a webpage classifier based on a combination of a genetic algorithm and the K-means clustering algorithm. In this approach, a set of keywords is generated for each category from the training webpages. For each keyword, an initial weight is assigned for each category. The genetic algorithm is then used to optimize the keyword weights for each category.

Özel [4] proposed a genetic algorithm based automatic webpage classification system in which both HTML tags and the terms belonging to each tag are used as classification features. The genetic algorithm selects the best feature set by learning an optimal classifier from the positive and negative webpages in the training dataset. Özel [5] also proposed a webpage classification system that uses a genetic algorithm and K-nearest neighbor to select the best features from the feature set in order to improve the run-time performance of the classifier.
Webpages from the test set are then classified based on the top features selected by the genetic algorithm.

Ferdaus and Khan [2] use a genetic algorithm to generate classification rules based on predicting values and a result value. Rules are generated in the form of IF-THEN statements and fed into the genetic algorithm: the IF condition contains the predicting values and the THEN part contains the result value. The best-fitted classification rule is selected to assign the class label to the values.

3. Proposed genetic algorithm based classification system

The proposed system consists of URL filtering, feature extraction, a genetic algorithm based classifier and classification, as shown in fig. 1. In this study, our aim is to determine whether or not a webpage contains information about faculty of Indian origin working in foreign universities, i.e. binary classification is used for the class labels. The process starts with URL filtering, in which a subset of URLs is selected based on keywords related to the faculty of a university. In feature extraction, certain tags and terms are used as features, and the features to keep are selected using the genetic algorithm. In document formation, a 2-D array is created to represent the presence or absence of each feature in each document. The genetic algorithm based classifier learning part consists of (i) coding, (ii) generation of the initial population, (iii) evaluation of the population, (iv) selection, (v) crossover, (vi) mutation and (vii) generation of the new population. Steps (iii) to (vii) are repeated until convergence to learn a (sub)optimal classifier. After the learning process, the learned classifier is used to classify unseen data.

3.1. URL filtering

The URLs present in the training set are crawled up to a certain depth, and the crawled URLs are filtered based on the filtering list. The filtering list consists of keywords related to faculty such as people, staff, directory-staff, all-people etc.
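The URL filtering and document formation steps outlined above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' Java implementation: the function names `filter_urls` and `form_documents`, the example URLs and the three example features are hypothetical, and the keyword test is assumed to be a plain substring match on the URL.

```python
# Filtering list taken from the keywords named in the text.
FILTER_KEYWORDS = ("people", "staff", "directory-staff", "all-people")


def filter_urls(urls, keywords=FILTER_KEYWORDS):
    """Keep only URLs that contain at least one filtering keyword.

    Assumption: a case-insensitive substring match on the whole URL.
    """
    return [url for url in urls if any(k in url.lower() for k in keywords)]


def form_documents(page_features, feature_set):
    """Build the 2-D 0/1 array: one row per URL, one column per feature."""
    return [
        [1 if feature in present else 0 for feature in feature_set]
        for present in page_features
    ]


# Illustrative data only; these URLs and features are made up.
crawled = [
    "https://cs.example.edu/people/faculty",
    "https://cs.example.edu/news/2015",
    "https://ee.example.edu/directory-staff/list",
]
filtered = filter_urls(crawled)

feature_set = [("title", "surname"), ("table", "institute"), ("bold", "department")]
page_features = [
    {("title", "surname"), ("bold", "department")},  # features found on filtered[0]
    {("table", "institute")},                        # features found on filtered[1]
]
documents = form_documents(page_features, feature_set)
```

Each row of `documents` is the binary vector for one filtered URL, which is the form the genetic algorithm based classifier consumes.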
Since the data is unknown, the presence of these words might lead us to the URLs containing faculty information.

3.2. Feature extraction

Tags such as <title>, <h1>, <h2>, <h3>, <h4>, <img>, <b>, <table>, <li>, <a>, <p> and <meta>, which denote title, header at level 1, header at level 2, header at level 3, header at level 4, image, bold, table, list item, anchor, paragraph and metadata respectively, are used to extract the features needed in both the classifier learning and the classification process. After analysis and observation, lists of Indian surnames, cities, institutes, designations, departments and universities are the terms chosen for the above-mentioned tags. A feature set is created consisting of these tags and terms, in which each <tag-terms> pair forms one feature, for example <title-list of surnames>, <table-list of institutes>, <bold-list of departments> etc. The <h1>, <h2>, <h3> and <h4> tags are grouped together to represent a single header, to reduce the number of features extracted.

3.3. Document formation

Filtered URLs and extracted features together form documents. Document formation creates a 2-D array with URLs in the rows and features in the columns, in which the entries are 0 or 1 depending on the
presence or absence of the feature in the document, as shown in equation 1.

$$D(i, j) = \begin{cases} 1, & \text{if feature } j \text{ is present in URL } i \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Here D(i, j) is the document 2-D array, where i denotes the i-th URL from the filtered list T and j denotes the j-th feature from the feature set F. Before this 2-D array is created, stop word removal and stemming using WordNet are performed on the terms fetched from the URLs in the filtered list.

3.4. Coding

A chromosome consists of a list of feature weights, which are real numbers in the range [0, 1], as represented in equation 2.

$$C = (W_{11}, W_{12}, \ldots, W_{1n}, W_{21}, \ldots, W_{ij}, \ldots) \tag{2}$$

Here W_ij denotes the weight of term j in tag i. We used the title, header, image, paragraph, table, list, bold, meta and anchor tags, in this order. In the proposed work, initial weights are assigned randomly and are updated during the genetic algorithm process.

Fig. 1. Flow diagram for the proposed model (boxes: training dataset, testing dataset, URL filtering, feature extraction, document formation, coding, genetic algorithm based classifier with initial population, evaluation, crossover, mutation and new generation, loop test "< gen_size or > avg_fitness_prev", classification, classified webpages).

3.5. Initial population

The initial population consists of pop_size chromosomes generated randomly using the coding scheme. The size of each chromosome equals the size of the feature set. The population size taken in the proposed work is 30.

3.6. Evaluation of population

The fitness of every chromosome in the population is computed by evaluating the cosine similarity of the chromosome with every document, as shown in equation 3, where Cos_simi(C, D_i) denotes the cosine similarity. After the cosine similarities are evaluated, a threshold value is taken as the mean of the cosine similarities of the chromosome over all documents, as shown in equation 4. This threshold value might provide an average
result but does not reduce the overall performance. The average of the cosine similarities of the chromosome with the documents whose similarity exceeds the threshold is then computed; that average is the fitness of the chromosome, as shown in equation 5.

$$\text{Cos\_simi}(C, D_i) = \frac{\sum_{j=1}^{n} C[j]\, D_i[j]}{\sqrt{\sum_{j=1}^{n} C[j]^2}\; \sqrt{\sum_{j=1}^{n} D_i[j]^2}} \tag{3}$$

$$\text{threshold} = \frac{1}{m} \sum_{i=1}^{m} \text{Cos\_simi}(C, D_i) \tag{4}$$

$$\text{Fitness}(C) = \operatorname{avg}\{\, \text{Cos\_simi}(C, D_i) : \text{Cos\_simi}(C, D_i) > \text{threshold} \,\} \tag{5}$$

Here n is the number of elements in the feature set and m is the number of documents in the training dataset; C denotes the chromosome and D_i the i-th document from the training set.

3.7. Selection

For the selection of chromosomes, a novel technique is used in which a dummy chromosome is created as a reference for selection. Each element of the dummy chromosome equals the average of the corresponding elements of all the chromosomes in the population, as shown in equation 6.

$$C_m[i] = \frac{1}{\text{pop\_size}} \sum_{j=1}^{\text{pop\_size}} C[j][i] \tag{6}$$

Here C_m[i] is the i-th element of the dummy chromosome and C[j][i] is the i-th element of the j-th chromosome. The fitness of the dummy chromosome is then computed using equation 5. Chromosomes are selected for further processing based on the minimum difference between the fitness of the dummy chromosome and that of each chromosome in the population.

3.8. Crossover

In the proposed approach, a uniform crossover technique is used, in which a random dummy chromosome of chromosome size is generated containing random weights. Each element of this dummy chromosome is then compared with the crossover probability, as shown in equations 7 and 8.

$$\text{if } r[i] < P_c \text{ then } c_1[i] = P_1[i],\; c_2[i] = P_2[i] \tag{7}$$

$$\text{if } r[i] > P_c \text{ then } c_1[i] = P_2[i],\; c_2[i] = P_1[i] \tag{8}$$

Here P1[i], P2[i], r[i], c1[i] and c2[i] are the i-th feature weights of the first parent chromosome, second parent chromosome, dummy chromosome, first child and second child respectively, and P_c denotes the crossover probability. The fitness of the newly generated children is then computed using equation 5.

Table I.
Example of crossover operation (columns: F1, F2, F3, F4, F5, ..., FN; rows: P1, P2, r, C1, C2)

Consider, for example, the creation of child chromosomes by the crossover operation shown in Table I. F1, F2 and so on up to FN are the features in the feature set. P1, P2 and r are the first parent chromosome, the second parent chromosome and the dummy chromosome. The newly generated chromosomes C1 and C2 are obtained by comparing the dummy chromosome with the crossover probability using equations 7 and 8.

3.9. Mutation

A modified mutation technique is proposed in which mut_no is calculated to determine the number of features in the chromosome that are changed. In equation 9, pop_size is the size of the population, P(m) is the mutation probability and chromosome_size is the number of elements in the chromosome.

$$\text{mut\_no} = \text{pop\_size} \times P(m) \times \text{chromosome\_size} \tag{9}$$

In this technique, a dummy chromosome is calculated in which each element is the average of that feature over all chromosomes in the population of the present iteration, and its fitness is computed. The chromosome to mutate is the one with the minimum difference between its fitness and the fitness of the dummy chromosome, as shown in equation 10.

$$C_s = C_i \quad \text{such that} \quad |\text{fitness}_d - \text{fitness}_i| = \min_{1 \le k \le n} |\text{fitness}_d - \text{fitness}_k| \tag{10}$$

Here C_s is the chromosome selected for mutation and C_i is the i-th chromosome. fitness_d and fitness_i are the fitness of the dummy chromosome and of the i-th chromosome of the population respectively, and n is the number of chromosomes in the population. After the chromosome is selected, mut_no features are selected and their weights are updated based on a random number. Fig. 2 shows the algorithm that generates a new child by mutation. In this algorithm, C is the selected chromosome, C_a is the dummy chromosome and C_m is the newly generated chromosome. Arr is an array that stores the randomly generated positions as intermediate state; j, k and i are simple variables. After the new chromosome is generated, its fitness is computed.
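The evaluation, selection, crossover and mutation steps described in sections 3.6 to 3.9 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: all function names are hypothetical, the cosine similarity is the standard definition, the fitness averages only the similarities above the mean threshold (with an assumed fallback to the threshold when none exceeds it), the direction of the gene swap in the crossover is an assumed reading, and the mutation follows one plausible reading of the paper's mutation algorithm.

```python
import math
import random


def cos_simi(c, d):
    """Cosine similarity of chromosome c and binary document vector d."""
    dot = sum(x * y for x, y in zip(c, d))
    norm = math.sqrt(sum(x * x for x in c)) * math.sqrt(sum(y * y for y in d))
    return dot / norm if norm else 0.0  # assumption: zero vectors score 0


def fitness(c, docs):
    """Mean similarity over the documents that score above the mean threshold."""
    sims = [cos_simi(c, d) for d in docs]
    threshold = sum(sims) / len(sims)
    above = [s for s in sims if s > threshold]
    # Assumption: if no document exceeds the threshold, fall back to it.
    return sum(above) / len(above) if above else threshold


def dummy_chromosome(population):
    """Element-wise average of all chromosomes in the population."""
    return [sum(genes) / len(population) for genes in zip(*population)]


def closest_to_dummy(population, docs, count=1):
    """The count chromosomes whose fitness is nearest the dummy's fitness."""
    target = fitness(dummy_chromosome(population), docs)
    ranked = sorted(population, key=lambda c: abs(fitness(c, docs) - target))
    return ranked[:count]


def uniform_crossover(p1, p2, pc, rng=random):
    """Build two children gene by gene from a random draw per position."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < pc:  # assumed reading: each child copies its own parent
            c1.append(a)
            c2.append(b)
        else:  # otherwise the two genes are swapped
            c1.append(b)
            c2.append(a)
    return c1, c2


def mutate(c, c_avg, mut_no, rng=random):
    """Revisit mut_no random positions; untouched positions keep c's weights."""
    child = list(c)
    for _ in range(mut_no):
        j = rng.randrange(len(c))
        if rng.random() >= c[j]:
            # New weight drawn between the old weight and the dummy's weight.
            low, high = sorted((c[j], c_avg[j]))
            child[j] = rng.uniform(low, high)
    return child


# Illustrative run on made-up data.
rng = random.Random(0)
docs = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
population = [[rng.random() for _ in range(3)] for _ in range(4)]
parent1, parent2 = closest_to_dummy(population, docs, count=2)
child1, child2 = uniform_crossover(parent1, parent2, pc=0.7, rng=rng)
mutant = mutate(child1, dummy_chromosome(population), mut_no=2, rng=rng)
```

Passing an explicit `random.Random` instance keeps a run reproducible, which helps when comparing parameter choices such as the crossover probability.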

Input: Selected chromosome C, Arr, C_a.
Output: Mutated chromosome C_m.
1. k = 0;
2. repeat mut_no times
   a. Generate a random number ran in [0, 1].
   b. Generate a random position j in [1, chromosome_size].
   c. if (ran < C[j]) C_m[j] = C[j].
   d. else C_m[j] = rand(C[j], C_a[j]).
   e. Arr[k++] = j.
3. end of loop.
4. for i = 1 to chromosome_size
   a. if i is present in Arr, continue.
   b. else C_m[i] = C[i].
5. end of for loop.
6. Return C_m.

Fig. 2. Proposed algorithm for mutation

3.10. Generation of new population

All the chromosomes in the population of the current iteration, together with the chromosomes newly generated by crossover and mutation, are sorted by fitness, and the pop_size fittest chromosomes are selected for the next iteration. The average fitness of the newly generated population is then computed.

3.11. Termination condition

To achieve convergence, an improved termination condition is used. The convergence conditions are shown in equation 11.

$$\text{terminate} = \begin{cases} \text{true}, & tp > gensz \ \text{or}\ avgftcrp < avgftpvp \\ \text{false}, & \text{otherwise} \end{cases} \tag{11}$$

Here tp is the total number of chromosomes generated up to the current iteration, gensz is the maximum number of chromosomes that can be generated in the system, and avgftcrp and avgftpvp are the average fitness of the current and previous populations respectively. When the genetic algorithm terminates, the chromosome with the highest fitness among all chromosomes is selected and used for the classification of webpages.

3.12. Classification

Fig. 3 shows the classification process of the proposed algorithm. In the proposed classification process, seed URLs from the testing dataset are crawled up to a certain depth. URL filtering is then performed on the crawled data to filter out the webpages whose URLs do not contain the words in the filtering list. The filtered URLs are passed to the document formation phase, where each is represented as a binary vector whose size equals the number of features considered. Then the Cos_simi of the webpage D and the best-fitted chromosome C is computed. If Cos_simi is greater than the threshold, the webpage is marked as relevant; otherwise it is marked as irrelevant.

Fig. 3. Proposed classification process (boxes: start, testing dataset, seed URLs, crawl up to a certain depth, URL filtering, document formation (D), best-fitted chromosome (C), test "Cos_simi(C, D) > threshold" with yes leading to relevant and no leading to irrelevant, stop).

4. Results and discussion

In this section, the experimental setup and the results obtained are discussed. All implementations for the experiments were made using the NetBeans IDE under the Windows 7 operating system. The hardware used in the experiment had 3 GB of RAM and an Intel Core i3 CPU M 2.53 GHz processor.

4.1. Dataset

To obtain information about academicians of Indian origin working abroad, the dataset consists of the websites of foreign universities. In this experiment, the website of Stanford University is used as the initial URL. In order to create the dataset,
this URL is crawled up to depth 6 to extract all URLs from the Stanford domain. This dataset of URLs is then filtered based on the URL filtering list, which consists of words such as faculty, directory, people, staff, people-all, directory-people etc. The filtered URLs are all those URLs that contain one of these words in the URL itself. This set of webpages contains irrelevant as well as relevant webpages.

For the dataset, features are extracted based on tags and terms. The tags used are title <title>, header (<h1>, <h2>, <h3>, <h4>), image <img>, bold <b>, paragraph <p>, table <td>, list <li> and anchor <a>. The terms consist of lists of surnames, institutes, cities, departments and designations. The surnames list contains Indian surnames, the institutes list consists of educational institutes in India, and the cities list consists of cities of India. The departments list consists of departments related to science and technology in foreign universities, and the designations list consists of faculty designations such as professor, assistant professor and associate professor. After analysis, the feature set is created from combinations of tags and terms such as title-designation, title-surnames, header-department, list-cities, table-institutes etc. For each filtered URL, document formation takes place, in which the presence or absence of each feature from the feature set is marked as 1 or 0 respectively in the 2-D document matrix. For example, the presence of a surname in the title marks the corresponding <title-surname> feature as 1.

4.2. Genetic algorithm parameters

Genetic algorithm parameters were determined experimentally so that they were a good choice for our system. The parameters population size = 30, generation size = 400, crossover probability = 0.7 and mutation probability = 0.5 were chosen after analysis and observation. The learning process took 38 iterations to converge to the best chromosome.

5. RESULTS

Table 2 shows the number of webpages obtained by the proposed system after every stage.

Table 2: No. of webpages obtained at every stage of the proposed system
Stages | No. of webpages
Crawled up to depth 6 |
URL filtering |
Total faculty of Stanford University |
Indian origin faculty |

After the Stanford University website is crawled up to a certain depth, the proposed system is verified and validated based on the results obtained. Precision is taken as the performance measure of the proposed system. It is defined as the ratio of the number of relevant retrieved items to the total number of retrieved items, as shown in equation 12.

$$\text{Precision} = \frac{\#\,\text{relevant retrieved items}}{\#\,\text{retrieved items}} \tag{12}$$

Using the proposed approach, a precision of around 80% is achieved.

6. CONCLUSION

A focused Web crawler seeks, acquires and gathers webpages relevant to a pre-defined set of topics. In this paper, a genetic algorithm based focused Web crawler is proposed for classifying webpages as relevant or irrelevant. In this approach, the best chromosome is obtained after the learning phase of the genetic algorithm. This chromosome holds the best weighted feature set. The Stanford University website is crawled for testing, and based on this chromosome all the URLs are classified as relevant or irrelevant, that is, according to whether or not the URL contains information about faculty of Indian origin. With the proposed approach, we are able to achieve a precision of up to 80%. The precision of the proposed approach can be further improved by using more features.

References

[1] Chakrabarti, Soumen, Martin Van den Berg, and Byron Dom. "Focused crawling: a new approach to topic-specific Web resource discovery." Computer Networks (1999):
[2] Ferdaus, Abu Ahmed, and Mehnaj Afrin Khan. "A Genetic Algorithm Approach using Improved Fitness Function for Classification Rule Mining." International Journal of Computer Applications (2014).
[3] Korde, Vandana, and C. Namrata Mahender. "Text classification and classifiers: A survey." International Journal of Artificial Intelligence & Applications (IJAIA) 3.2 (2012):
[4] Özel, Selma Ayşe. "A web page classification system based on a genetic algorithm using tagged-terms as features." Expert Systems with Applications 38.4 (2011):
[5] Özel, Selma Ayşe. "A genetic algorithm based optimal feature selection for web page classification." Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on. IEEE,
[6] Qi, Dehu, and Bo Sun. "A genetic k-means approaches for automated web page classification." Information Reuse and Integration, IRI Proceedings of the 2004 IEEE International Conference on. IEEE,
[7] Qi, Xiaoguang, and Brian D. Davison. "Web page classification: Features and algorithms." ACM Computing Surveys (CSUR) 41.2 (2009): 12.
[8] Saraç, Esra, and Selma Ayşe Özel. "An Ant Colony Optimization Based Feature Selection for Web Page Classification." The Scientific World Journal 2014 (2014).
[9] Selamat, Ali, and Sigeru Omatu. "Web page feature selection and classification using neural networks." Information Sciences 158 (2004):


Information Fusion Dr. B. K. Panigrahi

Information Fusion Dr. B. K. Panigrahi Information Fusion By Dr. B. K. Panigrahi Asst. Professor Department of Electrical Engineering IIT Delhi, New Delhi-110016 01/12/2007 1 Introduction Classification OUTLINE K-fold cross Validation Feature

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Template Extraction from Heterogeneous Web Pages

Template Extraction from Heterogeneous Web Pages Template Extraction from Heterogeneous Web Pages 1 Mrs. Harshal H. Kulkarni, 2 Mrs. Manasi k. Kulkarni Asst. Professor, Pune University, (PESMCOE, Pune), Pune, India Abstract: Templates are used by many

More information

A Review of K-mean Algorithm

A Review of K-mean Algorithm A Review of K-mean Algorithm Jyoti Yadav #1, Monika Sharma *2 1 PG Student, CSE Department, M.D.U Rohtak, Haryana, India 2 Assistant Professor, IT Department, M.D.U Rohtak, Haryana, India Abstract Cluster

More information

International Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System

International Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 3, March -2017 A Facebook Profile Based TV Shows and Movies Recommendation

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

A Review on Identifying the Main Content From Web Pages

A Review on Identifying the Main Content From Web Pages A Review on Identifying the Main Content From Web Pages Madhura R. Kaddu 1, Dr. R. B. Kulkarni 2 1, 2 Department of Computer Scienece and Engineering, Walchand Institute of Technology, Solapur University,

More information

Kyrre Glette INF3490 Evolvable Hardware Cartesian Genetic Programming

Kyrre Glette INF3490 Evolvable Hardware Cartesian Genetic Programming Kyrre Glette kyrrehg@ifi INF3490 Evolvable Hardware Cartesian Genetic Programming Overview Introduction to Evolvable Hardware (EHW) Cartesian Genetic Programming Applications of EHW 3 Evolvable Hardware

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2

A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2 Chapter 5 A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2 Graph Matching has attracted the exploration of applying new computing paradigms because of the large number of applications

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Evaluation Methods for Focused Crawling

Evaluation Methods for Focused Crawling Evaluation Methods for Focused Crawling Andrea Passerini, Paolo Frasconi, and Giovanni Soda DSI, University of Florence, ITALY {passerini,paolo,giovanni}@dsi.ing.unifi.it Abstract. The exponential growth

More information

Mobile Agent Routing for Query Retrieval Using Genetic Algorithm

Mobile Agent Routing for Query Retrieval Using Genetic Algorithm 1 Mobile Agent Routing for Query Retrieval Using Genetic Algorithm A. Selamat a, b, M. H. Selamat a and S. Omatu b a Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia,

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

ACCELERATING THE ANT COLONY OPTIMIZATION

ACCELERATING THE ANT COLONY OPTIMIZATION ACCELERATING THE ANT COLONY OPTIMIZATION BY SMART ANTS, USING GENETIC OPERATOR Hassan Ismkhan Department of Computer Engineering, University of Bonab, Bonab, East Azerbaijan, Iran H.Ismkhan@bonabu.ac.ir

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

Conclusions. Chapter Summary of our contributions

Conclusions. Chapter Summary of our contributions Chapter 1 Conclusions During this thesis, We studied Web crawling at many different levels. Our main objectives were to develop a model for Web crawling, to study crawling strategies and to build a Web

More information

A Supervised Method for Multi-keyword Web Crawling on Web Forums

A Supervised Method for Multi-keyword Web Crawling on Web Forums Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 2, February 2014,

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

Literature Review On Implementing Binary Knapsack problem

Literature Review On Implementing Binary Knapsack problem Literature Review On Implementing Binary Knapsack problem Ms. Niyati Raj, Prof. Jahnavi Vitthalpura PG student Department of Information Technology, L.D. College of Engineering, Ahmedabad, India Assistant

More information

Automated Online News Classification with Personalization

Automated Online News Classification with Personalization Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798

More information

Monika Maharishi Dayanand University Rohtak

Monika Maharishi Dayanand University Rohtak Performance enhancement for Text Data Mining using k means clustering based genetic optimization (KMGO) Monika Maharishi Dayanand University Rohtak ABSTRACT For discovering hidden patterns and structures

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

A New Approach for Energy Efficient Routing in MANETs Using Multi Objective Genetic Algorithm

A New Approach for Energy Efficient Routing in MANETs Using Multi Objective Genetic Algorithm A New Approach for Energy Efficient in MANETs Using Multi Objective Genetic Algorithm Neha Agarwal, Neeraj Manglani Abstract Mobile ad hoc networks (MANET) are selfcreating networks They contain short

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Tag-based Social Interest Discovery

Tag-based Social Interest Discovery Tag-based Social Interest Discovery Xin Li / Lei Guo / Yihong (Eric) Zhao Yahoo!Inc 2008 Presented by: Tuan Anh Le (aletuan@vub.ac.be) 1 Outline Introduction Data set collection & Pre-processing Architecture

More information

SIMULATION APPROACH OF CUTTING TOOL MOVEMENT USING ARTIFICIAL INTELLIGENCE METHOD

SIMULATION APPROACH OF CUTTING TOOL MOVEMENT USING ARTIFICIAL INTELLIGENCE METHOD Journal of Engineering Science and Technology Special Issue on 4th International Technical Conference 2014, June (2015) 35-44 School of Engineering, Taylor s University SIMULATION APPROACH OF CUTTING TOOL

More information

Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Midterm Examination CS 540-2: Introduction to Artificial Intelligence Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State

More information

K-Means Clustering With Initial Centroids Based On Difference Operator

K-Means Clustering With Initial Centroids Based On Difference Operator K-Means Clustering With Initial Centroids Based On Difference Operator Satish Chaurasiya 1, Dr.Ratish Agrawal 2 M.Tech Student, School of Information and Technology, R.G.P.V, Bhopal, India Assistant Professor,

More information

A Comparative Study of Selected Classification Algorithms of Data Mining

A Comparative Study of Selected Classification Algorithms of Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220

More information

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm K.Parimala, Assistant Professor, MCA Department, NMS.S.Vellaichamy Nadar College, Madurai, Dr.V.Palanisamy,

More information

Network Routing Protocol using Genetic Algorithms

Network Routing Protocol using Genetic Algorithms International Journal of Electrical & Computer Sciences IJECS-IJENS Vol:0 No:02 40 Network Routing Protocol using Genetic Algorithms Gihan Nagib and Wahied G. Ali Abstract This paper aims to develop a

More information

A Hybrid Genetic Algorithm for the Distributed Permutation Flowshop Scheduling Problem Yan Li 1, a*, Zhigang Chen 2, b

A Hybrid Genetic Algorithm for the Distributed Permutation Flowshop Scheduling Problem Yan Li 1, a*, Zhigang Chen 2, b International Conference on Information Technology and Management Innovation (ICITMI 2015) A Hybrid Genetic Algorithm for the Distributed Permutation Flowshop Scheduling Problem Yan Li 1, a*, Zhigang Chen

More information

Normalization based K means Clustering Algorithm

Normalization based K means Clustering Algorithm Normalization based K means Clustering Algorithm Deepali Virmani 1,Shweta Taneja 2,Geetika Malhotra 3 1 Department of Computer Science,Bhagwan Parshuram Institute of Technology,New Delhi Email:deepalivirmani@gmail.com

More information

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS Jim Gasvoda and Qin Ding Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA {jmg289, qding}@psu.edu

More information

Regression Based Cluster Formation for Enhancement of Lifetime of WSN

Regression Based Cluster Formation for Enhancement of Lifetime of WSN Regression Based Cluster Formation for Enhancement of Lifetime of WSN K. Lakshmi Joshitha Assistant Professor Sri Sai Ram Engineering College Chennai, India lakshmijoshitha@yahoo.com A. Gangasri PG Scholar

More information

OCR For Handwritten Marathi Script

OCR For Handwritten Marathi Script International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 OCR For Handwritten Marathi Script Mrs.Vinaya. S. Tapkir 1, Mrs.Sushma.D.Shelke 2 1 Maharashtra Academy Of Engineering,

More information

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Evaluating the Usefulness of Sentiment Information for Focused Crawlers Evaluating the Usefulness of Sentiment Information for Focused Crawlers Tianjun Fu 1, Ahmed Abbasi 2, Daniel Zeng 1, Hsinchun Chen 1 University of Arizona 1, University of Wisconsin-Milwaukee 2 futj@email.arizona.edu,

More information

Supervised classification of law area in the legal domain

Supervised classification of law area in the legal domain AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

A New Selection Operator - CSM in Genetic Algorithms for Solving the TSP

A New Selection Operator - CSM in Genetic Algorithms for Solving the TSP A New Selection Operator - CSM in Genetic Algorithms for Solving the TSP Wael Raef Alkhayri Fahed Al duwairi High School Aljabereyah, Kuwait Suhail Sami Owais Applied Science Private University Amman,

More information

SNS College of Technology, Coimbatore, India

SNS College of Technology, Coimbatore, India Support Vector Machine: An efficient classifier for Method Level Bug Prediction using Information Gain 1 M.Vaijayanthi and 2 M. Nithya, 1,2 Assistant Professor, Department of Computer Science and Engineering,

More information

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information

Santa Fe Trail Problem Solution Using Grammatical Evolution

Santa Fe Trail Problem Solution Using Grammatical Evolution 2012 International Conference on Industrial and Intelligent Information (ICIII 2012) IPCSIT vol.31 (2012) (2012) IACSIT Press, Singapore Santa Fe Trail Problem Solution Using Grammatical Evolution Hideyuki

More information

Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces

Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces Rahul Shinde 1, Snehal Virkar 1, Shradha Kaphare 1, Prof. D. N. Wavhal 2 B. E Student, Department of Computer Engineering,

More information

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine International Journal of Scientific & Engineering Research Volume 2, Issue 12, December-2011 1 Web Search Engine G.Hanumantha Rao*, G.NarenderΨ, B.Srinivasa Rao+, M.Srilatha* Abstract This paper explains

More information

Extraction of Semantic Text Portion Related to Anchor Link

Extraction of Semantic Text Portion Related to Anchor Link 1834 IEICE TRANS. INF. & SYST., VOL.E89 D, NO.6 JUNE 2006 PAPER Special Section on Human Communication II Extraction of Semantic Text Portion Related to Anchor Link Bui Quang HUNG a), Masanori OTSUBO,

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Chapter 8 The C 4.5*stat algorithm

Chapter 8 The C 4.5*stat algorithm 109 The C 4.5*stat algorithm This chapter explains a new algorithm namely C 4.5*stat for numeric data sets. It is a variant of the C 4.5 algorithm and it uses variance instead of information gain for the

More information

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?)

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) SKIP - May 2004 Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) S. G. Hohmann, Electronic Vision(s), Kirchhoff Institut für Physik, Universität Heidelberg Hardware Neuronale Netzwerke

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

A Modified Genetic Algorithm for Process Scheduling in Distributed System

A Modified Genetic Algorithm for Process Scheduling in Distributed System A Modified Genetic Algorithm for Process Scheduling in Distributed System Vinay Harsora B.V.M. Engineering College Charatar Vidya Mandal Vallabh Vidyanagar, India Dr.Apurva Shah G.H.Patel College of Engineering

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, 2014 ISSN 2278 5485 EISSN 2278 5477 discovery Science Comparative Study of Classification Algorithms Using Data Mining Akhila

More information

A Genetic Algorithm Approach for Clustering

A Genetic Algorithm Approach for Clustering www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 6 June, 2014 Page No. 6442-6447 A Genetic Algorithm Approach for Clustering Mamta Mor 1, Poonam Gupta

More information

A System s Approach Towards Domain Identification of Web Pages

A System s Approach Towards Domain Identification of Web Pages A System s Approach Towards Domain Identification of Web Pages Sonali Gupta Department of Computer Engineering YMCA University of Science & Technology Faridabad, India Sonali.goyal@yahoo.com Komal Kumar

More information

Application of rough ensemble classifier to web services categorization and focused crawling

Application of rough ensemble classifier to web services categorization and focused crawling With the expected growth of the number of Web services available on the web, the need for mechanisms that enable the automatic categorization to organize this vast amount of data, becomes important. A

More information

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Ritika Luthra Research Scholar Chandigarh University Gulshan Goyal Associate Professor Chandigarh University ABSTRACT Image Skeletonization

More information

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India.

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India. Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Training Artificial

More information

CSE4334/5334 DATA MINING

CSE4334/5334 DATA MINING CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy

More information

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Understanding Rule Behavior through Apriori Algorithm over Social Network Data Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172

More information

An Efficient Analysis for High Dimensional Dataset Using K-Means Hybridization with Ant Colony Optimization Algorithm

An Efficient Analysis for High Dimensional Dataset Using K-Means Hybridization with Ant Colony Optimization Algorithm An Efficient Analysis for High Dimensional Dataset Using K-Means Hybridization with Ant Colony Optimization Algorithm Prabha S. 1, Arun Prabha K. 2 1 Research Scholar, Department of Computer Science, Vellalar

More information

[Kaur, 5(8): August 2018] ISSN DOI /zenodo Impact Factor

[Kaur, 5(8): August 2018] ISSN DOI /zenodo Impact Factor GLOBAL JOURNAL OF ENGINEERING SCIENCE AND RESEARCHES EVOLUTIONARY METAHEURISTIC ALGORITHMS FOR FEATURE SELECTION: A SURVEY Sandeep Kaur *1 & Vinay Chopra 2 *1 Research Scholar, Computer Science and Engineering,

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

THE WEB SEARCH ENGINE

THE WEB SEARCH ENGINE International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) Vol.1, Issue 2 Dec 2011 54-60 TJPRC Pvt. Ltd., THE WEB SEARCH ENGINE Mr.G. HANUMANTHA RAO hanu.abc@gmail.com

More information

A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES

A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES Narsaiah Putta Assistant professor Department of CSE, VASAVI College of Engineering, Hyderabad, Telangana, India Abstract Abstract An Classification

More information