A MINING TECHNIQUE FOR WEB DATA USING CLUSTERING
Ms. Chhaya M. Meshram 1, Prof. Rahila Sheikh 2
1 B.D.C.O.E. Sevagram, 2 R.G.C.E.R.T. Chandrapur

Abstract- Web text mining is an important branch of data mining. Text mining is the process of searching large volumes of documents for certain keywords or key phrases. Web mining is an extension of text mining: it is a new field that integrates data and text mining within a website. It enhances the website with intelligent behavior, such as suggesting related links or recommending new products to the consumer. Classification and clustering are data mining activities that extract meaningful new information from data. Clustering enables one to discover hidden similarity and key concepts. Any clustering technique relies on a data representation model, a similarity measure, a cluster model, and a clustering algorithm. Classification is a form of data analysis that can be used to gather and describe important data sets; it is used to estimate the categorical label of a data object. The objective of this paper is to provide a new web text mining model which includes a query directed web page clustering algorithm and the vector space model (VSM).

Keywords- Clustering, Text Mining, VSM, Web Text Mining

1. INTRODUCTION
The World Wide Web is rapidly emerging as an important medium for the dissemination of information on a wide range of topics. This increases the need for techniques that unveil the inherent structure in the underlying data. Clustering is one of these. Clustering enables one to discover hidden similarity and key concepts. Any clustering technique relies on a data representation model, a similarity measure, a cluster model, and a clustering algorithm.
Web search is difficult because it is hard for users to construct queries that are both sufficiently descriptive and sufficiently discriminating to find just the web pages relevant to the user's search goal. Queries are often ambiguous: words and phrases are frequently polysemous, and user search goals are often narrower in scope than the queries used to express them. This ambiguity leads to search result sets containing distinct page groups that meet different user search goals. Users must often refine their search by modifying the query to filter out the irrelevant results. To refine queries effectively, users must understand the result set, but this is time consuming if the result set is unorganized.

Web page clustering is one approach for helping users both to comprehend the result set and to refine the query. Web page clustering algorithms identify semantically meaningful groups of web pages and present these to the user as clusters. The clusters provide an overview of the contents of the result set, and when a cluster is selected the result set is refined to just the relevant pages in that cluster. Clustering performance is very important for usability. If cluster quality is poor, the clusters will be semantically meaningless or will contain many irrelevant pages. If cluster coverage is poor, clusters representing useful groups of pages will be missing, or the clusters will be missing many relevant pages. This paper uses a query directed web page clustering algorithm that gives better clustering performance than other clustering algorithms.
It has five key innovations: a new query directed cluster quality guide that uses the relationship between clusters and the query; an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap; a new cluster splitting method that fixes the cluster chaining (drifting) problem; an improved heuristic for cluster selection that uses the query directed cluster quality guide; and a new method of improving clusters by ranking the pages by relevance to the cluster.

The objective of this paper is to provide a new web text mining model. Section 2 describes the new web text mining model. Section 3 describes the web page clustering technique and the vector space model used to show the similarity between query and document. Experiments, interpretation and discussion are presented in Section 4. Section 5 provides the conclusion.

2. WEB TEXT MINING MODEL
The proposed web text mining model consists of several phases. First, the query is given to a search engine, which returns 100 URLs related to that query. The summaries of the extracted URLs are then passed through various preprocessing phases. Next, clusters are formed using the query directed web page clustering algorithm. Finally, the vector space model is used to show the similarity between the query and each document.
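The final phase of the model can be illustrated with a minimal sketch. The paper does not state which term weighting or similarity function its VSM uses, so raw term frequencies and cosine similarity are assumed here; the names `tf_vector` and `cosine_similarity` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and page summaries, used only for illustration.
query = tf_vector("sachin tendulkar cricket")
documents = {
    "doc1": tf_vector("sachin tendulkar scored a century in the cricket match"),
    "doc2": tf_vector("stock markets closed higher today"),
}

# Rank documents by similarity to the query, most similar first.
ranking = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
```

A real system would normally apply the preprocessing phases (stop word filtering, stemming) before building the vectors, and might use a weighting such as TF-IDF instead of raw counts.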
Fig:- Proposed web text mining model (block labels: Input Query (Phrase); Download Web Documents; Lower Case Conversion; Symbol Filter with Symbol List; Tag Filter; Stop Words Filter with Stop Words List; Stemmer; WordNet; Base Clustering; Merging; Splitting; Selection; Cleaning; VSM algorithm for finding similarity between query & document; List of clusters along with Documents)

3. WEB PAGE CLUSTERING
The proposed web text mining system is implemented using the query directed web page clustering algorithm. This algorithm gives better clustering performance than other clustering algorithms. The original algorithm accepts a single word as the query, whereas the proposed system also accepts a multi-word query. The algorithm has five key innovations, described below.

3.1 Base Clustering
A base cluster is described by a single word and consists of all the pages containing that word. Equivalently, base clusters are single-word search refinements based on the current search results. After standard page preprocessing, the algorithm constructs a collection of base clusters, one for every word that occurs in at least 4% of the pages. Using a lower threshold increases clustering performance at the cost of algorithm speed. The algorithm computes the query distance of each base cluster, i.e. its distance from the query, using the Normalized Google Distance (NGD):

$NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$

where f(x) and f(y) are the number of hits for words x and y, respectively, f(x, y) is the number of hits for pages containing both x and y, and M is the total number of web pages that Google indexes.

3.2 Merging
The algorithm constructs larger clusters by merging clusters together. Each cluster (c) is constructed from a set of base clusters, and a cluster is described by the word that describes the cluster's largest base cluster. However, the set of pages in a cluster is not necessarily all the pages in its base clusters. A page is only included in the cluster if it is present in enough of the cluster's base clusters. This threshold should increase with the number of base clusters in the cluster, but should not increase steeply, so the algorithm uses a log function: a cluster is the set of pages that are in at least $\log_2(|base(c)| + 1)$ of the cluster's base clusters.

Initially there is a singleton cluster for each base cluster. The algorithm merges clusters using single-link clustering over a relatedness graph. The relatedness graph has the clusters as vertices and has an edge between any two clusters that are sufficiently similar. Single-link clustering merges together all clusters that are part of the same connected component of the graph.

3.3 Splitting
Each cluster now contains at least all the base clusters that relate to one idea; this is assured because single-link clustering merges all related clusters. But single-link clustering, even with the improved similarity function, can produce clusters containing multiple ideas and irrelevant base clusters due to cluster chaining (drifting). Such clusters need to be split. Interestingly, it is easier to split such a compound cluster than to prevent its formation in the first place, because the splitting can take into account the final cluster, whereas the merging process cannot.

The algorithm uses a distance measure with three components: the number of paths between the two sub-clusters on the relatedness graph of length one (onelinks), the number of paths of length two (twolinks), and the average distance from base clusters in one sub-cluster to base clusters in the other sub-cluster (avgdist).

Dist(c1, c2) = onelinks + 0.5 twolinks avgdist(c1, c2)

$avgdist(c_1, c_2) = \frac{\sum_{b_1 \in base(c_1)} \sum_{b_2 \in base(c_2)} len(b_1, b_2)}{|base(c_1)| \times |base(c_2)|}$

where len(b1, b2) is the path length between two base clusters in the relatedness graph.
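The query distance of Section 3.1 and the merging rules of Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, hit counts are passed in rather than fetched from a search engine, and the edges of the relatedness graph are taken as given because the similarity test that creates them is not specified in this paper. Returning infinity for terms that never co-occur is also an assumption.

```python
import math
from collections import defaultdict


def ngd(fx, fy, fxy, total_pages):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y) and index size M."""
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # assumption: terms that never co-occur are maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(total_pages) - min(log_fx, log_fy))


def cluster_pages(base_clusters):
    """Pages that occur in at least log2(|base(c)| + 1) of the cluster's base clusters."""
    threshold = math.log2(len(base_clusters) + 1)
    counts = defaultdict(int)
    for pages in base_clusters:
        for page in pages:
            counts[page] += 1
    return {page for page, n in counts.items() if n >= threshold}


def single_link_merge(nodes, edges):
    """Group nodes into the connected components of the relatedness graph (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    components = defaultdict(set)
    for n in nodes:
        components[find(n)].add(n)
    return list(components.values())


# Hypothetical base clusters (word -> set of page ids) and one relatedness edge.
base = {"cricket": {1, 2, 3}, "tendulkar": {2, 3, 4}, "stock": {7, 8}}
merged = single_link_merge(base, [("cricket", "tendulkar")])
pages = cluster_pages([base["cricket"], base["tendulkar"]])
```

With two base clusters the threshold is log2(3), about 1.58, so a page must appear in both base clusters to belong to the merged cluster; with a single base cluster the threshold is 1 and every page is kept.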
3.4 Selection
At this stage, the algorithm has a small set of coherent clusters. However, there will still be more clusters than can be presented to the user, so the algorithm needs to select the best subset of the clusters to present. Ideally, these clusters should be high quality clusters that cover all the pages in the original set with minimal overlap. The algorithm uses the ESTC cluster selection algorithm [36] with an improved heuristic, H(C), to select a set of clusters to show the user. The ESTC cluster selection algorithm uses the heuristic with a 3-step look-ahead hill-climbing search. To evaluate a candidate set of clusters, C, the new heuristic considers the number of pages covered by the clusters (CP), the number of distinct pages covered by the clusters (CD), the number of pages not covered by any of the clusters (CO), and the quality of each cluster (q(c)):

$H(C) = \sum_{c \in C} q(c) - \beta \cdot CO - \gamma \cdot (CP - CD)$

Here $\beta$ and $\gamma$ stand for the heuristic's two weighting constants, whose symbols and values are not legible in this copy.

3.5 Cleaning
Base clusters are sometimes formed from polysemous words, and therefore clusters can contain pages that cover different topics. Since a cluster should relate to only one topic, pages from other topics are irrelevant. The algorithm computes the relevance of each page in each cluster and removes irrelevant pages. The relevance of a page to a cluster is based on the number and size of the cluster's base clusters of which the page is a member. Page relevance varies between 0 and 1, with 0 being a page that is completely irrelevant to the cluster. Page relevance is computed as the sum of the sizes of the cluster's base clusters of which the page is a member, divided by the sum of the sizes of all of the cluster's base clusters:

$relevance(p, c) = \frac{\sum_{\{b \mid b \in base(c) \wedge p \in b\}} |b|}{\sum_{b \in base(c)} |b|}$

4. MEASUREMENTS, EXPERIMENTS & RESULTS
This paper has presented a new web text mining model. This model is based on clustering.

4.1 Measurements
Clusterings have been evaluated using a wide variety of measurements.
Cluster purity measures are based on three standard information retrieval measures: precision, recall, and F-measure.

$P(c, t) = \text{Precision} = \frac{|D_{c,t}|}{|D_c|}$

$R(c, t) = \text{Recall} = \frac{|D_{c,t}|}{|D_t|}$

$F(c, t) = \text{F-measure} = \frac{2 \cdot P(c, t) \cdot R(c, t)}{P(c, t) + R(c, t)}$

where C is a set of clusters, T is a set of topics, D is a set of pages, D_c is the set of pages in cluster c, D_t is the set of pages in topic t, and D_{c,t} is the set of pages in cluster c that belong to topic t.

Purity assumes that a cluster represents the topic with the highest precision. F assumes that a cluster represents the topic with the highest F-measure. Entropy and mutual information are also used to measure the clusters.

4.2 Experiments
Fig:- Precision & Recall
Fig:- Entropy & Mutual Information

4.3 Results
A. Graphical User Interface
The GUI consists of the following components.
1. A Web Links component that shows the list of all downloaded web pages.
2. A Search button along with a text field.
3. A list of base clusters (Cluster List).
4. A table which contains the clusters related to the respective clusters.
5. A list of all URLs for a single cluster (Document List).
6. Four counts showing the total URLs searched (Total Links), the URLs for a single cluster (Document Links), and the time taken by the search engine and by QDC.
7. A Show Results button.

Fig:- Graphical User Interface

B. URLs for All downloaded web pages
The database for phrases consists of 100 web pages. Whenever the user enters a query phrase, the system checks the database for pages that contain the phrase and keeps all such pages in the document list. The different interpretations of the phrase "sachin tendulkar cricket" are: sachin, tendulkar, cricket, sachin tendulkar, sachin tendulkar cricket.

Fig:- URLs for All downloaded web pages

After completion of the search process, a list of clusters, i.e. the different interpretations, is shown. The application shows the time taken by the search engine and by the algorithm. It also shows the total web pages searched and the filtered web pages that are related to the selected cluster.

C. Application showing cluster results
The application shows the list of web links related to the given query. Double-click a link shown in Web Links to see the Link Summary dialog.

Fig:- Application showing cluster results

The application shows, in the table on the right-hand side, the list of clusters related to the given query. Double-click a single row of the table to see the similarity value of that cluster in the given documents.

Fig:- Showing live web page which is related to searched phrase query
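The measures defined in Section 4.1 can be made concrete with a small sketch over sets of page identifiers. The function names, the helper `best_topic`, and the toy data are hypothetical; the data are not the paper's experimental results.

```python
def precision(cluster, topic):
    """P(c, t) = |D_c,t| / |D_c|: share of the cluster's pages that belong to the topic."""
    return len(cluster & topic) / len(cluster) if cluster else 0.0


def recall(cluster, topic):
    """R(c, t) = |D_c,t| / |D_t|: share of the topic's pages that the cluster contains."""
    return len(cluster & topic) / len(topic) if topic else 0.0


def f_measure(cluster, topic):
    """F(c, t): harmonic mean of precision and recall (0.0 when both are zero)."""
    p, r = precision(cluster, topic), recall(cluster, topic)
    return 2 * p * r / (p + r) if p + r else 0.0


def best_topic(cluster, topics, score):
    """Topic the cluster is assumed to represent: the one that maximises `score`."""
    return max(topics, key=lambda name: score(cluster, topics[name]))


# Toy example: one cluster and two ground-truth topics (page ids are made up).
cluster = {1, 2, 3, 4}
topics = {"cricket": {1, 2, 3, 5, 6}, "finance": {4, 7}}

purity_topic = best_topic(cluster, topics, precision)  # topic with highest precision
f_topic = best_topic(cluster, topics, f_measure)       # topic with highest F-measure
```

In this toy case three of the cluster's four pages are about cricket, so precision is 0.75, recall is 0.6, and both the purity and F assumptions assign the cluster to the cricket topic.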
5. CONCLUSION
This paper has presented a new web text mining model. It combines a web page clustering algorithm with the VSM model and uses the relationship between clusters to show classification. The clustering algorithm has five key innovations. First, it identifies better clusters using a query directed cluster quality guide that considers the relationship between a cluster's descriptive terms and the query terms. Second, it increases the merging of semantically related clusters and decreases the merging of semantically unrelated clusters by comparing the descriptions of clusters in addition to comparing the overlap of page contents between clusters. Third, it fixes the cluster chaining (drifting) problem using a new cluster splitting method. Fourth, it chooses better clusters to show the user by improving the ESTC cluster selection heuristic to consider the number of clusters to select and cluster quality. Finally, it improves the clusters by ranking the pages according to cluster relevance. The model accepts a phrase as the query, and it shows the relationship between clusters in tree format.
REMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationA New Technique to Optimize User s Browsing Session using Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationOn Reduct Construction Algorithms
1 On Reduct Construction Algorithms Yiyu Yao 1, Yan Zhao 1 and Jue Wang 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {yyao, yanzhao}@cs.uregina.ca 2 Laboratory
More informationWeb Usage Mining: A Research Area in Web Mining
Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining
More informationAUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS
AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,
More informationResearch and Design of Key Technology of Vertical Search Engine for Educational Resources
2017 International Conference on Arts and Design, Education and Social Sciences (ADESS 2017) ISBN: 978-1-60595-511-7 Research and Design of Key Technology of Vertical Search Engine for Educational Resources
More informationCLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES
CLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES K. R. Suneetha, R. Krishnamoorthi Bharathidasan Institute of Technology, Anna University krs_mangalore@hotmail.com rkrish_26@hotmail.com
More informationImproving the Efficiency of Fast Using Semantic Similarity Algorithm
International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationInternational Journal of Advance Engineering and Research Development. Survey of Web Usage Mining Techniques for Web-based Recommendations
Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 02, February -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 Survey
More informationTime Series Clustering Ensemble Algorithm Based on Locality Preserving Projection
Based on Locality Preserving Projection 2 Information & Technology College, Hebei University of Economics & Business, 05006 Shijiazhuang, China E-mail: 92475577@qq.com Xiaoqing Weng Information & Technology
More informationUsing Gini-index for Feature Weighting in Text Categorization
Journal of Computational Information Systems 9: 14 (2013) 5819 5826 Available at http://www.jofcis.com Using Gini-index for Feature Weighting in Text Categorization Weidong ZHU 1,, Yongmin LIN 2 1 School
More informationA Supervised Method for Multi-keyword Web Crawling on Web Forums
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 2, February 2014,
More informationImproving Suffix Tree Clustering Algorithm for Web Documents
International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal
More informationEXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES
EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES B. GEETHA KUMARI M. Tech (CSE) Email-id: Geetha.bapr07@gmail.com JAGETI PADMAVTHI M. Tech (CSE) Email-id: jageti.padmavathi4@gmail.com ABSTRACT:
More informationKeywords Data alignment, Data annotation, Web database, Search Result Record
Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web
More informationDiscovering Advertisement Links by Using URL Text
017 3rd International Conference on Computational Systems and Communications (ICCSC 017) Discovering Advertisement Links by Using URL Text Jing-Shan Xu1, a, Peng Chang, b,* and Yong-Zheng Zhang, c 1 School
More informationAn Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data
An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,
More informationProfile Based Information Retrieval
Profile Based Information Retrieval Athar Shaikh, Pravin Bhjantri, Shankar Pendse,V.K.Parvati Department of Information Science and Engineering, S.D.M.College of Engineering & Technology, Dharwad Abstract-This
More informationOverview of Web Mining Techniques and its Application towards Web
Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous
More informationThe application of OLAP and Data mining technology in the analysis of. book lending
2nd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2017) The application of OLAP and Data mining technology in the analysis of book lending Xiao-Han Zhou1,a,
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationA Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering
A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering R.Dhivya 1, R.Rajavignesh 2 (M.E CSE), Department of CSE, Arasu Engineering College, kumbakonam 1 Asst.
More informationKeywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.
Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred
More informationSTUDYING OF CLASSIFYING CHINESE SMS MESSAGES
STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2
More informationA Comparative Study of Selected Classification Algorithms of Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220
More informationInferring User Search for Feedback Sessions
Inferring User Search for Feedback Sessions Sharayu Kakade 1, Prof. Ranjana Barde 2 PG Student, Department of Computer Science, MIT Academy of Engineering, Pune, MH, India 1 Assistant Professor, Department
More informationInternational Journal of Electrical, Electronics ISSN No. (Online): and Computer Engineering 3(2): 85-90(2014)
I J E E E C International Journal of Electrical, Electronics ISSN No. (Online): 2277-2626 Computer Engineering 3(2): 85-90(2014) Robust Approach to Recognize Localize Text from Natural Scene Images Khushbu
More informationICRCS at Intent2: Applying Rough Set and Semantic Relevance for Subtopic Mining
ICRCS at Intent2: Applying Rough Set and Semantic Relevance for Subtopic Mining Xiao-Qiang Zhou, Yong-Shuai Hou, Xiao-Long Wang, Bo Yuan, Yao-Yun Zhang Key Laboratory of Network Oriented Intelligent Computation
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationText Clustering Incremental Algorithm in Sensitive Topic Detection
International Journal of Information and Communication Sciences 2018; 3(3): 88-95 http://www.sciencepublishinggroup.com/j/ijics doi: 10.11648/j.ijics.20180303.12 ISSN: 2575-1700 (Print); ISSN: 2575-1719
More informationKeywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.
Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering
More informationLET:Towards More Precise Clustering of Search Results
LET:Towards More Precise Clustering of Search Results Yi Zhang, Lidong Bing,Yexin Wang, Yan Zhang State Key Laboratory on Machine Perception Peking University,100871 Beijing, China {zhangyi, bingld,wangyx,zhy}@cis.pku.edu.cn
More informationCategorization of Sequential Data using Associative Classifiers
Categorization of Sequential Data using Associative Classifiers Mrs. R. Meenakshi, MCA., MPhil., Research Scholar, Mrs. J.S. Subhashini, MCA., M.Phil., Assistant Professor, Department of Computer Science,
More informationThe Design of Model for Tibetan Language Search System
International Conference on Chemical, Material and Food Engineering (CMFE-2015) The Design of Model for Tibetan Language Search System Wang Zhong School of Information Science and Engineering Lanzhou University
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationAn Improved PageRank Method based on Genetic Algorithm for Web Search
Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2983 2987 Advanced in Control Engineeringand Information Science An Improved PageRank Method based on Genetic Algorithm for Web
More information2 Ontology evolution algorithm based on web-pages and users behavior logs
ISSN 1749-3889 (print), 1749-3897 (online) International Journal of Nonlinear Science Vol.18(2014) No.1,pp.86-91 Ontology Evolution Algorithm for Topic Information Collection Jing Ma 1, Mengyong Sun 1,
More informationDesign and Implementation of Search Engine Using Vector Space Model for Personalized Search
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,
More informationA Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2
A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor
More informationWeb Mining Team 11 Professor Anita Wasilewska CSE 634 : Data Mining Concepts and Techniques
Web Mining Team 11 Professor Anita Wasilewska CSE 634 : Data Mining Concepts and Techniques Imgref: https://www.kdnuggets.com/2014/09/most-viewed-web-mining-lectures-videolectures.html Contents Introduction
More informationIJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:
IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationInformation Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining
Information Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining 1 Vishakha D. Bhope, 2 Sachin N. Deshmukh 1,2 Department of Computer Science & Information Technology, Dr. BAM