INTRODUCTION. Chapter GENERAL
- Blaze Reynolds
- 5 years ago
Chapter 1 INTRODUCTION

1.1 GENERAL

The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which people communicate with each other as well as with machines. Since its inception in 1989, it has connected people from all walks of life, anywhere in the world, whose paths cross in one way or another. As a result, the Internet has become a global village [2], growing from 16 million people surfing the web in December 1995 to 2095 million people in March 2011 [3]. This ever-increasing interest in the information spread across the WWW has led to the development of a closely linked field, Information Retrieval.

Information Retrieval (IR) [4, 5] is the area concerned with retrieving information about a subject from a collection of data objects. IR differs from Data Retrieval, which in the context of documents consists mainly of determining which documents of the collection contain the keywords of a user query. IR, in contrast, deals with finding the information needed by the user.

The WWW has distinctive properties: it is extremely complex, massive in size, and highly dynamic in nature. Owing to this unique nature, searching and retrieving information from the Web efficiently and effectively is a challenging task, especially when the goal is to realize its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But effective retrieval of information is still a developing research area.
Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. This is where information retrieval search engines have to focus. In general, search engines allow users to search for information by submitting queries in the form of keywords in the search interface. The search engines in turn retrieve the links to relevant information. A brief overview of the search engine process is given in the next section.

1.2 SEARCH ENGINE: AN INFORMATION RETRIEVAL TOOL

In general, search engines [6] allow users to search documents by submitting queries in the form of keywords in the search interface. The search engines in turn retrieve the links to relevant documents. Broadly, the working of the search engine components can be divided into two modules, as shown in Fig 1.1: the Query Independent Module and the Query Dependent Module.

[Fig 1.1 Classification of Search Engine. Labels: Search Engine, Knowledge Base, Query Dependent Module, Crawling, Indexing, Query Processor, Ranking]

As can be seen from Fig 1.1, at the operational level, search engines [7] comprise the following four major components:
Crawler
Indexer
Query Processor
Ranking

A brief discussion of each component follows:

1. Crawler: It is an automated web browser [8] that follows every hyperlink on the various web sites of the WWW to retrieve web pages. These web pages are stored in databases at the search engine's side. The crawler is therefore a query independent module. The contents of each page stored at the search engine's side are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called metatags). Meta information about the web pages is stored in an index for later queries.

2. Indexer: It is also a query independent module. Search engine indexing [10] is done after the web crawler has stored documents in the search engine's database. The indexer analyzes the documents to extract important terms and creates an appropriate index for fast retrieval of the documents against user queries.

3. Query Processor: It is a query dependent module [11] which uses the search engine's index to consult the database for retrieval of related documents. The search process also maintains log files, which help optimize both the indexer and the crawler by providing information about the active set of pages that are actually seen and sought by users.

4. Ranking: It is a query dependent process [12]. Before being presented to the users, the active sets of pages are first ranked according to a ranking strategy. Since different search engines follow different strategies, the search process may lead to different active sets of pages.

Search engines are continuously evolving to improve the effectiveness of the active sets of web pages returned to users against their submitted queries. Even after adopting complex algorithms and strategies in both the query independent and query dependent modules, the user is still presented with a huge list of information for their
submitted query. To tackle this problem of information overkill, the results presented to the user must be refined. This leads to the necessity of developing various automated tools at the server side or the client side. The following sections highlight the role of web mining and web prefetching in retrieving the documents relevant to the user.

1.3 ROLE OF WEB MINING

With the constant increase in the amount of information present on the WWW, providing relevant information to the user in the least amount of time has become a challenging task. However, the perceived latency* of the user can be effectively reduced by employing web mining techniques in IR. Web mining [13] is the automatic retrieval, extraction and evaluation of information in the form of interesting patterns for knowledge discovery from web documents using data mining techniques. In the web mining domain, data is the most important element, since the quality of the mined information depends on it. There are three types of web data that can be mined: content, usage, and structure. Content mining covers text and multimedia. Usage mining covers web logs, which include search logs and other usage data. Structure mining analyzes the link structure of the Web. These three types of web data determine the scope of the web mining domain.

Web mining is consistently being improved to reduce the user perceived latency. A critical look at the available literature indicates that the remedy for this wait time is to prefetch web documents with the help of suitable prediction techniques.

* User perceived latency is the delay from the time a request is issued until the response is received.
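To make the notion of web usage mining more concrete, the following minimal Python sketch groups raw log records into user sessions, which is a common first step before usage patterns are mined. The record layout, the sample data, the function name build_sessions and the 30-minute inactivity timeout are illustrative assumptions and are not taken from this thesis.

```python
from collections import defaultdict

# Hypothetical, simplified log records: (client IP, timestamp in seconds, requested URL).
SAMPLE_LOG = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 60, "/products.html"),
    ("10.0.0.2", 90, "/index.html"),
    ("10.0.0.1", 4000, "/contact.html"),
]

# Assumed inactivity gap (in seconds) after which a new session starts.
SESSION_TIMEOUT = 30 * 60


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group requests per client IP into sessions split by inactivity gaps."""
    by_client = defaultdict(list)
    # Sort by client and time so each client's hits are in chronological order.
    for ip, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_client[ip].append((ts, url))

    sessions = []
    for ip, hits in by_client.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap too long: close the running session and start a new one.
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions


if __name__ == "__main__":
    for ip, pages in build_sessions(SAMPLE_LOG):
        print(ip, pages)
```

In this sketch a client is identified by IP address alone, which is a simplification; real proxy or server logs usually require further preprocessing before sessions can be formed reliably.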
1.4 ROLE OF WEB PREFETCHING

Although web performance can be improved by introducing caches at appropriate places, the advantages are limited by the dynamic content present on the WWW. The delay in bringing the required information can be reduced further by prefetching web documents precisely. Web prefetching [14] can be defined as the process of fetching web documents from the web servers even before they have been requested by the user. However, if the predictions are poor, prefetching can add considerable overhead to already overloaded network bandwidth. Researchers have adopted various web prefetching strategies to minimize the user perceived latency. Some of the popular strategies are:

Popularity based strategies: Predictions are made based on the popularity of the web pages.
Semantic prefetching strategies: The content of the web pages is analyzed to make predictions.
Statistic prefetching strategies: Predictions are made based on statistics formed from the user sessions.

The next section discusses the various problem areas and the solutions proposed for them.

1.5 PROBLEM IDENTIFICATION

The WWW is a publicly indexable web. The following characteristics of the WWW present researchers with challenges in retrieving and mining information from it.

1. To dig the relevant information: Search engines use crawlers to fetch pages from the WWW, which they then store and index. These indexed documents are ranked based on their popularity. However, the problem with the current search
engines is that they consider only popularity, i.e. the forward and backward links of the documents. As a result, documents that are more relevant to the user's query but less popular may be left out. Thus, a technique needs to be developed that takes the user's query into account in order to find more relevant information, since the user's query is more important than the links in the web pages.

Solution: This problem is solved by introducing a mechanism that considers not only the popularity of the web pages but also, significantly, their relevancy to the user's submitted query. The proposed mechanism has in fact improved over Google's PageRank method.

2. High User Perceived Latency: The delay in response perceived by users in retrieving web objects (i.e. the time between a user submitting a query and receiving its results) is known as User Perceived Latency. This delay increases with the size of the WWW. As a result, users experience long waits for their requests over the web. Hence, there is a need to develop a technique that can effectively reduce this latency.

Solution: In this thesis, a framework named Predictive Prefetching Engine (PPE) has been introduced at the search engine side and the proxy side. It reduces the user's perceived latency by prefetching relevant web pages based on the users' past browsing experience. An important feature of this framework is that it places minimal additional burden on network bandwidth, as it makes its predictions based on rules that are generated dynamically depending on the size of the database.

3. Lack of personalization of WWW: With the exponential growth of the WWW and its users, it becomes very difficult to retrieve the information sought by particular groups of users. For example, employees of an organization may need similar types of information.
Therefore, a mechanism is strongly needed that can personalize the contents of the WWW according to groups of users.
Solution: In order to solve this problem, a mechanism has been proposed that works on different groups of users in order to provide personalized information. This is done by designing an agent based mechanism that activates the agents for different groups after identifying the incoming user's request from a particular IP address.

4. Information Overkill: Even with the introduction of a prefetching mechanism that aims to reduce the user perceived latency, unsuccessful predictions made to prefetch pages may result in information overkill. Thus, a mechanism is required that makes credible predictions for only those pages that are more relevant, i.e. makes correct predictions to minimize the problem of information overkill.

Solution: In order to minimize this problem of information overkill, k-order Markov Predictors have been used in the proposed work. These predictors generate rules that are refined with each increasing level of k, thus generating increasingly relevant predictions of web pages.

5. Huge size: The rate of the web's growth has been and continues to be exponential. The number of its users has increased from 16 million in 1995 to approximately 2 billion in 2010 [15]. This huge size has transformed the WWW into a huge repository of knowledge in which highly diverse information is linked in an extremely complex manner. Still, the WWW shows a particular order in the sense that it follows a web-like structure of hyperlinks, i.e. when a web user surfs a web site, the various documents are well arranged through internal hyperlinks. Moreover, the meta information about the web documents is stored in the web logs, providing an inherent order among them. This ordering can therefore be exploited to mine the desired information from the WWW.

Solution: In order to solve the above said problem, various data mining techniques, e.g. association rules, sequential patterns and clustering, have been
applied in the proposed work on the repository of raw knowledge stored in various logs at the proxy and server side.

1.6 ORGANIZATION OF THESIS

This thesis focuses on web prefetching, encompassing web mining in general and web usage mining in particular. In this framework, various algorithms have been proposed for designing an effective web prefetching mechanism. The aim of this work has been to design an effective prefetching technique that can predict web pages even before the users have asked for them, making and changing predictions dynamically depending on the database. The thesis has been divided into six main chapters as listed below:

Chapter 2 provides an overview of the WWW and the search engines which users employ to search for information in this publicly indexable sea of web documents. It also reviews the literature on the role of web mining, its application areas, web prefetching techniques and the various strategies followed for prefetching documents. This chapter also provides the backdrop of the existing work and the challenging areas that need consideration.

Chapter 3 focuses on the lack of relevant pages returned to the user by general search engines for their submitted queries. This happens because the search engine's page rank mechanism gives more importance to the popularity of the documents than to their relevancy. The chapter addresses this issue by introducing a mechanism that considers not only the individual keywords of the user query but also the associations of those keywords within the documents. It proposes a novel algorithm that gives due weightage to both the relevancy and the popularity of the web pages.

Chapter 4 proposes the framework for Predictive Prefetching Engine (PPE).
This framework has been introduced at the search engine's side, where it is known as Search engine side Predictive Prefetching Engine (SPPE), as well as at the proxy side, where it is known as Proxy side Predictive Prefetching Engine (PPPE). This
framework carries out its task of making credible predictions for prefetching web pages in three phases. The first phase introduces a novel approach for clustering the user sessions obtained after preprocessing the user transactions present in the server/proxy logs. The second phase applies k-order Markov Predictors to determine the rules that govern the predictions for prefetching the web documents. The third phase is the Rule Activator phase, which uses an agent based approach to fire the right set of rules, thus prefetching the web documents likely to be used by the user.

Chapter 5 presents the implementation details and the analysis of PPE. The PPE has been implemented in Java using Eclipse IDE for Java Developers. The chapter also verifies the rules formed for prediction of web pages using Zipf's Law, thereby proving the accuracy of the rules formed.

Chapter 6 concludes the work and highlights its major achievements. It also explores the possibilities for future research in this area.

Appendix A briefly explains Zipf's Law.

Bibliography includes references to publications in this area.
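As an illustration of the k-order Markov predictors referred to in Sections 1.5 and 1.6, the following minimal Python sketch counts which page follows each run of k consecutive pages in a set of user sessions and predicts the next page only when a rule is sufficiently confident. It is a generic sketch and not the algorithm of this thesis: the function names train_markov and predict_next, the sample sessions and the 0.6 confidence threshold are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical user sessions, each an ordered list of visited pages.
SESSIONS = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
    ["E", "B", "D"],
]


def train_markov(sessions, k):
    """Count which page follows each run of k consecutive pages (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    model = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - k):
            context = tuple(pages[i:i + k])
            model[context][pages[i + k]] += 1
    return model


def predict_next(model, recent_pages, k, min_confidence=0.6):
    """Return the most likely next page, or None if no rule is confident enough.

    min_confidence is an assumed threshold used to suppress weak predictions.
    """
    if len(recent_pages) < k:
        return None
    context = tuple(recent_pages[-k:])
    followers = model.get(context)
    if not followers:
        return None
    page, count = followers.most_common(1)[0]
    confidence = count / sum(followers.values())
    return page if confidence >= min_confidence else None


if __name__ == "__main__":
    order1 = train_markov(SESSIONS, 1)
    order2 = train_markov(SESSIONS, 2)
    print(predict_next(order1, ["A", "B"], 1))
    print(predict_next(order2, ["A", "B"], 2))
    print(predict_next(order2, ["E", "B"], 2))
```

With the sample sessions, the order-1 rule for page B is split evenly between C and D and therefore falls below the assumed threshold, whereas the order-2 contexts (A, B) and (E, B) each favour a single next page. This is one way to read the statement that the rules are refined as k increases.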
More informationBruno Martins. 1 st Semester 2012/2013
Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4
More information5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search
Seo tutorial Seo tutorial Introduction to seo... 4 1. General seo information... 5 1.1 History of search engines... 5 1.2 Common search engine principles... 6 2. Internal ranking factors... 8 2.1 Web page
More informationIJITKMSpecial Issue (ICFTEM-2014) May 2014 pp (ISSN )
A Review Paper on Web Usage Mining and future request prediction Priyanka Bhart 1, Dr.SonaMalhotra 2 1 M.Tech., CSE Department, U.I.E.T. Kurukshetra University, Kurukshetra, India 2 HOD, CSE Department,
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationSEARCH ENGINE INSIDE OUT
SEARCH ENGINE INSIDE OUT From Technical Views r86526020 r88526016 r88526028 b85506013 b85506010 April 11,2000 Outline Why Search Engine so important Search Engine Architecture Crawling Subsystem Indexing
More informationIMPROVING WEB SERVER PERFORMANCE USING TWO-TIERED WEB CACHING
IMPROVING WEB SERVER PERFORMANCE USING TWO-TIERED WEB CACHING 1 FAIRUZ S. MAHAD, 2 WAN M.N. WAN-KADIR Software Engineering Department, Faculty of Computer Science & Information Systems, University Teknologi
More informationAn Algorithm for user Identification for Web Usage Mining
An Algorithm for user Identification for Web Usage Mining Jayanti Mehra 1, R S Thakur 2 1,2 Department of Master of Computer Application, Maulana Azad National Institute of Technology, Bhopal, MP, India
More informationDisambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity
Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity Tomáš Kramár, Michal Barla and Mária Bieliková Faculty of Informatics and Information Technology Slovak University
More informationWEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE
WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,
More informationMODELING USER INTERESTS FROM WEB BROWSING ACTIVITIES. Team 11. research paper review: author: Fabio Gasparetti publication date: November 1, 2016
research paper review: MODELING USER INTERESTS FROM WEB BROWSING ACTIVITIES author: Fabio Gasparetti publication date: November 1, 2016 Team 11 Angelique Elkins Jim Saeturn Michael Yang BACKGROUND & PROBLEM
More informationAN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT
AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT Brindha.S 1 and Sabarinathan.P 2 1 PG Scholar, Department of Computer Science and Engineering, PABCET, Trichy 2 Assistant Professor,
More informationContext-based Navigational Support in Hypermedia
Context-based Navigational Support in Hypermedia Sebastian Stober and Andreas Nürnberger Institut für Wissens- und Sprachverarbeitung, Fakultät für Informatik, Otto-von-Guericke-Universität Magdeburg,
More informationAnalytical survey of Web Page Rank Algorithm
Analytical survey of Web Page Rank Algorithm Mrs.M.Usha 1, Dr.N.Nagadeepa 2 Research Scholar, Bharathiyar University,Coimbatore 1 Associate Professor, Jairams Arts and Science College, Karur 2 ABSTRACT
More informationWeb Usage Mining. Overview Session 1. This material is inspired from the WWW 16 tutorial entitled Analyzing Sequential User Behavior on the Web
Web Usage Mining Overview Session 1 This material is inspired from the WWW 16 tutorial entitled Analyzing Sequential User Behavior on the Web 1 Outline 1. Introduction 2. Preprocessing 3. Analysis 2 Example
More informationSathyamangalam, 2 ( PG Scholar,Department of Computer Science and Engineering,Bannari Amman Institute of Technology, Sathyamangalam,
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 8, Issue 5 (Jan. - Feb. 2013), PP 70-74 Performance Analysis Of Web Page Prediction With Markov Model, Association
More informationINDEXING FOR DOMAIN SPECIFIC HIDDEN WEB
International Journal of Computer Engineering and Applications, Volume VII, Issue I, July 14 INDEXING FOR DOMAIN SPECIFIC HIDDEN WEB Sudhakar Ranjan 1,Komal Kumar Bhatia 2 1 Department of Computer Science
More informationDeveloping Focused Crawlers for Genre Specific Search Engines
Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com
More informationAutomated Path Ascend Forum Crawling
Automated Path Ascend Forum Crawling Ms. Joycy Joy, PG Scholar Department of CSE, Saveetha Engineering College,Thandalam, Chennai-602105 Ms. Manju. A, Assistant Professor, Department of CSE, Saveetha Engineering
More informationCRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA
CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA An Implementation Amit Chawla 11/M.Tech/01, CSE Department Sat Priya Group of Institutions, Rohtak (Haryana), INDIA anshmahi@gmail.com
More informationSEO Technical & On-Page Audit
SEO Technical & On-Page Audit http://www.fedex.com Hedging Beta has produced this analysis on 05/11/2015. 1 Index A) Background and Summary... 3 B) Technical and On-Page Analysis... 4 Accessibility & Indexation...
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationDeep Web Crawling and Mining for Building Advanced Search Application
Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech
More informationChapter 3 Process of Web Usage Mining
Chapter 3 Process of Web Usage Mining 3.1 Introduction Users interact frequently with different web sites and can access plenty of information on WWW. The World Wide Web is growing continuously and huge
More informationOntology-based Architecture Documentation Approach
4 Ontology-based Architecture Documentation Approach In this chapter we investigate how an ontology can be used for retrieving AK from SA documentation (RQ2). We first give background information on the
More informationReceived: 15/04/2012 Reviewed: 26/04/2012 Accepted: 30/04/2012
Exploring Deep Web Devendra N. Vyas, Asst. Professor, Department of Commerce, G. S. Science Arts and Commerce College Khamgaon, Dist. Buldhana Received: 15/04/2012 Reviewed: 26/04/2012 Accepted: 30/04/2012
More informationCorrelation based File Prefetching Approach for Hadoop
IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie
More informationPROXY DRIVEN FP GROWTH BASED PREFETCHING
PROXY DRIVEN FP GROWTH BASED PREFETCHING Devender Banga 1 and Sunitha Cheepurisetti 2 1,2 Department of Computer Science Engineering, SGT Institute of Engineering and Technology, Gurgaon, India ABSTRACT
More informationA Web Page Recommendation system using GA based biclustering of web usage data
A Web Page Recommendation system using GA based biclustering of web usage data Raval Pratiksha M. 1, Mehul Barot 2 1 Computer Engineering, LDRP-ITR,Gandhinagar,cepratiksha.2011@gmail.com 2 Computer Engineering,
More informationA Survey on Information Extraction in Web Searches Using Web Services
A Survey on Information Extraction in Web Searches Using Web Services Maind Neelam R., Sunita Nandgave Department of Computer Engineering, G.H.Raisoni College of Engineering and Management, wagholi, India
More informationAutomated Online News Classification with Personalization
Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798
More informationSmartcrawler: A Two-stage Crawler Novel Approach for Web Crawling
Smartcrawler: A Two-stage Crawler Novel Approach for Web Crawling Harsha Tiwary, Prof. Nita Dimble Dept. of Computer Engineering, Flora Institute of Technology Pune, India ABSTRACT: On the web, the non-indexed
More informationA Framework for Predictive Web Prefetching at the Proxy Level using Data Mining
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.6, June 2008 303 A Framework for Predictive Web Prefetching at the Proxy Level using Data Mining Jyoti Pandey 1, Amit Goel
More informationDetermining the Number of CPUs for Query Processing
Determining the Number of CPUs for Query Processing Fatemah Panahi Elizabeth Soechting CS747 Advanced Computer Systems Analysis Techniques The University of Wisconsin-Madison fatemeh@cs.wisc.edu, eas@cs.wisc.edu
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 398 Web Usage Mining has Pattern Discovery DR.A.Venumadhav : venumadhavaka@yahoo.in/ akavenu17@rediffmail.com
More information