Mining Information from Temporal Behavior of Web Usage
Prasanna Desikan and Jaideep Srivastava
Department of Computer Science, University of Minnesota

Abstract

Web mining has been explored extensively, and techniques have been proposed for a variety of applications, including Web search, Web classification, Web personalization and adaptive Web sites. Mining Web structure data has resulted in a variety of hyperlink-based algorithms to rank the results of a query. Similarly, Web usage data has been used to identify user sessions and cluster them for better prediction of user navigation patterns. Most research on Web mining has so far taken a data-centric point of view. In this project we examine the temporal dimension of Web usage data. In particular, we study the behavior of Web usage data over a period of time and cluster pages that follow similar access patterns. Such analysis could be useful for time-based target marketing or for optimizing Web services. In the second part of the project, we define a new measure called Page Popularity, which counts the number of hits to a Web page during a given time period and gives more weight to pages that have been accessed frequently in the recent past. This kind of analysis helps identify emerging popular topics and reduces the bias toward topics that are obsolete but were accessed heavily during an earlier period.
1. Introduction

Web Mining, defined as the application of data mining techniques to extract information from the World Wide Web, has been classified into three sub-fields based on the kind of data available: Web Content Mining, Web Structure Mining and Web Usage Mining. This classification is represented in Figure 1. While Web content provides the actual textual and other multimedia information, Web structure reflects the organization of Web documents and thus helps in determining their relative importance. Web structure has been exploited to extract information about the quality of Web pages. Traditionally, information provided by Web content combined with Web structure has been used in the context of search, to rank the pages returned for a query. The stability of the Web structure led to more research on hyperlink analysis, and the field gained more recognition with the advent of Google. Desikan et al. provide an extensive survey of hyperlink analysis. Structural information has also been used for focused crawling, that is, for deciding which pages need to be crawled first. Web content and structure information have been successfully combined to classify Web pages according to various topics or to identify the topics that a page is known for. Web structure information has also been applied to identify groups of Web pages that share a certain set of ideas, called Web Communities. Thus, most of the initial research on Web Mining focused on Web content and later on Web structure.

Figure 1: Web Mining Taxonomy (classification by data source used)

The third kind of Web data, Web usage, reveals the users' surfing patterns, which are of interest for a variety of applications. The Web is widely used for different kinds of personal, business and professional applications that depend on user interaction. This has increased the need to understand users' interests and browsing behavior.
Web usage data has thus received much attention in recent times as a means to study human behavior. Srivastava et al. [4] provide a survey of Web Usage Mining, identifying the different kinds of Web usage data and their sources, and also provide a taxonomy of the major application areas. At a high level, Web usage data can be divided into three categories:

Web Server Data: The user logs collected at the Web server. They contain the IP address from which the request was made, the time of the request, the URIs of the requested and referring documents, and the type of agent that sent the request.

Application Server Data: The data generated dynamically by the various application servers, such as .asp and .jsp files, which allow applications to be built on top of them and collect the information that results from user actions on the application.

Application Level Data: The data provided by the user for an application, such as demographic data. These kinds of data can be logged for each user or event and can later be used to derive useful information.

Web Mining research has thus focused more recently on Web structure and Web usage. In this project we focus on another important dimension of Web Mining identified by [5], the temporal evolution of the Web. The Web is changing fast over time, and so is the users' interaction with it, which suggests the need to study and develop models for the evolving Web content, Web structure and Web usage.

Figure 2: Temporal evolution of a single Web document. (a) Change in the Web content of a document over time. (b) Change in the Web structure, i.e. the number of inlinks and outlinks, of a document over time. (c) Change in the Web usage of the document over time.

The need to study the temporal evolution of the Web and to understand the change in user behavior and interaction on the World Wide Web has motivated us to analyze Web usage data. We use the user logs obtained from the Web server to study how the usage of Web documents evolves over time. We perform two kinds of analysis:

Temporal Concepts: We first cluster Web pages that have similar access patterns over a period of time. We then examine the pages in each cluster to see how they are related and whether they represent a concept, a set of related concepts, or any other useful information.

Page Popularity: We define a measure of the popularity of a page that is proportional to the number of hits to the page during the time period, with more weight given to the recent history.
Finally, we compare the results of this measure with some of the other popular existing measures for ranking Web pages. The experimental results show noticeable differences in the rankings. The usage-based ranking metrics boost the ranks of pages that are actually used, whereas the pure hyperlink-based metrics can rank rarely used pages high. In particular, we notice that Page Popularity ranks pages that have been used recently high and brings down the rank of pages that were used earlier but have had very few accesses during the recent period.

The rest of the document is organized as follows. In Section 2 we discuss related work in this area, and in Section 3 we discuss our approach. Section 4 describes the data pre-processing. Section 5 presents the experiments performed and analyzes the results, and in Section 6 we conclude and provide future directions.

2. Related Work

In our approach, we use pure Web usage data to extract the temporal behavior patterns of Web pages. Web usage data has been a major source of information and has been studied extensively in recent times. Understanding user profiles and user navigation patterns, both to build better adaptive Web sites and to predict user access patterns, has been of significant interest to the research and business communities. Cooley et al. in [6] and [7] discuss methods to pre-process user log data and to separate Web page references into those made for navigational purposes and those made for content purposes. User navigation patterns have evoked much interest and have been studied by various other researchers [9], [10]. Srivastava et al. [4] discuss techniques to pre-process the usage and content data, discover patterns from them and filter out the non-relevant and uninteresting patterns discovered. [8, 4] also serve as good surveys of Web usage mining.
Usage statistics have been applied to the hyperlink structure for better link prediction in the field of adaptive Web sites. The concept of adaptive Web sites was proposed by Perkowitz and Etzioni [11]. Pirolli and Pitkow [12] discuss predicting user browsing behavior based on past surfing paths using Markov models. In [13], Sarukkai discusses link prediction and path analysis for better user navigation. He proposes a Markov chain model to predict the user access pattern based on previously collected user access logs. Zhu et al. [14] extend this by introducing the maximal forward reference to eliminate the effect of backward references by the user. They also predict user behavior within the next n steps, using an N-step Markov chain as opposed to the one-step approach of Sarukkai. Concepts from information foraging theory have also been used recently by Chi et al. [15] to incorporate user behavior into the existing content and link structure. They model user needs and user actions using the notion of Information Scent. Cadez et al. [16] cluster users with similar navigation paths within the same site. They develop a visualization methodology to display the paths of the users within each cluster. They use first-order Markov models for clustering, in order to take into account the order in which the user requests pages. Huang et al. [17] present a Cube Model to represent Web access sessions for data mining. They use the K-modes algorithm to cluster sessions described as sequences of page URL IDs.

On the other hand, in the area of Web structure mining there has been a lot of research on ranking Web pages using hyperlink analysis, and different hyperlink-based methods have been proposed. PageRank is a metric for ranking hypertext documents that determines the quality of these documents. Page et al. [18] developed this metric for the popular search engine Google [19].
The key idea is that a page has a high rank if it is pointed to by many highly ranked pages. So the rank of a page depends upon the ranks of the pages pointing to it. This process is repeated iteratively until the rank of all the pages is determined. The rank of a page p can thus be written as:

$$PR(p) = \frac{d}{n} + (1 - d) \sum_{(q,p) \in G} \frac{PR(q)}{OutDegree(q)}$$

Here, n is the number of nodes in the graph and OutDegree(q) is the number of hyperlinks on page q. Intuitively, the approach can be viewed as a stochastic analysis of a random walk on the Web graph. The first term on the right-hand side of the equation corresponds to the probability that a random Web surfer arrives at a page p out of nowhere, i.e. (s)he could arrive at the page by typing the URL or from a bookmark, or may have that page as his/her homepage. d is then the probability that a random surfer chooses a URL directly, i.e. by typing it, using the bookmark list, or by default, rather than by traversing a link (see footnote 1). Finally, 1/n corresponds to the uniform probability that a person chooses the page p from the complete set of n pages on the Web. The second term on the right-hand side of the equation corresponds to the factor contributed by arriving at a page by traversing a link. 1 - d is the probability that a person arrives at the page p by traversing a link. The summation corresponds to the sum of the rank contributions made by all the pages that point to the page p. The rank contribution is the PageRank of the page multiplied by the probability that a particular link on the page is traversed. So for any page q pointing to page p, the probability that the link pointing to page p is traversed is 1/OutDegree(q), assuming all links on the page are chosen with uniform probability.

The other popular metric is Hubs and Authorities. Hubs and authorities can be viewed as fans and centers in a bipartite core of a Web graph. The hub and authority scores computed for each Web page indicate the extent to which the Web page serves as a hub pointing to good authority pages or as an authority on a topic pointed to by good hubs. The hub and authority scores for a page are not based on a formula for a single page, but are computed for a set of pages related to a topic using an iterative procedure called the HITS algorithm [K1998].
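As a rough illustration of the PageRank recurrence described above, where each page receives d/n plus (1 - d) times the sum of PR(q)/OutDegree(q) over the pages q that link to it, the following Python sketch iterates the formula on a toy graph. The function name, the dictionary-based graph format, the fixed iteration count and the uniform redistribution of rank from pages without out-links are assumptions made for this example and are not taken from the paper. Note that d follows the text's convention: it is the probability of a direct jump, not of following a link.

```python
def pagerank(links, d=0.15, iterations=50):
    """Iterate PR(p) = d/n + (1 - d) * sum(PR(q) / OutDegree(q)).

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    d:     probability that the surfer jumps to a page directly.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Direct-arrival term: every page receives d/n.
        new_rank = {p: d / n for p in pages}
        for q in pages:
            out = links.get(q, [])
            if out:
                # Each link on q is followed with probability 1/OutDegree(q).
                share = (1 - d) * rank[q] / len(out)
                for p in out:
                    new_rank[p] += share
            else:
                # Assumption of this sketch: a page with no out-links
                # spreads its rank uniformly over all pages.
                share = (1 - d) * rank[q] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_graph).items()):
        print(page, round(score, 4))
```

In the toy graph, page b has no in-links, so it only ever receives the direct-arrival share d/n, while c collects rank from both a and b.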
More recently, Oztekin et al. [20] proposed Usage Aware PageRank. They modified the basic PageRank metric to incorporate usage information. In their basic approach they assign weights to the links based on the number of traversals of each link, thus changing the probability that a user traverses a particular link from 1/OutDegree(q) in the basic PageRank to W_l/OutTraversed(q), where W_l is the number of traversals of the link l and OutTraversed(q) is the total number of traversals of all links from the page q. The probability of arriving at a page directly is also computed using the usage statistics. The final formula for Usage Aware PageRank is:

$$UPR(p) = \alpha \left[ \frac{d}{n} + (1 - d) \sum_{(q,p) \in G} \frac{UPR(q)}{OutDegree(q)} \right] + (1 - \alpha) \left[ d \, P_{direct}(p) + (1 - d) \sum_{(q,p) \in G} UPR(q) \, \frac{W_l}{OutTraversed(q)} \right]$$

where α is the emphasis factor that decides the weight given to the structure versus the usage information, l is the link from q to p, and P_direct(p) is written here for the usage-based probability of arriving at page p directly.

Footnote 1: The parameter d, called the dampening factor, is usually set between 0.1 and 0.2 [19].

3. Our Approach

Our goal is to cluster pages that have similar usage patterns over time and to study them. The motivation behind the project was to study how the information on the Web changes over time and how to model such change. As time passes, the content, structure and usage of a Web page change. These changes can be modeled either at the level of a single page or for a collection of pages. From the point of view of a single page, the concept that a Web page represents may change or evolve with time. The basic structure of a page may also change, i.e. the number of inlinks and the number of outlinks may change. Most structure mining work assumes that a page that points to another page endorses it. So, as the number of incoming links changes, the topic that the page represents may change over time. Similarly, a change in the number of outlinks may reflect a change in the relevance of the page with respect to a certain topic. The usage data is also affected by content and structural changes in a Web page. The usage data brings in information about the topic the page is popular for, and this popularity may or may not be reflected by a change in the content of the page or in the pages pointing to it. A page's popularity may or may not be affected by a change in its indegree or outdegree. This motivates the need to study the change in the behavior of the Web over a period of time. The idea is not entirely new: changes to the Web are being recorded by the pioneering Internet Archive project [IA], and large organizations generally archive (at least portions of) the usage data from their Web sites. With these sources of data available, there is large scope for research on techniques for analyzing how the Web evolves over time. In our project we focus on extracting information from Web usage data in general, and from Web server logs in particular.

Figure 3: Concept of Page Popularity. H = total number of hits in the past data; h = total number of hits in the recent data; N = total number of days for which Web server logs are analyzed.
We first cluster pages based on the total number of hits per day for each Web page. This groups together pages that have similar access patterns during the given time period. Such pages may be related in some manner that caused their access patterns to be similar. This kind of analysis also helps in identifying pages that were popular during a certain time frame.

Next, we introduce a measure called Page Popularity to determine the popularity of a page over the time period for which we analyzed the data. In this measure we give more weight to the recent history than to the past, so that upcoming topics can be ranked above old topics. The same effect could be obtained by considering only a recent period of data, but that would discard the information in the older data. It is therefore better to consider the usage data over a longer duration and weigh the recent history more, so that no information is lost. Considering old information is also important when doing structure mining, as Web pages are crawled from time to time. It would be a good idea to store the information from the Web graph that existed earlier and also make use of the new graph to mine information. This kind of structural information can be obtained from the Internet Archive project site.

We now present the basic idea of Page Popularity, as shown in Figure 3. Although the Web page with the red access pattern may have a high total number of hits, the Web page represented by the green curve has increasing usage and so may represent a newer topic or something that is gaining popularity, as opposed to the page represented by the red curve, which is no longer used as much. The formula we propose is very naive at this stage, though it captures the main idea behind the approach.
The Page Popularity is defined as:

$$PagePopularity = K \, \frac{H + \alpha h}{H + h}$$

where K is some constant, H is the total number of hits for a Web page in the time period considered past, and h is the number of hits for the same Web page in the recent period. α is a parameter that is used to give weight to the recent history; it can be varied depending on the importance of the recent data.

In our actual implementation, we took the average number of hits per day during the past time period and the average number of hits per day in the recent time period. The average was used because it neutralizes the effect of any sudden spikes or drops in usage per day. If we weigh according to some other scale, such as a linear one, such sudden changes may drastically boost or bring down the rank of a page. We considered the first two-thirds of the time as past history and the last one-third as recent history. There was no particular reason for this choice, but it seemed a reasonable estimate. We then weighed the hits in the recent history twice as much as the hits in the past history. With N the total number of days, the formula in the implementation boils down to:

$$PagePopularity = \frac{1}{3} \cdot \frac{\dfrac{H}{2N/3} + 2 \cdot \dfrac{h}{N/3}}{H + h} = \frac{1}{2N} \cdot \frac{H + 4h}{H + h}$$
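The implementation described above (two-thirds of the days treated as past, one-third as recent, per-day averages, and the recent average weighted twice) can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name, the list-of-daily-counts input and the `recent_weight` parameter are assumptions made for the example, and the leading factor 1/3 and the normalization by H + h follow the reduced form 1/(2N) · (H + 4h)/(H + h).

```python
def page_popularity(daily_hits, recent_weight=2.0):
    """Page Popularity for one page from its hits per day.

    daily_hits:    list of hit counts, one per day, oldest first
                   (hypothetical input format for this sketch).
    recent_weight: weight of the recent average relative to the past
                   average; the text uses 2.
    """
    n_days = len(daily_hits)
    if n_days < 3:
        raise ValueError("need at least 3 days to split past and recent")

    # First two-thirds of the days are 'past', last third is 'recent'.
    split = (2 * n_days) // 3
    past, recent = daily_hits[:split], daily_hits[split:]

    H, h = sum(past), sum(recent)
    if H + h == 0:
        return 0.0  # page never accessed in the whole period

    avg_past = H / len(past)      # equals H / (2N/3) when 3 divides N
    avg_recent = h / len(recent)  # equals h / (N/3) when 3 divides N

    # (1/3) * (avg_past + 2 * avg_recent) / (H + h)
    return (avg_past + recent_weight * avg_recent) / (3.0 * (H + h))


if __name__ == "__main__":
    fading = [30, 25, 20, 15, 2, 1]   # heavy early use, little recent use
    rising = [1, 2, 3, 5, 20, 30]     # usage picking up recently
    print(page_popularity(fading), page_popularity(rising))
```

With N = 6 days the score ranges from 1/(2N) for a page accessed only in the past to 4/(2N) for a page accessed only in the recent period, so under this normalized form the value reflects how recent the accesses are rather than how many there were.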
4. Data Pre-processing

One of the main issues in Web usage mining is data pre-processing. Web usage data consists of all kinds of accesses to Web pages. The general format of Web server log data is shown in Figure 4.

Fields: IP Address, rfc931, authuser, Date and time of request, request, status, bytes, referer, user agent
Example: [09/Mar/2002:00:03: ] "GET /~harum/ HTTP/1.0" Mozilla/4.7 [en] (X11; I; SunOS 5.8 sun4u)

IP address: IP address of the remote host.
Rfc931: The remote login name of the user.
Authuser: The username as which the user has authenticated himself.
Date: Date and time of the request.
Request: The request line exactly as it came from the client.
Status: The HTTP response code returned to the client.
Bytes: The number of bytes transferred.
Referer: The URL the client was on before requesting your URL.
User_agent: The software the client claims to be using.

Figure 4: Extended Common Log Format (ECLF) of Web server log

For our experiment, we considered only Web pages with the .html extension. We eliminated robots by discarding requests that did not have the string Mozilla in the user-agent field. In spite of this, we noticed that some robots, such as Inktomi, used Mozilla in the user agent, so we also removed all data that had the string slurp/cat in the user-agent field. This eliminated most robots and unwanted data. We also pruned pages whose total number of hits was very low, i.e. lower than the number of days in the recent period. This threshold was chosen so that a Web page that first came into use in the recent period and is slowly picking up is not neglected, even though its number of accesses is low compared to other pages. The data considered was from April through June. We did not have usage data after June, and before April the CS Web site had been restructured, which could distort the kind of usage data needed for our experiment.
The data we used was nevertheless well suited to checking our intuition, as it covered the end of the Spring semester and then the period between the Spring and Summer terms, when classes had not yet fully started. We expected this to give interesting results, since accesses to class Web pages change dramatically after the end of a term, so the access patterns of at least certain pages should cluster together. In general, it is otherwise difficult to find patterns, as most Web pages are accessed very randomly.
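The filtering steps of this section can be sketched in Python as shown below. This is an illustration under stated assumptions, not the authors' pre-processing code: it assumes log lines in the common combined layout with the referer and user agent in double quotes, the regular expression, function name and sample lines are hypothetical, and the sample IP addresses come from the documentation-only range 192.0.2.0/24. The robot check keeps a request only if its user agent contains "Mozilla" and does not contain "slurp".

```python
import re
from collections import Counter

# Assumed layout: host rfc931 authuser [time] "request" status bytes
# "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def daily_html_hits(lines):
    """Count hits per (page, day) for .html pages requested by browsers."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        agent = match.group("agent")
        # Robot filter: require 'Mozilla', reject agents mentioning 'slurp'.
        if "Mozilla" not in agent or "slurp" in agent.lower():
            continue
        parts = match.group("request").split()
        if len(parts) < 2:
            continue
        path = parts[1].split("?", 1)[0]
        if not path.endswith(".html"):
            continue
        # '09/Mar/2002:00:03:18 -0600' -> '09/Mar/2002'
        day = match.group("time").split(":", 1)[0]
        hits[(path, day)] += 1
    return hits


# Hypothetical sample lines, invented for this example.
sample = [
    '192.0.2.1 - - [09/Mar/2002:00:03:18 -0600] "GET /~user/index.html HTTP/1.0" '
    '200 1024 "-" "Mozilla/4.7 [en] (X11; I; SunOS 5.8 sun4u)"',
    '192.0.2.2 - - [09/Mar/2002:00:04:02 -0600] "GET /~user/index.html HTTP/1.0" '
    '200 1024 "-" "Mozilla/3.0 (Slurp/cat)"',
    '192.0.2.3 - - [09/Mar/2002:00:05:40 -0600] "GET /~user/logo.gif HTTP/1.0" '
    '200 512 "-" "Mozilla/4.7 [en]"',
    '192.0.2.4 - - [10/Mar/2002:08:00:00 -0600] "GET /~user/index.html HTTP/1.0" '
    '200 1024 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    for (page, day), count in daily_html_hits(sample).items():
        print(day, page, count)
```

The per-day counts produced this way are the input that both the clustering and the Page Popularity computation need; pruning pages with too few total hits would be a further pass over the counter.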
5. Experimental Results

5.1 Clustering

Figure 5: Clustering of Web pages based on number of hits per day. Interesting patterns: (1) high hits during a short interval of time and almost no hits before and after this short period; (2) high hits during a short interval of time and lesser traffic during other times; (3) traffic (number of hits) almost none during the latter half of the time period.
The clustering of the Web pages was done using the tool CLUTO [21]. The number of clusters specified was 10. We tried various numbers of clusters, and of these 10 gave a decent clustering of pages, judging from the dendrogram produced, as shown in Figure 5. Three interesting patterns were found. The kinds of Web pages that belong to these clusters are shown in Figure 6.

The first cluster consists of pages that were accessed a lot during a very short period of time. Most of them are wedding photos, suggesting a wedding event that took place during that time. The second cluster consists of talk slides of the Twin Cities Software Process Improvement Network (Twin-SPIN), a regional organization established in January 1996 as a forum for the free and open exchange of software process improvement experiences and ideas. There appears to have been a talk during that period, which explains the accesses to the slides. The third cluster was the most interesting. It contained mostly class Web pages and some pages of Data Mining slides. These pages had high access during the first part of the period, possibly the Spring term, and then their access died out. It seems that the Data Mining pages were accessed because someone was doing work related to data mining during that semester, although no Data Mining course as such was offered.

www-users.cs.umn.edu/~ctlu/wedding/speech.html
www-users.cs.umn.edu/~gade/boley/re0.html
www-users.cs.umn.edu/~gade/boley/wap.html
www-users.cs.umn.edu/~ctlu/wedding/photo2.html
www-users.cs.umn.edu/~ctlu/wedding/wedding.html
www-users.cs.umn.edu/~mjoshi/hpdmtut/sld110.htm
www-users.cs.umn.edu/~mjoshi/hpdmtut/sld113.htm
www-users.cs.umn.edu/~mjoshi/hpdmtut/sld032.htm

Figure 6: Web pages that belong to the "interesting" clusters.
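To make the idea of grouping pages by their hits-per-day profile concrete, the sketch below compares daily hit vectors with cosine similarity and greedily groups pages whose profiles are close. It is not CLUTO's algorithm and does not reproduce the clusters reported here: the greedy threshold grouping is a deliberately simple stand-in, and the function names, the 0.9 threshold, the page names and the hit counts are all assumptions invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length hit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a page with no hits is similar to nothing
    return dot / (norm_u * norm_v)


def group_by_pattern(daily_hits, threshold=0.9):
    """Greedy grouping: a page joins the first group whose first member
    has cosine similarity >= threshold with it, else starts a new group.

    daily_hits: dict mapping page -> list of hits per day
                (hypothetical input format for this sketch).
    """
    groups = []
    for page, vector in daily_hits.items():
        for group in groups:
            representative = daily_hits[group[0]]
            if cosine(vector, representative) >= threshold:
                group.append(page)
                break
        else:
            groups.append([page])
    return groups


if __name__ == "__main__":
    # Invented toy data: two pages with a short spike, one that dies out.
    toy = {
        "wedding/photo.html": [0, 0, 50, 0, 0, 0],
        "wedding/speech.html": [0, 1, 40, 0, 0, 0],
        "course/index.html": [10, 9, 8, 0, 0, 0],
    }
    for group in group_by_pattern(toy):
        print(group)
```

Because cosine similarity ignores the overall magnitude of the vectors, two pages with very different hit totals still land in the same group when their accesses rise and fall on the same days, which is the property the temporal clustering relies on.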
5.2 Page Popularity

Our next set of results concerns the Page Popularity measure. We ranked the Web pages according to PageRank, Page Popularity, total number of hits and Usage Aware PageRank (see footnote 2). The results are shown in the following figures.

PageRank results (annotation: e.g. these Web pages do not figure in the usage-based rankings):
/doc/api/AllNames.html
www-users.cs.umn.edu/~ctlu/wedding/sp-web/
/doc/api/AllNames.html
www-users.cs.umn.edu/~echi/misc/pictures/
www-users.cs.umn.edu/~safonov/brodsky/
www-users.cs.umn.edu/~mjoshi/hpdmtut/sld001.htm
www-users.cs.umn.edu/~mjoshi/hpdmtut/tsld001.htm

Figure 7: Ranking results from PageRank

Footnote 2: The results of PageRank and Usage Aware PageRank were obtained from Uygar Oztekin, who conducted similar experiments with the usage data in that time period.
Page Popularity results (annotation: e.g. ranked 29th if we count pure hits):
www-users.cs.umn.edu/~mein/blender/
www-users.cs.umn.edu/~sdier/debian/woody-netinst-test/
www-users.cs.umn.edu/~mjoshi/hpdmtut/
www-users.cs.umn.edu/~dyue/wiihist/
www-users.cs.umn.edu/grad.html
www-users.cs.umn.edu/~sdier/debian/woody-netinst-test/releases/ /
www-users.cs.umn.edu/~desikan/links.html
www-users.cs.umn.edu/~dyue/wiihist/njmassac/nmintro.htm
www-users.cs.umn.edu/~bentlema/unix/
www-users.cs.umn.edu/~echi/papers.html
www-users.cs.umn.edu/~wadhwa/bits/
www-users.cs.umn.edu/~konstan/brs97-gl.html
www-users.cs.umn.edu/~kazar/
www-users.cs.umn.edu/~mjoshi/hpdmtut/sld001.htm
www-users.cs.umn.edu/~karypis/metis/

Figure 8: Page Popularity based rankings

Total hits based rankings:
www-users.cs.umn.edu/~mein/blender/
www-users.cs.umn.edu/~sdier/debian/woody-netinst-test/
www-users.cs.umn.edu/~wadhwa/bits/
www-users.cs.umn.edu/~mjoshi/hpdmtut/
www-users.cs.umn.edu/grad.html
www-users.cs.umn.edu/~dyue/wiihist/
www-users.cs.umn.edu/~desikan/links.html
www-users.cs.umn.edu/~dyue/wiihist/njmassac/nmintro.htm
www-users.cs.umn.edu/~echi/papers.html
www-users.cs.umn.edu/~bentlema/unix/
www-users.cs.umn.edu/~konstan/brs97-gl.html
www-users.cs.umn.edu/~heimdahl/csci5802/front-page.htm
www-users.cs.umn.edu/~heimdahl/csci5802/heading.htm
www-users.cs.umn.edu/~heimdahl/csci5802/nav-bar.htm
www-users.cs.umn.edu/~shekhar/5708/

Figure 9: Total hits based ranking
Usage Aware PageRank results (annotation: the course Web pages are ranked low by Page Popularity because they were not accessed in the month of June, which was considered recent, and hence more weight was given to pages accessed during that period):
www-users.cs.umn.edu/~mein/blender/
www-users.cs.umn.edu/~karypis/metis/
www-users.cs.umn.edu/
www-users.cs.umn.edu/~shekhar/5708/ (37 according to Page Popularity)
www-users.cs.umn.edu/~bentlema/unix/
www-users.cs.umn.edu/~gopalan/courses/5106/ (72 according to Page Popularity)
www-users.cs.umn.edu/~saad/
www-users.cs.umn.edu/~dyue/wiihist/njmassac/nmintro.htm
www-users.cs.umn.edu/~rieck/language.html
www-users.cs.umn.edu/~sdier/debian/woody-netinst-test/ (3 according to Page Popularity)
www-users.cs.umn.edu/~bentlema/unix/advipc/ipc.html
www-users.cs.umn.edu/~mjoshi/hpdmtut/
www-users.cs.umn.edu/~dyue/wiihist/
www-users.cs.umn.edu/~karypis/
www-users.cs.umn.edu/~safonov/brodsky/

Figure 10: Usage Aware PageRank

The results from the different ranking measures reveal that, because PageRank gives more importance to structure and does not include usage statistics, it ranks well-linked pages high even though they are never used. For example, it ranked all the Cisco and Java help pages very high, as they were structurally well connected. A simple count of total hits is not very useful, as hits can accumulate for a variety of reasons and pages that have been in use for a long time will tend to get a higher rank, although simply counting hits does reveal to some extent what users actually find useful. Usage Aware PageRank makes use of both the usage statistics and the link structure and overall gives a result that balances usage and link structure. As can be seen, the CS home page is ranked high by Usage Aware PageRank but is ranked below 100 by PageRank. Two of the pages in the UPR ranking are course Web pages that are linked from the home pages of the professors and have been accessed a lot, so the rank of these pages has been boosted.
Page Popularity, on the other hand, gives more weight to the recent history, and since these course Web pages were not accessed during the month of June after the semester ended, their rankings were brought down. Thus, by weighing the recent history more, we can boost the ranks of the pages that are more popular or significant for that time period.

6. Conclusions and Future Directions

Clustering Web page access patterns over time may help in identifying a concept that is popular during a time period. PageRank tends to give importance to structure alone, hence pages that are heavily linked may be ranked higher though not used. The importance is thus given to the person who creates the Web page. Usage Aware PageRank combines usage statistics with link information, giving importance to both the creator and the actual user of a Web page. Page Popularity gives more weight to recent history and helps in ranking obsolete items lower and boosting the topics that are more popular during that time period.

Certainly, more experiments need to be run on data covering longer time periods. The definition of recent history also needs refinement, both in terms of the time period that is considered recent and the weight to be given to it. Another useful step would be to apply time-based usage weights to link traversals and re-compute Usage Aware PageRank. In general, it would be good to come up with time-based metrics that help in ranking Web pages, or any Web-based properties, relevant to the time period. For example,

$$Metric(t + \Delta t) = \alpha \, Metric(t) + (1 - \alpha) \, Metric(\Delta t)$$

where Δt is the recent time period and α is the weight assigned to the data gathered from the past. This kind of analysis would also help us avoid losing information about the data that changed during a period of time. Thus, studying how Web content, Web structure and Web usage change over time, and their effects on each other, would help us understand the way the Web is evolving and the steps that can be taken to make it a better source of information.

Acknowledgements

The initial ideas presented were the result of discussions with Prof. Vipin Kumar and the Scout group at the Department of Computer Science. A special mention must be made of Pang-Ning Tan, who gave valuable comments and suggestions during the project. Uygar Oztekin provided the ranking results using the PageRank and Usage Aware PageRank metrics. This work was partially supported by Army High Performance Computing Research Center contract number DAAD. The content of the work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred.
Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute.
References

1. P. Desikan, J. Srivastava, V. Kumar, P.-N. Tan. Hyperlink Analysis Techniques & Applications. Army High Performance Computing Center Technical Report.
2. O. Etzioni. The World Wide Web: Quagmire or Gold Mine. Communications of the ACM, 39(11):65-68.
3. R. Cooley, B. Mobasher, J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. In Proceedings of the 9th IEEE International Conference on Tools With Artificial Intelligence (ICTAI 97), Newport Beach, CA.
4. J. Srivastava, R. Cooley, M. Deshpande, P.-N. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, Vol. 1, Issue 2.
5. J. Srivastava, P. Desikan, V. Kumar. Web Mining: Accomplishments and Future Directions. Invited paper, National Science Foundation Workshop on Next Generation Data Mining, Baltimore, MD, Nov. 1-3.
6. R. Cooley, B. Mobasher, J. Srivastava. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, 1(1).
7. R. Cooley, B. Mobasher, J. Srivastava. Grouping Web Page References into Transactions for Mining World Wide Web Browsing Patterns. Journal of Knowledge and Information Systems (KAIS), Vol. 1, No. 10, 1999.
8. B. Masand, M. Spiliopoulou. WebKDD-99: Workshop on Web Usage Analysis and User Profiling. SIGKDD Explorations, 1(2).
9. M. S. Chen, J. S. Park, P. S. Yu. Data Mining for Path Traversal Patterns in a Web Environment. In Proc. of the 16th International Conference on Distributed Computing Systems.
10. A. Buchner, M. Baumgarten, S. Anand, M. Mulvenna, J. Hughes. Navigation Pattern Discovery from Internet Data. In Proc. of WEBKDD 99, Workshop on Web Usage Analysis and User Profiling, Aug.
11. M. Perkowitz, O. Etzioni. Adaptive Web Sites: an AI Challenge. IJCAI.
12. P. Pirolli, J. E. Pitkow. Distribution of Surfer's Path Through the World Wide Web: Empirical Characterization. World Wide Web, 1:1-17.
13. R. R. Sarukkai. Link Prediction and Path Analysis Using Markov Chains. In Proc. of the 9th World Wide Web Conference.
14. J. Zhu, J. Hong, J. G. Hughes. Using Markov Chains for Link Prediction in Adaptive Web Sites. In Proc. of ACM SIGWEB Hypertext.
15. E. H. Chi, P. Pirolli, K. Chen, J. Pitkow. Using Information Scent to Model User Information Needs and Actions on the Web. In Proc. of ACM CHI 2001 Conference on Human Factors in Computing Systems, ACM Press, April, Seattle, WA.
16. I. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White. Visualization of Navigation Patterns on a Web Site Using Model Based Clustering. Proceedings of KDD 2000.
17. J. Z. Huang, M. Ng, W. K. Ching, J. Ng, D. Cheung. A Cube Model and Cluster Analysis for Web Access Sessions. In Proc. of WEBKDD 01, CA, USA, August.
18. L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford Digital Library Technologies, January.
19. S. Brin, L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In the 7th International World Wide Web Conference, Brisbane, Australia.
20. B. U. Oztekin, L. Ertoz, V. Kumar. Usage Aware PageRank. Submitted to WWW.
21. CLUTO.