SUGGEST : A Web Usage Mining System
Ranieri Baraglia, Paolo Palmerini†
CNUCE, Istituto del Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy.
†also Università Ca' Foscari, Venezia, Italy
(Ranieri.Baraglia, Paolo.Palmerini)@cnuce.cnr.it

Abstract

During their navigation, web users leave many records of their activity. This huge amount of data can be a useful source of knowledge, but sophisticated mining processes are needed to extract, understand and use it. In this paper we propose a Web Usage Mining (WUM) system, called SUGGEST, designed to efficiently integrate the WUM process with the ordinary web server functionalities. It can provide useful information to make users' navigation easier and to optimize web server performance. We introduce two quantities to measure the quality of our WUM system.

Keywords: Data mining, Web usage mining, User classification, Web personalization, Adaptive web system.

1 Introduction

The problem of extracting knowledge from the huge amount of data left by web users during their navigation is a research task that has gained increasing attention in recent years. Data can be stored in browser caches or in cookies at the client level, and in access log files at the server or proxy level. The analysis of such data can be used to understand users' preferences and behavior in a process commonly referred to as Web Usage Mining (WUM) [6, 2]. The extracted knowledge can be used for different goals, such as service personalization, site structure simplification, and web server performance improvement.

In the past, several WUM projects have been proposed [11, 7, 8, 5, 9]. The Analog system [11] is structured according to two main components, performed online and offline with respect to the web server activity. Past users' activity recorded in server log files is processed to form clusters of user sessions.
The online component builds active user sessions, which are then classified into one of the clusters found by the offline component. The classification makes it possible to identify pages related to those in the active session and to return the requested page together with a list of related documents. Analog was one of the first WUM projects. The geometrical approach it uses for clustering suffers from several limitations, related both to scalability and to the effectiveness of the results found. Nevertheless, its architectural solution was retained in several more recent projects.

In [8] Perkowitz and Etzioni propose Page Gather, a WUM system that builds index pages containing links to pages that are similar to one another. Page Gather finds clusters of pages instead of clusters of sessions. Starting from the user activity sessions, the co-occurrence matrix $M$ is built. The element $M_{ij}$ of $M$ is defined as the conditional probability that page $i$ is visited during a session if page $j$ is visited in the same session. A minimum threshold on $M_{ij}$ is used to prune uninteresting entries. The directed acyclic graph associated with $M$ is then partitioned by finding the graph's cliques. Finally, cliques are merged to form the clusters. Page Gather's main concern is the creation of index pages: the system has no online component, and the static index pages are kept in a separate Suggestion Section of the site.

One important concept introduced in [8] is the hypothesis that users behave coherently during their navigation, i.e. pages within the same session are in general conceptually related. This assumption is called visit coherence. We will show in Section 3 how to use this concept to obtain a measure of quality for a WUM system.

The WebWatcher system [4] is an interface agent for the World Wide Web. It accompanies a user through the pages by suggesting hyperlinks that it believes will be of interest.
The system interacts with the user, who can fill in predefined forms with keywords to specify their interests. To suggest a hyperlink, WebWatcher uses a measure of hyperlink quality, interpreted as the probability that a user will select that hyperlink. This measure is based both on the keywords specified by users and associated with each selected hyperlink, and on information coming from the hypertext structure. WebWatcher is implemented as a server and operates much like a proxy.
In [7], clusters of URLs are found using the Association Rule Hypergraph Partitioning technique. The online component of the system finds the cluster that best matches a fixed-width sliding window of the current active session, also taking into account the topology of the site. This component is implemented as a set of CGI scripts that dynamically create customized pages.

In this paper we propose SUGGEST, a WUM system designed to dynamically generate links to pages (suggestions) of potential interest to a user. It was implemented as an extension to the Apache web server. Since the criterion to be used for validating the results obtained by a WUM system is still an open problem, we introduce and discuss a measure of quality for the SUGGEST system, which could be applied more generally to evaluate other WUM systems.

The paper is organized as follows. Section 2 describes the main features of the SUGGEST project. Results of an experimental evaluation are reported in Section 3. Finally, in Section 4 we draw some conclusions and outline future directions.

2 The SUGGEST system

The main goal of SUGGEST is to extract useful information from the user access data collected in web server logs. This information is then exploited to generate suggestions for a user. Like Analog, SUGGEST adopts a two-level architecture, composed of an offline creation of historical knowledge and an online engine that understands users' behavior. Moreover, it uses an algorithm similar to the one in Page Gather for cluster creation, and introduces an effective online component that automatically classifies active user sessions and personalizes the requested HTML pages on the fly. The personalization is achieved by means of a set of suggestions dynamically generated on the basis of the active user session. Suggestions for users belonging to the same class may be different.
The online component is implemented in such a way that no modification of the web site served by the local Apache web server is needed, and it can easily be extended to proxy servers. After pre-processing the data recorded in the web server log files, SUGGEST creates clusters of related pages based on users' past activity, and then classifies new users by comparing the pages in their active sessions with the pages inside the clusters. A set of suggestions is then obtained for each request.

The offline component is run at fixed time intervals, say once a week or once a month, depending on the specific characteristics of the web site. In this phase, the access log file is pre-processed and analyzed in order to first produce user sessions, and then to create clusters of pages that can be considered related according to users' behavior. The offline component is in turn composed of two phases: pre-processing and clustering.

During pre-processing we create user sessions. We begin by removing all uninteresting entries from the input access log file, which is assumed to be in Common Log Format. Namely, we remove all non-HTML requests, such as images or CGI scripts. Exhaustive scans of the entire site performed by robot-like agents are also removed; we used the technique described in [10] to model robot behavior. Then we create user sessions, identifying users by their IP address and delimiting sessions by means of a predefined timeout between two subsequent requests from the same user. Following Catledge and Pitkow [1], we set the timeout to 30 minutes.

The clustering phase finds sets of similar pages, starting from the user sessions obtained in the pre-processing phase. We decided to follow the approach proposed in the Page Gather project, with some modifications. The main difference is in the definition of the co-occurrence matrix $M$. We believe that the interest in a page depends on its content and not on the order in which pages are visited during a session.
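The pre-processing phase described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SUGGEST implementation (which runs inside Apache): the regular expression assumes well-formed Common Log Format lines, the hypothetical `is_html` filter treats only `.html`/`.htm` files and directory URLs as pages, and robot removal (the technique of [10]) is not reproduced. Sessions are built per client IP address using the 30-minute timeout given in the text.

```python
import re
from datetime import datetime, timedelta

# One Common Log Format line:
# host ident authuser [dd/Mon/yyyy:HH:MM:SS zone] "METHOD path PROTOCOL" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
SESSION_TIMEOUT = timedelta(minutes=30)  # timeout value used in the text


def is_html(path):
    """Hypothetical page filter: keep .html/.htm files and directory URLs,
    dropping images, CGI scripts and other non-HTML resources."""
    path = path.split("?", 1)[0].lower()
    return path.endswith((".html", ".htm", "/"))


def parse_log(lines):
    """Yield (ip, timestamp, path) for HTML requests; malformed lines are skipped.
    Robot filtering (the technique of [10]) is NOT reproduced here."""
    for line in lines:
        m = CLF.match(line)
        if m is None or not is_html(m.group("path")):
            continue
        ts = datetime.strptime(m.group("time"), TIME_FORMAT)
        yield m.group("host"), ts, m.group("path")


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group requests by client IP and start a new session whenever the gap
    between two consecutive requests from the same IP exceeds `timeout`."""
    last_seen, open_session, sessions = {}, {}, []
    for ip, ts, path in sorted(records, key=lambda r: (r[0], r[1])):
        if ip in last_seen and ts - last_seen[ip] <= timeout:
            open_session[ip].append(path)
        else:
            open_session[ip] = [path]
            sessions.append(open_session[ip])
        last_seen[ip] = ts
    return sessions


if __name__ == "__main__":
    log = [
        '1.2.3.4 - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
        '1.2.3.4 - - [01/Jul/1995:00:00:06 -0400] "GET /logo.gif HTTP/1.0" 200 786',
        '1.2.3.4 - - [01/Jul/1995:00:10:00 -0400] "GET /news.html HTTP/1.0" 200 100',
        '5.6.7.8 - - [01/Jul/1995:00:05:00 -0400] "GET /about.html HTTP/1.0" 200 300',
        '1.2.3.4 - - [01/Jul/1995:01:00:00 -0400] "GET /index.html HTTP/1.0" 200 6245',
    ]
    print(sessionize(parse_log(log)))
    # [['/index.html', '/news.html'], ['/index.html'], ['/about.html']]
```

In the example, the image request is filtered out, and the last request from `1.2.3.4` arrives 50 minutes after the previous one, so it opens a new session.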
Therefore we adopt a symmetric co-occurrence matrix, defined as

$$M_{ij} = \frac{N_{ij}}{\max(N_i, N_j)} \qquad (1)$$

where $N_{ij}$ is the number of sessions containing both pages $i$ and $j$, and $N_i$ and $N_j$ are the number of sessions containing page $i$ and page $j$, respectively. Dividing by the larger of the two single-page occurrence counts reduces the relative importance of index pages. Such pages are very likely to be visited together with any other page, yet they are of little interest as potential suggestions, since they are too obvious.

From the matrix $M$ we then build an undirected graph whose nodes are the pages and whose edges are the non-null elements of $M$. To limit the number of edges in this graph we apply a threshold filter specified by the parameter MinFreq: elements of $M$ whose value is less than MinFreq are too weakly correlated and are thus discarded. In order to find groups of strongly correlated pages, we partition the graph by finding its connected components. Pages within the same cluster are ranked according to their occurrence frequency. Moreover, all clusters whose size is lower than a threshold value MinClusterSize are discarded, because they are considered not significant.

We implemented the online component of SUGGEST as an extension to the Apache web server. Apache provides a mechanism to extend the web server functionalities by means of dynamically loadable modules, which can be used to perform specialized functions such as custom authentication or dynamic page modification.
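A compact sketch of the clustering phase follows, assuming each session is a list of page identifiers. It computes Eq. (1) over unordered page pairs, keeps the edges with $M_{ij} \ge$ MinFreq, extracts connected components with an iterative depth-first search, drops components smaller than MinClusterSize, and ranks the pages inside each cluster by the number of sessions containing them. The function names, the tie-breaking by page name, and the choice of depth-first search (any connected-components algorithm would do) are illustrative assumptions, not details taken from SUGGEST.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sessions):
    """Symmetric co-occurrence matrix of Eq. (1), stored sparsely.

    Keys are page pairs (i, j) with i < j; value = N_ij / max(N_i, N_j),
    where N_i counts the sessions that contain page i.
    """
    n_single, n_pair = Counter(), Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        n_single.update(pages)
        n_pair.update(combinations(pages, 2))
    m = {(i, j): n / max(n_single[i], n_single[j]) for (i, j), n in n_pair.items()}
    return m, n_single


def find_clusters(sessions, min_freq, min_cluster_size):
    """Connected components of the graph whose edges are the entries
    M_ij >= min_freq, keeping components with >= min_cluster_size pages."""
    m, n_single = cooccurrence(sessions)
    adjacency = {page: set() for page in n_single}
    for (i, j), value in m.items():
        if value >= min_freq:  # weaker correlations are discarded
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            page = stack.pop()
            component.append(page)
            for neighbour in adjacency[page]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) >= min_cluster_size:
            # rank pages by the number of sessions containing them (ties by name)
            clusters.append(sorted(component, key=lambda p: (-n_single[p], p)))
    return clusters, m


if __name__ == "__main__":
    sessions = [["a", "b"], ["a", "b", "c"], ["d", "e"], ["d", "e"], ["a", "f"]]
    clusters, m = find_clusters(sessions, min_freq=0.5, min_cluster_size=2)
    print(clusters)  # [['a', 'b', 'c'], ['d', 'e']]
```

In this toy run, page `a` appears in three sessions, so the pair (a, f), seen together only once, gets $M = 1/3$ and is cut at MinFreq = 0.5. This is the damping of frequently visited pages that the maximum in the denominator of Eq. (1) is meant to produce; the isolated page `f` is then discarded by the MinClusterSize filter.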
As requests arrive at the server, they are recorded in a buffer of active sessions. Each session is identified by the client IP address and is associated with a timestamp that allows us to determine when the session is closed. In order to classify an active session, we look for the cluster that includes the largest number of pages of that session. Once the cluster has been found, we need to determine the pages that will constitute the suggestions. The final suggestions are composed of a static and a dynamic part. The static part is given by the most relevant pages in the cluster, according to the order determined in the offline phase. The dynamic part is obtained from the pages in the same cluster that are most strongly related to those in the session that determined the classification; this relation is based on the values stored in the matrix $M$. The static and dynamic suggestions are ranked together and returned as a set of interesting pages.

It is worth noting that this is a new feature introduced by our system: by means of the dynamic technique just described, users belonging to the same class can receive different sets of suggestions, depending on the pages visited in their active sessions. The suggestions are implemented by inserting a list of links to the pages found at the end of the requested page. Other modifications can be applied as necessary (e.g. a personalized banner).

Since almost half of all the sessions in our datasets contain at least 3 pages (see Section 3), we choose 3 pages as the minimum length for an active session to be classified.

Table 1. Datasets used in the experiments. Columns: Dataset, Size (MBytes), Records (thousands), Period (days). Rows: Berkeley, NASA, USASK.
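The online classification and suggestion step described above can be illustrated with the following sketch. The text does not specify how the static and dynamic parts are ranked together, so the merge rule used here (session-specific dynamic pages first, then the top-ranked static pages, without duplicates) is a hypothetical choice, as are the parameters `k`, `n_static` and `n_dynamic`; `min_session_len=3` mirrors the 3-page minimum session length mentioned in the text. The real system performs this work inside an Apache module; this Python version only shows the logic.

```python
def classify(session, clusters, min_session_len=3):
    """Index of the cluster sharing the most pages with the session, or None
    if the session is too short or overlaps no cluster."""
    pages = set(session)
    if len(pages) < min_session_len:
        return None
    best, best_overlap = None, 0
    for idx, cluster in enumerate(clusters):
        overlap = len(pages & set(cluster))
        if overlap > best_overlap:
            best, best_overlap = idx, overlap
    return best


def suggest(session, clusters, m, k=4, n_static=2, n_dynamic=2, min_session_len=3):
    """Merge dynamic and static suggestions for an active session.

    clusters: page lists already ranked by frequency (offline phase);
    m: sparse co-occurrence values keyed by sorted page pairs.
    """
    idx = classify(session, clusters, min_session_len)
    if idx is None:
        return []
    visited = set(session)
    candidates = [p for p in clusters[idx] if p not in visited]

    # Static part: best-ranked pages of the cluster not yet visited.
    static = candidates[:n_static]

    # Dynamic part: candidates most related (via M) to the pages just visited.
    def relatedness(page):
        return sum(m.get(tuple(sorted((page, q))), 0.0) for q in visited)

    scored = sorted(candidates, key=lambda p: (-relatedness(p), p))
    dynamic = [p for p in scored if relatedness(p) > 0][:n_dynamic]

    # Hypothetical merge rule: dynamic first, then static, without duplicates.
    merged = list(dict.fromkeys(dynamic + static))
    return merged[:k]


if __name__ == "__main__":
    clusters = [["a", "c", "e", "b", "d"], ["x", "y", "z"]]
    m = {("a", "b"): 0.6, ("a", "c"): 0.2, ("b", "d"): 0.9,
         ("c", "d"): 0.1, ("x", "y"): 0.8}
    print(suggest(["a", "b", "q"], clusters, m))  # ['d', 'c', 'e']
    print(suggest(["c", "e", "z"], clusters, m))  # ['a', 'd', 'b']
```

Both example sessions are assigned to the first cluster, yet they receive different suggestions, which is the behavior the dynamic part is meant to provide.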
3 Experimental evaluation

Measuring the performance of recommendation systems poses more than one problem. It is difficult to characterize the quality of the suggestions obtained and to quantify how useful the system is. We therefore study how our system behaves when varying its parameters, and we introduce two measures of suggestion quality: the coherence of suggestions and their overlap with users' real behavior. We also study whether SUGGEST can be used to improve web server performance, by guessing which page a user is most likely to request and prefetching it accordingly.

The experimental evaluation of SUGGEST was conducted using three public-domain access log files: Berkeley, NASA and USASK, produced respectively by the web servers of the Computer Science Department of Berkeley University, the Kennedy Space Center, and the University of Saskatchewan. Data are stored in Common Log Format. The characteristics of the datasets are given in Table 1. All evaluation tests were run on a dual-processor SMP 800 MHz Pentium III PC with 512 MBytes of RAM and two SCSI disks with 27 GBytes of total capacity, running Linux.

As shown in Figure 1, the percentage of sessions formed by a predefined number of pages decreases quickly as the minimum number of pages in a session increases. Moreover, for all the datasets the average length of a user session is about 3 pages.

Figure 1. Minimum number of pages in a session.

Figures 2 and 3 show the number of clusters and the percentage of pages not clustered as a function of the MinFreq parameter.

Figure 2. Number of clusters found.

The number of clusters increases up to MinFreq=0.5.
This is due to the fact that, by deleting entries from $M$, we obtain a less connected graph. When the graph becomes highly disconnected (for larger values of MinFreq), the clusters found are smaller than the MinClusterSize threshold and are thus discarded. Therefore the total number of clusters found does not increase.

Figure 3. Percentage of outliers.

Similar reasoning can be used to describe the behavior of the number of outliers, i.e. the pages that do not belong to any cluster and will therefore not contribute to the online classification. From Figure 3 we can observe that the percentage of outliers grows as MinFreq increases. It is worth noting that the three datasets show qualitatively similar behavior.

Once the clusters are created, we are interested in determining whether the pages in a cluster are actually related to one another. Some techniques for evaluating cluster quality were introduced in previous works. Fu et al. [3] verify whether the pages in a cluster are related to the same topic, assuming a priori knowledge of their contents. In Analog [11], the authors verify whether the pages in a cluster are linked according to the web site structure. Since we lack knowledge of both the content and the general structure of the sites that produced the datasets, to evaluate the quality of the clusters produced by the offline phase we used the visit coherence index, which quantifies the intrinsic coherence of a session. It measures the percentage of pages inside a user session that belong to the cluster representing that session. As in the Page Gather system, the basic assumption here is that the coherence hypothesis holds for every session.

To evaluate visit coherence, we split the dataset obtained from the pre-processing phase into two halves, apply the clustering to one half, and measure whether the suggestions generated on the basis of the second half still maintain the expected coherence. To verify whether the coherence hypothesis holds for every session in the second half of the dataset, we define a quantity $\gamma_i$, which measures the fraction of pages of $S_i$ that belong to the representative cluster for that session:

$$\gamma_i = \frac{\left|\{p \in S_i \mid p \in C_i\}\right|}{N_i} \qquad (2)$$

where $p$ is a page, $S_i$ is the $i$-th session, $C_i$ is the cluster representing $S_i$, and $N_i$ is the number of pages in the $i$-th session. The average value of $\gamma_i$ over all $N_S$ sessions contained in the dataset partition under consideration is given by:

$$\Gamma = \frac{\sum_{i=1}^{N_S} \gamma_i}{N_S} \qquad (3)$$

Figure 4. Coherence of visit.

Figure 4 plots $\Gamma$, in percent, as a function of MinFreq. For small values of MinFreq, almost all pages in every session belong to the same cluster. This can be considered an experimental confirmation of the visit coherence hypothesis. In this case, due to the large number of pages in the cluster, we limit the suggestions to the pages with the highest rank.

To measure the quality of the suggestions generated during the online phase, we used a technique similar to the one used to evaluate cluster quality. Sessions found in one half of the dataset are submitted to the online module, which classifies them and generates suggestions. Then, for every session, the fraction of pages belonging both to the session and to the corresponding set of suggestions $\mathit{Sugg}_i$ is computed using Eq. (4):

$$\omega_i = \frac{\left|\{p \in S_i \mid p \in \mathit{Sugg}_i\}\right|}{N_i} \qquad (4)$$

The average value of $\omega_i$ is obtained by averaging over all the sessions:

$$\Omega = \frac{\sum_{i=1}^{N_S} \omega_i}{N_S} \qquad (5)$$

where $N_S$ is half of the total number of sessions.
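The evaluation quantities of Eqs. (2) to (5), and the next-page prefetching measure discussed in the following paragraphs, reduce to simple set-membership counts. The sketch below assumes that sessions are lists of page identifiers, that the representative cluster and the suggestion set of each session are already known, and that `suggest_fn` is any callable mapping a session prefix to a collection of suggested pages; all function names are hypothetical.

```python
from typing import Callable, Collection, Iterable, Sequence


def fraction_in(session: Sequence[str], reference: Collection[str]) -> float:
    """Fraction of the session's pages that also appear in `reference`.

    With reference = representative cluster C_i this is gamma_i (Eq. 2);
    with reference = suggestion set Sugg_i it is omega_i (Eq. 4).
    """
    if not session:
        raise ValueError("a session must contain at least one page")
    ref = set(reference)
    return sum(1 for p in session if p in ref) / len(session)


def mean(values: Iterable[float]) -> float:
    """Average over the N_S sessions: Gamma (Eq. 3) or Omega (Eq. 5)."""
    values = list(values)
    return sum(values) / len(values)


def prefetch_hit_rate(
    sessions: Sequence[Sequence[str]],
    suggest_fn: Callable[[Sequence[str]], Collection[str]],
    k: int,
) -> float:
    """Share of sessions longer than k whose (k+1)-th page is among the
    suggestions computed from their first k pages."""
    eligible = [s for s in sessions if len(s) > k]
    if not eligible:
        return 0.0
    hits = sum(1 for s in eligible if s[k] in suggest_fn(s[:k]))
    return hits / len(eligible)


if __name__ == "__main__":
    gammas = [fraction_in(["a", "b", "c", "x"], ["a", "b", "c", "d"]),
              fraction_in(["x", "y"], ["x", "z"])]
    print(gammas, mean(gammas))  # [0.75, 0.5] 0.625
```

Here the counts treat each session as the list of pages it contains; whether repeated visits to the same page should be counted more than once is not stated in the text, so that choice is an assumption.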
Figure 5. Quality of suggestions.

Figure 5 shows, in percent, the overlap between the generated suggestions and the session pages. For small values of MinFreq we can say that the SUGGEST system is able to correctly identify users' behavior. When MinFreq increases, the number of outliers increases too, and consequently the number of suggestions generated decreases.

This conclusion suggests the possibility of also using SUGGEST to optimize web server performance. If the web server can forecast which pages a user is most likely to visit in the next requests, it can prefetch them so that the pages are already available in memory when the requests arrive. For this purpose we measure the average number of times that, given a sub-session of length $k$, the page $k+1$ of the session is included in the suggestions generated by SUGGEST. The behavior of this fraction as a function of MinFreq is plotted in Figure 6.

Figure 6. Prefetching quality.

For small values of MinFreq we can correctly guess the next page a user is going to request with a probability of up to 70%. Prefetching can be applied more effectively to proxy servers, where the latency to be hidden is due to a remote server request and not only to a disk access. SUGGEST can also be applied to proxy servers without any change, since the Apache web server can also run in proxy mode.

We tested the overhead introduced by this first implementation of SUGGEST using the ab benchmarking tool. In Figure 7 we plot the execution time needed by a single Apache process to satisfy an HTTP request. We vary the degree of concurrency by submitting an increasing number of requests to the server. The two lines refer to standard Apache and to Apache running SUGGEST. As the number of concurrent requests increases, SUGGEST performance degrades proportionally, due to the mutually exclusive access of the Apache processes to shared memory areas. Some optimizations to overcome this limitation are ongoing.

Figure 7. Total execution time.

4 Conclusions and future work

In this paper we have studied the problem of realizing a Web Usage Mining system. We proposed SUGGEST, a system that classifies requests made to a web server by analyzing past users' navigation behavior. SUGGEST combines different features of previously proposed WUM systems. Its layered architecture can be used as a reference for further improvements of clustering and classification algorithms. A novel contribution of this work is the introduction of quantities that can be used to evaluate the quality of the suggestions found. Moreover, the original technique adopted for generating suggestions permits a more dynamic personalization than previous systems.

We are currently working on several modifications and general improvements of the system: (a) an experimental evaluation of SUGGEST on a real-world production web server is needed in order to observe how user navigation can be influenced by the presence of suggestions; (b) applying the system to a proxy server, for example by modifying the cache replacement policies; (c) unifying the offline and online components into a single online module where clusters are hierarchically built and updated as soon as new requests arrive.

5 Acknowledgments

This research was partially supported by the Fondazione Cassa di Risparmio di Pisa within the project WebDigger: a data mining environment for the web data analysis.

References

[1] L. D. Catledge and J. E. Pitkow. Characterizing browsing strategies in the World-Wide Web. Computer Networks and ISDN Systems, 27.
[2] O. Etzioni. The World Wide Web: quagmire or gold mine? Communications of the ACM, 39:65-68, November.
[3] Y. Fu, K. Sandhu, and M.-Y. Shih. Clustering of web users based on access patterns. In KDD 99 Workshop on Web Usage Analysis and User Profiling (WEBKDD 99), August.
[4] T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: A tour guide for the World Wide Web. Fifteenth International Joint Conference on Artificial Intelligence.
[5] T. Kamdar and A. Joshi. On creating adaptive web servers using weblog mining. Technical Report TR-CS-00-05, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, November.
[6] R. Kosala and H. Blockeel. Web mining research: a survey. In ACM SIGKDD, pages 1-15, July.
[7] B. Mobasher, R. Cooley, and J. Srivastava. Automatic personalization based on web usage mining. Communications of the ACM, 43(8), August.
[8] M. Perkowitz and O. Etzioni. Adaptive web sites: Conceptual cluster mining. In International Joint Conference on Artificial Intelligence.
[9] D. F. T. Joachims. WebWatcher: A tour guide for the World Wide Web. In Proceedings of IJCAI97.
[10] P.-N. Tan and V. Kumar. Modeling of web robot navigational patterns. In WEBKDD 2000 Workshop on Web Mining for E-Commerce: Challenges and Opportunities, August.
[11] T. W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal. From user access patterns to dynamic hypertext linking. Fifth International World Wide Web Conference, May 1996.
More informationARS: Web Page Recommendation System for Anonymous Users Based On Web Usage Mining
ARS: Web Page Recommendation System for Anonymous Users Based On Web Usage Mining Yahya AlMurtadha, MD. Nasir Bin Sulaiman, Norwati Mustapha, Nur Izura Udzir and Zaiton Muda University Putra Malaysia,
More informationAdaptive and Resource-Aware Mining of Frequent Sets
Adaptive and Resource-Aware Mining of Frequent Sets S. Orlando, P. Palmerini,2, R. Perego 2, F. Silvestri 2,3 Dipartimento di Informatica, Università Ca Foscari, Venezia, Italy 2 Istituto ISTI, Consiglio
More informationCLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES
CLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES K. R. Suneetha, R. Krishnamoorthi Bharathidasan Institute of Technology, Anna University krs_mangalore@hotmail.com rkrish_26@hotmail.com
More informationKnowledge Discovery from Web Usage Data: Complete Preprocessing Methodology
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.1, January 2008 179 Knowledge Discovery from Web Usage Data: Complete Preprocessing Methodology G T Raju 1 and P S Satyanarayana
More informationCharacterizing Home Pages 1
Characterizing Home Pages 1 Xubin He and Qing Yang Dept. of Electrical and Computer Engineering University of Rhode Island Kingston, RI 881, USA Abstract Home pages are very important for any successful
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,
More informationImproving the prediction of next page request by a web user using Page Rank algorithm
Improving the prediction of next page request by a web user using Page Rank algorithm Claudia Elena Dinucă, Dumitru Ciobanu Faculty of Economics and Business Administration Cybernetics and statistics University
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationDIGIT.B4 Big Data PoC
DIGIT.B4 Big Data PoC DIGIT 01 Social Media D02.01 PoC Requirements Table of contents 1 Introduction... 5 1.1 Context... 5 1.2 Objective... 5 2 Data SOURCES... 6 2.1 Data sources... 6 2.2 Data fields...
More informationNitin Cyriac et al, Int.J.Computer Technology & Applications,Vol 5 (1), WEB PERSONALIZATION
WEB PERSONALIZATION Mrs. M.Kiruthika 1, Nitin Cyriac 2, Aditya Mandhare 3, Soniya Nemade 4 DEPARTMENT OF COMPUTER ENGINEERING Fr. CONCEICAO RODRIGUES INSTITUTE OF TECHNOLOGY,VASHI Email- 1 venkatr20032002@gmail.com,
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More informationINTRODUCTION. Chapter GENERAL
Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which
More informationI. Introduction II. Keywords- Pre-processing, Cleaning, Null Values, Webmining, logs
ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: An Enhanced Pre-Processing Research Framework for Web Log Data
More informationAn Efficient Parallel and Distributed Algorithm for Counting Frequent Sets
An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets S. Orlando 1, P. Palmerini 1,2, R. Perego 2, F. Silvestri 2,3 1 Dipartimento di Informatica, Università Ca Foscari, Venezia, Italy
More informationIndexing in Search Engines based on Pipelining Architecture using Single Link HAC
Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Anuradha Tyagi S. V. Subharti University Haridwar Bypass Road NH-58, Meerut, India ABSTRACT Search on the web is a daily
More informationAn enhanced similarity measure for utilizing site structure in web personalization systems
University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai 2008 An enhanced similarity measure for utilizing site structure in web personalization
More informationEvaluation of Long-Held HTTP Polling for PHP/MySQL Architecture
Evaluation of Long-Held HTTP Polling for PHP/MySQL Architecture David Cutting University of East Anglia Purplepixie Systems David.Cutting@uea.ac.uk dcutting@purplepixie.org Abstract. When a web client
More informationSemantic Clickstream Mining
Semantic Clickstream Mining Mehrdad Jalali 1, and Norwati Mustapha 2 1 Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran 2 Department of Computer Science, Universiti
More informationA Framework for Self Adaptive Websites: Tactical versus Strategic Changes
A Framework for Self Adaptive Websites: Tactical versus Strategic Changes Filip.Coenen, Gilbert.Swinnen, Koen.Vanhoof, Geert.Wets@luc.ac.be Limburg University Centre, Faculty of Applied Economic Sciences,
More informationTemplate Extraction from Heterogeneous Web Pages
Template Extraction from Heterogeneous Web Pages 1 Mrs. Harshal H. Kulkarni, 2 Mrs. Manasi k. Kulkarni Asst. Professor, Pune University, (PESMCOE, Pune), Pune, India Abstract: Templates are used by many
More informationSemi-Supervised PCA-based Face Recognition Using Self-Training
Semi-Supervised PCA-based Face Recognition Using Self-Training Fabio Roli and Gian Luca Marcialis Dept. of Electrical and Electronic Engineering, University of Cagliari Piazza d Armi, 09123 Cagliari, Italy
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationA User Preference Based Search Engine
A User Preference Based Search Engine 1 Dondeti Swedhan, 2 L.N.B. Srinivas 1 M-Tech, 2 M-Tech 1 Department of Information Technology, 1 SRM University Kattankulathur, Chennai, India Abstract - In this
More informationSupport System- Pioneering approach for Web Data Mining
Support System- Pioneering approach for Web Data Mining Geeta Kataria 1, Surbhi Kaushik 2, Nidhi Narang 3 and Sunny Dahiya 4 1,2,3,4 Computer Science Department Kurukshetra University Sonepat, India ABSTRACT
More informationDiscovering Paths Traversed by Visitors in Web Server Access Logs
Discovering Paths Traversed by Visitors in Web Server Access Logs Alper Tugay Mızrak Department of Computer Engineering Bilkent University 06533 Ankara, TURKEY E-mail: mizrak@cs.bilkent.edu.tr Abstract
More informationClaude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique
Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationVisoLink: A User-Centric Social Relationship Mining
VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.
More informationThe Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu
International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce
More informationSelection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3
Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Department of Computer Science & Engineering, Gitam University, INDIA 1. binducheekati@gmail.com,
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/24/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 High dim. data
More informationIJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:
IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T
More informationWeb Mining Using Cloud Computing Technology
International Journal of Scientific Research in Computer Science and Engineering Review Paper Volume-3, Issue-2 ISSN: 2320-7639 Web Mining Using Cloud Computing Technology Rajesh Shah 1 * and Suresh Jain
More informationDaniel A. Menascé, Ph. D. Dept. of Computer Science George Mason University
Daniel A. Menascé, Ph. D. Dept. of Computer Science George Mason University menasce@cs.gmu.edu www.cs.gmu.edu/faculty/menasce.html D. Menascé. All Rights Reserved. 1 Benchmark System Under Test (SUT) SPEC
More informationEvent List Management In Distributed Simulation
Event List Management In Distributed Simulation Jörgen Dahl ½, Malolan Chetlur ¾, and Philip A Wilsey ½ ½ Experimental Computing Laboratory, Dept of ECECS, PO Box 20030, Cincinnati, OH 522 0030, philipwilsey@ieeeorg
More informationAN ALGORITHMIC APPROACH TO DATA PREPROCESSING IN WEB USAGE MINING
International Journal of Information Technology and Knowledge Management July-December 2010, Volume 2, No. 2, pp. 279-283 AN ALGORITHMIC APPROACH TO DATA PREPROCESSING IN WEB USAGE MINING Navin Kumar Tyagi
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationWeb Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono
Web Mining Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More informationFarthest First Clustering in Links Reorganization
Farthest First Clustering in Links Reorganization ABSTRACT Deepshree A. Vadeyar 1,Yogish H.K 2 1Department of Computer Science and Engineering, EWIT Bangalore 2Department of Computer Science and Engineering,
More informationWorkload Characterization Techniques
Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/
More informationHierarchical Clustering of Process Schemas
Hierarchical Clustering of Process Schemas Claudia Diamantini, Domenico Potena Dipartimento di Ingegneria Informatica, Gestionale e dell'automazione M. Panti, Università Politecnica delle Marche - via
More informationChapter 3 Process of Web Usage Mining
Chapter 3 Process of Web Usage Mining 3.1 Introduction Users interact frequently with different web sites and can access plenty of information on WWW. The World Wide Web is growing continuously and huge
More informationSemantic Web Mining and its application in Human Resource Management
International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2
More informationWeb Log Data Cleaning For Enhancing Mining Process
Web Log Data Cleaning For Enhancing Mining Process V.CHITRAA*, Dr.ANTONY SELVADOSS THANAMANI** *(Assistant Professor, CMS College of Science and Commerce **(Reader in Computer Science, NGM College (AUTONOMOUS),
More informationA Patent Retrieval Method Using a Hierarchy of Clusters at TUT
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan
More informationWeb Usage Mining. Overview Session 1. This material is inspired from the WWW 16 tutorial entitled Analyzing Sequential User Behavior on the Web
Web Usage Mining Overview Session 1 This material is inspired from the WWW 16 tutorial entitled Analyzing Sequential User Behavior on the Web 1 Outline 1. Introduction 2. Preprocessing 3. Analysis 2 Example
More informationA SURVEY- WEB MINING TOOLS AND TECHNIQUE
International Journal of Latest Trends in Engineering and Technology Vol.(7)Issue(4), pp.212-217 DOI: http://dx.doi.org/10.21172/1.74.028 e-issn:2278-621x A SURVEY- WEB MINING TOOLS AND TECHNIQUE Prof.
More informationUSER INTEREST LEVEL BASED PREPROCESSING ALGORITHMS USING WEB USAGE MINING
USER INTEREST LEVEL BASED PREPROCESSING ALGORITHMS USING WEB USAGE MINING R. Suguna Assistant Professor Department of Computer Science and Engineering Arunai College of Engineering Thiruvannamalai 606
More informationAssociation Rule Mining among web pages for Discovering Usage Patterns in Web Log Data L.Mohan 1
Volume 4, No. 5, May 2013 (Special Issue) International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info Association Rule Mining among web pages for Discovering
More informationIJITKMSpecial Issue (ICFTEM-2014) May 2014 pp (ISSN )
A Review Paper on Web Usage Mining and future request prediction Priyanka Bhart 1, Dr.SonaMalhotra 2 1 M.Tech., CSE Department, U.I.E.T. Kurukshetra University, Kurukshetra, India 2 HOD, CSE Department,
More informationWeb Mining. Tutorial June 21, Pisa KDD Laboratory: CNR-CNUCE Dipartimento di Informatica Università di Pisa
Web Mining Tutorial June 21, 2002 Pisa KDD Laboratory: CNR-CNUCE Dipartimento di Informatica Università di Pisa Table of Content Introduction The KDD cycle for the web Preprocessing Data mining tasks for
More informationImproving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique
Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,
More informationAn Empirical Analysis of Communities in Real-World Networks
An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization
More information