Enhanced Web Log Based Recommendation by Personalized Retrieval


Xueping Peng
Faculty of Engineering and Information Technology
University of Technology, Sydney
A thesis submitted for the degree of Doctor of Philosophy
February 2015

CERTIFICATE OF AUTHORSHIP/ORIGINALITY

I certify that the work in this thesis has not previously been submitted for a degree nor has it been submitted as part of requirements for a degree except as fully acknowledged within the text. I also certify that the thesis has been written by me. Any help that I have received in my research work and the preparation of the thesis itself has been acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.

Signature of Student

Acknowledgements

First, I would like to thank my supervisors, Prof Chengqi Zhang, Prof Zhendong Niu and Dr Ling Chen, who introduced me to the wonderful world of research. They not only gave me invaluable academic advice, but also helped my transition into a different culture. Prof Chengqi Zhang has been a great mentor and collaborator, being both energetic and full of ideas. Prof Zhendong Niu, who works in the School of Computer Science, Beijing Institute of Technology, guided me in information retrieval and web log mining. Dr Ling Chen helped me by asking insightful questions and giving me thoughtful comments on the thesis. I enjoyed working with them, and benefited enormously from my interactions with them.

I spent nearly two and a half years at the University of Technology, Sydney. I thank the collaborators, faculty staff, fellow students and friends in the QCIS centre, who made my graduate life a very memorable experience. In particular, I thank you if you are reading this thesis.

I thank my family. My parents richly endowed me with curiosity about the natural world. Last but not least, my deep gratitude is extended to my dear wife Suling Niu, who brings me so much love and happiness. It is no exaggeration to say that she helped to make my thesis writing an enjoyable endeavor.

Contents

List of Figures
List of Tables
1 Introduction
    Background
    Research Questions
    Research Objectives
    Significance and Main Contributions
    Research Methodology
    Thesis Structure
    Publications Related to This Thesis
2 Literature Review
    Web Log Mining
        Technologies
            Data Collection
            Data Preprocessing
            Mining Algorithms
            Pattern Analysis
        Research and Applications
        Challenges
    Personalized Retrieval
        Query Expansion
        Result Processing
        Challenges
    Recommender System
        Collaborative Filtering
        Content-Based Approach
        Hybrid Approach
        Challenges
    Summary
3 Query Suggestion Model Based on the Query Semantics and Click-through Data
    Introduction
    Related Works
        Query Suggestion
        Bipartite Graph
    The Proposed Method
        Query Semantics for Document-based Method
        Query-URL Bipartite Graph for Log-based Method
            Construction of Query-URL Matrices
            Matrix Factorisation and Query Relevance Computation
        Integrate Multiple Suggestion Models
    Experiments
        Data Set
        Evaluation Metrics
        Comparison of Query Suggestion Results
        Evaluation of Suggestion Results
    Summary
4 Collaborative Filtering Retrieval Model Based on Local and Global Features
    Introduction
    Related Works
        Personalized Web Search
        Filtering Algorithms
        Click-through Data
    Proposed Approach
        User Profile
            Sequence Score
            Web Page Rating
            Preference Score
        User-Based Collaborative Filtering
        Personalized Search Model
    Experiments
        Data Sets
        Evaluation Metrics
        Ranking Methods Compared
        Users Evaluation
        Impact of Parameters
        Personalized Search Performance
    Summary
5 Web Search Recommendation Based on the Retrieval Sequence and the Browsing Features
    Introduction
    Related Works
        User Information Collection
        Web Search Recommendation
    The Proposed Method
        Definition of Terms
        User Modeling
        Recommendation of Resource
            Resource Retrieval
            Resource Filtering
    Experiments and Analysis
        Data Set
        The Standard of Evaluation
        Results of Experiment
    Summary
6 Recommendation Based on User Interests Association Findings
    Introduction
    Related Works
        Association Rule Mining
        Maximal Frequent Itemsets
    The Proposed Method
        Basic Concepts and Assumptions
            Basic Concepts
            Basic Assumptions
        Resources Collection and Description
            Resources Collection
            Resources Description
        User Profile Association Algorithm
        User Modelling
        Recommendation of Resources
    Experiments and Analysis
        Experiment Data Set
        Evaluation Metrics
        Analysis of Experiment Results
    Summary
7 Conclusions and Future Research
    Conclusions
    Future Research
References

List of Figures

1.1 Relationship between chapters
High level Web log mining process
An example of click-through bipartite
MAP comparisons
Sequence similarity between two models
Impact of the parameter α
Impact of the parameter β
Comparison between two models
The comparison of precision on four classes
Average precision between two models
Minimum support and number of the transactions
The comparison of precision on four classes
Comparison between two models

List of Tables

3.1 Samples of search engine click-through data
Examples of QCQS query suggestion results
Accuracy comparisons
User profile
Information format of click-through data
Precision comparisons
Web access log
Format of user profile

Abstract

With the rapid development of the Internet and the WWW, it is increasingly important for people to access quality web information, and enabling users to find information quickly and accurately has become an urgent problem. As one of the basic ways to solve this problem, personalized information services focus on fulfilling the personalized information requirements of different users based on their actual demands, preference characteristics, behaviour patterns, etc. This thesis focuses on enhancing web log based recommendation by personalized retrieval. Its main work and innovations are as follows.

For personalized retrieval, the thesis proposes two models to improve user experience and optimize search performance. The first is a query suggestion model based on query semantics and click-through data. This model calculates the subject relevance between queries, and then combines the semantic information with the relevance obtained from the query-click matrix model, which can effectively eliminate the ambiguity and input errors of reminder queries. The second is a collaborative filtering retrieval model based on local and global features. By integrating the local and global characteristics of the accessed information, this model overcomes the limitations of a single feature and broadens the applicability of the retrieval model.

For recommendation by personalized retrieval, we propose two recommendation models based on the web log. The first is based on the user's atomic retrieval transaction sequence and browsing characteristics. This model decomposes search transactions and calculates the user's degree of interest in the search term, which allows users to query information more clearly. Further, it incorporates user feedback into the evaluation value of the search results, which overcomes the shortcomings of models based on content filtering. The second model is based on user interests association findings, which can be used to find the relationships between resources accessed by users, extract the associations of user interests, and address the problem of user interests isolation.

Chapter 1 Introduction

1.1 Background

With the constantly increasing volume of resources available on the Internet, the Internet has effectively become the world's largest and most extensive resource repository. However, most Web structures are large and complicated, and users often miss the goal of their inquiry or receive ambiguous results when they try to navigate through them [Eirinaki and Vazirgiannis, 2003]. We list several common issues that arise when users try to obtain information from the Internet.

(1) Difficult to find the desired information

Web information is distributed, dynamic, multi-structured, and stored on various sites around the world, yet no one is responsible for the validity and orderliness of this information. How to quickly and accurately find the desired information in this huge resource repository therefore becomes increasingly important for Web users. Thanks to the continuous efforts of many research institutes and commercial companies, Web users can obtain information using classified portal web sites and search engines, which helps to address the issue to a certain extent. However, the outcome is not always satisfactory because of the low precision and recall of the results returned by portal sites and search engines.

(2) Difficult to obtain the deep knowledge and patterns behind Web information

Web data contains a huge amount of useful and often profound knowledge and patterns, which can be difficult to discover and exploit directly. In e-business, what relevance is there between items bought at the same time? What differences exist among different users' shopping behaviours? How can users' purchasing or browsing preferences be harnessed to recommend or promote products? In information retrieval, how often do query strings appear and how long are they? What patterns exist in the page-turns of query results? What rules govern the URL clicks on query results? Exploiting the discovered knowledge and patterns is extremely useful for optimising recommendation and information retrieval algorithms.

(3) Lack of individual information services

Because the information needs of users are varied, specific and limited, while the information resources on the Web are infinitely dispersed, there is a contradiction between the specific information needs of users and the infinite information resources on networks. Accordingly, it is necessary to provide personalized services to meet the needs of specific users. In addition, traditional search engines cannot meet these needs. On the one hand, traditional search engines return the same results when different users input the same query string, disregarding users' preferences. On the other hand, because web information is dynamic, users have to repeat the same query continually if they want to obtain the latest information from the Internet. Consequently, providing personalized recommendation and information retrieval for users with different backgrounds and preferences is a hot research topic.

1.2 Research Questions

This thesis focuses on three kinds of research questions, which relate to query suggestion for information retrieval, personalized retrieval, and information recommendation.

Q1: How to provide query suggestions based on the query semantics and click-through data?

To address the query suggestion issue, we design approaches from two perspectives: (1) how to borrow the word concepts of the Knowledge Network, which falls under document-based methods; (2) how to use query logs effectively, which falls under log-based methods. Based on these two perspectives, the research problems are described as follows:

Document-based method. It is difficult to find related documents because a query string is short, sparse text. We mainly address this issue with word frequency and domain knowledge (the word concepts of the Knowledge Network).

Log-based method. Given query logs, learn to exploit them to improve the performance of query suggestion. Using Query-URL bipartite graph methods, we obtain a query similarity matrix that contributes to the performance of query suggestion.

Hybrid method. After addressing the problems above, we need to consider how to integrate these approaches to pursue further improvement.

Q2: How to optimise the information retrieval algorithm based on collaborative filtering of local and global features?

Traditional search engines cannot meet users' personalized information needs because they disregard users' preferences. The different preferences and behaviours of users reflect their different information demands. How to return different query results when different users input the same query string is a challenge for traditional search engines. By analyzing and studying users' logs, we propose a collaborative filtering retrieval model based on local and global features, which considers the local and global characteristics of the accessed information and treats the two types of characteristics with different methods. This model overcomes the limitations of a single feature and broadens the applicability of the retrieval model.

Q3: How to recommend information based on search behaviours and browsing features?

The information needs of users are ultimately reflected by the specific pages they browse and by the behaviours associated with browsing them. There are many kinds of web resources, such as page contents, page linkages and web logs, which can be analyzed to discover users' preferences and build users' profiles. So which types of information can we collect to reflect users' interests, and how do we find users' preferences and then recommend potentially interesting resources to them? To answer these questions, we propose two recommendation models based on search behaviours, browsing features and associated user interests.

1.3 Research Objectives

Our research aims to develop innovative solutions to improve the performance of web log based recommendation by personalized retrieval. Several major research objectives (RO), which aim to answer the relevant research questions, are discussed below:

RO1: To propose a query suggestion model based on the query semantics and click-through data (aims to answer Q1);

In order to tackle the lack of effective semantic processing, this thesis proposes a query suggestion model based on query semantics and click-through data. The model combines the click-stream data matrix model and query semantic information. Using word frequency information and the word concepts of the Knowledge Network (HowNet), the model calculates the subject relevance between queries, and then combines the semantic information with the relevance obtained from the query-click matrix model.

RO2: To propose a collaborative filtering retrieval model based on local and global features (aims to answer Q2);

In order to improve the performance of personalized information retrieval, we propose a collaborative filtering retrieval model based on local and global features. This model considers the local and global characteristics of the accessed information, and treats the two types of characteristics with different methods. The local features are processed by using the user's click-stream logs stored in the history of accessed information: the retrieval sessions between accessed sequences of resources are analyzed to build the user's interest function. The global features are processed by using the global users' evaluation of information resources to build the global user function of resource interests. By integrating the two types of characteristic information, the model overcomes the limitations of a single feature and broadens the applicability of the retrieval model.

RO3: To propose a recommendation model based on the retrieval transaction sequence and the browsing features (aims to answer Q3);

A comprehensive recommendation model is developed, which is based on the user's atomic retrieval transaction sequences and the browsing characteristics (save, print, bookmarks, and browsing time). This model decomposes search transactions and calculates the user's degree of interest in the search term, which allows users to query information more clearly. Further, this model incorporates user feedback into the evaluation value of the search results, which overcomes the shortcomings of models based on content filtering.

RO4: To propose a recommendation model based on user interests association findings (aims to answer Q3);

By studying and analyzing user browsing information, we propose a personalized recommendation model based on user interests association findings, which can be used to find the relationships between resources accessed by users, extract the associations of user interests, and address the problem of user interests isolation. Tests on real data show that it achieves good recommendation accuracy.

1.4 Significance and Main Contributions

The significance and main contributions of the proposed work are as follows:

The proposed query suggestion model based on the query semantics and click-through data will be a new extension for search engine query suggestions. The model utilizes the bipartite graph to learn the low-rank query feature space and build a query similarity matrix model. Meanwhile, it combines query literal similarity with query semantic information and calculates the subject relevance among queries using word frequency information and the word concepts of the Knowledge Network (HowNet). Finally, the model integrates the two kinds of relevance to pursue high-performance query suggestion.

The proposed collaborative filtering retrieval model of local and global features represents a new extension for information retrieval algorithms. This model considers the local and global characteristics of user-accessed information, and treats the two types of characteristics with different methods. The local features use the user's click-stream logs stored in the history of accessed information to analyze the retrieval sessions between accessed sequences of resources and to build the local user's interest function. The global features use the global users' evaluation of information resources to build the global user function of resource interests. The model overcomes the limitations of a single feature and enlarges the application domain of the retrieval model.

The proposed web search recommendation model based on the users' atomic retrieval transaction sequences and the browsing features addresses two challenging questions in information recommendation: which types of information reflect the user's interests, and how do we find the user's preferences and recommend potential resources to them? We have introduced the user's atomic retrieval transaction sequences and exploited the browsing characteristics (save, print, bookmarks, browsing time) to build the user's profile. This model decomposes search transactions and calculates the user's degree of interest in the search term, which allows users to query information more clearly; it also incorporates user feedback into the evaluation values of search results, which overcomes the shortcomings of models based on content filtering.

The proposed recommendation model based on user interests association findings will be an extension of the former recommendation model. By studying and analyzing user browsing information, the model can be used to find the relationships between the resources accessed by users, extract the associations of user interests, and address the problem of user interests isolation.

1.5 Research Methodology

Personalized information services focus on fulfilling the personalized information demands of different users based on their actual demands, preference characteristics, behaviour patterns, etc. Personalized services can effectively cater to users' personal interests; they are widely accepted and are becoming more and more popular. This thesis considers three aspects of personalized information services: query suggestion for information retrieval, personalized retrieval, and information recommendation.

For query suggestion for information retrieval, we propose a novel and efficient query suggestion model that integrates the query semantics and click-through data, thereby overcoming the disadvantages of the two kinds of methods. First, we propose a method which combines query literal similarity with query semantic information, and calculates the subject relevance among queries using word frequency information and the word concepts of the Knowledge Network (HowNet). Secondly, we propose another method which utilises the bipartite graph to learn the low-rank query feature space, and then builds a query similarity matrix model based on the features. Based on these, we design a ranking algorithm to propagate the similarities of the users' query log information, and then recommend semantically relevant queries to users.

The model is composed of three parts: query semantics, the query-URL bipartite graph, and the integration of multiple suggestion models. In query semantics for the document-based method, we combine the query literal similarity with query semantic information, and then calculate the subject relevance among queries using word frequency information and the word concepts of the Knowledge Network (HowNet). In the query-URL bipartite graph for the log-based method, we utilize the query-URL bipartite graph to learn the low-rank query feature space, and build a query similarity matrix model based on the features. In the integration of multiple suggestion models, we integrate the two models to pursue high-performance query suggestion. Empirical experiments on the click-through data of a commercial search engine have demonstrated the effectiveness and efficiency of this model.

For personalized retrieval, we propose the local and global features-driven collaborative filtering retrieval model, which aims to utilize click-through data and Web page ratings to improve Web searching. By analyzing the click-through data, we attempt to discover the latent factors among these multi-type objects. Page rating is one important characteristic, which can be calculated from the explicit relevance ratings of users who have browsed the Web page. By analyzing associations among the multi-type objects in click-through data and computing Web page ratings, we construct a personalized search model, and then re-rank search results with the model.

The model is composed of three parts: the user profile, user-based collaborative filtering, and a personalized search model. In the user profile of the information retrieval model, we calculate the user's preference score by integrating the user's search sequence score and the web page rating. This model simultaneously takes into account the local features (search sequence) and the global features (web page). In the user-based collaborative filtering part, we predict a test user's interest in a test item based on the rating information from similar user profiles. Each user profile is sorted by its dissimilarity to the test user's profile. Ratings by similar users contribute to predicting the test item rating. A set of similar users can be identified by employing a threshold or selecting the top-n. In the personalized search model, by selecting top-scoring documents and the documents of interest to users (including those accessed by users and those predicted by the system), we propose a personalized Web search retrieval model in which different users who enter the same query keywords receive different search result lists. To verify the personalized information retrieval, we evaluate it on real-world datasets. The experimental results show that the collaborative filtering retrieval model of local and global features enhances the accuracy of information retrieval.

For information recommendation, we propose a recommendation model based on the user's atomic retrieval transaction sequence and the browsing features, and a recommendation model based on user interests association findings. The basic idea of the two models is to extract users' preferences, build users' profiles, and recommend potential information to users based on their profiles. The models are composed of three parts: preference collection, the user's profile, and recommendation. In preference collection, which extracts data from the web log, we use the users' browsing behaviours and the page content browsed by users to develop a novel tool that generates the users' preference data. In particular, we enrich the data by integrating several types of collected data. In the user's profile part, which represents the user's preferences, we exploit content-based and rule-based methods to generate the user's profile. Moreover, we adopt different user models for different users' preferences. In the recommendation part, we design a recommendation algorithm which iteratively employs the user's profile and the resources in the repository to find potentially interesting resources for the user. We evaluate the accuracy of the recommendation models on real-world datasets. Our experimental results demonstrate that the proposed methods are effective in recommendation, and consistently outperform existing and baseline methods.

1.6 Thesis Structure

The rest of the thesis is organized as follows:

Chapter 2 is the literature review, in which we review related work on web log mining, personalized retrieval and recommendation systems. We then discuss the technical challenges in these areas and examine how the developed techniques meet these challenges.

Chapter 3 describes in detail the query suggestion model based on the query semantics and click-through data. Firstly, it canvasses the related work on query suggestion. Secondly, in the methodology part, it proposes a query suggestion model based on the query semantics and click-through data, which combines the click-stream data matrix model and query semantic information. Lastly, it outlines experiments that show how the technique can effectively eliminate ambiguity and input errors in reminder queries.

Chapter 4 describes in detail the local and global feature-driven collaborative filtering retrieval model. Firstly, it canvasses the related work on personalized searching and click-through data. Secondly, it proposes a local and global feature-driven collaborative filtering retrieval model, which considers the local and global characteristics of the accessed information and treats the two types of characteristics with different methods. Lastly, it provides experimental results and analysis.

Chapter 5 describes in detail recommendation based on the user's atomic retrieval transaction sequences and the browsing features. Firstly, it discusses the related work on this kind of method and outlines our motivation for it. Secondly, in the methodology part, we discuss the process of generating the user's atomic retrieval transaction sequence, and then introduce a content filtering algorithm to recommend resources to users. Lastly, we analyze the experimental results, which show that the model has better recommendation effectiveness and that recommendation efficiency is significantly increased.

Chapter 6 describes in detail the model of personalized recommendation based on user interests association findings. Firstly, it canvasses the related work on association rule mining and content-based filtering. Secondly, in the methodology part, we give the term definitions used in this model, present resources and users' preferences, and build a recommendation model. Finally, we set up experiments on recommendation accuracy by testing on real data.

Chapter 7 presents conclusions and recommendations for future research.

The relationships between all chapters are shown in Figure 1.1.

Figure 1.1: Relationship between chapters

1.7 Publications Related to This Thesis

A list of the papers associated with my PhD research that have been submitted, accepted and published appears below:

1. Xueping Peng, Zhendong Niu, Sheng Huang. Query Suggestion Based on the Query Semantics and Clickthrough Data. Advanced Science Letters. Volume 9, Number 1, pp (6), April

2. Xueping Peng, Zhendong Niu, Sheng Huang, Yumin Zhao. Personalized Web Search Using Clickthrough Data and Web Page Rating. Journal of Computers, Vol 7, No 10 (2012), pp , Oct

3. Xueping Peng, Zhendong Niu, Sheng Huang. A Study on Personalized Recommendation Model Based on Search Behaviors and Resource Properties. ICIECS2010: International Conference on Information Engineering and Computer Science, pp Wuhan, China. Dec ,

4. Xueping Peng, Yujuan Cao, Zhendong Niu. Mining Web Access Log for the Personalization Recommendation. International Conference on MultiMedia and Information Technology, pp Three Gorges, China, Dec ,

5. Xueping Peng, Zhendong Niu. The Research of the Personalization Recommendation Model Based on the Behavior of User's Retrieval and Browse. International Conferences on Web Intelligence and Intelligent Agent Technology, pp Silicon Valley, California. Nov

6. Yumin Zhao, Zhendong Niu, Xueping Peng. Research on Data Mining Technologies for Complicated Attributes Relationship in Digital Library Collections. Appl. Math. Inf. Sci. 8, No. 3, pp ,

7. Sheng Huang, Xueping Peng, Zhendong Niu, Kunshan Wang. News topic detection based on hierarchical clustering and named entity. International Conference on Natural Language Processing and Knowledge Engineering, pp Tokushima, Japan. Nov ,

8. Yujuan Cao, Xueping Peng, Kun Zhao, Zhendong Niu, Guixian Xu, Weiqiang Wang. Query expansion based on query log and small world characteristic. WISE 2009: 10th International Conference on Web Information Systems Engineering, Poland Poznan Lecture Notes in Computer Science v5802 LNCS, pp ,

Chapter 2 Literature Review

In this chapter, we review the related work on web log mining, personalized retrieval, and the categorization and technology of recommendation systems, and then discuss the technical challenges in these areas.

2.1 Web Log Mining

Web log mining is the application of data mining techniques to the data generated by the interactions of users with web servers. This kind of data, stored in server logs, represents a valuable source of information [Mele, 2013]. Analyzing this data reveals the users' basic behaviour and mutual associations, which provides direct support for researching user behaviour patterns, evaluating the performance of websites, etc. The results of mining web logs can also be used to personalize the presentation of web content; improve user navigation; improve web design or e-commerce sites; optimize the document-retrieval task; improve query suggestion; and improve customer satisfaction [Abedin and Sohrabi, 2009; Xie, 2014].

34 2.1.1 Technologies Web log mining also known as web usage mining is the application of data mining techniques on large web log repositories to discover useful knowledge about users behavioral patterns and website usage statistics that can be used for various website design tasks. The main source of data for web usage mining consists of textual logs collected by numerous web servers all around the world. There are four stages in web usage mining [Chitraa et al., 2010]. Data Collection: the users log data is collected from various sources like serverside, client side, proxy servers and so on. Preprocessing: performs a series in processing the web log file covering data cleaning, user identification, session dentification, path completion and transaction identification. Mining Algorithms: this is the various data mining techniques to process data like statistical analysis, association, clustering, pattern matching and so on. Pattern Analysis: once patterns are discovered from web logs, uninteresting rules are filtered out. Analysis is done using knowledge query mechanisms such as SQL or data cubes to perform OLAP operations. All the four stages are depicted through the following figure 2.1 [Singh and Singh, 2010] Data Collection Data collection is the very first initialization step of web usage mining. The data authenticity and integrality directly affects the smooth functioning and final recommendation of characteristic service s quality. Therefore it must use scientific, reasonable and advanced technology to gather various data. At present, in relation to web usage mining technology, the main data has originated from three sources: server data, client data and middle data (agent server data 16

and packet sniffing) [Bari and Chawan, 2013; Singh et al., 2013].

Figure 2.1: High-level web log mining process

A web server log is an important source for web usage mining because it explicitly records the browsing behavior of site visitors [Domenech and Lorenzo, 2007]. The data recorded in server logs describes access to a website by multiple users. However, the site usage data recorded in server logs may not be entirely reliable because of the various levels of caching within the web environment: cached page views are not recorded in a server log. A web proxy acts as an intermediate level of caching between client browsers and web servers. Proxy caching can reduce both the loading time of a web page experienced by users and the network load.

Data Preprocessing

The information available in the web log is heterogeneous and unstructured. The preprocessing phase is therefore a prerequisite for discovering patterns. The goal of preprocessing is to transform the raw clickstream data into a set of user profiles. Data preprocessing mainly

includes data cleaning, user identification, session identification and path completion.

Data Cleaning: Most data used for mining [Srivastava et al., 2000] is collected from web servers, clients, proxy servers or server databases, all of which produce noisy data. Because web mining is sensitive to noise, data cleaning methods are necessary. Data cleaning is the process of removing irrelevant (noisy) items, such as graphics, videos and format information, which can be recognized by filename suffixes such as GIF, JPEG and CSS. Better data quality improves the subsequent analysis.

User and Session Identification: The task of user and session identification is to find the different user sessions in the original web access log. User identification determines who accessed the website and which pages were accessed. The goal of session identification is to divide the page accesses of each user into individual sessions. A session is the series of web pages a user browses in a single visit. This step is made difficult by proxy servers; for example, different users may have the same IP address in the log [Singh et al., 2013].

Path Completion: Another critical step in data preprocessing is path completion. Several factors lead to incomplete paths: the local cache, the proxy cache, the POST technique and the browser's back button can all cause important accesses to go unrecorded in the access log file, so the number of Uniform Resource Locators (URLs) recorded in the log may be smaller than the number actually visited. Local caching and proxy servers also complicate path completion because users can access pages in the local cache or the proxy server's cache without leaving any record in the server's access log. As a result, user access paths are incompletely preserved in the web access log. To discover a user's travel pattern, the missing pages in the user access path must be appended. This is the purpose of path completion.
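The cleaning and session identification steps described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the text: log records are already parsed into dictionaries with hypothetical keys `ip`, `agent`, `time` (seconds) and `url`; a user is approximated by the pair (IP address, user agent), a common heuristic for telling apart users behind the same proxy; and a session ends after 30 minutes of inactivity.

```python
from collections import defaultdict

# Suffixes treated as noise, following the examples in the text (GIF, JPEG, CSS).
NOISE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".css")
SESSION_TIMEOUT = 30 * 60  # seconds; an assumed threshold, not taken from the text


def clean(records):
    """Drop requests for embedded resources such as images and style sheets."""
    return [r for r in records if not r["url"].lower().endswith(NOISE_SUFFIXES)]


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group requests by (ip, agent), then split each group on long idle gaps."""
    by_user = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        by_user[(r["ip"], r["agent"])].append(r)

    sessions = []
    for user, reqs in by_user.items():
        current = [reqs[0]]
        for prev, nxt in zip(reqs, reqs[1:]):
            if nxt["time"] - prev["time"] > timeout:
                # Idle gap exceeded: close the running session and start a new one.
                sessions.append((user, [r["url"] for r in current]))
                current = []
            current.append(nxt)
        sessions.append((user, [r["url"] for r in current]))
    return sessions


# Hypothetical log: two browsers share one IP address, as behind a proxy.
log = [
    {"ip": "10.0.0.1", "agent": "Firefox", "time": 0, "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "Firefox", "time": 5, "url": "/logo.gif"},
    {"ip": "10.0.0.1", "agent": "Firefox", "time": 60, "url": "/products.html"},
    {"ip": "10.0.0.1", "agent": "Chrome", "time": 90, "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "Firefox", "time": 4000, "url": "/contact.html"},
]
sessions = sessionize(clean(log))
```

With this input the image request is discarded, the Firefox requests split into two sessions at the long idle gap, and the Chrome request forms a session of its own. Path completion is not covered here, since it additionally requires the site's link structure.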

The better the results of data preprocessing, the higher the quality of the mined patterns and the shorter the running time of the algorithm. This is especially important for web log files because their structure differs from that of data in a database or data warehouse: they are unstructured and incomplete for a variety of reasons. It is therefore especially necessary to preprocess web log files in web usage mining. Through preprocessing, the web log can be transformed into another data structure that is easier to mine.

Mining Algorithms

Web log mining algorithms apply statistical methods to analyze and mine the preprocessed data. At present, the machine learning methods typically used are clustering, classification, relation discovery and order model discovery. Each method has its own strengths and shortcomings, but classification and clustering are the most effective at the moment.

Pattern Analysis

The challenges of pattern analysis are to filter out uninteresting information and to visualize and interpret interesting patterns for the user. First, the less significant rules or models are deleted from the store of interesting models. Second, OLAP technology is used to carry out comprehensive mining and analysis and to make the discovered data or knowledge visible. Finally, the characteristic service is provided to the e-commerce website.

Research and Applications

Web log mining deals with understanding user behavior in interacting with the web or with a website. One of the aims is to obtain information that may assist website reorganization

or assist site adaptation to better suit the user. Web log mining is a form of mining applied to server logs; its aim is to extract useful user access information from the logs so that sites can improve themselves in line with user requirements, serve users better and deliver greater economic benefit [Singh and Singh, 2010].

Many researchers have developed Web Usage Mining (WUM) algorithms that use web log records to discover useful knowledge for supporting business applications and decision making. The quality of WUM in knowledge discovery, however, depends on the data as well as the algorithm. Tao et al. [2008] explored a new data source called intentional browsing data (IBD) for potentially improving the effectiveness of WUM applications. IBD is a category of online browsing actions, such as copy, scroll or save as, that is not recorded in web log files. Their research aims to build a basic understanding of IBD that will lead to its easy adoption in WUM research and practice.

Recently, a number of WUM algorithms [Bhushan and Nath, 2012; Hollink et al., 2013; Hosseini and Abolhassani, 2007; Hung et al., 2013; Mele, 2013; Sumathi et al., 2010] have been proposed to analyze and predict user behavior patterns. Prediction of users' future movements and intentions is based on their clickstream data. Romero et al. [2013] developed a specific Moodle mining tool and applied it to e-learning systems in order to predict the marks that university students will obtain in their final exams. Jalali et al. [2009] developed a model for online prediction through web usage mining systems and proposed an approach for classifying user navigation patterns to predict users' future intentions. The approach uses the longest common subsequence algorithm to classify current user activities and so predict the user's next movement.
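To make the role of the longest common subsequence (LCS) concrete, the following sketch matches a current session against stored navigation patterns by LCS length. It only illustrates the LCS-based matching idea and is not a reconstruction of the model of Jalali et al.; the pattern names and page paths are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def classify(current, patterns):
    """Return the name of the stored pattern with the longest LCS to the session."""
    return max(patterns, key=lambda name: lcs_length(current, patterns[name]))


# Hypothetical navigation patterns mined offline from past sessions.
patterns = {
    "shopper": ["/home", "/catalog", "/item", "/cart", "/checkout"],
    "reader": ["/home", "/blog", "/post", "/comments"],
}
current_session = ["/home", "/catalog", "/cart"]
best = classify(current_session, patterns)
```

Here the session shares three pages in order with the "shopper" pattern and only one with "reader", so it is assigned to "shopper". A predictor could then suggest the pages of that pattern that the user has not yet visited.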

2.1.3 Challenges

Although many techniques and applications have been proposed to support web log mining, many issues still need to be tackled in order to provide high-quality web services. These issues are as follows:

Discovering high-quality knowledge: The quality of the discovered knowledge directly influences the quality of the web services provided. In order to discover high-quality knowledge, new data mining methods and techniques are required.

Applying the discovered knowledge to advanced web applications: Once access patterns have been discovered, they should be further analyzed and applied to advanced web applications, such as personalized retrieval and recommendation.

Discovering semantic information: Since web logs lack semantic information about the web pages visited by users, it is difficult to understand users' preferences and intentions. With the development of the Semantic Web (such as HowNet), semantics in web content can be used to improve the results of web log mining.

2.2 Personalized Retrieval

Many information systems have attempted to solve the information overload problem that information seekers currently face when querying. However, despite being very efficient, traditional Information Retrieval (IR) techniques often follow a one-size-fits-all paradigm, delivering the same information in the same form and order to every user who issues the same query. Since different users' information needs and queries arise in varying contexts and with different intentions, research has started to focus on retrieving potentially relevant documents [Dumais, 2009]. This development has given rise to the notion of personalized retrieval, which attempts

to modify and evolve established IR techniques in order to produce more personally relevant results. Such systems tend to represent users with simplified profiles, which are often based on historic interests or user location properties (e.g. geographical location, language prevalence in a region). Initial evidence has emerged that some of these personalized IR (PIR) techniques have been applied within popular web search engines; however, little detail has been published so far. Although such statistical approaches enable the efficient calculation of personalized ranked lists, other considerations such as user context or preferences are often neglected [Steichen et al., 2012].

Personalized retrieval is based on the standard Information Retrieval model, which traditionally focuses on retrieving documents that are relevant to a unitary query. While personalized retrieval extends this model by taking historical interactions into account, the paradigm is still concerned with finding the most relevant documents for a single user query. This fundamental underpinning makes such systems particularly suitable for the general information access paradigm of searching by query, where it is assumed that users can express their information need in a relatively precise query [Steichen et al., 2012].

Current strategies for personalized search fall into two categories [Pitkow et al., 2002]. Cai et al. [2014] described two approaches to personalizing web search results: query expansion and re-ranking of search results. In query expansion, user interests are conflated with a given query, and the expanded query is used for searching the web. In re-ranking, the search engine results are re-ranked by computing the similarity between the document contents and the terms in the user's interest preferences [Kumar et al., 2014].

Query Expansion

Query expansion [Chirita et al., 2007] refers to modifying the original query, either by expanding it with other terms or by assigning different weights to the terms in the query [Cai et al., 2014].

Query expansion involves adding new words and phrases to the existing search terms to generate an expanded query. The expansion can be computed by finding relationships between query terms and document terms, in the form of probabilistic correlations or association rules. It can also be approached by analyzing the implicit actions that a user performs during the search [Agosti et al., 2012].

In [Cui et al., 2002, 2003], by exploiting correlations between terms in documents and user queries mined from user logs, the query expansion method achieved significant improvements in retrieval effectiveness compared with other query expansion techniques. The central idea of the method is that if a set of documents is often selected for the same queries, then the terms in these documents are strongly related to the terms of the queries. Thus probabilistic correlations between query terms and document terms can be established from the query logs, and these correlations can be used to select high-quality expansion terms from documents for new queries.

Shi and Yang [2007] used an improved association rule mining model to mine related queries from query transactions in query logs. Their algorithm first segments the user sessions identified in query logs into query transactions and then mines association rules of related queries. The mining model uses not only the co-occurrences between distinct queries but also the distance similarity between them.

White et al. [2007] report the results of a comparison of pseudo-relevance feedback and query log-based refinement. The study showed that the source, the amount of feedback and the query type affect the similarity between query extension and pseudo-relevance feedback. Conceivably, both techniques can be deployed in parallel and refinements can be offered based on query classification.
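The log-based correlation idea attributed above to Cui et al. can be sketched in a simplified form. The code below estimates, for each query term, how often each document term appears in documents clicked for queries containing that term, and uses these estimates to pick expansion terms. The normalization is a deliberate simplification and should not be read as the probabilistic model of Cui et al.; the click log, the function names and the parameter `k` are hypothetical.

```python
from collections import Counter, defaultdict


def term_correlations(click_log):
    """Estimate P(doc_term | query_term) from (query, clicked document) pairs."""
    pair_counts = defaultdict(Counter)
    for query, doc in click_log:
        doc_terms = set(doc.split())
        for q in set(query.split()):
            pair_counts[q].update(doc_terms)
    probs = {}
    for q, counts in pair_counts.items():
        total = sum(counts.values())
        probs[q] = {t: c / total for t, c in counts.items()}
    return probs


def expand(query, probs, k=2):
    """Append the k document terms most strongly correlated with the query terms."""
    q_terms = query.split()
    scores = Counter()
    for q in q_terms:
        for t, p in probs.get(q, {}).items():
            if t not in q_terms:
                scores[t] += p
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [t for t, _ in ranked[:k]]


# Hypothetical click log: each entry pairs a query with the text of a clicked document.
click_log = [
    ("jaguar speed", "jaguar cat animal speed"),
    ("jaguar", "jaguar cat habitat"),
    ("jaguar price", "jaguar car price"),
]
probs = term_correlations(click_log)
expanded = expand("jaguar", probs, k=1)
```

In this toy log, "cat" occurs in two of the three documents clicked for queries containing "jaguar", so it is the strongest expansion candidate for the query "jaguar".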

2.2.2 Result Processing

Result processing adapts the search results to a particular user's preferences. Most re-ranking strategies attempt to construct a user profile from the user's historical behavior and use the profile to filter out resources that do not match his/her interests [Cai et al., 2014]. Pretschner and Gauch [1999] structured user profiles with an ontology consisting of 4400 nodes. Chirita et al. [2005] modeled both user profiles and resources as topic vectors from an ODP hierarchy; thus the matching between user interest and content can be measured by their vector distance. Besides learning user profiles from users' own browsing histories, Sugiyama et al. [2004] also explored social information to refine search results with the help of like-minded neighbors. Dou et al. [2007] compared various personalization approaches (e.g. click-based, profile-based, long-term-based and short-term-based) and proposed an evaluation framework for the strategies.

Challenges

Although the methods described above can handle personalized search with user and item profiles, they have some limitations. Most (if not all) current methods for personalized search construct user profiles and resource profiles based on the Vector Space Model (VSM) or the BM25 ranking model [Kumar et al., 2014; Sun et al., 2005a; Wang and Zhai, 2007]. The weight of each item in a user profile is the degree to which the user is interested in the item, and the weight of each item in a resource profile is the degree to which the resource is relevant to the item. However, relying solely on term frequency (TF) or BM25 values to measure the weight of items does not sufficiently indicate how much a user is interested in an item.
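As an illustration of the profile-based re-ranking discussed in this section, the sketch below represents a user profile and each result as raw term-frequency vectors and re-orders the results by cosine similarity. It shows the plain VSM/TF baseline whose limitations are noted above, not an improved weighting scheme; the profile text and result snippets are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(results, profile):
    """Order results by decreasing similarity to the user profile."""
    return sorted(results, key=lambda doc: cosine(tf_vector(doc), profile), reverse=True)


# Hypothetical profile built from the text of pages the user viewed earlier.
profile = tf_vector("python programming python code tutorial")
results = [
    "python snake habitat and diet",
    "python code tutorial for beginners",
    "monty python film",
]
reranked = rerank(results, profile)
```

Because the profile weights are plain term counts, a term the user saw often but does not care about would still dominate the ranking, which is the weakness of TF-only weighting that the text points out.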


More information

A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering

A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering R.Dhivya 1, R.Rajavignesh 2 (M.E CSE), Department of CSE, Arasu Engineering College, kumbakonam 1 Asst.

More information

Research/Review Paper: Web Personalization Using Usage Based Clustering Author: Madhavi M.Mali,Sonal S.Jogdand, Deepali P. Shinde Paper ID: V1-I3-002

Research/Review Paper: Web Personalization Using Usage Based Clustering Author: Madhavi M.Mali,Sonal S.Jogdand, Deepali P. Shinde Paper ID: V1-I3-002 Journal) Volume1, Issue3, Nov-Dec, 2014.ISSN: 2349-7173(Online) International Journal of Advanced Research in Technology, Engineering and Science (A Bimonthly Open Access Online. Research/Review Paper:

More information

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,

More information

Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications

Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Daniel Mican, Nicolae Tomai Babes-Bolyai University, Dept. of Business Information Systems, Str. Theodor

More information

WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM

WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM K.Dharmarajan 1, Dr.M.A.Dorairangaswamy 2 1 Scholar Research and Development Centre Bharathiar University

More information

A Content Based Image Retrieval System Based on Color Features

A Content Based Image Retrieval System Based on Color Features A Content Based Image Retrieval System Based on Features Irena Valova, University of Rousse Angel Kanchev, Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg Boris

More information

Information Push Service of University Library in Network and Information Age

Information Push Service of University Library in Network and Information Age 2013 International Conference on Advances in Social Science, Humanities, and Management (ASSHM 2013) Information Push Service of University Library in Network and Information Age Song Deng 1 and Jun Wang

More information

The influence of caching on web usage mining

The influence of caching on web usage mining The influence of caching on web usage mining J. Huysmans 1, B. Baesens 1,2 & J. Vanthienen 1 1 Department of Applied Economic Sciences, K.U.Leuven, Belgium 2 School of Management, University of Southampton,

More information

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES VISUAL RERANKING USING MULTIPLE SEARCH ENGINES By Dennis Lim Thye Loon A REPORT SUBMITTED TO Universiti Tunku Abdul Rahman in partial fulfillment of the requirements for the degree of Faculty of Information

More information

Enhancement in Next Web Page Recommendation with the help of Multi- Attribute Weight Prophecy

Enhancement in Next Web Page Recommendation with the help of Multi- Attribute Weight Prophecy 2017 IJSRST Volume 3 Issue 1 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology Enhancement in Next Web Page Recommendation with the help of Multi- Attribute Weight Prophecy

More information

AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT

AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT Brindha.S 1 and Sabarinathan.P 2 1 PG Scholar, Department of Computer Science and Engineering, PABCET, Trichy 2 Assistant Professor,

More information

ABSTRACT I. INTRODUCTION II. METHODS AND MATERIAL

ABSTRACT I. INTRODUCTION II. METHODS AND MATERIAL 2016 IJSRST Volume 2 Issue 4 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology A Paper on Multisite Framework for Web page Recommendation Using Incremental Mining Mr.

More information

I. Introduction II. Keywords- Pre-processing, Cleaning, Null Values, Webmining, logs

I. Introduction II. Keywords- Pre-processing, Cleaning, Null Values, Webmining, logs ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: An Enhanced Pre-Processing Research Framework for Web Log Data

More information

Recommendation on the Web Search by Using Co-Occurrence

Recommendation on the Web Search by Using Co-Occurrence Recommendation on the Web Search by Using Co-Occurrence S.Jayabalaji 1, G.Thilagavathy 2, P.Kubendiran 3, V.D.Srihari 4. UG Scholar, Department of Computer science & Engineering, Sree Shakthi Engineering

More information

Document Clustering for Mediated Information Access The WebCluster Project

Document Clustering for Mediated Information Access The WebCluster Project Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at

More information

Web Database Integration

Web Database Integration In Proceedings of the Ph.D Workshop in conjunction with VLDB 06 (VLDB-PhD2006), Seoul, Korea, September 11, 2006 Web Database Integration Wei Liu School of Information Renmin University of China Beijing,

More information

Topic Diversity Method for Image Re-Ranking

Topic Diversity Method for Image Re-Ranking Topic Diversity Method for Image Re-Ranking D.Ashwini 1, P.Jerlin Jeba 2, D.Vanitha 3 M.E, P.Veeralakshmi M.E., Ph.D 4 1,2 Student, 3 Assistant Professor, 4 Associate Professor 1,2,3,4 Department of Information

More information

Study on Personalized Recommendation Model of Internet Advertisement

Study on Personalized Recommendation Model of Internet Advertisement Study on Personalized Recommendation Model of Internet Advertisement Ning Zhou, Yongyue Chen and Huiping Zhang Center for Studies of Information Resources, Wuhan University, Wuhan 430072 chenyongyue@hotmail.com

More information

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Technology. on Technology. IT PH.D. Program.

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Technology. on Technology. IT PH.D. Program. ASSIUT UNIVERSITY Faculty of Computers and Information Department of Information Technology Informatiio on Technology PhD Program IT PH.D. Program Page 0 Assiut University Faculty of Computers & Informationn

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Re-Ranking of Web Image Search Using Relevance Preserving Ranking Techniques

Re-Ranking of Web Image Search Using Relevance Preserving Ranking Techniques Re-Ranking of Web Image Search Using Relevance Preserving Ranking Techniques Delvia Mary Vadakkan #1, Dr.D.Loganathan *2 # Final year M. Tech CSE, MET S School of Engineering, Mala, Trissur, Kerala * HOD,

More information

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR International Journal of Emerging Technology and Innovative Engineering QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR V.Megha Dept of Computer science and Engineering College Of Engineering

More information

Recommender System for Personalization in. Daniel Mican Nicolae Tomai

Recommender System for Personalization in. Daniel Mican Nicolae Tomai Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Daniel Mican Nicolae Tomai Introduction The ability of a web application to offer personalised content

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Multimodal Information Spaces for Content-based Image Retrieval

Multimodal Information Spaces for Content-based Image Retrieval Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

More information

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2 SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2 1 PG student, Department of Computer Engineering, Alard College of Engineering and Management 2 Department of

More information

Information Retrieval (Part 1)

Information Retrieval (Part 1) Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected

More information

An Algorithm for user Identification for Web Usage Mining

An Algorithm for user Identification for Web Usage Mining An Algorithm for user Identification for Web Usage Mining Jayanti Mehra 1, R S Thakur 2 1,2 Department of Master of Computer Application, Maulana Azad National Institute of Technology, Bhopal, MP, India

More information

Probability Measure of Navigation pattern predition using Poisson Distribution Analysis

Probability Measure of Navigation pattern predition using Poisson Distribution Analysis Probability Measure of Navigation pattern predition using Poisson Distribution Analysis Dr.V.Valli Mayil Director/MCA Vivekanandha Institute of Information and Management Studies Tiruchengode Ms. R. Rooba,

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu

The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 398 Web Usage Mining has Pattern Discovery DR.A.Venumadhav : venumadhavaka@yahoo.in/ akavenu17@rediffmail.com

More information

Semantic Web Mining and its application in Human Resource Management

Semantic Web Mining and its application in Human Resource Management International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2

More information

TABLE OF CONTENTS CHAPTER NO. TITLE PAGENO. LIST OF TABLES LIST OF FIGURES LIST OF ABRIVATION

TABLE OF CONTENTS CHAPTER NO. TITLE PAGENO. LIST OF TABLES LIST OF FIGURES LIST OF ABRIVATION vi TABLE OF CONTENTS ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF ABRIVATION iii xii xiii xiv 1 INTRODUCTION 1 1.1 WEB MINING 2 1.1.1 Association Rules 2 1.1.2 Association Rule Mining 3 1.1.3 Clustering

More information

A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES

A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES Zui Zhang, Kun Liu, William Wang, Tai Zhang and Jie Lu Decision Systems & e-service Intelligence Lab, Centre for Quantum Computation

More information

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher

Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher,

More information

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Web Usage Data for Web Access Control (WUDWAC)

Web Usage Data for Web Access Control (WUDWAC) Web Usage Data for Web Access Control (WUDWAC) Dr. Selma Elsheikh* Abstract The development and the widespread use of the World Wide Web have made electronic data storage and data distribution possible

More information

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information