PERSONALIZING WEB DIRECTORIES THROUGH WEB USAGE MINING G Neelima


PERSONALIZING WEB DIRECTORIES THROUGH WEB USAGE MINING

G Neelima, Department of Computer Science & Engineering, GMRIT, Rajam, Srikakulam (Dt), A.P., gullipalli.neelima@gmail.com
M. Vijaya Bharathi, Department of Computer Science & Engineering, GMRIT, Rajam, Srikakulam (Dt), A.P., vijayabharathi116@gmail.com

Abstract - A Web directory is a website created to gather links from businesses and other types of organizations. These links appear under categories and subcategories so that visitors can browse them easily. Web personalization is the task of making Web-based information systems adaptive to the needs and interests of users, and it acts as an important means of tackling information overload. The Web directory is viewed as a concept hierarchy generated by a content-based document clustering method, and personalization is realized by constructing user community models. The community models are extracted with a simple cluster mining algorithm (CDM) according to the needs of user communities, which are constructed from the usage data maintained in the access logs of the proxy servers of an Internet service provider. PLSA is used to identify latent factors in the data for community Web directory discovery.

Index Terms - Web directory, personalization, community models, CDM, PLSA.

1. INTRODUCTION
As the amount of information on the Web increases rapidly, it creates many new challenges for Web search. From the users' point of view, a great portion of time is usually needed to look for and find the appropriate information. It has therefore become necessary to organize Web content into thematic hierarchies called Web directories. Web directories present websites arranged under a list of topics that users can visit. Personalization is one route to the success of the evolving Web infrastructure.
Web personalization is the task of adapting Web-based information systems to the needs and interests of individual users, or of groups of users such as user communities, in order to handle information overload. In a Web directory, users can find any topic they are interested in by starting with large categories and gradually moving down, choosing at each step the category related to their interests. To find a particular topic, the user has to look deep inside the directory. As the size and complexity of the Web directory increase, the information overload problem arises. To overcome these difficulties of Web directories, we focus on the construction of community models. The users of a community can use the community model as a starting point for navigating the Web, based on the topics that they are interested in, instead of accessing vast Web directories. The goal is to construct community Web directories based on Web usage data. The usage data that form the basis for the construction of community Web directories are collected from the browsing history of the user. Creating accurate and operational user models can overcome the deficiencies of both Web directories and Web personalization by combining their strengths, providing a new tool to fight information overload. A user community model is constructed from usage data by usage mining methods. Web usage mining is an approach that employs knowledge discovery from usage data to automate the creation of user models. Web usage mining is traditionally performed in several stages [1], [3]: collection of Web data, such as the activities/clickstreams recorded in Web server logs; pre-processing of Web data, such as filtering crawler requests and requests for graphics, and identifying unique sessions; and analysis of Web data, also known as Web Usage Mining [4], to discover interesting usage patterns or profiles.
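The collection and pre-processing stages listed above can be sketched as follows. This is only an illustrative sketch, not the implementation used in this work: it assumes the log is in Common Log Format, treats any client that requests /robots.txt as a crawler, and uses a 30-minute inactivity timeout to split sessions. The names LOG_RE, GRAPHIC_EXT, SESSION_GAP, parse_line and sessionize are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed log layout (Common Log Format); a real proxy log may differ.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
GRAPHIC_EXT = (".gif", ".jpg", ".jpeg", ".png", ".ico")  # assumed list
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def parse_line(line):
    """Return (ip, timestamp, url) for one log line, or None if unparsable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("url")


def sessionize(lines):
    """Filter crawler and graphics requests, then split hits into sessions."""
    records = [r for r in map(parse_line, lines) if r is not None]
    # Heuristic (assumption): a client that fetches /robots.txt is a crawler.
    crawlers = {ip for ip, _, url in records
                if url.split("?", 1)[0] == "/robots.txt"}
    by_ip = {}
    for ip, ts, url in records:
        path = url.split("?", 1)[0].lower()
        if ip in crawlers or path.endswith(GRAPHIC_EXT):
            continue
        by_ip.setdefault(ip, []).append((ts, url))
    sessions = []
    for ip, hits in by_ip.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_GAP:  # long pause: start a new session
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions
```

Each returned session is a pair of client address and ordered list of requested URLs, which is the form of input the later analysis stage would work on.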
Web directories that model the interests of groups of users, known as user communities, are called community Web directories. The construction of user community models, i.e., usage patterns representing the browsing preferences of the community members, with the aid of Web Usage Mining has primarily been studied in the context of specific Web sites [1]. Community Web directories exemplify a new objective of Web personalization, beyond Web page recommendations [1] or adaptive Web sites [2]. The members of a community can use the community directory as a starting point for navigating the Web, based on the topics that they are interested in, without having to access vast Web directories. Thus, personalization can be of particular benefit to large generic directories such as ODP or Yahoo!. Personalized versions of these directories can also be employed by various services on the Web, such as Web portals, in order to offer their subscribers a personalized view of the Web. Moreover, community Web directories can be exploited by Web search engines to provide personalized results to queries.

1.1 Proposed System: We propose a knowledge discovery framework for

building Web directories according to the preferences of user communities. User community models take the form of thematic hierarchies and are constructed by employing clustering and probabilistic learning approaches. The framework makes it possible to handle large amounts of data residing in a sparse dimensional space, by aggregating the statistics of many users through community Web directories.

Figure 1.1 Construction of a personalized web directory

The fig 1.1 shows the construction of a personalized Web directory. Our proposed system groups users of a similar type into a community and applies personalization to all users of the Web directory based on each user's browsing history.

1.2 Advantages of proposed system: Browsing: directories tend to work best when you want to browse a relatively broad subject. Starting with a directory can give you a good idea of the amount and type of Web-based information on your topic. Although many Web directories offer a search function of some kind (otherwise it would be impossible to browse thousands of pages for, say, Computers), directories are fundamentally different from search engines in two ways: most directories are edited by humans, and URLs are not gathered automatically by spiders but submitted by site owners. The main advantage of Web directories is that, no matter how clever spiders become, when a human views and checks the pages there is less chance that pages will be classified in the wrong categories.

2. Experimental investigation: Web personalization and recommendation systems utilize data collected explicitly or implicitly during the interaction of the user with the website. Such data are known as Web data and can be divided into four major categories: Content data, Structure data, Usage data and User profile data, as shown in the diagram below.

Figure 2: Web Data categories

o Content data are presented to the end-user appropriately structured.
They can be simple text, images, or structured data, such as information retrieved from databases.
o Structure data represent the way content is organized. They can be either data entities used within a Web page, such as HTML or XML tags, or data entities used to put a Web site together, such as hyperlinks connecting one page to another.
o Usage data represent a Web site's usage, such as a visitor's IP address, time and date of access, complete path (files or directories) accessed, referrer's address, and other attributes that can be included in a Web access log.
o User profile data provide information about the users of a Web site. A user profile contains demographic information for each user of a Web site, as well as information about the user's interests and preferences. Such information is acquired through registration forms or questionnaires, or can be inferred by analyzing Web usage logs.

Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services. In the literature, three main axes of Web mining have been identified, according to the Web data used as input to the data mining process, namely Web structure, Web content and Web usage mining.
o Web structure mining categorizes Web pages and generates information such as the similarity and relationships between them, taking advantage of their hyperlink topology. In recent years, the area of Web structure mining has focused on the identification of authorities, i.e. pages that are considered important sources of information by many people in the Web community.
o Web content mining has to do with retrieving the information (content) available on the Web into more structured forms, as well as indexing it so that information can be located easily. Web content may be unstructured (plain text), semi structured (HTML

documents), or structured (extracted from databases into dynamic Web pages). Such dynamic data cannot be indexed and constitute what is called the hidden Web. A research area closely related to content mining is text mining.
o Web usage mining is the process of identifying browsing patterns by analyzing the user's navigational behavior. This process takes as input the usage data, i.e. the data residing in the Web server logs, which record the visits of the users to a Web site.

Extensive research in the area of Web usage mining led to the appearance of a related research area, that of Web personalization. Web personalization utilizes the results of Web usage mining in order to dynamically provide recommendations to each user. A recommendation system learns from a customer's behavior and recommends products in which the customer may be interested. It helps to build a long-lasting relationship with loyal users of the website. Various vendors offer Web personalization tools that can be added to existing systems to obtain a personalized Web system. This paper presents a brief review of efforts in Web personalization and recommendation by means of Web usage mining. It also elaborates the role of Web usage mining in personalization, and presents the open challenges that are yet to be met.

3. Process Model
To personalize a user directory, we propose a data mining task over and above the user's navigation patterns. To this end, the system needs to identify users with similar navigation behavior and similar search interests. Such users form interest groups, and mining tasks are performed to find them. The modules in the project are:
o Construction of web directory
o Personalization methodology
The proposed system collects usage data and mines it, which can improve both the performance of the directory and the quality of the user interface. The proposed model consists of the following activities.
o Collecting all the Web access data on a Web server.
o Preparing the collected information as a data set [pre-pruning process].
o Creating per-user profiles.
o Creating decisions based on the user profiles.
o A collection of Web data such as activities/clickstreams recorded in the Web server.
o Interpretation/evaluation of the discovered profiles.

3.1 MODULES

Construction of Web Directory
Web directories are thematic hierarchies corresponding to listings of topics which are organized and overseen by humans. As the size of the Web increases at a galloping pace, the information overload of its users emerges as one of the Web's major shortcomings. We have implemented a model system to evaluate the mining and personalization tasks proposed in this paper. Developing a Web application is the first module, where users are allowed to register and browse the directory. The directory categories and the categories of Web sites are stored in a database. All listings are organized within the categories displayed on the directory page. You can browse through the directory by looking through the category list and clicking the various category links. A directory is a subject-tree style catalogue that organizes the Web into major topics, including Arts, Business and Economy, Computers and Internet, Education, Entertainment, Government, Health, News, Recreation, Reference, Regional, Science, Social Science, Society and Culture. Under each of these topics is a list of subtopics, and under each of those is another list, and another, and so on, moving from the most general to the more specific. Login is the first part of the construction of a Web directory. The login page helps us to navigate inside the Web site. The login page is found as a link on the home page of the personalized Web directory. After logging in, the user proceeds with the directory services. The login consists of two sub modules, namely registration and user authentication.
The registration page consists of various fields such as username, password, id and mobile number. These data are stored in the database and are retrieved from it when a newly registered user accesses the login page to navigate the Web site. Only authenticated users are allowed to register. The username should consist of alphabetic characters. The password can contain any characters the user types. The id must be a valid id of any prototype Website. A valid ten-digit mobile number is to be provided in the mobile number field.

User Authentication is the main page for logging in to the Web directory. The link to it is provided on the home page of the Web directory. It consists of the username and password fields through which the user can log in. The system checks the details against those already registered in the database when the submit button is

clicked. It takes the user to the success page if the details provided are correct. Otherwise the user is taken to the error page and asked to provide the details already registered. This page also allows the user to move to the registration page in the case of a new, unregistered user.

Personalization Methodology

Interest Group (Communities)
The main part of the proposed work lies in recognizing how the navigational patterns of different users are similar to each other. Here we use a metric to measure the similarity between navigation patterns. The metric is defined as the ratio of the number of common categories that exist in the navigation patterns to the total number of distinct categories in the patterns. Let P1 and P2 be two navigation patterns. The similarity S between P1 and P2 is calculated as follows:

$$S(P_1, P_2) = \frac{|C(P_1) \cap C(P_2)|}{|C(P_1) \cup C(P_2)|}$$

where C(P) denotes the set of distinct categories in navigation pattern P. All pairs whose similarity value is greater than a certain threshold are considered to be similar.

The categories of a topic in a directory are hierarchically arranged. This order should be reflected in the discovered navigation patterns, so we need to focus on discovering frequent sequences of popular categories in navigation patterns. A category is popular when its number of visits is highly ranked. These sequences are called sequential navigation sub patterns. These sub patterns capture the idea of popular moves among categories. The ordering within a navigation pattern is what distinguishes the discovery of frequent sequential navigation patterns from the discovery of association rules [17]. The process of finding similar/interest groups can be said to follow the association rule mining algorithm of data mining. Association rule mining has a wide range of applications, such as market basket analysis, medical diagnosis/research, Website navigation analysis, homeland security, education, and the financial and business domains.
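Returning to the similarity metric defined under Interest Group above (common categories divided by total distinct categories), a minimal sketch of the computation and of the threshold test is given below. The function names pattern_similarity and similar_pairs, the example threshold values, and the user names are assumptions made for illustration only.

```python
from itertools import combinations


def pattern_similarity(p1, p2):
    """Ratio of common categories to all distinct categories of two patterns."""
    c1, c2 = set(p1), set(p2)
    union = c1 | c2
    if not union:  # two empty patterns: define similarity as 0
        return 0.0
    return len(c1 & c2) / len(union)


def similar_pairs(patterns, threshold=0.5):
    """Return the pairs of users whose similarity exceeds the threshold.

    patterns maps a user name to that user's list of visited categories.
    """
    return [
        (u, v)
        for u, v in combinations(sorted(patterns), 2)
        if pattern_similarity(patterns[u], patterns[v]) > threshold
    ]
```

For example, the patterns ["Arts", "Animation", "Shopping"] and ["Arts", "Animation", "Computers"] share two of four distinct categories, so their similarity is 0.5; they count as similar only if the chosen threshold is below that value.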
Association (or relation) is probably the best known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time they buy strawberries they might also want to buy cream. Association rule mining aims to find correlations and association relationships among the existing records in a dataset.

Personalization:
Personalized search is the most important module in this project. Personalized search has been around for a while for signed-in users who have web history enabled. After authentication, the user is allowed to search. The mining task is most useful for personalizing the topic directory in terms of the navigation behaviour of the user and their interest groups. The result of the personalization tasks is a set of shortcuts among categories in the directory. A shortcut A → B is a direct link from A to B. These shortcuts are then used for navigating the directory, based on the navigation behaviour of different users or of an interest group. We used two modes of personalization to create shortcuts for the user: one based on the user's own history and one based on the history of all users. The user's searches are maintained as history. The user mode provides a personalized view offering the most visited categories as shortcuts, which lets the user reach their favorite patterns through a shorter navigation path. The other mode provides the user with all the websites organized according to the list of topics, along with a recommended list of websites derived by the system from the navigation patterns of all users of the directory.
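One possible way to derive shortcuts A → B from navigation patterns, in the spirit of the association rules described above, is sketched here. It is an assumption-laden illustration rather than the method of this work: a candidate shortcut is any ordered pair of categories where B is visited after A in the same pattern, support is the number of patterns containing the pair, and confidence is that support divided by the number of patterns in which A precedes some other category. The name mine_shortcuts and both default thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_shortcuts(patterns, min_support=2, min_confidence=0.5):
    """Return (A, B, support, confidence) tuples for frequent ordered pairs."""
    pair_count = Counter()    # patterns in which B is visited after A
    source_count = Counter()  # patterns in which A precedes another category
    for pattern in patterns:
        # combinations() keeps the visiting order, so a comes before b.
        pairs = {(a, b) for a, b in combinations(pattern, 2) if a != b}
        pair_count.update(pairs)
        source_count.update({a for a, _ in pairs})
    shortcuts = []
    for (a, b), support in pair_count.items():
        confidence = support / source_count[a]
        if support >= min_support and confidence >= min_confidence:
            shortcuts.append((a, b, support, confidence))
    # Most frequent shortcuts first; ties broken alphabetically.
    return sorted(shortcuts, key=lambda s: (-s[2], s[0], s[1]))
```

Under these assumptions the two modes differ only in the input: passing the patterns of a single user would give user-mode shortcuts, while passing the patterns of all users would give the shortcuts for the mode based on the history of all users.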
Whenever a user logs in, the navigation patterns of that user are recorded in the database. However, a visitor with no previous interaction with the website poses a problem to the personalization system, as there is no data available for personalizing the interactions with such a user. A similar problem occurs for a newly added item: in the absence of a rating history, the system cannot recommend a new item to users until its rating history has been collected. A new user is therefore provided with data derived from analysing all previous users of the directory. Personalization can be experienced by the user in his/her next usage of the directory. When the user is ready to use the directory services, the history of the user is analyzed and the community of the user is determined. Based on the community in which the user is placed, he/she is recommended the highly rated sites used in that community. In order to identify similar users and form a community for them, the formula given above under Interest Group is used. Their web searches are thus personalized.
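The behaviour described above, community-based recommendation with a fallback to all previous users for a visitor without history, can be sketched as follows. The sketch makes several assumptions that are not taken from this work: visit counts stand in for ratings, the community is every other user whose similarity to the current user exceeds a threshold, and the names jaccard and recommend as well as the default parameter values are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Common categories divided by all distinct categories."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(user, histories, threshold=0.3, top_n=3):
    """Recommend categories to a user from the histories of similar users.

    histories maps a user name to the list of categories that user visited.
    """
    own = histories.get(user, [])
    community = [
        u for u, h in histories.items()
        if u != user and own and jaccard(own, h) > threshold
    ]
    # New user (or no similar users): fall back to all previous users.
    peers = community or [u for u in histories if u != user]
    counts = Counter(
        c for u in peers for c in histories[u] if c not in own
    )
    # Visit counts are used as a stand-in for ratings (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_n]]
```

A user with history only receives suggestions from their own community, while a newly registered user receives the categories most visited across the whole directory, which mirrors the two cases described in the text.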


4. Results And Discussion:

4.1 Construction of Web Directory:

Figure 4.1 Web Directory home page

The fig 4.1 shows the home page of the Web Directory, providing a brief introduction to the Web Directory and to how personalization is realized.

4.2 Personalization Methodology:

Figure 4.2 Login page

The fig 4.2 shows the login page through which a registered user accesses the Web Directory, i.e. the login page helps the user to navigate inside the Web Directory. The registration page consists of various fields such as username, password, id and mobile number. These data are stored in the database and can be retrieved from it.

Figure 4.3 Directory page with two modes

Figure 4.4 Category Search

The fig 4.4 shows the view after the user enters the categorized search mode. The right side of the page shows the categories recommended to the user, based on the history of the current user and of other users.

Figure 4.5 Database of user's history


The fig 4.5 provides the data of signed-in users who have web history enabled. The user's searches are maintained as history, and their web searches are personalized. In order to identify similar users and form a community for them, the similarity formula mentioned earlier is applied (see Figure 4.9).

[Figure 4.6: bar graph titled "Category mode recommendations"; categories: Shopping, Computers and Web, Arts; axis: Frequency]

Figure 4.6 Bar graph for history of users

Fig 4.6 shows the bar graph of the users' history, on which the category mode recommendations are based.

Figure 4.7 Categories under Arts along with recommended categories

Figure 4.8 Categories under Animation

Figure 4.9 Identifying similar users (community)

Figure 4.10 Websites under courses

The fig 4.10 shows the navigation behavior of a newly logged-in user with no history. In this case, since the new user has no history and no suggestions can be derived for them, the new user is provided with suggestions based on all the previous users of the directory. The new user will not have any suggestions under the user mode.

5. Conclusion And Future Scope:
Web Usage Mining has become an important topic because the quantity of data is continuously increasing. In order to achieve personalization of Web directories, this paper advocates the concept of a community Web directory, i.e. a Web directory specialized to the needs and interests of particular user communities. Furthermore, it presents the complete methodology for the construction of such directories with the aid of machine learning methods, and employs the personalization methodology to provide a recommended view for the user. Our proposed system uses Web usage mining to discover interesting user navigation patterns, which can then be applied to real-world problems such as Web site/page improvement, additional product/topic recommendations, user/customer behavior studies, etc.

7. References:
[1] D. Pierrakos and G. Paliouras, "Personalizing Web Directories with the Aid of Web Usage Data," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9.
[2] D. Pierrakos, G. Paliouras, C. Papatheodorou, V. Karkaletsis, and M. Dikaiakos, "Construction of Web Community Directories Using Document Clustering and Web Usage Mining," September.
[3] D. Pierrakos, G. Paliouras, C. Papatheodorou, and C.D. Spyropoulos, "Web Usage Mining as a Tool for Personalization: A Survey," User Modeling and User-Adapted Interaction, vol. 13, no. 4.
[4] P.I. Hofgesang, "Online Mining of Web Usage Data: An Overview," Web Mining Applications in E-Commerce and E-Services, pp. 1-24, Springer.
[5] G. Castellano, A.M. Fanelli, and M.A. Torsello, "Computational Intelligence Techniques for Web Personalization," Web Intelligence and Agent Systems, vol. 6, no. 3.
[6] T. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, "From User Access Patterns to Dynamic Hypertext Linking," Proc. Fifth Int'l World Wide Web Conf. (WWW '96).
[7] M. Perkowitz and O. Etzioni, "Adaptive Web Sites: Automatically Learning from User Access Patterns," Proc. Sixth Int'l WWW Conf. (WWW '97).
[8] J. Borges and M. Levene, "Data Mining of User Navigation Patterns," Web Usage Analysis and User Profiling, LNCS, H.A. Abbass, R.A. Sarker, and C.S. Newton, eds., Springer-Verlag.
[9] R. Cooley, B. Mobasher, and J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web," Proc. Ninth IEEE Int'l Conf. Tools with AI (ICTAI '97).
[10] P.A. Chirita, W. Nejdl, R. Paiu, and C. Kohlschütter, "Using ODP Metadata to Personalize Search," Proc. 28th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval.
[11] A. Sieg, B. Mobasher, and R. Burke, "Ontological User Profiles for Representing Context in Web Search," Proc. IEEE/WIC/ACM Int'l Conf. Web Intelligence and Intelligent Agent Technology Workshops.
[12] T. Oishi, K. Yoshiaki, M. Tsunenori, H. Ryuzo, F. Hiroshi, and M. Koshimura, "Personalized Search Using ODP-Based User Profiles Created from User Bookmark," Proc. 10th Pacific Rim Int'l Conf. Artificial Intelligence.
[13] J. Garofalakis, T. Giannakoudi, and A. Vopi, "Personalized Web Search by Constructing Semantic Clusters of User Profiles," Proc. 12th Int'l Conf. Knowledge-Based Intelligent Information and Eng. Systems.
[14] B. Liu, Web Data Mining, second ed., Springer.
[15] S. Park, N. Suresh, and B.-K. Jeong, "Sequence-Based Clustering for Web Usage Mining: A New Experimental Framework and ANN-Enhanced K-Means Algorithm," Data & Knowledge Engineering, vol. 65.
[16] B. Mobasher, H. Dai, T. Luo, Y. Sun, and J. Zhu, "Integrating Web Usage and Content Mining for More Effective Personalization," Proc. Int'l Conf. E-Commerce and Web Technologies, Greenwich, UK.
[17] M. Eirinaki, M. Vazirgiannis, and I. Varlamis, "Using Site Semantics and a Taxonomy to Enhance the Web Personalization Process," Proc. Ninth ACM Int'l Conf. Knowledge Discovery and Data Mining (SIGKDD '03), Washington DC.
[18] M. Eirinaki and M. Vazirgiannis, "Web Mining for Web Personalization," ACM Trans. Internet Techn. 3 (2003), no

International Journal of Science, Engineering and Technology Research (IJSETR), Volume 4, Issue 4, April 2015
