A Data Preprocessing Framework of Geoscience Data Sharing Portal for User Behavior Mining


Mo Wang 1,2, Juanle Wang 1,3,*

1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
2 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China
* Corresponding author, wangil@igsnrr.ac.cn

Abstract-Science data sharing has many advantages for both scientific research and education. Knowing how the participants in science data sharing behave supports informed decisions on data sharing policy and on the design of data sharing websites. Nowadays data sharing is carried out mainly through the Internet, and web usage mining provides an ideal approach to uncovering the behaviors of data sharing users. This paper presents a data preprocessing framework for subsequent user behavior mining of a geoscience data sharing portal (geodata.cn). The preprocessing steps include data cleaning, user identification, session identification, and data modeling. Web server logs served as the major data source of this study. Heuristic algorithms were employed for data cleaning and user identification. Different session identification methods were applied for comparison. Users' geolocations were identified using an online Geo-IP lookup tool, which provides the geographical coordinates of an IP address. On the basis of these preprocessing procedures, a web usage data model for science data sharing portals is proposed for further user behavior mining, such as user classification and spatial association rule mining.

Keywords-geoscience data sharing; web usage mining; spatial data mining; data preprocessing

I. INTRODUCTION

Data has come to be seen as basic infrastructure of science.
Data sharing has many advantages that boost scientific research and education. Data sharing in the science community has a long history, usually in ad hoc ways [1]. However, the development of information technology and the Internet has brought new concepts and modes of data sharing. Data sharing is nowadays conducted mainly through the Internet; hence data sharing behaviors are becoming web usage behaviors on data sharing portals. Geodata.cn is a leading data sharing portal for earth system science in China. It has a large number of users and abundant data resources across Earth System Science disciplines [2]. Yet its user behaviors, which can be interpreted as data sharing behaviors from the end users' perspective, remain poorly understood.

User behavior mining of a web site belongs to web usage mining, which is a subfield of Web mining. Web mining is the field concerned with knowledge discovery from the Internet. More precisely, it is often divided into three topics: Web Content Mining, Web Structure Mining and Web Usage Mining [3]. The output of Web Usage Mining can be of great value for network structure optimization and website server configuration [4]. Moreover, extracted user behavior can be used further in recommendation systems and proactive services in the context of data sharing.

This study aims to set up a data preprocessing framework for user behavior mining of a science data sharing portal in the context of web usage mining and spatial data mining.

II. DATA AND METHOD

A. Background

The subject of this study is the user behavior of the National Data Sharing Platform of Earth System Science (Geodata.cn). It is one of the National Science & Technology Infrastructure platforms dedicated to science data sharing. The objective of the platform is to provide data support and services for research in Earth System Science and for pioneering innovation across relevant disciplines.
Geodata.cn has been operating for nearly 10 years and holds a representative position in the science data sharing domain in China and internationally. By the end of August 2014, the platform had 91,944 registered users and the portal had received 17 million visits [5]. This activity has left abundant data recorded by the website servers, so the user behaviors of the platform can be mined and analyzed with data mining methods.

B. Data

The data sources of Web Usage Mining are mainly Web servers; data from proxy servers and Web clients can also be used if available. This study used web server log data since the other two were not available. A web server log covering two months (July and August 2014) was acquired for this study. The web server log was stored in Common Log Format. Fig. 1 is an example of a log entry, from which the user's IP, visiting time, method, URL visited, status, referrer, and client details can be acquired. The log file of this study contains 1.69 million log entries in the form of that example. Tab. I lists the information retrieved from a log entry.

Supported by National Technology Infrastructure - Data Sharing Platform of Earth System Science, Special Informational Infrastructure Program of CAS (xxhi2504-i-oi)
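As an illustration of how the fields named above (IP, time, method, URL, protocol, status, file size, referrer, client) could be pulled out of one log line, the following is a minimal Python sketch. It assumes an Apache-style layout in which the referrer and client strings follow the size field; the pattern, the function name `parse_entry`, and the sample line (documentation IP address, made-up path) are hypothetical and are not taken from the study's script.

```python
import re
from datetime import datetime

# Assumed layout: IP, identity, user, [time], "request", status, size,
# then optional "referrer" and "client" strings.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_entry(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


# Hypothetical entry, for illustration only.
sample = (
    '192.0.2.10 - - [05/Aug/2014:10:26:00 +0800] '
    '"GET /portal/index.jsp HTTP/1.1" 200 15072 '
    '"-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Firefox/31.0"'
)
record = parse_entry(sample)
```

Parsing into typed fields up front (integer status, timezone-aware timestamp) keeps the later cleaning and session steps free of string handling.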

Figure 1. A log entry example

TABLE I. COMPONENTS OF A LOG ENTRY

Field              Value
IP
Time               05/Aug/2014:10:26:
Method             GET
URL                /extra/res/libs/kendo/extensions/kendo.extension.ui.js
Protocol           HTTP/1.1
Status             200
File size (byte)   15072
Referrer
Client             Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/ Firefox/31.0

C. Method

Data fusion and data cleaning

For large-scale websites, user information may come from multiple Web servers or Application servers. Data fusion is the process of merging the logs from these servers. Data cleaning aims to eliminate records that are irrelevant or redundant for the analysis, e.g. requests for graphical page content (.jpg, .png, .gif, etc.), style sheets (.css), audio files, and so on [6]. In addition, requests from web crawlers (or robots) and error requests should also be removed from the original log. Requests for graphical content and style files, as well as error requests, are easy to eliminate because they can be identified from the URL request field and the status field. However, robots and Web crawlers are sometimes hard to identify, for example when a robot uses a fake user agent. In this case, studies often rely on heuristics that model the navigation behavior of robots, for instance in the work of Tan and Kumar [7]. The algorithm employed for data cleaning is described as:

Input: Rawlog // source web server log file
Output: Logbase // cleaned log database
Begin
  For each LogRecord = Read(Rawlog):
    If not (LogRecord.Request.url contains (.gif, .jpeg, .jpg, .css, .js)
            or LogRecord.Status > 299 or LogRecord.Status < 200
            or LogRecord.Agent contains (crawler, spider, robot))
    Then write(Logbase, LogRecord);
End

User Identification

User identification is the process of distinguishing different users. Without an authentication mechanism, cookies are the best source for identifying users. However, cookies are often disabled by users, and some websites do not use cookies. Another useful piece of information is the user's IP address. Yet the IP address alone is not sufficient to map log entries to unique users, owing to the proliferation of ISP proxy servers, which assign rotating IP addresses to clients as they browse the Web [3]. If cookies are not available, the agent and referrer fields of the log entries can provide auxiliary information for identifying users. A heuristic method was devised for user identification:

Step 1, assume that a new IP address represents a new user.
Step 2, for multiple log entries that share the same IP, if their Internet browser or operating system differs, they are treated as different users.
Step 3, for the users identified by the above two steps, if a URL requested by a user cannot be reached through any hyperlink on the pages that user has visited, a new user is assumed.

Once individual users are identified, their geographical location can be determined from the IP address. A GeoIP lookup service provided by ipinfo.io [9] was used to acquire the geolocation of users.

Session identification

Session identification is the process of dividing each user's page access activities into sessions [10]. Each session represents a visit to the website. Websites without an authorization mechanism or an embedded session ID system have to rely on heuristic methods for session identification [3]. The simplest, but often useful, method is a time window: if the time between page requests exceeds a certain limit, it is assumed that a new session begins.
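Before turning to the session heuristics in detail, the cleaning rule and the first two user-identification steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the study's script: the record layout (a dict with `ip`, `url`, `status`, `agent`) and the function names are hypothetical, the whole agent string stands in for "browser or operating system", and Step 3 is omitted because it needs the site's hyperlink structure.

```python
from urllib.parse import urlsplit

# Extensions and agent keywords taken from the cleaning rule in the text.
STATIC_EXT = (".gif", ".jpeg", ".jpg", ".png", ".css", ".js")
BOT_HINTS = ("crawler", "spider", "robot")


def keep_record(rec):
    """True if the record survives cleaning."""
    path = urlsplit(rec["url"]).path.lower()
    if path.endswith(STATIC_EXT):
        return False                      # embedded page resource
    if not 200 <= rec["status"] <= 299:
        return False                      # redirect or error request
    agent = rec["agent"].lower()
    return not any(hint in agent for hint in BOT_HINTS)


def assign_users(records):
    """Steps 1 and 2: each distinct (IP, agent) pair becomes one user."""
    user_ids = {}
    labeled = []
    for rec in records:
        key = (rec["ip"], rec["agent"])
        uid = user_ids.setdefault(key, len(user_ids))
        labeled.append({**rec, "user_id": uid})
    return labeled


# Hypothetical records, for illustration only.
records = [
    {"ip": "192.0.2.1", "url": "/portal/list.jsp?pn=2", "status": 200, "agent": "Firefox/31.0"},
    {"ip": "192.0.2.1", "url": "/res/style.css", "status": 200, "agent": "Firefox/31.0"},
    {"ip": "192.0.2.1", "url": "/portal/index.jsp", "status": 404, "agent": "Firefox/31.0"},
    {"ip": "192.0.2.2", "url": "/portal/index.jsp", "status": 200, "agent": "ExampleSpider/1.0"},
    {"ip": "192.0.2.1", "url": "/portal/index.jsp", "status": 200, "agent": "Chrome/32.0"},
]
cleaned = [r for r in records if keep_record(r)]
users = assign_users(cleaned)
```

In this example two records survive cleaning, and because they share an IP but differ in agent they are assigned to two different users.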
A previous study of heuristic session identification algorithms by Berendt et al. [11] compared three heuristic methods under frame-based and frame-free site structures. The results showed that the referrer-based heuristic algorithm (Href) outperformed the other two in the frame-free case. In view of that result, the Href algorithm was adopted in this study. The core of the algorithm is as follows: suppose p and q are two consecutive page requests, and p belongs to session S. Let $t_p$ and $t_q$ denote the timestamps of p and q, respectively. Then q belongs to session S if the referrer of q was previously invoked within S, or if the referrer is undefined ("-" in the log) and $(t_q - t_p) \le \Delta$ for a specific delay $\Delta$. Otherwise, a new session containing q is started.

For comparison, the Href algorithm and a time-window-based method were both tested. The time-window-based heuristic method (Htwin) [12] is described as follows:

Step 1, if a new user emerges, generate a new session.
Step 2, within the sessions identified by Step 1, if the referrer of a log entry is "-", it is assumed that a new session starts.
Step 3, within the sessions identified by Step 1 and Step 2, if the time interval between a log entry and the previous one exceeds a threshold (30 min), a new session starts.
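The two heuristics can be compared on a toy request stream with the following Python sketch. It is a simplified reading of the descriptions above rather than the implementation used in the study: requests belong to a single user and are sorted by time, timestamps are plain seconds, referrers are assumed to be already normalized to the same form as the requested URLs, and the value of `delta` is a hypothetical default because the text does not state the delay that was used.

```python
def sessions_href(requests, delta=10):
    """Referrer-based heuristic: q stays in the current session if its
    referrer was requested earlier in that session, or if the referrer is
    undefined ("-") and the gap to the previous request is at most delta."""
    sessions = []
    for q in requests:
        if sessions:
            current = sessions[-1]
            p = current[-1]
            seen = {r["url"] for r in current}
            ref = q["referrer"]
            if ref in seen or (ref == "-" and q["time"] - p["time"] <= delta):
                current.append(q)
                continue
        sessions.append([q])
    return sessions


def sessions_htwin(requests, threshold=30 * 60):
    """Time-window heuristic: an undefined referrer or a gap longer than
    the threshold (30 min) starts a new session."""
    sessions = []
    for q in requests:
        if sessions:
            p = sessions[-1][-1]
            if q["referrer"] != "-" and q["time"] - p["time"] <= threshold:
                sessions[-1].append(q)
                continue
        sessions.append([q])
    return sessions


# Hypothetical requests of one user (time in seconds), for illustration only.
reqs = [
    {"time": 0, "url": "/a", "referrer": "-"},
    {"time": 5, "url": "/b", "referrer": "/a"},
    {"time": 8, "url": "/c", "referrer": "-"},
    {"time": 4000, "url": "/d", "referrer": "/x"},
    {"time": 4003, "url": "/e", "referrer": "-"},
]
href_sizes = [len(s) for s in sessions_href(reqs)]
htwin_sizes = [len(s) for s in sessions_htwin(reqs)]
```

On this stream the referrer-based rule yields two sessions while the time-window rule yields four, because every undefined referrer opens a new session under Htwin.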

Data formatting and data modeling

In this study, the cleaned log was written into a MySQL database, together with the user ID, user geolocation and session ID. Once these data are stored in the database, a final data formatting step is required for the specific data mining task to be carried out [13]. For example, temporal information is not necessary for user cluster mining and association rule mining, so for these tasks the data formatting step does not read the time stamp from the log entries. The final step of data preparation for data mining is to construct a proper mathematical data model. A geo-referenced data model based on the traditional user-pageview matrix data model [8] is proposed in the next section.

III. RESULTS AND DISCUSSION

A. Results

Data cleaning

The raw log file has 1,694,561 entries, of which 451,544 are left after data cleaning, about 1/4 (roughly 27%) of the whole. Most of the raw log data is therefore redundant for user behavior mining. The data cleaning script was written in Python. Each raw log entry was read by the script, filtered by checking the status, client and URL, and finally exported to the MySQL database. A screenshot of the resulting log table in the database is shown in Fig. 2.

Figure 2. A screenshot of log entry table in the database

User Identification

The GeoIP lookup service provides a JSON API, which can easily be called from a script and returns geolocation information in JSON. Fig. 3 illustrates the JSON response to a GeoIP request. Because network access providers allocate IP addresses dynamically, the returned longitude and latitude cannot precisely identify a user's location, only the region in which the user is located, at sub-city scale. From the 451,544 retained log entries, 14,549 users were identified, of whom 13,786 were geolocated with the GeoIP lookup API. Fig. 5 is a map that depicts the worldwide distribution of users. Users come mainly from three regions, i.e. China, the United States and Europe, and most of them are from China. Within China, users are mainly from major cities, for instance Beijing, Shanghai, and Wuhan. This is in accordance with the distribution of universities and research institutes.

Figure 3. JSON response of a GeoIP request from ipinfo.io

Session Identification

With regard to session identification, the results of the Htwin and Href methods differed significantly. Htwin identified 115,517 sessions, while Href identified 56,211, only about half as many. This can be explained by the large proportion of undefined referrers ("-") in the log, which leads the Htwin method to overestimate the number of sessions. Identified users and sessions are stored in a table keyed by log entry ID, as shown in Fig. 4. Overall, Tab. II shows the results of the preprocessing steps.

Figure 4. A screenshot of session table in the database

TABLE II. RESULTS OF PREPROCESSING
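For the GeoIP step described under User Identification, the following Python sketch shows one way such a lookup could be scripted. It assumes the ipinfo.io JSON endpoint of the form `https://ipinfo.io/<ip>/json` and a `loc` field holding "latitude,longitude"; the service may require an access token or apply rate limits, and the function names and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_geo(ip, timeout=10):
    """Query the lookup service for one IP (network call, not run here)."""
    with urlopen(f"https://ipinfo.io/{ip}/json", timeout=timeout) as resp:
        return json.load(resp)


def extract_location(payload):
    """Return latitude/longitude and coarse place names, or None if the
    response carries no coordinates."""
    loc = payload.get("loc")
    if not loc:
        return None
    lat, lon = (float(v) for v in loc.split(","))
    return {
        "lat": lat,
        "lon": lon,
        "city": payload.get("city"),
        "country": payload.get("country"),
    }


# Hypothetical response body, for illustration only.
sample_body = '{"ip": "192.0.2.10", "city": "Beijing", "country": "CN", "loc": "39.9,116.4"}'
location = extract_location(json.loads(sample_body))
```

Returning None for responses without coordinates makes it straightforward to count how many identified users could not be geolocated.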

Figure 5. User distribution of July and August, 2014

Data formatting and data modeling

As the last step, a refined data model is proposed as follows. Given a set of n pageviews after data cleaning, $P = \{p_1, p_2, \ldots, p_n\}$, and a set of m transactions, $T = \{t_1, t_2, \ldots, t_m\}$, where each $t_i$ in T is a subset of P, each transaction t can be denoted as an l-length sequence of ordered pairs:

$$t = \{(p^t_1, w(p^t_1)), (p^t_2, w(p^t_2)), \ldots, (p^t_l, w(p^t_l))\} \qquad (1)$$

where each $p^t_i = p_j$ for some j in $\{1, 2, \ldots, n\}$, and $w(p^t_i)$ is the weight associated with pageview $p^t_i$ in transaction t, representing its significance. $w(p^t_i)$ can be a binary 1 or 0, representing the existence or non-existence of a pageview, or it can be the time spent on the page, depending on the data mining task. Given the transaction t above, a transaction vector tv can be defined as:

$$tv = (w_1, w_2, \ldots, w_n) \qquad (2)$$

where each $w_i = w(p^t_j)$ for some j in $\{1, 2, \ldots, n\}$ if pageview $p_i$ appears in transaction t; otherwise, $w_i = 0$. Thus, the set of all user transactions can be modeled as an m x n user-pageview matrix, an example of which is shown in Fig. 6.

Figure 6. An example of a user-pageview (transaction) matrix [8], with sessions/users (user0 to user9) as rows and pageviews (A to F) as columns. Here the weight for each pageview is the amount of time (e.g., in seconds) that a particular user spent on the pageview.

By adding the users' geolocation, this model can be refined into a three-dimensional matrix, in which each transaction can be viewed as t = (x, y, tv), where x and y represent its geographic coordinates (longitude and latitude), and tv denotes its transaction vector in the form of formula (2). Fig. 7 illustrates the data model structure. With such a data model, spatial analysis of user behavior should be feasible, for example user similarity analysis that considers both transaction vector distance and geographical distance.
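The data model above can be made concrete with a short Python sketch that builds transaction vectors in the form of formula (2), attaches coordinates to obtain t = (x, y, tv), and combines a vector distance with a geographical distance. The combination rule (a weighted sum of cosine distance and capped great-circle distance, with hypothetical parameters `alpha` and `scale_km`) is an assumption made for illustration; the text only states that both distances could be considered.

```python
import math


def transaction_vector(transaction, pageviews):
    """Formula (2): weight of each pageview in P, 0 if absent from t."""
    return [transaction.get(p, 0) for p in pageviews]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance on a sphere of radius 6371 km."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    h = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def combined_distance(a, b, alpha=0.5, scale_km=1000.0):
    """a and b are (x, y, tv) triples; alpha and scale_km are hypothetical."""
    geo = min(haversine_km(a[0], a[1], b[0], b[1]) / scale_km, 1.0)
    return alpha * cosine_distance(a[2], b[2]) + (1 - alpha) * geo


# Hypothetical pageviews and transactions (weights = seconds on page).
pageviews = ["A", "B", "C"]
transactions = [{"A": 15, "C": 30}, {"A": 15, "C": 30}, {"B": 20}]
matrix = [transaction_vector(t, pageviews) for t in transactions]
user0 = (116.4, 39.9, matrix[0])
user1 = (116.4, 39.9, matrix[1])
user2 = (121.5, 31.2, matrix[2])
```

Here `matrix` is the m x n user-pageview matrix; users 0 and 1 have identical vectors and coordinates, so their combined distance is zero up to rounding, while user 2 differs in both components.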

Figure 7. An example of a georeferenced user transaction data model. The blue line represents the transaction vector (tv) of a user located at 30°E, 45°N.

B. Discussion

This study implemented data preprocessing procedures for user behavior mining of a geoscience data sharing portal. The aim of the study is to set up a data preprocessing framework and to yield ready-to-use data for further data mining tasks, e.g. user classification and association rules of users' data interests. Three preprocessing steps were conducted using heuristic methods, and users' geolocations were identified from their IP addresses. These procedures are indispensable for mining user behaviors and their spatial attributes. Of the two session identification methods, Href is deemed the more plausible after careful examination of the log entries. The final results of these procedures are written into a database along with the log entry identifier. Depending on the specific data mining task, the cleaned log entries, users, sessions, and geolocation information can be read from the database and, with proper data formatting and data modeling, passed on to the data mining task. Future work will focus on data mining based on users' interest in geoscience data, by parsing the URL requests within each session.

ACKNOWLEDGMENT

The authors would like to express appreciation for data support from the Data Sharing Platform of Earth System Science, National Science & Technology Infrastructure of China.

REFERENCES

[1] C. Tenopir, S. Allard and K. Douglass, "Data sharing by scientists: practices and perceptions," PLoS ONE, 2011, 6(6).
[2] Y. Zhu, J. Sun and S. Liao, "Earth system scientific data sharing research and practice," Geo-information Science, 2010, 12(1): 1-8.
[3] R. Kosala and H. Blockeel, "Web mining research: a survey," SIGKDD Explorations, 2000, 2(1).
[4] J. Srivastava, R. Cooley and M. Deshpande, "Web usage mining: discovery and applications of usage patterns from web data," SIGKDD Explorations, 2000, 1(2).
[5] Data Sharing Platform of Earth System Science, "Operating report 2014."
[6] R. Cooley, B. Mobasher and J. Srivastava, "Data preparation for mining world wide web browsing patterns," Knowledge and Information Systems, 1999, 1(1).
[7] P. N. Tan and V. Kumar, "Discovery of web robot sessions based on their navigational patterns," Data Mining and Knowledge Discovery, 2002, 6(1).
[8] B. Liu, Web Data Mining, 2nd ed., Berlin: Springer, 2011.
[9] ipinfo.io.
[10] P. Zhu and M.-s. Zhao, "Session identification algorithm for web log mining," International Conference on Management and Service Science, 2010.
[11] B. Berendt, B. Mobasher and M. Nakagawa, "The impact of site structure and user environment on session reconstruction in web usage analysis," Knowledge Discovery and Data Mining, 2002.
[12] L. C. Feng, "Study on crucial techniques of web usage mining," Wuhan: Huazhong Univ. of Science and Technology, 2007, Chinese.
[13] D. Tanasa and B. Trousse, "Advanced data preprocessing for intersites web usage mining," IEEE Intelligent Systems, (2).


More information

Create a Profile for User Using Web Usage Mining

Create a Profile for User Using Web Usage Mining Journal of Academic and Applied Studies (Special Issue on Applied Sciences) Vol. 3(9) September 2013, pp. 1-12 Available online @ www.academians.org ISSN1925-931X Create a Profile for User Using Web Usage

More information

A Method for Representing Thematic Data in Three-dimensional GIS

A Method for Representing Thematic Data in Three-dimensional GIS A Method for Representing Thematic Data in Three-dimensional GIS Yingjie Hu, Jianping Wu, Zhenhua Lv, Haidong Zhong, Bailang Yu * Key Laboratory of Geographic Information Science, Ministry of Education

More information
