A Data Preprocessing Framework of Geoscience Data Sharing Portal for User Behavior Mining
Mo Wang 1,2, Juanle Wang 1,3,*
1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
2 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China
* Corresponding author, wangjl@igsnrr.ac.cn

Abstract: Scientific data sharing has many advantages for both research and education. Knowing how participants in scientific data sharing behave is valuable for informed decision making on data sharing policy and on the design of data sharing websites. Data sharing is now carried out mainly through the Internet, and web usage mining provides an ideal approach to uncovering the behavior of data sharing users. This paper presents a data preprocessing framework for user behavior mining of a geoscience data sharing portal (geodata.cn). The preprocessing steps are data cleaning, user identification, session identification, and data modeling. Web server logs served as the major data source of this study. Heuristic algorithms were employed for data cleaning and user identification. Different session identification methods were applied for comparison. Users' geolocations were identified using an online Geo-IP lookup tool, which provides the geographical coordinates of an IP address. On the basis of all the preprocessing procedures, a web usage data model for a science data sharing portal was proposed for further user behavior mining, such as user classification and spatial association rule mining.

Keywords: geoscience data sharing; web usage mining; spatial data mining; data preprocessing

I. INTRODUCTION

Data are regarded as a basic infrastructure of science.
Data sharing has many advantages that boost scientific research and education. Data sharing in the science community has a long history, usually in ad hoc ways [1]. However, the development of information technology and the Internet has brought new concepts and forms of data sharing. Data sharing is now conducted mainly through the Internet, so data sharing behaviors are becoming web usage behaviors on data sharing portals. Geodata.cn is a leading data sharing portal for earth system science in China. It has a large number of users and abundant data resources across Earth System Science disciplines [2]. Yet its user behaviors, which can be interpreted as data sharing behavior from the end users' perspective, remain poorly understood.

User behavior mining of a website belongs to the field of web usage mining, which is a subfield of Web mining. Web mining is the field concerned with knowledge discovery from the Internet. More precisely, it is often divided into three topics: Web Content Mining, Web Structure Mining and Web Usage Mining [3]. The output of Web Usage Mining can be of great value in network structure optimization and website server configuration [4]. Moreover, extracted user behavior can be further used in recommendation systems and proactive services in the context of data sharing. This study aims to set up a data preprocessing framework for user behavior mining of a science data sharing portal in the context of web usage mining and spatial data mining.

II. DATA AND METHOD

A. Background

The subject of this study is the user behavior of the National Data Sharing Platform of Earth System Science (Geodata.cn). It is one of the National Science & Technology Infrastructure platforms dedicated to scientific data sharing. The objective of the platform is to provide data support and services for research in Earth System Science and for pioneering innovation across relevant disciplines.
Geodata.cn has been operating for nearly 10 years and holds a representative position in the science data sharing domain in China and internationally. By the end of August 2014, the platform had 91,944 registered users and the portal had received 17 million visits in total [5]. This activity has left abundant data recorded by the website servers, so the user behaviors of the platform can be mined and analyzed with data mining methods.

B. Data

The data sources of Web Usage Mining are mainly Web servers; data from proxy servers and Web clients can also be used if available. This study used web server log data because the other two sources were not available. A web server log covering two months (July and August 2014) was acquired for this study. The web server log was stored in Common Log Format. Fig. 1 is an example of a log entry, from which the user's IP, visiting time, method, URL visited, status, referrer, and client details can be obtained. The log file of this study contains 1.69 million log entries in the form of that example. Tab. I lists the information retrieved from a log entry.

Supported by National Technology Infrastructure - Data Sharing Platform of Earth System Science, Special Informational Infrastructure Program of CAS (xxhi2504-i-oi)
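The fields named in Section II.B can be extracted from each log line with a regular expression. The following is a minimal Python sketch, not the authors' script. It assumes the Apache-style combined layout (Common Log Format followed by quoted referrer and user-agent fields), which matches the fields listed above; the function name, pattern name, and the sample line (including its IP address and URL) are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: host ident authuser [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 05/Aug/2014:10:26:00 +0800 (English month names assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Invented sample line for illustration only (192.0.2.0/24 is a documentation range).
SAMPLE_LINE = (
    '192.0.2.10 - - [05/Aug/2014:10:26:00 +0800] '
    '"GET /portal/example.jsp?id=1 HTTP/1.1" 200 15072 '
    '"-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Firefox/31.0"'
)
```

Returning `None` for lines that do not match lets a caller count and inspect malformed entries instead of failing on the first one.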
Figure 1. A log entry example

TABLE I. COMPONENTS OF A LOG ENTRY
Field               Value
IP
Time                05/Aug/2014:10:26:
Method              GET
URL                 /extra/res/libs/kendo/extensions/kendo.extension.ui.js
Protocol            HTTP/1.1
Status              200
File size (byte)    15072
Referrer
Client              Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/ Firefox/31.0

C. Method

Data fusion and data cleaning

For large-scale websites, user information may come from multiple Web servers or application servers. Data fusion is the process of merging the logs from these servers. Data cleaning aims to eliminate records that are irrelevant or redundant for the analysis, e.g. requests for graphical page content (.jpg, .png, .gif, etc.), style (.css) files, audio files, and so on [6]. In addition, requests from web crawlers (or robots) and error requests should also be removed from the original log. Requests for graphical content and style files, as well as error requests, are easy to eliminate because they can be identified from the URL request field and the status field. However, robots and Web crawlers are sometimes hard to identify if they use a fake user agent. In this case, studies often use heuristics that characterize the navigation behavior of robots, for instance in the work of Tan and Kumar [7]. The algorithm employed for data cleaning is described as:

Input: Rawlog // source web server log file
Output: Logbase // cleaned log database
Begin
  For each LogRecord in Read(Rawlog):
    If not (LogRecord.Request.url.contains(.gif, .jpeg, .jpg, .css, .js)
            or LogRecord.Status > 299 or LogRecord.Status < 200
            or LogRecord.Agent.contains(crawler, spider, robot))
    Then write(Logbase, LogRecord);
End

User Identification

User identification is the process of distinguishing different users. Without an authentication mechanism, the best source for identifying users is cookies. However, cookies are often disabled by users, and some websites do not use cookies. Another useful piece of information is the user's IP address. Yet the IP address alone is not sufficient to map log entries to unique users, owing to the proliferation of ISP proxy servers, which assign rotating IP addresses to clients as they browse the Web [3]. If cookies are not available, the agent and referrer fields in log entries can provide auxiliary information to identify users. A heuristic method was devised to achieve user identification:

Step 1, assume that a new IP address represents a new user.
Step 2, for multiple log entries that share the same IP, if their Internet browser or operating system is different, they are treated as different users.
Step 3, for the users identified by the above two steps, if a URL requested by a user cannot be reached through any hyperlink on the pages that user has already visited, a new user is assumed.

Once individual users are identified, their geographical location can be determined from the IP address. A GeoIP lookup service provided by ipinfo.io [9] was used to acquire the geolocation of users.

Session identification

Session identification is the process of dividing each user's page access activities into sessions [10]. Each session represents a visit to the website. Websites without an authorization mechanism or an embedded session ID system have to rely on heuristic methods for session identification [3]. The simplest, but often useful, method is a time window: if the time between two page requests exceeds a certain limit, a new session is assumed to begin.
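The data cleaning rule and Steps 1 and 2 of the user-identification heuristic described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: records are assumed to be dicts with `ip`, `url`, `status` and `agent` keys; the full user-agent string is used as a stand-in for "browser or operating system"; and Step 3, which needs the site's hyperlink structure, is left out. All function and constant names are hypothetical.

```python
from urllib.parse import urlsplit

# Suffixes and keywords taken from the cleaning rule in the text (.png is named in the prose).
IRRELEVANT_SUFFIXES = (".gif", ".jpeg", ".jpg", ".png", ".css", ".js")
ROBOT_KEYWORDS = ("crawler", "spider", "robot")


def is_relevant(record):
    """Keep a record only if it is a successful page request from a non-robot agent."""
    path = urlsplit(record["url"]).path.lower()
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False  # image, style or script resource
    if not 200 <= record["status"] <= 299:
        return False  # redirect or error response
    agent = record["agent"].lower()
    return not any(keyword in agent for keyword in ROBOT_KEYWORDS)


def assign_user_ids(records):
    """Steps 1 and 2: each distinct (IP, user agent) pair is treated as one user."""
    user_ids = {}
    labelled = []
    for record in records:
        key = (record["ip"], record["agent"])
        if key not in user_ids:
            user_ids[key] = len(user_ids)  # next unused integer ID
        labelled.append({**record, "user_id": user_ids[key]})
    return labelled
```

Matching on the end of the URL path, rather than on a substring anywhere in the URL, keeps dynamic pages such as `.jsp` requests from being discarded by the `.js` rule.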
A previous study of heuristic session identification algorithms by Berendt et al. [11] compared three heuristic methods under frame-based and frame-free site structures. The results showed that the referrer-based heuristic algorithm (Href) outperformed the other two in the frame-free case. In view of that result, the Href algorithm was adopted in this study. The core of the algorithm is as follows: let p and q be two consecutive page requests, with p belonging to session S, and let $t_p$ and $t_q$ denote the timestamps of p and q, respectively. Then q belongs to session S if the referrer of q was previously invoked within S, or if the referrer is undefined ("-" in the log) and $t_q - t_p \le \Delta$ for a specific delay $\Delta$. Otherwise, a new session is started with q as its first request.

For comparison, the Href algorithm and a time-window-based method were both tested. The time-window-based heuristic method (Htwin) [12] is described as follows:

Step 1, if a new user emerges, generate a new session.
Step 2, within the sessions identified by step 1, if the referrer of a log entry is "-", it is assumed that a new session starts.
Step 3, within the sessions identified by step 1 and step 2, if the time interval between a log entry and the previous one exceeds a threshold (30 min), a new session starts.
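The two heuristics above can be compared on a small example. The Python sketch below follows the descriptions in the text rather than the authors' implementation. It assumes the requests of a single user are already sorted by time, that each request is a dict with `time`, `url` and `referrer` keys, and that referrers have been normalized to the same form as request URLs. The 10-second value for the delay is a placeholder, since the text does not state the value used; all names and the example data are hypothetical.

```python
from datetime import datetime, timedelta

UNDEFINED = "-"  # referrer value for requests without a referrer


def sessions_htwin(requests, timeout=timedelta(minutes=30)):
    """Htwin: new session on an undefined referrer or on a gap longer than `timeout`."""
    sessions = []
    for req in requests:
        starts_new = (
            not sessions
            or req["referrer"] == UNDEFINED
            or req["time"] - sessions[-1][-1]["time"] > timeout
        )
        if starts_new:
            sessions.append([req])
        else:
            sessions[-1].append(req)
    return sessions


def sessions_href(requests, delta=timedelta(seconds=10)):
    """Href: stay in the session if the referrer was seen in it, or if the
    referrer is undefined and the gap to the previous request is at most `delta`."""
    sessions = []
    for req in requests:
        if sessions:
            current = sessions[-1]
            seen_urls = {r["url"] for r in current}
            gap = req["time"] - current[-1]["time"]
            if req["referrer"] in seen_urls or (
                req["referrer"] == UNDEFINED and gap <= delta
            ):
                current.append(req)
                continue
        sessions.append([req])
    return sessions


# Invented request sequence for one user.
T0 = datetime(2014, 8, 5, 10, 0, 0)
EXAMPLE = [
    {"time": T0, "url": "/a", "referrer": "-"},
    {"time": T0 + timedelta(seconds=5), "url": "/b", "referrer": "/a"},
    {"time": T0 + timedelta(seconds=8), "url": "/c", "referrer": "-"},
    {"time": T0 + timedelta(minutes=40), "url": "/d", "referrer": "/c"},
]
```

On `EXAMPLE`, `sessions_htwin` splits at the third request (undefined referrer) and again at the fourth (gap over 30 minutes), giving three sessions, while `sessions_href` keeps all four requests in one session. This is the kind of divergence to expect when many log entries carry an undefined referrer.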
Data formatting and data modeling

In this study, the cleaned log was written into a MySQL database, together with the user ID, user geolocation and session ID. Once these data are stored in the database, a final data formatting step is required for the specific data mining task to be accomplished [13]. For example, temporal information is not necessary for user cluster mining or association rule mining, so for those tasks the data formatting step does not read timestamps from the log entries. The final step of data preparation for data mining is to construct a proper mathematical data model. A geo-referenced data model based on the traditional user-pageview matrix data model [8] is proposed in the next section.

III. RESULTS AND DISCUSSION

A. Results

Data cleaning

The raw log file has 1,694,561 entries, of which 451,544 are left after data cleaning, roughly one quarter (26.6%) of the whole. It can be concluded that most of the raw log data is redundant for user behavior mining. The data cleaning script was written in Python. Each raw log entry was read by the script, filtered by checking the status, client and URL, and finally exported to the MySQL database. A screenshot of the resulting log table in the database is shown in Fig. 2.

Figure 2. A screenshot of log entry table in the database

User Identification

The GeoIP lookup service provides a JSON API, which can easily be called from a script and returns geolocation information in JSON. Fig. 3 illustrates the JSON response to a GeoIP request. Because network access providers allocate IP addresses dynamically, the returned longitude and latitude cannot precisely identify a user's location, only the region in which the user is located, at sub-city scale. From the 451,544 retained log entries, 14,549 users were identified, of which 13,786 were geolocated with the GeoIP lookup API. Fig. 5 is a map that depicts the worldwide user distribution. Users come mainly from three regions, i.e. China, the United States and Europe, and most of them are from China. Within China, users are mainly from major cities, for instance Beijing, Shanghai, and Wuhan. This is in accordance with the distribution of universities and research institutes.

Figure 3. JSON response of a GeoIP request from ipinfo.io

Session Identification

With regard to session identification, the results of the Htwin and Href methods differed significantly. The Htwin method identified 115,517 sessions, while the Href method identified 56,211, only about half as many. This can be explained by the large proportion of undefined referrers ("-") in the log, which leads the Htwin method to overestimate the number of sessions. Identified users and sessions are stored in a table together with the log entry ID, as shown in Fig. 4. Overall, Tab. II shows the results of the preprocessing steps.

Figure 4. A screenshot of session table in the database

TABLE II. RESULTS OF PREPROCESSING
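The GeoIP lookup described under User Identification can be scripted along the following lines. This Python sketch is illustrative only: it assumes the ipinfo.io endpoint form `https://ipinfo.io/<ip>/json` and a `loc` field holding `"latitude,longitude"` as a comma-separated string, and it ignores access tokens and rate limits, which a real deployment would need to check against the service's documentation. The function names and the sample response are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_geoip(ip, timeout=10):
    """Query the lookup service for one IP address (performs a network call)."""
    with urlopen(f"https://ipinfo.io/{ip}/json", timeout=timeout) as response:
        return json.load(response)


def extract_location(geoip_response):
    """Return (longitude, latitude) from a response dict, or None if absent."""
    loc = geoip_response.get("loc")
    if not loc:
        return None  # some addresses cannot be geolocated
    lat_text, lon_text = loc.split(",")
    return float(lon_text), float(lat_text)


# Invented response for illustration; real responses carry more fields.
SAMPLE_RESPONSE = {"ip": "192.0.2.10", "country": "CN", "loc": "39.9075,116.3972"}
```

Returning `None` when no coordinates are present mirrors the situation reported above, where not every identified user could be geolocated.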
Figure 5. User distribution of July and August, 2014

Data formatting and data modeling

As the last step, a refined data model is proposed as follows. Given a set of n pageviews after data cleaning, $P = \{p_1, p_2, \ldots, p_n\}$, and a set of m transactions, $T = \{t_1, t_2, \ldots, t_m\}$, where each $t_i \in T$ is a subset of P, each transaction can be denoted as an l-length sequence of ordered pairs:

$$t = \{(p^t_1, w(p^t_1)), (p^t_2, w(p^t_2)), \ldots, (p^t_l, w(p^t_l))\} \quad (1)$$

where each $p^t_i = p_j$ for some $j \in \{1, 2, \ldots, n\}$, and $w(p^t_i)$ is the weight associated with pageview $p^t_i$ in transaction t, representing its significance. $w(p^t_i)$ can be a binary 1 or 0, representing the existence or non-existence of a pageview, or it can be the time spent on the page, depending on the data mining task. Given the transaction t above, a transaction vector tv can be defined as:

$$tv = (w_1, w_2, \ldots, w_n) \quad (2)$$

where $w_i = w(p^t_j)$ if pageview $p_i$ appears in transaction t as $p^t_j$ for some $j \in \{1, 2, \ldots, l\}$, and $w_i = 0$ otherwise. Thus, the set of all user transactions can be modeled as an m x n user-pageview matrix; an example is shown in Fig. 6.

Figure 6. An example of a user-pageview (transaction) matrix [8], with sessions/users (user0 to user9) as rows and pageviews (A to F) as columns. Here the weight for each pageview is the amount of time (e.g., in seconds) that a particular user spent on the pageview.

By adding the users' geolocation, this model can be refined into a three-dimensional matrix, in which each transaction can be viewed as t = (x, y, tv), where x and y represent its geographic coordinates (longitude and latitude), and tv denotes its transaction vector in the form of formula (2). Fig. 7 illustrates the data model structure. With such a data model, spatial analysis of user behavior, for example user similarity analysis considering both transaction vector distance and geographical distance, should be feasible.
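The georeferenced model t = (x, y, tv) and the two distances mentioned above can be illustrated with a short Python sketch. It is a sketch under assumptions, not part of the original study: transactions are assumed to arrive as `{pageview: weight}` dicts, the transaction-vector distance is taken to be Euclidean, and the geographic distance is the great-circle (haversine) distance on a sphere of radius 6371 km. How the two distances would be combined into one similarity measure is left open, as the text does not specify it. All names and example values are hypothetical.

```python
import math


def transaction_vector(transaction, pageviews):
    """Expand a {pageview: weight} dict into an n-length vector, 0 for absent pages."""
    return [transaction.get(page, 0) for page in pageviews]


def build_model(located_transactions, pageviews):
    """Turn (longitude, latitude, transaction) triples into (x, y, tv) tuples."""
    return [
        (lon, lat, transaction_vector(transaction, pageviews))
        for lon, lat, transaction in located_transactions
    ]


def vector_distance(tv_a, tv_b):
    """Euclidean distance between two transaction vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(tv_a, tv_b)))


def geo_distance_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Invented example: two users, weights are seconds spent on each pageview.
PAGEVIEWS = ["A", "B", "C", "D", "E", "F"]
MODEL = build_model(
    [(30.0, 45.0, {"A": 15, "C": 30}), (30.0, 45.0, {"A": 15})],
    PAGEVIEWS,
)
```

Keeping the two distances as separate functions leaves the weighting between behavioral and geographic similarity as an explicit modeling decision for the later mining step.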
Figure 7. An example of a georeferenced user transaction data model. The blue line represents the transaction vector (tv) of a user located at 30°E, 45°N.

B. Discussion

This study implemented data preprocessing procedures for user behavior mining of a geoscience data sharing portal. The aim of the study is to set up a data preprocessing framework and to yield ready-to-use data for further data mining tasks, e.g. user classification and association rules of users' data interests. Three preprocessing steps were conducted using heuristic methods, and users' geolocations were identified from their IP addresses. These procedures are indispensable for mining user behaviors and their spatial attributes. Of the two session identification methods, Href is deemed the more plausible after careful examination of the log entries. The final results of these procedures are written into a database along with the log entry identifier. Depending on the specific data mining task, cleaned log entries, user, session, and geolocation information can be read from the database, and with proper data formatting and data modeling the data mining task can then be carried out. Future work will focus on data mining based on users' interest in geoscience data, by parsing the URL requests within each session.

ACKNOWLEDGMENT

The authors would like to express their appreciation for data support from the Data Sharing Platform of Earth System Science, National Science & Technology Infrastructure of China.

REFERENCES

[1] C. Tenopir, S. Allard and K. Douglass, "Data sharing by scientists: practices and perceptions," PLoS ONE, 2011, 6(6).
[2] Y. Zhu, J. Sun and S. Liao, "Earth system scientific data sharing research and practice," Geo-information Science, 2010, 12(1): 1-8.
[3] R. Kosala and H. Blockeel, "Web mining research: a survey," SIGKDD Explorations, 2000, 2(1).
[4] J. Srivastava, R. Cooley and M. Deshpande, "Web usage mining: discovery and applications of usage patterns from web data," SIGKDD Explorations, 2000, 1(2).
[5] Data Sharing Platform of Earth System Science, "Operating report 2014."
[6] R. Cooley, B. Mobasher and J. Srivastava, "Data preparation for mining world wide web browsing patterns," Knowledge and Information Systems, 1999, 1(1).
[7] P. N. Tan and V. Kumar, "Discovery of web robot sessions based on their navigational patterns," Data Mining and Knowledge Discovery, 2002, 6(1).
[8] B. Liu, Web Data Mining, 2nd ed., Berlin: Springer, 2011.
[9] ipinfo.io.
[10] P. Zhu and M.-S. Zhao, "Session identification algorithm for web log mining," International Conference on Management and Service Science, 2010.
[11] B. Berendt, B. Mobasher and M. Nakagawa, "The impact of site structure and user environment on session reconstruction in web usage analysis," Knowledge Discovery and Data Mining, 2002.
[12] L. C. Feng, "Study on crucial techniques of web usage mining," Wuhan: Huazhong Univ. of Science and Technology, 2007 (in Chinese).
[13] D. Tanasa and B. Trousse, "Advanced data preprocessing for intersites web usage mining," IEEE Intelligent Systems, (2).
Support System- Pioneering approach for Web Data Mining Geeta Kataria 1, Surbhi Kaushik 2, Nidhi Narang 3 and Sunny Dahiya 4 1,2,3,4 Computer Science Department Kurukshetra University Sonepat, India ABSTRACT
More informationTHE STUDY OF WEB MINING - A SURVEY
THE STUDY OF WEB MINING - A SURVEY Ashish Gupta, Anil Khandekar Abstract over the year s web mining is the very fast growing research field. Web mining contains two research areas: Data mining and World
More informationDeveloping ArXivSI to Help Scientists to Explore the Research Papers in ArXiv
Submitted on: 19.06.2015 Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv Zhixiong Zhang National Science Library, Chinese Academy of Sciences, Beijing, China. E-mail address:
More informationAutomatic New Topic Identification in Search Engine Transaction Log Using Goal Programming
Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationMubug: a mobile service for rapid bug tracking
. MOO PAPER. SCIENCE CHINA Information Sciences January 2016, Vol. 59 013101:1 013101:5 doi: 10.1007/s11432-015-5506-4 Mubug: a mobile service for rapid bug tracking Yang FENG, Qin LIU *,MengyuDOU,JiaLIU&ZhenyuCHEN
More informationWEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE
WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationWeb Mining Using Cloud Computing Technology
International Journal of Scientific Research in Computer Science and Engineering Review Paper Volume-3, Issue-2 ISSN: 2320-7639 Web Mining Using Cloud Computing Technology Rajesh Shah 1 * and Suresh Jain
More informationCombining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,
More informationDomain Specific Search Engine for Students
Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam
More informationDATA MINING - 1DL105, 1DL111
1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database
More informationIntelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD
World Transactions on Engineering and Technology Education Vol.13, No.3, 2015 2015 WIETE Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical
More informationWeb page recommendation using a stochastic process model
Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,
More informationVisoLink: A User-Centric Social Relationship Mining
VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.
More informationDESIGN AND IMPLEMENTATION OF SAGE DISPLAY CONTROLLER PROJECT
DESIGN AND IMPLEMENTATION OF SAGE DISPLAY CONTROLLER BY Javid M. Alimohideen Meerasa M.S., University of Illinois at Chicago, 2003 PROJECT Submitted as partial fulfillment of the requirements for the degree
More informationIJMIE Volume 2, Issue 9 ISSN:
WEB USAGE MINING: LEARNER CENTRIC APPROACH FOR E-BUSINESS APPLICATIONS B. NAVEENA DEVI* Abstract Emerging of web has put forward a great deal of challenges to web researchers for web based information
More informationAn Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs
An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs Jin Liu 1, Hongmin Ren 1, Jun Wang 2, Jin Wang 2 1 College of Information Engineering, Shanghai Maritime University,
More informationOpen Access Research on the Prediction Model of Material Cost Based on Data Mining
Send Orders for Reprints to reprints@benthamscience.ae 1062 The Open Mechanical Engineering Journal, 2015, 9, 1062-1066 Open Access Research on the Prediction Model of Material Cost Based on Data Mining
More informationOntology Generation from Session Data for Web Personalization
Int. J. of Advanced Networking and Application 241 Ontology Generation from Session Data for Web Personalization P.Arun Research Associate, Madurai Kamaraj University, Madurai 62 021, Tamil Nadu, India.
More informationVOL. 3, NO. 3, March 2013 ISSN ARPN Journal of Science and Technology All rights reserved.
An Effective Method to Preprocess the Data in Web Usage Mining 1 B.Uma Maheswari, 2 P.Sumathi 1 Doctoral student in Bharathiyar University, Coimbatore, Tamil Nadu, India 2 Asst. Professor, Govt. Arts College,
More informationRETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu
More informationA B2B Search Engine. Abstract. Motivation. Challenges. Technical Report
Technical Report A B2B Search Engine Abstract In this report, we describe a business-to-business search engine that allows searching for potential customers with highly-specific queries. Currently over
More informationANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining
ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, 2014 ISSN 2278 5485 EISSN 2278 5477 discovery Science Comparative Study of Classification Algorithms Using Data Mining Akhila
More informationMetric and Identification of Spatial Objects Based on Data Fields
Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Shanghai, P. R. China, June 25-27, 2008, pp. 368-375 Metric and Identification
More informationFabric Defect Detection Based on Computer Vision
Fabric Defect Detection Based on Computer Vision Jing Sun and Zhiyu Zhou College of Information and Electronics, Zhejiang Sci-Tech University, Hangzhou, China {jings531,zhouzhiyu1993}@163.com Abstract.
More informationMohri, Kurukshetra, India
Volume 4, Issue 8, August 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Revised Two
More informationPersonalizing Web Directories with Community Discovery Algorithm
Personalizing Web Directories with Community Discovery Algorithm Sriram K.P ME Computer Science & Engineering, SMK Fomra Institute of Technology, Kelambakkam, Chennai-603103. leosri888@gmail.com Joel Robinson
More informationSemantic Web Mining and its application in Human Resource Management
International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2
More informationInferring User Search for Feedback Sessions
Inferring User Search for Feedback Sessions Sharayu Kakade 1, Prof. Ranjana Barde 2 PG Student, Department of Computer Science, MIT Academy of Engineering, Pune, MH, India 1 Assistant Professor, Department
More informationData warehousing and Phases used in Internet Mining Jitender Ahlawat 1, Joni Birla 2, Mohit Yadav 3
International Journal of Computer Science and Management Studies, Vol. 11, Issue 02, Aug 2011 170 Data warehousing and Phases used in Internet Mining Jitender Ahlawat 1, Joni Birla 2, Mohit Yadav 3 1 M.Tech.
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationWEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM
WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM K.Dharmarajan 1, Dr.M.A.Dorairangaswamy 2 1 Scholar Research and Development Centre Bharathiar University
More informationRiMOM Results for OAEI 2009
RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn
More informationInformation Push Service of University Library in Network and Information Age
2013 International Conference on Advances in Social Science, Humanities, and Management (ASSHM 2013) Information Push Service of University Library in Network and Information Age Song Deng 1 and Jun Wang
More informationEfficiently Mining Positive Correlation Rules
Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 39S-44S Efficiently Mining Positive Correlation Rules Zhongmei Zhou Department of Computer Science & Engineering,
More informationComparison of UWAD Tool with Other Tools Used for Preprocessing
Comparison of UWAD Tool with Other Tools Used for Preprocessing Nirali Honest Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charotar University of Science and Technology (CHARUSAT),
More informationAN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT
AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT Brindha.S 1 and Sabarinathan.P 2 1 PG Scholar, Department of Computer Science and Engineering, PABCET, Trichy 2 Assistant Professor,
More informationNitin Cyriac et al, Int.J.Computer Technology & Applications,Vol 5 (1), WEB PERSONALIZATION
WEB PERSONALIZATION Mrs. M.Kiruthika 1, Nitin Cyriac 2, Aditya Mandhare 3, Soniya Nemade 4 DEPARTMENT OF COMPUTER ENGINEERING Fr. CONCEICAO RODRIGUES INSTITUTE OF TECHNOLOGY,VASHI Email- 1 venkatr20032002@gmail.com,
More informationPre-Processing of Query Logs in Web Usage Mining
Industrial Engineering & Management Systems Vol 11, No 1, Mar 2012, pp.82-86 ISSN 1598-7248 EISSN 2234-6473 http://dx.doi.org/10.7232/iems.2012.11.1.082 2012 KIIE Pre-Processing of Query Logs in Web Usage
More informationLetter Pair Similarity Classification and URL Ranking Based on Feedback Approach
Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach P.T.Shijili 1 P.G Student, Department of CSE, Dr.Nallini Institute of Engineering & Technology, Dharapuram, Tamilnadu, India
More informationINSPIRE and SPIRES Log File Analysis
INSPIRE and SPIRES Log File Analysis Cole Adams Science Undergraduate Laboratory Internship Program Wheaton College SLAC National Accelerator Laboratory August 5, 2011 Prepared in partial fulfillment of
More informationCLASSIFICATION FOR SCALING METHODS IN DATA MINING
CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department
More informationTheme Identification in RDF Graphs
Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published
More informationIdentification of Navigational Paths of Users Routed through Proxy Servers for Web Usage Mining
Identification of Navigational Paths of Users Routed through Proxy Servers for Web Usage Mining The web log file gives a detailed account of who accessed the web site, what pages were requested, and in
More informationOn the Effectiveness of Web Usage Mining for Page Recommendation and Restructuring
On the Effectiveness of Web Usage Mining for Recommendation and Restructuring Hiroshi Ishikawa, Manabu Ohta, Shohei Yokoyama, Junya Nakayama, and Kaoru Katayama Tokyo Metropolitan University Abstract.
More information