Table of Contents, List of Tables, List of Figures and List of Abbreviations


TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 WEB MINING
1.1.1 Association Rules
1.1.2 Association Rule Mining
1.1.3 Clustering
1.1.4 Classification
1.2 WEB MINING CATEGORIES
1.2.1 Web Content Mining
1.2.2 Web Structure Mining
1.2.3 Web Usage Mining
1.3 INFORMATION RETRIEVAL ON THE WEB
1.3.1 Web Search Environments
1.3.1.1 Ontology Web
1.3.1.2 Semantic Web
1.3.2 Information Searching on the Web
1.3.3 Web Information Retrieval Approaches
1.4 INFORMATION FILTERING SYSTEM
1.4.1 Content-Based System
1.4.2 Collaborative Filtering
1.4.3 Content-Boosted Collaborative Filtering
1.4.4 Combining Content-Based and Collaborative Filters
1.5 PAGE RANKING IN WEB SEARCH
1.5.1 Key Documents Ranking
1.5.2 Related Documents Ranking
1.6 WEB PERSONALIZATION
1.6.1 Recommendation System
1.6.2 Personalized Recommendation System
1.7 SEARCH ENGINES
1.7.1 Ontology Mining Search Engine
1.7.2 Crawler-Based Search Engines
1.7.3 Human-Powered Directories
1.7.4 Hybrid Search Engines
1.8 PROPOSED WORK
1.8.1 Problem Definition
1.8.2 Research Focus
1.9 THESIS CONTRIBUTIONS
1.9.1 Contribution in Web Page Classification
1.9.2 Contribution in Retrieval of Relevant Web Pages
1.9.3 Contribution to Preparation of User Profile from Web Log File
1.9.4 Contribution to Personalizing the Web
1.10 THESIS ORGANIZATION

2 LITERATURE SURVEY
2.1 WEB PAGE RETRIEVAL PROCESS
2.2 KNOWLEDGE ACQUISITION FOR WEB PERSONALIZATION
2.3 USER PROFILE ANALYSIS
2.3.1 Works on User Profile Analysis
2.3.2 User Behaviour Analysis
2.3.3 Cluster Analysis
2.4 WEB PAGE ANALYSIS
2.4.1 Classification of Web Pages
2.4.2 Works on Classification
2.4.3 Fuzzy Classification
2.5 ASSOCIATION RULE MINING
2.5.1 Works on Association Rule Mining
2.5.2 Fuzzy Association Rule Mining
2.6 RELEVANT INFORMATION RETRIEVAL
2.6.1 Works on Relevant Information
2.6.2 Web Page Ranking Algorithms
2.6.2.1 Hyper Search Algorithm
2.6.2.2 Hyperlink-Induced Topic Search (HITS)
2.6.2.3 PageRank
2.6.2.4 TrustRank
2.7 WORKS ON PAGERANK ALGORITHMS
2.7.1 Web Page Filtering Process
2.7.2 Content-Based System
2.7.3 Collaborative Filtering System
2.7.4 Hybrid Filtering
2.7.5 Works on Filtering Process
2.8 INTELLIGENT PERSONALIZED RECOMMENDATION
2.8.1 Personalized Web Search
2.8.2 Works on Personalization
2.8.3 Works on Recommendation
2.9 PROPOSED WORK

3 SYSTEM ARCHITECTURE
3.1 USER INTERFACE
3.2 SEARCH ENGINE INTERFACES
3.3 WEB PAGES
3.4 FUZZY ASSOCIATION RULE GENERATOR
3.5 CLASSIFIED WEB PAGES
3.6 KNOWLEDGE ACQUISITION SYSTEM
3.7 DOMAIN EXPERT INTERFACE
3.8 RULE MANAGER
3.9 RULE BASE
3.10 USER PROFILE
3.11 USER PROFILES ANALYSIS MODULE
3.11.1 Feature Selection
3.11.2 Classification
3.11.3 Clustering
3.12 RELEVANT INFORMATION EXTRACTION MODULE
3.12.1 Filtering
3.12.2 Page Ranking
3.13 RELEVANT WEB PAGES
3.14 WEB PERSONALIZATION AND RECOMMENDATION MODULE
3.14.1 Fuzzy Temporal Association Rule Mining
3.15 THESIS CONTRIBUTION

4 USER PROFILE ANALYSIS
4.1 DATA PREPROCESSING
4.1.1 Data Set
4.1.2 Data Discretization for Preprocessing
4.1.3 Classification on Anova-T Data Selection
4.1.3.1 Algorithm Steps
4.1.3.2 Pseudocode for Anova-T Classifier
4.1.4 Fuzzy-D Discretization
4.1.4.1 Algorithm Steps
4.2 USER PROFILE CLUSTERING
4.2.1 Results and Discussion
4.3 WEB PAGE ANALYSIS SUBSYSTEM
4.3.1 Algorithm
4.3.2 Proposed Algorithm for Fuzzy Association Rule Mining
4.3.3 Results and Discussion
4.4 RELEVANT INFORMATION EXTRACTION
4.4.1 Rule Schema
4.4.2 Proposed Rule Discovery Algorithm
4.4.3 Filtering
4.4.4 Proposed Algorithm
4.4.5 Page Ranking Module
4.4.6 Proposed Algorithm
4.4.7 Results and Discussion

5 WEB PERSONALIZATION AND RECOMMENDATION
5.1 FUZZY TEMPORAL ASSOCIATION RULE MINING
5.1.1 Proposed Algorithm
5.1.2 Proposed Fuzzy Temporal Association Rule Mining Algorithm
5.1.3 Pseudocode for FTA Rule Mining
5.1.4 Results and Discussion

6 CONCLUSIONS AND FUTURE ENHANCEMENTS
6.1 CONCLUSIONS
6.1.1 Web Page Classification
6.1.2 Retrieval of Relevant Web Pages
6.1.3 User Profile Preparation and Its Analysis
6.1.4 Personalizing the Web
6.2 FUTURE ENHANCEMENTS

REFERENCES
LIST OF PUBLICATIONS
VITAE

LIST OF TABLES

4.1 Anova-T Residue Classifications on User Data
4.2 Fuzzy-D Discretization: Reduced Classification Error Report
4.3 User's Profile Analysis
4.4 Cluster Analysis of User Profiles
4.5 Ontology-Based Collaborative Filter Analysis

LIST OF FIGURES

3.1 System Architecture
4.1 Architecture for User Profile Analysis
4.2 Anova-T Classification Method
4.3 Cluster Structure
4.4 Performance of Cluster Analysis
4.5 Web Page Analysis Using Fuzzy Association Rule Mining
4.6 Comparison of Classification Accuracies Using Association Rules
4.7 Classification Accuracy of Proposed Fuzzy Association Rule Mining Algorithm
4.8 Precision and Recall Analysis Graph
4.9 System Architecture for Relevant Web Page Retrieval
4.10 System Overview
4.11 Architecture of Ontology-Based Collaborative Filter
4.12 Relationship between Precision and Recall
4.13 Web Document Retrieval Analysis with Respect to Time
5.1 System Architecture of Web Personalization and Recommendation Module
5.2 Performance Analysis of Proposed Recommendation System
5.3 Relevancy Measurement

LIST OF ABBREVIATIONS

ANOVA - Analysis of Variance
FTARM - Fuzzy Temporal Association Rule Mining
HITS - Hyperlink-Induced Topic Search
HTML - HyperText Markup Language
HTTP - Hypertext Transfer Protocol
LODAP - Log Data Processor
MSN - Microsoft Network
MVE - Minimum Volume Ellipsoid
PEBL - Positive Example Based Learning
SVM - Support Vector Machine
URI - Uniform Resource Identifier
URL - Uniform Resource Locator
WCM - Web Content Mining
WSM - Web Structure Mining
WWW - World Wide Web
XHTML - Extensible HyperText Markup Language