Introducing Dynamic Ranking on Web-Pages Based on Multiple Ontology Supported Domains

Debajyoti Mukhopadhyay 1,4, Anirban Kundu 2,4, and Sukanta Sinha 3,4

1 Calcutta Business School, D.H. Road, Bishnupur 743503, India
2 Netaji Subhash Engineering College, West Bengal 700152, India
3 Tata Consultancy Services, Whitefield Rd, Bangalore 560066, India
4 WIDiCoReL, Green Tower C-9/1, Golf Green, Kolkata 700095, India
{debajyoti.mukhopadhyay,anik76in,sukantasinha2003}@gmail.com

Abstract. A Search Engine ensures efficient ranking and retrieval of Web-pages. Page ranking is typically used for displaying the Web-pages at the client side. We introduce a data-structure model for retrieval of the searched Web-pages. We propose two algorithms in this paper. The first algorithm constructs the Index Based Acyclic Graph from the pages gathered by crawling supported by multiple ontologies, and the second algorithm calculates the ranking of the Web-pages selected from the Index Based Acyclic Graph.

Keywords: Search Engine, Ontology, Relevant Page Graph Model, Index Based Acyclic Graph Model, Web-page Ranking.

1 Introduction

A Search Engine displays a list of Web-pages as the result of a search made by a user. In this scenario, the display order of the links to the Web-pages is a very important factor. Different Search Engines use several ranking algorithms to rank the Web-pages properly from the user's point of view [3]. The Relevant Page Graph Model consists of multiple domain-specific Web-pages [2]. Retrieving data from this model takes a long time. Against this background, we incorporate a new Index Based Acyclic Graph Model, which gives users faster access to Web-pages. This paper presents the basic idea of searching Web-pages in the Index Based Acyclic Graph and also provides the order of the selected Web-pages at the user end.

2 Existing Model of Relevant Page Graph Model

In this section, the Relevant Page Graph (RPaG) is described.
Every Crawler [5] needs some seed URLs to retrieve Web-pages from the World Wide Web (WWW). Ontologies [1], Weight Tables and Syntables [4, 6] are all needed for the retrieval of relevant Web-pages. RPaG is generated considering only relevant Web-pages. In RPaG, each node contains the following fields: Page Identifier (P_ID), Uniform Resource Locator (URL), four Parent Page Identifiers (PP_IDs), Ontology relevance values (ONT_1_REL_VAL, ONT_2_REL_VAL, ONT_3_REL_VAL) and Ontology relevance flags (ONT_1_F, ONT_2_F and ONT_3_F). A sample RPaG is shown in Fig. 1. Each node in this figure contains four fields: Web-page URL, ONT_1_REL_VAL, ONT_2_REL_VAL and ONT_3_REL_VAL. Each Ontology Relevance Value field contains the calculated relevance value if it exceeds the Relevance Limit Value of its domain; otherwise, the field contains zero (0).

Fig. 1. Arbitrary Example of Relevant Page Graph (RPaG)

Definition 1. Weight Table - This table contains two columns; the first column denotes the Ontology terms and the second column denotes the weight value of each Ontology term. A weight value must be in the interval [0,1].

Definition 2. Syntable - This table contains two columns; the first column denotes the Ontology terms and the second column denotes the synonyms of each Ontology term. If more than one synonym exists for a particular Ontology term, the synonyms are kept using a comma (,) separator.

T. Janowski, H. Mohanty, and E. Estevez (Eds.): ICDCIT 2010, LNCS 5966, pp. 104-109, 2010. Springer-Verlag Berlin Heidelberg 2010

3 Proposed Approach with Analytical Study

In our approach, we have constructed the Index Based Acyclic Graph from RPaG and then searched it for the Web-pages that match a given Search String. A search string is given as input on the Graphical User Interface (GUI) and, as a result, the corresponding Web-page URLs are shown in the order given by the ranking mechanism.

3.1 Index Based Acyclic Graph Model

In this section, the Index Based Acyclic Graph is described. A connected acyclic graph is known as a tree. A sample Index Based Acyclic Graph is shown in Fig. 2. It is generated by our algorithm, which is described in Section 3.2. The pages of an RPaG are related to some Ontologies, and the Index Based Acyclic Graph generated from that RPaG is related to the same Ontologies. Each node in Fig. 2 contains the following fields: Page Identifier (P_ID), Uniform Resource Locator (URL), Parent Page Identifier (PP_ID), Mean Relevance Value (MEAN_REL_VAL) and Ontology links (ONT_1_L, ONT_2_L, ONT_3_L).
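The node layouts and the two tables described so far can be sketched as plain data structures. This is a minimal illustration only: the class names, the helper names and the sample terms ("cricket", "bat", "willow") are hypothetical and do not come from the paper, and three supported Ontologies are assumed because the field names list exactly three.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RPaGNode:
    """One node of the Relevant Page Graph (field names follow Section 2)."""
    p_id: int
    url: str
    pp_ids: List[int]            # up to four Parent Page Identifiers
    ont_rel_val: List[float]     # ONT_1_REL_VAL .. ONT_3_REL_VAL
    ont_flag: List[bool]         # ONT_1_F .. ONT_3_F


@dataclass
class IndexedNode:
    """One node of the Index Based Acyclic Graph (field names follow Section 3.1)."""
    p_id: int
    url: str
    pp_id: Optional[int]           # single parent, so the graph stays a tree
    mean_rel_val: float
    ont_link: List[Optional[int]]  # ONT_1_L .. ONT_3_L; None plays the role of X


def stored_relevance(value: float, limit: float) -> float:
    """Keep a relevance value only if it exceeds the Relevance Limit Value."""
    return value if value > limit else 0.0


def check_weight_table(weight_table: Dict[str, float]) -> None:
    """Definition 1: every weight must lie in the interval [0, 1]."""
    for term, weight in weight_table.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight of {term!r} is outside [0, 1]")


def parse_synonyms(cell: str) -> List[str]:
    """Definition 2: split a comma-separated synonym cell of the Syntable."""
    return [s.strip() for s in cell.split(",") if s.strip()]


# Hypothetical sample rows, for illustration only.
weight_table = {"cricket": 0.9, "bat": 0.4}
syntable = {"bat": parse_synonyms("willow, blade")}
check_weight_table(weight_table)
```

Storing the synonyms as a list after parsing keeps the comma-separated cell of Definition 2 as an input format only.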
In each level, the Web-pages are kept sorted by Mean Relevance Value, and the indexes that track the pages related to each domain are also stored. In Fig. 2, X means that the ontology link does not currently exist. The calculation of MEAN_REL_VAL is described in Method 1.1 of Section 3.2. Using the Maximum Mean Relevance Span Value (α), the Minimum Mean Relevance Span Value (β) and the Number of Mean Relevance Span Levels (n), we calculate the Mean Gap Factor ρ = (α - β) / n. We then define the ranges β to β + ρ, β + ρ to β + 2ρ, β + 2ρ to β + 3ρ, and so on up to α.
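The range construction above can be sketched as follows. The function names are hypothetical. Two points are assumptions, since the text does not settle them: a value lying exactly on a boundary is placed in the upper range, and levels are numbered from 0 starting at the lowest range.

```python
from typing import List, Tuple


def mean_gap_factor(alpha: float, beta: float, n: int) -> float:
    """Mean Gap Factor rho = (alpha - beta) / n."""
    if n <= 0:
        raise ValueError("n must be a positive number of levels")
    if alpha <= beta:
        raise ValueError("alpha must be greater than beta")
    return (alpha - beta) / n


def span_ranges(alpha: float, beta: float, n: int) -> List[Tuple[float, float]]:
    """Ranges beta..beta+rho, beta+rho..beta+2*rho, and so on up to alpha."""
    rho = mean_gap_factor(alpha, beta, n)
    return [(beta + i * rho, beta + (i + 1) * rho) for i in range(n)]


def span_level(value: float, alpha: float, beta: float, n: int) -> int:
    """Index of the range that holds a Mean Relevance Value (assumed 0-based)."""
    if not beta <= value <= alpha:
        raise ValueError("value lies outside the span [beta, alpha]")
    rho = mean_gap_factor(alpha, beta, n)
    # alpha itself would compute to index n, so clamp it into the last range.
    return min(int((value - beta) / rho), n - 1)
```

For example, with α = 1.0, β = 0.0 and n = 4 the factor is ρ = 0.25, so a page with Mean Relevance Value 0.6 falls in the range 0.5 to 0.75.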

Fig. 2. Index Based Acyclic Graph

3.2 Construction of the Index Based Acyclic Graph from RPaG

In this section, the design of the algorithm that generates the Index Based Acyclic Graph from RPaG is discussed. Its methods are shown separately for better understanding of the algorithm.

Algorithm 1. Construction of the Index Based Acyclic Graph
INPUT: Relevant Page Graph (RPaG) Constructed from Original Crawling, Number of Mean Relevance Span Levels, Maximum Mean Relevance Span and Minimum Mean Relevance Span
OUTPUT: Index Based Acyclic Graph
Step 1: Take the Relevant Page Graph (RPaG) Constructed from Original Crawling, the Number of Mean Relevance Span Levels, the Maximum Mean Relevance Span and the Minimum Mean Relevance Span from the user and generate one Dummy Page for each Mean Relevance Span Level
Step 2: Take one Page (P) from RPaG, Call Cal_Mean_Rel_Val (Page P) and find its Mean Relevance Span Level
Step 3: If this Mean Relevance Span Level contains only the Dummy Page Then replace the Dummy Page with Page P and goto Step 4; Otherwise goto Step 5
Step 4: For Each Supported Ontology Set Ontology Index Field of That Level = P_ID of Page P; goto Step 6
Step 5: Insert Page (P) in the Incomplete Index Based Acyclic Graph as follows:
    Call Find_Location (Incomplete Index Based Acyclic Graph, Page P)
    Call Find_Parent (RPaG, Incomplete Index Based Acyclic Graph, Page P)
    Call Set_Link (RPaG, Incomplete Index Based Acyclic Graph, Page P)
Step 6: goto Step 2 until all the pages in RPaG are traversed
Step 7: End

Method 1.1: Cal_Mean_Rel_Val
Cal_Mean_Rel_Val (Page P)
    MEAN_REL_VAL := (Sum of the Relevance Values of Page P for each Ontology) / Number of supported Ontologies
    Return MEAN_REL_VAL
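Method 1.1 and Steps 1 to 3 of Algorithm 1 can be sketched as below. This is a simplified illustration, not the authors' implementation: the function names and the `DUMMY` sentinel are hypothetical, pages are reduced to a mapping from P_ID to their Ontology relevance values, and the location, parent and link handling of Step 5 is replaced by a plain append.

```python
from typing import Dict, List, Optional, Tuple

DUMMY = None  # hypothetical stand-in for the Dummy Page of Step 1
Entry = Optional[Tuple[int, float]]  # (P_ID, MEAN_REL_VAL) or the dummy


def cal_mean_rel_val(relevance_values: List[float]) -> float:
    """Method 1.1: sum of the relevance values over the number of Ontologies."""
    if not relevance_values:
        raise ValueError("at least one supported Ontology is required")
    return sum(relevance_values) / len(relevance_values)


def group_pages_by_level(pages: Dict[int, List[float]],
                         alpha: float, beta: float, n: int) -> List[List[Entry]]:
    """Steps 1-3 only: place each page in its Mean Relevance Span Level."""
    rho = (alpha - beta) / n
    levels: List[List[Entry]] = [[DUMMY] for _ in range(n)]      # Step 1
    for p_id, values in pages.items():                           # Step 2
        mean = cal_mean_rel_val(values)
        if not beta <= mean <= alpha:
            raise ValueError(f"page {p_id} lies outside the span")
        level = min(int((mean - beta) / rho), n - 1)
        if levels[level] == [DUMMY]:                             # Step 3
            levels[level] = [(p_id, mean)]
        else:
            # Placeholder for Step 5 (Find_Location, Find_Parent, Set_Link).
            levels[level].append((p_id, mean))
    return levels
```

A relevance value stored as zero still counts in the denominator, so a page that is relevant to only one of three Ontologies has its mean reduced accordingly.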

Method 1.2: Find_Location
Find_Location (Incomplete Index Based Acyclic Graph, Page P)
    Find the Location in the level of Page P such that the Mean Relevance Value of every Left Side Page is Greater Than that of Page P and the Mean Relevance Value of every Right Side Page is Less Than that of Page P
    Return Location

Method 1.3: Find_Parent
Find_Parent (RPaG, Incomplete Index Based Acyclic Graph, Page P)
    If More than one parent exists in RPaG Then
        For Each Parent Page
            Call Cal_Mean_Rel_Val (Parent Page of Page P in RPaG)
        Take the Maximum MEAN_REL_VAL Page among those Parent Pages in RPaG as the Parent of Page P in the Index Based Acyclic Graph
    If Page P Location is Left Most Position Then
        For Each left side Page in parent level of right side Parent Page of Page P
            If parent of P in RPaG found Then
                Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph and Return;
        Add Page P as a Child of Right Side Page Parent in the Index Based Acyclic Graph
    Else If Page P Location is Right Most Position Then
        For Each right side Page in parent level of left side Parent Page of Page P
            If parent of P in RPaG found Then
                Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph and Return;
        Add Page P as a Child of Left Side Page Parent in the Index Based Acyclic Graph
    Else If Left Side Page Parent of Page P in the Index Based Acyclic Graph = Parent of Page P in RPaG Then
        Add Page P as a Child of Left Side Page Parent in the Index Based Acyclic Graph
    Else If Right Side Page Parent of Page P in the Index Based Acyclic Graph = Parent of Page P in RPaG Then
        Add Page P as a Child of Right Side Page Parent in the Index Based Acyclic Graph
    Else If Left Side Page Parent of Page P in the Index Based Acyclic Graph != Right Side Page Parent of Page P in the Index Based Acyclic Graph Then
        Find Parent Page of P in RPaG between those two Parents in the Index Based Acyclic Graph
        If Found Then
            Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph
        Else
            Add Page P as a Child of Left Side Page Parent in the Index Based Acyclic Graph
    Else
        Add Page P as a Child of Left Side Page Parent in the Index Based Acyclic Graph
    Return;

Method 1.4: Set_Link
Set_Link (RPaG, Incomplete Index Based Acyclic Graph, Page P)
    For Each Supported Ontology
        Check Left Side Page Ontology Link Field until Link Not Found and Then
        If Link Came From Index Then
            Set Page P Ontology Link Field = Ontology Index Field of That Level and
            Ontology Index Field of That Level = P_ID of Page P
        Else
            Set Ontology Link Field of Page P in the Index Based Acyclic Graph = Ontology Link Field of Left Side Tracked Page in the Index Based Acyclic Graph and
            Ontology Link Field of Left Side Tracked Page in the Index Based Acyclic Graph = P_ID of Page P

3.3 Procedure for Web-Page Selection and Its Related Dynamic Ranking

In this section, we describe an algorithm that selects Web-pages from the Index Based Acyclic Graph for the Relevance Range and the Ontologies selected by the user. Finally, the Web-page URLs are shown in the order of their calculated rank.

Algorithm 2. Web-page Selection
INPUT: Relevance Range, Ontology Flags, Search String, Index Based Acyclic Graph
OUTPUT: Web Pages According to the Search String
Step 1: Take one Search String and the Index Based Acyclic Graph
Step 2: Parse the Input Search String and find its ontology terms. If no ontology terms exist then exit
Step 3: Select all Web pages according to their Range and Selected Ontology
Step 4: Call Cal_Rank (Input String Ontology Terms, Selected Web Pages)
Step 5: Display Web pages according to their Rank
Step 6: End
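Algorithm 2, together with the Cal_Rank method it calls in Step 4 (Method 2.1), can be sketched as follows. Several details are assumptions made for illustration: pages are plain dictionaries with hypothetical keys (`url`, `text`, `mean_rel_val`, `ontologies`), the Relevance Range is read as a range on the Mean Relevance Value, ontology terms are single lowercase words, and search-string words are mapped to ontology terms through the Syntable.

```python
import re
from typing import Dict, List, Set, Tuple


def find_ontology_terms(search_string: str, weight_table: Dict[str, float],
                        syntable: Dict[str, List[str]]) -> List[str]:
    """Step 2: keep the words that are ontology terms or synonyms of one."""
    synonym_to_term = {syn.lower(): term
                       for term, syns in syntable.items() for syn in syns}
    found: List[str] = []
    for token in re.findall(r"\w+", search_string.lower()):
        term = token if token in weight_table else synonym_to_term.get(token)
        if term is not None and term not in found:
            found.append(term)
    return found


def cal_rank(terms: List[str], page_text: str,
             weight_table: Dict[str, float]) -> float:
    """Method 2.1: sum of (occurrences of the term in the page * term weight)."""
    tokens = re.findall(r"\w+", page_text.lower())
    rank = 0.0
    for term in terms:
        rank += tokens.count(term) * weight_table[term]
    return rank


def select_and_rank(search_string: str, pages: List[dict],
                    weight_table: Dict[str, float],
                    syntable: Dict[str, List[str]],
                    rel_range: Tuple[float, float],
                    selected_ontologies: Set[int]) -> List[Tuple[str, float]]:
    """Steps 2-5: parse, select by range and ontology, rank, order by rank."""
    terms = find_ontology_terms(search_string, weight_table, syntable)
    if not terms:                       # Step 2: no ontology term, so exit
        return []
    low, high = rel_range
    selected = [p for p in pages        # Step 3
                if low <= p["mean_rel_val"] <= high
                and p["ontologies"] & selected_ontologies]
    ranked = [(p["url"], cal_rank(terms, p["text"], weight_table))
              for p in selected]        # Step 4
    return sorted(ranked, key=lambda item: item[1], reverse=True)  # Step 5
```

The rank depends on the search string, so it is recomputed for every query, while the Mean Relevance Value used for selection is fixed when the graph is built.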

Method 2.1: Cal_Rank
Cal_Rank (Input String Ontology Terms, Selected Web Pages)
    For Each Web Page
        For Each Input String Ontology Term
            RANK = RANK + Number of occurrences of the Input String Ontology Term in the Web page * Weight Value of the Ontology Term;
        End loop
        Set RANK Value of the Web Page and then make RANK = 0;
    End loop

4 Conclusion

In this paper, a prototype of a Web Search Engine supported by Multiple Ontologies is shown. It retrieves Web-pages from the Index Based Acyclic Graph model. This prototype produces results faster and is highly scalable, and the Ranking algorithm generates the order of the Web-page URLs.

References

1. Heflin, J., Hendler, J.: Dynamic Ontologies on the Web. Department of Computer Science, University of Maryland, College Park, MD 20742
2. Mukhopadhyay, D., Sinha, S.: A New Approach to Design Graph Based Search Engine for Multiple Domains Using Different Ontologies. In: 11th International Conference on Information Technology, ICIT 2008 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2008)
3. Kundu, A., Dutta, R., Mukhopadhyay, D.: An Alternate Way to Rank Hyper-linked Webpages. In: 9th International Conference on Information Technology, ICIT 2006 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2006)
4. WordNet, http://en.wikipedia.org/wiki/wordnet
5. Mukhopadhyay, D., Biswas, A., Sinha, S.: A New Approach to Design Domain Specific Ontology Based Web Crawler. In: 10th International Conference on Information Technology, ICIT 2007 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2007)
6. WordNet, http://en.wikipedia.org/wiki/george_a._miller