Introducing Dynamic Ranking on Web-Pages Based on Multiple Ontology Supported Domains

Debajyoti Mukhopadhyay 1,4, Anirban Kundu 2,4, and Sukanta Sinha 3,4

1 Calcutta Business School, D.H. Road, Bishnupur 743503, India
2 Netaji Subhash Engineering College, West Bengal 700152, India
3 Tata Consultancy Services, Whitefield Rd, Bangalore 560066, India
4 WIDiCoReL, Green Tower C- 9/1, Golf Green, Kolkata 700095, India
{debajyoti.mukhopadhyay,anik76in,sukantasinha2003}@gmail.com

Abstract. A Search Engine is expected to rank and retrieve Web-pages efficiently. Page ranking is typically used to order the Web-pages displayed at the client side. We introduce a data structural model for retrieval of the searched Web-pages and propose two algorithms. The first algorithm constructs the Index Based Acyclic Graph generated by multiple ontologies supported crawling, and the second algorithm calculates the ranking of the Web-pages selected from the Index Based Acyclic Graph.

Keywords: Search Engine, Ontology, Relevant Page Graph Model, Index Based Acyclic Graph Model, Web-page Ranking.

1 Introduction

A Search Engine displays a list of Web-pages as the result of a search made by a user. In this scenario, the display order of the Web-page links is a very important factor. Different Search Engines use several ranking algorithms to rank the Web-pages properly from the user's point of view [3]. The Relevant Page Graph Model consists of multiple domain specific Web-pages [2]. This model takes a long time to retrieve data. Against this background, we introduce a new Index Based Acyclic Graph Model which provides users with faster access to Web-pages. This paper presents the basic idea of searching Web-pages from the Index Based Acyclic Graph and also determines the order of the selected Web-pages at the user end.

2 Existing Relevant Page Graph Model

In this section, the Relevant Page Graph (RPaG) is described.
Every Crawler [5] needs some seed URLs to retrieve Web-pages from the World Wide Web (WWW). Ontologies [1], Weight Tables and Syntables [4, 6] are all needed for the retrieval of relevant Web-pages. RPaG is generated considering only relevant Web-pages. In RPaG, each node contains the following fields: Page Identifier (P_ID), Uniform Resource Locator (URL), four Parent Page Identifiers (PP_IDs), Ontology relevance values (ONT_1_REL_VAL, ONT_2_REL_VAL, ONT_3_REL_VAL) and Ontology relevance flags (ONT_1_F, ONT_2_F and ONT_3_F). A sample RPaG is shown in Fig. 1. Each node of the RPaG in this figure shows four fields: Web-page URL, ONT_1_REL_VAL, ONT_2_REL_VAL and ONT_3_REL_VAL. Here, each Ontology Relevance Value field holds the calculated relevance value if that value exceeds the Relevance Limit Value of its respective domain; otherwise, the field holds zero (0).

T. Janowski, H. Mohanty, and E. Estevez (Eds.): ICDCIT 2010, LNCS 5966, pp. 104-109, 2010. Springer-Verlag Berlin Heidelberg 2010

Fig. 1. Arbitrary Example of Relevant Page Graph (RPaG)

Definition 1. Weight Table - This table contains two columns; the first column holds the Ontology terms and the second column holds the weight value of each Ontology term. A weight value must lie in the interval [0,1].

Definition 2. Syntable - This table contains two columns; the first column holds the Ontology terms and the second column holds the synonyms of each Ontology term. If a particular Ontology term has more than one synonym, the synonyms are stored separated by commas (,).

3 Proposed Approach with Analytical Study

In our approach, we construct the Index Based Acyclic Graph from RPaG and then search the Web-pages from the Index Based Acyclic Graph for a given Search String. A search string is given as input on the Graphical User Interface (GUI), and the corresponding Web-page URLs are shown in the order produced by the ranking mechanism.

3.1 Index Based Acyclic Graph Model

In this section, the Index Based Acyclic Graph is described. A connected acyclic graph is known as a tree. A sample Index Based Acyclic Graph is shown in Fig. 2. It is generated by our algorithm, which is described in Section 3.2. The pages of an RPaG are related to some Ontologies, and the Index Based Acyclic Graph generated from that RPaG is related to the same Ontologies. Each node of the Index Based Acyclic Graph in the figure (refer Fig. 2) contains the following fields: Page Identifier (P_ID), Uniform Resource Locator (URL), Parent Page Identifier (PP_ID), Mean Relevance value (MEAN_REL_VAL) and Ontology links (ONT_1_L, ONT_2_L, ONT_3_L).
In each level, the Web-pages are kept sorted by their Mean Relevance Value, and the indexes that track the pages related to each domain are also stored. In Fig. 2, X means that the ontology link does not currently exist. The calculation of MEAN_REL_VAL is described in Method 1.1 of Section 3.2. Using the Maximum Mean Relevance Span Value (α), the Minimum Mean Relevance Span Value (β) and the Number of Mean Relevance Span Levels (n), we calculate the Mean Gap Factor ρ = (α - β) / n. We then define the ranges β to β + ρ, β + ρ to β + 2ρ, β + 2ρ to β + 3ρ, and so on, up to β + nρ = α.
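The range construction above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the text does not say which range a value lying exactly on a boundary belongs to, so the sketch assigns it to the upper range and clamps values outside [β, α] to the first or last level.

```python
from typing import List, Tuple


def mean_gap_factor(alpha: float, beta: float, n: int) -> float:
    """Mean Gap Factor: rho = (alpha - beta) / n."""
    if n <= 0:
        raise ValueError("n must be a positive number of span levels")
    if alpha <= beta:
        raise ValueError("alpha (maximum) must exceed beta (minimum)")
    return (alpha - beta) / n


def span_ranges(alpha: float, beta: float, n: int) -> List[Tuple[float, float]]:
    """Ranges beta..beta+rho, beta+rho..beta+2*rho, and so on, n in total."""
    rho = mean_gap_factor(alpha, beta, n)
    return [(beta + i * rho, beta + (i + 1) * rho) for i in range(n)]


def span_level(mean_rel_val: float, alpha: float, beta: float, n: int) -> int:
    """Zero-based index of the range that contains mean_rel_val.

    Assumption (not stated in the text): a boundary value falls into the
    upper range, and out-of-span values are clamped to the nearest level.
    """
    rho = mean_gap_factor(alpha, beta, n)
    index = int((mean_rel_val - beta) / rho)
    return min(max(index, 0), n - 1)
```

For example, with α = 10, β = 0 and n = 5 the Mean Gap Factor is 2, so a page with MEAN_REL_VAL 3 falls into the second range (2 to 4).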
Fig. 2. Index Based Acyclic Graph

3.2 Construction of Index Based Acyclic Graph from RPaG

In this section, the design of the construction algorithm is discussed. It generates the Index Based Acyclic Graph from RPaG. The supporting methods are listed separately for better understanding of the algorithm.

Algorithm 1. Construction of Index Based Acyclic Graph
INPUT: Relevant Page Graph (RPaG) Constructed from Original Crawling, Number of Mean Relevance Span Levels, Maximum Mean Relevance Span and Minimum Mean Relevance Span
OUTPUT: Index Based Acyclic Graph
Step 1: Take the Relevant Page Graph (RPaG) Constructed from Original Crawling, the Number of Mean Relevance Span Levels, the Maximum Mean Relevance Span and the Minimum Mean Relevance Span from the user, and generate one Dummy Page for each Mean Relevance Span Level
Step 2: Take one Page (P) from RPaG, Call Cal_Mean_Rel_Val (Page P) and find its Mean Relevance Span Level
Step 3: If this Mean Relevance Span Level contains only the Dummy Page Then replace the Dummy Page with Page P and goto Step 4; Otherwise goto Step 5
Step 4: For Each Supported Ontology
            Set Ontology Index Field of That Level = P_ID of Page P
        goto Step 6
Step 5: Insert Page (P) in the Index Based Acyclic Graph as follows:
            Call Find_Location (Incomplete Index Based Acyclic Graph, Page P)
            Call Find_Parent (RPaG, Incomplete Index Based Acyclic Graph, Page P)
            Call Set_Link (RPaG, Incomplete Index Based Acyclic Graph, Page P)
Step 6: goto Step 2 until all the pages in RPaG have been traversed
Step 7: End

Method 1.1: Cal_Mean_Rel_Val
Cal_Mean_Rel_Val (Page P)
    MEAN_REL_VAL := (Sum of the Relevance Values of Page P over all supported Ontologies) / Number of supported Ontologies
    Return MEAN_REL_VAL
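Method 1.1 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the `RPaGNode` class and its attribute names are hypothetical stand-ins for the node fields described in Section 2, and the per-ontology relevance values are assumed to be stored in a list (with 0 for any ontology whose Relevance Limit Value was not exceeded).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RPaGNode:
    """Hypothetical, reduced view of an RPaG node."""
    p_id: int                  # Page Identifier (P_ID)
    url: str                   # Web-page URL
    ont_rel_vals: List[float]  # ONT_1_REL_VAL, ONT_2_REL_VAL, ... (0 if below limit)


def cal_mean_rel_val(page: RPaGNode) -> float:
    """MEAN_REL_VAL = sum of ontology relevance values / number of ontologies."""
    if not page.ont_rel_vals:
        raise ValueError("page has no supported ontologies")
    return sum(page.ont_rel_vals) / len(page.ont_rel_vals)
```

A page with relevance values 6, 3 and 0 for three supported ontologies therefore has a MEAN_REL_VAL of 3.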
Method 1.2: Find_Location
Find_Location (Incomplete Index Based Acyclic Graph, Page P)
    Find the Location at which the Mean Relevance Value of every Left Side Page is Greater Than the Mean Relevance Value of Page P and the Mean Relevance Value of every Right Side Page is Less Than the Mean Relevance Value of Page P
    Return Location

Method 1.3: Find_Parent
Find_Parent (RPaG, Incomplete Index Based Acyclic Graph, Page P)
    If More than one parent exists in RPaG Then
        For Each Parent Page
            Call Cal_Mean_Rel_Val (Parent Page of Page P in RPaG)
        Take the Maximum MEAN_REL_VAL Page among those Parent Pages in RPaG as the Parent of Page P in the Index Based Acyclic Graph
    If Page P Location is Left Most Position Then
        For Each left side Page in the parent level of the Right Side Page Parent of Page P
            If parent of P in RPaG found Then
                Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph and Return;
        Add Page P as a Child of the Right Side Page Parent in the Index Based Acyclic Graph
    Else If Page P Location is Right Most Position Then
        For Each right side Page in the parent level of the Left Side Page Parent of Page P
            If parent of P in RPaG found Then
                Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph and Return;
        Add Page P as a Child of the Left Side Page Parent in the Index Based Acyclic Graph
    Else If Left Side Page Parent of Page P in the Index Based Acyclic Graph = Parent of Page P in RPaG Then
        Add Page P as a Child of the Left Side Page Parent in the Index Based Acyclic Graph
    Else If Right Side Page Parent of Page P in the Index Based Acyclic Graph = Parent of Page P in RPaG Then
        Add Page P as a Child of the Right Side Page Parent in the Index Based Acyclic Graph
    Else If Left Side Page Parent of Page P in the Index Based Acyclic Graph != Right Side Page Parent of Page P in the Index Based Acyclic Graph Then
        Find the Parent Page of P in RPaG between those two Parents in the Index Based Acyclic Graph
        If Found Then
            Add Page P as a Child of that Parent Page in the Index Based Acyclic Graph
        Else
            Add Page P as a Child of the Left Side Page Parent in the Index Based Acyclic Graph
    Else
        Add Page P as a Child of the Left Side Page Parent in the Index Based Acyclic Graph
    Return;

Method 1.4: Set_Link
Set_Link (RPaG, Incomplete Index Based Acyclic Graph, Page P)
    For Each Supported Ontology
        Check Left Side Page Ontology Link Field until Link Not Found and Then
        If Link Came From Index Then
            Set Ontology Link Field of Page P = Ontology Index Field of That Level and
            Ontology Index Field of That Level = P_ID of Page P
        Else
            Set Ontology Link Field of Page P in the Index Based Acyclic Graph = Ontology Link Field of Left Side Tracked Page in the Index Based Acyclic Graph and
            Ontology Link Field of Left Side Tracked Page in the Index Based Acyclic Graph = P_ID of Page P

3.3 Procedure for Web-Page Selection and Its Related Dynamic Ranking

In this section, we describe an algorithm that selects Web-pages from the Index Based Acyclic Graph for the Relevance Range and the Ontologies selected at the user side. Finally, the Web-page URLs are shown in the order of their calculated rank.

Algorithm 2. Web-page Selection
INPUT: Relevance Range, Ontology Flags, Search String, Index Based Acyclic Graph
OUTPUT: Web Pages According to the Search String
Step 1: Take the Search String and the Index Based Acyclic Graph as input
Step 2: Parse the Input Search String and find its ontology terms. If no ontology terms exist Then exit
Step 3: Select all Web pages according to the given Relevance Range and the Selected Ontologies
Step 4: Call Cal_Rank (Input String Ontology Terms, Selected Web Pages)
Step 5: Display the Web pages according to their Rank
Step 6: End
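Steps 2, 4 and 5 of Algorithm 2, together with the Cal_Rank method they call (Method 2.1), can be sketched in Python as below. Several details are assumptions made only for this illustration: the function names are hypothetical, the Weight Table is modelled as a dictionary from lower-case ontology terms to weights in [0,1], page content is tokenised on whitespace, and the pages passed in are taken to be the ones already selected by range and ontology in Step 3.

```python
from typing import Dict, List, Tuple


def cal_rank(terms: List[str], page_text: str,
             weight_table: Dict[str, float]) -> float:
    """RANK = sum over terms of (occurrences in page * weight of term)."""
    words = page_text.lower().split()  # assumption: whitespace tokenisation
    rank = 0.0
    for term in terms:
        rank += words.count(term) * weight_table.get(term, 0.0)
    return rank


def select_and_rank(search_string: str, selected_pages: Dict[str, str],
                    weight_table: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (URL, RANK) pairs for the selected pages, highest rank first."""
    # Step 2: keep only the words of the search string that are ontology terms.
    terms = [w for w in search_string.lower().split() if w in weight_table]
    if not terms:
        return []  # no ontology term in the search string: exit
    # Step 4: rank every page that Step 3 selected (URL -> page text).
    ranked = [(url, cal_rank(terms, text, weight_table))
              for url, text in selected_pages.items()]
    # Step 5: display order is by descending rank.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

With weights 0.5 for "cricket" and 0.25 for "bat", a page containing "cricket" twice and "bat" once receives a rank of 2 * 0.5 + 1 * 0.25 = 1.25 for the search string "cricket bat".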
Method 2.1: Cal_Rank
Cal_Rank (Input String Ontology Terms, Selected Web Pages)
    For Each Web Page
        For Each Input String Ontology Term
            RANK = RANK + Number of occurrences of the Input String Ontology Term in the Web page * Weight Value of that Ontology Term;
        End loop
        Set RANK Value of the Web Page and then make RANK = 0;
    End loop

4 Conclusion

In this paper, a prototype of a Multiple Ontology supported Web Search Engine is shown. It retrieves Web-pages from the Index Based Acyclic Graph model. The prototype produces results faster and is highly scalable, and the ranking algorithm determines the order of the Web-page URLs.

References

1. Heflin, J., Hendler, J.: Dynamic Ontologies on the Web. Department of Computer Science, University of Maryland, College Park, MD 20742
2. Mukhopadhyay, D., Sinha, S.: A New Approach to Design Graph Based Search Engine for Multiple Domains Using Different Ontologies. In: 11th International Conference on Information Technology, ICIT 2008 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2008)
3. Kundu, A., Dutta, R., Mukhopadhyay, D.: An Alternate Way to Rank Hyper-linked Webpages. In: 9th International Conference on Information Technology, ICIT 2006 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2006)
4. WordNet, http://en.wikipedia.org/wiki/wordnet
5. Mukhopadhyay, D., Biswas, A., Sinha, S.: A New Approach to Design Domain Specific Ontology Based Web Crawler. In: 10th International Conference on Information Technology, ICIT 2007 Proceedings, Bhubaneswar, India. IEEE Computer Society Press, California (2007)
6. WordNet, http://en.wikipedia.org/wiki/george_a._miller