LIST OF ACRONYMS & ABBREVIATIONS

ARPA    Alpha Page Rank Algorithm
CBFSE   Context based Focused Search Engine
CBR     Context based Relevance
CS      Contextual Sense
CSE     Contextual Sense Extractor
FiPRA   Filtered Page Rank Algorithm
GUI     Graphical User Interface
HITS    Hyperlink Induced Topic Search
HTML    Hypertext Markup Language
HTTP    Hypertext Transfer Protocol
HyPRA   Hybrid Page Rank Algorithm
NoRPRA  Noise Removed Page Rank Algorithm
ODP     Open Directory Project
PR      Page Rank
RBSE    Repository-Based Software Engineering
RS      Relevance Score
SE      Search Engine
TF-IDF  Term Frequency-Inverse Document Frequency
UI      User Interface
URI     Uniform Resource Identifier
URL     Uniform Resource Locator
W3      World Wide Web
W3C     World Wide Web Consortium
WePRA   Weighted Page Rank Algorithm
WP      Web Page
WWW     World Wide Web

LIST OF FIGURES

Figure  Caption  Page
2.1  Web architecture  8
2.2  Comparison of Documents Indexed by Google and Bing  10
2.3  Billions of documents indexed by Google  10
2.4  Billions of documents indexed by Bing  11
2.5  Comparison of documents indexed by Google and Yahoo  11
2.6  World Population Penetration Rates  12
2.7  World Internet Users Growth  12
2.8  Internet Users in the World Distribution  13
2.9  Asia top Internet countries  13
2.10  Meta Search Engine Basic Architecture  16
2.11  Architecture of a search engine  18
2.12  Use of the Internet  22
2.13  Popular areas of applications of the Internet  22
2.14  Too many results to browse  23
2.15  Percentage of users getting information on first page  23
2.16  Users do not search beyond third level  24
2.17  Category Taxonomy Based Focused Approach  29
3.1  Google results for keyword Student  48
3.2  Changed Result from Google for keyword Student  51
3.3  High level architecture of proposed context based focused search engine  53
3.4  Context Based Index Structure  58
3.5  Structure of the node  59
3.6  Client Side Crawl Worker Activities  62
3.7  Algorithm for Crawl_Worker  63
3.8  Algorithm for URL_Mapper  64
3.9  Algorithm for Downloader  65
3.10  Data Flow among various architectural components  66
3.11  Example for the proposed architecture  69
4.1  Block Diagram for Context Based Relevance Calculator  72
4.2  WordNet Dictionary Structure Matrix  77
4.3  WordNet Storage Example  78
4.4  Result from WordNet for keyword Student  78
4.5  Result from WordNet for keyword Spider  79
4.6  Pseudo code for extraction of various contextual senses  81
4.7  Contextual senses from WordNet dictionary  81
5.1  Ranking Module and other components  97
5.2 (a)  Ordering results by proposed context based ranking mechanism  103
5.2 (b)  Ordering results by Page Rank ranking mechanism  104
5.3  Average Precision of Results by Proposed Ranking Mechanism Compared to Page Rank Ranking Mechanism  106
6.1  Back-Links  111
6.2  Block diagram for Back-link Extraction and Relevance Evaluation  112
6.3  Data Flow between Back-link Extraction and Relevance Evaluator Processes  115
6.4  Back-link Relevance  119
6.5  Comparison when URLs and URLs + Back-Links considered  126
6.6  Result Analysis in Scenario 1  127
6.7  Result Analysis in Scenario 2  127
7.1  Prototype for Context Based Focused Search Engine  130
7.2  An Instance of Table named words  132
7.3  An Instance of table named searchresults  133
7.4  An instance of table linkresults  134
7.5  An Instance of searchresultdetails  136
7.6  An Instance of linkresultdetails  137
7.7  An Instance of table urlkeywordscorefinal  139
7.8  An instance of user interface  140
7.9  Various contextual senses of Java displayed by search module to user  141
7.10  Ranked list of matched documents for Java Island  142
7.11  The web page corresponding to first link in ranked list  142
A.1  Web pages and CS association  162
A.2  Hyperlinked Structure  164
A.3  Relation between a back-link page and the page it points to  167
B.1 (a)  Context score based ranking of URLs on topic Computer Mouse  170
B.1 (b)  Page Rank based ranking of URLs on topic Computer Mouse  170
B.2 (a)  Context score based ranking of URLs on topic Mouse Rodent  171
B.2 (b)  Page Rank based ranking of URLs on topic Mouse Rodent  171
B.3 (a)  Context score based ranking of URLs on topic Crane Bird  172
B.3 (b)  Page Rank based ranking of URLs on topic Crane Bird  172
B.4 (a)  Context score based ranking of URLs on topic Crane Machine  173
B.4 (b)  Page Rank based ranking of URLs on topic Crane Machine  173
B.5 (a)  Context score based ranking of URLs on topic Java Lang.  174
B.5 (b)  Page Rank based ranking of URLs on topic Java Lang.  174
B.6 (a)  Context score based ranking of URLs on topic Java Island  175
B.6 (b)  Page Rank based ranking of URLs on topic Java Island  175
B.7 (a)  Context score based ranking of URLs on topic Java Coffee  176
B.7 (b)  Page Rank based ranking of URLs on topic Java Coffee  176
B.8 (a)  Context score based ranking of URLs on topic Lion Animal  177
B.8 (b)  Page Rank based ranking of URLs on topic Lion Animal  177
B.9 (a)  Context score based ranking of URLs on topic Java Lang.  179
B.9 (b)  Google's ranking of URLs on topic Java Lang.  179
B.10 (a)  Context score based ranking of URLs on topic Java Island  180
B.10 (b)  Google's ranking of URLs on topic Java Island  180
B.11 (a)  Context score based ranking of URLs on topic Java Coffee  181
B.11 (b)  Google's ranking of URLs on topic Java Coffee  181
B.12 (a)  Context score based ranking of URLs on topic Crane Bird  182
B.12 (b)  Google's ranking of URLs on topic Crane Bird  182
B.13 (a)  Context score based ranking of URLs on topic Crane Machine  183
B.13 (b)  Google's ranking of URLs on topic Crane Machine  183
B.14 (a)  Context score based ranking of URLs on topic Lion Animal  184
B.14 (b)  Google's ranking of URLs on topic Lion Animal  184
B.15 (a)  Context score based ranking of URLs on topic Colt Young Horse  185
B.15 (b)  Google's ranking of URLs on topic Colt Young Horse  185

LIST OF TABLES

Table  Caption  Page
2.1  Inverted Index  20
3.1  Motivating Examples  48
4.1  Words occurrences and their corresponding accumulated weights  74
4.2  Contextual Senses of Word Wood  75
4.3  Comparison for Student  79
4.4  Comparison for Spider  80
4.5  List of Keywords (http://en.wikipedia.org/wiki/mouse_computing)  85
4.6  Contextual senses definition  86
4.7  Results w.r.t sense 3  86
4.8  Result w.r.t sense 6  87
4.9  Context Score for each Contextual Sense of word Mouse  87
4.10  Keywords Contextual Senses and computed context score (CSense/WP)  87
4.11  URLs Topic and computed context score  88
4.12  Computed context score for links from Google for keyword Mouse  90
4.13  Filtered document in sense of Computer Mouse  92
5.1  Top 20 URLs and their computed context score Rank on topic Mouse  99
5.2  Top 20 ranked URLs in descending order of computed rank  100
5.3  Page Rank ordering vs Context based ordering  102
6.1  Structure of the URL table  114
6.2  RS of Back-links  119
6.3  Matched URLs for query keyword Mouse  122
6.4  Back-Links and their computed rank  123
6.5  Combined list of Matched URLs + Back-Links with computed rank  123
6.6  Top 10 high rank URLs consisting of URLs and back-links  125
7.1  Result analysis of proposed CBFSE with Page Rank  143
7.2  Precision Table  145
A.1  Hyperlinked Structure  164