COURSE DELIVERY PLAN - THEORY

Department of Computer Science and Engineering
B.E/B.Tech/M.E/M.Tech : B.E.
Regulation: 2013
PG Specialisation : _
LP: CS6007
Rev. No: 01
Date: 27/06/2017
Sub. Code / Sub. Name : CS6007 INFORMATION RETRIEVAL

Unit : I - INTRODUCTION

Unit Syllabus : Introduction - History of IR - Components of IR - Issues - Open source Search engine Frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR - IR Versus Web Search - Components of a Search engine - Characterizing the web

Objective: To learn the role of information retrieval in various real-time applications

Sessions* and references:

1. Introduction to Information Retrieval, History of IR and Components of IR
2. Issues in IR
   Ref (sessions 1-2): T1 (Ch 1: 1-5), T2 (Ch 1: 1-3), T3 (Ch 1: 1-9), T2 (Ch 1: 3-5)
3. Open source Search engine Frameworks
   Ref: R1 (Ch 1: 27-30)
4. The impact of the web on IR
   Ref: T2 (Ch 1: 8-12)
5. The role of artificial intelligence (AI) in IR
   Ref: R4
6. IR Versus Web Search
   Ref: R1 (Ch 1: 5-8)
7. Brief history of search engines
   Ref: T4 (Ch 1: 1-6)
8. Components of a Search engine
   Ref: T3 (Ch 1: 13-28), T2 (Ch 13: 373-383)
9. Characterizing the web, Comparing web search to traditional information retrieval
   Ref: T2 (Ch 13: 367-371), T4 (Ch 2: 29-32)

Content beyond syllabus covered (if any): Library and Information Science - concerned with effective categorization of human knowledge, citation analysis and bibliometrics (structure of information).

* Session duration: 50 minutes

Unit : II - INFORMATION RETRIEVAL

Unit Syllabus : Boolean and vector-space retrieval models - Term weighting - TF-IDF weighting - cosine similarity - Preprocessing - Inverted indices - efficient processing with sparse vectors - Language Model based IR - Probabilistic IR - Latent Semantic Indexing - Relevance feedback and query expansion.

Objective: To learn and apply information retrieval models

Sessions* and references:

10. Basic IR models & Retrieval strategies: Vector-space model, Probabilistic IR, Language models, Inference networks
11. Retrieval strategies: Extended Boolean retrieval, Latent Semantic Indexing, Neural network, Genetic
12. Term weighting, TF-IDF weighting and cosine similarity in Vector-space model
13. Term weighting, TF-IDF weighting and cosine similarity in Probabilistic retrieval strategies
14. Inverted indices: Documents, Counts, Positions, Fields and Extents, Scores and Ordering
15. Language Model based IR
16. Probabilistic IR
17. Latent Semantic Indexing
    Ref (sessions 10-17): T1 (Ch 1: 1-15), R2 (Ch 2: 9-57), T3 (Ch 7: 233-250), R2 (Ch 2: 57-84), R2 (Ch 2: 11-21), R2 (Ch 2: 21-45), T3 (Ch 5: 129-140), R1 (Ch 4: 104-131), R2 (Ch 5: 181-182), T1 (Ch 12: 218-231), R1 (Ch 9: 286-304), T1 (Ch 11: 201-216), R1 (Ch 8: 259-281), T1 (Ch 18: 412-417)
18. Relevance feedback and query expansion
    Ref: T1 (Ch 9: 162-177)

Content beyond syllabus covered (if any): Comparison of Google/Yahoo ranking

* Session duration: 50 minutes

Unit : III - WEB SEARCH ENGINE - INTRODUCTION AND CRAWLING

Unit Syllabus : Web search overview, web structure, the user, paid placement, search engine optimization/spam - Web size measurement - search engine optimization/spam - Web Search Architectures - crawling - meta-crawlers - Focused Crawling - web indexes - Near-duplicate detection - Index Compression - XML retrieval

Objective: To design a Web Search Engine

Sessions* and references:

19. Web search basics: Background and history, Web characteristics, Search user experience, Index size and estimation
20. Web search: The structure of the web, Queries and users, Static ranking, Dynamic ranking, Evaluating web search
21. Web structure - The user, paid placement, Search engine optimization/spam
    Ref (sessions 19-21): T1 (Ch 19: 385-400), R1 (Ch 15: 507-540), T4 (Ch 2: 20-23), T4 (Ch 7: 228-230)
22. Web size measurement - search engine optimization/spam
    Ref: T4 (Ch 5: 91-98), T4 (Ch 7: 225-230)
23. Web Search Architectures: Crawling, Meta-crawlers and Focused Crawling
24. Web Crawlers: Crawling the web, Document feeds, Storing documents and detecting duplicates
    Ref (sessions 23-24): T4 (Ch 4: 78-85), T3 (Ch 2: 13-28), T3 (Ch 3: 31-63), R1 (Ch 15: 541-549)
25. Index Compression - Statistical properties of terms in IR, Dictionary compression, Postings file compression
    Ref: T1 (Ch 5: 78-96), R3 (Ch 8: 313-319)
26. XML retrieval: Basic XML concepts, Challenges in XML retrieval, A vector space model for XML retrieval
    Ref: T1 (Ch 10: 178-192)
27. XML retrieval
    Ref: T1 (Ch 10: 194-198), R1 (Ch 16: 564-584)

Content beyond syllabus covered (if any): IR techniques for the web, including crawling, link-based algorithms, and metadata usage

* Session duration: 50 minutes

Unit : IV - WEB SEARCH - LINK ANALYSIS AND SPECIALIZED SEARCH

Unit Syllabus : Link Analysis - hubs and authorities - Page Rank and HITS algorithms - Searching and Ranking - Relevance Scoring and ranking for Web - Similarity - Hadoop & Map Reduce - Evaluation - Personalized search - Collaborative filtering and content-based recommendation of documents and products - handling invisible Web - Snippet generation, Summarization, Question Answering, Cross-Lingual Retrieval

Objective: To be exposed to Link Analysis; to understand Hadoop and MapReduce

Sessions* and references:

28. Link Analysis - hubs and authorities, Page Rank and HITS algorithms
    Ref: T1 (Ch 21: 421-439), T3 (Ch 4: 104-113)
29. Searching and Ranking
    Ref: R4
30. Relevance Scoring and ranking for Web
    Ref: R4
31. Hadoop & Map Reduce - Evaluation
    Ref: R4
32. Personalized search
    Ref: R4
33. Collaborative filtering and content-based recommendation of documents and products
    Ref: T4 (Ch 9: 333-346), T3 (Ch 10: 432-437)
34. Handling invisible Web, Snippet generation
    Ref: R4
35. Summarization, Question Answering
    Ref: R4
36. Cross-Lingual Retrieval
    Ref: R4

Content beyond syllabus covered (if any): Application: Social Network Analysis

* Session duration: 50 minutes

Unit : V - DOCUMENT TEXT MINING

Unit Syllabus : Information filtering; organization and relevance feedback - Text Mining - Text classification and clustering - Categorization algorithms: naive Bayes; decision trees; and nearest neighbor - Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM).

Objective: To learn document text mining techniques

Sessions* and references:

37. Information filtering
    Ref: R4
38. Organization and relevance feedback
    Ref: R4
39. Text Mining
    Ref: T4 (Ch 7: 230-237)
40. Text classification and clustering: Categorization algorithms and Clustering
    Ref: T3 (Ch 9: 339-364)
41. Text classification: The text classification problem, Naive Bayes text classification, The Bernoulli model, Properties of Naive Bayes
    Ref: T1 (Ch 13: 234-251)
42. Text classification: Feature selection and Evaluation
    Ref: T1 (Ch 13: 251-264)
43. Categorization algorithms: Naive Bayes, Decision trees and K-Nearest Neighbor
    Ref: R3 (Ch 7: 281-294)
44. Agglomerative clustering and K-Means algorithm
    Ref: T3 (Ch 9: 373-389)
45. Expectation Maximization (EM) algorithm
    Ref: T1 (Ch 16: 368-372)

Content beyond syllabus covered (if any): Porter Stemming algorithm

* Session duration: 50 minutes

TEXT BOOKS:

1. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.
3. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.
4. Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.

REFERENCES:

1. Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
2. Ophir Frieder, Information Retrieval: Algorithms and Heuristics, The Information Retrieval Series, 2nd Edition, Springer, 2004.
3. Manu Konchady, Building Search Applications: Lucene, LingPipe, and Gate, 1st Edition, Mustru Publishing, 2008.
4. www.nptel.ac.in