Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _


COURSE DELIVERY PLAN - THEORY

LP: CS6007   Rev. No: 01   Date: 27/06/2017
Sub. Code / Sub. Name : CS6007 INFORMATION RETRIEVAL

Unit : I - INTRODUCTION

Unit Syllabus : Introduction - History of IR - Components of IR - Issues - Open source Search engine Frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR - IR Versus Web Search - Components of a Search engine - Characterizing the web

Objective : Learn the role of information retrieval in various real-time applications

Sessions *
1. Introduction to Information Retrieval, History of IR and Components of IR
2. Issues in IR
   Ref (sessions 1-2, in the order listed): T1 (Ch 1 : 1-5), T2 (Ch 1 : 1-3), T3 (Ch 1 : 1-9), T2 (Ch 1 : 3-5)
3. Open source Search engine Frameworks
   Ref: R1 (Ch 1 : 27-30)
4. The impact of the web on IR
   Ref: T2 (Ch 1 : 8-12)
5. The role of artificial intelligence (AI) in IR
   Ref: R4
6. IR Versus Web Search
   Ref: R1 (Ch 1 : 5-8)
7. Brief history of search engines
   Ref: T4 (Ch 1 : 1-6)
8. Components of a Search engine
   Ref: T3 (Ch 1 : 13-28), T2 (Ch 13 : 373-383)
9. Characterizing the web, Comparing web search to traditional information retrieval
   Ref: T2 (Ch 13 : 367-371), T4 (Ch 2 : 29-32)

Content beyond syllabus covered (if any): Library and Information Science - concerned with effective categorization of human knowledge, citation analysis and bibliometrics (structure of information).

* duration: 50 minutes

Unit : II - INFORMATION RETRIEVAL

Unit Syllabus : Boolean and vector-space retrieval models - Term weighting - TF-IDF weighting - cosine similarity - Preprocessing - Inverted indices - efficient processing with sparse vectors - Language Model based IR - Probabilistic IR - Latent Semantic Indexing - Relevance feedback and query expansion.

Objective : To learn and apply information retrieval models

Sessions *
10. Basic IR models & Retrieval strategies: Vector-space model, Probabilistic IR, Language models, Inference networks
11. Retrieval strategies: Extended Boolean retrieval, Latent Semantic Indexing, Neural network, Genetic
12. Term weighting, TF-IDF weighting and cosine similarity in Vector-space model
13. Term weighting, TF-IDF weighting and cosine similarity in Probabilistic retrieval strategies
14. Inverted indices: Documents, Counts, Positions, Fields and Extents, Scores and Ordering
15. Language Model based IR
16. Probabilistic IR
17. Latent Semantic Indexing
   Ref (sessions 10-17, in the order listed): T1 (Ch 1 : 1-15), R2 (Ch 2 : 9-57), T3 (Ch 7 : 233-250), R2 (Ch 2 : 57-84), R2 (Ch 2 : 11-21), R2 (Ch 2 : 21-45), T3 (Ch 5 : 129-140), R1 (Ch 4 : 104-131), R2 (Ch 5 : 181-182), T1 (Ch 12 : 218-231), R1 (Ch 9 : 286-304), T1 (Ch 11 : 201-216), R1 (Ch 8 : 259-281), T1 (Ch 18 : 412-417)
18. Relevance feedback and query expansion
   Ref: T1 (Ch 9 : 162-177)

Content beyond syllabus covered (if any): Comparison of Google/Yahoo ranking

* duration: 50 mins
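For sessions 12 and 13, the following is a minimal sketch of TF-IDF weighting and cosine similarity over sparse vectors. It is an illustration only and is not taken from the prescribed texts: the function names, the toy documents, and the particular weighting variant (logarithmic term frequency times log(N/df)) are assumptions, and other variants are equally valid.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenised document.

    Assumed weighting: (1 + ln tf) * ln(N / df); one of several common variants.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (1 + math.log(count)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy collection, tokenised by whitespace.
docs = [
    "information retrieval with inverted indices".split(),
    "web search and information retrieval".split(),
    "agglomerative clustering of documents".split(),
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # documents sharing two terms
sim_02 = cosine(vecs[0], vecs[2])  # documents sharing no terms
```

Storing each vector as a dict keeps only non-zero weights, which is the idea behind "efficient processing with sparse vectors" in the unit syllabus.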

Unit : III - WEB SEARCH ENGINE INTRODUCTION AND CRAWLING

Unit Syllabus : Web search overview, web structure, the user, paid placement, search engine optimization/spam. Web size measurement - search engine optimization/spam - Web Search Architectures - crawling - meta-crawlers - Focused Crawling - web indexes - Near-duplicate detection - Index Compression - XML retrieval

Objective : To design Web Search Engine

Sessions *
19. Web search basics: Background and history, Web characteristics, Search user experience, Index size and estimation
20. Web search: The structure of the web, Queries and users, Static ranking, Dynamic ranking, Evaluating web search
21. Web structure - The user, paid placement, Search engine optimization / spam
   Ref (sessions 19-21, in the order listed): T1 (Ch 19 : 385-400), R1 (Ch 15 : 507-540), T4 (Ch 2 : 20-23), T4 (Ch 7 : 228-230)
22. Web size measurement - search engine optimization/spam
   Ref: T4 (Ch 5 : 91-98), T4 (Ch 7 : 225-230)
23. Web Search Architectures: Crawling, Meta-crawlers and Focused Crawling
24. Web Crawlers: Crawling the web, Document feeds, Storing documents and detecting duplicates
   Ref (sessions 23-24, in the order listed): T4 (Ch 4 : 78-85), T3 (Ch 2 : 13-28), T3 (Ch 3 : 31-63), R1 (Ch 15 : 541-549)
25. Index Compression - Statistical properties of terms in IR, Dictionary compression, Postings File compression
   Ref: T1 (Ch 5 : 78-96), R3 (Ch 8 : 313-319)
26. XML retrieval: Basic XML concepts, Challenges in XML retrieval, A vector space model for XML retrieval
   Ref: T1 (Ch 10 : 178-192)
27. XML retrieval
   Ref: T1 (Ch 10 : 194-198), R1 (Ch 16 : 564-584)

Content beyond syllabus covered (if any): IR techniques for the web, including crawling, link-based algorithms, and metadata usage

* duration: 50 mins
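For the near-duplicate detection topic, one commonly taught approach is to compare documents by the overlap of their word shingles. The sketch below assumes that approach; the function names, the shingle length k=3, and the sample sentences are illustrative choices, and production systems usually hash or sample shingles rather than compare full sets.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) of a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample pages: the first two differ only in the last word.
page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox jumps over the lazy cat"
page_c = "information retrieval is fun to study today"

sim_ab = jaccard(shingles(page_a), shingles(page_b))
sim_ac = jaccard(shingles(page_a), shingles(page_c))
```

A crawler could treat two pages as near-duplicates when the coefficient exceeds a chosen threshold; the threshold itself is a tuning decision not specified in this plan.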

Unit : IV - WEB SEARCH LINK ANALYSIS AND SPECIALIZED SEARCH

Unit Syllabus : Link Analysis - hubs and authorities - Page Rank and HITS algorithms - Searching and Ranking - Relevance Scoring and ranking for Web - Similarity - Hadoop & Map Reduce - Evaluation - Personalized search - Collaborative filtering and content-based recommendation of documents and products - handling invisible Web - Snippet generation, Summarization, Question Answering, Cross-Lingual Retrieval

Objective : To be exposed to Link Analysis. Understand Hadoop and MapReduce.

Sessions *
28. Link Analysis - hubs and authorities, Page Rank and HITS algorithms
   Ref: T1 (Ch 21 : 421-439), T3 (Ch 4 : 104-113)
29. Searching and Ranking
   Ref: R4
30. Relevance Scoring and ranking for Web
   Ref: R4
31. Hadoop & Map Reduce - Evaluation
   Ref: R4
32. Personalized search
   Ref: R4
33. Collaborative filtering and content-based recommendation of documents and products
   Ref: T4 (Ch 9 : 333-346), T3 (Ch 10 : 432-437)
34. Handling invisible Web, Snippet generation
   Ref: R4
35. Summarization, Question Answering
   Ref: R4
36. Cross-Lingual Retrieval
   Ref: R4

Content beyond syllabus covered (if any): Application : Social Network Analysis

* duration: 50 mins
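For session 28, the following is a small sketch of PageRank computed by power iteration. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without out-links, and the four-page graph are all assumptions made for illustration; they are not prescribed by this plan.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [linked pages]} dict."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Every page receives the teleportation share first.
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed handling of dangling pages: spread rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked from A, B and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most in-links ends up with the highest score, while D, which nothing links to, keeps only the teleportation share.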

Unit : V - DOCUMENT TEXT MINING

Unit Syllabus : Information filtering; organization and relevance feedback - Text Mining - Text classification and clustering - Categorization algorithms: naive Bayes; decision trees; and nearest neighbor - Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM).

Objective : Learn document text mining techniques

Sessions *
37. Information filtering
   Ref: R4
38. Organization and relevance feedback
   Ref: R4
39. Text Mining
   Ref: T4 (Ch 7 : 230-237)
40. Text classification and clustering: Categorization algorithms and Clustering
   Ref: T3 (Ch 9 : 339-364)
41. Text classification: The text classification problem, Naive Bayes text classification, The Bernoulli model, Properties of Naive Bayes
   Ref: T1 (Ch 13 : 234-251)
42. Text classification: Feature selection and Evaluation
   Ref: T1 (Ch 13 : 251-264)
43. Categorization algorithms: Naive Bayes, Decision trees and K-Nearest Neighbor
   Ref: R3 (Ch 7 : 281-294)
44. Agglomerative clustering and K-Means algorithm
   Ref: T3 (Ch 9 : 373-389)
45. Expectation Maximization (EM) algorithm
   Ref: T1 (Ch 16 : 368-372)

Content beyond syllabus covered (if any): Porter Stemming algorithm

* duration: 50 mins
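For sessions 41 and 43, a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing can be sketched as below. The function names, the spam/ham labels, and the training sentences are hypothetical, and the Bernoulli model named in session 41 would need a different likelihood than the one shown here.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """Collect class counts, per-class term counts and the vocabulary."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def classify_nb(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in tokens:
            if term in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, tokenised by whitespace.
training = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train_nb(training)
label_1 = classify_nb(model, "cheap watches now".split())
label_2 = classify_nb(model, "monday project agenda".split())
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied, which is why the scores are summed rather than multiplied.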

TEXT BOOKS:
1. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.
3. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.
4. Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.

REFERENCES:
1. Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
2. Ophir Frieder, Information Retrieval: Algorithms and Heuristics, The Information Retrieval Series, 2nd Edition, Springer, 2004.
3. Manu Konchady, Building Search Applications: Lucene, LingPipe, and Gate, First Edition, Mustru Publishing, 2008.
4. www.nptel.ac.in