Introduction & Administrivia

1 Introduction & Administrivia
Information Retrieval
Evangelos Kanoulas

2 Section 1: Unstructured data

3 Big Data
- Growth of global data volume: data everywhere!
- Web data: observation, interaction, transaction
- Smartphones, personal devices, traces in the real world
- Sensors, internet of things
- Scientific and technical challenges: how to make sense of data?
- Data centers, virtualization, storage (non-RDBMS), MapReduce, indexing & search, large-scale machine learning

4 The Rise of Unstructured Data
- Business: 80% of business is conducted on unstructured data
- Consumer

5 Media & Sources
What types of unstructured information exist?
- Text: Web pages, books, articles, papers, reports, letters, blogs, ...
- Conversational: e-mails, tweets, comments, ...
- Graphics & images, presentations
- Speech & video
- Maps & satellite imagery
- Local business information, yellow pages
Mismatch: the given representation in a specific medium vs. the semantic description of the information.
The semantic gap needs to be bridged to establish relevance.

6 Internet Users December 26

7 The Use of Search Engines
- 70-80% of users use search engines to find Web sites
- More than 60% of online shoppers use search engines (and many more use other search technologies)
[compete.com, US

8 Section 2: A Historic Perspective

9 The Library
The knowledge repositories of our civilization:
- Library of Alexandria (280 BC): 700,000 scrolls
- Vatican Library (1500): 3,600 codices
- Herzog-August-Bibl. (1661): 116,000 books
- British Museum (1845): 240,000 books
- Library of Congress (1990): 100,000,000 docs

10 The Library
Organise information using a subject catalogue:
- Sort cards by author
- Sort cards by title
- Sort cards by subject
How to do this? Librarians argued over which was the best subject catalogue to use.

11 At the same time
While librarians were coping with the information explosion:
- Could machines help?
- Could computers help?

12 Pioneers: Memex
Vannevar Bush, 1945
"Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, memex will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory."

13 Semantic Gap
Hans Peter Luhn, 1957 & 1961
- Words of similar or related meaning are grouped into notional families
- Encoding of documents in terms of notional elements
- Matching by measuring the degree of notional similarity
- A common language for annotating documents
- The faculty of interpretation is beyond the talent of machines
- Statistical cues extracted by machines to assist the human indexer
Reference: H. P. Luhn, A statistical approach to mechanical literature searching, New York, IBM Research Center, 1957.
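The three steps on the slide above (grouping words into notional families, encoding documents by notional elements, matching by notional similarity) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the `NOTIONAL_FAMILY` table is invented, and the set-overlap (Jaccard) measure is one simple choice of similarity, not necessarily the measure Luhn used.

```python
# Hypothetical notional families: each word maps to a family label.
NOTIONAL_FAMILY = {
    "car": "VEHICLE", "automobile": "VEHICLE", "truck": "VEHICLE",
    "fast": "SPEED", "quick": "SPEED", "rapid": "SPEED",
    "book": "DOCUMENT", "report": "DOCUMENT",
}


def encode(text):
    """Encode a text as the set of notional families of its known words."""
    words = text.lower().split()
    return {NOTIONAL_FAMILY[w] for w in words if w in NOTIONAL_FAMILY}


def notional_similarity(a, b):
    """Jaccard overlap of two notional encodings (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


query = encode("quick automobile")
doc_1 = encode("a report on the rapid truck")
doc_2 = encode("a book")

print(notional_similarity(query, doc_1))  # shares SPEED and VEHICLE
print(notional_similarity(query, doc_2))  # shares nothing
```

Note that the query and `doc_1` have no word in common, yet they match through their shared families, which is the point of encoding at the notional level.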

14 Vector Space Model
G. Salton, ies
- Represent queries and documents as high-dimensional vectors in a word vector space
- Each word can be associated with a weight
- Underlying mathematical framework: geometric
Reference: G. Salton, Automatic text processing: The transformation, analysis and retrieval of information by computer. Reading, MA:
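A minimal sketch of the geometric idea on the slide above: queries and documents become sparse word vectors, and documents are ranked by the cosine of the angle to the query. Raw term counts are used as weights purely for simplicity; the weighting scheme, the toy documents, and the function names are assumptions, not part of the lecture.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a text to a sparse term-frequency vector (word -> weight)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = {
    "d1": "information retrieval with vector space models",
    "d2": "library catalogue cards sorted by author",
}
query = to_vector("vector space retrieval")

# Rank documents by decreasing cosine similarity to the query.
ranking = sorted(docs, key=lambda d: cosine(query, to_vector(docs[d])), reverse=True)
print(ranking)
```

In practice the raw counts would usually be replaced by a weighting such as tf-idf, which fits the slide's remark that each word can carry its own weight.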

15 Probabilistic Relevance Model
M. E. Maron and J. L. Kuhns, 1960
S. E. Robertson and K. Spärck Jones, 1976
J. M. Ponte and W. B. Croft, 1998
- View documents and queries as probability distributions over the underlying word space; match between probability distributions
- Underlying mathematical framework: probabilistic
References:
Robertson, S. E., & Spärck Jones, K.: Relevance weighting of search terms, Journal of the American Society for Information Science, 27.
Ponte, Jay M., and W. Bruce Croft: A language modeling approach to information retrieval. In Proc. SIGIR, ACM Press.
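One way to make "documents as probability distributions over words" concrete is query-likelihood scoring: each document defines a unigram language model, and documents are ranked by the probability that their model generates the query. The sketch below is only an illustration in that spirit. The linear-interpolation smoothing with the collection model, the parameter `lam`, and the toy data are assumptions, not the exact formulation of any of the cited papers.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) under a unigram model, smoothed by interpolating
    the document model with the collection model (assumed smoothing)."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_col = col_tf[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


docs = {
    "d1": "the memex stores books records and communications".split(),
    "d2": "librarians sort catalogue cards by subject".split(),
}
collection = [term for doc in docs.values() for term in doc]
query = "memex books".split()

scores = {name: query_log_likelihood(query, doc, collection) for name, doc in docs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Smoothing matters here: without the collection term, `d2` would get probability zero for every query word it does not contain, and all such documents would be indistinguishable.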

16 Web Search Engines
L. Page, S. Brin, A. Singhal, many more, 2000-today
- Underlying mathematical framework: graph theoretic & Markov chains
- Exploit the link structure of the Web
- Exploit usage data
- Most successful company of all time: Google
- Index the entire Web, billions of Web pages
- Query response 200 ms, 2 trillion queries p.a. in 2013
- New engineering discipline: data engineering
Reference: L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, 1999
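The "graph theoretic & Markov chains" framework on the slide above can be illustrated with a small power-iteration sketch of PageRank: a random surfer follows links with probability `damping` and jumps to a random page otherwise. The damping value 0.85, the fixed iteration count, the uniform treatment of pages without outgoing links, and the toy graph are assumptions made for this sketch, not details taken from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a link graph.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump component, shared equally by all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
print(sorted(ranks.items(), key=lambda item: item[1], reverse=True))
```

In this toy graph, page `c` collects links from three pages and ends up with the highest score, while `d`, which nobody links to, keeps only the random-jump share.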

17 The Future?
Can we make information retrieval systems more intelligent?
- Can they comprehend and combine the information available? Machine reading, text understanding, statistics + semantics
- Can they understand (or anticipate) user intention? Use of queries, but also context and user preferences

18 Section 4: Your near Future

19 Your IR Team
Evangelos Kanoulas, Anne Schuth, Tomáš Tunys, Tom Kenter

20 Lectures: tentative plan (subject to change)
Week 1 (Monday, Jan 5; Tuesday, Jan 6; Thursday, Jan 8): Evaluation
- Introduction & Administrivia
- Offline Evaluation
- Online Evaluation
- Click Models
Week 2 (Monday, Jan 12; Tuesday, Jan 13; Thursday, Jan 15): Relevance Models and Scoring Functions
- Relevance models
- Topic Models & Semantic Distance (word2vec)
- Semantic Matching
Week 3 (Monday, Jan 19; Tuesday, Jan 20; Thursday, Jan 22): Combining Evidence
- Offline learning to rank
- Online learning to rank
- Link Analysis
Week 4 (Monday, Jan 26; Tuesday, Jan 27): Applications of Information Retrieval
- Question Answering (factoid & not)
- Temporal Information Retrieval & Contextual Suggestion

21 Work & Credit
Two programming assignments (individual; 30% of your grade)
- Evaluation measures (due Thursday, Jan. 15)
- Language models (due Thursday, Jan. 22)
Three programming projects (groups of 5; 70% of your grade)
- Evaluation (due Thursday, Jan. 15)
- Relevance models (due Thursday, Jan. 22)
- Learning to rank (due Thursday, Jan. 29)
No final exam

22 Pre-requisites and Outcomes
Pre-requisites
- Python programming skills
- Basic knowledge in Information Retrieval: Crawling, Parsing & Stemming, Indexing, Compression, Scoring Functions
- Basic knowledge in NLP and Machine Learning
Outcomes
- Practical familiarity with a range of text analysis technologies
- Understanding of the theoretical models underlying these tools
- Competence (and courage!) in reading research literature

23 Learning resources
Lecture notes are the primary resources. There is no textbook as such, but the following texts are useful:
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press (available free online)
- Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press
- W. Bruce Croft, Donald Metzler, Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley
- Information Retrieval Surveys (available free online)
Citations to other readings will be given as required.

Introduction to Information Retrieval. Hongning Wang

Introduction to Information Retrieval. Hongning Wang Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 00 Motivation What is Information Retrieval? The meaning of the term Information Retrieval (IR) can be

More information

Information Retrieval

Information Retrieval Information Retrieval Course presentation João Magalhães 1 Relevance vs similarity Multimedia documents Information retrieval application Query Documents Information side User side What is the best [search

More information

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ COURSE DELIVERY PLAN - THEORY Page 1 of 6 Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ LP: CS6007 Rev. No: 01 Date: 27/06/2017 Sub.

More information

CS290N Summary Tao Yang

CS290N Summary Tao Yang CS290N Summary 2015 Tao Yang Text books [CMS] Bruce Croft, Donald Metzler, Trevor Strohman, Search Engines: Information Retrieval in Practice, Publisher: Addison-Wesley, 2010. Book website. [MRS] Christopher

More information

CS506/606 - Topics in Information Retrieval

CS506/606 - Topics in Information Retrieval CS506/606 - Topics in Information Retrieval Instructors: Class time: Steven Bedrick, Brian Roark, Emily Prud hommeaux Tu/Th 11:00 a.m. - 12:30 p.m. September 25 - December 6, 2012 Class location: WCC 403

More information

CS 572: Information Retrieval. Lecture 1: Course Overview and Introduction 11 January 2016

CS 572: Information Retrieval. Lecture 1: Course Overview and Introduction 11 January 2016 CS 572: Information Retrieval Lecture 1: Course Overview and Introduction 11 January 2016 1/11/2016 CS 572: Information Retrieval. Spring 2016 1 Lecture Plan What is IR? (the big questions) Course overview

More information

Information Retrieval CS6200. Jesse Anderton College of Computer and Information Science Northeastern University

Information Retrieval CS6200. Jesse Anderton College of Computer and Information Science Northeastern University Information Retrieval CS6200 Jesse Anderton College of Computer and Information Science Northeastern University What is Information Retrieval? You have a collection of documents Books, web pages, journal

More information

60-538: Information Retrieval

60-538: Information Retrieval 60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are

More information

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12 Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency

More information

Vannevar Bush. Information Retrieval. Prophetic: Hypertext. Historic Vision 2/8/17

Vannevar Bush. Information Retrieval. Prophetic: Hypertext. Historic Vision 2/8/17 Information Retrieval Vannevar Bush Director of the Office of Scientific Research and Development (1941-1947) Vannevar Bush,1890-1974 End of WW2 - what next big challenge for scientists? 1 Historic Vision

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 1: Introduction October 23 rd, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig What is Information

More information

Introduction to Information Retrieval. (COSC 488) Spring Nazli Goharian. Course Outline

Introduction to Information Retrieval. (COSC 488) Spring Nazli Goharian. Course Outline Introduction to Information Retrieval (COSC 488) Spring 2012 Nazli Goharian nazli@cs.georgetown.edu Course Outline Introduction Retrieval Strategies (Models) Retrieval Utilities Evaluation Indexing Efficiency

More information

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Web Search Instructor: Walid Magdy 14-Nov-2017 Lecture Objectives Learn about: Working with Massive data Link analysis (PageRank) Anchor text 2 1 The Web Document

More information

What is Information Retrieval (IR)? Information Retrieval vs. Databases. What is Information Retrieval (IR)? Why Should I Know about All This?

What is Information Retrieval (IR)? Information Retrieval vs. Databases. What is Information Retrieval (IR)? Why Should I Know about All This? What is Information Retrieval (IR)? Information Retrieval and Web Search Engines Lecture 1: Introduction November 5, 2008 Wolf-Tilo Balke with Joachim Selke Institut für Informationssysteme Technische

More information

CS 4317: Human-Computer Interaction

CS 4317: Human-Computer Interaction September 8, 2017 Tentative Syllabus CS 4317: Human-Computer Interaction Spring 2017 Tuesday & Thursday, 9:00-10:20, Psychology Building, room 308 Instructor: Nigel Ward Office: CCS 3.0408 Phone: 747-6827

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea What is this course about? Processing Indexing Retrieving textual data (or audio, video, geo-spatial,, data) Fits in four

More information

Lecture 1: Course Introduction

Lecture 1: Course Introduction Lecture 1: Course Introduction CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Mike Freedman & Amin Vahdat Logistics Instructor: Alex C. Snoeren Office hours Friday 10:00-11:00am or by

More information

COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang

COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era Spring 2016 Instructor: Grace Hui Yang The Web provides abundant information which allows us to live more conveniently and

More information

: Semantic Web (2013 Fall)

: Semantic Web (2013 Fall) 03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet

More information

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen

More information

Search Engines Information Retrieval in Practice

Search Engines Information Retrieval in Practice Search Engines Information Retrieval in Practice W. BRUCE CROFT University of Massachusetts, Amherst DONALD METZLER Yahoo! Research TREVOR STROHMAN Google Inc. ----- PEARSON Boston Columbus Indianapolis

More information

Information Retrieval and Extraction

Information Retrieval and Extraction Information Retrieval and Extraction Berlin Chen (Picture from the TREC web site) Textbooks Textbook and References R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman,

More information

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured

More information

KOMAR UNIVERSITY OF SCIENCE AND TECHNOLOGY (KUST)

KOMAR UNIVERSITY OF SCIENCE AND TECHNOLOGY (KUST) Programming Concepts & Algorithms Course Syllabus Course Title Course Code Computer Department Pre-requisites Course Code Course Instructor Programming Concepts & Algorithms + lab CPE 405C Computer Department

More information

Search Engine Architecture. Hongning Wang

Search Engine Architecture. Hongning Wang Search Engine Architecture Hongning Wang CS@UVa CS@UVa CS4501: Information Retrieval 2 Document Analyzer Classical search engine architecture The Anatomy of a Large-Scale Hypertextual Web Search Engine

More information

Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation

Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation Praveen Pathak Michael Gordon Weiguo Fan Purdue University University of Michigan pathakp@mgmt.purdue.edu mdgordon@umich.edu

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14

Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 The Course Lecturers Klaus Berberich kberberi@mpi-inf.mpg.de Teaching Assistants Pauli Miettinen pmiettin@mpi-inf.mpg.de

More information

University of Asia Pacific (UAP) Department of Computer Science and Engineering (CSE)

University of Asia Pacific (UAP) Department of Computer Science and Engineering (CSE) University of Asia Pacific (UAP) Department of Computer Science and Engineering (CSE) Course Outline Program: Course Title: Computer Science and Engineering (CSE) Object Oriented Programming I: Java Course

More information

Abstract. 1. Introduction

Abstract. 1. Introduction A Visualization System using Data Mining Techniques for Identifying Information Sources on the Web Richard H. Fowler, Tarkan Karadayi, Zhixiang Chen, Xiaodong Meng, Wendy A. L. Fowler Department of Computer

More information

Development of Search Engines using Lucene: An Experience

Development of Search Engines using Lucene: An Experience Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences 18 (2011) 282 286 Kongres Pengajaran dan Pembelajaran UKM, 2010 Development of Search Engines using Lucene: An Experience

More information

Information Retrieval and Extraction

Information Retrieval and Extraction Information Retrieval and Extraction Berlin Chen (Picture from the TREC web site) Objectives of this Course Elaborate on the fundamentals of information retrieval (IR), a almost fifty-year-old discipline

More information

21. Search Models and UIs for IR

21. Search Models and UIs for IR 21. Search Models and UIs for IR INFO 202-10 November 2008 Bob Glushko Plan for Today's Lecture The "Classical" Model of Search and the "Classical" UI for IR Web-based Search Best practices for UIs in

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

Dynamic Visualization of Hubs and Authorities during Web Search

Dynamic Visualization of Hubs and Authorities during Web Search Dynamic Visualization of Hubs and Authorities during Web Search Richard H. Fowler 1, David Navarro, Wendy A. Lawrence-Fowler, Xusheng Wang Department of Computer Science University of Texas Pan American

More information

Information Retrieval

Information Retrieval Information Retrieval Overview and Introduction All slides unless specifically mentioned are copyright Anton Leuski & Donald Metzler 1 Administrativa What is Information Retrieval (IR)? Issues in IR Dimensions

More information

Semi-Parametric and Non-parametric Term Weighting for Information Retrieval

Semi-Parametric and Non-parametric Term Weighting for Information Retrieval Semi-Parametric and Non-parametric Term Weighting for Information Retrieval Donald Metzler 1 and Hugo Zaragoza 1 Yahoo! Research {metzler,hugoz}@yahoo-inc.com Abstract. Most of the previous research on

More information

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University A Machine Learning Approach for Information Retrieval Applications Luo Si Department of Computer Science Purdue University Why Information Retrieval: Information Overload: Since the introduction of digital

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 2:00pm-3:30pm, Tuesday, December 15th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Web Search Basics. Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University

Web Search Basics. Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University Web Search Basics Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction

More information

Documents Retrieval Using the Combination of Two Keywords

Documents Retrieval Using the Combination of Two Keywords Documents Retrieval Using the Combination of Two Keywords Rohitash Chandra Bhensle, Saikiran Chepuri, Menta Snjeeva Avinash M. Tech. Scholar (Software Technology) VIT University Vellore, Tmilnadu, India

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Window Extraction for Information Retrieval

Window Extraction for Information Retrieval Window Extraction for Information Retrieval Samuel Huston Center for Intelligent Information Retrieval University of Massachusetts Amherst Amherst, MA, 01002, USA sjh@cs.umass.edu W. Bruce Croft Center

More information

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Course Goals To help you to understand search engines, evaluate and compare them, and

More information

INF5890 IT and Management. Introduction 16 th January Margunn Aanestad, Bendik Bygstad, Mikael Hailu Gebremariam, Mwiza Kumwenda

INF5890 IT and Management. Introduction 16 th January Margunn Aanestad, Bendik Bygstad, Mikael Hailu Gebremariam, Mwiza Kumwenda INF5890 IT and Management Introduction 16 th January 2017 Margunn Aanestad, Bendik Bygstad, Mikael Hailu Gebremariam, Mwiza Kumwenda About the course Practicalities Course overview (format, important dates

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

1DL321: Kompilatorteknik I (Compiler Design 1)

1DL321: Kompilatorteknik I (Compiler Design 1) Administrivia 1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation Lecturer: Kostis Sagonas (kostis@it.uu.se) Course home page: http://www.it.uu.se/edu/course/homepage/komp/ht16

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval WS 2008/2009 25.11.2008 Information Systems Group Mohammed AbuJarour Contents 2 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML)

More information

Definitions. Lecture Objectives. Text Technologies for Data Science INFR Learn about main concepts in IR 9/19/2017. Instructor: Walid Magdy

Definitions. Lecture Objectives. Text Technologies for Data Science INFR Learn about main concepts in IR 9/19/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Definitions Instructor: Walid Magdy 19-Sep-2017 Lecture Objectives Learn about main concepts in IR Document Information need Query Index BOW 2 1 IR in a nutshell

More information

Human-Computer Interaction (CS4317/5317)

Human-Computer Interaction (CS4317/5317) August 4, 2006 Syllabus Human-Computer Interaction (CS4317/5317) Fall 2006 Tuesday & Thursday, 3:00 4:20, Computer Science room 321 Instructor: Nigel Ward Office: Comp 206 Phone: 747-6827 E-mail nigel@cs.utep.edu

More information

Lecture 5: Information Retrieval using the Vector Space Model

Lecture 5: Information Retrieval using the Vector Space Model Lecture 5: Information Retrieval using the Vector Space Model Trevor Cohn (tcohn@unimelb.edu.au) Slide credits: William Webber COMP90042, 2015, Semester 1 What we ll learn today How to take a user query

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search Basic techniques Text processing; term weighting; vector space model; inverted index; Web Search Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing

More information

Course Design Document: IS202 Data Management. Version 4.5

Course Design Document: IS202 Data Management. Version 4.5 Course Design Document: IS202 Data Management Version 4.5 Friday, October 1, 2010 Table of Content 1. Versions History... 4 2. Overview of the Data Management... 5 3. Output and Assessment Summary... 6

More information

Semantic Scholar. ICSTI Towards a More Efficient Review of Research Literature 11 September

Semantic Scholar. ICSTI Towards a More Efficient Review of Research Literature 11 September Semantic Scholar ICSTI Towards a More Efficient Review of Research Literature 11 September 2018 Allen Institute for Artificial Intelligence (https://allenai.org/) Non-profit Research Institute in Seattle,

More information

CPSC 2380 Data Structures and Algorithms

CPSC 2380 Data Structures and Algorithms CPSC 2380 Data Structures and Algorithms Spring 2014 Department of Computer Science University of Arkansas at Little Rock 2801 South University Avenue Little Rock, Arkansas 72204-1099 Class Hours: Tuesday

More information

Diversification of Query Interpretations and Search Results

Diversification of Query Interpretations and Search Results Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,

More information

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree Accreditation & Quality Assurance Center Curriculum for Doctorate Degree 1. Faculty King Abdullah II School for Information Technology 2. Department Computer Science الدكتوراة في علم الحاسوب (Arabic).3

More information

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection

More information

CS-490WIR Web Information Retrieval and Management. Luo Si

CS-490WIR Web Information Retrieval and Management. Luo Si CS490W: Web Information Retrieval & Management CS-490WIR Web Information Retrieval and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces

More information

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation 1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation Administrivia Lecturer: Kostis Sagonas (kostis@it.uu.se) Course home page: http://www.it.uu.se/edu/course/homepage/komp/h18

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Focused Retrieval Using Topical Language and Structure

Focused Retrieval Using Topical Language and Structure Focused Retrieval Using Topical Language and Structure A.M. Kaptein Archives and Information Studies, University of Amsterdam Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands a.m.kaptein@uva.nl Abstract

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance

Looking back: On relevance, probabilistic indexing and information retrieval

Available online at www.sciencedirect.com Information Processing and Management 44 (2008) 963-970 www.elsevier.com/locate/infoproman

CSCE 441 Computer Graphics Fall 2018

Meetings: Monday, Wednesday, Friday 9:10-10:00 a.m. Location: HRBB 113 Instructor: Dr. John Keyser Office: 527C, H.R. Bright Building Phone: 458-0167 Email: keyser@cse.tamu.edu

CS 200, Section 1, Programming I, Fall 2017 College of Arts & Sciences Syllabus

Northeastern Illinois University COURSE INFORMATION: Credit

How to Use Google Scholar: An Educator's Guide

http://scholar.google.com/ What is Google Scholar? Google Scholar provides a simple way to broadly search for scholarly literature. Google Scholar helps you

Google technology for teachers

Sandhya Digambar Shinde Assistant Professor, Department of Library and Information Science, Jayakar Library, University of Pune-411007 Pune, Maharashtra, India srmaharnor@unipune.ac.in

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)

Part A: Course Outline

University of Macau Faculty of Science and Technology Department of Electrical and Computer Engineering Course Title: Communication System and Data Network Course Code: ELEC460 Year

University of Asia Pacific (UAP) Department of Computer Science and Engineering (CSE) Course Outline

Program: Course Title: Computer Networks Sessional Course Code: CSE 448 Semester: Level: Spring-2018

Implementation of the common phrase index method on the phrase query for information retrieval

Triyah Fatmawati, Badrus Zaman, and Indah Werdiningsih Citation: AIP Conference Proceedings 1867, 020027 (2017);

CS/INFO 1305 Summer 2009

Information Retrieval (Search) IR Search Using a computer to find relevant pieces of information Text search Idea popularized in the article "As We May Think" by Vannevar Bush in 1945

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modification mechanisms Model of data organization

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

http://intelligentoptimization.org/lionbook Roberto Battiti

Social Information Retrieval

Sebastian Marius Kirsch kirschs@informatik.uni-bonn.de th November 00 Format of this talk about my diploma thesis advised by Prof. Dr. Armin B. Cremers inspired by research

CS6200 Information Retrieval. David Smith College of Computer and Information Science Northeastern University

Course Goals To help you to understand search engines, evaluate and compare them, and

GDSA - Audiovisual Signal Management and Distribution

Academic year: 2018 Coordinating unit: 205 - ESEIAAT - Terrassa School of Industrial, Aerospace and Audiovisual Engineering Teaching unit: 739 - TSC - Department of Signal Theory

University of Asia Pacific (UAP) Department of Electrical and Electronics Engineering (EEE) Course Outline

Program: Electrical and Electronics Engineering (EEE) Course Title: Computer Networks Course Code:

CS54701: Information Retrieval

Basic Concepts 19 January 2016 Prof. Chris Clifton Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

Oleksandr Kuzomin, Bohdan Tkachenko

International Journal "Information Technologies Knowledge" Volume 9, Number 2, 2015 INTELLECTUAL SEARCH ENGINE OF ADEQUATE INFORMATION IN INTERNET FOR CREATING DATABASES AND KNOWLEDGE BASES Oleksandr

Research Topics in Information Retrieval

Cristina Ribeiro Sérgio Nunes FEUP / INESC TEC Information Systems Research Group http://infolab.fe.up.pt Information Retrieval "Information retrieval (IR) is finding

CIS 120. Introduction to Programming

Approved: May 6, 2011 EFFECTIVE DATE: Fall 2011 COURSE PACKAGE FORM Contact Person (s) Matt Butcher, Andra Goldberg, Dave White, Steve Sorden Date of proposal to Curriculum

Online Resources @ the Library

WWW.TCD.IE/LIBRARY Michaelmas Term 2012 Trinity College Library Dublin Learning Outcomes By the end of today's session you will know more about: Online resources in TCD

USC Viterbi School of Engineering

Introduction to Computational Thinking and Data Science http://www.datascience4all.org Term: Fall 2016 Time: Tues-Thur 10am-11:50am Location: Allan Hancock Foundation

The application of Randomized HITS algorithm in the fund trading network

Xingyu Xu, Zhen Wang, Chunhe Tao, Haifeng He The Third Research Institute of Ministry of Public Security, China Abstract.

BEng (Hons) Mechanical Engineering - E440 (Under Review)

1.0 Introduction Mechanical Engineering is the historical root of engineering practice. It gave its name to the realm of technology-based problem-solving,

CS 3030 Scripting Languages Syllabus

General Information Semester: Fall 2017 Textbook: None. We will use freely available resources from the Internet. Location: Online Instructor Info: Ted Cowan tedcowan@weber.edu

James Mayfield, The Johns Hopkins University Applied Physics Laboratory, The Human Language Technology Center of Excellence

(301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

Palimpsest: Improving Assisted Curation of Loco-specific Literature

Beatrice Alex, Claire Grover, Jon Oberlander, Ke Zhou, Uta Hinrichs* ILCC, School of Informatics, University of Edinburgh [balex][grover][jon]@inf.ed.ac.uk,

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper

San José State University Computer Science Department CS157A: Introduction to Database Management Systems Sections 5 and 6, Fall 2015

Course and Contact Information Instructor: Ron Gutman Office Location:
