Information Retrieval
- Hillary Parker
- 6 years ago
1 Information Retrieval: Overview and Introduction
All slides, unless specifically mentioned, are copyright Anton Leuski & Donald Metzler.
2 Outline
- Administrativa
- What is Information Retrieval (IR)?
- Issues in IR
- Dimensions of IR
- Course goals
3 What?
CSCI 599. Special Topics. Applications of Natural Language Processing: Information Retrieval
Two other Applications of Natural Language Processing (NLP) courses:
- Machine Translation
- Information Extraction
Related courses:
- CSCI 544. Natural Language Processing
- CSCI 562. Empirical Methods in Natural Language Processing
- CSCI 572. Information Retrieval and Web Search Engines
- CSCI 599. Data Mining and Statistical Inference
- CSCI 599. Social Media Analysis
4 Who?
- Anton Leuski, Institute for Creative Technologies
- Donald Metzler, Information Sciences Institute
5 Where?
Here: GFS 118
Web: nld/ir-class/
- schedule
- lecture notes
- homework assignments
- discussions
6 When?
Every Tuesday and Thursday, 3:30-4:50 PM.
Office hours: after each lecture.
See the schedule on the web site for more details.
7 Grading
- 3 programming/homework assignments: 30%
- Midterm exam: 20%
- Final exam: 20%
- Final project: 25%
- Discussion participation: 5%
8 Assignments
Homework tasks might include:
- modifying the "ranking function" or "indexer" of an open source information retrieval toolkit (Lucene) for some search task
- writing code to cluster documents based on their similarity
- writing code to automatically evaluate the quality of search results
- developing a system to automatically summarize a stream of Twitter messages
Framework: Lucene
- Java-based, open source search engine
Final project:
- we will announce a number of topics to choose from in the middle of the semester
- you may create your own project, but the topic has to be approved by us
9 Reading
Books:
- W. B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice
- S. Buettcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines
- C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval
- C. J. van Rijsbergen. Information Retrieval
- I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes
- A. Moffat, J. Zobel, and D. Hawking. Recommended Reading for IR Research Students
Papers: TBA
11 Example: Web Search
- Document (web page) retrieval in response to a query
- Quite effective (at some things)
- Highly visible (mostly)
- Commercially successful (some of them)
Is that it?
12 Information Retrieval
"Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information." (Salton, 1968)
To solve the information overload problem
IR is interdisciplinary:
- computer science
- mathematics
- information science
- information architecture
- cognitive psychology
- linguistics
- statistics
13 History
Since the beginning of the written word, people have tried to organize information
- 3rd century BC: Library of Alexandria
- 1689: Vincentius Placcius invented a note-taking machine
- 1880s-1890s: Herman Hollerith invents the recording of data on a machine-readable medium
- 1920s-1930s: Emanuel Goldberg submits patents for his "Statistical Machine", a document search engine that used photoelectric cells and pattern recognition to search the metadata on rolls of microfilmed documents
14 History
1945: MEMory EXtender by Vannevar Bush
- hypothetical electro-mechanical system
- Proto-hypertext
  - lateral browsing of microfilms
  - following links between individual frames
  - associative trails
- Features
  - extending
  - storing
  - consulting the records
- Missing features
  - search
  - metadata
15 History
- 1950: The term "information retrieval" appears to have been coined by Calvin Mooers
- 1950s-1960s: First automated IR systems: SMART, MEDLARS, MeSH
- 1970s: First online IR systems: MEDLINE, Lockheed's Dialog. Hypertext
- 1978: 1st SIGIR conference
- 1989: WWW proposals
- 1992: 1st TREC conference
- late 1990s: First Web search engines
16 Knowledge Navigator
- An IR system mockup by Apple Computers from 1988
- A device that can access a large networked database of hypertext information and use software agents to assist in searching for information
v=qrh8eimu_20
18 IR is not databases
- Data: structured (databases) vs. unstructured (IR)
- Fields: well-defined semantics such as SSN, age (databases) vs. no well-defined semantics, i.e. text fields (IR)
- Queries: well-formed, using relational algebra or SQL (databases) vs. free text with some fuzzy operators (IR)
- Matching: exact, results are always "correct" (databases) vs. imprecise (IR)
Example database query: SELECT * FROM Accounts WHERE balance > 50000 ORDER BY name;
Example IR query: bank scandals in California
19 Question: Is grep an IR system?
20 Example: Text Matching
How do you measure "aboutness"?
- Exact matching of words is not enough
- Many different ways to write the same thing in a natural language like English
  - e.g., does a news story containing the text "bank director in LA steals funds" match the query "bank scandals in California"?
- Some stories will be better matches than others
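The gap between exact matching and graded matching can be made concrete with a small sketch. The function names below are illustrative assumptions, not part of the slides, and the term-overlap score is only one simple way to produce a graded match.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def exact_match(query, document):
    """grep-style test: does the whole query string occur verbatim?"""
    return query.lower() in document.lower()

def overlap_score(query, document):
    """Fraction of distinct query terms that also occur in the document."""
    query_terms = set(tokenize(query))
    doc_terms = set(tokenize(document))
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)

query = "bank scandals in California"
story = "bank director in LA steals funds"

print(exact_match(query, story))    # False: the query string is not in the story
print(overlap_score(query, story))  # 0.5: "bank" and "in" out of four query terms
```

Even the graded score misses that "LA" relates to "California" and "steals funds" to "scandals", which is the aboutness problem the slide points at.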
21 Search Process
[Figure: search process diagram. Elements: information need, query, text objects, indexed objects, retrieved objects. Steps: representation, comparison, evaluation/feedback.]
23 Issues in IR
- Information need and user interaction
- Relevance
- Representation
- Comparison
- Evaluation
[Figure: the search process diagram (information need, query, text objects, representation, indexed objects, comparison, evaluation/feedback, retrieved objects), annotated with the issues above.]
24 Users and Information Needs
- Search is user centered
- Keyword queries are often poor descriptions of actual information needs
- Interaction and context are important for understanding user intent
- Query refinement techniques such as query expansion, query suggestion, and relevance feedback improve ranking
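Query expansion, one of the refinement techniques named on this slide, might be sketched as follows. The `RELATED_TERMS` table is a hypothetical, hand-made toy and `expand_query` is an assumed name; a real system would derive related terms from a thesaurus, query logs, or relevance feedback.

```python
# Hypothetical, hand-made expansion table used only for illustration.
RELATED_TERMS = {
    "scandals": ["fraud", "steals"],
    "california": ["la"],
}

def expand_query(query):
    """Return the original query terms followed by any related terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in RELATED_TERMS.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

print(expand_query("bank scandals in California"))
```

With this toy table, the expanded query gains "steals" and "la", so it would now share more terms with a story such as "bank director in LA steals funds".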
25 Relevance
What is it?
- Simple (and simplistic) definition: a relevant document contains the information that a person was looking for when they submitted a query to the search engine
- Many factors influence a person's decision about what is relevant, e.g., task, context, novelty, style
- Topical relevance (same topic) vs. user relevance (everything else)
Retrieval models define a view of relevance
- Ranking algorithms used in search engines are based on retrieval models
- Most models describe statistical properties of text rather than linguistic ones
  - i.e., counting simple text features such as words instead of parsing and analyzing the sentences
  - the statistical approach to text processing started with Luhn in the 50s
  - linguistic features can be part of a statistical model
26 Representation
Most successful approaches are statistical
- directly, or an effort to capture and use word probabilities
Why not natural language understanding?
- computer understands documents and query and matches them
- state of the art is brittle in unrestricted domains
- can be highly successful in predictable settings, though
  - information extraction on terrorism/takeovers (MUC)
  - medical or legal settings with restricted vocabulary
Could use manually assigned headings
- e.g., Library of Congress headings, Dewey Decimal headings
- expensive, and human agreement is not good
- hard to predict what headings are interesting
Statistical and not lexical
- count words
- lexical information plays a secondary role
27 Example: Bag of Words
- Ignoring the word order
- Popular and effective
- Similar vocabulary implies similar content
Consider reordering words in a headline:
- Random: beating takes points falling another Dow 355
- Alphabetical: 355 another beating Dow falling points
- "Interesting": Dow points beating falling 355 another
- Original: Dow takes another beating, falling 355 points
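A bag of words can be sketched as a word-count mapping. The snippet below is a minimal illustration (the `bag_of_words` helper is an assumed name) showing that the original headline and its random reordering yield the same bag, because word order is discarded.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word and number tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

original = "Dow takes another beating, falling 355 points"
shuffled = "beating takes points falling another Dow 355"

print(bag_of_words(original))
print(bag_of_words(original) == bag_of_words(shuffled))  # True
```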
28 What is this about?
Word counts for one document:
- 16: said
- 14: McDonalds
- 12: fat
- 11: fries
- 8: new
- 6: company, french, nutrition
- 5: food, oil, percent, reduce, taste, Tuesday
- 4: amount, change, health, Henstenburg, make, obesity
- 3: acids, consumer, fatty, polyunsaturated, US
- 2: amounts, artery, Beemer, cholesterol, clogging, director, down, eat, estimates, expert, fast, formula, impact, initiative, moderate, plans, restaurant, saturated, trans, win
- 1: ... added, addition, adults, advocate, affect, afternoon, age, Americans, Asia, battling, beef, bet, brand, Britt, Brook, Browns, calorie, center, chain, chemically, crispy, customers, cut, vegetable, weapon, weeks, Wendys, Wootan, worldwide, years, York
Copyright James Allan
29 The start of the original
McDonald's slims down spuds
Fast-food chain to reduce certain types of fat in its french fries with new cooking oil.
NEW YORK (CNN/Money) - McDonald's Corp. is cutting the amount of "bad" fat in its french fries nearly in half, the fast-food chain said Tuesday as it moves to make all its fried menu items healthier.
But does that mean the popular shoestring fries won't taste the same? The company says no. "It's a win-win for our customers because they are getting the same great french-fry taste along with an even healthier nutrition profile," said Mike Roberts, president of McDonald's USA.
But others are not so sure. McDonald's will not specifically discuss the kind of oil it plans to use, but at least one nutrition expert says playing with the formula could mean a different taste.
Shares of Oak Brook, Ill.-based McDonald's (MCD: down $0.54 to $23.22, Research, Estimates) were lower Tuesday afternoon. It was unclear Tuesday whether competitors Burger King and Wendy's International (WEN: down $0.80 to $34.91, Research, Estimates) would follow suit. Neither company could immediately be reached for comment.
Copyright James Allan
30 The Point?
Basis of most IR is a very simple approach
- find words in documents
- compare them to words in a query
- this approach is very effective!
Other types of features are often used
- phrases
- link structure
- named entities (people, locations, organizations)
- special features (chemical names, product names)
Focus is on improving accuracy and speed, and on extending ideas elsewhere
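The "find words in documents, compare them to words in a query" approach can be sketched with a tiny inverted index. This is an illustrative toy, not how Lucene is implemented; the document ids, texts, and function names are assumptions, and the score is a plain count of matching query terms.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    hits = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Highest count first; ties broken by document id for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "McDonald's cuts fat in french fries",
    "d2": "Dow takes another beating",
    "d3": "new cooking oil for fries",
}
index = build_index(docs)
print(search(index, "french fries oil"))  # [('d1', 2), ('d3', 2)]
```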
31 Comparison
Retrieval models
- provide a mathematical framework for defining the matching process
- include an explanation of assumptions
- are the basis of many ranking algorithms
- can be implicit
Some models that we will cover
- boolean
- vector space
- inference networks
- language models
- relevance models
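As a rough illustration of one of the listed models, the vector space model can be sketched with raw term-frequency vectors and cosine similarity. This is a simplification under stated assumptions: whitespace tokenization, no term weighting such as idf, and hypothetical helper names.

```python
import math
from collections import Counter

def tf_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query_vec = tf_vector("fries oil")
doc_a = tf_vector("new cooking oil for fries")
doc_b = tf_vector("dow falls 355 points")

# doc_a shares two terms with the query, doc_b shares none.
print(cosine(query_vec, doc_a))
print(cosine(query_vec, doc_b))
```

Ranking documents by this score in descending order gives a simple vector space retrieval function.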
32 Evaluation
- Experimental procedures and measures for comparing system output with user expectations
  - originated in the Cranfield experiments in the 60s
- IR evaluation methods are now used in many fields
- Typically use a test collection of documents, queries, and relevance judgments
  - the most commonly used are TREC collections
- Recall and precision are two examples of effectiveness measures
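Precision and recall for a single query can be computed from the retrieved set and the set of documents judged relevant. The sketch below uses made-up document ids and judgments purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical system output and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5: two of the four retrieved documents are relevant
print(recall)     # two of the three relevant documents were retrieved
```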
33 IR is not Search Engines
A search engine is the practical application of information retrieval techniques to large-scale text collections
Information Retrieval:
- Information needs: user interaction
- Relevance: effective ranking
- Representation: how to represent things
- Comparison: how to match things
- Evaluation: testing and measuring
Search Engines:
- Performance: efficient search and indexing
- Incorporating new data: coverage and freshness
- Scalability: growing with data and users
- Adaptability: tuning for applications
- Specific problems: e.g., spam
35 Dimensions of IR
- IR is not just for the Web
- IR is not just search
- 3 dimensions:
  - data
  - application/domain
  - task
36 Data
- Text
- Multiple languages: accessing a Chinese collection using English
- Scanned text (handwritten or typed): either word images or OCRed text with errors
- Images: features?
- Video: features?
- Speech (audio): ASR output (with errors)
- Music: features?
37 Application
- Web
- Enterprise: like web, but smaller, more focused, more controlled
- Desktop: smaller scale; different file formats; very user-centered
- Forums: shorter than web; threads; typos
- Social/Twitter: short; threads; typos
- P2P: distributed aspects
38 Application (continued)
- Literature: the original domain; cross-references; citations
- Legal: specific language; well-defined guidelines
- Medical: similar to legal; unusual vocabulary
- Personal Information Management (PIM): contacts and schedules
39 Tasks
- Search: collection is static, queries are dynamic
- Filtering & Routing: think newswire; query is static, documents are dynamic
- Detection & Tracking: newswire again; new topic discovery and tracking
- Classification & Clustering: grouping similar documents together for analysis
- Summarization: locating the most important pieces
- Question answering: factual information
- Collaborative: recommender systems (think Amazon reviews); multi-agent search
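The contrast between search and filtering can be sketched in code: in filtering, the query (a standing profile) stays fixed while documents arrive over time. The generator below is a minimal illustration with assumed names, whitespace tokenization, and made-up headlines.

```python
def filter_stream(profile_terms, stream, min_hits=1):
    """Yield each incoming document that shares at least min_hits terms
    with the fixed profile; documents are processed one at a time."""
    profile = {term.lower() for term in profile_terms}
    for document in stream:
        words = set(document.lower().split())
        if len(profile & words) >= min_hits:
            yield document

# Hypothetical standing profile and incoming newswire headlines.
profile = ["fries", "fat"]
newswire = [
    "McDonald's cuts fat in fries",
    "Dow falls 355 points",
    "New oil for french fries",
]

for headline in filter_stream(profile, newswire):
    print(headline)
```

Raising `min_hits` makes the filter stricter; with `min_hits=2` only the first headline in this toy stream passes.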
40 Dimensions of IR
Content:
- Text
- Multiple languages
- Scanned text (handwritten or typed)
- Images
- Video
- Speech (audio)
- Music
Applications:
- Web
- Enterprise
- Desktop
- Forum
- P2P
- Literature
- Legal
- PIM
Tasks:
- Search
- Filtering & Routing
- Detection & tracking
- Classification
- Question answering
- Summarization
- Collaborative
41 Assignment
- Watch the Knowledge Navigator video
- Think about how you would build such a system
  - what are the tasks the system performs?
  - what are the challenges?
- Write down
  - which of the IR dimensions that we mentioned are covered in the video?
  - which dimensions are covered that we did not mention?
43 Course Goals
- Understand what IR is
- Analyze core issues... and how they vary under different conditions...
- Consider core solutions... and how they can be applied under different conditions...
- Acquire some practical skills: how to apply that knowledge
44 Schedule
Core IR:
- search engines architecture
- text processing
- indexes
- retrieval models
- evaluation
- user modeling
Topics in IR:
- filtering
- multimedia: image & audio
- cross-lingual
- web search & advertising
- distributed & p2p
- question answering
- social
- semi-structured
45 Summary
- IR is a large interdisciplinary field with a long history
- IR deals with many different data types, applications, and tasks
- At the core of IR is the match or comparison operation
More informationWhat is Information Retrieval (IR)? Information Retrieval vs. Databases. What is Information Retrieval (IR)? Why Should I Know about All This?
What is Information Retrieval (IR)? Information Retrieval and Web Search Engines Lecture 1: Introduction November 5, 2008 Wolf-Tilo Balke with Joachim Selke Institut für Informationssysteme Technische
More informationTowards open-domain QA. Question answering. TReC QA framework. TReC QA: evaluation
Question ing Overview and task definition History Open-domain question ing Basic system architecture Watson s architecture Techniques Predictive indexing methods Pattern-matching methods Advanced techniques
More informationPrior Art Retrieval Using Various Patent Document Fields Contents
Prior Art Retrieval Using Various Patent Document Fields Contents Metti Zakaria Wanagiri and Mirna Adriani Fakultas Ilmu Komputer, Universitas Indonesia Depok 16424, Indonesia metti.zakaria@ui.edu, mirna@cs.ui.ac.id
More informationSEARCH TECHNIQUES: BASIC AND ADVANCED
17 SEARCH TECHNIQUES: BASIC AND ADVANCED 17.1 INTRODUCTION Searching is the activity of looking thoroughly in order to find something. In library and information science, searching refers to looking through
More informationApplying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task
Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,
More informationDepartment of Electronic Engineering FINAL YEAR PROJECT REPORT
Department of Electronic Engineering FINAL YEAR PROJECT REPORT BEngCE-2007/08-HCS-HCS-03-BECE Natural Language Understanding for Query in Web Search 1 Student Name: Sit Wing Sum Student ID: Supervisor:
More informationChapter 3: Google Penguin, Panda, & Hummingbird
Chapter 3: Google Penguin, Panda, & Hummingbird Search engine algorithms are based on a simple premise: searchers want an answer to their queries. For any search, there are hundreds or thousands of sites
More informationCOSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang
COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era Spring 2016 Instructor: Grace Hui Yang The Web provides abundant information which allows us to live more conveniently and
More informationInformation Management (IM)
1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;
More informationINFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE
15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find
More informationDocument Clustering for Mediated Information Access The WebCluster Project
Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationCSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601
CSC 261/461 Database Systems Fall 2017 MW 12:30 pm 1:45 pm CSB 601 Agenda Administrative aspects Brief overview of the course Introduction to databases and SQL ADMINISTRATIVE ASPECTS Teaching Staff Instructor:
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationINF 315E Introduction to Databases School of Information Fall 2015
INF 315E Introduction to Databases School of Information Fall 2015 Class Hours: Tuesday & Thursday10:30 am-12:00 pm Instructor: Eunyoung Moon Email: eymoon@utexas.edu Course Description Almost every website
More informationThis session will provide an overview of the research resources and strategies that can be used when conducting business research.
Welcome! This session will provide an overview of the research resources and strategies that can be used when conducting business research. Many of these research tips will also be applicable to courses
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Graph Data & Introduction to Information Retrieval Huan Sun, CSE@The Ohio State University 11/21/2017 Slides adapted from Prof. Srinivasan Parthasarathy @OSU 2 Chapter 4
More informationPowering Knowledge Discovery. Insights from big data with Linguamatics I2E
Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural
More informationLecture 27: Learning from relational data
Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission
More informationYork University at CLEF ehealth 2015: Medical Document Retrieval
York University at CLEF ehealth 2015: Medical Document Retrieval Andia Ghoddousi Jimmy Xiangji Huang Information Retrieval and Knowledge Management Research Lab Department of Computer Science and Engineering
More informationInformation Retrieval and Extraction
Information Retrieval and Extraction Berlin Chen (Picture from the TREC web site) Objectives of this Course Elaborate on the fundamentals of information retrieval (IR), a almost fifty-year-old discipline
More informationModelling Structures in Data Mining Techniques
Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval
More informationInformation Retrieval and Extraction
Information Retrieval and Extraction Berlin Chen (Picture from the TREC web site) Textbooks Textbook and References R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman,
More informationDatabases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016
+ Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html
More informationChapter 2. Architecture of a Search Engine
Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them
More informationInformation Retrieval: Retrieval Models
CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models
More informationInformation Retrieval. Information Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent
More informationText Mining: A Burgeoning technology for knowledge extraction
Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.
More informationCC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018
CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018 Lecture 1: Introduction Aidan Hogan aidhog@gmail.com THE VALUE OF DATA Soho, London, 1854 Cholera: What we know now Cholera: What we knew in 1854 1854:
More information