Modern Information Retrieval


Modern Information Retrieval
The Concepts and Technology behind Search

Ricardo Baeza-Yates
Berthier Ribeiro-Neto

Second edition

Addison-Wesley
Harlow, England; Reading, Massachusetts; Menlo Park, California; New York; Don Mills, Ontario; Amsterdam; Bonn; Sydney; Singapore; Tokyo; Madrid; San Juan; Milan; Mexico City; Seoul; Taipei


To Helena, Rosa, and our children

Amo los libros exploradores, libros con bosque o nieve, profundidad o cielo
Un libro, un libro lleno de contactos humanos, de camisas, un libro sin soledad, con hombres y herramientas, un libro es la victoria.
  de Oda al Libro (I) y (II), en Odas Elementales, Pablo Neruda

I love books that explore, books with a forest or snow, depth or sky
A book, a book full of human contacts, of shirts, a book without solitude, with people and tools, a book is the victory.
  from Ode to the Book (I) and (II), in Elemental Odes, Pablo Neruda

território de homens livres que será nosso país e será pátria de todos. Irmãos, cantai esse mundo que não verei, mas virá um dia, dentro de mil anos, talvez mais... não tenho pressa.
  de Cidade Prevista, no livro A Rosa do Povo, 1945, Carlos Drummond de Andrade

territory of free men that will be our country and will be the nation of all. Brothers, sing that world which I'll not see, but which will come one day, in a thousand years, maybe more... no hurry.
  from Prevised City, in the book The Rose of the People, Carlos Drummond de Andrade


Contents

Preface to the Second Edition
Preface to the First Edition
Authors' Acknowledgements to the Second Edition
Authors' Acknowledgements to the First Edition
Publisher's Acknowledgements

1 Introduction
  Information Retrieval
  Early Developments
  Information Retrieval in Libraries and Digital Libraries
  IR at the Center of the Stage
  The IR Problem
  The User's Task
  Information versus Data Retrieval
  The IR System
  Software Architecture of the IR System
  The Retrieval and Ranking Processes
  The Web
  A Brief History
  The e-publishing Era
  How the Web Changed Search
  Practical Issues on the Web
  Organization of the Book
  Focus of the Book
  Book Contents
  The Book Web Site: A Teaching Resource
  Bibliographic Discussion

2 User Interfaces for Search, by Marti Hearst
  Introduction
  How People Search
  Information Lookup versus Exploratory Search
  Classic versus Dynamic Model of Information Seeking
  Navigation versus Search
  Observations of the Search Process
  Search Interfaces Today
  Getting Started
  Query Specification
  Query Specification Interfaces
  Retrieval Results Display
  Query Reformulation
  Organizing Search Results
  Visualization in Search Interfaces
  Visualizing Boolean Syntax
  Visualizing Query Terms within Retrieval Results
  Visualizing Relationships Among Words and Documents
  Visualization for Text Mining
  Design and Evaluation of Search Interfaces
  Trends and Research Issues
  Bibliographic Discussion

3 Modeling
  IR Models
  Modeling and Ranking
  Characterization of an IR Model
  A Taxonomy of IR Models
  Classic Information Retrieval
  Basic Concepts
  The Boolean Model
  Term Weighting
  TF-IDF Weights
  Document Length Normalization
  The Vector Model
  The Probabilistic Model
  Brief Comparison of Classic Models
  Alternative Set Theoretic Models
  Set-Based Model
  Extended Boolean Model
  Fuzzy Set Model
  Alternative Algebraic Models
  Generalized Vector Space Model
  Latent Semantic Indexing Model
  Neural Network Model
  Alternative Probabilistic Models
  BM25
  Language Models
  Divergence from Randomness
  Bayesian Network Models
  Other Models
  The Hypertext Model
  Web-based Models
  Structured Text Retrieval
  Multimedia Retrieval
  Enterprise and Vertical Search
  Trends and Research Issues
  Bibliographic Discussion

4 Retrieval Evaluation
  Introduction
  The Cranfield Paradigm
  A Brief History
  Reference Collections
  Retrieval Metrics
  Precision and Recall
  Single Value Summaries: P@n, MAP, MRR, F
  User-Oriented Measures
  DCG: Discounted Cumulated Gain
  BPREF: Binary Preferences
  Rank Correlation Metrics
  Reference Collections
  The TREC Collections
  Other Reference Collections
  Other Small Test Collections
  User-Based Evaluation
  Human Experimentation in the Lab
  Side-by-Side Panels
  A/B Testing
  Crowdsourcing
  Evaluation using Clickthrough Data
  Practical Caveats
  Trends and Research Issues
  Bibliographic Discussion

5 Relevance Feedback and Query Expansion
  Introduction
  A Framework for Feedback Methods
  Explicit Relevance Feedback
  Relevance Feedback for the Vector Model: Rocchio Method
  Relevance Feedback for the Probabilistic Model
  Evaluation of Relevance Feedback
  Explicit Feedback Through Clicks
  Eye Tracking and Relevance Judgements
  User Behavior
  Clicks as a Metric of User Preferences
  Implicit Feedback Through Local Analysis
  Implicit Feedback Through Local Clustering
  Implicit Feedback Through Local Context Analysis
  Implicit Feedback Through Global Analysis
  Query Expansion based on a Similarity Thesaurus
  Query Expansion based on a Statistical Thesaurus
  Trends and Research Issues
  Bibliographic Discussion

6 Documents: Languages & Properties, with Gonzalo Navarro and Nivio Ziviani
  Introduction
  Metadata
  Document Formats
  Text
  Multimedia
  Graphics and Virtual Reality
  Markup Languages
  SGML
  HTML
  XML
  RDF: Resource Description Framework
  HyTime
  Text Properties
  Information Theory
  Modeling Natural Language
  Text Similarity
  Document Preprocessing
  Lexical Analysis of the Text
  Elimination of Stopwords
  Stemming
  Keyword Selection
  Thesauri
  Organizing Documents
  Taxonomies
  Folksonomies
  Text Compression
  Basic Concepts
  Statistical Methods
  Statistical Methods: Modeling
  Statistical Methods: Coding
  Dictionary Methods
  Preprocessing for Compression
  Comparing Text Compression Techniques
  Structured Text Compression
  Trends and Research Issues
  Bibliographical Discussion

7 Queries: Languages & Properties, with Gonzalo Navarro
  Query Languages
  Keyword-Based Querying
  Beyond Keywords
  Structural Queries
  Query Protocols
  Query Properties
  Characterizing Web Queries
  User Search Behavior
  Query Intent
  Query Topic
  Query Sessions and Missions
  Query Difficulty
  Trends and Research Issues
  Bibliographical Discussion

8 Text Classification, with Marcos Gonçalves
  Introduction
  A Characterization of Text Classification
  Machine Learning
  The Text Classification Problem
  Text Classification Algorithms
  Unsupervised Algorithms
  Clustering
  Naive Text Classification
  Supervised Algorithms
  Decision Trees
  The k-nn Classifier
  The Rocchio Classifier
  Probabilistic Naive Bayes Document Classification
  The SVM Classifier
  Ensemble Classifiers
  Final Remarks on Supervised Algorithms
  Feature Selection or Dimensionality Reduction
  Term Class Incidence Table
  Term Document Frequency
  TF-IDF Weights
  Mutual Information
  Information Gain
  Chi Square
  Impact of Feature Selection
  Evaluation Metrics
  Contingency Table
  Accuracy and Error
  Precision and Recall
  F-measure and F1
  Cross-Validation
  Standard Collections
  Organizing the Classes
  Building Taxonomies
  Trends and Research Issues
  Bibliographic Discussion

9 Indexing and Searching, with Gonzalo Navarro
  Introduction
  Inverted Indexes
  Basic Concepts
  Full Inverted Indexes
  Searching
  Ranking
  Construction
  Compressed Inverted Indexes
  Structural Queries
  Signature Files
  Suffix Trees and Suffix Arrays
  Structure: Tries and Suffix Trees
  Searching for Simple Strings
  Searching for Complex Patterns
  Construction
  Compressed Suffix Arrays
  Sequential Searching
  Simple Strings: Horspool
  Complex Patterns: Automata and Bit-Parallelism
  Faster Bit-Parallel Algorithms
  Regular Expressions
  Multiple Patterns
  Approximate Searching
  Searching Compressed Text
  Multi-dimensional Indexing
  Trends and Research Issues
  Bibliographic Discussion

10 Parallel and Distributed IR, with Eric Brown
  Introduction
  A Taxonomy of Distributed IR Systems
  Data Partitioning
  Collection Partitioning
  Collection Selection
  Inverted Index Partitioning
  Partitioning other Indexes
  Parallel IR
  Introduction
  Parallel IR on MIMD Architectures
  Parallel IR on SIMD Architectures
  Cluster-based IR
  Distributed IR
  Introduction
  Indexing
  Query Processing
  Web Issues
  Federated Search
  Retrieval in Peer-to-Peer Networks
  Trends and Research Issues
  Bibliographic Discussion

11 Web Retrieval, with Yoelle Maarek
  Introduction
  A Challenging Problem
  The Web
  Characteristics
  Structure of the Web Graph
  Modeling the Web
  Link Analysis
  Search Engine Architectures
  Basic Architecture
  Cluster-based Architecture
  Caching
  Multiple Indexes
  Distributed Architectures
  Search Engine Ranking
  Ranking Signals
  Link-based Ranking
  Simple Ranking Functions
  Learning to Rank
  Learning the Ranking Function
  Quality Evaluation
  Web Spam
  Managing Web Data
  Assigning Identifiers to Documents
  Metadata
  Compressing the Web Graph
  Handling Duplicated Data
  Search Engine User Interaction
  The Search Rectangle Paradigm
  The Search Engine Result Page
  Educating the User
  Browsing
  Flat Browsing
  Structure Guided Browsing and Web Directories
  Beyond Browsing
  Hypertext and the Web
  Combining Searching with Browsing
  Web Query Languages
  Dynamic Search
  Related Problems
  Computational Advertising
  Web Mining
  Metasearch
  Trends and Research Issues
  Beyond Static Text Data
  Current Challenges
  Bibliographical Discussion

12 Web Crawling, with Carlos Castillo
  Introduction
  Applications of a Web Crawler
  General Web Search
  Topical Crawling
  Web Characterization
  Mirroring
  Web Site Analysis
  A Taxonomy of Crawlers
  Types of Web Pages
  Architecture and Implementation
  Crawler Architecture
  Practical Issues
  Parallel Crawling
  Scheduling Algorithms
  Selection Policy
  Revisit Policy
  Politeness Policy
  Combining Policies
  Evaluation
  Evaluating Network Usage
  Evaluating Long-term Scheduling
  Trends and Research Issues
  Crawling the Hidden Web
  Crawling with the Help of Web Sites
  Distributed Crawling
  Bibliographic Discussion

13 Structured Text Retrieval, with Mounia Lalmas
  Introduction
  Structuring Power
  Explicit vs. Implicit Structure
  Static vs. Dynamic Structure
  Single Hierarchy vs. Multiple Hierarchies
  Early Text Retrieval Models
  Model Based on Non-Overlapping Lists
  Model Based on Proximal Nodes
  Ranking Structured Text Results
  XML Retrieval
  Challenges in XML Retrieval
  Indexing Strategies
  Ranking Strategies
  Removing Overlaps
  XML Retrieval Evaluation
  Document Collections
  Topics
  Retrieval Tasks
  Relevance Measures
  Query Languages
  Characteristics
  Classification of XML Query Languages
  Examples of XML Query Languages
  Trends and Research Issues
  Bibliographic Discussion

14 Multimedia Information Retrieval, by Dulce Ponceleón and Malcolm Slaney
  Introduction
  What is Multimedia?
  Multimedia IR
  Text IR versus Multimedia IR
  The Challenges
  The Semantic Gap
  Feature Ambiguity
  Machine-generated Data
  Content-based Image Retrieval
  Color-Based Retrieval
  Texture
  Salient Points
  Audio and Music Retrieval
  Fingerprinting
  Speech Recognition
  Speaker Identification
  Spoken Document Retrieval
  Audio Basics
  Retrieving and Browsing Video
  Video Abstracts
  Static Summaries
  Mosaics and Salient Stills
  Dynamic Summaries
  Interactive Summaries
  Visual vs. Audio Browsing
  Evaluating Summaries
  Fusion Models: Combining it All
  Naming Faces
  Naming Images
  Naming Audio
  Combining Audio and Video for AVSR
  Combining Audio and Video for Multimedia Segmentation
  A Video Segmentation Example
  Segmentation Schemes for Video
  Video Segmentation with Edges
  Speech Segmentation
  Segmentation Evaluation
  Compression and MPEG Standards
  Intensity and Sampling
  Color
  Lossy Compression
  Lossless Compression
  Temporal Redundancy
  Motion Prediction
  MPEG Standards
  Trends and Research Issues
  Bibliographic Discussion

15 Enterprise Search, by David Hawking
  Introduction
  Characteristics and Applications of Enterprise Search
  Enterprise Search Software
  Workplace Search
  Enterprise Search Tasks
  Examples of Search-Supported Tasks
  Search Types
  Studying Enterprise Search
  Architecture of Enterprise Search Systems
  Gathering
  Extracting
  Indexing
  Indexing Textual Annotations
  Query Processing
  Presentation of Search Results
  Security Models
  Federation/Metasearch
  Enterprise Search Evaluation
  Published Test Collections for Enterprise Search
  Internal Enterprise Search Evaluations
  Enterprise Search Tuning
  What is it Reasonable to Expect?
  Potential Reasons for Dissatisfaction
  Context and Personalization
  Controls and Levers for Contextualization
  Contextualization: Local, Enterprise or Global?
  Privacy of Profiles
  Defining, Creating and Maintaining a Profile
  User Modeling
  Implicit Measures
  Information Filtering
  Social Recommender Systems
  Trends and Research Issues
  Bibliographic Discussion

16 Library Systems, by Edie Rasmussen
  The Information Environment in the Library
  Online Public Access Catalogues
  OPACs and Bibliographic Records
  Information Retrieval from the ILS
  Integrating the Hybrid Library
  OPACs and End Users
  ILS: Vendors and Products
  IR Systems and Document Databases
  Bibliographic and Full-text Databases
  Content of Database Records
  The Online Industry: Database Vendors
  Information Retrieval from Document Databases
  Information Retrieval in Organizations
  Trends and Research Issues
  Bibliographic Discussion

17 Digital Libraries, by Marcos Gonçalves
  Introduction
  Defining Digital Libraries
  A General Architecture
  Fundamentals
  Digital Objects and Collections
  Metadata and Catalogs
  Repositories/Archives
  Services
  Social-Economical Issues
  Social Issues
  Economical Issues
  Software Systems
  Greenstone
  Eprints
  DSpace
  Fedora
  Open Digital Libraries
  The 5S Suite
  DL Case Studies
  The Networked DL of Theses and Dissertations
  The National Science Digital Library
  The ETANA-DL Archaeological Digital Library
  Trends and Research Issues
  Evaluation
  Integration
  Other Research Challenges
  Bibliographic Discussion

A Open Source Search Engines, with Christian Middleton
  A.1 Introduction
  A.2 Search Engines
    A.2.1 Preliminary Selection of Search Engines
    A.2.2 Features
    A.2.3 Evaluation
  A.3 Methodology
    A.3.1 Document Collections
    A.3.2 Evaluation Tests
    A.3.3 Experimental Setup
  A.4 Experimental Results
    A.4.1 Test A: Indexing
    A.4.2 Test B: Incremental Indexing
    A.4.3 Test C: Search Performance
    A.4.4 Global Evaluation
  A.5 Conclusions

B Biographies

References


More information

Mahout in Action MANNING ROBIN ANIL SEAN OWEN TED DUNNING ELLEN FRIEDMAN. Shelter Island

Mahout in Action MANNING ROBIN ANIL SEAN OWEN TED DUNNING ELLEN FRIEDMAN. Shelter Island Mahout in Action SEAN OWEN ROBIN ANIL TED DUNNING ELLEN FRIEDMAN II MANNING Shelter Island contents preface xvii acknowledgments about this book xx xix about multimedia extras xxiii about the cover illustration

More information

INFORMATION HIDING IN COMMUNICATION NETWORKS

INFORMATION HIDING IN COMMUNICATION NETWORKS 0.8125 in Describes information hiding in communication networks, and highlights its important issues, challenges, trends, and applications. Highlights development trends and potential future directions

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

Information Retrieval Spring Web retrieval

Information Retrieval Spring Web retrieval Information Retrieval Spring 2016 Web retrieval The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically infinite due to the dynamic

More information

Beyond Ten Blue Links Seven Challenges

Beyond Ten Blue Links Seven Challenges Beyond Ten Blue Links Seven Challenges Ricardo Baeza-Yates VP of Yahoo! Research for EMEA & LatAm Barcelona, Spain Thanks to Andrei Broder, Yoelle Maarek & Prabhakar Raghavan Agenda Past and Present Wisdom

More information

The Power of Events. An Introduction to Complex Event Processing in Distributed Enterprise Systems. David Luckham

The Power of Events. An Introduction to Complex Event Processing in Distributed Enterprise Systems. David Luckham The Power of Events An Introduction to Complex Event Processing in Distributed Enterprise Systems David Luckham AAddison-Wesley Boston San Francisco New York Toronto Montreal London Munich Paris Madrid

More information

How to Build a Digital Library

How to Build a Digital Library How to Build a Digital Library Ian H. Witten & David Bainbridge Contents Preface Acknowledgements i iv 1. Orientation: The world of digital libraries 1 One: Supporting human development 1 Two: Pushing

More information

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen

More information

[Contents. Sharing. sqlplus. Storage 6. System Support Processes 15 Operating System Files 16. Synonyms. SQL*Developer

[Contents. Sharing. sqlplus. Storage 6. System Support Processes 15 Operating System Files 16. Synonyms. SQL*Developer ORACLG Oracle Press Oracle Database 12c Install, Configure & Maintain Like a Professional Ian Abramson Michael Abbey Michelle Malcher Michael Corey Mc Graw Hill Education New York Chicago San Francisco

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

Ajloun National University

Ajloun National University Study Plan Guide for the Bachelor Degree in Computer Information System First Year hr. 101101 Arabic Language Skills (1) 101099-01110 Introduction to Information Technology - - 01111 Programming Language

More information

Essentials of Database Management

Essentials of Database Management Essentials of Database Management Jeffrey A. Hoffer University of Dayton Heikki Topi Bentley University V. Ramesh Indiana University PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea What is this course about? Processing Indexing Retrieving textual data (or audio, video, geo-spatial,, data) Fits in four

More information

Computers as Components Principles of Embedded Computing System Design

Computers as Components Principles of Embedded Computing System Design Computers as Components Principles of Embedded Computing System Design Third Edition Marilyn Wolf ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY

More information

Summary of Contents LIST OF FIGURES LIST OF TABLES

Summary of Contents LIST OF FIGURES LIST OF TABLES Summary of Contents LIST OF FIGURES LIST OF TABLES PREFACE xvii xix xxi PART 1 BACKGROUND Chapter 1. Introduction 3 Chapter 2. Standards-Makers 21 Chapter 3. Principles of the S2ESC Collection 45 Chapter

More information

Real-Time Systems and Programming Languages

Real-Time Systems and Programming Languages Real-Time Systems and Programming Languages Ada, Real-Time Java and C/Real-Time POSIX Fourth Edition Alan Burns and Andy Wellings University of York * ADDISON-WESLEY An imprint of Pearson Education Harlow,

More information

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17 Table of Contents 1 Introduction...1 1.1 Common Problem...1 1.2 Data Integration and Data Management...3 1.2.1 Information Quality Overview...3 1.2.2 Customer Data Integration...4 1.2.3 Data Management...8

More information

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City The Complete Reference Christopher Adamson Mc Grauu LlLIJBB New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents Acknowledgments

More information

Chapter 3 - Text. Management and Retrieval

Chapter 3 - Text. Management and Retrieval Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 3 - Text Management and Retrieval Literature: Baeza-Yates, R.;

More information

Machine Learning in Action

Machine Learning in Action Machine Learning in Action PETER HARRINGTON Ill MANNING Shelter Island brief contents PART l (~tj\ssification...,... 1 1 Machine learning basics 3 2 Classifying with k-nearest Neighbors 18 3 Splitting

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming F 'C 3 R'"'C,_,. HO!.-IJJ () An Introduction to Parallel Programming Peter S. Pacheco University of San Francisco ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Empowering People with Knowledge the Next Frontier for Web Search Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Important Trends for Web Search Organizing all information Addressing user

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

Automatic Identification of User Goals in Web Search [WWW 05]

Automatic Identification of User Goals in Web Search [WWW 05] Automatic Identification of User Goals in Web Search [WWW 05] UichinLee @ UCLA ZhenyuLiu @ UCLA JunghooCho @ UCLA Presenter: Emiran Curtmola@ UC San Diego CSE 291 4/29/2008 Need to improve the quality

More information

Outline. Lecture 3: EITN01 Web Intelligence and Information Retrieval. Query languages - aspects. Previous lecture. Anders Ardö.

Outline. Lecture 3: EITN01 Web Intelligence and Information Retrieval. Query languages - aspects. Previous lecture. Anders Ardö. Outline Lecture 3: EITN01 Web Intelligence and Information Retrieval Anders Ardö EIT Electrical and Information Technology, Lund University February 5, 2013 A. Ardö, EIT Lecture 3: EITN01 Web Intelligence

More information

Structured Parallel Programming Patterns for Efficient Computation

Structured Parallel Programming Patterns for Efficient Computation Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Information Search and Retrieval System in Libraries

Information Search and Retrieval System in Libraries Information Search and Retrieval System in Libraries N Rupsing Naik A Madhava Rao Abstract A digital library comprises diverse collections of digital objects representing text, sound, maps, videos, photos,

More information

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

COMPUTER AND ROBOT VISION

COMPUTER AND ROBOT VISION VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington T V ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Third Edition Rafael C. Gonzalez University of Tennessee Richard E. Woods MedData Interactive PEARSON Prentice Hall Pearson Education International Contents Preface xv Acknowledgments

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections

More information

Mathematics Shape and Space: Polygon Angles

Mathematics Shape and Space: Polygon Angles a place of mind F A C U L T Y O F E D U C A T I O N Department of Curriculum and Pedagogy Mathematics Shape and Space: Polygon Angles Science and Mathematics Education Research Group Supported by UBC Teaching

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

More information

PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

More information

WebSci and Learning to Rank for IR

WebSci and Learning to Rank for IR WebSci and Learning to Rank for IR Ernesto Diaz-Aviles L3S Research Center. Hannover, Germany diaz@l3s.de Ernesto Diaz-Aviles www.l3s.de 1/16 Motivation: Information Explosion Ernesto Diaz-Aviles

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information