INFORMATION RETRIEVAL: SEARCHING IN THE 21ST CENTURY

Ayşe Göker, City University London, UK
John Davies, BT, UK

A John Wiley and Sons, Ltd., Publication

This edition first published John Wiley & Sons, Ltd. Except for Chapter 3, Multimedia Resource Discovery, © 2009 Stefan Rüger.

Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website.

The right of the authors to be identified as the authors of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Information retrieval : searching in the 21st century / [edited by] Ayşe Göker, John Davies.
p. cm.
Includes bibliographical references and index.
ISBN
Information retrieval. I. Göker, Ayşe. II. Davies, J. (N. John)
ZA3075.I dc

A catalogue record for this book is available from the British Library.

ISBN: (H/B)

Set in 9/11 Times New Roman by Laserwords Private Ltd, Chennai, India.
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire.

Dedication

I would like to thank my husband, parents and wider family for all their support to my efforts to reach my full potential. This book is one of the tangible outcomes of this process. My father, in particular, would have loved to have seen it.

Ayşe Göker

Contents

Foreword
Preface
About the Editors
List of Contributors
Introduction

1 Information Retrieval Models
Djoerd Hiemstra
1.1 Introduction; Terminology; What is a model?; Outline; Exact Match Models; The Boolean model; Region models; Discussion; Vector Space Approaches; The vector space model; Positioning the query in vector space; Term weighting and other caveats; Probabilistic Approaches; The probabilistic indexing model; The probabilistic retrieval model; The 2-Poisson model; Bayesian network models; Language models; Google's PageRank model; Summary and Further Reading; Exercises; References

2 User-centred Evaluation of Information Retrieval Systems
Pia Borlund
2.1 Introduction; Background; Chapter outline; The MEDLARS Test; Description of Lancaster's test of MEDLARS; Evaluation characteristics; The Okapi Project; The objectives of Okapi; Okapi at TREC; The impact of Okapi; The Interactive IR Evaluation Model; The cognitive IR approach; The three parts of the IIR evaluation model; Summary; Exercises; References

3 Multimedia Resource Discovery
Stefan Rüger
3.1 Introduction; Basic Multimedia Search Technologies; Piggy-back text retrieval; Automated annotation; Content-based retrieval; Fingerprinting; Challenges of Automated Visual Indexing; Added Services; Video summaries; New paradigms in information visualisation; Visual search and relevance feedback; Browsing: Lateral and Geotemporal; Summary; Exercises; References

4 Image Users' Needs and Searching Behaviour
Stina Westman
4.1 Introduction; Image Attributes and Users' Needs; Image attributes; Image attributes in queries; Attributes beyond queries; Image needs; Image Searching Behaviour; Search process; Search strategies; Relevance criteria; New Directions for Image Access; Social tagging; Images in context; Visualisations; Workspaces; Summary; Exercise; References

5 Web Information Retrieval
Nick Craswell and David Hawking
5.1 Introduction; Distinctive Characteristics of the Web; Web data; Web structure; User behaviour; User interaction data; Three Ranking Problems; Retrieval; Selective crawling; Index organisation; Other Web IR Issues; Stemming; Treatment of near-duplicate content; Spelling suggestions; Spam rejection; Adult content filtering; Genre classification; Query-targeted advertisement generation; Snippet generation; Context and web information retrieval; Evaluation of Web Search Effectiveness; TREC-9 Web Track: Realistic queries, rich link structure, traditional IR task; Evaluation using web-specific tasks; Future directions for web IR evaluations; Comparing results lists in context; Evaluation by commercial web search companies; Summary; Exercises; References

6 Mobile Search
David Mountain, Hans Myrhaug and Ayşe Göker
6.1 Introduction: Mobile Search Why Now?; Technological drivers; Predicted demand for mobile search; Information for Mobile Search; Linking information to physical space; The storage of information; The ownership of information; Designing for Mobile Search; Characteristics of mobile usage; Filters as a framework for mobile search; Manual versus automatic filtering; Using filters to push information; Case Studies; Oslo airport AmbieSense; The Swiss Alps WebPark; Summary; Exercises; References

7 Context and Information Retrieval
Ayşe Göker, Hans Myrhaug and Ralf Bierig
7.1 Introduction; What is Context?; Whose context?; Context in Information Retrieval; Context in the wider sense; Perceptions of context in related fields; Example: context and images; Context Modelling and Representation; Context modelling; User models and their relationship to context; Past, present and future contexts; Context and Content; Representation of context; Capturing context; Searching with context information; Context templates; Related Topics; Personalisation and context; Mobility and context; Evaluating Context-aware IR Systems; Principles of methodology; Summary; Exercises; References

8 Text Categorisation and Genre in Information Retrieval
Stuart Watt
8.1 Introduction: What is Text Categorisation?; Purpose of categorisation; How to Build a Text Categorisation System; The classifier component; The machine learning component; The feature selection component; Evaluating Text Categorisation Systems; Genre: Text Structure and Purpose; An overview of genre; Text categorisation and genre; The importance of layout; Related Techniques: Information Filtering; Applications of Text Categorisation; Summary and the Future of Text Categorisation; Exercises; References

9 Semantic Search
John Davies, Alistair Duke and Atanas Kiryakov
9.1 Introduction; Limitations of current search technology; Ontologies; Knowledge bases and semantic repositories; 9.2 Semantic Web; Semantic web and semantic search; Basic semantic web standards: RDF(S) and OWL; Metadata and Annotations; Semantic Annotations: the Fibres of the Semantic Web; Semantic Annotation of Named Entities; Named entities; Semantic annotation model and representation; Semantic Indexing and Retrieval; Indexing with respect to lexical concepts; Indexing with respect to named entities; Retrieval as spreading activation over semantic network; Semantic Search Tools; Searching through document-level RDF annotations QuizRDF; Exploiting massive background knowledge TAP; Character-level annotations and massive world knowledge KIM; Squirrel; Other approaches; Summary; Exercises; References

10 The Role of Natural Language Processing in Information Retrieval: Searching for Meaning and Structure
Tony Russell-Rose and Mark Stevenson
10.1 Introduction; Natural Language Processing Techniques; Named entity recognition; Information extraction; WordNet; Word sense disambiguation; Evaluation; Applications of Natural Language Processing in Information Retrieval; Text mining; Question answering; Discussion; Summary; Exercises; References

11 Cross-Language Information Retrieval
Daqing He and Jianqiang Wang
11.1 Introduction; Major Approaches and Challenges in CLIR; Identifying Translation Units; Tokenisation; Stemming; Phrase identification; Stop-words; Obtaining Translation Knowledge; Obtaining bilingual dictionaries and corpora; Extracting translation knowledge; Dealing with out-of-vocabulary terms; Using Translation Knowledge; Translation disambiguation; Weighting translation alternatives; Using translation probabilities in term weighting; Interactivity in CLIR; Interactive CLIR; Query translation in interactive CLIR; Document selection in interactive CLIR; Evaluation of CLIR Systems; Cranfield-based evaluation framework; Evaluations on interactive CLIR; Current CLIR evaluation frameworks; Summary and Future Directions; Current achievements in CLIR; Future directions for CLIR; Further reading; Exercises; References

12 Performance Issues in Parallel Computing for Information Retrieval
Andrew MacFarlane
12.1 Introduction; Why Parallel IR?; Review of Previous Work; Distribution Methods for Inverted File Data; On-the-fly distribution; Inverted file replication; Inverted file partitioning; Tasks in Information Retrieval; The indexing task; The probabilistic search task; The passage retrieval task; The routing/filtering task; The index update task; A Synthetic Model of Performance for Parallel Information Retrieval; Empirical Examination of Synthetic Model; Comparative results using indexing models; Comparative results using search models; Comparative results using passage retrieval models; Comparative results using term selection models; Comparative results using index update model; Summary and Further Research; Exercises; References

Solutions to Exercises
Index

Foreword

In the forty years since I started working in the field, and indeed for some years before that (almost since Calvin Mooers coined the term 'information storage and retrieval' in the 1950s), there have been a significant number of books on information retrieval. Even if we ignore the more specialist research monographs and the readers of previously published papers, I can find on my shelves or in my mental library many books that attempt (probably with the IR student in mind) to construct a coherent and systematic way of defining and presenting information retrieval as a field of study and of application.

Often such a book is the work of a single author, or perhaps a pair working together. Such works can clearly have an advantage in respect of coherence; the field is necessarily presented from a single viewpoint. On the other hand, they can also suffer for the same reason. The IR field is rich (more so now than it has ever been), and it is difficult within a single viewpoint to do justice to this richness. Readers, on the other hand, have to be constructed out of the materials to hand: the published papers, each of which has taken its own view, probably with a much narrower field of vision, and different from that of the other chosen papers.

The present book attempts the tricky task of combining the breadth of vision of multiple authors with the coherence of a single integrated work.

The richness of the field is apparent in the range of chapters: from formal mathematical modelling to user context, from parallel computation to semantic search. The topics covered also vary greatly in their historical association with the field. Categorisation, for example, has been around as an IR technique for quite a long time, though Stuart Watt brings a new perspective. Mobile search (David Mountain, Hans Myrhaug and Ayşe Göker), however, is a relatively recent development. The use of formal models (information retrieval models, Djoerd Hiemstra) goes back almost to the beginning, as does experimental evaluation (user-centred evaluation of information retrieval systems, Pia Borlund), though in both cases there have been huge changes in the past decade.

This same decade has witnessed the huge growth of the World Wide Web, and the developing dominance of web search engines (web information retrieval, Nick Craswell and David Hawking) as the glue which holds the web together. For many people today, IR is web search. It is true that there has been a huge amount of influence in both directions: search engines are largely based on techniques from both the IR research community and from previous operational systems, while IR research and practice in other environments has learnt a great deal from the forcing-house that is the web search space. This dominance of the web as the domain of interest is well reflected in many of the chapters in the present volume.

It is important, however, to remember that IR is not all about web search, and that the web space presents both problems and opportunities which differ from those in other domains. The desktop, the enterprise, specialist collections such as scientific papers are all examples of different domains for which search functionality is a fundamental requirement. There are references to several of these throughout the book, but specific domains with their own chapters are multimedia resource discovery (Stefan Rüger) and image users' needs and searching behaviour (Stina Westman). The user theme is taken further in context and information retrieval (Ayşe Göker, Hans Myrhaug and Ralf Bierig). More generic problem areas are addressed in cross-language information retrieval (Daqing He and Jianqiang Wang), in semantic search (John Davies, Alistair Duke and Atanas Kiryakov) and in the

INFORMATION RETRIEVAL: SEARCHING IN THE 21ST CENTURY

INFORMATION RETRIEVAL: SEARCHING IN THE 21ST CENTURY INFORMATION RETRIEVAL: SEARCHING IN THE 21ST CENTURY Ayşe Göker City University London, UK John Davies BT, UK A John Wiley and Sons, Ltd., Publication INFORMATION RETRIEVAL INFORMATION RETRIEVAL: SEARCHING

More information

FUZZY LOGIC WITH ENGINEERING APPLICATIONS

FUZZY LOGIC WITH ENGINEERING APPLICATIONS FUZZY LOGIC WITH ENGINEERING APPLICATIONS Third Edition Timothy J. Ross University of New Mexico, USA A John Wiley and Sons, Ltd., Publication FUZZY LOGIC WITH ENGINEERING APPLICATIONS Third Edition FUZZY

More information

COMPUTATIONAL DYNAMICS

COMPUTATIONAL DYNAMICS COMPUTATIONAL DYNAMICS THIRD EDITION AHMED A. SHABANA Richard and Loan Hill Professor of Engineering University of Illinois at Chicago A John Wiley and Sons, Ltd., Publication COMPUTATIONAL DYNAMICS COMPUTATIONAL

More information

Exploiting Distributed Resources in Wireless, Mobile and Social Networks Frank H. P. Fitzek and Marcos D. Katz

Exploiting Distributed Resources in Wireless, Mobile and Social Networks Frank H. P. Fitzek and Marcos D. Katz MOBILE CLOUDS Exploiting Distributed Resources in Wireless, Mobile and Social Networks Frank H. P. Fitzek and Marcos D. Katz MOBILE CLOUDS MOBILE CLOUDS EXPLOITING DISTRIBUTED RESOURCES IN WIRELESS,

More information

Inside Symbian SQL. Lead Authors Ivan Litovski with Richard Maynard. Head of Technical Communications, Symbian Foundation Jo Stichbury

Inside Symbian SQL. Lead Authors Ivan Litovski with Richard Maynard. Head of Technical Communications, Symbian Foundation Jo Stichbury Inside Symbian SQL A Mobile Developer s Guide to SQLite Lead Authors Ivan Litovski with Richard Maynard With James Aley, Philip Cheung, James Clarke, Lorraine Martin, Philip Neal, Mike Owens, Martin Platts

More information

SDH/SONET Explained in Functional Models

SDH/SONET Explained in Functional Models SDH/SONET Explained in Functional Models Modeling the Optical Transport Network Huub van Helvoort Networking Consultant, the Netherlands SDH/SONET Explained in Functional Models SDH/SONET Explained in

More information

Next Generation Networks Perspectives and Potentials. Dr Jingming Li Salina LiSalina Consulting, Switzerland Pascal Salina Swisscom SA, Switzerland

Next Generation Networks Perspectives and Potentials. Dr Jingming Li Salina LiSalina Consulting, Switzerland Pascal Salina Swisscom SA, Switzerland Next Generation Networks Perspectives and Potentials Dr Jingming Li Salina LiSalina Consulting, Switzerland Pascal Salina Swisscom SA, Switzerland Next Generation Networks Next Generation Networks Perspectives

More information

QoS OVER HETEROGENEOUS NETWORKS

QoS OVER HETEROGENEOUS NETWORKS QoS OVER HETEROGENEOUS NETWORKS Mario Marchese Department of Communications, Computer and System Science University of Genoa, Italy QoS OVER HETEROGENEOUS NETWORKS QoS OVER HETEROGENEOUS NETWORKS Mario

More information

Speech in Mobile and Pervasive Environments

Speech in Mobile and Pervasive Environments Speech in Mobile and Pervasive Environments Wiley Series on Wireless Communications and Mobile Computing Series Editors: Dr Xuemin (Sherman) Shen, University of Waterloo, Canada Dr Yi Pan, Georgia State

More information

SHORT MESSAGE SERVICE (SMS)

SHORT MESSAGE SERVICE (SMS) SHORT MESSAGE SERVICE (SMS) THE CREATION OF PERSONAL GLOBAL TEXT MESSAGING Friedhelm Hillebrand (Editor) Hillebrand & Partners, Germany Finn Trosby Telenor, Norway Kevin Holley Telefónica Europe, UK Ian

More information

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation THE KLUWER INTERNATIONAL SERIES ON INFORMATION RETRIEVAL Series Editor W. Bruce Croft University of Massachusetts Amherst, MA 01003 Also in the

More information

SIMPLY EXCEL by Paul McFedries. A John Wiley and Sons, Ltd, Publication

SIMPLY EXCEL by Paul McFedries. A John Wiley and Sons, Ltd, Publication SIMPLY EXCEL 2010 by Paul McFedries A John Wiley and Sons, Ltd, Publication First published under the title Excel 2010 Simplified, ISBN 978-0-470-57763-9 by Wiley Publishing, Inc., 10475 Crosspoint Boulevard,

More information

GSM Architecture, Protocols and Services Third Edition

GSM Architecture, Protocols and Services Third Edition GSM Architecture, Protocols and Services Third Edition GSM Architecture, Protocols and Services Third Edition 2009 John Wiley & Sons, Ltd. ISBN: 978-0- 470-03070- 7 J. E be rs pä c he r, H. -J. Vöge l,

More information

Beginning Transact-SQL with SQL Server 2000 and Paul Turley with Dan Wood

Beginning Transact-SQL with SQL Server 2000 and Paul Turley with Dan Wood Beginning Transact-SQL with SQL Server 2000 and 2005 Paul Turley with Dan Wood Beginning Transact-SQL with SQL Server 2000 and 2005 Beginning Transact-SQL with SQL Server 2000 and 2005 Paul Turley with

More information

The Internet of Things

The Internet of Things The Internet of Things The Internet of Things Connecting Objects to the Web Edited by Hakima Chaouchi First published 2010 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

More information

S60 Programming A Tutorial Guide

S60 Programming A Tutorial Guide S60 Programming A Tutorial Guide S60 Programming A Tutorial Guide Paul Coulton, Reuben Edwards With Helen Clemson Reviewed by Alex Wilbur, Alastair Milne, Filippo Finelli, Graeme Duncan, Iain Campbell,

More information

Multimedia Messaging Service

Multimedia Messaging Service Multimedia Messaging Service An Engineering Approach to MMS Gwenaël Le Bodic Alcatel, France Multimedia Messaging Service Multimedia Messaging Service An Engineering Approach to MMS Gwenaël Le Bodic

More information

Network Performance Analysis

Network Performance Analysis Network Performance Analysis Network Performance Analysis Thomas Bonald Mathieu Feuillet Series Editor Pierre-Noël Favennec First published 2011 in Great Britain and the United States by ISTE Ltd and

More information

Semantic Web Technologies Trends and Research in Ontology-based Systems

Semantic Web Technologies Trends and Research in Ontology-based Systems Semantic Web Technologies Trends and Research in Ontology-based Systems John Davies BT, UK Rudi Studer University of Karlsruhe, Germany Paul Warren BT, UK John Wiley & Sons, Ltd Contents Foreword xi 1.

More information

Fundamentals of Operating Systems. Fifth Edition

Fundamentals of Operating Systems. Fifth Edition Fundamentals of Operating Systems Fifth Edition Fundamentals of Operating Systems A.M. Lister University of Queensland R. D. Eager University of Kent at Canterbury Fifth Edition Springer Science+Business

More information

COSO Enterprise Risk Management

COSO Enterprise Risk Management COSO Enterprise Risk Management COSO Enterprise Risk Management Establishing Effective Governance, Risk, and Compliance Processes Second Edition ROBERT R. MOELLER John Wiley & Sons, Inc. Copyright # 2007,

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Cloud Phone Systems. Andrew Moore. Making Everything Easier! Nextiva Special Edition. Learn:

Cloud Phone Systems. Andrew Moore. Making Everything Easier! Nextiva Special Edition. Learn: Making Everything Easier! Nextiva Special Edition Cloud Phone Systems Learn: What cloud phone systems are and how they can benefit your company About the many advantages a cloud phone system offers Features

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

GSM Architecture, Protocols and Services

GSM Architecture, Protocols and Services GSM Architecture, Protocols and Services Third Edition Jörg Eberspächer Technische Universität München, Germany Hans-Jörg Vögel BMW Group Research & Technology, Germany Christian Bettstetter University

More information

A Developer s Guide to the Semantic Web

A Developer s Guide to the Semantic Web A Developer s Guide to the Semantic Web von Liyang Yu 1. Auflage Springer 2011 Verlag C.H. Beck im Internet: www.beck.de ISBN 978 3 642 15969 5 schnell und portofrei erhältlich bei beck-shop.de DIE FACHBUCHHANDLUNG

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Professional ASP.NET 2.0 Databases. Thiru Thangarathinam

Professional ASP.NET 2.0 Databases. Thiru Thangarathinam Professional ASP.NET 2.0 Databases Thiru Thangarathinam Professional ASP.NET 2.0 Databases Professional ASP.NET 2.0 Databases Thiru Thangarathinam Professional ASP.NET 2.0 Databases Published by Wiley

More information

TASK SCHEDULING FOR PARALLEL SYSTEMS

TASK SCHEDULING FOR PARALLEL SYSTEMS TASK SCHEDULING FOR PARALLEL SYSTEMS Oliver Sinnen Department of Electrical and Computer Engineering The University of Aukland New Zealand TASK SCHEDULING FOR PARALLEL SYSTEMS TASK SCHEDULING FOR PARALLEL

More information

Relational Database Index Design and the Optimizers

Relational Database Index Design and the Optimizers Relational Database Index Design and the Optimizers DB2, Oracle, SQL Server, et al. Tapio Lahdenmäki Michael Leach A JOHN WILEY & SONS, INC., PUBLICATION Relational Database Index Design and the Optimizers

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 00 Motivation What is Information Retrieval? The meaning of the term Information Retrieval (IR) can be

More information

Copyright protected. Use is for Single Users only via a VHP Approved License. For information and printed versions please see

Copyright protected. Use is for Single Users only via a VHP Approved License. For information and printed versions please see TOGAF 9 Certified Study Guide 4th Edition The Open Group Publications available from Van Haren Publishing The TOGAF Series: The TOGAF Standard, Version 9.2 The TOGAF Standard Version 9.2 A Pocket Guide

More information

TEXT MINING APPLICATION PROGRAMMING

TEXT MINING APPLICATION PROGRAMMING TEXT MINING APPLICATION PROGRAMMING MANU KONCHADY CHARLES RIVER MEDIA Boston, Massachusetts Contents Preface Acknowledgments xv xix Introduction 1 Originsof Text Mining 4 Information Retrieval 4 Natural

More information

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London ISSUES IN INFORMATION RETRIEVAL Brian Vickery Presentation at ISKO meeting on June 26, 2008 At University College, London NEEDLE IN HAYSTACK MY BACKGROUND Plant chemist, then reports librarian Librarian,

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

Organizing Information. Organizing information is at the heart of information science and is important in many other

Organizing Information. Organizing information is at the heart of information science and is important in many other Dagobert Soergel College of Library and Information Services University of Maryland College Park, MD 20742 Organizing Information Organizing information is at the heart of information science and is important

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

MOBILE PEER TO PEER (P2P)

MOBILE PEER TO PEER (P2P) MOBILE PEER TO PEER (P2P) A TUTORIAL GUIDE Frank H. P. Fitzek, University of Aalborg, Denmark Hassan Charaf, Budapest University of Technology, Hungary A John Wiley and Sons, Ltd., Publication MOBILE

More information

0 Mastering Microsoft Office

0 Mastering Microsoft Office 0 Mastering Microsoft Office MACMILLAN MASTER SERIES Accounting Advanced English Language Advanced Pure Mathematics Arabic Banking Basic Management Biology British Politics Business Administration Business

More information

Network Convergence. Services, Applications, Transport, and Operations Support. Hu Hanrahan. John Wiley & Sons, Ltd

Network Convergence. Services, Applications, Transport, and Operations Support. Hu Hanrahan. John Wiley & Sons, Ltd Network Convergence Network Convergence Services, Applications, Transport, and Operations Support Hu Hanrahan University of the Witwatersrand, Johannesburg, South Africa John Wiley & Sons, Ltd Copyright

More information

Semantic Web Technologies Trends and Research in Ontology-based Systems

Semantic Web Technologies Trends and Research in Ontology-based Systems Semantic Web Technologies Trends and Research in Ontology-based Systems John Davies BT, UK Rudi Studer University of Karlsruhe, Germany Paul Warren BT, UK Semantic Web Technologies Semantic Web Technologies

More information

Information Retrieval: SciFinder

Information Retrieval: SciFinder Information Retrieval: SciFinder Information Retrieval: SciFinder Second Edition DAMON D. RIDLEY School of Chemistry, The University of Sydney A John Wiley and Sons, Ltd., Publication This edition first

More information

Agile Database Techniques Effective Strategies for the Agile Software Developer. Scott W. Ambler

Agile Database Techniques Effective Strategies for the Agile Software Developer. Scott W. Ambler Agile Database Techniques Effective Strategies for the Agile Software Developer Scott W. Ambler Agile Database Techniques Effective Strategies for the Agile Software Developer Agile Database Techniques

More information

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS LEGITIMATE APPLICATIONS

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

Designing Security Architecture Solutions Jay Ramachandran Wiley Computer Publishing John Wiley & Sons, Inc. Designing Security Architecture Solutions Designing Security Architecture Solutions Jay Ramachandran

More information

Mastering UNIX Shell Scripting

Mastering UNIX Shell Scripting Mastering UNIX Shell Scripting Bash, Bourne, and Korn Shell Scripting for Programmers, System Administrators, and UNIX Gurus Second Edition Randal K. Michael Wiley Publishing, Inc. Mastering UNIX Shell

More information

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered. Content Enrichment An essential strategic capability for every publisher Enriched content. Delivered. An essential strategic capability for every publisher Overview Content is at the centre of everything

More information

Linux Command Line and Shell Scripting Bible

Linux Command Line and Shell Scripting Bible Linux Command Line and Shell Scripting Bible Richard Blum Wiley Publishing, Inc. Linux Command Line and Shell Scripting Bible Linux Command Line and Shell Scripting Bible Richard Blum Wiley Publishing,

More information

Contents. viii. List of figures. List of tables. OGC's foreword. 3 The ITIL Service Management Lifecycle core of practice 17

iii Contents List of figures List of tables OGC's foreword Chief Architect's foreword Preface vi viii ix x xi 2.7 ITIL conformance or compliance practice adaptation 13 2.8 Getting started Service Lifecycle

Study Guide. Robert Schmidt Dane Charlton

Senior Acquisitions Editor: Kenyon Brown Development Editor: Candace English Technical Editors: Eric Biller and Brian Atkinson Production Editor: Christine

SAP Jam Communities What's New 1808 THE BEST RUN. PUBLIC Document Version: August

PUBLIC Document Version: August 2018 2018-10-26 2018 SAP SE or an SAP affiliate company. All rights reserved. Content 1 Release Highlights....3 1.1 Anonymous access to public communities....4

Beginning Web Programming with HTML, XHTML, and CSS. Second Edition. Jon Duckett

Introduction xxiii Chapter

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS

DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION

COMPONENT-ORIENTED PROGRAMMING

ANDY JU AN WANG KAI QIAN Southern Polytechnic State University Marietta, Georgia A JOHN WILEY & SONS, INC., PUBLICATION Copyright 2005 by John

Encyclopedia of Information Science and Technology

Second Edition Mehdi Khosrow-Pour Information Resources Management Association, USA Volume IV G-Internet INFORMATION SCIENCE REFERENCE Hershey New York

AAM Guide for Authors

ISSN: 1932-9466 Application and Applied Mathematics: An International Journal (AAM) invites contributors from throughout the world to submit their original manuscripts for review

Search Engines. Information Retrieval in Practice

All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

Desktop Crawls. Document Feeds. Document Feeds. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Web crawlers Retrieving web pages Crawling the web» Desktop crawlers» Document feeds File conversion Storing the documents Removing noise Desktop Crawls! Used

Summary of Contents LIST OF FIGURES LIST OF TABLES

PREFACE xvii xix xxi PART 1 BACKGROUND Chapter 1. Introduction 3 Chapter 2. Standards-Makers 21 Chapter 3. Principles of the S2ESC Collection 45 Chapter

Preface...xi Coverage of this edition...xi Acknowledgements...xiii

Contents 1 Basic concepts of information retrieval systems...1 Introduction...1 Features of an information retrieval system...2 Elements

Information Retrieval

CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

7 Windows Tweaks. A Comprehensive Guide to Customizing, Increasing Performance, and Securing Microsoft Windows 7. Steve Sinchak

Take control of Windows 7 Unlock hidden settings Rev up your network Disable features you hate, for good Fine-tune User Account control Turbocharge online speed Master the taskbar and start button Customize

Digital Electronics A Practical Approach with VHDL William Kleitz Ninth Edition

Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world Visit

Chapter 6: Information Retrieval and Web Search. An introduction

Introduction: Text mining refers to data mining using text documents as data. Most text mining tasks use Information Retrieval (IR) methods

Information mining and information retrieval : methods and applications

J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

Libraries, Repositories and Metadata. Lara Whitelaw, Metadata Development Manager, The Open University. 22 Oct 2003

Libraries Libraries are the traditional realm of metadata Use well defined standards and

The Semantic Web Explained

The Semantic Web is a new area of research and development in the field of computer science, aimed at making it easier for computers to process the huge amount of information

A Structured Programming Approach to Data

Macmillan Computer Science Series Consulting Editor: Professor F. H. Sumner, University of Manchester J. K. Buckle, The ICL 2900 Series Andrew J. T. Colin, Programming

DIGITAL VIDEO DISTRIBUTION IN BROADBAND, TELEVISION, MOBILE AND CONVERGED NETWORKS

TRENDS, CHALLENGES AND SOLUTIONS Sanjoy Paul, Ph.D Formerly of Bell Labs and WINLAB, Rutgers University, USA, now of Infosys

Information Retrieval May 15. Web retrieval

What's so special about the Web? The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically

Research on Industrial Security Theory

Menggang Li China Centre for Industrial Security Research Beijing, People's Republic of China ISBN 978-3-642-36951-3

60-538: Information Retrieval

September 7, 2017 Outline 1 what is IR 2 3 IR not long time ago now IR is mostly about search engines there are

FUNDAMENTALS OF COMPUTER PROGRAMMING AND IT

SALIENT FEATURES OF THE PRESENT EDITION Motivates the unmotivated and provides the teachers an unequaled approach that allows them to teach students with a disparity

ClickLearn Studio. - A technical guide 9/18/2017

All products and companies mentioned in this document are or may be registered trademarks of their respective companies or owners. ClickLearn ApS reserves

or: How to be your own Good Fairy

Librarian E-Services in a Changing Information Continuum Challenges, Opportunities and the Need for 'Open' Approaches or: How to be your own Good Fairy Dr. Stefan Gradmann Regionales Rechenzentrum der

TECHNICAL TRANSLATION

Technical Translation Usability Strategies for Translating Technical Documentation JODY BYRNE University of Sheffield, UK A C.I.P. Catalogue record for this book is available from

From Open Data to Data-Intensive Science through CERIF

Keith G Jeffery a, Anne Asserson b, Nikos Houssos c, Valerie Brasse d, Brigitte Jörg e a Keith G Jeffery Consultants, Shrivenham, SN6 8AH, U, b University

Modern Information Retrieval

Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press New York Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval (IR) is finding material of an unstructured

Information Retrieval Spring Web retrieval

Information Retrieval Spring 2016 Web retrieval The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically infinite due to the dynamic

Information Retrieval and Knowledge Organisation

Knut Hinkelmann Content Information Retrieval Indexing (string search and computer-linguistic approach) Classical Information Retrieval: Boolean, vector

Multi-Core Programming

Increasing Performance through Software Multi-threading Shameem Akhter Jason Roberts Intel PRESS Copyright 2006 Intel Corporation. All rights reserved. ISBN 0-9764832-4-6 No part

SEARCH TECHNIQUES: BASIC AND ADVANCED

17.1 INTRODUCTION Searching is the activity of looking thoroughly in order to find something. In library and information science, searching refers to looking through

Object Oriented Programming

Unit 19: Object Oriented Unit code: K/601/1295 QCF Level 4: BTEC Higher National Credit value: 15 Aim To provide learners with an understanding of the principles of object oriented programming as an underpinning

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

www.nactem.ac.uk Outline Aims of text mining Text Mining steps Text Mining uses Applications Aims Extract and discover knowledge hidden in text

DATA VISUALIZATION WITH FLASH BUILDER

DESIGNING RIA AND AIR APPLICATIONS WITH REMOTE DATA SOURCES CESARE ROCCHI First published 2011 by Focal Press Published 2017 by Routledge 2 Park Square, Milton Park,

Terminologies Services Strawman

Background This document was drafted for discussion for a meeting at the Metropolitan Museum of Art on September 12, 2007. This document was not intended to represent a

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island

contents foreword xiii preface xiv acknowledgments xvii about this book

BBK3253 Knowledge Management Prepared by Dr Khairul Anuar

L7: Developing and Managing Knowledge Repositories www.notes638.wordpress.com 1. Describe the key features of an effective knowledge repository

Introduction & Administrivia

Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,

"Charting the Course... MOC A: SharePoint 2016 Site Collections and Site Owner Administration. Course Summary

MOC 55234 A: 2016 Site Collections Course Summary Description This five-day instructor-led course is intended for power users and IT professionals who are tasked with working within the 2016 environment

Indexing and subject organisation

Madely du Preez Dept of Information Science University of South Africa (UNISA) LIASA IGBIS WORKSHOP 2018: 16-18 August, Centurion Lake Hotel. Menu Subject organisation

How to use indexing languages in searching

Indexing, searching, and retrieval 6.3.1. Overview This module explains how you can become a better searcher by exploiting the power of indexing and indexing

Outline. Structures for subject browsing. Subject browsing. Research issues. Renardus

Outline Evaluation of browsing behaviour and automated subject classification: examples from KnowLib Subject browsing Automated subject classification Koraljka Golub, Knowledge Discovery and Digital Library

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17

Table of Contents 1 Introduction...1 1.1 Common Problem...1 1.2 Data Integration and Data Management...3 1.2.1 Information Quality Overview...3 1.2.2 Customer Data Integration...4 1.2.3 Data Management...8

LOGICAL DATA MODELING

INTEGRATED SERIES IN INFORMATION SYSTEMS Professor Ramesh Sharda Oklahoma State University Series Editors Prof. Dr. Stefan Voß Universität Hamburg Expository and Research Monographs

Join the p2p.wrox.com. Wrox Programmer to Programmer. Beginning PHP 5.3. Matt Doyle

Join the discussion @ p2p.wrox.com Get more out of WROX.com Interact Take an active role online by participating in our

Data is the new Oil (Ann Winblad)

Keith G Jeffery keith.jeffery@keithgjefferyconsultants.co.uk 20140415-16 JRC Workshop Big Open Data Keith G Jeffery 1 Data is the New Oil Like oil has been, data is Abundant