Articulating Information Needs in XML Query Languages

1 Articulating Information Needs in XML Query Languages
Jaap Kamps, Maarten Marx, Maarten de Rijke and Börkur Sigurbjörnsson
Gonçalo Antunes

2 Motivation
Users have increased access to documents that carry additional semantic information through XML markup.
A new approach to querying document-centric XML is needed.
Goal: understand which approach to querying XML documents is most effective at satisfying users' information needs.

3 Problem
How do users exploit the additional expressive power of structural constraints in their queries, what queries do users formulate, and what is the meaning of these queries?
What is the effect on retrieval performance of adding structural constraints to queries?
What is the appropriate query language for XML retrieval?

4 Results
Structural constraints are mainly used as search hints, not as strict requirements.
The hierarchical structure of documents is used in one third of the queries.
Three quarters of the queries put constraints on the context of the element to retrieve.
Adding structural constraints has a positive effect on early precision and a negative effect on overall recall.
The work provides a typology of the different uses of content-and-structure queries, intuitive mathematical models of users' knowledge of a set of XML documents, and query languages that exactly fit this knowledge.

5 INEX: Initiative for the Evaluation of XML Retrieval
An initiative to evaluate the retrieval methods of participants.
Uniform scoring procedures and comparison of results.
Large testbed of XML documents (circa xml documents).
Queries:
CO (content-only) queries: the retrieval system identifies the most appropriate XML elements to return to the user.
CAS (content-and-structure) queries: structural constraints are stated explicitly and can refer to the types of elements to retrieve.

6 CAS topic (2003)

7 CO topic (2003)

8 NEXI Query Language (2004)
Based on XPath.
Uses only the descendant axis from the current node (e.g. //).
Uses only the Boolean operators and and or in filter expressions.
Should contain at least one about function (free-text search).
The rightmost filter should be an about function.

9 NEXI Query Language Examples
//sec[about(., vector space model)]
//article[about(., web search engine)]//sec[about(., vector space model)]
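
The NEXI rules listed on the previous slide can be sketched as a small checker. This is a simplified illustration, not the official NEXI grammar: the function name `check_nexi` and the regular expressions are assumptions made for this example, and anything beyond `about` functions joined by `and`/`or` is rejected.

```python
import re

# One step: the descendant axis, a tag name or wildcard, and an optional filter.
# Simplification: filters may not contain nested square brackets.
STEP = re.compile(r"//(?:\w+|\*)\s*(?:\[(?P<filter>[^\[\]]*)\])?")
ABOUT = re.compile(r"\babout\s*\([^()]*\)")


def check_nexi(query):
    """Return a list of rule violations; an empty list means the query passes."""
    query = query.strip()
    filters = []  # one entry per step, None when the step has no filter
    pos = 0
    while pos < len(query):
        match = STEP.match(query, pos)
        if match is None:
            return ["only descendant steps of the form //tag[filter] are allowed"]
        filters.append(match.group("filter"))
        pos = match.end()

    problems = []
    if not any(f and ABOUT.search(f) for f in filters):
        problems.append("at least one about function is required")
    if not filters or not filters[-1] or not ABOUT.search(filters[-1]):
        problems.append("the rightmost filter must contain an about function")
    for f in filters:
        # Whatever remains after removing the about calls must be 'and' / 'or'.
        leftover = re.findall(r"\w+", ABOUT.sub("", f or ""))
        if any(word not in ("and", "or") for word in leftover):
            problems.append("filters may only combine about functions with and/or")
    return problems


print(check_nexi("//sec[about(., vector space model)]"))
print(check_nexi("/article/sec[about(., vector space model)]"))
```

Both example queries from this slide pass the check, while a query using the child axis (`/article/sec[...]`) is reported as a violation.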

10 How do users exploit the additional expressive power of structural constraints in their queries?

11 Requested Elements
Queries allow users to specify the types of elements that should be returned as answers.

12 Elements Judged Relevant
Relevance assessments focus on highly specific and highly exhaustive elements.

13 Requested versus Relevant Elements
Investigate how often the element that is judged relevant actually has the tag name specified in the query.
Table layout: frequency of relevant elements in columns, elements with tag names in rows.

14 So, how do users exploit the additional expressive power of structural constraints in their queries?
Assessors felt that their information needs were also satisfied by elements not respecting the target constraints.
In most cases, elements satisfying the target constraints are the largest category.
Element names as requested in the query can only be considered a retrieval hint, not a strict constraint on the output.

15 What is the effect on retrieval performance of adding structural constraints to queries?

16 Queries
Structured query, e.g. //article[about(.//abs, sorting)]//sec[about(., heap sort)]
Target-only query, e.g. //article//sec[about(., sorting heap sort)]
Content-only query, e.g. //*[about(., sorting heap sort)]
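
The relation between the three query forms can be illustrated with a short sketch that derives the target-only and content-only variants from a structured query. The helper names `target_only` and `content_only` are hypothetical, and the regular expressions only cover simple queries such as the example on this slide (no nested brackets or parentheses).

```python
import re

# about(<relative path>, <free text>): group 1 is the path, group 2 the text.
ABOUT = re.compile(r"about\s*\(\s*([^,]+?)\s*,\s*([^()]*?)\s*\)")
FILTER = re.compile(r"\[[^\[\]]*\]")


def all_terms(query):
    """Collect the free text of every about function, in query order."""
    return " ".join(m.group(2) for m in ABOUT.finditer(query))


def target_only(query):
    """Keep the path to the target element; move all terms into one about."""
    path = FILTER.sub("", query)
    return f"{path}[about(., {all_terms(query)})]"


def content_only(query):
    """Drop all structure; any element may be returned."""
    return f"//*[about(., {all_terms(query)})]"


structured = "//article[about(.//abs, sorting)]//sec[about(., heap sort)]"
print(target_only(structured))   # //article//sec[about(., sorting heap sort)]
print(content_only(structured))  # //*[about(., sorting heap sort)]
```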

17 Experimental Setup
Evaluation performed using trec_eval and EvalJ.
Three runs using the queries.
The runs differ in the amount of structure used, ranging from no structural constraints to all structural constraints.

18 Processing
Decomposition: the query is decomposed into a sequence of pairs of the form (location path, content description), one for each about function, e.g. (//article//abs, sorting) and (//article//sec, heap sort).
Retrieval: for each pair, XML elements satisfying the location path are scored using a language model retrieval approach.
Mixture: for each element satisfying the target constraints, other elements satisfying the tree pattern of the query are also considered (for example, the corresponding abstract elements for a particular section element).
Result: the sum of the scores of the about functions for elements satisfying the target constraints.
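
The decomposition step can be sketched as follows. This is an illustrative reading of the slide rather than the authors' implementation: the function name `decompose` and the parsing approach are assumptions, and only simple queries without nested brackets are handled.

```python
import re

# A step (//tag or //*) with an optional filter; group 1 is the step, group 2 the filter.
STEP = re.compile(r"(//(?:\w+|\*))(?:\[([^\[\]]*)\])?")
# about(<relative path>, <free text>)
ABOUT = re.compile(r"about\s*\(\s*([^,]+?)\s*,\s*([^()]*?)\s*\)")


def decompose(query):
    """Return one (location path, content description) pair per about function."""
    pairs = []
    prefix = ""  # absolute path of the step currently being processed
    for step, filt in STEP.findall(query):
        prefix += step
        for rel, text in ABOUT.findall(filt):
            if rel == ".":
                location = prefix            # about the step element itself
            elif rel.startswith("."):
                location = prefix + rel[1:]  # e.g. .//abs below the step element
            else:
                location = rel               # already an absolute path
            pairs.append((location, text))
    return pairs


query = "//article[about(.//abs, sorting)]//sec[about(., heap sort)]"
print(decompose(query))
# [('//article//abs', 'sorting'), ('//article//sec', 'heap sort')]
```

Each pair would then be scored separately in the retrieval step, and the per-pair scores summed for the elements that satisfy the target constraints.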

19 Retrieval Model (for the Retrieval Step)
Probability of generating ti given element e.
Probability of generating ti given the collection.
Interpolation factor (smoothing).
B = 1.5 if target element; B = 0 if other element.
Sum of the tf's of all the terms of the element e.
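
The formula itself did not survive in the transcription, so the following is only a plausible sketch built from the annotations above. It assumes a standard linearly interpolated language model (element model mixed with collection model) and a length prior in which the element length, the sum of its term frequencies, is raised to the power written B on the slide (called `beta` here). The function name, the default `lam` value and the toy counts are all hypothetical.

```python
import math


def score_element(query_terms, element_tf, collection_tf, lam=0.5, beta=0.0):
    """Log-score of an element for a query under the assumed model.

    element_tf / collection_tf map terms to raw term frequencies.
    lam is the interpolation (smoothing) factor; beta is the length-prior
    exponent (1.5 for target elements, 0 for other elements on the slide).
    """
    elem_len = sum(element_tf.values())
    coll_len = sum(collection_tf.values())
    if elem_len == 0 or coll_len == 0:
        return float("-inf")

    score = beta * math.log(elem_len)  # assumed length prior
    for term in query_terms:
        p_elem = element_tf.get(term, 0) / elem_len      # P(t_i | e)
        p_coll = collection_tf.get(term, 0) / coll_len   # P(t_i | collection)
        p = lam * p_elem + (1 - lam) * p_coll
        if p == 0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


# Toy counts, invented purely for illustration.
element = {"heap": 2, "sort": 2}
collection = {"heap": 10, "sort": 40, "the": 50}
print(score_element(["heap", "sort"], element, collection, lam=0.5, beta=0.0))
print(score_element(["heap", "sort"], element, collection, lam=0.5, beta=1.5))
```

With `beta = 0` the score depends only on the smoothed term probabilities; with `beta = 1.5` longer elements receive a boost, which is one way to read the distinction the slide draws between target elements and other elements.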

20 Results
Higher precision on content-only queries.
Higher precision on the first five results.

21 So, what is the effect on retrieval performance of adding structural constraints to queries?
Structured queries do not lead to improved mean average precision scores, at higher recall levels.
Structured queries lead to significantly superior early precision scores.
Structural constraints function as a precision-enhancing device.
In general, content-only queries outperform structured queries.

22 What are the typical sorts of content-and-structure queries that users formulate in the NEXI query language?

23 Dimensions Hierarchy: whether the query uses hierarchical information about the documents. Context: whether the query puts content constraints on text occurring outside the element to be returned.

24 Categories
//sec[about(., xxx)]
//sec[about(., yyy) and about(//abs, xxx)]
//sec[about(., xxx) and about(.//thm, yyy)]
//sec[about(., xxx) and about(.//thm, yyy) and about(//abs, zzz)]
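
One way to read the four categories is as the combinations of the two dimensions from the previous slide. The sketch below applies that reading to queries written in the single-filter form shown here; it is an interpretation, not a rule stated on the slides. It assumes that an `about` path starting with `.//` signals use of hierarchy and that an `about` path starting with `//` signals a context constraint. The function name `classify` is hypothetical.

```python
import re

# Capture the path argument of every about function.
ABOUT_PATH = re.compile(r"about\s*\(\s*([^,]+?)\s*,")


def classify(query):
    """Classify a single-filter query along the hierarchy and context dimensions."""
    paths = ABOUT_PATH.findall(query)
    return {
        # Constraint on an element nested below the requested element.
        "hierarchy": any(p.startswith(".//") for p in paths),
        # Constraint on an element located anywhere in the document.
        "context": any(p.startswith("//") for p in paths),
    }


for q in [
    "//sec[about(., xxx)]",
    "//sec[about(., yyy) and about(//abs, xxx)]",
    "//sec[about(., xxx) and about(.//thm, yyy)]",
    "//sec[about(., xxx) and about(.//thm, yyy) and about(//abs, zzz)]",
]:
    print(q, classify(q))
```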

25 How is structure used?

26 So, what are the typical sorts of content-and-structure queries that users formulate in the NEXI query language? The hierarchical nature of the documents is used in one third of the examined queries. Almost three quarters of the queries use content constraints on particular elements occurring in the context of elements to be returned.

27 Is the NEXI language the most appropriate one for XML retrieval?

28 NEXI query language
Is a restricted form of XPath.
Two competing forces: safety, which reduces expressive power, and completeness, which asks for as much expressivity as possible.
User profiles: Structure-Unaware users and Hierarchy-Aware users.

29 Structure-Unaware Users
Typical queries are restricted search and contextual content information (these users only know tag names).
Structure-unaware queries (bisimulation property): //tag[P], where P is a predicate built with and, or, and not from location paths self::tag and queries of the form //tag[P].
Such a query simply says that somewhere in the document there is a tag element making P true, e.g. //section[//abstract].

30 Hierarchy-Aware Users
Have some clue about the hierarchical structure of the documents, e.g. they know that paragraphs are below sections, but need not know of the elements in between.
Vertical simulation property.
//tag[P], where P = .//tag[Q], e.g. //section[.//paragraph]
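
The difference between the two user profiles can be made concrete on a toy document. The sketch below uses Python's standard `xml.etree.ElementTree`, whose XPath subset does not cover these predicates directly, so both query shapes are evaluated by hand. The document and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<article>
  <abstract>Sorting algorithms</abstract>
  <section><title>Heaps</title><paragraph>Heap sort uses a binary heap.</paragraph></section>
  <section><title>Notes</title></section>
</article>
"""


def structure_unaware(root, tag, other):
    """//tag[//other]: all tag elements, provided an 'other' element exists
    somewhere in the document (no positional relation is required)."""
    exists = root.tag == other or root.find(f".//{other}") is not None
    return list(root.iter(tag)) if exists else []


def hierarchy_aware(root, tag, below):
    """//tag[.//below]: tag elements with a 'below' element somewhere beneath
    them, at any depth."""
    return [e for e in root.iter(tag) if e.find(f".//{below}") is not None]


root = ET.fromstring(DOC.strip())
print(len(structure_unaware(root, "section", "abstract")))  # both sections match
print(len(hierarchy_aware(root, "section", "paragraph")))   # only the first section
```

The structure-unaware query only needs tag names, whereas the hierarchy-aware query relies on knowing that paragraphs occur below sections.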

31 Results
Structural constraints are mainly used as search hints, not as strict requirements.
The hierarchical structure of documents is used in one third of the queries.
Three quarters of the queries put constraints on the context of the element to retrieve.
Adding structural constraints has a positive effect on early precision and a negative effect on overall recall.
The work provides a typology of the different uses of content-and-structure queries, intuitive mathematical models of users' knowledge of a set of XML documents, and query languages that exactly fit this knowledge.

Structured Queries in XML Retrieval

Structured Queries in XML Retrieval Structured Queries in XML Retrieval Jaap Kamps 1,2 Maarten Marx 2 Maarten de Rijke 2 Börkur Sigurbjörnsson 2 1 Archives and Information Studies, University of Amsterdam, Amsterdam, The Netherlands 2 Informatics

More information

The Effect of Structured Queries and Selective Indexing on XML Retrieval

The Effect of Structured Queries and Selective Indexing on XML Retrieval The Effect of Structured Queries and Selective Indexing on XML Retrieval Börkur Sigurbjörnsson 1 and Jaap Kamps 1,2 1 ISLA, Faculty of Science, University of Amsterdam 2 Archives and Information Studies,

More information

Processing Structural Constraints

Processing Structural Constraints SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 13 Structured Text Retrieval with Mounia Lalmas Introduction Structuring Power Early Text Retrieval Models Evaluation Query Languages Structured Text Retrieval, Modern

More information

Score Region Algebra: Building a Transparent XML-IR Database

Score Region Algebra: Building a Transparent XML-IR Database Vojkan Mihajlović Henk Ernst Blok Djoerd Hiemstra Peter M. G. Apers Score Region Algebra: Building a Transparent XML-IR Database Centre for Telematics and Information Technology (CTIT) Faculty of Electrical

More information

The Interpretation of CAS

The Interpretation of CAS The Interpretation of CAS Andrew Trotman 1 and Mounia Lalmas 2 1 Department of Computer Science, University of Otago, Dunedin, New Zealand andrew@cs.otago.ac.nz, 2 Department of Computer Science, Queen

More information

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks University of Amsterdam at INEX 2010: Ad hoc and Book Tracks Jaap Kamps 1,2 and Marijn Koolen 1 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Faculty of Science,

More information

Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom,

Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom, XML Retrieval Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom, mounia@acm.org Andrew Trotman, Department of Computer Science, University of Otago, New Zealand,

More information

Component ranking and Automatic Query Refinement for XML Retrieval

Component ranking and Automatic Query Refinement for XML Retrieval Component ranking and Automatic uery Refinement for XML Retrieval Yosi Mass, Matan Mandelbrod IBM Research Lab Haifa 31905, Israel {yosimass, matan}@il.ibm.com Abstract ueries over XML documents challenge

More information

CADIAL Search Engine at INEX

CADIAL Search Engine at INEX CADIAL Search Engine at INEX Jure Mijić 1, Marie-Francine Moens 2, and Bojana Dalbelo Bašić 1 1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia {jure.mijic,bojana.dalbelo}@fer.hr

More information

Self Managing Top-k (Summary, Keyword) Indexes in XML Retrieval

Self Managing Top-k (Summary, Keyword) Indexes in XML Retrieval Self Managing Top-k (Summary, Keyword) Indexes in XML Retrieval Mariano P. Consens Xin Gu Yaron Kanza Flavio Rizzolo University of Toronto {consens, xgu, yaron, flavio}@cs.toronto.edu Abstract Retrieval

More information

Formulating XML-IR Queries

Formulating XML-IR Queries Alan Woodley Faculty of Information Technology, Queensland University of Technology PO Box 2434. Brisbane Q 4001, Australia ap.woodley@student.qut.edu.au Abstract: XML information retrieval systems differ

More information

The Importance of Length Normalization for XML Retrieval

The Importance of Length Normalization for XML Retrieval The Importance of Length Normalization for XML Retrieval Jaap Kamps, (kamps@science.uva.nl) Maarten de Rijke (mdr@science.uva.nl) and Börkur Sigurbjörnsson (borkur@science.uva.nl) Informatics Institute,

More information

Overview of the INEX 2008 Ad Hoc Track

Overview of the INEX 2008 Ad Hoc Track Overview of the INEX 2008 Ad Hoc Track Jaap Kamps 1, Shlomo Geva 2, Andrew Trotman 3, Alan Woodley 2, and Marijn Koolen 1 1 University of Amsterdam, Amsterdam, The Netherlands {kamps,m.h.a.koolen}@uva.nl

More information

The Heterogeneous Collection Track at INEX 2006

The Heterogeneous Collection Track at INEX 2006 The Heterogeneous Collection Track at INEX 2006 Ingo Frommholz 1 and Ray Larson 2 1 University of Duisburg-Essen Duisburg, Germany ingo.frommholz@uni-due.de 2 University of California Berkeley, California

More information

Efficient, Effective and Flexible XML Retrieval Using Summaries

Efficient, Effective and Flexible XML Retrieval Using Summaries Efficient, Effective and Flexible XML Retrieval Using Summaries M. S. Ali, Mariano Consens, Xin Gu, Yaron Kanza, Flavio Rizzolo, and Raquel Stasiu University of Toronto {sali, consens, xgu, yaron, flavio,

More information

The Utrecht Blend: Basic Ingredients for an XML Retrieval System

The Utrecht Blend: Basic Ingredients for an XML Retrieval System The Utrecht Blend: Basic Ingredients for an XML Retrieval System Roelof van Zwol Centre for Content and Knowledge Engineering Utrecht University Utrecht, the Netherlands roelof@cs.uu.nl Virginia Dignum

More information

A Voting Method for XML Retrieval

A Voting Method for XML Retrieval A Voting Method for XML Retrieval Gilles Hubert 1 IRIT/SIG-EVI, 118 route de Narbonne, 31062 Toulouse cedex 4 2 ERT34, Institut Universitaire de Formation des Maîtres, 56 av. de l URSS, 31400 Toulouse

More information

Overview of the INEX 2008 Ad Hoc Track

Overview of the INEX 2008 Ad Hoc Track Overview of the INEX 2008 Ad Hoc Track Jaap Kamps 1, Shlomo Geva 2, Andrew Trotman 3, Alan Woodley 2, and Marijn Koolen 1 1 University of Amsterdam, Amsterdam, The Netherlands {kamps,m.h.a.koolen}@uva.nl

More information

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML CS276B Text Retrieval and Mining Winter 2005 Plan for today Vector space approaches to XML retrieval Evaluating text-centric retrieval Lecture 15 Text-centric XML retrieval Documents marked up as XML E.g.,

More information

Overview of the INEX 2010 Ad Hoc Track

Overview of the INEX 2010 Ad Hoc Track Overview of the INEX 2010 Ad Hoc Track Paavo Arvola 1 Shlomo Geva 2, Jaap Kamps 3, Ralf Schenkel 4, Andrew Trotman 5, and Johanna Vainio 1 1 University of Tampere, Tampere, Finland paavo.arvola@uta.fi,

More information

Focused Information Access using XML Element Retrieval

Focused Information Access using XML Element Retrieval Focused Information Access using XML Element Retrieval Börkur Sigurbjörnsson Promotor: Prof.dr. Maarten de Rijke Co-promotor: Dr.ir. Jaap Kamps Committee: Prof.Dr.-Ing. Norbert Fuhr Prof. Mounia Lalmas

More information

Structural Feedback for Keyword-Based XML Retrieval

Structural Feedback for Keyword-Based XML Retrieval Structural Feedback for Keyword-Based XML Retrieval Ralf Schenkel and Martin Theobald Max-Planck-Institut für Informatik, Saarbrücken, Germany {schenkel, mtb}@mpi-inf.mpg.de Abstract. Keyword-based queries

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Identifying and Ranking Relevant Document Elements

Identifying and Ranking Relevant Document Elements Identifying and Ranking Relevant Document Elements Andrew Trotman and Richard A. O Keefe Department of Computer Science University of Otago Dunedin, New Zealand andrew@cs.otago.ac.nz, ok@otago.ac.nz ABSTRACT

More information

University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks

University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks Marijn Koolen 1, Rianne Kaptein 1, and Jaap Kamps 1,2 1 Archives and Information Studies, Faculty of Humanities, University

More information

Semantic Characterizations of XPath

Semantic Characterizations of XPath Semantic Characterizations of XPath Maarten Marx Informatics Institute, University of Amsterdam, The Netherlands CWI, April, 2004 1 Overview Navigational XPath is a language to specify sets and paths in

More information

Focused Retrieval Using Topical Language and Structure

Focused Retrieval Using Topical Language and Structure Focused Retrieval Using Topical Language and Structure A.M. Kaptein Archives and Information Studies, University of Amsterdam Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands a.m.kaptein@uva.nl Abstract

More information

XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson

XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Manning, Raghavan, and Schütze http://www.informationretrieval.org OVERVIEW Introduction Basic XML Concepts Challenges

More information

Structural Features in Content Oriented XML retrieval

Structural Features in Content Oriented XML retrieval Structural Features in Content Oriented XML retrieval Georgina Ramírez Thijs Westerveld Arjen P. de Vries georgina@cwi.nl thijs@cwi.nl arjen@cwi.nl CWI P.O. Box 9479, 19 GB Amsterdam, The Netherlands ABSTRACT

More information

Lab 2 Test collections

Lab 2 Test collections Lab 2 Test collections Information Retrieval, 2017 Goal Introduction The objective of this lab is for you to get acquainted with working with an IR test collection and Lemur Indri retrieval system. Instructions

More information

European Web Retrieval Experiments at WebCLEF 2006

European Web Retrieval Experiments at WebCLEF 2006 European Web Retrieval Experiments at WebCLEF 2006 Stephen Tomlinson Hummingbird Ottawa, Ontario, Canada stephen.tomlinson@hummingbird.com http://www.hummingbird.com/ August 20, 2006 Abstract Hummingbird

More information

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search 1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Beyond Bag of Words Bag of Words a document is considered to be an unordered collection of words with no relationships Extending

More information

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while 1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling

More information

Reducing Redundancy with Anchor Text and Spam Priors

Reducing Redundancy with Anchor Text and Spam Priors Reducing Redundancy with Anchor Text and Spam Priors Marijn Koolen 1 Jaap Kamps 1,2 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Informatics Institute, University

More information

A Comparative Study Weighting Schemes for Double Scoring Technique

A Comparative Study Weighting Schemes for Double Scoring Technique , October 19-21, 2011, San Francisco, USA A Comparative Study Weighting Schemes for Double Scoring Technique Tanakorn Wichaiwong Member, IAENG and Chuleerat Jaruskulchai Abstract In XML-IR systems, the

More information

UMass at TREC 2006: Enterprise Track

UMass at TREC 2006: Enterprise Track UMass at TREC 2006: Enterprise Track Desislava Petkova and W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science University of Massachusetts, Amherst, MA 01003 Abstract

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Chrome based Keyword Visualizer (under sparse text constraint) SANGHO SUH MOONSHIK KANG HOONHEE CHO

Chrome based Keyword Visualizer (under sparse text constraint) SANGHO SUH MOONSHIK KANG HOONHEE CHO Chrome based Keyword Visualizer (under sparse text constraint) SANGHO SUH MOONSHIK KANG HOONHEE CHO INDEX Proposal Recap Implementation Evaluation Future Works Proposal Recap Keyword Visualizer (chrome

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database

Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database Jovan Pehcevski, James Thom, Anne-Marie Vercoustre To cite this version: Jovan Pehcevski, James Thom, Anne-Marie Vercoustre.

More information

Comparative Analysis of Clicks and Judgments for IR Evaluation

Comparative Analysis of Clicks and Judgments for IR Evaluation Comparative Analysis of Clicks and Judgments for IR Evaluation Jaap Kamps 1,3 Marijn Koolen 1 Andrew Trotman 2,3 1 University of Amsterdam, The Netherlands 2 University of Otago, New Zealand 3 INitiative

More information

A CONTENT-BASED APPROACH TO RELEVANCE FEEDBACK IN XML-IR FOR CONTENT AND STRUCTURE QUERIES

A CONTENT-BASED APPROACH TO RELEVANCE FEEDBACK IN XML-IR FOR CONTENT AND STRUCTURE QUERIES A CONTENT-BASED APPROACH TO RELEVANCE FEEDBACK IN XML-IR FOR CONTENT AND STRUCTURE QUERIES Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete and Carlos Martín-Dancausa Departamento de Ciencias de

More information

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University

More information

CISC689/ Information Retrieval Midterm Exam

CISC689/ Information Retrieval Midterm Exam CISC689/489-010 Information Retrieval Midterm Exam You have 2 hours to complete the following four questions. You may use notes and slides. You can use a calculator, but nothing that connects to the internet

More information

Configurable Indexing and Ranking for XML Information Retrieval

Configurable Indexing and Ranking for XML Information Retrieval Configurable Indexing and Ranking for XML Information Retrieval Shaorong Liu, Qinghua Zou and Wesley W. Chu UCL Computer Science Department, Los ngeles, C, US 90095 {sliu, zou, wwc}@cs.ucla.edu BSTRCT

More information

DCU and 2010: Ad-hoc and Data-Centric tracks

DCU and 2010: Ad-hoc and Data-Centric tracks DCU and ISI@INEX 2010: Ad-hoc and Data-Centric tracks Debasis Ganguly 1, Johannes Leveling 1, Gareth J. F. Jones 1 Sauparna Palchowdhury 2, Sukomal Pal 2, and Mandar Mitra 2 1 CNGL, School of Computing,

More information

The University of Amsterdam at the CLEF 2008 Domain Specific Track

The University of Amsterdam at the CLEF 2008 Domain Specific Track The University of Amsterdam at the CLEF 2008 Domain Specific Track Parsimonious Relevance and Concept Models Edgar Meij emeij@science.uva.nl ISLA, University of Amsterdam Maarten de Rijke mdr@science.uva.nl

More information

Accessing XML documents: The INEX initiative. Mounia Lalmas, Thomas Rölleke, Zoltán Szlávik, Tassos Tombros (+ Duisburg-Essen)

Accessing XML documents: The INEX initiative. Mounia Lalmas, Thomas Rölleke, Zoltán Szlávik, Tassos Tombros (+ Duisburg-Essen) Accessing XML documents: The INEX initiative Mounia Lalmas, Thomas Rölleke, Zoltán Szlávik, Tassos Tombros (+ Duisburg-Essen) XML documents Book Chapters Sections World Wide Web This is only only another

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Using XML Logical Structure to Retrieve (Multimedia) Objects

Using XML Logical Structure to Retrieve (Multimedia) Objects Using XML Logical Structure to Retrieve (Multimedia) Objects Zhigang Kong and Mounia Lalmas Queen Mary, University of London {cskzg,mounia}@dcs.qmul.ac.uk Abstract. This paper investigates the use of the

More information

A Fusion Approach to XML Structured Document Retrieval

A Fusion Approach to XML Structured Document Retrieval A Fusion Approach to XML Structured Document Retrieval Ray R. Larson School of Information Management and Systems University of California, Berkeley Berkeley, CA 94720-4600 ray@sims.berkeley.edu 17 April

More information

Passage Retrieval and other XML-Retrieval Tasks. Andrew Trotman (Otago) Shlomo Geva (QUT)

Passage Retrieval and other XML-Retrieval Tasks. Andrew Trotman (Otago) Shlomo Geva (QUT) Passage Retrieval and other XML-Retrieval Tasks Andrew Trotman (Otago) Shlomo Geva (QUT) Passage Retrieval Information Retrieval Information retrieval (IR) is the science of searching for information in

More information

Sound ranking algorithms for XML search

Sound ranking algorithms for XML search Sound ranking algorithms for XML search Djoerd Hiemstra 1, Stefan Klinger 2, Henning Rode 3, Jan Flokstra 1, and Peter Apers 1 1 University of Twente, 2 University of Konstanz, and 3 CWI hiemstra@cs.utwente.nl,

More information

XPath with transitive closure

XPath with transitive closure XPath with transitive closure Logic and Databases Feb 2006 1 XPath with transitive closure Logic and Databases Feb 2006 2 Navigating XML trees XPath with transitive closure Newton Institute: Logic and

More information

KNOW At The Social Book Search Lab 2016 Suggestion Track

KNOW At The Social Book Search Lab 2016 Suggestion Track KNOW At The Social Book Search Lab 2016 Suggestion Track Hermann Ziak and Roman Kern Know-Center GmbH Inffeldgasse 13 8010 Graz, Austria hziak, rkern@know-center.at Abstract. Within this work represents

More information

THE weighting functions of information retrieval [1], [2]

THE weighting functions of information retrieval [1], [2] A Comparative Study of MySQL Functions for XML Element Retrieval Chuleerat Jaruskulchai, Member, IAENG, and Tanakorn Wichaiwong, Member, IAENG Abstract Due to the ever increasing information available

More information

A Content Based Image Retrieval System Based on Color Features

A Content Based Image Retrieval System Based on Color Features A Content Based Image Retrieval System Based on Features Irena Valova, University of Rousse Angel Kanchev, Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg Boris

More information

Chapter 8. Evaluating Search Engine

Chapter 8. Evaluating Search Engine Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 10: XML Retrieval Hinrich Schütze, Christina Lioma Center for Information and Language Processing, University of Munich 2010-07-12

More information

Heading-aware Snippet Generation for Web Search

Heading-aware Snippet Generation for Web Search Heading-aware Snippet Generation for Web Search Tomohiro Manabe and Keishi Tajima Graduate School of Informatics, Kyoto Univ. {manabe@dl.kuis, tajima@i}.kyoto-u.ac.jp Web Search Result Snippets Are short

More information

CoXML: A Cooperative XML Query Answering System

CoXML: A Cooperative XML Query Answering System CoXML: A Cooperative XML Query Answering System Shaorong Liu 1 and Wesley W. Chu 2 1 IBM Silicon Valley Lab, San Jose, CA, 95141, USA shaorongliu@gmail.com 2 UCLA Computer Science Department, Los Angeles,

More information

VK Multimedia Information Systems

VK Multimedia Information Systems VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Results Exercise 01 Exercise 02 Retrieval

More information

Birkbeck (University of London)

Birkbeck (University of London) Birkbeck (University of London) MSc Examination for Internal Students Department of Computer Science and Information Systems Information Retrieval and Organisation (COIY64H7) Credit Value: 5 Date of Examination:

More information

Where Should the Bugs Be Fixed?

Where Should the Bugs Be Fixed? Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication

More information

Thus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1.

Thus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1. 7.2 Binary Min-Heaps A heap is a tree-based structure, but it doesn t use the binary-search differentiation between the left and right sub-trees to create a linear ordering. Instead, a binary heap only

More information

INEX REPORT. Report on INEX 2012

INEX REPORT. Report on INEX 2012 INEX REPORT Report on INEX 2012 P. Bellot T. Chappell A. Doucet S. Geva S. Gurajada J. Kamps G. Kazai M. Koolen M. Landoni M. Marx A. Mishra V. Moriceau J. Mothe M. Preminger G. Ramírez M. Sanderson E.

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

Applying the IRStream Retrieval Engine to INEX 2003

Applying the IRStream Retrieval Engine to INEX 2003 Applying the IRStream Retrieval Engine to INEX 2003 Andreas Henrich, Volker Lüdecke University of Bamberg D-96045 Bamberg, Germany {andreas.henrich volker.luedecke}@wiai.unibamberg.de Günter Robbert University

More information

Extending E-R for Modelling XML Keys

Extending E-R for Modelling XML Keys Extending E-R for Modelling XML Keys Martin Necasky Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic martin.necasky@mff.cuni.cz Jaroslav Pokorny Faculty of Mathematics and

More information
