LaHC at CLEF 2015 SBS Lab

To cite this version: Nawal Ould-Amer, Mathias Géry. LaHC at CLEF 2015 SBS Lab. Conference and Labs of the Evaluation Forum, Sep 2015, Toulouse, France. 2015. <ujm-01219458>

HAL Id: ujm-01219458
https://hal-ujm.archives-ouvertes.fr/ujm-01219458
Submitted on 22 Oct 2015

Nawal Ould-Amer (1) and Mathias Géry (2)

(1) Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France
    CNRS, LIG, F-38000 Grenoble, France
(2) Université de Lyon, F-42023, Saint-Étienne, France
    CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000, Saint-Étienne, France
    Université de Saint-Étienne, Jean-Monnet, F-42000, Saint-Étienne, France

Nawal.Ould-Amer@imag.fr, Mathias.Gery@univ-st-etienne.fr

Abstract. This paper describes the work of the LaHC lab of Saint-Étienne for the Social Book Search lab at CLEF 2015. Our goals were i) to study a field-based retrieval model (BM25F), exploiting various topic and document fields, in order to build a strong baseline for further experiments, ii) to compare it with a Log logistic (LGD) retrieval model, and iii) to exploit some documents related to each topic (i.e. the documents given as negative or positive examples for a topic). The official results show that LGD outperforms BM25F, and that our approaches exploiting the documents related to the topic requesters rely on an interpretation of this additional information that differs from that of the Social Book Search organizers.

Keywords: Field-based Information Retrieval, Re-ranking, Relevance Feedback

1 Introduction

This paper describes the work of the LaHC lab of Saint-Étienne for the Suggestion track of the Social Book Search lab at CLEF 2015. The goal of the lab is to investigate techniques to support users in searching and navigating the full texts of digitised books and complementary social media, as well as to provide a forum for the exchange of research ideas and contributions [3]. Participants in this track have to suggest books based on rich search requests combining several topical and contextual relevance signals, as well as user profiles and real-world relevance judgments. The dataset is based on 1.5 million book descriptions and metadata, some of it user-generated, crawled from Amazon and LibraryThing.

Our participation in the Social Book Search lab at CLEF 2014 [2] showed us that the SBS dataset contains many different kinds of data, and that before experimenting with our Social Information Retrieval models, we need a better understanding of how to represent and exploit non-social (but nevertheless complex) data using classic models. In particular, our work for the Social Book Search lab at CLEF 2015 focuses on:

- studying a field-based retrieval model (BM25F [5]), exploiting various topic and document fields, in order to build a strong baseline for further experiments;
- comparing BM25F with other Information Retrieval models, especially the Log logistic (LGD [1]) retrieval model;
- exploiting some non-social data (that is nevertheless related to the users), especially the documents given as negative or positive examples for a topic.

Our experiments were conducted using the Terrier Information Retrieval System (http://www.terrier.org) [4], which implements various IR models, notably LGD and field-based models such as BM25F.

The paper is organized as follows: Section 2 briefly presents the Information Retrieval models used. Section 3 then details our approaches for exploiting the positive or negative documents related to each topic. Finally, Section 4 presents the official results obtained, before we conclude in Section 5.

2 BM25F vs LGD (runs UJM 1 and UJM 2)

The Social Book Search 2015 dataset contains a wide variety of data describing or related to the documents, the topics and the users. From all this information, we used the Terrier implementation of the field-based model BM25F [5] to exploit the following fields from documents and topics:

- the fields title, summary, content and tags from the documents;
- the fields title, mediated query and narrative from the topics.

BM25F was used with the parameter values presented in Table 1, taken from our participation in the Social Book Search lab 2014 [2], generating our run named UJM 1.

Table 1. BM25F parameters

Fields        title   summary   content   tags
Parameter c   1.0     0.10      0.45      0.00
Weight w      1       2         1         6

LGD is the Terrier implementation of the Log logistic model [1]. Grid optimization of parameter c on the SBS 2014 data led us to set it to 0.2, generating our run named UJM 2.
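
To make the role of the Table 1 parameters concrete, the following is a minimal sketch of a simple BM25F term score in the spirit of [5]: each field's term frequency is length-normalized with its own c, weighted by w, summed, and then saturated. It is not the Terrier implementation, whose details may differ. The saturation constant K1 = 1.2, the IDF formula and all function names are assumptions introduced for illustration; only the c and w values come from Table 1.

```python
import math

# Per-field parameters from Table 1: c = length normalization, w = field weight.
FIELD_PARAMS = {
    "title":   {"c": 1.0,  "w": 1.0},
    "summary": {"c": 0.10, "w": 2.0},
    "content": {"c": 0.45, "w": 1.0},
    "tags":    {"c": 0.00, "w": 6.0},
}
K1 = 1.2  # assumed saturation constant; the paper does not give it


def weighted_tf(term_freqs, field_lengths, avg_field_lengths):
    """Combine the per-field frequencies of one term into a single pseudo-frequency."""
    total = 0.0
    for field, params in FIELD_PARAMS.items():
        tf = term_freqs.get(field, 0)
        if tf == 0:
            continue
        # Field-length normalization; c = 0 disables it for that field.
        norm = 1.0 - params["c"] + params["c"] * field_lengths[field] / avg_field_lengths[field]
        total += params["w"] * tf / norm
    return total


def bm25f_term_score(term_freqs, field_lengths, avg_field_lengths, df, n_docs):
    """Score one query term for one document (assumed Robertson-style IDF)."""
    tf = weighted_tf(term_freqs, field_lengths, avg_field_lengths)
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    return idf * tf / (K1 + tf)


# Hypothetical example: the term occurs once in the tags of a document.
score = bm25f_term_score({"tags": 1}, {"tags": 50}, {"tags": 10}, df=10, n_docs=1000)
```

With these settings, a single occurrence in the tags field counts six times as much as one in a title of average length, and the tags field is not length-normalized at all (c = 0.00).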

3 Documents given as examples (runs UJM 3, UJM 4, UJM 5, UJM 6)

Our last goal was to exploit some non-social data (that is nevertheless related to the users): the list of documents given as negative or positive examples for each topic. Our idea was that a user might be satisfied (respectively unsatisfied) to find among the answers documents that he has read and appreciated (respectively disliked), and thus given as positive (respectively negative) examples for the topic. We implemented this hypothesis in two ways:

- Re-ranking (RR): re-rank the results so that the documents a user likes are boosted and the documents he dislikes are removed from the result. After normalizing the scores between 0 and 1, we add 1 to the normalized score of each document that the user likes, and we set to 0 the score of each document that he dislikes. This process is thus a post-processing of an existing run. It is worth noting that, if several retrieved documents are liked by the user, their relative initial ranking is preserved. It was applied to our BM25F run UJM 1 (generating our run UJM 4) and to our Log logistic (LGD) run UJM 2 (generating our run UJM 6);
- Relevance Feedback (RF): define a relevance feedback that is positive for the documents the topic user likes and negative for the documents he does not like. We applied such relevance feedback to our BM25F run UJM 1 (generating our run UJM 5) and to our Log logistic (LGD) run UJM 2 (generating our run UJM 3). The relevance feedback uses all the positive documents and selects the top 10 terms according to the default selection of Terrier [4].

4 Results

Table 2 presents the official results obtained by our 6 runs. Log logistic (LGD) outperforms BM25F on the official ndcg@10 measure as well as on the 3 other measures, despite the fact that BM25F is designed to take into account and to weight the different fields describing the documents.

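
The Re-ranking (RR) post-processing of Section 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the official runs: the paper only says that scores are normalized between 0 and 1, so min-max normalization is assumed here, and the function and argument names are hypothetical. Disliked documents are kept with a score of 0 rather than deleted from the list.

```python
def rerank(scores, liked, disliked):
    """Re-rank one topic's results.

    scores   : dict mapping document id -> retrieval score of an existing run
    liked    : set of document ids given as positive examples
    disliked : set of document ids given as negative examples
    Returns a list of (document id, new score), best first.
    """
    if not scores:
        return []
    lowest, highest = min(scores.values()), max(scores.values())
    span = highest - lowest
    reranked = {}
    for doc, score in scores.items():
        # Assumed min-max normalization into [0, 1].
        norm = (score - lowest) / span if span > 0 else 1.0
        if doc in disliked:
            norm = 0.0       # disliked documents drop to the bottom
        elif doc in liked:
            norm += 1.0      # liked documents move above all others
        reranked[doc] = norm
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(reranked.items(), key=lambda item: item[1], reverse=True)


# Hypothetical run with four retrieved documents.
result = rerank({"a": 10, "b": 8, "c": 6, "d": 4}, liked={"c", "d"}, disliked={"a"})
```

Because every liked document receives the same +1 boost, two liked documents keep their initial relative order, as noted in Section 3: in the example, "c" still ranks above "d".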
Our Re-ranking (RR) and Relevance Feedback (RF) approaches, both exploiting the list of documents given as negative or positive examples for each topic, lower the quality of the results. These approaches rely on an interpretation of this list of documents that differs from that of the Social Book Search organizers. In fact, these example documents (the negative ones as well as the positive ones) are not considered relevant by the organizers. Thus, boosting the positive examples in the ranking (or using them as relevant documents in a relevance feedback process) can only lower the results. On the other hand, removing the negative examples from our runs (or using them as irrelevant documents in a relevance feedback process) may sometimes improve the results. Overall, the quality of our results is lowered.
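
The effect described above can be illustrated with a toy nDCG@10 computation. This is only a sketch: the gains are binary and invented for the example, whereas the official evaluation uses the organizers' relevance judgments, which are not reproduced in this paper. The function names are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain of a ranked list of gain values."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, k=10):
    """nDCG@k, taking the ideal ranking from the same (toy) list of gains."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Toy ranking: two relevant documents followed by two example documents
# that are judged non-relevant (gain 0).
baseline = [1, 1, 0, 0]
# After re-ranking, the two example documents are boosted to the top.
boosted = [0, 0, 1, 1]

baseline_ndcg = ndcg_at_k(baseline)
boosted_ndcg = ndcg_at_k(boosted)
```

In this toy case the boosted ranking scores lower than the baseline, because documents with zero gain push the relevant ones down the list.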

Table 2. Official results for the 6 UJM runs

Rank   Run                  ndcg@10   MRR     MAP     R@1000   Profiles
17     UJM 2 (LGD)          0.088     0.174   0.065   0.483    no
20     UJM 6 (LGD + RR)     0.084     0.160   0.060   0.483    no
24     UJM 1 (BM25F)        0.081     0.167   0.056   0.471    no
29     UJM 3 (LGD + RF)     0.079     0.155   0.059   0.485    no
30     UJM 4 (BM25F + RR)   0.079     0.158   0.055   0.471    no
33     UJM 5 (BM25F + RF)   0.074     0.150   0.054   0.471    no

5 Conclusion

This paper describes the work of the LaHC lab of Saint-Étienne for the Social Book Search lab at CLEF 2015. Our quite basic experiments show that Log logistic (LGD) outperforms BM25F. Four of our six runs were based on a misinterpretation of the documents given as negative or positive examples for a topic. Our 2015 participation gives us a strong basis for experimenting in the future with more advanced Personalized Information Retrieval approaches.

Acknowledgment

This work is supported by Région Rhône-Alpes through the ReSPIr project.

References

1. Clinchant, S., Gaussier, E.: A Log-Logistic Model for Information Retrieval. In: Conference on Information and Knowledge Management (CIKM '09). Hong Kong, China (2009)
2. Hafsi, M., Géry, M., Beigbeder, M.: LaHC at INEX 2014: Social Book Search Track. In: Working Notes for CLEF 2014 Conference. pp. 514-520 (2014)
3. Koolen, M., Bogers, T., Kamps, J., Kazai, G., Preminger, M.: Overview of the INEX 2014 Social Book Search Track. In: Working Notes for CLEF 2014 Conference. pp. 462-479 (2014)
4. Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., Lioma, C.: Terrier: A High Performance and Scalable Information Retrieval Platform. In: SIGIR Workshop on Open Source Information Retrieval (OSIR '06) (2006)
5. Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 extension to multiple weighted fields. In: Conference on Information and Knowledge Management (CIKM '04). pp. 42-49. New York, NY, USA (2004)