Book Recommendation based on Social Information

Similar documents
Verbose Query Reduction by Learning to Rank for Social Book Search Track

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

RSLIS at INEX 2012: Social Book Search Track

UMass at TREC 2017 Common Core Track

Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization

Reducing Redundancy with Anchor Text and Spam Priors

INEX REPORT. Report on INEX 2012

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

INEX REPORT. Report on INEX 2011

CADIAL Search Engine at INEX

University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks

Multimodal Social Book Search

On Duplicate Results in a Search Session

Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014

LaHC at CLEF 2015 SBS Lab

Experiments with ClueWeb09: Relevance Feedback and Web Tracks

Robust Relevance-Based Language Models

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

INEX 2011 Workshop Pre-proceedings

From Passages into Elements in XML Retrieval

Comparative Analysis of Clicks and Judgments for IR Evaluation

INEX REPORT. Report on INEX 2008

KNOW At The Social Book Search Lab 2016 Suggestion Track

External Query Reformulation for Text-based Image Retrieval

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback

Retrieval and Feedback Models for Blog Distillation

Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books

Effective metadata for social book search from a user perspective Huurdeman, H.C.; Kamps, J.; Koolen, M.H.A.

RMIT University at TREC 2006: Terabyte Track

INEX REPORT. Report on INEX 2010

Experiments on Related Entity Finding Track at TREC 2009. Qing Yang, Peng Jiang, Chunxia Zhang, Zhendong Niu

INEX REPORT. Report on INEX 2013

UMass at TREC 2006: Enterprise Track

Processing Structural Constraints

Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?

Seven years of INEX interactive retrieval experiments lessons and challenges

Window Extraction for Information Retrieval

Aalborg Universitet. Tagging vs. Controlled Vocabulary Which is More Helpful for Book Search? Bogers, Antonius Marinus; Petras, Vivien

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Combining Language Models with NLP and Interactive Query Expansion.

The Interpretation of CAS

CSCI 599: Applications of Natural Language Processing. Information Retrieval: Retrieval Models (Part 3)

Indri at TREC 2005: Terabyte Track (Notebook Version)

Report on the SIGIR 2008 Workshop on Focused Retrieval

Overview of the INEX 2008 Ad Hoc Track

ISART: A Generic Framework for Searching Books with Social Information

Modeling Query Term Dependencies in Information Retrieval with Markov Random Fields

Overview of the ICDAR 2013 Competition on Book Structure Extraction

Focused Retrieval Using Topical Language and Structure

The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage

Document and Query Expansion Models for Blog Distillation

Overview of the INEX 2007 Book Search Track (BookSearch 07)

A Comparison of Retrieval Models using Term Dependencies

ResPubliQA 2010

Informativeness for Adhoc IR Evaluation:

Language Modeling Based Local Set Re-ranking using Manual Relevance Feedback

Overview of the INEX 2009 Link the Wiki Track

Search Engines. Information Retrieval in Practice

TREC 2017 Dynamic Domain Track Overview

Passage Retrieval and other XML-Retrieval Tasks. Andrew Trotman (Otago) Shlomo Geva (QUT)

Real-time Query Expansion in Relevance Models

Practical Relevance Ranking for 10 Million Books.

NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags

A Deep Relevance Matching Model for Ad-hoc Retrieval

Entity-Based Information Retrieval System: A Graph-Based Document and Query Model

The University of Illinois Graduate School of Library and Information Science at TREC 2011

Wikipedia Pages as Entry Points for Book Search

This is an author-deposited version published in : Eprints ID : 13273

Component ranking and Automatic Query Refinement for XML Retrieval

Prior Art Retrieval Using Various Patent Document Fields Contents

Indri at TREC 2004: Terabyte Track

University of TREC 2009: Indexing half a billion web pages

Phrase Detection in the Wikipedia

Estimating Embedding Vectors for Queries

Query Reformulation for Clinical Decision Support Search

Rishiraj Saha Roy and Niloy Ganguly, IIT Kharagpur, India; Monojit Choudhury and Srivatsan Laxman, Microsoft Research India

Indri at TREC 2005: Terabyte Track

INLS W: Information Retrieval Systems Design and Implementation. Fall 2009.

Enhanced Information Retrieval Using Domain- Specific Recommender Models

Evaluation of Automatically Assigned MeSH Terms for Retrieval of Medical Images

Charles University at CLEF 2007 CL-SR Track

Formulating XML-IR Queries

A New Measure of the Cluster Hypothesis

Edit Distance for XML Information Retrieval : Some Experiments on the Datacentric Track of INEX 2011

Chapter 6: Information Retrieval and Web Search. An introduction

Improving Difficult Queries by Leveraging Clusters in Term Graph

Overview of the NTCIR-13 OpenLiveQ Task

CSCI 599: Applications of Natural Language Processing. Information Retrieval: Evaluation

Conceptual Language Models for Context-Aware Text Retrieval

The University of Amsterdam at INEX 2008: Ad Hoc, Book, Entity Ranking, Interactive, Link the Wiki, and XML Mining Tracks

A Query Expansion Method based on a Weighted Word Pairs Approach

Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track

DCU and 2010: Ad-hoc and Data-Centric tracks

Risk Minimization and Language Modeling in Text Retrieval Thesis Summary

Structural Features in Content Oriented XML retrieval

Using XML Logical Structure to Retrieve (Multimedia) Objects

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task

Book Recommendation based on Social Information

Chahinez Benkoussas and Patrice Bellot
LSIS, Aix-Marseille University
chahinez.benkoussas@lsis.org, patrice.bellot@lsis.org

Abstract: In this paper, we present our contribution to the INEX 2013 Social Book Search Track. This track aims to explore social information (user reviews, ratings, etc.) over the LibraryThing and Amazon collections of real books. In our submissions for the SBS Track, we re-rank books by combining the Sequential Dependence Model (SDM) with a social component that takes into account both ratings and helpful votes.

Keywords: XML retrieval, controlled metadata, book recommendation, re-ranking.

1 Introduction

Previous editions of the INEX Book Track focused on the retrieval of real out-of-copyright books [1]. These books were written almost a century ago, and the collection consisted of the OCR content of over 50,000 books. The topics and the books of the collection had a different vocabulary and writing style: Information Retrieval systems had difficulties finding relevant information, and assessors had difficulties judging the documents. In contrast, this year's document collection is composed of the Amazon (http://www.amazon.com/) pages of real books. IR systems must search through the editorial data, user reviews, and ratings provided for each book, instead of searching through the whole content of the book. The topics were extracted from LibraryThing (http://www.librarything.com/) forums and represent real requests from real users.

We chose a language modeling approach to retrieval. For our recommendation runs, we used the reviews and the ratings given to books by Amazon users.

We computed a social score for each book, based on the number of reviews and the ratings. This score was then interpolated with the scores obtained by a Markov Random Field (MRF) baseline. We also used the helpfulvotes and totalvotes values attached to each user rating to modify the ranking obtained by the combination of the social and MRF scores.

The rest of the paper is organized as follows: Section 2 describes our retrieval framework, and Section 3 describes our runs.

2 Retrieval model

2.1 Sequential Dependence Model

We used a language modeling approach to retrieval [2]. We use Metzler and Croft's Markov Random Field (MRF) model [3] to integrate multi-word phrases into the query. Specifically, we use the Sequential Dependence Model (SDM), which is a special case of the MRF. In this model, three features are considered: single-term features (standard unigram language model features, f_T), exact-phrase features (words appearing in sequence, f_O), and unordered-window features (words required to be close together, but not necessarily in an exact sequence order, f_U). Documents are ranked according to the following scoring function:

SDM(Q, D) = \lambda_T \sum_{q \in Q} f_T(q, D) + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D) + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)

where the feature weights are set according to the authors' recommendation (\lambda_T = 0.85, \lambda_O = 0.1, \lambda_U = 0.05). f_T, f_O and f_U are the log maximum likelihood estimates of query terms in document D, computed over the target collection with Dirichlet smoothing.
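For concreteness, the SDM scoring function above can be expressed directly in the Indri query language, which is the engine used for the runs in Section 3. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the helper name is hypothetical, and the unordered-window size of 8 is the value commonly paired with SDM rather than one reported in the paper.

def sdm_indri_query(terms, lam_t=0.85, lam_o=0.1, lam_u=0.05, window=8):
    """Build an Indri #weight query approximating the Sequential Dependence Model.

    The weights 0.85/0.1/0.05 are those reported in the paper; the unordered
    window size (#uw8) is a common SDM choice and an assumption here.
    """
    unigrams = " ".join(terms)
    bigrams = list(zip(terms, terms[1:]))
    if not bigrams:
        # Single-term queries degenerate to the unigram language model.
        return "#combine( {} )".format(unigrams)
    ordered = " ".join("#1({} {})".format(a, b) for a, b in bigrams)
    unordered = " ".join("#uw{}({} {})".format(window, a, b) for a, b in bigrams)
    return ("#weight( {lt} #combine( {u} ) "
            "{lo} #combine( {o} ) "
            "{lu} #combine( {w} ) )").format(lt=lam_t, u=unigrams,
                                             lo=lam_o, o=ordered,
                                             lu=lam_u, w=unordered)

# Example: a short LibraryThing-style request reduced to its query terms.
print(sdm_indri_query(["historical", "mystery", "novels"]))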

2.2 Modeling book likeliness

We modeled book likeliness based on the following idea: the more reviews a book has, the more interesting it is to read (it may not be a good or popular book, but it is a book with a high impact).

Likeliness(D) = \frac{\sum_{r \in R_D} r}{Reviews_D}

where R_D is the set of all ratings given by users for the book D, and Reviews_D is the number of reviews.

We then re-rank books according to a linear interpolation of the previously computed SDM score with the likeliness score, using a coefficient \alpha to control the influence of each model. The scoring function of a book D given a query Q is thus defined as follows:

SDM\_Likeliness(Q, D) = \alpha \cdot SDM(Q, D) + (1 - \alpha) \cdot Likeliness(D)

where \alpha is a constant set according to previous results (obtained on the 2011 and 2012 datasets), with a default value of 0.89.

2.3 Modeling the usefulness of book ratings

In the collection, each user review comes with a rating, and that rating may or may not be useful depending on user votes. We chose to weight the rating value by the share of helpful votes, according to the following formula:

Usefulness(D) = \frac{\sum_{r \in R_D,\, t \in T_D,\, h \in H_D} r \cdot \frac{h}{t}}{Reviews_D}

where R_D, T_D and H_D are, respectively, the sets of all ratings, totalvotes and helpfulvotes given by the users for the book D, and Reviews_D is the number of reviews.

We then re-rank books according to a linear interpolation of the previously computed SDM score with the usefulness score, using a coefficient \beta to control the influence of each model. The scoring function of a book D given a query Q is thus defined as follows:

SDM\_Usefulness(Q, D) = \beta \cdot SDM(Q, D) + (1 - \beta) \cdot Usefulness(D)

where \beta is a constant set according to previous results (obtained on the 2011 and 2012 datasets), with a default value of 0.93.
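To make the two social components concrete, here is a minimal Python sketch of the Likeliness and Usefulness scores and of the linear interpolation used for re-ranking. The Review record, the dictionary-based inputs, and the skipping of reviews with zero total votes are illustrative assumptions; the paper does not specify these details.

from collections import namedtuple

# Hypothetical per-review record extracted from the Amazon pages:
# a rating plus its helpfulvotes/totalvotes counts.
Review = namedtuple("Review", ["rating", "helpful_votes", "total_votes"])

def likeliness(reviews):
    """Section 2.2: sum of all ratings of a book divided by its number of reviews."""
    if not reviews:
        return 0.0
    return sum(r.rating for r in reviews) / len(reviews)

def usefulness(reviews):
    """Section 2.3: ratings weighted by their share of helpful votes, divided by the number of reviews."""
    if not reviews:
        return 0.0
    weighted = sum(r.rating * (r.helpful_votes / r.total_votes)
                   for r in reviews if r.total_votes > 0)  # skip unvoted reviews (assumption)
    return weighted / len(reviews)

def rerank(sdm_scores, book_reviews, coeff, social):
    """Interpolate SDM scores with a social score, then sort by the combined score.

    sdm_scores: dict book_id -> SDM(Q, D); book_reviews: dict book_id -> [Review].
    """
    combined = {book: coeff * score + (1.0 - coeff) * social(book_reviews.get(book, []))
                for book, score in sdm_scores.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# The two social runs of Section 3 differ only in the social component and coefficient:
# SDM Rating run: rerank(sdm_scores, book_reviews, coeff=0.89, social=likeliness)
# SDM HV run:     rerank(sdm_scores, book_reviews, coeff=0.93, social=usefulness)

Note that SDM scores are log-likelihoods while the social scores live on the rating scale; whether any normalization is applied before interpolation is not described in the paper, so none is applied in this sketch.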

3 Runs

We submitted 3 runs for the Social Book Search Task. We used Indri (http://www.lemurproject.org/) for indexing and searching. We did not remove any stopwords and used the standard Krovetz stemmer. Only the query part of each topic was used for the three runs.

SDM run: This run is the implementation of the Sequential Dependence Model (SDM) described in Section 2.1.

SDM Rating run: This run combines the implementation of the Sequential Dependence Model with the use of social information, namely the ratings given by users. The description is given in Section 2.2.

SDM HV run: For the last run, we combine the implementation of the Sequential Dependence Model with the use of social information: the ratings, helpful votes and total votes given by users. We weighted the rating value by the rate of helpful votes as presented in Section 2.3.

4 Conclusion

This paper presents our contributions to the INEX 2013 Social Book Search Track. We proposed a simple method for re-ranking books based on their likeliness and an effective way to take user helpful votes into account. Finally, we combined both methods with a linear interpolation function. We are disappointed with the official results this year (ndcg@10 = 0.0571 for the baseline SDM run) compared to those obtained last year with the approach that serves as our baseline this year (ndcg@10 = 0.1295), and we are looking for explanations, possibly a software problem. On the other hand, the proposed extensions (ndcg@10 = 0.596 for the SDM Rating run and ndcg@10 = 0.0576 for the SDM HV run) improved the results of the baseline.

5 Acknowledgements

This work was supported by the French program "Investissements d'Avenir - Développement de l'Economie Numérique" under the project Inter-Textes #O14751-408983.

References

[1] Gabriella Kazai, Marijn Koolen, Antoine Doucet, and Monica Landoni. Overview of the INEX 2010 Book Track: At the Mercy of Crowdsourcing. In Shlomo Geva, Jaap Kamps, Ralf Schenkel, and Andrew Trotman, editors, Comparative Evaluation of Focused Retrieval, pages 98-117. Springer Berlin / Heidelberg, 2011.

[2] D. Metzler and W. B. Croft. Combining the language model and inference network approaches to retrieval. Inf. Process. Manage., 40:735-750, September 2004.

[3] Donald Metzler and W. Bruce Croft. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '05, pages 472-479, New York, NY, USA, 2005. ACM.