Learning to Reweight Terms with Distributed Representations
|
|
- Garry Lane
- 5 years ago
- Views:
Transcription
1 Learning to Reweight Terms with Distributed Representations School of Computer Science Carnegie Mellon University August 12, 215
2 Outline Goal: Assign weights to query terms for better retrieval results Motivation Method Evaluation Conclusions
3 Motivation Assigning weights to terms is not a new problem score(q, d) = t q s(t, q, C)f(t, d) (1) Brief summary of existing representative methods: Type Method Performance Features Model Constant bad - - Unsupervised IDF average - - Supervised LtR-methods manual, good non-convex (Rank measure) (WSD) external Supervised Zhao et al average manual requires SVD (Term recall) Can we have something that is directly term weight supervised, can lead to good retrieval performance without manual / expensive feature construction?
4 Probably yes? What might be a good choice of term weights? Term recall The probability of term t of query q appears in relevant documents R q, can be estimated as Rq,t R q if relevance judgments are available True term recall proves to be highly effective as shown by Zhao et al[3, 2] But which features are good to predict that?
5 Distributed word vectors Distributed word vectors learnt from a large corpora can be used to effectively measure semantic similarity vec(king) vec(man) vec(queen) vec(woman) Another important characteristic of word vector is the addition property, i.e., it s often semantically meaningful to construct word vectors for phrases from those of individual terms, such as representing vec(chinese river) vec(chinese) + vec(river)
6 Idea in this work Intuitively, term recall measures the relative importance of a term w.r.t the whole query in determining document-query relevance With the addition property of word vectors, we can construct feature vectors for terms from their word vector representations, which resembles the above intuition of term recall For a term t in query q, we construct the feature vector x(t) given the word vector representation w(t) as x(t) vec(t) vec(q) where vec(q) is the average word vector of all the terms in q. The above defined feature vector is essentially the offset vector of a term to the center of the entire query
7 Feature vector illustration Figure: Demonstration of transforming global 2-dimensional word vectors into feature vectors local to the query.
8 Term recall learning with feature vectors Model term recall as a function of the feature vectors defined Preprocessing: transform term recall r from [,1] to y R by a logit function y = logit(r) Models the transformed term recall y as a linear function of the feature vectors x Employ l 1 norm regularization in learning (l 2 norm also tried) LASSO optimization: 1 ˆβ = arg min β R p 2 M n i (y ij β x ij ) 2 + λ β 1 (2) i=1 j=1 Predict term recall for testing query term with feature vector x as ( ) P(t R) = sigmoid ˆβ x = exp{ ˆβ x } 1 + exp{ ˆβ x } (3)
9 Incorporating predicted weights to query models (I) Base query models: Bag-of-Words (BOW): apple pie recipe Sequential Dependency model (SD): #weight(.8 #combine( apple pie recipe ).1 #combine( #1(apple pie) #1(pie recipe) ).1 #combine( #uw8(apple pie) #uw8(pie recipe) ) )
10 Incorporating predicted weights to query models (II) Term recall enhanced query models: DeepTR-BOW: #weight( P(apple R) apple P(pie R) pie P(recipe R) recipe ) DeepTR-SD: #weight(.8 #weight( P(apple R) apple P(pie R) pie P(recipe R) recipe ).1 #combine(#1(apple pie) #1(pie recipe) ).1 #combine( #uw8(apple pie) #uw8(pie recipe) ) )
11 Evaluation Data sets Collection # Docs # Words TREC Topics ROBUST4 528K 253M , 61-7 WT1g 1,692K 1,76M GOV2 25,25K 24,7M ClueWeb9B 29,38K 23,89M 1-2 Baseline query models: Bag of Words (BOW), Sequential Dependency model (SD), Weighted Sequential Dependency model (WSD) [1] Retrieval models: Language model and BM25 Evaluation metrics: P@1, MAP for all collections, NDCG@1 and ERR@2 for collections with graded relevance judgments
12 Evaluation Word vector source: trained from all above collections with dimensions ranging from 1, 3, 5 pre-trained by Google trained from Google News (1 billion words) with dimension 3
13 Experimental results Retrieval results of DeepTR-BOW with language model (dimension=3) Query Model ROBUST4 WT1g GOV2 ClueWeb9B MAP MAP MAP MAP BOW SD WSD DeepTR-BOW.2591 b b.718 (Corpus) (+3.2/-1.9) (+8.2/+3.5) (+6.3/-1.6) (+2.2/-3.6) DeepTR-BOW.265 b.2111 b.2646 b.741 (GOV2) (+5.5/+.3) (+8.7/+3.9) (+6.3/-1.6) (+5.6/-.5) DeepTR-BOW.2657 b b.718 (ClueWeb9B) (+5.8/+.5) (+9.6/+4.8) (+7.9/-.1) (+2.2/-3.6) DeepTR-BOW.2673 b.2122 b.263 b.732 (Google) (+6.4/+1.2) (+9.3/+4.5) (+5.7/-2.2) (+4.2/-1.8) b : Statistically significant difference with BOW s : Statistically significant difference with SD Weighted BoW queries significantly outperforms unweigted BoW queries and are even comparable to SD queries.
14 Experimental results (cont.) Retrieval results of DeepTR-SD with language model (dimension=3) Query Model ROBUST4 WT1g GOV2 ClueWeb9B MAP MAP MAP MAP BOW SD WSD DeepTR-SD.2754 b s.2182 b s.2831 b s.748 (Corpus) (+9.6/+4.2) (+12.3/+7.4) (+13.8/+5.3) (+6.5/+.5) DeepTR-SD.2781 b s.2223 b s.2831 b s.86 b s (GOV2) (+1.7/+5.2) (+14.4/+9.4) (+13.8/+5.3) (+14.8/+8.2) DeepTR-SD.281 b s.2279 b s.289 b s.748 (ClueWeb9B) (+11.9/+6.3) (+17.3/+12.1) (+16.2/+7.5) (+6.5/+.5) DeepTR-SD.2842 b s.2256 b s.2814 b s.78 b s (Google) (+13.1/+7.5) (+16.1/+11.) (+13.1/+4.7) (+11./+4.7) b : Statistically significant difference with BOW s : Statistically significant difference with SD DeepTR-SD significantly outperforms both BoW and SD queries.
15 Experimental results (cont.) Retrieval results of DeepTR-BOW with BM25 (dimension=3) Query Model ROBUST4 WT1g GOV2 ClueWeb9B MAP MAP MAP MAP BOW SD DeepTR-BOW.2566 b.193 b.243 b.593 b (Corpus) (+4.3/+.8) (+6.7/+4.7) (+4.1/+.9) (+4.7/-1.2) DeepTR-BOW.2561 b.191 b.243 b.596 b (GOV2) (+4.1/+.6) (+7.1/+5.1) (+4.1/+.9) (+5.3/-.7) DeepTR-BOW.259 b.1932 b.2462 b.593 b (ClueWeb9B) (+5.3/+1.8) (+8.3/+6.3) (+5.5/+2.2) (+4.7/-1.2) DeepTR-BOW.26 b.1922 b.2449 b.584 (Google) (+5.7/+2.1) (+7.8/+5.7) (+4.9/+1.7) (+3.1/-2.7) b : Statistically significant difference with BOW s : Statistically significant difference with SD Similar results observed for BM25 retrieval.
16 Experimental results (cont.) Retrieval results of DeepTR-SD with BM25 (dimension=3) Query Model ROBUST4 WT1g GOV2 ClueWeb9B MAP MAP MAP MAP BOW SD DeepTR-SD.2595 b s.1944 b s.2478 b s.613 b s (Corpus) (+5.5/+1.9) (+9./+7.) (+6.1/+2.9) (+8.2/+2.1) DeepTR-SD.269 b s.1941 s.2478 b s.616 b s (GOV2) (+6.1/+2.5) (+8.8/+6.8) (+6.1/+2.9) (+8.8/+2.6) DeepTR-SD.263 b s.1951 b s.252 b s.613 b s (ClueWeb9B) (+6.9/+3.3) (+9.4/+7.3) (+7.2/+3.9) (+8.2/+2.1) DeepTR-SD.2627 b s.1938 s.2461 b.64 (Google) (+6.8/+3.2) (+8.7/+6.7) (+5.4/+2.2) (+6.8/+.7) b : Statistically significant difference with BOW s : Statistically significant difference with SD Similar results observed for BM25 retrieval.
17 Word vector dimensionality vectors trained from ClueWeb9B, Language Model retrieval Query Model ROBUST4 WT1g GOV2 ClueWeb9B MAP MAP MAP MAP BOW SD WSD DeepTR-BOW.2681 b.2239 b.2667 b.727 (1) (+6.7/+1.4) (+15.3/+1.2) (+7.2/-.8) (+3.6/-2.4) DeepTR-BOW.2657 b b.718 (3) (+5.8/+.5) (+9.6/+4.8) (+7.9/-.1) (+2.2/-3.6) DeepTR-BOW.2679 b.2179 b.2655 b.726 (5) (+6.6/+1.4) (+12.1/+7.2) (+6.7/-1.2) (+3.4/-2.5) DeepTR-SD.2851 b s.2373 b s.2848 b s.778 b s (1) (+13.5/+7.9) (+22.1/+16.8) (+14.4/+5.9) (+1.8/+4.5) DeepTR-SD.281 b s.2279 b s.289 b s.748 (3) (+11.9/+6.3) (+17.3/+12.1) (+16.2/+7.5) (+6.5/+.5) DeepTR-SD.2817 b s.236 b s.286 b s.781 b s (5) (+12.2/+6.6) (+18.7/+13.5) (+14.9/+6.4) (+11.3/+4.9) b : Statistically significant difference with BOW s : Statistically significant difference with SD d = 1 works slightly better
18 Experimental results (cont.) ROBUST BOW SD DeepTR SD (ROBUST4 3) DeepTR SD (WT1g 3) DeepTR SD (GOV2 3) DeepTR SD (ClueWeb9B 1) DeepTR SD (ClueWeb9B 3) DeepTR SD (ClueWeb9B 5) DeepTR SD (Google 3).25 MAP LM BM25
19 WT1g BOW SD DeepTR SD (ROBUST4 3) DeepTR SD (WT1g 3) DeepTR SD (GOV2 3) DeepTR SD (ClueWeb9B 1) DeepTR SD (ClueWeb9B 3) DeepTR SD (ClueWeb9B 5) DeepTR SD (Google 3) MAP LM BM25
20 GOV2 BOW SD DeepTR SD (ROBUST4 3) DeepTR SD (WT1g 3) DeepTR SD (GOV2 3) DeepTR SD (ClueWeb9B 1) DeepTR SD (ClueWeb9B 3) DeepTR SD (ClueWeb9B 5) DeepTR SD (Google 3) MAP LM BM25
21 ClueWeb9B.1 BOW SD DeepTR SD (ROBUST4 3) DeepTR SD (WT1g 3) DeepTR SD (GOV2 3) DeepTR SD (ClueWeb9B 1) DeepTR SD (ClueWeb9B 3) DeepTR SD (ClueWeb9B 5) DeepTR SD (Google 3) MAP.5 LM BM25
22 Robustness Number of queries ROBUST4 SD DeepTR SD (trec7) DeepTR SD (wt1g) DeepTR SD (gov2) DeepTR SD (clueweb1) DeepTR SD (clueweb3) DeepTR SD (clueweb5) DeepTR SD (google3) 2 1 < 1% [ 1%, 8%) [ 8%, 6%) [ 6%, 4%) [ 4%, 2%) [ 2%, ) (, 2%) [2%, 4%) [4%, 6%) [6%, 8%) [8%, 1%] >1% Relative Learning gain to Reweight Terms with Distributed Representations
23 Number of queries WT1g SD DeepTR SD (trec7) DeepTR SD (wt1g) DeepTR SD (gov2) DeepTR SD (clueweb1) DeepTR SD (clueweb3) DeepTR SD (clueweb5) DeepTR SD (google3) 1 5 < 1% [ 1%, 8%) [ 8%, 6%) [ 6%, 4%) [ 4%, 2%) [ 2%, ) Relative gain (, 2%) [2%, 4%) [4%, 6%) [6%, 8%) [8%, 1%] >1%
24 Number of queries GOV2 SD DeepTR SD (trec7) DeepTR SD (wt1g) DeepTR SD (gov2) DeepTR SD (clueweb1) DeepTR SD (clueweb3) DeepTR SD (clueweb5) DeepTR SD (google3) 1 < 1% [ 1%, 8%) [ 8%, 6%) [ 6%, 4%) [ 4%, 2%) [ 2%, ) Relative gain (, 2%) [2%, 4%) [4%, 6%) [6%, 8%) [8%, 1%] >1%
25 Number of queries ClueWeb9B SD DeepTR SD (trec7) DeepTR SD (wt1g) DeepTR SD (gov2) DeepTR SD (clueweb1) DeepTR SD (clueweb3) DeepTR SD (clueweb5) DeepTR SD (google3) 1 5 < 1% [ 1%, 8%) [ 8%, 6%) [ 6%, 4%) [ 4%, 2%) [ 2%, ) Relative gain (, 2%) [2%, 4%) [4%, 6%) [6%, 8%) [8%, 1%] >1%
26 Query length Relative improvement over LM Relative improvement over LM ROBUST4 SD DeepTR SD Len 3 (7%) Len 4 5 (27%) Len 6+ (66%) GOV2 SD DeepTR SD Relative improvement over LM Relative improvement over LM WT1g SD DeepTR SD Len 3 (26%) Len 4 5 (39%) Len 6+ (35%) ClueWeb9B SD DeepTR SD Len 3 (11%) Len 4 5 (41%) Len 6+ (48%) Len 3 (26%) Len 4 5 (4%) Len 6+ (34%)
27 Graded relevance Query Model Language Model Retrieval GOV2 ClueWeb9B BOW SD DeepTR-SD.4121 b.1626 b.1743 b.1312 s (GOV2) DeepTR-SD.4146 b.1614 b (ClueWeb9B) DeepTR-SD.4112 b (Google) Query Model BM25 Retrieval GOV2 ClueWeb9B NDCG@2 ERR@2 NDCG@2 ERR@2 BOW SD DeepTR-SD.48 b s.159 b s (GOV2) DeepTR-SD.433 b s.1587 b s (ClueWeb9B) DeepTR-SD.41 b s.1586 s (Google) b : Statistically significant difference with BOW s : Statistically significant difference with SD
28 Conclusions and future work Novel (simple) framework for learning term weightes from features directly constructed from distributed word vectors Extensive experiments with two retrieval models on four test collections demonstrates the effectiveness Completing the table: Type Method Performance Features Model Unsupervised Constant bad - - IDF average - - Supervised LtR-methods manual, good (Rank measure) (WSD) external non-convex Supervised (Term recall) Zhao et al average manual requires SVD Supervised Proposed work good automatic convex, cheap (Term recall) Future directions include predicting term recalls for bigrams and proximity terms, other possible ways to construct feature vectors from distributed vectors, using word vectors for query expansion, theoretical justifications, etc.
29 Acknowledgements and QA Reproducibility: codes, preprocessed queries, experimental setups are available Contact: Thanks to SIGIR for the student travel grant Questions?
A Deep Relevance Matching Model for Ad-hoc Retrieval
A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese
More informationEstimating Embedding Vectors for Queries
Estimating Embedding Vectors for Queries Hamed Zamani Center for Intelligent Information Retrieval College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA 01003 zamani@cs.umass.edu
More informationEntity and Knowledge Base-oriented Information Retrieval
Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061
More informationModern Retrieval Evaluations. Hongning Wang
Modern Retrieval Evaluations Hongning Wang CS@UVa What we have known about IR evaluations Three key elements for IR evaluation A document collection A test suite of information needs A set of relevance
More informationInformativeness for Adhoc IR Evaluation:
Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationInformation Retrieval. Information Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent
More informationNortheastern University in TREC 2009 Million Query Track
Northeastern University in TREC 2009 Million Query Track Evangelos Kanoulas, Keshi Dai, Virgil Pavlu, Stefan Savev, Javed Aslam Information Studies Department, University of Sheffield, Sheffield, UK College
More informationChapter 8. Evaluating Search Engine
Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can
More informationFrom Neural Re-Ranking to Neural Ranking:
From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing Hamed Zamani (1), Mostafa Dehghani (2), W. Bruce Croft (1), Erik Learned-Miller (1), and Jaap Kamps (2)
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval
More informationPTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks
PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan
More informationLearning Dense Models of Query Similarity from User Click Logs
Learning Dense Models of Query Similarity from User Click Logs Fabio De Bona, Stefan Riezler*, Keith Hall, Massi Ciaramita, Amac Herdagdelen, Maria Holmqvist Google Research, Zürich *Dept. of Computational
More informationA Fixed-Point Method for Weighting Terms in Verbose Informational Queries
A Fixed-Point Method for Weighting Terms in Verbose Informational Queries ABSTRACT Jiaul H. Paik UMIACS University of Maryland College Park, MD 20742 USA jiaul@umd.edu The term weighting and document ranking
More informationLearning Concept Importance Using a Weighted Dependence Model
Learning Concept Importance Using a Weighted Dependence Model Michael Bendersky Dept. of Computer Science University of Massachusetts Amherst, MA bemike@cs.umass.edu Donald Metzler Yahoo! Labs 4401 Great
More informationTREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback
RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationFall Lecture 16: Learning-to-rank
Fall 2016 CS646: Information Retrieval Lecture 16: Learning-to-rank Jiepu Jiang University of Massachusetts Amherst 2016/11/2 Credit: some materials are from Christopher D. Manning, James Allan, and Honglin
More informationModeling Rich Interac1ons in Session Search Georgetown University at TREC 2014 Session Track
Modeling Rich Interac1ons in Session Search Georgetown University at TREC 2014 Session Track Jiyun Luo, Xuchu Dong and Grace Hui Yang Department of Computer Science Georgetown University Introduc:on Session
More informationMicrosoft Research Asia at the Web Track of TREC 2009
Microsoft Research Asia at the Web Track of TREC 2009 Zhicheng Dou, Kun Chen, Ruihua Song, Yunxiao Ma, Shuming Shi, and Ji-Rong Wen Microsoft Research Asia, Xi an Jiongtong University {zhichdou, rsong,
More informationMidterm Exam Search Engines ( / ) October 20, 2015
Student Name: Andrew ID: Seat Number: Midterm Exam Search Engines (11-442 / 11-642) October 20, 2015 Answer all of the following questions. Each answer should be thorough, complete, and relevant. Points
More informationCS54701: Information Retrieval
CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationEnd-to-End Neural Ad-hoc Ranking with Kernel Pooling
End-to-End Neural Ad-hoc Ranking with Kernel Pooling Chenyan Xiong 1,Zhuyun Dai 1, Jamie Callan 1, Zhiyuan Liu, and Russell Power 3 1 :Language Technologies Institute, Carnegie Mellon University :Tsinghua
More informationTerm Frequency Normalisation Tuning for BM25 and DFR Models
Term Frequency Normalisation Tuning for BM25 and DFR Models Ben He and Iadh Ounis Department of Computing Science University of Glasgow United Kingdom Abstract. The term frequency normalisation parameter
More informationTowards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling
Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling Chenyan Xiong, Zhengzhong Liu, Jamie Callan, and Tie-Yan Liu* Carnegie Mellon University & Microsoft Research* 1
More informationWord Embeddings in Search Engines, Quality Evaluation. Eneko Pinzolas
Word Embeddings in Search Engines, Quality Evaluation Eneko Pinzolas Neural Networks are widely used with high rate of success. But can we reproduce those results in IR? Motivation State of the art for
More informationImproving Patent Search by Search Result Diversification
Improving Patent Search by Search Result Diversification Youngho Kim University of Massachusetts Amherst yhkim@cs.umass.edu W. Bruce Croft University of Massachusetts Amherst croft@cs.umass.edu ABSTRACT
More informationAttentive Neural Architecture for Ad-hoc Structured Document Retrieval
Attentive Neural Architecture for Ad-hoc Structured Document Retrieval Saeid Balaneshin 1 Alexander Kotov 1 Fedor Nikolaev 1,2 1 Textual Data Analytics Lab, Department of Computer Science, Wayne State
More informationWebSci and Learning to Rank for IR
WebSci and Learning to Rank for IR Ernesto Diaz-Aviles L3S Research Center. Hannover, Germany diaz@l3s.de Ernesto Diaz-Aviles www.l3s.de 1/16 Motivation: Information Explosion Ernesto Diaz-Aviles
More informationUniversity of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015
University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 5:00pm-6:15pm, Monday, October 26th Name: ComputingID: This is a closed book and closed notes exam. No electronic
More informationCS47300: Web Information Search and Management
CS47300: Web Information Search and Management Federated Search Prof. Chris Clifton 13 November 2017 Federated Search Outline Introduction to federated search Main research problems Resource Representation
More informationarxiv: v1 [cs.ir] 16 Oct 2017
DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Jingfang Xu, Xueqi Cheng pl8787@gmail.com,{lanyanyan,guojiafeng,junxu,cxq}@ict.ac.cn,xujingfang@sogou-inc.com
More informationCSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation"
CSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Evaluation" Evaluation is key to building
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationFractional Similarity : Cross-lingual Feature Selection for Search
: Cross-lingual Feature Selection for Search Jagadeesh Jagarlamudi University of Maryland, College Park, USA Joint work with Paul N. Bennett Microsoft Research, Redmond, USA Using All the Data Existing
More informationOptimization for Machine Learning
with a focus on proximal gradient descent algorithm Department of Computer Science and Engineering Outline 1 History & Trends 2 Proximal Gradient Descent 3 Three Applications A Brief History A. Convex
More informationWindow Extraction for Information Retrieval
Window Extraction for Information Retrieval Samuel Huston Center for Intelligent Information Retrieval University of Massachusetts Amherst Amherst, MA, 01002, USA sjh@cs.umass.edu W. Bruce Croft Center
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationExploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge
Exploiting Internal and External Semantics for the Using World Knowledge, 1,2 Nan Sun, 1 Chao Zhang, 1 Tat-Seng Chua 1 1 School of Computing National University of Singapore 2 School of Computer Science
More informationRepresentation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval Xiaodong Liu 12, Jianfeng Gao 1, Xiaodong He 1 Li Deng 1, Kevin Duh 2, Ye-Yi Wang 1 1
More informationCS646 (Fall 2016) Homework 1
CS646 (Fall 2016) Homework 1 Deadline: 11:59pm, Sep 28th, 2016 (EST) Access the following resources before you start working on HW1: Download the corpus file on Moodle: acm corpus.gz (about 90 MB). Check
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationPseudo-Relevance Feedback and Title Re-Ranking for Chinese Information Retrieval
Pseudo-Relevance Feedback and Title Re-Ranking Chinese Inmation Retrieval Robert W.P. Luk Department of Computing The Hong Kong Polytechnic University Email: csrluk@comp.polyu.edu.hk K.F. Wong Dept. Systems
More informationCS47300: Web Information Search and Management
CS47300: Web Information Search and Management Prof. Chris Clifton 27 August 2018 Material adapted from course created by Dr. Luo Si, now leading Alibaba research group 1 AD-hoc IR: Basic Process Information
More informationSupervised Hashing for Image Retrieval via Image Representation Learning
Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia, Yan Pan, Cong Liu (Sun Yat-Sen University) Hanjiang Lai, Shuicheng Yan (National University of Singapore) Finding Similar
More informationImproving Difficult Queries by Leveraging Clusters in Term Graph
Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu
More informationRobust Relevance-Based Language Models
Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new
More informationInformation Retrieval
Information Retrieval Learning to Rank Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing Data
More informationSearch Engines Chapter 8 Evaluating Search Engines Felix Naumann
Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition
More informationRetrieval and Feedback Models for Blog Distillation
Retrieval and Feedback Models for Blog Distillation CMU at the TREC 2007 Blog Track Jonathan Elsas, Jaime Arguello, Jamie Callan, Jaime Carbonell CMU s Blog Distillation Focus Two Research Questions: What
More informationover Multi Label Images
IBM Research Compact Hashing for Mixed Image Keyword Query over Multi Label Images Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,
More informationRetrieval and Feedback Models for Blog Distillation
Retrieval and Feedback Models for Blog Distillation Jonathan Elsas, Jaime Arguello, Jamie Callan, Jaime Carbonell Language Technologies Institute, School of Computer Science, Carnegie Mellon University
More informationAn Exploration of Query Term Deletion
An Exploration of Query Term Deletion Hao Wu and Hui Fang University of Delaware, Newark DE 19716, USA haowu@ece.udel.edu, hfang@ece.udel.edu Abstract. Many search users fail to formulate queries that
More informationCS294-1 Assignment 2 Report
CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The
More informationOn the automatic classification of app reviews
Requirements Eng (2016) 21:311 331 DOI 10.1007/s00766-016-0251-9 RE 2015 On the automatic classification of app reviews Walid Maalej 1 Zijad Kurtanović 1 Hadeer Nabil 2 Christoph Stanik 1 Walid: please
More informationOptimized Top-K Processing with Global Page Scores on Block-Max Indexes
Optimized Top-K Processing with Global Page Scores on Block-Max Indexes Dongdong Shan 1 Shuai Ding 2 Jing He 1 Hongfei Yan 1 Xiaoming Li 1 Peking University, Beijing, China 1 Polytechnic Institute of NYU,
More informationModeling Higher-Order Term Dependencies in Information Retrieval using Query Hypergraphs
Modeling Higher-Order Term Dependencies in Information Retrieval using Query Hypergraphs ABSTRACT Michael Bendersky Dept. of Computer Science Univ. of Massachusetts Amherst Amherst, MA bemike@cs.umass.edu
More informationQuery Phrase Expansion using Wikipedia for Patent Class Search
Query Phrase Expansion using Wikipedia for Patent Class Search 1 Bashar Al-Shboul, Sung-Hyon Myaeng Korea Advanced Institute of Science and Technology (KAIST) December 19 th, 2011 AIRS 11, Dubai, UAE OUTLINE
More informationAComparisonofRetrievalModelsusingTerm Dependencies
AComparisonofRetrievalModelsusingTerm Dependencies Samuel Huston and W. Bruce Croft Center for Intelligent Information Retrieval University of Massachusetts Amherst 140 Governors Dr, Amherst, Massachusetts,
More informationQuery Likelihood with Negative Query Generation
Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer
More informationSemantic Matching by Non-Linear Word Transportation for Information Retrieval
Semantic Matching by Non-Linear Word Transportation for Information Retrieval Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft CAS Key Lab of Network Data Science and Technology, Institute of Computing
More informationLearning-based Neuroimage Registration
Learning-based Neuroimage Registration Leonid Teverovskiy and Yanxi Liu 1 October 2004 CMU-CALD-04-108, CMU-RI-TR-04-59 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract
More informationWelcome to the class of Web Information Retrieval!
Welcome to the class of Web Information Retrieval! Tee Time Topic Augmented Reality and Google Glass By Ali Abbasi Challenges in Web Search Engines Min ZHANG z-m@tsinghua.edu.cn April 13, 2012 Challenges
More informationFederated Text Search
CS54701 Federated Text Search Luo Si Department of Computer Science Purdue University Abstract Outline Introduction to federated search Main research problems Resource Representation Resource Selection
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationRanked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?
Ranked Retrieval One option is to average the precision scores at discrete Precision 100% 0% More junk 100% Everything points on the ROC curve But which points? Recall We want to evaluate the system, not
More informationInformation Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer
More informationOn Duplicate Results in a Search Session
On Duplicate Results in a Search Session Jiepu Jiang Daqing He Shuguang Han School of Information Sciences University of Pittsburgh jiepu.jiang@gmail.com dah44@pitt.edu shh69@pitt.edu ABSTRACT In this
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing
More informationFall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12
Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationICTNET at Web Track 2010 Diversity Task
ICTNET at Web Track 2010 Diversity Task Yuanhai Xue 1,2, Zeying Peng 1,2, Xiaoming Yu 1, Yue Liu 1, Hongbo Xu 1, Xueqi Cheng 1 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing,
More informationCS54701: Information Retrieval
CS54701: Information Retrieval Federated Search 10 March 2016 Prof. Chris Clifton Outline Federated Search Introduction to federated search Main research problems Resource Representation Resource Selection
More informationInverted List Caching for Topical Index Shards
Inverted List Caching for Topical Index Shards Zhuyun Dai and Jamie Callan Language Technologies Institute, Carnegie Mellon University {zhuyund, callan}@cs.cmu.edu Abstract. Selective search is a distributed
More informationSemantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman
Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationResPubliQA 2010
SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first
More informationImproving Document Ranking for Long Queries with Nested Query Segmentation
Improving Document Ranking for Long Queries with Nested Query Segmentation Rishiraj Saha Roy 1, Anusha Suresh 2, Niloy Ganguly 2, and Monojit Choudhury 3 1 Max Planck Institute for Informatics, Saarbrücken,
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationDocument Clustering Using Locality Preserving Indexing
Document Clustering Using Locality Preserving Indexing Deng Cai Department of Computer Science University of Illinois at Urbana Champaign 1334 Siebel Center, 201 N. Goodwin Ave, Urbana, IL 61801, USA Phone:
More informationThe University of Amsterdam at the CLEF 2008 Domain Specific Track
The University of Amsterdam at the CLEF 2008 Domain Specific Track Parsimonious Relevance and Concept Models Edgar Meij emeij@science.uva.nl ISLA, University of Amsterdam Maarten de Rijke mdr@science.uva.nl
More informationA RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH
A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements
More informationNTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task
NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task Chieh-Jen Wang, Yung-Wei Lin, *Ming-Feng Tsai and Hsin-Hsi Chen Department of Computer Science and Information Engineering,
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,
More informationPractical Relevance Ranking for 10 Million Books.
Practical Relevance Ranking for 10 Million Books. Tom Burton-West Digital Library Production Service, University of Michigan Library, Ann Arbor, Michigan, US tburtonw@umich.edu Abstract. In this paper
More informationRuslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007
SEMANIC HASHING Ruslan Salakhutdinov and Geoffrey Hinton University of oronto, Machine Learning Group IRGM orkshop July 2007 Existing Methods One of the most popular and widely used in practice algorithms
More informationAn Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments
An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments Hui Fang ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign Abstract In this paper, we report
More informationOverview of the NTCIR-13 OpenLiveQ Task
Overview of the NTCIR-13 OpenLiveQ Task ABSTRACT Makoto P. Kato Kyoto University mpkato@acm.org Akiomi Nishida Yahoo Japan Corporation anishida@yahoo-corp.jp This is an overview of the NTCIR-13 OpenLiveQ
More informationAutomatic Term Mismatch Diagnosis for Selective Query Expansion
Automatic Term Mismatch Diagnosis for Selective Query Expansion Le Zhao Language Technologies Institute Carnegie Mellon University Pittsburgh, PA, USA lezhao@cs.cmu.edu Jamie Callan Language Technologies
More informationBasic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationOnline Expansion of Rare Queries for Sponsored Search
Online Expansion of Rare Queries for Sponsored Search Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Don Metzler, Lance Riedel, Jeff Yuan Yahoo! Research 1 Sponsored Search 2 Sponsored Search in
More informationAnalytic Knowledge Discovery Models for Information Retrieval and Text Summarization
Analytic Knowledge Discovery Models for Information Retrieval and Text Summarization Pawan Goyal Team Sanskrit INRIA Paris Rocquencourt Pawan Goyal (http://www.inria.fr/) Analytic Knowledge Discovery Models
More informationRMIT University at TREC 2006: Terabyte Track
RMIT University at TREC 2006: Terabyte Track Steven Garcia Falk Scholer Nicholas Lester Milad Shokouhi School of Computer Science and IT RMIT University, GPO Box 2476V Melbourne 3001, Australia 1 Introduction
More informationResearch Article Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach
Computational Intelligence and Neuroscience Volume 215, Article ID 568197, 13 pages http://dx.doi.org/1.1155/215/568197 Research Article Relevance Feedback Based Query Expansion Model Using Borda Count
More informationWITHIN-DOCUMENT TERM-BASED INDEX PRUNING TECHNIQUES WITH STATISTICAL HYPOTHESIS TESTING. Sree Lekha Thota
WITHIN-DOCUMENT TERM-BASED INDEX PRUNING TECHNIQUES WITH STATISTICAL HYPOTHESIS TESTING by Sree Lekha Thota A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the
More informationJames Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!
James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation
More information