Query Session Detection as a Cascade
1 Query Session Detection as a Cascade
Matthias Hagen, Benno Stein, Tino Rüb
Bauhaus-Universität Weimar, matthias.hagen@uni-weimar.de
SIR 2011, Dublin, Ireland, April 18, 2011
Hagen, Stein, Rüb: Query Session Detection as a Cascade
3 Introduction: Motivation
It's quiz time! What is the user searching?
paris hilton
4 Introduction: Motivation
Without context... paris hilton
[image: Hilton 3 Crop.jpg]
6 Introduction: Motivation
What if you knew the previous queries?
- paris hotels
- paris marriott
- paris hyatt
- paris hilton
[images: hotel paris 2.jpg, mk logo hiltonbrandlogo.jpg]
8 Introduction: Motivation
Query sessions: queries for the same information need
The benefits
- Improved understanding of user intent
- Improved retrieval performance via session knowledge
The minor issue
- Users do not announce when querying for a new information need.
10 Introduction: Motivation
A typical query log. How to determine the break points?
User  Query                Click domain + Click rank  Time
773   istanbul             en.wikipedia.org           :34:
773   istanbul archeology  -                          :2:
773   istanbul archeology  -                          :3:
773   istanbul archeology  -                          :24:7
773   constantinople       -                          ::4
773   constantinople       -                          :1:2
773   hurling              -                          :3:1
773   hurling              en.wikipedia.org           :3:5
773   liam mccarthy cup    -                          :33:4
773   liam mccarthy cup    -                          :33:
773   liam mccarthy cup    starbets.ie                :42:48
11 Introduction: The Problem
The key is... automatic query session detection
12 Introduction: The Problem
Automatic query session detection
Usual technique: For each pair of consecutive queries, check whether the second query continues the same information need or starts a new one.
Example
773 istanbul :34:17
  same
773 istanbul archeology :24:7
  same
773 constantinople :1:2
  new
773 hurling :3:5
13 Introduction: Related Work
Typical features
Temporal thresholds
- 5 minutes [Silverstein et al., 1999]
- 10-15 minutes [He and Göker, 2000]
- 30 minutes [Downey et al., 2007]
- user specific [Murray et al., 2006]
Lexical similarity
- n-gram overlap [Zhang and Moffat, 2006]
- Levenshtein distance [Jones and Klinkner, 2008]
Semantic similarity
- Search results [Radlinski and Joachims, 2005]
- ESA [Lucchese et al., 2011]
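A fixed temporal threshold is the simplest of the features listed above. The following is a minimal sketch, assuming a chronologically ordered log of (timestamp, query) pairs for a single user and a 30-minute cut-off; the function name `split_by_time_gap` and the sample timestamps are illustrative and do not come from the slides.

```python
from datetime import datetime, timedelta


def split_by_time_gap(entries, max_gap=timedelta(minutes=30)):
    """Split (timestamp, query) pairs into sessions at gaps longer than max_gap."""
    sessions = []
    for ts, query in entries:
        # Continue the current session if the gap to its last entry is small enough.
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions


# Illustrative log: the timestamps are invented for this example.
log = [
    (datetime(2006, 3, 1, 10, 0), "istanbul"),
    (datetime(2006, 3, 1, 10, 5), "istanbul archeology"),
    (datetime(2006, 3, 1, 12, 0), "hurling"),
]
sessions = split_by_time_gap(log)
for number, session in enumerate(sessions, start=1):
    print(number, [query for _, query in session])
```

Such a rule needs only one subtraction per query, but it cannot tell a long pause within one task from a topic change, which is why the slides call temporal thresholds fast but inaccurate.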
15 Introduction: Related Work
Previous methods
Observations
- Temporal thresholds: fast but low accuracy
- Feature combinations: more accurate
- One of the best: Geometric method (time + lexical) [Gayo-Avello, 2009]
Shortcomings
- All features are evaluated simultaneously (costs runtime)
- Geometric method ignores semantics (costs accuracy)
Examples
- Subset test suffices: hurling -> hurling gaa (same session)
- Geometric method fails: hurling -> mccarthy cup (same session)
18 Cascading Method: The Framework
We address the shortcomings in a cascade... well... a small 4-step cascade
- Step 1: Subset tests
- Step 2: Geometric method
- Step 3: ESA similarity
- Step 4: Search results
Basic Idea
- Feature cost (runtime) increases from step to step.
- Expensive features are used only if the previous steps are unreliable.
[image: Cascade 4 Tier GreenL.jpg]
19 Cascading Method: Step 1: Subset tests
Simple string comparison
Criterion: Consecutive queries q and q' are in the same session if q' is a sub- or superset of q. Else: Goto Step 2.
Remarks
- Covers repetition, specialization, or generalization.
- Time gap = continuing a pending session.
Example
- Repetition: hurling -> hurling (same)
- Specialization: hurling -> hurling gaa (same)
- Generalization: hurling gaa -> hurling (same)
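The subset test of Step 1 can be sketched in a few lines. This sketch assumes that a query is treated as a set of lower-cased, whitespace-separated terms; the slides do not state the tokenization, and the function name `subset_test` is hypothetical.

```python
def subset_test(q_prev, q_next):
    """Return True if one query's term set contains the other's."""
    a = set(q_prev.lower().split())
    b = set(q_next.lower().split())
    if not a or not b:
        # An empty query is a subset of everything; treat it as undecided.
        return False
    return a <= b or b <= a


print(subset_test("hurling", "hurling"))        # repetition
print(subset_test("hurling", "hurling gaa"))    # specialization
print(subset_test("hurling gaa", "hurling"))    # generalization
print(subset_test("hurling", "mccarthy cup"))   # no decision, go to Step 2
```

A False result here does not mean "new session"; it only means that Step 1 cannot decide and the pair is handed to the next step.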
21 Cascading Method: Step 2: Geometric method
Combination of temporal and lexical features [Gayo-Avello, 2009]
For consecutive queries q and q':
- $f_{temp} = \max\{0,\ 1 - t/24\,\mathrm{h}\}$, where t is the time between q and q'
- $f_{lex}$ = cosine similarity of the 3- to 5-grams of q' and s, where s is the session of q
Criterion (original): Consecutive queries q and q' are in the same session if $\sqrt{f_{temp}^2 + f_{lex}^2} \geq 1$.
[plot: lexical similarity vs. temporal similarity with the regions "Same session" and "New session"; annotated corner cases: nearly identical queries at long temporal distance, different queries with no temporal distance]
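The two features and the original criterion can be sketched as follows. The sketch assumes $f_{temp} = \max\{0, 1 - t/24\,\mathrm{h}\}$, character n-grams of length 3 to 5, and the decision rule $\sqrt{f_{temp}^2 + f_{lex}^2} \geq 1$; the helper names and the choice of character (rather than word) n-grams are assumptions, not details confirmed by the slides.

```python
import math
from collections import Counter

DAY_SECONDS = 24 * 60 * 60


def char_ngrams(text, n_min=3, n_max=5):
    """Count all character n-grams of length n_min..n_max."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def f_temp(gap_seconds):
    """Temporal similarity: 1 for no gap, 0 for gaps of 24 hours or more."""
    return max(0.0, 1.0 - gap_seconds / DAY_SECONDS)


def f_lex(session_queries, query):
    """Lexical similarity between the new query and the whole session so far."""
    return cosine(char_ngrams(" ".join(session_queries)), char_ngrams(query))


def same_session_geometric(session_queries, query, gap_seconds):
    """Same session if the feature point lies on or outside the unit circle."""
    return math.hypot(f_temp(gap_seconds), f_lex(session_queries, query)) >= 1.0


print(same_session_geometric(["hurling"], "hurling", 3600))
print(same_session_geometric(["hurling"], "mccarthy cup", 60))
```

The second call illustrates the weakness named on the slides: "hurling" and "mccarthy cup" share no n-grams, so the purely lexical and temporal rule separates them even after a one-minute gap.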
22 Cascading Method: Step 2: Geometric method
Performs well on the standard test corpus
[plots: lexical similarity vs. temporal similarity for same-session and new-session query pairs]
24 Cascading Method: Step 2: Geometric method
... but has some problems on the edge
Major problems
- Similar queries, time gap (upper left): merely a matter of opinion
- Different queries, same semantics (lower right): incorporate semantics
Criterion (adapted): Apply the original geometric method if $f_{temp} < 0.8$ or $f_{lex} > 0.4$. Else: Goto Step 3.
[plot: lexical similarity vs. temporal similarity]
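The adapted criterion turns the geometric method into a three-valued decision. A minimal sketch, assuming the thresholds 0.8 and 0.4 and the unit-circle rule of the original method; the function name and the use of None for "hand over to Step 3" are illustrative choices.

```python
import math


def adapted_geometric(f_temp, f_lex, temp_limit=0.8, lex_limit=0.4):
    """Return True/False when the geometric method is trusted, else None."""
    if f_temp < temp_limit or f_lex > lex_limit:
        # Trusted region: apply the original unit-circle criterion.
        return math.hypot(f_temp, f_lex) >= 1.0
    # Small time gap but low lexical overlap: defer to the semantic steps.
    return None


print(adapted_geometric(0.95, 0.6))   # close in time, lexically similar
print(adapted_geometric(0.5, 0.1))    # long gap, different queries
print(adapted_geometric(0.99, 0.0))   # close in time, different wording
```

Only the lower right corner of the plot (high temporal similarity, low lexical similarity) is deferred, which is exactly the region where different wording may still express the same information need.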
25 Cascading Method: Step 3: Explicit Semantic Analysis
How ESA works [Gabrilovich and Markovitch, 2007]
Preprocessing: tf-idf-weighted inverted index of Wikipedia articles, i.e., a term-document matrix M
For consecutive queries q and q': $f_{esa}$ = cosine similarity of $M^T q'$ and $M^T s$, where s is the session of q
Criterion: Consecutive queries q and q' are in the same session if $f_{esa} \geq 0.35$. Else: Goto Step 4.
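The ESA comparison can be illustrated on a toy scale. The sketch below assumes a tiny hand-made term-document matrix with two "Wikipedia articles" as concepts and invented weights; a real index would hold tf-idf weights over the full Wikipedia vocabulary. The threshold 0.35 and the direction of the comparison are read from the criterion above, and all names are hypothetical.

```python
import math

# Toy term-document matrix M: one row per term, one column per concept.
# The weights are invented for illustration only.
CONCEPTS = ["Hurling", "Istanbul"]
M = {
    "hurling":        [0.9, 0.0],
    "mccarthy":       [0.6, 0.0],
    "cup":            [0.4, 0.1],
    "istanbul":       [0.0, 0.9],
    "constantinople": [0.0, 0.7],
}


def concept_vector(text):
    """Compute M^T x for the bag-of-words vector x of the given text."""
    vec = [0.0] * len(CONCEPTS)
    for term in text.lower().split():
        for j, weight in enumerate(M.get(term, [0.0] * len(CONCEPTS))):
            vec[j] += weight
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def f_esa(session_queries, query):
    """Cosine similarity of session and query in concept space."""
    return cosine(concept_vector(" ".join(session_queries)), concept_vector(query))


def same_session_esa(session_queries, query, threshold=0.35):
    """Return True for 'same session', None for 'go to Step 4'."""
    return True if f_esa(session_queries, query) >= threshold else None


print(same_session_esa(["hurling"], "liam mccarthy cup"))
print(same_session_esa(["constantinople"], "hurling"))
```

In this toy setting "hurling" and "liam mccarthy cup" share no terms but map to the same concept, so the semantic step joins them where a lexical comparison would not.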
27 Cascading Method: Step 4: Search results
Even more semantics
Idea: Enrich the short query strings with the results of some web search engine.
Criterion: Consecutive queries q and q' are in the same session iff they share at least one of the top 10 search results.
Remark: If q and q' share no top 10 result, the decision should be "not sure".
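The result-overlap test of Step 4 reduces to a set intersection once the result lists are available. A minimal sketch, assuming the ranked result URLs for both queries have already been fetched and that the cut-off is the top 10; the URLs below are placeholders and no particular search engine API is implied.

```python
def shares_top_result(results_prev, results_next, k=10):
    """Return True if the two ranked result lists overlap within the top k."""
    return bool(set(results_prev[:k]) & set(results_next[:k]))


# Placeholder result lists, invented for this example.
results_hurling = ["https://example.org/hurling", "https://example.org/gaa"]
results_cup = ["https://example.org/liam-mccarthy-cup", "https://example.org/gaa"]
results_istanbul = ["https://example.org/istanbul"]

print(shares_top_result(results_hurling, results_cup))
print(shares_top_result(results_hurling, results_istanbul))
```

Following the remark on the slide, a False result is better read as "not sure" than as proof of a new session.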
29 Cascading Method: Experimental Results
That's the complete cascade
- Step 1: Subset tests
- Step 2: Geometric method
- Step 3: ESA similarity
- Step 4: Search results
What about accuracy and performance?
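The control flow of the cascade can be sketched independently of the individual features. The sketch assumes that every step returns True, False, or None (undecided) and that an undecided pair is finally treated as a session break; that default, the stand-in expensive step, and all names are assumptions made for illustration.

```python
def cascade(steps, session, query, gap_seconds):
    """Run the steps in order and stop at the first one that decides."""
    for name, step in steps:
        decision = step(session, query, gap_seconds)
        if decision is not None:
            return decision, name
    # Assumption: if no step is sure, start a new session.
    return False, "default"


def step_subset(session, query, gap_seconds):
    """Cheap step: decide 'same session' on term-set containment, else defer."""
    a, b = set(session[-1].lower().split()), set(query.lower().split())
    return True if a and b and (a <= b or b <= a) else None


expensive_calls = []


def step_expensive(session, query, gap_seconds):
    """Stand-in for a costly feature; it records calls and never decides."""
    expensive_calls.append(query)
    return None


steps = [("subset", step_subset), ("expensive", step_expensive)]
first = cascade(steps, ["hurling"], "hurling gaa", 60)
second = cascade(steps, ["hurling"], "istanbul", 60)
print(first, second, expensive_calls)
```

The call log shows the point of the design: the expensive step runs only for the pair that the cheap step could not decide.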
30 Cascading Method: Experimental Results
Accuracy and runtime
Accuracy on Gayo-Avello's corpus (11 queries, 2.7 per session)
[table: Precision, Recall, and F-Measure (β = 1.5) for Geometric and Cascading; values not preserved]
Performance per step on Gayo-Avello's corpus
[table: affected (%), F-Measure, time (ms), and factor for Steps 1 to 4; surviving fragments: Step 1 "% ms 1.", Step 2 "% ms 2.5", Step 3 "2.5% ms 3.4", Step 4 ".85% ms"]
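The F-Measure with β = 1.5 weights recall more heavily than precision. A minimal sketch, assuming the standard definition $F_\beta = (1+\beta^2)\,P\,R / (\beta^2 P + R)$; the slides do not spell out the formula, and the precision and recall values below are invented for illustration.

```python
def f_beta(precision, recall, beta=1.5):
    """Weighted harmonic mean of precision and recall (standard F-beta)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Invented example values, not results from the talk.
print(round(f_beta(0.8, 0.9), 4))
print(round(f_beta(0.9, 0.8), 4))
```

Swapping the two inputs changes the score, which shows that with β = 1.5 a method is rewarded more for high recall than for high precision.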
32 Cascading Method: Experimental Results
Goal: high quality session test data
Our own use case
- Sample sessions from the AOL log as test data.
- AOL log (cleaned): 35.4 million interactions from 47 users.
Some figures
- Step 4 involved on 22.5% of the interactions: 8 million web queries
- 300 ms per search: 1 month
Way out
- Drop Step 4 and the sessions on which it would have been invoked
- Remaining sessions: F-Measure = 0.9755
- Cleaned AOL log: 27 minutes
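The "1 month" estimate can be checked with a short calculation. The sketch assumes 300 ms per web search; the transcript shows "3 ms", but 300 ms is the value that makes the stated figures (35.4 million interactions, 22.5%, 8 million web queries, 1 month) consistent with each other.

```python
interactions = 35.4e6        # cleaned AOL log, from the slide
step4_share = 0.225          # fraction of interactions reaching Step 4
seconds_per_search = 0.3     # assumption: 300 ms per web search

step4_queries = interactions * step4_share
total_days = step4_queries * seconds_per_search / (24 * 60 * 60)

print(f"{step4_queries / 1e6:.2f} million web queries")
print(f"{total_days:.1f} days of sequential searching")
```

Under these assumptions roughly 8 million sequential searches take about four weeks, which explains why the slides propose dropping Step 4 for the large log.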
33 Conclusion
Almost the end: The take-away messages!
36 Conclusion
What we have (not) done
Results
- Cascading method
- Cheap features first
- Beats geometric
- 3 step version: simple, fast, high quality sessions
Future Work
- Postprocessing for multi-tasking
- Postprocessing for goals/missions
Thank you
More informationLarge Scale Chinese News Categorization. Peng Wang. Joint work with H. Zhang, B. Xu, H.W. Hao
Large Scale Chinese News Categorization --based on Improved Feature Selection Method Peng Wang Joint work with H. Zhang, B. Xu, H.W. Hao Computational-Brain Research Center Institute of Automation, Chinese
More informationScalable Mobile Video Retrieval with Sparse Projection Learning and Pseudo Label Mining
Scalable Mobile Video Retrieval with Sparse Projection Learning and Pseudo Label Mining Guan-Long Wu, Yin-Hsi Kuo, Tzu-Hsuan Chiu, Winston H. Hsu, and Lexing Xie 1 Abstract Retrieving relevant videos from
More informationTABLE OF CONTENTS PAGE TITLE NO.
TABLE OF CONTENTS CHAPTER PAGE TITLE ABSTRACT iv LIST OF TABLES xi LIST OF FIGURES xii LIST OF ABBREVIATIONS & SYMBOLS xiv 1. INTRODUCTION 1 2. LITERATURE SURVEY 14 3. MOTIVATIONS & OBJECTIVES OF THIS
More informationStatic Pruning of Terms In Inverted Files
In Inverted Files Roi Blanco and Álvaro Barreiro IRLab University of A Corunna, Spain 29th European Conference on Information Retrieval, Rome, 2007 Motivation : to reduce inverted files size with lossy
More informationNatural Language Processing
Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document
More informationLearning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search
1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history
More informationInformation Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer
More informationLexical and Syntax Analysis. Bottom-Up Parsing
Lexical and Syntax Analysis Bottom-Up Parsing Parsing There are two ways to construct derivation of a grammar. Top-Down: begin with start symbol; repeatedly replace an instance of a production s LHS with
More informationLink Analysis in Web Mining
Problem formulation (998) Link Analysis in Web Mining Hubs and Authorities Spam Detection Suppose we are given a collection of documents on some broad topic e.g., stanford, evolution, iraq perhaps obtained
More informationIntro to Peer-to-Peer Search
Intro to Peer-to-Peer Search (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Outline Peer-to-peer historical perspective Problem definition Local client data processing Ranking functions Metadata copying
More informationAdvanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016
Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full
More informationInverted List Caching for Topical Index Shards
Inverted List Caching for Topical Index Shards Zhuyun Dai and Jamie Callan Language Technologies Institute, Carnegie Mellon University {zhuyund, callan}@cs.cmu.edu Abstract. Selective search is a distributed
More informationAssignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis
Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Due by 11:59:59pm on Tuesday, March 16, 2010 This assignment is based on a similar assignment developed at the University of Washington. Running
More informationModels for Document & Query Representation. Ziawasch Abedjan
Models for Document & Query Representation Ziawasch Abedjan Overview Introduction & Definition Boolean retrieval Vector Space Model Probabilistic Information Retrieval Language Model Approach Summary Overview
More informationComputer Science 572 Midterm Prof. Horowitz Thursday, March 8, 2012, 2:00pm 3:00pm
Computer Science 572 Midterm Prof. Horowitz Thursday, March 8, 2012, 2:00pm 3:00pm Name: Student Id Number: 1. This is a closed book exam. 2. Please answer all questions. 3. There are a total of 40 questions.
More informationFaster Or-join Enactment for BPMN 2.0
Faster Or-join Enactment for BPMN 2.0 Hagen Völzer, IBM Research Zurich Joint work with Beat Gfeller and Gunnar Wilmsmann Contribution: BPMN Diagram Enactment Or-join Tokens define the control state Execution
More informationIMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM
IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT
More information