Rohan Ramanath, R. V. College of Engineering, Bangalore Monojit Choudhury, Kalika Bali, Microsoft Research India Rishiraj Saha Roy, IIT Kharagpur


Rohan Ramanath, R. V. College of Engineering, Bangalore
Monojit Choudhury, Kalika Bali, Microsoft Research India
Rishiraj Saha Roy, IIT Kharagpur
monojitc@microsoft.com

new york times square dance
scottish country dancing clubs melbourne
tony hawk american wasteland ps2 cheats
what causes swollen lymph nodes

Similar to chunking of NL text

Query Accuracy: 0.58, 0.61
Segment F-score: 0.69, 0.72
Segment Accuracy: 0.84, 0.85
(Tan and Peng, 2008)

new york times square dance
new york times square dance
scottish country dancing clubs melbourne
scottish country dancing clubs melbourne
tony hawk american wasteland ps2 cheats
tony hawk american wasteland ps2 cheats
what causes swollen lymph nodes
what causes swollen lymph nodes

Maximal vs. minimal segments
Also observed for text chunking:
A series of happy thoughts came to mind
A series of happy thoughts came to mind
Annotators agree on major (clause or phrase) boundaries, but not on minor ones (Abney, 1992, 1995; Bali et al., 2009).

tony hawk american wasteland ps2 cheats
tony hawk american wasteland ps2 cheats
(((tony hawk) (american wasteland)) (ps2 cheats))

what causes swollen lymph nodes
Flat segmentation: what causes swollen lymph nodes
Binary nested segmentation: ((what causes) (swollen (lymph nodes)))
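The bracketed notation above can be read mechanically. The following is a minimal sketch, not taken from the talk, that parses such a string into nested Python lists; the function names `parse_nested` and `leaves` are hypothetical and chosen only for illustration.

```python
def parse_nested(text):
    """Parse a bracketed segmentation into nested lists of words."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])          # open a new segment
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the current segment
            stack[-1].append(node)
        else:
            stack[-1].append(token)   # plain word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    root = stack[0]
    # Unwrap the outermost brackets when they enclose the whole query.
    if len(root) == 1 and isinstance(root[0], list):
        return root[0]
    return root


def leaves(tree):
    """Return the words of a nested segmentation in their original order."""
    words = []
    for node in tree:
        if isinstance(node, list):
            words.extend(leaves(node))
        else:
            words.append(node)
    return words


tree = parse_nested("((what causes) (swollen (lymph nodes)))")
print(tree)                    # [['what', 'causes'], ['swollen', ['lymph', 'nodes']]]
print(" ".join(leaves(tree)))  # what causes swollen lymph nodes
```

Flattening the tree with `leaves` recovers the original query, which is a quick sanity check that an annotation did not drop or reorder words.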

Does nested segmentation of queries (and NL texts) lead to better agreement amongst expert annotators?
Can crowdsourcing be used to obtain reliable, high-quality annotations of this kind?

Web search queries and sentences, annotated with flat and nested segmentation
Queries: 1200 queries from Bing, 4-8 words long
Sentences: 300 English sentences, 5-15 words long
Experts: 3 very frequent search engine users, special training provided
Crowd: Amazon Mechanical Turk, 10 annotations per item, 1 min video for training

Challenge 1: Given two flat/nested annotations, how to define the similarity?
Challenge 2: What is the chance agreement?

tony hawk american wasteland ps2 cheats: 0 0 0 1 1
tony hawk american wasteland ps2 cheats: 0 1 0 1 0
Distance: (0+1+0+0+1)/5 = 2/5
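The flat comparison above can be reproduced with a short sketch. It assumes that each of the five positions between adjacent words carries a 1 for a segment break and a 0 otherwise, and that the distance is the share of positions where the two annotations differ; the helper names are hypothetical, and the example segmentations are simply the ones that yield the two vectors shown.

```python
from fractions import Fraction


def boundary_vector(segments):
    """Turn a flat segmentation (list of word lists) into 0/1 boundary marks."""
    vector = []
    for index, segment in enumerate(segments):
        vector.extend([0] * (len(segment) - 1))  # no break inside a segment
        if index < len(segments) - 1:
            vector.append(1)                     # break after this segment
    return vector


def flat_distance(a, b):
    """Fraction of boundary positions on which two annotations disagree."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return Fraction(sum(x != y for x, y in zip(a, b)), len(a))


first = boundary_vector([["tony", "hawk", "american", "wasteland"], ["ps2"], ["cheats"]])
second = boundary_vector([["tony", "hawk"], ["american", "wasteland"], ["ps2", "cheats"]])
print(first, second)                 # [0, 0, 0, 1, 1] [0, 1, 0, 1, 0]
print(flat_distance(first, second))  # 2/5
```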

((what causes) (swollen (lymph nodes))): 0 2 1 0
Second annotation: 0 1 2 0
Distance d1: (0+1+1+0)/4 = 2/4
Distance d2: (0+3+3+0)/4 = 6/4
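The two nested computations above can be reproduced under some assumptions that the slides do not spell out. In this sketch, each boundary between adjacent words is given the height of the smallest bracket that joins them, minus one, which yields 0 2 1 0 for the example. The first distance averages the absolute differences; the second averages the absolute differences of the squared values, an assumption inferred only from the arithmetic |2² - 1²| = 3. The second tree is a hypothetical annotation chosen because it produces 0 1 2 0.

```python
from fractions import Fraction


def boundary_heights(tree):
    """Boundary values for a nested segmentation given as nested lists."""
    def walk(node):
        if isinstance(node, str):
            return 0, []                      # a word: height 0, no boundary
        height, parts = 0, []
        for child in node:
            child_height, child_vector = walk(child)
            height = max(height, child_height)
            parts.append(child_vector)
        height += 1
        vector = []
        for index, part in enumerate(parts):
            if index > 0:
                vector.append(height - 1)     # boundary between two children
            vector.extend(part)
        return height, vector
    return walk(tree)[1]


def nested_distance(a, b, squared=False):
    """Mean |a - b|, or mean |a^2 - b^2| when squared=True (assumed form)."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    power = 2 if squared else 1
    total = sum(abs(x ** power - y ** power) for x, y in zip(a, b))
    return Fraction(total, len(a))


first = boundary_heights([["what", "causes"], ["swollen", ["lymph", "nodes"]]])
second = boundary_heights([[["what", "causes"], "swollen"], ["lymph", "nodes"]])
print(first, second)                                  # [0, 2, 1, 0] [0, 1, 2, 0]
print(nested_distance(first, second))                 # 1/2, i.e. 2/4
print(nested_distance(first, second, squared=True))   # 3/2, i.e. 6/4
```

Under this reading, the squared variant penalises disagreement at higher levels of the tree more heavily than disagreement near the leaves.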

Model 1: S. All annotations are equally likely.
Model 2: Cohen's κ. Every annotator has a different bias [doesn't apply to crowdsourcing].
Model 3: Krippendorff's α. The population has a bias.
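The three chance models differ only in how expected agreement is estimated. The sketch below uses the textbook two-annotator, nominal-category forms of S, Cohen's κ and Krippendorff's α; it is an illustration of the models listed above and not the distance-based variant used in the talk. The input vectors are the flat boundary vectors from the earlier example.

```python
from collections import Counter
from fractions import Fraction


def observed_agreement(a, b):
    """Share of items on which the two annotators give the same label."""
    return Fraction(sum(x == y for x, y in zip(a, b)), len(a))


def s_coefficient(a, b, categories):
    """S: chance agreement assumes all categories are equally likely."""
    expected = Fraction(1, len(categories))
    return (observed_agreement(a, b) - expected) / (1 - expected)


def cohen_kappa(a, b):
    """Cohen's kappa: each annotator has an individual label distribution."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(Fraction(count_a[c], n) * Fraction(count_b[c], n) for c in count_a)
    return (observed_agreement(a, b) - expected) / (1 - expected)


def krippendorff_alpha(a, b):
    """Krippendorff's alpha (nominal): one pooled label distribution."""
    pooled = Counter(a) + Counter(b)
    n = 2 * len(a)
    disagreement_observed = 1 - observed_agreement(a, b)
    mismatched_pairs = sum(pooled[c] * pooled[k] for c in pooled for k in pooled if c != k)
    disagreement_expected = Fraction(mismatched_pairs, n * (n - 1))
    return 1 - disagreement_observed / disagreement_expected


first = [0, 0, 0, 1, 1]
second = [0, 1, 0, 1, 0]
print(s_coefficient(first, second, categories=[0, 1]))  # 1/5
print(cohen_kappa(first, second))                       # 1/6
print(krippendorff_alpha(first, second))                # 1/4
```

All three functions divide by zero when expected agreement is total (for example, when every label is identical), so real data would need a guard for that case.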

[Figure: bar chart; groups Query-Expert, Query-Crowd, Sentence-Crowd; series Flat, Nested-d1, Nested-d2; vertical axis 0 to 0.8]

[Figure: bar chart; groups Query-Expert, Query-Crowd, Sentence-Crowd; series Flat, Nested-d1, Nested-d2; vertical axis 0 to 1]

80% of queries and 60% of sentences have 2 segments.
The lengths of the two segments differ by 0 or 1 words.
power rangers operation overdrive
multiplayer online game
st francis of assisi primary school
N-gram generated synthetic queries
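The two observations above describe a preference for balanced two-way splits, which is easy to test for in a set of annotations. The sketch below is illustrative only: the helper names are hypothetical, and the example splits are invented for the demonstration rather than taken from the collected data.

```python
def is_balanced_two_way(segments):
    """True if there are exactly 2 segments whose lengths differ by at most 1."""
    if len(segments) != 2:
        return False
    return abs(len(segments[0]) - len(segments[1])) <= 1


def balanced_share(annotations):
    """Share of annotations that are balanced two-way splits."""
    if not annotations:
        raise ValueError("need at least one annotation")
    return sum(is_balanced_two_way(s) for s in annotations) / len(annotations)


# Hypothetical top-level splits, invented for illustration.
annotations = [
    [["power", "rangers"], ["operation", "overdrive"]],
    [["multiplayer"], ["online", "game"]],
    [["st", "francis", "of"], ["assisi", "primary", "school"]],
    [["tony", "hawk"], ["american", "wasteland"], ["ps2", "cheats"]],
]
print(balanced_share(annotations))  # 0.75
```

A high share on inputs with no real phrase structure, such as synthetic queries, would point to annotator bias rather than linguistic agreement.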

500 internal server error internet explorer

Phrase structure drives segmentation only if reconcilable with Biases 1 and 2.
Prepositions are grouped with the following word in NL sentences, but there are no such dominant trends in queries (e.g. flights to, ideas for).

Crowdsourcing is unreliable for query segmentation.
Nested segmentation improves IAA for experts, but degrades it for the crowd (due to higher cognitive load).
The crowd has a strong bias towards balanced structures, leading to apparently high IAA but unreliable annotations.
The proposed IAA metric can correct for annotator biases in crowdsourcing.

Data and supplementary material available from http://research.microsoft.com/apps/pubs/default.aspx?id=192002
Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes, Linguistic Annotation Workshop (8th August, 11:40am)