Natural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi)


Natural Language Processing SoSe 2015 Question Answering Dr. Mariana Neves July 6th, 2015 (based on the slides of Dr. Saeedeh Momtazi)

Outline: Introduction, History, QA Architecture

Motivation: finding small segments of text which answer users' questions (http://start.csail.mit.edu/)


Motivation: How many presidents do the USA have? → 44, plus the list of names in chronological order (http://en.wikipedia.org/wiki/list_of_presidents_of_the_united_states_by_occupation)

Motivation: information retrieval takes keywords (short input) and returns documents (long output); question answering takes a natural language question (long input) and returns a short answer (short output).

QA Types: Closed-domain vs. Open-domain. Closed-domain: answering questions from a specific domain. Open-domain: answering any domain-independent question, using a large collection of unstructured data (e.g., the Web) instead of databases.

Open-domain QA: Advantages: many subjects are covered; information is constantly added and updated; no manual work is required to build databases.

Open-domain QA: Problems: information is not always up-to-date; wrong information cannot be avoided; much irrelevant information is found.

Outline: Introduction, History, QA Architecture

History: BASEBALL [Green et al., 1963]: one of the earliest question answering systems, developed to answer users' questions about dates, locations, and the results of baseball matches. LUNAR [Woods, 1977]: developed to answer natural language questions about the geological analysis of rocks returned by the Apollo moon missions; it was able to answer 90% of the questions in its domain posed by people not trained on the system.

History: STUDENT: built to answer high-school students' questions about algebraic exercises. PHLIQA: developed to answer users' questions about European computer systems. UC (Unix Consultant): answered questions about the Unix operating system. LILOG: was able to answer questions about tourism information of cities in Germany.

Open-domain QA: START [Katz, 1997]: utilized a knowledge base to answer users' questions. The knowledge base was first created automatically from unstructured Internet data, then used to answer natural language questions. http://start.csail.mit.edu/index.php


IBM Watson: played against two of the greatest Jeopardy! champions. Challenges: knowledge, speed, confidence. See: Final Jeopardy, Stephen Baker; Building Watson: An Overview of the DeepQA Project, Ferrucci et al., AI Magazine, 2010. https://www.youtube.com/watch?v=wfr3lom_xhe

Outline: Introduction, History, QA Architecture

[Architecture diagram: Question Analysis → Question Classification → Query Construction → Document Retrieval (over a corpus or the Web) → Sentence Retrieval → Sentence Annotation → Answer Extraction → Answer Validation, spanning natural language processing, information retrieval, and information extraction stages. Running example: "How many presidents do the USA have?" → 44.]

Question Analysis: named entity recognition, surface text pattern learning, syntactic parsing, semantic role labeling.

Question Analysis: Named Entity Recognition. Recognizing the named entities in the text to extract the target of the question; the question's target is then used in the query construction step. Example: Question: In what country was Albert Einstein born? Target: Albert Einstein

Question Analysis: Pattern Learning. Extracting a pattern from the question, matching it against a list of pre-defined question patterns, and finding the corresponding answer pattern; the answer pattern locates the position of the answer in the sentence during the answer extraction step.

Question Analysis: Pattern Learning. Example: Question: In what country was Albert Einstein born? Question Pattern: In what country was X born? Answer Pattern: X was born in Y.
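
The question-pattern / answer-pattern idea can be sketched with regular expressions. This is a minimal illustration with one hand-written pattern pair; real systems learn many such pairs from text:

```python
import re

# Question pattern "In what country was X born?" maps to the answer
# pattern "X was born in Y", where Y is the answer candidate.
question = "In what country was Albert Einstein born?"
q_pattern = re.compile(r"In what country was (?P<X>.+) born\?")
target = q_pattern.match(question).group("X")   # "Albert Einstein"

# Build the answer pattern with the extracted target filled in.
a_pattern = re.compile(re.escape(target) + r" was born in (?P<Y>\w+)")

sentence = "Albert Einstein was born in Germany."
answer = a_pattern.search(sentence).group("Y")
print(answer)  # -> Germany
```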

Question Analysis: Syntactic Parsing. Using a dependency parser to extract the syntactic relations between question terms; the dependency relation paths between question terms are used to extract the correct answer in the answer extraction step. (http://nlp.stanford.edu:8080/corenlp/process)

Question Analysis: Semantic Role Labeling. FrameNet: a lexical database for English with more than 170,000 manually annotated sentences. Frame Semantics describes the type of event, relation, or entity and the participants in it. Example: John[Cook] grills a fish[Food] on an open fire[Heating_instrument]

Question Analysis: Semantic Role Labeling (https://framenet2.icsi.berkeley.edu/fnreports/data/frameindex.xml?frame=apply_heat)

Question Analysis: Semantic Role Labeling. Frame assignment and role labeling: OPERATE_VEHICLE: Jim[Driver] flew his plane[Vehicle] to Texas[Goal]. DESTROYING: Alice[Destroyer] destroys the item[Undergoer] with a plane[Instrument].

Question Analysis: Semantic Role Labeling. Finding the question's head verb. Example: Who[Buyer] purchased YouTube[Goods]? → Commerce_buy (https://framenet2.icsi.berkeley.edu/fnreports/data/frameindex.xml?frame=commerce_buy)

Question Analysis: Semantic Role Labeling. COMMERCE_BUY lexical units: buy (v), buyer (n), purchase (n), purchase (v), purchaser (n). Identifying candidate passages via role patterns: Buyer[Subj,NP] verb Goods[Obj,NP]; Buyer[Subj,NP] verb Goods[Obj,NP] Seller[Dep,PP-from]; Goods[Subj,NP] verb Buyer[Dep,PP-by]; ... Example: In 2006, YouTube[Goods] was purchased by Google[Buyer] for $1.65 billion.


Question Classification: classifying the input question into a set of question types and mapping question types to the available named entity labels; strings of the same type as the input question are then found in the answer extraction step. Example: Question: In what country was Albert Einstein born? Type: LOCATION - Country

Question Classification: Example (NER): S1: Albert Einstein was born in 14 March 1879. S2: Albert Einstein was born in Germany. S3: Albert Einstein was born in a Jewish family.

Question Classification: classification taxonomies: BBN; Pasca & Harabagiu; Li & Roth (6 coarse- and 50 fine-grained classes: ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC).

Question Classification (http://cogcomp.cs.illinois.edu/data/qa/qc/definition.html)

Question Classification examples: How did serfdom develop in and then leave Russia? → DESC:manner. What films featured the character Popeye Doyle? → ENTY:cremat. What fowl grabs the spotlight after the Chinese Year of the Monkey? → ENTY:animal. What is the full form of .com? → ABBR:exp. What team did baseball's St. Louis Browns become? → HUM:gr. (http://cogcomp.cs.illinois.edu/data/qa/qc/train_1000.label)

Question Classification: a coarse classifier first assigns one of the six coarse classes (ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC); a fine classifier then assigns a fine-grained class, e.g. for HUMAN: group, individual, title, description.

Question Classification: in the biomedical domain, UMLS semantic groups and types can serve as the taxonomy.

Question Classification: any kind of supervised classifier can be used (k-nearest neighbors, support vector machines, Naïve Bayes, maximum entropy / logistic regression); the confidence measure of the classification can be used to filter the results.
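
As a toy illustration of the supervised approach, the sketch below trains a multinomial Naïve Bayes classifier over bag-of-words features on a handful of made-up questions with coarse labels; a real classifier would be trained on thousands of labelled questions:

```python
import math
from collections import Counter, defaultdict

# Hand-made training data in the style of the Li & Roth coarse classes.
train = [
    ("in what country was albert einstein born", "LOCATION"),
    ("where is the eiffel tower", "LOCATION"),
    ("who purchased youtube", "HUMAN"),
    ("who wrote hamlet", "HUMAN"),
    ("how many presidents does the usa have", "NUMERIC"),
    ("how many legs does a spider have", "NUMERIC"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(question):
    words = question.lower().replace("?", "").split()
    best, best_lp = None, float("-inf")
    for label in class_counts:
        # log prior + add-one-smoothed log likelihoods
        lp = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in words:
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("Who founded Microsoft?"))          # -> HUMAN
print(classify("How many moons does Mars have?"))  # -> NUMERIC
```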


Query Construction. Goal: formulating a query with a high chance of retrieving relevant documents. Task: assigning a higher weight to the question's target and using query expansion techniques to expand the query.
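
A minimal sketch of these two tasks, with hypothetical weights and a hand-made synonym table standing in for a real expansion resource such as WordNet:

```python
def build_query(keywords, target, synonyms):
    """Return (term, weight) pairs: boosted target, expanded synonyms."""
    terms = []
    for w in keywords:
        weight = 2.0 if w == target else 1.0   # boost the question target
        terms.append((w, weight))
        for s in synonyms.get(w, []):          # query expansion
            terms.append((s, 0.5))             # expanded terms weigh less
    return terms

query = build_query(
    keywords=["Einstein", "born", "country"],
    target="Einstein",
    synonyms={"country": ["nation", "state"]},
)
print(query)
# -> [('Einstein', 2.0), ('born', 1.0), ('country', 1.0),
#     ('nation', 0.5), ('state', 0.5)]
```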


Document Retrieval. Importance: QA components use computationally intensive algorithms, and the time complexity of the system strongly depends on the size of the corpus to be processed. Task: reducing the search space for the subsequent components by retrieving relevant documents from a large corpus and selecting the top n retrieved documents for the next steps.

Document Retrieval: using available information retrieval models (vector space model, probabilistic model, language model) and available information retrieval toolkits (Lucene, Lemur, SAP HANA, etc.).
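
A self-contained sketch of the vector space model on a toy corpus (in practice a toolkit such as Lucene would be used): documents are ranked by cosine similarity between TF-IDF vectors.

```python
import math
from collections import Counter

docs = [
    "albert einstein was born in germany",
    "the usa has had 44 presidents",
    "einstein developed the theory of relativity",
]

# Document frequencies and inverse document frequencies.
df = Counter(w for d in docs for w in set(d.split()))
idf = {w: math.log(len(docs) / df[w]) for w in df}

def tfidf(text, idf):
    tf = Counter(text.split())
    return {w: c * idf.get(w, 0.0) for w, c in tf.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = tfidf("in what country was einstein born", idf)
ranked = sorted(docs, key=lambda d: cosine(query, tfidf(d, idf)), reverse=True)
print(ranked[0])  # -> "albert einstein was born in germany"
```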


Sentence Retrieval. Task: finding small segments of text that contain the answer. Benefits beyond document retrieval: documents can be very large, documents span different subject areas, and the relevant information is expressed locally; retrieving sentences simplifies the answer extraction step. Main problem: sentence brevity.

Sentence Retrieval: information retrieval models for sentence retrieval: vector space model, probabilistic model, language model (with Jelinek-Mercer linear interpolation, Bayesian smoothing with Dirichlet prior, or absolute discounting).
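
Jelinek-Mercer smoothing linearly interpolates the sentence model with a collection (background) model, P(w|S) = λ·P_ML(w|S) + (1−λ)·P(w|C). A toy sketch with an arbitrary λ = 0.5:

```python
import math
from collections import Counter

sentences = [
    "albert einstein was born in germany",
    "albert einstein was born in 1879",
    "einstein liked physics",
]
collection = " ".join(sentences).split()
c_counts, c_total = Counter(collection), len(collection)

def score(query, sentence, lam=0.5):
    """Log probability of the query under the smoothed sentence model."""
    s_words = sentence.split()
    s_counts, s_total = Counter(s_words), len(s_words)
    logp = 0.0
    for w in query.split():
        p = lam * s_counts[w] / s_total + (1 - lam) * c_counts[w] / c_total
        logp += math.log(p) if p > 0 else float("-inf")
    return logp

best = max(sentences, key=lambda s: score("einstein born germany", s))
print(best)  # -> "albert einstein was born in germany"
```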

Sentence Retrieval: support from named-entity recognition and semantic role labeling. Example sentences: Albert Einstein was born in 14 March 1879. / Albert Einstein was born in Germany. / Albert Einstein was born in a Jewish family. Role patterns: Buyer[Subj,NP] verb Goods[Obj,NP]; Buyer[Subj,NP] verb Goods[Obj,NP] Seller[Dep,PP-from]; Goods[Subj,NP] verb Buyer[Dep,PP-by]

Sentence Retrieval: supervised learning. Features: number of named entities of the right type, number of question keywords, longest exact sequence of question keywords, rank of the source document, proximity of the keywords (shortest path), n-gram overlap between passage and question.
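
A few of these features can be sketched directly (simplified: document rank, entity types, and keyword proximity are omitted):

```python
def features(question, sentence):
    """Compute simple overlap features for a (question, sentence) pair."""
    q = question.lower().replace("?", "").split()
    s = sentence.lower().rstrip(".").split()
    # number of question keywords appearing in the sentence
    keyword_matches = sum(1 for w in q if w in s)
    # longest run of consecutive question keywords appearing in order
    longest = 0
    for i in range(len(s)):
        for j in range(len(q)):
            k = 0
            while i + k < len(s) and j + k < len(q) and s[i + k] == q[j + k]:
                k += 1
            longest = max(longest, k)
    # bigram overlap between question and sentence
    bigrams = lambda t: set(zip(t, t[1:]))
    return {"keywords": keyword_matches, "longest_seq": longest,
            "bigram_overlap": len(bigrams(q) & bigrams(s))}

print(features("In what country was Albert Einstein born?",
               "Albert Einstein was born in Germany."))
```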

Sentence Annotation: similar to question analysis; annotating relevant sentences using linguistic analysis: named entity recognition, syntactic parsing, semantic role labeling.


Answer Extraction: extracting candidate answers based on various information. From the question: patterns, the syntactic parse, and semantic roles (question analysis) plus the question type (question classification). From the sentence: all annotated data (sentence annotation).

Answer Extraction: using extracted patterns. Example: Question: In what country was Albert Einstein born? Question Pattern: In what country was X born? Answer Pattern: X was born in Y. Matching sentences: S1: Albert Einstein was born in 14 March 1879. S2: Albert Einstein was born in Germany. S3: Albert Einstein was born in a Jewish family.

Answer Extraction: using the question type and entity type. Example: Question: In what country was Albert Einstein born? Question Type: LOCATION - Country. Candidate sentences (NER): S1: Albert Einstein was born in 14 March 1879. S2: Albert Einstein was born in Germany. S3: Albert Einstein was born in a Jewish family.
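
Combining the answer pattern with the expected answer type can be sketched as follows; the small country gazetteer is a hypothetical stand-in for a named entity recognizer:

```python
import re

# Hypothetical gazetteer playing the role of an NER type check.
COUNTRIES = {"Germany", "France", "Switzerland"}

def is_country(text):
    return text in COUNTRIES

# Answer pattern "X was born in Y" with the target filled in.
answer_pattern = re.compile(r"Albert Einstein was born in (?P<Y>[\w ]+)")

sentences = [
    "Albert Einstein was born in 14 March 1879.",
    "Albert Einstein was born in Germany.",
    "Albert Einstein was born in a Jewish family.",
]

# Keep only pattern matches whose filler has the expected type.
candidates = []
for s in sentences:
    m = answer_pattern.search(s)
    if m and is_country(m.group("Y")):
        candidates.append(m.group("Y"))
print(candidates)  # -> ['Germany']
```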

Answer Extraction: using syntactic parsing; different wordings are possible, but with similar syntactic structure (based on the work by Dan Shen).

Answer Extraction: using syntactic parsing; the many syntactic variations require a robust matching approach (based on the work by Dan Shen).

Answer Extraction: using semantic roles. Example: Who[Buyer] purchased YouTube[Goods]? In 2006, YouTube[Goods] was purchased by Google[Buyer] for $1.65 billion.


Answer Validation: using the Web as a knowledge resource for validating answers. Required steps: query creation and answer rating.

Answer Validation: query creation: combining the answer with a subset of the question keywords and choosing different combinations of subsets: bag of words, noun phrase chunks, declarative form.

Answer Validation. Example: Question: In what country was Albert Einstein born? Answer candidate: Germany. Queries: Bag-of-Words: Albert Einstein born Germany. Noun-Phrase-Chunks: "Albert Einstein" born Germany. Declarative-Form: "Albert Einstein was born in Germany".
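
The three query forms can be sketched as follows; the noun-phrase chunks are hand-specified here rather than computed, and the declarative form reuses the answer pattern "X was born in Y" from question analysis:

```python
def make_queries(keywords, chunks, declarative):
    """Build the three validation query variants as strings."""
    return {
        # plain bag of words
        "bag_of_words": " ".join(keywords),
        # multi-word noun phrases quoted as phrase queries
        "np_chunks": " ".join(f'"{c}"' if " " in c else c for c in chunks),
        # full declarative sentence as an exact phrase
        "declarative": f'"{declarative}"',
    }

queries = make_queries(
    keywords=["Albert", "Einstein", "born", "Germany"],
    chunks=["Albert Einstein", "born", "Germany"],
    declarative="Albert Einstein was born in Germany",
)
for name, q in queries.items():
    print(name, "->", q)
```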

Answer Validation: answer rating: passing the query to a search engine and analyzing its results, either by counting the results or by parsing the result snippets. Other possibilities: using knowledge bases to find relations between the question keywords and the answer.
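
A toy sketch of answer rating by counting co-occurrences in retrieved snippets; the snippet list below is made up, standing in for the results a real search engine would return:

```python
# Hypothetical result snippets for the validation queries.
snippets = [
    "Albert Einstein was born in Ulm, Germany, in 1879.",
    "Einstein left Germany in 1933.",
    "Einstein visited Switzerland often.",
]

def rate(candidate, target, snippets):
    """Count snippets containing both the candidate and the question target."""
    return sum(1 for s in snippets if candidate in s and target in s)

scores = {c: rate(c, "Einstein", snippets)
          for c in ["Germany", "Switzerland"]}
print(scores)  # -> {'Germany': 2, 'Switzerland': 1}
```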

Further Reading: Speech and Language Processing, Chapters 23.1 and 23.2.